joke vest amount (jkndrkn) wrote in ruby_lang,

Ruby Newbie: Code Critique Request

Hello, friends,

I have played with Ruby tutorials and ploughed through a great deal of the Pickaxe book, but have not really written much of anything on my own.

Finally, an opportunity came up to use Ruby for a practical application: a quick-and-dirty text-processing script. The script inserts onclick event handlers into HTML anchor tags that point to PDF and other files. Our customer had purchased a web analytics package and wanted to track clicks on links to non-HTML files on their server.
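The handler that the script inserts can be sketched as a small standalone function. The name `tracking_handler` is hypothetical, and the sketch swaps in `URI.decode_www_form_component` from Ruby's standard `uri` library for `CGI.unescape`: it decodes `%20` and similar escapes the same way, but raises `ArgumentError` on a malformed escape such as a stray `%`, so the sketch falls back to the undecoded name in that case. The `dcsMultiTrack` call and its arguments are copied from the script below.

```ruby
require 'uri'

# Hypothetical helper: build the onclick attribute for one tracked link,
# given the directory and file name captured from href="dir/name".
def tracking_handler(dir, name)
  # Decode percent-escapes (e.g. %20 -> space) for a readable title;
  # keep the raw name if it contains a malformed escape.
  title = begin
    URI.decode_www_form_component(name)
  rescue ArgumentError
    name
  end
  # Encode single quotes so they cannot end the JavaScript string literals early
  path, title = [dir + name, title].map { |s| s.gsub("'", "%27") }
  %(onclick="dcsMultiTrack('DCS.dcsuri','#{path}','WT.ti','#{title}');")
end

puts tracking_handler('docs/', 'Annual%20Report.pdf')
# prints: onclick="dcsMultiTrack('DCS.dcsuri','docs/Annual%20Report.pdf','WT.ti','Annual Report.pdf');"
```

The path passed as `DCS.dcsuri` stays percent-encoded, while the `WT.ti` title is decoded for display, mirroring the split between `name` and `name_clean` in the script.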

The script uses regular expressions, string manipulation, and basic file I/O.

If you have some time, could you review the script? I'm interested in code that is compact, legible, and efficient. Please let me know whether my approach uses the right Ruby concepts. I've already found one place where I could improve efficiency: write a file only if changes were made to it.
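Writing a file only when something changed can be handled by comparing its text before and after the edit. A minimal sketch, assuming each file fits comfortably in memory; `rewrite_if_changed` is a hypothetical name, and unlike the script below it always writes in place (there is no `_`-prefixed debug copy).

```ruby
require 'tempfile'

# Hypothetical helper: pass the file's text to the block and write the
# block's result back only if it differs. Returns true if the file was written.
def rewrite_if_changed(path)
  # Frozen, so a block that tries to edit in place (gsub!) raises
  # instead of silently making the comparison below always equal.
  original = File.read(path).freeze
  updated = yield(original)
  return false if updated == original
  File.write(path, updated)
  true
end

# Usage on a throwaway file
Tempfile.create('demo') do |f|
  f.write('<a href="docs/a.pdf">A</a>')
  f.close
  p rewrite_if_changed(f.path) { |text| text }          # false: nothing to write
  p rewrite_if_changed(f.path) { |text| text.upcase }   # true: file rewritten
end
```

Comparing the whole text avoids threading a `changed` flag through the per-line loop; setting such a flag inside the loop would work equally well.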

I very much appreciate your time!


require 'cgi'

files = ARGV

# Match a URL of the form href="foo/bar/baz/quux.pdf"
regex = /(href\s*=)\s*\"([^\"]+\/)([^\"]+\.(pdf|mdb|wmv|xls|doc|ppt))\"/i
debug = false

files.each do |infile|
    5.times { puts "" } if debug
    puts ">>>" + infile.to_s if debug

    output = ""

    File.open(infile) do |file|
        while line = file.gets
            # If a line contains a URL of the correct form but NO onclick handler
            if line =~ regex and !line.include?('onclick')
                puts "<" + line if debug

                # Retrieve URL components from the first match on the line
                matches = regex.match(line)
                href, dir, name = matches[1..3]

                # Rebuild URL
                # No filtering necessary, as the URL is enclosed in double quotes and is assumed to have been tested in the past
                url = href + '"' + dir + name + '"'

                # Build onclick handler from URL components

                # Improve legibility of the name displayed in the Webtrends interface
                # by decoding %20 and other percent-escapes
                name_clean = CGI.unescape(name)

                # Encode single quotes, as they can break the onclick handler
                [href, dir, name, name_clean].each do |str|
                    str.gsub!("'", "%27")
                end

                onclick = "onclick=\"dcsMultiTrack('DCS.dcsuri','#{dir}#{name}','WT.ti','#{name_clean}');\""

                # Replace the link with the original link plus the onclick event handler.
                # NOTE: url and onclick come from the first match only, so every
                # matching link on this line receives the first link's URL.
                line.gsub!(regex, url + " " + onclick)

                puts ">" + line + "\n" if debug
            end
            output << line
        end
    end

    puts "<<<" + infile.to_s if debug

    # Write output file; in debug mode, prefix the name with '_' so the original is kept
    prefix = debug ? '_' : ''

    File.open(File.join(File.dirname(infile), prefix + File.basename(infile)), "w") do |outfile|
        outfile.write(output)
    end
end
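One behaviour worth checking in the script above: the replacement string is built from the first match on the line only, and `gsub!` then substitutes that same string for every match, so a line holding two tracked links ends up with two copies of the first link. Passing a block to `gsub` computes the replacement per match, and it also removes the second regex pass (`=~` followed by `regex.match`). A minimal sketch, reusing the script's regex and its skip rule for lines that already contain `onclick`; `add_tracking` and `LINK` are hypothetical names, and `URI.decode_www_form_component` from the standard `uri` library stands in for `CGI.unescape`.

```ruby
require 'uri'

# The script's regex: group 1 is href=, group 2 the directory, group 3 the
# file name; group 4 (the extension alone) is not needed here.
LINK = /(href\s*=)\s*"([^"]+\/)([^"]+\.(pdf|mdb|wmv|xls|doc|ppt))"/i

# Hypothetical helper: add a tracking handler to every matching link on a line.
def add_tracking(line)
  return line if line.include?('onclick') # same skip rule as the script
  line.gsub(LINK) do
    # In the block, $1..$3 belong to the current match, not the first one
    href, dir, name = $1, $2, $3
    # Decode %20 etc. for the title; keep the raw name on a malformed escape
    title = URI.decode_www_form_component(name) rescue name
    # Encode single quotes so they cannot break the JavaScript strings
    path, title = [dir + name, title].map { |s| s.gsub("'", "%27") }
    %(#{href}"#{dir}#{name}" onclick="dcsMultiTrack('DCS.dcsuri','#{path}','WT.ti','#{title}');")
  end
end

puts add_tracking(%(<a href="docs/a.pdf">A</a> <a href="docs/b.xls">B</a>))
```

With the block form, each of the two links in the usage line receives a handler naming its own file, and the per-file loop can simply do `output << add_tracking(line)`.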
