rr – Regex Replace on a File – SotD

I was frustrated with regular expression find/replace programs that only did line processing. This was because often I had find/replace needs that spanned multiple lines. Programs like grep, ack (which I recently found and is really, really very awesome for searching code), and sed were easy enough to use for basic needs. But again, when it came to multiple line pattern matching both fell short of my needs.

My solution was to write my own script to parse an entire file as a single string and do my find/replace bidding. The cons being liberal use of memory and a few hundredths of a second longer then the usual find/replace algorithms seemed insignificant to the pros of a multi-line capable find/replace using a regular expression with the capability of using back references (like \1) to incorporate captured groups from the regex into the replacement text.

So, without further ado I present rr.

I am hopeful for some public criticism to help me bring rr up from its current version of 0.9 to a landmark 1.0. The ruby script weighs in at 100 lines but really under 50 are code and the rest is comments, whitespace, or the usage string. Speaking of usage, here is what it currently [v0.9.0] looks like:

usage: rr [options] find replace filename
  find     - a regular expression to be run on the entire file as one string
  replace  - replacement text, \1-\9 and \n are allowed
  filename - name of the input file to be parsed

options: --line or -l process line by line instead of all at once (not default) --case or -c makes the regular expression case sensitive (not default) --global or -g process all occurrences by default (this is already default)
negated options are done by adding 'not' or 'n' in switches like so: --notline or -nl
example usage: The following takes a file and doubles the last character on each line and turns the newlines into two newlines. rr "(.)\\n" "\\1\\1\\n\\n" file

More then likely this will undergo a lot of changes. A quick list of my current ideas include:

  1. If no filename is provided take input from STDIN. Multiple files can be handled by piping the `cat` of multiple files through rr.
  2. Better switch structure, although right now I don’t have any idea what that is

I’ll throw a test scenario at you. I had tabulated data in a file but each row was split across multiple lines. Now this wasn’t the only data in the file but I’ll present you with a simplier version here: [in.1]

Product A  12.99
           2001
----
Product B   1.99
           1997

Here you can see that I can’t just replace every other newline. What I want to actually do is replace newlines where there was a digit followed by a newline, some whitespace and another digit. I ran this through my script:

> rr "(\d)\n\s+(\d)" "\1  \2" in.1 > out.1

And I got the output I wanted: [out.1]

Product A  12.99  2001
----
Product B   1.99  1997

Even cleaner results can be seen by running a more advanced regex to remove the extra lines:

> rr "(\\d)\\n\\s+(\\d.*?\\n)(-+\\n)?" "\\1  \\2" in.1
Product A  12.99  2001
Product B   1.99  1997

So what are you waiting for? Download the script, add it to your bin directory, give it a test run, and tell me how you want it improved!

rr – Most Recent Version – Download

Thanks!

One Response

1

Joseph Pecoraro on December 3, 2007 at 1:49 am  #

Currently only \n, \t, \r, and the usual backreferences \1 to \9 are the only string metacharacters allowed in the replacement string. I don’t see any reason to include \b, \a, and their ilk but if there is a demand for them they can easily be added.

The reason for this current behavior is because the replacement string (the ‘replace’ command line argument) is fed to ruby as a literal string. So if you passed in “\n” what is actually in Ruby’s ARGV array is “\\n” which I manually map back into “\n” in the script. Any suggestions on what I should do here would be neat.

Add a Comment

search