mgrep – Multiple Regex Grep – SotD

My newest Ruby utility is called mgrep. It is a multiple regular expression version of grep. Input is processed line by line, and for each regular expression the user says whether a line must match it or must not match it. Every line that satisfies all of those conditions is printed in the following format: “filename [line number]: line of text.”

This can be taken in a number of different directions. I’m thinking of “Partial Matches”: one regular expression matches one line in the file, a second regular expression eventually matches a totally different line, and if all of the conditions have succeeded by the end of the file then the filename is printed. This sounds more useful and will most likely be in version 1.0.

Here is the current usage:

usage: mgrep [-#] ( [-n] regex ) [filenames]
  #         - the number of regular expressions, defaults to 1
  ( ... )   - there should be # of these
  regex     - regular expressions to be checked on the line
  filenames - names of the input files to be parsed, if blank uses STDIN

options:
  --neg, --not, or -n    line must not match this regular expression

special note:
  When using bash, if you want backslashes in the replace portion make sure
  to use the multiple argument usage with single quotes for the replacement.

The usage is a little confusing, seeing as the number of regular expressions on the command line varies based on another command line switch. All in all, though, it is rather clear. More options, much like grep's or awk's, will likely come in the future if I get around to it.

Here is an example that is probably not useful but at least shows the functionality. The line must contain a number, contain a double letter, and end with a !: [input]

1 ab !
- aa !
1 aa -
1 aa !
oh yah! 1

When I run my script, I’ll put all the regular expressions in /here/ to make it clearer; mgrep allows this syntax for convenience. Here is what it looks like:

joe[~/sandbox]$ mgrep -3 '/\d/' '/(\w)\1/' '/!$/' input
input [4]: 1 aa !

To show off the --neg (or -n) option, this command shows all the lines that do not have a hyphen and still end with a !:

joe[~/sandbox]$ mgrep -2 -n '/-/' '/!$/' input
input [1]: 1 ab !
input [4]: 1 aa !
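
The matching rule behind both runs can be sketched in Ruby roughly as follows. This is only an illustration of the idea, not mgrep's actual source; the `Rule` struct and the `line_wanted?` and `mgrep_lines` names are made up for the example.

```ruby
# Hypothetical sketch: a rule pairs a regex with a flag saying the line must NOT match it.
Rule = Struct.new(:regex, :negated)

# A line is wanted only when every rule agrees with it.
def line_wanted?(line, rules)
  rules.all? do |rule|
    matched = !(line =~ rule.regex).nil?
    rule.negated ? !matched : matched
  end
end

# Builds the output lines in the "filename [line number]: line of text" format.
def mgrep_lines(name, text, rules)
  text.each_line.with_index(1).map { |line, number|
    "#{name} [#{number}]: #{line.chomp}" if line_wanted?(line.chomp, rules)
  }.compact
end

input = "1 ab !\n- aa !\n1 aa -\n1 aa !\noh yah! 1\n"
rules = [Rule.new(/\d/, false), Rule.new(/(\w)\1/, false), Rule.new(/!$/, false)]
puts mgrep_lines("input", input, rules)
```

Swapping in `[Rule.new(/-/, true), Rule.new(/!$/, false)]` mirrors the -n run above.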

The script will surely be updated soon, but grab it now and try it out:
mgrep – Most Recent Version – Download
mgrep – changelog

Regular Expression Examples

A number of visitors have come to my website using the search terms regex replace. So I thought I would devote an entire article to how to use regular expressions to do a find and replace on a string in some popular languages. Example code is always attractive, so let’s get to the point! There is example code in Ruby, Perl, Python, Javascript, and Java. [If you have other suggestions let me know or show me in your comments!]

All of the basic examples:

  1. put the string “one two three” into a variable
  2. use a regular expression with one of the language’s native functions
  3. to transform the original variable’s value into the new string “one 2 three”
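
In Ruby, those three steps might look like the minimal sketch below. The variable names are just for illustration.

```ruby
# 1. put the string into a variable
text = "one two three"

# 2 and 3. sub replaces the first match of the regex and returns the new string
result = text.sub(/two/, "2")

puts result
```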


Now you may recognize that in the above examples regular expressions were not even needed. All we did was find and replace a string, and that simple task can be done without regular expressions! So here is a more advanced example without the training wheels.

In the advanced examples:

  1. the string “a1b2c3” [may not need to be stored in a variable] is
  2. manipulated by a [globally replacing] regular expression
  3. resulting in “a11b22c33” [where all numbers, but not letters, are duplicated]
  4. which is stored in a variable


Ruby:

result = 'a1b2c3'.gsub( /(\d)/, '\1\1' )

Perl:

$result = 'a1b2c3';
$result =~ s/(\d)/\1\1/g;

Python:

import re
result = re.sub(r'(\d)', '\\1\\1', 'a1b2c3')

Javascript:

var result = 'a1b2c3'.replace( /(\d)/g, "$1$1" );

Java:

public class RegexTest {
  public static void main(String args[]) {
    String str = "a1b2c3";
    String result = str.replaceAll("(\\d)", "$1$1");
  }
}

Pay strict attention to the number of backslashes required in Python, the $1 used in Java and Javascript (these are also global variables in Ruby and Perl), and the trailing /g option required in Perl and Javascript for the global replacement. Each language has its own little spin on things.

I hope this helped answer your questions on regular expressions. In case I whetted your appetite for regular expressions, I can point you to my Introductory Article on Regular Expressions and my command line utility rr, which allows you to run Ruby regular expression find and replace commands on files, standard input, and even piped input.

rr – 1.1 – In Place Edits and Multiple Files

Less than 48 hours after rr became 1.0 it gets a few very handy improvements!

In place modification of files is activated via the --modify (or shorthand -m) option. This means that you can bypass any output redirection and go straight to modifying the original file. This feature uses filename.tmp as a temp file, which it later renames to the original filename. Again, if no filenames are specified then input is expected to come from STDIN, and the new --modify option is ignored in that special case.
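
The temp-file approach can be sketched like this. It is a minimal illustration under my own assumptions, not the code inside rr; the `modify_in_place` name and the sample file are hypothetical.

```ruby
require "tmpdir"

# Hypothetical sketch of an in-place edit that goes through filename.tmp.
def modify_in_place(filename, find, replace)
  tmp_name = "#{filename}.tmp"
  # Write the transformed contents to the temp file first...
  File.write(tmp_name, File.read(filename).gsub(find, replace))
  # ...then rename the temp file over the original.
  File.rename(tmp_name, filename)
end

# Try it on a throwaway file so nothing real gets modified.
Dir.mktmpdir do |dir|
  path = File.join(dir, "notes.txt")
  File.write(path, "a1b2c3\n")
  modify_in_place(path, /(\d)/, '\1\1')
  puts File.read(path)
end
```

Writing to a separate file and renaming it means the original is only replaced once the new contents have been written out completely.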

Another original goal of mine was adding support for multiple filenames, specifically so that useful shell tricks like *.txt file globbing would work nicely with rr. Well, support has been added and it works great with the new --modify option.

The usage message has been cleaned up a bit but here is the very basic usage for all new people.

usage: rr [options] find replace [filenames]
       rr [options] s/find/replace/ [filenames]

I wanted to point out a rather hidden feature. The way I implemented the options, the ARGV array is first scanned for all options, and those options are removed before the script goes on to parse the find, replace, and filename arguments. This means that your options can go anywhere on the command line so long as they start with a -.
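
A rough sketch of that two-pass idea, assuming nothing about rr's real code (the `split_options` name is hypothetical):

```ruby
# Hypothetical sketch: pull out everything that starts with "-" first,
# then treat whatever is left as find, replace, and filenames.
def split_options(argv)
  argv.partition { |arg| arg.start_with?("-") }
end

options, rest = split_options(["find", "replace", "notes.txt", "-m"])
find, replace, *filenames = rest

p options
p [find, replace, filenames]
```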

This presents one problem, a workaround, and a question for users. With the first form of usage, where the find and replace portions are separate arguments, a regex or replacement text that starts with a “-” will be interpreted by the script as an option. You can avoid this by using the s/find/replace/ usage (or by putting the regex in /regex/ format, which is allowed). But really this boils down to deciding whether or not I am being too liberal with my command line arguments. Since this is very much a fringe condition with a workaround, I am going to allow options to be placed anywhere. That way you can bring up the last command in bash with the up arrow and add an option (like the new -m) to the end of your rr command, which makes repeating your last command with an option much easier.

rr is always free, Try It Out:
rr – Current Version Download
rr – changelog.txt – Click Here

$ gem install regex_replace

rr – 1.0 – Now a Pipe Friendly Filter

rr has reached the 1.0 milestone! The obvious improvement over the last version is that input is allowed from standard input. It seemed silly to always require a filename and the option of having standard input was always on my to do list. Usage is now:

usage: rr [options] find replace [filename]
       rr [options] s/find/replace/ [filename]

Now you can use rr as a filter and happily make find replace changes by piping input into it or out of it! I already have a script that runs a file through 4 rr commands to produce much nicer and cleaner output. Wrap that up in a shell/ruby/perl script and you have a useful tool.
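
The filter behaviour boils down to "read the file if one was named, otherwise read standard input". Here is a minimal sketch of that, not rr's actual implementation; `read_input` and `run_filter` are hypothetical names, and StringIO stands in for a real pipe so the example runs on its own.

```ruby
require "stringio"

# Hypothetical sketch: use the named file when there is one, otherwise the given input stream.
def read_input(filename, stdin)
  filename ? File.read(filename) : stdin.read
end

# Runs the find/replace over the whole input and writes the result to the output stream.
def run_filter(find, replace, filename = nil, stdin = $stdin, stdout = $stdout)
  stdout.print(read_input(filename, stdin).gsub(find, replace))
end

# Simulate piped input and capture the output.
out = StringIO.new
run_filter(/(\d)/, '\1\1', nil, StringIO.new("a1b2c3\n"), out)
puts out.string
```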

Enjoy. Again, it’s all free!
rr – Current Version Download
rr – changelog.txt – Click Here

$ gem install regex_replace

rr – Updated to 0.9.1

rr now has some improvements, including a new style of usage. Both this new style and the original style usage are available to you.

rr [options] s/find/replace/ filename

The new s/find/replace/ syntax is still weak with respect to the forward slash character in either the find or replace, but works for everything else so far. Of course if you want to include any whitespace then you should wrap the entire argument in quotes. Also, because the strings are coming from the command line, if you want to have literal backslashes then use single quotes around your string so the shell doesn’t escape them itself before sending it to Ruby.
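
To make the forward slash weakness concrete, here is one simple way such an argument could be split apart. This is a sketch under my own assumptions rather than the parser rr uses, and `parse_substitution` is a hypothetical name.

```ruby
# Hypothetical sketch: split "s/find/replace/" into a regex and a replacement string.
# Because it splits on "/", it cannot cope with a "/" inside either part.
def parse_substitution(arg)
  match = arg.match(%r{\As/([^/]*)/([^/]*)/\z})
  return nil unless match
  [Regexp.new(match[1]), match[2]]
end

# Single quotes keep the backslashes literal, just like on the shell command line.
find, replace = parse_substitution('s/(\d)/\1\1/')
puts "a1b2c3".gsub(find, replace)
```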

Another highlight is that all escape sequences should now work. That means your typical \n and \t, and all the obscure ones, even including \a (system bell). Check out this example; you will hear two system bells once it has been run:

$ echo "aba" > in.1; rr a "\a" in.1; rm in.1
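
The shell hands Ruby the two characters backslash and a, so something has to turn them into the real bell character. A minimal sketch of one way to do that follows; it covers only a handful of escapes, and the `ESCAPES` table and `unescape` name are my own assumptions, not rr's code.

```ruby
# Hypothetical sketch: map two-character escapes typed on the command line
# to the control characters they stand for.
ESCAPES = { "n" => "\n", "t" => "\t", "a" => "\a", "r" => "\r" }.freeze

# Only the listed letters are converted, so back references like \1 are left alone.
def unescape(text)
  text.gsub(/\\([ntar])/) { ESCAPES[$1] }
end

replacement = unescape('\a')
p replacement
p "aba".gsub("a") { replacement }
```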

Also, I was considering renaming it to fr for “Find/Replace”; however, I am keeping rr. rr can be interpreted as “Run Regex” with the s/find/replace/ syntax, or “Regex Replace” for the normal 3 argument usage. If you like fr you can easily make an alias like so:

$ alias fr="rr"

So have fun, of course everything is free and available right here:
rr – Current Version Download
rr – changelog.txt – Click Here

rr – Regex Replace on a File – SotD

I was frustrated with regular expression find/replace programs that only did line processing. This was because often I had find/replace needs that spanned multiple lines. Programs like grep, ack (which I recently found and is really, really very awesome for searching code), and sed were easy enough to use for basic needs. But again, when it came to multiple line pattern matching they all fell short of my needs.

My solution was to write my own script to parse an entire file as a single string and do my find/replace bidding. The cons, liberal use of memory and a few hundredths of a second longer than the usual find/replace algorithms, seemed insignificant next to the pros: a multi-line capable find/replace using a regular expression, with the capability of using back references (like \1) to incorporate captured groups from the regex into the replacement text.

So, without further ado I present rr.

I am hopeful for some public criticism to help me bring rr up from its current version of 0.9 to a landmark 1.0. The Ruby script weighs in at 100 lines, but really under 50 are code and the rest is comments, whitespace, or the usage string. Speaking of usage, here is what it currently [v0.9.0] looks like:

usage: rr [options] find replace filename
  find     - a regular expression to be run on the entire file as one string
  replace  - replacement text, \1-\9 and \n are allowed
  filename - name of the input file to be parsed

options:
  --line or -l     process line by line instead of all at once (not default)
  --case or -c     makes the regular expression case sensitive (not default)
  --global or -g   process all occurrences by default (this is already default)

negated options are done by adding 'not' or 'n' in switches like so:
  --notline or -nl

example usage:
  The following takes a file and doubles the last character on each line
  and turns the newlines into two newlines.
    rr "(.)\\n" "\\1\\1\\n\\n" file

More than likely this will undergo a lot of changes. A quick list of my current ideas includes:

  1. If no filename is provided take input from STDIN. Multiple files can be handled by piping the `cat` of multiple files through rr.
  2. Better switch structure, although right now I don’t have any idea what that is

I’ll throw a test scenario at you. I had tabulated data in a file but each row was split across multiple lines. Now this wasn’t the only data in the file, but I’ll present you with a simpler version here: [in.1]

Product A  12.99
           2001
----
Product B   1.99
           1997

Here you can see that I can’t just replace every other newline. What I want to actually do is replace newlines where there was a digit followed by a newline, some whitespace and another digit. I ran this through my script:

> rr "(\d)\n\s+(\d)" "\1  \2" in.1 > out.1

And I got the output I wanted: [out.1]

Product A  12.99  2001
----
Product B   1.99  1997

Even cleaner results can be seen by running a more advanced regex to remove the extra lines:

> rr "(\\d)\\n\\s+(\\d.*?\\n)(-+\\n)?" "\\1  \\2" in.1
Product A  12.99  2001
Product B   1.99  1997
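
The same two replacements can be tried directly in Ruby to see why treating the file as one string matters. This is a plain sketch using String#gsub on the sample data, not the rr script itself.

```ruby
text = "Product A  12.99\n           2001\n----\nProduct B   1.99\n           1997\n"

# Whole-file mode: the pattern can see across the newline and join the rows.
joined = text.gsub(/(\d)\n\s+(\d)/, '\1  \2')

# Line-by-line mode: each line is processed alone, so the pattern never matches.
by_line = text.each_line.map { |line| line.gsub(/(\d)\n\s+(\d)/, '\1  \2') }.join

# The more advanced regex also swallows the optional line of dashes.
cleaned = text.gsub(/(\d)\n\s+(\d.*?\n)(-+\n)?/, '\1  \2')

puts joined
puts cleaned
puts "line-by-line changed anything? #{by_line != text}"
```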

So what are you waiting for? Download the script, add it to your bin directory, give it a test run, and tell me how you want it improved!

rr – Most Recent Version – Download

Thanks!
