The ARGFy Experiment

I wrote an earlier article about Ruby’s global ARGF variable. I mentioned there that I took the idea a step further as an experiment, to learn about several aspects of Ruby development. Those included:

  1. RDoc – Ruby Autogenerated Documentation
  2. RSpec – Ruby Test Framework aiding Behavior Driven Development
  3. General Familiarity with Ruby Classes
  4. General Familiarity with GitHub

I have to say that I was really impressed with how natural, easy, and fun it was to work with these tools. I already wrote about RDoc, hopefully filling a “void” that I saw in its online documentation. I may look into writing about RSpec, but the current RSpec documentation is quite good, so I may focus elsewhere. Finally, GitHub and Ruby are mostly things that you have to practice with personally to get good at, and there are already plenty of great resources for them. The Ruby community has done a very good job!

The ARGFy Results

So, here are the results of my experiment:

The GitHub README is very similar to the RDoc, but it goes into more depth by showing the output of the included sample.rb script. It’s not too exciting, but here is what ARGFy does.

What ARGFy Does

ARGFy is a class. Its constructor takes an Array of filenames, and it then treats those files as one continuous stream of lines. If no filenames are provided, or if “-” is provided as a filename, input is read from STDIN. Everything so far makes ARGFy look and act just like ARGF, except that you can specify your own files instead of relying only on the command line arguments.

Using ARGFy is mostly like using ARGF. If you call the ARGFy#each method (note that this allows for any Enumerable method!) it will exhaust all the lines of input from all the files as a single stream. At each line you can check the state of the ARGFy object itself. That state includes filename and lineno, like the normal ARGF, but it also includes filelineno. Because there is a filelineno, there is a guaranteed way to know whether the stream has moved on to a different file under the hood. Since this might be a common thing to check, there is an ARGFy#new_file? helper method that does just that.

Finally, because it’s an object, you can add a file to the list at any time. Removing files didn’t seem to make much sense considering the class’s purpose, so there is no method for that. Just use ARGFy#add_file to append a file to the end of the sequence of input files for the stream.
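To make the description above concrete, here is a minimal sketch of how a class with this behavior could be put together. It is not the actual ARGFy source: MiniArgfy is a made-up name, and the internals are only my assumption about one reasonable way to get filename, lineno, filelineno, new_file? and add_file working together.

```ruby
require 'tempfile'

# MiniArgfy is a hypothetical stand-in, NOT the real ARGFy implementation.
class MiniArgfy
  include Enumerable

  attr_reader :filename, :lineno, :filelineno

  def initialize(filenames = [])
    # No filenames means "read STDIN", just like ARGF.
    @files = filenames.empty? ? ['-'] : filenames.dup
    @filename = nil
    @lineno = 0       # lines seen across the whole stream
    @filelineno = 0   # lines seen in the current file
  end

  # Append another input to the end of the sequence.
  def add_file(name)
    @files << name
    self
  end

  # True while the block is looking at the first line of a file.
  def new_file?
    @filelineno == 1
  end

  # Yield every line of every input, in order, as one stream.
  # The file list is consumed as it goes, so this is a one-pass stream.
  def each
    return enum_for(:each) unless block_given?
    until @files.empty?
      @filename = @files.shift
      @filelineno = 0
      io = @filename == '-' ? $stdin : File.open(@filename)
      begin
        io.each_line do |line|
          @lineno += 1
          @filelineno += 1
          yield line
        end
      ensure
        io.close unless io.equal?($stdin)
      end
    end
    self
  end
end

# Two throwaway input files for the demonstration.
a = Tempfile.new('a'); a.write("one\ntwo\n"); a.close
b = Tempfile.new('b'); b.write("alpha\n"); b.close

# The same file twice in a row, then a third file added afterwards.
stream = MiniArgfy.new([a.path, a.path])
stream.add_file(b.path)

seen = stream.map do |line|
  [stream.lineno, stream.filelineno, stream.new_file?, line.chomp]
end
seen.each { |row| p row }
```

Because each only needs to yield lines, including Enumerable gives map, select and friends for free. Note that new_file? fires on the second copy of the same file, which is the case that watching the filename alone would miss.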

In the background ARGFy is really just reading and buffering the files one at a time and returning the lines. It’s nothing too exciting, just a little fun working with Ruby. The example nicely displays how ARGFy works:

# sample.rb
require 'ARGFy'

argf = ARGFy.new(ARGV)
argf.each do |line|

  # Per File Header
  if argf.new_file?
    filename = argf.filename
    filename = "STDIN" if filename == "-"
    puts '', filename, "-"*filename.length
  end

  # Print out the line with its per-file line number
  # (filelineno restarts at 1 for each file, matching the output below)
  puts "%3d: %s" % [argf.filelineno, line]

end
puts

Calling sample.rb with a few small input files creates some nicely formatted output:

shell> ruby sample.rb in1.txt in2.txt 

in1.txt
-------
  1: one
  2: two

in2.txt
-------
  1: alpha, beta, gamma
  2: 0987654321
  3: 
  4: NOT BLANK!

Nothing complex. It works like you would expect it to. For more sample usage you can scan the RSpec test cases in the GitHub repository.

DATA and ARGF in Ruby

Of all the “superglobals” in Ruby these seemed to be the least documented. It only takes a quick example to understand them. I had some fun and decided to play around with these variables and more.

DATA

Although this is mostly useless, it’s a neat trick. In any Ruby script, once the interpreter reaches a line containing only __END__, it stops parsing the rest of the file. Whatever comes after __END__ can be read through DATA. DATA acts like a File object, so it’s as though you’re reading the tail of the current script from a file. Note that DATA is only defined for the script being run directly, not for files it requires.

# DATA is a global constant that is actually a File object
# positioned at the data after __END__ in the current
# script file.
puts DATA.read

__END__
I can put anything I want
after the __END__ symbol
and access it with the
DATA global.  Whoa!

ARGF

ARGF takes each of the elements in ARGV, assumes they are filenames, and allows you to process these files as a single stream of input. This is common with shell programs. It’s a lot like cat, which takes multiple files on the command line and outputs them as a single stream. If you want to force input to come from STDIN, just provide a hyphen “-”. Finally, if there is nothing in ARGV, then ARGF defaults to STDIN.

Here is a simple example of ARGF mimicking cat.

shell> echo "inside a.txt" > a.txt
shell> echo "inside b.txt" > b.txt
shell> cat a.txt b.txt 
inside a.txt
inside b.txt

Here is a Ruby script that can do just that:

# cat.rb
ARGF.each do |line|
  puts line
end

Example usage:

shell> ruby cat.rb a.txt b.txt 
inside a.txt
inside b.txt

ARGF Confusion

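What confused me when I first used ARGF was that it has no special class. It claims it is an Object, at least on the Ruby 1.8 interpreter this output comes from (Ruby 1.9 and later report ARGF.class instead). Take a look: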

>> ARGF.class
# => Object

But at the same time it has so much more than a regular Object:

>> ARGF.methods - Object.methods
# => ["select", "lineno", "readline", "eof", "each_byte", "partition", "lineno=", "read", "fileno", "grep", "to_i", "filename", "reject", "readlines", "getc", "member?", "find", "to_io", "each_with_index", "eof?", "collect", "path", "all?", "close", "entries", "tell", "detect", "zip", "rewind", "map", "file", "any?", "sort", "min", "seek", "binmode", "find_all", "each_line", "gets", "each", "pos", "closed?", "skip", "inject", "readchar", "pos=", "sort_by", "max"]

The important things to note are accessors like lineno and filename. They give you some information while you read the lines, such as whether you’re reading from a file or from STDIN. You can also easily give line numbers to everything being read, like so:

# linenum.rb
ARGF.each do |line|
  puts "%3d: %s" % [ARGF.lineno, line]
end

Produces:

shell> ruby linenum.rb a.txt b.txt 
  1: inside a.txt
  2: inside b.txt
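The method list above also includes Enumerable methods such as grep, select and map, which means ARGF can be filtered directly instead of looping by hand. Here is a small sketch of that. So that it can run on its own, it writes two temporary files and swaps them into ARGV, which is my own setup and not part of the original scripts; in a normal script the filenames would simply come from the command line.

```ruby
require 'tempfile'

# Stand-ins for a.txt and b.txt from the earlier examples.
a = Tempfile.new('a'); a.write("inside a.txt\n"); a.close
b = Tempfile.new('b'); b.write("inside b.txt\n"); b.close

# ARGF looks at ARGV lazily, so replacing ARGV before the first read
# is enough to point it at our own files.
ARGV.replace([a.path, b.path])

# grep comes from Enumerable and runs over the combined stream of lines.
matches = ARGF.grep(/b\.txt/)
p matches
```

Each match is still a full line, trailing newline included, exactly as each would have yielded it.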

ARGFy

As an exercise I wrote a little class to emulate what ARGF does and to make it more useful to me. For instance, ARGF doesn’t directly tell you when it changes files. You can try to catch when the filename changes, but what if the same file is repeated twice in a row? ARGFy has both a global lineno and a per-file filelineno. I’ll talk more about ARGFy later.
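The repeated-file problem is easy to reproduce. The sketch below feeds ARGF the same temporary file twice and counts how often the filename appears to change. The temporary file and the ARGV swap are only there so the snippet can run by itself; they are my assumptions, not part of the original experiment.

```ruby
require 'tempfile'

# One two-line file, passed to ARGF twice in a row.
f = Tempfile.new('same'); f.write("one\ntwo\n"); f.close
ARGV.replace([f.path, f.path])

last_name = nil
switches = 0   # how many times the filename appeared to change
lines = 0      # how many lines were read in total

ARGF.each do |_line|
  lines += 1
  if ARGF.filename != last_name
    switches += 1
    last_name = ARGF.filename
  end
end

puts "#{lines} lines read, #{switches} file start(s) detected"
```

All four lines are read, but only one file start is detected, because the second copy has the same name as the first. A per-file line counter does not have this blind spot.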
