DATA and ARGF in Ruby

Of all the “superglobals” in Ruby, these two seemed to be the least documented, yet it only takes a quick example to understand them. I had some fun playing around with these variables and a few related ideas.

DATA

Although this is mostly useless, it's a neat trick. In any Ruby script, once the interpreter reaches a line containing only __END__, it stops parsing the rest of the file. Whatever comes after __END__ can be read through DATA. DATA is a File object, so reading the text after __END__ works just like reading from any other file. Note that DATA is only defined for the script you run directly, not for files loaded with require.

# DATA is a constant holding a File object for this
# script, positioned just after the __END__ line.
puts DATA.read

__END__
I can put anything I want
after the __END__ symbol
and access it with the
DATA global.  Whoa!
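
Because a real __END__ line would end any file it appears in, the following sketch keeps a small DATA-reading script in a string, writes it to a temporary file, and runs it with the current Ruby. That also respects the rule that DATA exists only for the file Ruby was asked to run. The embedded script turns simple "key: value" lines after __END__ into a Hash. The names SCRIPT, run_script and data_demo.rb are made up for this example, and Enumerable#to_h with a block assumes Ruby 2.6 or later.

```ruby
require "open3"
require "rbconfig"
require "tmpdir"

# The demo script lives in a string: a real __END__ line in this file would
# stop Ruby from parsing everything after it.
SCRIPT = <<~'RUBY'
  # Turn "key: value" lines after __END__ into a Hash.
  settings = DATA.each_line.to_h { |line| line.chomp.split(": ", 2) }
  puts DATA.class
  puts settings["greeting"]
  __END__
  greeting: hello
  target: world
RUBY

# Write the script to a temporary file and run it as the main program,
# since DATA is only defined for the file Ruby was asked to run.
def run_script(source)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "data_demo.rb")
    File.write(path, source)
    out, status = Open3.capture2(RbConfig.ruby, path)
    raise "script exited with #{status.exitstatus}" unless status.success?
    out
  end
end

output = run_script(SCRIPT)
puts output
```

The child script prints File (the class of DATA) followed by hello, the value parsed from its own __END__ section.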

ARGF

ARGF takes each of the elements in ARGV, assumes they are filenames, and lets you process those files as a single stream of input. This is common with shell programs. It's a lot like cat: cat takes multiple files on the command line and outputs them as a single stream. If you want input to come from STDIN, provide a hyphen (“-”) in place of a filename. Finally, if ARGV is empty, ARGF defaults to STDIN.

Here is a simple example of ARGF mimicking cat.

shell> echo "inside a.txt" > a.txt
shell> echo "inside b.txt" > b.txt
shell> cat a.txt b.txt 
inside a.txt
inside b.txt

Here is a Ruby script that can do just that:

# cat.rb
# ARGF.each yields every line of every file named in ARGV, in order.
ARGF.each do |line|
  puts line
end

Example usage:

shell> ruby cat.rb a.txt b.txt 
inside a.txt
inside b.txt
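
The cat.rb run above only feeds ARGF real files. A minimal sketch of the other two rules, the hyphen and the STDIN fallback, runs a one-line cat program through ruby -e and supplies standard input with Open3. The helper name run_cat and the sample text are illustrative only.

```ruby
require "open3"
require "rbconfig"
require "tmpdir"

# A one-line version of cat.rb, passed to ruby with -e.
CAT = "ARGF.each { |line| print line }"

# Run the cat program in dir with the given arguments and standard input,
# returning whatever it printed.
def run_cat(args, stdin, dir)
  out, status = Open3.capture2(RbConfig.ruby, "-e", CAT, *args,
                               stdin_data: stdin, chdir: dir)
  raise "cat failed" unless status.success?
  out
end

results = Dir.mktmpdir do |dir|
  File.write(File.join(dir, "a.txt"), "inside a.txt\n")
  File.write(File.join(dir, "b.txt"), "inside b.txt\n")
  {
    # "-" splices standard input in between the two files.
    with_hyphen: run_cat(["a.txt", "-", "b.txt"], "from stdin\n", dir),
    # With no arguments at all, ARGF falls back to standard input.
    no_args: run_cat([], "only stdin\n", dir)
  }
end

results.each { |name, out| puts "#{name}:", out }
```

With the hyphen, the STDIN text appears between the contents of a.txt and b.txt; with no arguments, only the STDIN text is printed.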

ARGF Confusion

What confused me when I first used ARGF was that it has no special class. It claims it is an Object. Take a look:

>> ARGF.class
# => Object

But at the same time it has so much more than a regular Object:

>> ARGF.methods - Object.methods
# => ["select", "lineno", "readline", "eof", "each_byte", "partition", "lineno=", "read", "fileno", "grep", "to_i", "filename", "reject", "readlines", "getc", "member?", "find", "to_io", "each_with_index", "eof?", "collect", "path", "all?", "close", "entries", "tell", "detect", "zip", "rewind", "map", "file", "any?", "sort", "min", "seek", "binmode", "find_all", "each_line", "gets", "each", "pos", "closed?", "skip", "inject", "readchar", "pos=", "sort_by", "max"]
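
The output above appears to come from Ruby 1.8 (note the method names returned as strings). On Ruby 1.9 and later, ARGF does get its own class, whose name is literally ARGF.class, and that class includes Enumerable, which is where select, map, grep and similar methods come from. A quick check on a modern Ruby:

```ruby
# ARGF is the single instance of a class named "ARGF.class".
puts ARGF.class                    # prints ARGF.class
puts ARGF.is_a?(Enumerable)        # prints true
puts ARGF.respond_to?(:filename)   # prints true

# The IO-like methods are defined on ARGF.class itself, not on Object.
own = ARGF.class.instance_methods(false)
puts %i[lineno filename read each_line].all? { |m| own.include?(m) }
```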

The important things to note are accessors like lineno and filename. They tell you about the input while you read it, such as whether the current line came from a file or from STDIN (filename returns “-” for STDIN). They also make it easy to number every line being read, like so:

# linenum.rb
ARGF.each do |line|
  puts "%3d: %s" % [ARGF.lineno, line]
end

Produces:

shell> ruby linenum.rb a.txt b.txt 
  1: inside a.txt
  2: inside b.txt
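
The same loop can also report which file each line came from. To stay self-contained, this sketch writes a.txt and b.txt into a temporary directory and fills ARGV itself instead of taking the names from the command line; ARGF only looks at ARGV the first time it is read, so replacing ARGV beforehand has the same effect. The report array is just an illustrative name.

```ruby
require "tmpdir"

report = []

Dir.mktmpdir do |dir|
  # Create two sample files and hand them to ARGF by filling ARGV, just as
  # if their names had been typed on the command line.
  paths = %w[a.txt b.txt].map do |name|
    path = File.join(dir, name)
    File.write(path, "inside #{name}\n")
    path
  end
  ARGV.replace(paths)

  ARGF.each do |line|
    # ARGF.filename is the file the current line came from ("-" for STDIN);
    # ARGF.lineno keeps counting across files.
    report << format("%s:%d: %s", File.basename(ARGF.filename), ARGF.lineno, line)
  end
end

print report.join
```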

ARGFy

As an exercise I wrote a little class to emulate what ARGF does and to make it more useful to me. For instance, ARGF can't tell you when it changes files. You can try to catch when the filename changes, but what if the same file is given twice in a row? ARGFy also has both a global lineno and a per-file filelineno. I'll talk more about ARGFy later.
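
ARGFy itself isn't shown here, so the following is only a hypothetical sketch of the same idea, not the author's code. LineStream (an invented name) walks a list of filenames, keeps a global lineno and a per-file filelineno, and calls an on_new_file hook (also invented) every time it opens a file, so a file named twice in a row is still reported twice.

```ruby
require "tmpdir"

# Hypothetical, minimal ARGF-like reader; not the author's ARGFy.
class LineStream
  attr_reader :filename, :lineno, :filelineno

  def initialize(paths)
    @paths = paths
    @lineno = 0       # line count across all files
    @filelineno = 0   # line count within the current file
    @filename = nil
    @on_new_file = nil
  end

  # Register a callback run each time a file is started, even when the
  # same name appears twice in a row.
  def on_new_file(&block)
    @on_new_file = block
    self
  end

  # Yield each line of each file in order, keeping both counters current.
  def each_line
    return enum_for(:each_line) unless block_given?

    @paths.each do |path|
      @filename = path
      @filelineno = 0
      @on_new_file&.call(path)
      # "-" means standard input, as with ARGF.
      io = path == "-" ? $stdin : File.open(path)
      begin
        io.each_line do |line|
          @lineno += 1
          @filelineno += 1
          yield line
        end
      ensure
        io.close unless io.equal?($stdin)
      end
    end
    self
  end
end

starts = []
rows = []

Dir.mktmpdir do |dir|
  a = File.join(dir, "a.txt")
  File.write(a, "one\ntwo\n")
  stream = LineStream.new([a, a]) # the same file twice in a row
  stream.on_new_file { |path| starts << File.basename(path) }
  stream.each_line { |line| rows << [stream.lineno, stream.filelineno, line.chomp] }
end

p starts
p rows
```

Because the hook fires when a file is opened rather than when its name changes, the repeated a.txt is reported twice, and filelineno restarts at 1 for the second pass while lineno keeps counting.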
