DATA and ARGF in Ruby

Of all the “superglobals” in Ruby, these seemed to be the least documented, yet a quick example is all it takes to understand them. I had some fun playing around with these variables and more.

DATA

Although this is mostly useless, it's a neat trick. In any Ruby script, once the interpreter reaches a line containing only __END__, it stops parsing, and the rest of the file is not treated as code. Whatever comes after __END__ can be accessed through DATA. DATA is a File object already positioned just past __END__, so reading from it is like reading the current script as though it were any other file. (This only works in the main script, not in files loaded with require.)

# DATA is a constant (not a global variable) holding a
# File object positioned at the text after __END__ in
# the current script file.
puts DATA.read

__END__
I can put anything I want
after the __END__ symbol
and access it with the
DATA global.  Whoa!
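To make the __END__ rule concrete, here is a small sketch that mimics how the text after the marker is split off. data_section is a hypothetical helper, not a Ruby API, and it returns a StringIO rather than the real File that DATA is; it only illustrates that the marker has to be a line containing nothing but __END__.

```ruby
require "stringio"

# Hypothetical helper, not part of Ruby: returns a StringIO over the text
# after the first line that is exactly "__END__", roughly what the
# interpreter exposes as DATA (the real DATA is a File, not a StringIO).
def data_section(source)
  lines = source.lines
  marker = lines.index { |line| line.chomp == "__END__" }
  return nil if marker.nil? # without an __END__ line, DATA is not defined
  StringIO.new(lines[(marker + 1)..-1].join)
end

# The heredoc body is indented, so Ruby itself never sees these lines
# as an __END__ marker; <<~ strips the indentation from the string.
script = <<~SCRIPT
  puts DATA.read
  __END__
  I can put anything I want
  after the __END__ symbol
SCRIPT

print data_section(script).read
```

Running it prints the two lines that follow the marker. A line such as x = '__END__' does not count, because the whole line has to be the marker.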

ARGF

ARGF takes each of the elements in ARGV, assumes they are filenames, and lets you process those files as a single stream of input. This is common with shell programs; it's a lot like cat, which takes multiple files on the command line and outputs them as a single stream. If you want input to come from STDIN at some point, just provide a hyphen, “-”, as one of the arguments. Finally, if there is nothing in ARGV, ARGF defaults to STDIN.

Here is cat itself joining two files:

shell> echo "inside a.txt" > a.txt
shell> echo "inside b.txt" > b.txt
shell> cat a.txt b.txt 
inside a.txt
inside b.txt

Here is a Ruby script that can do just that:

# cat.rb
ARGF.each do |line|
  puts line
end

Example usage:

shell> ruby cat.rb a.txt b.txt 
inside a.txt
inside b.txt
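The rules described above (arguments read in order, “-” meaning STDIN, and STDIN alone when ARGV is empty) can be sketched in plain Ruby. each_input_line is a hypothetical helper written for illustration, not how ARGF is actually implemented, and it skips ARGF's bookkeeping such as lineno and filename.

```ruby
# Hypothetical emulation of how ARGF chooses its input, not Ruby's real
# implementation: every argument is a filename read in order, "-" means
# STDIN, and an empty argument list means STDIN alone.
def each_input_line(args, stdin = $stdin)
  sources = args.empty? ? ["-"] : args
  sources.each do |name|
    if name == "-"
      stdin.each_line { |line| yield line }
    else
      # File.open with a block closes the file once it has been read
      File.open(name) { |file| file.each_line { |line| yield line } }
    end
  end
end
```

Calling each_input_line(ARGV) { |line| puts line } from a script gives roughly the same behavior as cat.rb for ordinary files, including something like ruby mycat.rb a.txt - b.txt, which reads STDIN between the two files.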

ARGF Confusion

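What confused me when I first used ARGF was that it has no special class. It claims to be a plain Object (at least in Ruby 1.8, which this post uses; later versions give it its own class, ARGF.class). Take a look: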

>> ARGF.class
# => Object

But at the same time it has so much more than a regular Object:

>> ARGF.methods - Object.methods
# => ["select", "lineno", "readline", "eof", "each_byte", "partition", "lineno=", "read", "fileno", "grep", "to_i", "filename", "reject", "readlines", "getc", "member?", "find", "to_io", "each_with_index", "eof?", "collect", "path", "all?", "close", "entries", "tell", "detect", "zip", "rewind", "map", "file", "any?", "sort", "min", "seek", "binmode", "find_all", "each_line", "gets", "each", "pos", "closed?", "skip", "inject", "readchar", "pos=", "sort_by", "max"]

The important things to note are accessors like lineno and filename. They give you information while you read the lines, such as whether you're reading from a file or from STDIN (filename is “-” for STDIN). They also make it easy to number every line being read, like so:

# linenum.rb
ARGF.each do |line|
  puts "%3d: %s" % [ARGF.lineno, line]
end

Produces:

shell> ruby linenum.rb a.txt b.txt 
  1: inside a.txt
  2: inside b.txt
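Because filename changes as ARGF moves from file to file while lineno keeps counting, the two can be combined into grep-style prefixes. This is a small sketch; numbered_with_names is a hypothetical name, and the last line only runs when files are named so that loading the script never sits waiting on STDIN.

```ruby
# Collects each line prefixed with ARGF.filename ("-" when reading STDIN)
# and ARGF.lineno, which keeps counting across files instead of resetting.
def numbered_with_names(argf = ARGF)
  out = []
  argf.each_line do |line|
    out << format("%s:%d: %s", argf.filename, argf.lineno, line)
  end
  out
end

# Only read input when files were named, so this never blocks on STDIN.
puts numbered_with_names unless ARGV.empty?
```

With the two files from earlier:

shell> ruby names.rb a.txt b.txt
a.txt:1: inside a.txt
b.txt:2: inside b.txt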

ARGFy

As an exercise I wrote a little class to emulate what ARGF does and to make it more useful to me. For instance, ARGF can't tell you when it changes files. You can try to catch when the filename changes, but what if the same file is repeated twice in a row? ARGFy has both a global lineno and a per-file filelineno. I'll talk more about ARGFy later.
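For what it's worth, the ARGF.file.lineno == 1 check suggested in the comments does catch a file change even when the same file is listed twice, because ARGF opens a fresh File for every argument and each File keeps its own lineno. Here is a sketch of that approach; with_headers is a hypothetical name and the ==> name <== header just mimics head and tail. One caveat: an empty file never produces a line 1, so it gets no header.

```ruby
# Returns the input lines with a "==> name <==" header before each file
# and a line number that restarts at 1 for every file.
def with_headers(argf = ARGF)
  out = []
  argf.each_line do |line|
    # argf.file is the IO currently being read. Each argument gets its own
    # freshly opened File, so its lineno restarts even for a repeated path.
    lineno = argf.file.lineno
    out << "==> #{argf.filename} <==\n" if lineno == 1
    out << format("%3d: %s", lineno, line)
  end
  out
end

# Only read input when files were named, so this never blocks on STDIN.
puts with_headers unless ARGV.empty?
```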

4 Responses

BogoJoker » The ARGFy Experiment on December 5, 2008 at 10:38 pm

[…] wrote an earlier article that talked about Ruby’s global ARGF variable. I mentioned that I took that a step further, […]

Tomasz Wegrzanowski on July 26, 2010 at 6:05 pm

> For instance ARGF can’t tell you when it changes files.

ARGF.file.lineno == 1

Here, I fixed it for you. ARGF already does everything you want, and ARGFy seems to be rather pointless.

Joseph Pecoraro on July 27, 2010 at 1:03 am

@Tomasz thanks. I didn’t know about that and I overlooked it above. That would indeed be the preferred solution. ARGFy is kinda pointless, and as I mentioned above it was an exercise. But you could use ARGFy to process files of your choosing instead of just command line arguments like ARGF. Cheers.

richard bucker on July 9, 2012 at 3:25 pm

ARGF.skip is not supposed to close the file… or rather the DOC says there are no side effects. Just skip to the next file in the list. I wish it worked as advertised because I need/want the File object but not closed. I suppose the fileno would also be useful.
