RDoc Introduction

Automatically generating documentation from source code has been available as far back as 1993. It’s so common now that it’s expected to be available in any mainstream programming language. I’ve seen it most commonly in object-oriented languages, offering nicely formatted descriptions of classes and their public methods/attributes.

Consistency is Nice

The main advantage I see with automatically generated documentation is that it is consistent. Take Javadocs for instance. They are all the same. When a developer wants to work with a Java library, they expect Javadocs. Why? Because they are familiar with them. They can easily navigate them and quickly find whatever it is that they are looking for. Documentation in any other format would mean wasting time learning how to navigate it before you could find what you want to know.

RDoc is Ruby’s documentation generator. You see RDoc-generated documentation all over the place in Ruby. See YAML, Hpricot, or even core classes like Array.

So, I felt that if I want to continue using Ruby I should at least learn how it’s handled. It turns out that it’s easier than I thought. I’m a huge fan of Markdown syntax and RDoc turns out to be pretty close to that. So, here is what I think is all you need to know to produce some simple, yet thorough, documentation for a class.

RDoc Resources

Start by updating your rdoc. The latest version at the time of writing is 2.2.1. The gem provides you with the rdoc and ri tools so that you can both generate and display documentation from the command line. Here is how you can install them:

shell> sudo gem install rdoc

The best online resources I found were not surprisingly:


Here is a basic example that shows the structure of the RDoc as it describes a File, Class, Attributes, and Methods. The placement of the comments is important. RDoc comments are always on top of what they are documenting:

# Documentation for the file itself
# There should be a blank line between this and any class
# definition to separate the documentation about the file
# and the class.  If there is no space then the entire text
# is used for both the file and the class, no different.

# Documentation for the class itself.
# This will appear at the top of the page specific to this
# class, before any other content.
class Dice
  # Documentation for an attribute
  # To document each attribute you must make individual
  # calls to attr_accessor, attr_reader, and attr_writer.
  # Appears next to the attribute name in the attrs section
  attr_accessor :sides

  # Documentation for the constructor
  # Corresponds to the `new` method
  def initialize(sides)
    @sides = sides
  end

  # Documentation for a method
  def roll(times)
    Array.new(times).map { 1+rand(@sides) }
  end

  # Documentation for a method
  def beat(num)
    roll(1).first > num
  end
end

Running `rdoc` on that file creates this documentation.


Rich documentation makes the important parts stand out. It makes use of HTML’s expressive power and enables lists, headers, links, bold/italics, code, and other presentation helpers. I’ll now document the Dice class and add some style and realistic content.

# == sample.rb
# This file contains the Dice class definition and it runs
# some simple test code on a 16 sided dice.  A 20 roll
# fight against the COMPUTER who always rolls 10s!

# Multi-sided dice class.  The number of sides is determined
# in the constructor, or later on by accessing the _sides_
# attribute.
# == Summary
# A #single_roll returns a single integer from 1 to the
# number of sides, _inclusive_.  However, if you want to
# roll multiple times you can use the #roll method,
# specifying the number of rolls you want, and you will
# get an Array with the values of all the rolls!
# == Example
#    dice = Dice.new(8)   # An eight sided dice
#    four = dice.roll(4)  # An Array containing 4 rolls
#    sum  = four.inject(0) { |mem,i| mem+i } # Sum of rolls
# == Contact
# Author::  Joseph Pecoraro (mailto:joepeck02@gmail.com)
# Website:: http://blog.bogojoker.com
# Date::    Saturday November 29, 2008
class Dice

  # Number of sides on the dice
  attr_accessor :sides

  # Create a dice with `sides` sides.
  # Defaults to 6.
  def initialize(sides=6)
    @sides = sides
  end

  # Returns an array of size `times` containing
  # a group of dice rolls.
  def roll(times)
    Array.new(times).map { single_roll }
  end

  # Returns the value of a single dice roll.  The
  # values are from 1 to @sides _inclusive_.
  def single_roll
    1 + rand(@sides)
  end

  # A single roll challenge:
  # * makes a single_roll
  # * returns true if the roll was strictly greater
  #   than the given number
  # * returns false otherwise
  def beat(num)
    single_roll > num
  end
end

# Note that this is a constant, which is special
# and it is documented like a Class Attribute.
# This is in the RDoc generated documentation for
# the file.
COMPUTER = 10

# Note that these comments, for generic code
# are not in the RDoc generated documentation.
dice = Dice.new(16)
winCount = loseCount = 0
20.times do
  if dice.beat(COMPUTER)
    winCount += 1
  else
    loseCount += 1
  end
end

# Output
puts "You won #{winCount} times and lost #{loseCount} times!"
puts "Muhahah.  Try again later!!"           if winCount < loseCount
puts "Well Played.  I'll get you next time." if winCount > loseCount
puts "What a match!  Boy that was fun."      if winCount == loseCount

That generates this documentation.


There are some subtle points that make this documentation format nicely. I’ll point them out and explain them. Most of this is straight from the above resources; however, some of it I could not find documented anywhere.

  • The file documentation links to the Dice class. Furthermore the Class documentation links down to the single_roll and roll methods. This is because:

    Names of classes, source files, and any method names containing an underscore or preceded by a hash character are automatically hyperlinked from comment text to their description.

    1. sample.rb was a filename and so it was automatically linked.
    2. Dice was the name of a class and so it was automatically linked.
    3. single_roll had an underscore and happened to be a method name so it was automatically linked in a few places.
    4. #roll had a hash character signifying that it should be linked.
  • Sections begin with a “=” or a “==”. I prefer to use double, because it stands out more in the source code. Technically a single “=” becomes a level 1 header, and a double becomes a level 2 header. However, they both display the same.
  • URIs like http://blog.bogojoker.com and mailto:email are automatically turned into links and formatted nicely.
  • Bold, Italics, and Typewriter Text can be quickly formatted much like Markdown:

    _italic_ or <em>italic</em>
    *bold* or <b>bold</b>
    +typewriter+ or <tt>typewriter</tt>

  • Code is displayed verbatim if each line is indented.
  • Tabular labeled lists, like the Contact information, are formatted like:

    label:: description 1
    label2:: both descriptions will line up

  • Formatting source code is like Markdown. The code that you want formatted must be indented with a few spaces. As long as the indentation is maintained the text will display as source code in the HTML documentation.
  • Formatting lists is again like Markdown. Just use *’s or -’s and they will turn into bullet points. For numbered lists just use numbers followed by a dot and they will be formatted automatically.
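
To see several of these markup rules side by side, here is a small sketch. The Coin class, its methods, and the author and license labels are made up for illustration; only the markup (headers, inline styles, bullet and numbered lists, indented code, and a labeled list) comes from the points above.

```ruby
# == Coin
# A two-sided coin, used here only to show RDoc markup.
#
# == Features
# * _italic_, *bold*, and +typewriter+ words
# * a link to #flip_many because of the leading hash
#
# == Steps
# 1. Create a coin
# 2. Call #flip_many with the number of flips you want
#
# == Example
# The indented lines below are displayed as source code:
#
#    coin  = Coin.new
#    flips = coin.flip_many(3)  # An Array of three symbols
#
# == Contact
# Author::  Example Author (hypothetical)
# License:: Public domain
class Coin
  # The two possible outcomes of a flip.
  SIDES = [:heads, :tails]

  # Returns either <tt>:heads</tt> or <tt>:tails</tt>.
  def flip
    SIDES[rand(SIDES.size)]
  end

  # Returns an Array containing +times+ flips.
  def flip_many(times)
    Array.new(times) { flip }
  end
end

puts Coin.new.flip_many(3).inspect
```

Running `rdoc` on a file like this should produce a class page with four sections, two kinds of lists, and a formatted example, assuming the markup behaves as described above.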

Final Notes on `rdoc` itself

When I created the final documentation above I used a few of rdoc’s command line switches to customize the output. What I actually used was:

shell> rdoc --title="Dice Documentation" --line-numbers --tab-width=2

The title switch changed the <title> for the documentation page, and the other two deal with formatting the htmlized source code that RDoc shows when you click on the function name to view the source in the documentation. There are plenty of command line switches. To view the full list do:

shell> rdoc --help

A few useful switches are “--ri” to create ri documentation so you can access your classes from the command line. Also you can output to several formats. For instance you can make a PDF using “--format=texinfo” and then running `texi2pdf` on the texinfo file. The PDF doesn’t look that bad; here is my example as a PDF.

NOTE: Finding the generators was tricky. I had to check out the rdoc source code and find the different generators. If anyone knows an easier way to check what generators are available, please let me know.

I hope this helps some people using RDoc for their classes. Enjoy.

DATA and ARGF in Ruby

Of all the “superglobals” in Ruby these seemed to be the least documented. It only takes a quick example to understand them. I had some fun and decided to play around with these variables and more.


Although this is mostly useless, it’s a neat trick. In any Ruby script, as soon as the interpreter reaches a line containing only __END__, the rest of the text in the file is no longer parsed. Whatever comes after __END__ can be accessed via DATA. DATA acts like a File object, so it’s like reading the current script as though you were reading from a file.

# DATA is a global that is actually a File object
# containing the data after __END__ in the current
# script file.
puts DATA.read

__END__
I can put anything I want
after the __END__ symbol
and access it with the
DATA global.  Whoa!


ARGF takes each of the elements in ARGV, assumes they are filenames, and allows you to process these files as a single stream of input. This is common with shell programs. It’s a lot like cat, which takes multiple files on the command line and outputs them as a single stream. If you want to force input to come from STDIN then just provide a hyphen “-”. Finally, if there is nothing in ARGV then ARGF defaults to STDIN.

Here is a simple example of ARGF mimicking cat.

shell> echo "inside a.txt" > a.txt
shell> echo "inside b.txt" > b.txt
shell> cat a.txt b.txt 
inside a.txt
inside b.txt

Here is a Ruby script that can do just that:

# cat.rb
ARGF.each do |line|
  puts line
end

Example usage:

shell> ruby cat.rb a.txt b.txt 
inside a.txt
inside b.txt

ARGF Confusion

What confused me when I first used ARGF was that it has no special class. It claims it is an Object. Take a look:

>> ARGF.class
# => Object

But at the same time it has so much more than a regular Object:

>> ARGF.methods - Object.methods
# => ["select", "lineno", "readline", "eof", "each_byte", "partition", "lineno=", "read", "fileno", "grep", "to_i", "filename", "reject", "readlines", "getc", "member?", "find", "to_io", "each_with_index", "eof?", "collect", "path", "all?", "close", "entries", "tell", "detect", "zip", "rewind", "map", "file", "any?", "sort", "min", "seek", "binmode", "find_all", "each_line", "gets", "each", "pos", "closed?", "skip", "inject", "readchar", "pos=", "sort_by", "max"]

The important things to note are accessors like lineno and filename. They give you information while you read the lines, such as whether you’re reading from a file or STDIN. You can also easily number every line being read, like so:

# linenum.rb
ARGF.each do |line|
  puts "%3d: %s" % [ARGF.lineno, line]
end


shell> ruby linenum.rb a.txt b.txt 
  1: inside a.txt
  2: inside b.txt


As an exercise I wrote a little class to emulate what ARGF does and to make it more useful to me. For instance ARGF can’t tell you when it changes files. You can try to catch when the filename changes, but what if the same file is repeated twice in a row? ARGFy has both a global lineno and a per-file filelineno. I’ll talk more about ARGFy later.
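
This is not ARGFy, just a minimal sketch of the same idea for current Ruby versions. There, ARGF.file returns the file currently being read, and its own lineno restarts for every file while ARGF.lineno keeps counting across all of them. The helper name label_lines, the LABELED constant, and the temporary file contents are made up for the demo.

```ruby
require 'tmpdir'

# Hypothetical helper: reads everything ARGF offers and labels each line
# with the global line number and the line number within its own file.
def label_lines
  labeled = []
  ARGF.each_line do |line|
    # The per-file counter is back at 1 whenever ARGF has moved on to a
    # new file, even if the same file is listed twice in a row.
    labeled << "-- #{File.basename(ARGF.filename)} --" if ARGF.file.lineno == 1
    labeled << format("%3d %3d: %s", ARGF.lineno, ARGF.file.lineno, line.chomp)
  end
  labeled
end

# Demo: create two small files and point ARGV at them before ARGF is used.
LABELED = Dir.mktmpdir do |dir|
  a = File.join(dir, "a.txt")
  b = File.join(dir, "b.txt")
  File.write(a, "first in a\nsecond in a\n")
  File.write(b, "first in b\n")
  ARGV.replace([a, b])
  label_lines
end

puts LABELED
```

The first column is the global line number and the second is the per-file one, so a header line can be printed each time the per-file counter is 1.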

AtomPub Overview and Curl Reference

Not long ago I had to learn about the Atom Publishing Protocol for my job. I spent about a week learning on my own time all about XML, AtomPub, and even the basics of HTTP. After that week I decided to write down my own personal overview and example code to try and “visually” explain AtomPub as best I could. The result was (and is):

My Visual Guide to AtomPub

Now keep in mind that I wrote that only a few weeks after learning it. The process of writing that guide forced me to study it in greater detail than normal, actually run tests, and produce realistic output and examples. I know it’s not perfect (I’d probably be slaughtered for my definition of REST) but over time I’ll be happy to improve and update it. I think the design really improves the content, making it readable, fun, and useful to refer to.

I’m linking to it now because I’ve done a number of projects like this (such as my Unix Tutorial); I like sites that are strictly focused on one thing and do that one thing very well. I’ll probably spend a little bit of time during the remainder of my break from school cleaning up these small “brain dump” websites. I wanted to make sure they were mentioned and linked to from my blog. Clearly they will be of no use to anyone if they are never linked to!

I decided to include a small `curl` reference on my AtomPub guide. This is because it’s a very nice tool when working with HTTP requests and a generally useful shell program overall. I think people might find the curl reference useful.

I hope you enjoy this. I’ll be linking to these occasionally as they grow.

Helpful Configure Options For Development

Recently I decided to install Ruby 1.9. I’ve been compiling programs from source more and more often and I am getting used to the normal workflow: configure, make, make install. But Ruby 1.9 was different. I wanted to be able to reference both Ruby 1.8 and Ruby 1.9 so I could easily work with either. I came across a neat switch that I had never cared to use before but that made perfect sense for my situation:

shell> ./configure --program-suffix=WHATEVER ...

That neat switch will make it so when you finally `make install` the output binaries are named rubyWHATEVER, irbWHATEVER, etc. Obviously you should choose something better than WHATEVER. I went with ‘--program-suffix=19’ so that all my Ruby 1.9 binaries are named exactly like the 1.8 binaries but with “19” on the end, such as ruby19, irb19, etc. This will really save me some time as I experiment with Ruby 1.9.

Note that when installing Ruby from source on Mac OS X you may run into problems. Check out this article or the one linked above for the fixes.

Also, since I now had a ruby19 I figured I’d make a ruby18 as well. I already have my own ‘~/bin’ directory in my shell’s path. So I added the following symbolic link which makes `ruby18` reference my default Ruby 1.8 interpreter:

shell> ln -s `which ruby` ruby18

The `which ruby` in my case evaluates to the default installation of Ruby. It may not be the case on your computer. That depends on how you’ve modified your path. So the explicit version that should work on any Mac is:

shell> ln -s /usr/bin/ruby ruby18

Here the symbolic link means I can use `ruby18` if I want to be dead sure I’m using Ruby 1.8, and likewise `ruby19` when I want to use Ruby 1.9. I know I’m not the first person to do this, but I’m happy and I want to spread the joy.
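
A quick way to confirm which interpreter each name launches is to have it report its own version. This is only a small sketch: the file name which_ruby.rb and the helper interpreter_summary are made up, while RUBY_VERSION and RUBY_PLATFORM are constants built into Ruby.

```ruby
# which_ruby.rb (hypothetical file name)
# Run it once per interpreter, for example:
#   ruby18 which_ruby.rb
#   ruby19 which_ruby.rb

# Describes the interpreter that is executing this script.
def interpreter_summary
  "Ruby #{RUBY_VERSION} on #{RUBY_PLATFORM}"
end

puts interpreter_summary
```

If both commands print the same version, the symbolic link or the program suffix is probably not pointing where you expect.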

Other notable configure options are:

shell> ./configure --prefix=PREFIX ...

This makes it so when you run `make install` it will install the files to PREFIX/(here). An example would typically be ‘--prefix=/usr/local’ to install the files into the /usr/local dir. However, that is normally the default installation directory and therefore most people don’t bother changing it. I’ve used this prefix to install temporary things directly into my home directory. This made them easy to find, work with, and delete when I was done.

The Big Picture Analyzer

One of my favorite blogs that I follow is The Big Picture. It’s a collection of amazing pictures. Make sure that you go into each article; that single picture is only a teaser, and collections often have over 30 images.

(JOERG KOCH/AFP/Getty Images)

At the Big Picture it is very common for commenters to list the numbers of the images that they really liked. This intrigued me a little bit. Also, the comments on the main page are limited to just 100 comments, when there might actually be over 2000 comments on some collections.

So, I wrote a Ruby Script that I call bigpicvotes that takes in the URL to a Big Picture site and analyzes the comments to make a guess at what pictures were voted for the most!

bigpicvotes usage

Above is a sample usage on an article with now over 2600 comments. Notice that the URL you provide can be either the “all comments” URL or just the URL with the pictures that you are likely to be looking at. Also, the usage allows for a second optional parameter. This is to show the top N images instead of defaulting to the top 10.

One last thing. To use it you will need the Hpricot gem. That is simple enough to install:

shell> gem install hpricot

Again, the script is available here: bigpicvotes
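
For readers curious about the counting step, here is a rough sketch of how picture numbers might be tallied from comment text. This is not the bigpicvotes script itself. The helper names tally_votes and top_pictures, the sample comments, and the rule of ignoring numbers larger than the picture count are all assumptions made for illustration, and no page fetching or Hpricot parsing is shown.

```ruby
# Hypothetical helper: counts how many comments mention each picture number.
# Numbers outside 1..max_picture (years, prices, etc.) are ignored.
def tally_votes(comments, max_picture)
  counts = Hash.new(0)
  comments.each do |comment|
    # uniq so a single comment cannot vote twice for the same picture
    comment.scan(/\d+/).map(&:to_i).uniq.each do |number|
      counts[number] += 1 if number.between?(1, max_picture)
    end
  end
  counts
end

# Hypothetical helper: returns [picture, votes] pairs, most votes first,
# breaking ties by the lower picture number.
def top_pictures(counts, limit = 10)
  counts.sort_by { |number, votes| [-votes, number] }.first(limit)
end

comments = [
  "Loved #3 and #17!",
  "3, 12 and 17 are stunning",
  "Number 17 gave me chills. 17!",
  "Was this taken in 2008?"
]

top_pictures(tally_votes(comments, 30), 2).each do |number, votes|
  puts "Picture #{number}: #{votes} votes"
end
```

The second argument to top_pictures plays the same role as the optional "top N" parameter described above.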