Markdown => Tutorial in 1 Step

When I wrote my Ruby Readline tutorial I felt I came up with a cool concept. I started with a Markdown file, translated it to html, and I used the headers to generate a Table of Contents on the fly.

table of contents

It didn’t take me long to realize that I could turn this into a framework where I could turn any Markdown file into a tutorial exactly like this one. So with surprisingly little work I modified the scripts to work with any markdown file and the html automatically generated from the standard Markdown.pl script.

I called this markdownorial. Laugh all you want at the name, but I still think the concept is very cool. I’ll probably be using this more and more to automatically generate and format a pretty cool looking tutorial from a single markdown file. The start to finish time for a project like this has instantly dropped to just the raw content part, no design or coding needed!

Advantages include:

  • Writing Markdown is very fast and efficient.

  • Time is spent writing the content. Not messing with design,

  • Table of Contents is automatically built for you.

  • Useful permalinks are automatically generated. Very useful when passing around links.

  • Clean user interface that focuses entirely on the content but the Table of Contents is always available!

  • Git Repository means if I update the design its just a `git pull` away to get the update.

Right now the tutorials shows up elegantly in all standards compliant browsers. Safari/Webkit and Chrome display it perfectly. Opera has some very minor Unicode issues but displays everything perfectly. Firefox has some separate Unicode issues and if you don’t have the latest version it has some working but slow animation. Overall, its entirely usable for people using decent browsers.

Let me know what you think. Feel free to use it and improve it. Its all up on Github.

First Nettuts Tutorial – .htaccess

My First Nettuts Tutorial was published today! For those who follow my blog every week, this is the reason why I haven’t been able to post the last few weeks… because I’ve been putting all the time I would normally be blogging into a series of Nettuts tutorials.


htaccess examples

This tutorial covers the basics of .htaccess, Apache’s Per Directory Configuration Files. Fortunately, with the basics out of the way, I can move on to the cooler features such as GZip encoding and mod_rewrite for the next article.

Also the examples can be viewed here:
http://bogojoker.com/htaccess/part1/

And the examples can be downloaded here:
http://bogojoker.com/htaccess/part1_examples.zip

Enjoy!

Ruby Readline Documentation

Before today the Ruby readline library lacked documentation! In order to find some decent documentation you would have to read the README packaged with the source. Michael Fellinger (manveru) from #ruby-lang offered to help me get some decent documentation online. Take a look!

readline

Debugging HTTP Headers

While writing the skreemr shell the other week I ran into an issue that required me to dig down to one of the lowest levels of web communication… HTTP Headers. I’m always shocked to learn that so many web developers don’t know much about headers. So I thought I would try to reenforce the point that HTTP Headers matter… and that knowing your stuff can help you debug and solve problems. Here is the story of my real world example.

I’m not going to go over HTTP Headers. thats already been done. Instead I’m going to focus on debugging and working with them a little bit. I’m going to assume you have already the general concepts.

The Problem

I wanted to add pagination to the skreemr shell. So you could run a search, then get the next page of results, etc. So a little experimentation with skreemr in my browser produced the following URLs:

http://skreemr.com/results.jsp?q=test
http://skreemr.com/results.jsp?q=test&l=10&s=10
http://skreemr.com/results.jsp?q=test&l=10&s=20

A simple pattern! “q” is the query string, “l” is the number per page, “s” is the number to start at, indexed from 0. So that last URL would produce results 21-30. This all worked well in my browsers, but it wasn’t working in Ruby:

require 'open-uri'

# Grab the pages and read the content
str1 = open( 'http://skreemr.com/results.jsp?q=test'      ).read  # 1-10
str2 = open( 'http://skreemr.com/results.jsp?q=test&s=10' ).read  # 11-20

#=> Should print false... but its printing true!
puts str1==str2

What this was saying was that the content being returned from both of those urls is EXACTLY the same. That couldn’t be… could it?

More Investigation

At this point I thought it was a problem in Ruby. I figured Ruby was doing some caching in the background that I was going to have to disable or work around. (I now know this is not true, but that was my first guess). To test that hypothesis I turned to my trusty friend curl and checked to see if that showed the proper behavior:

shell> curl 'http://skreemr.com/results.jsp?q=test'      -o 1.html
shell> curl 'http://skreemr.com/results.jsp?q=test&s=10' -o 2.html
shell> diff -q -s 1.html 2.html
Files 1.html and 2.html are identical

What!?! That stunned me. Curl was getting the exact same results as Ruby. I took a look at the html files, and indeed 2.html, which should have contained results 11-20 held 1-10. I opened both urls in my browser… they showed the correct results. Something weird was happening!

Take A Step Back

At this point you’ve got to know what is happening. Curl and Ruby (using Ruby’s Net:HTTP under the hood) are just making a simple GET request. Both my browsers Safari and Firefox are sending far more then just a GET request. They are sending a bunch of other headers. Lets take a look at what Firebug says Firefox sent:

firebug

Thats a mighty long list of headers! Its entirely possible that one of those might be influencing the server. Onto the drama!

Time to Act – Literally

The only difference between what curl sent in its request and what Firefox was sending is that header information and possibly any information contained in those cookies (unlikely in this case). So lets make curl act as if it is Firefox by send the same headers with curl! I took a quick peek at my curl reference for the proper switches/formatting and I was ready:

shell> curl 'http://skreemr.com/results.jsp?q=test' -o 1.html
shell> curl 'http://skreemr.com/results.jsp?q=test&s=10'            \
            -H 'Host: skreemr.com'                                  \
            -H 'User-Agent: Mozilla/5.0 (...) Firefox/3.0.6'        \
            -H 'Accept: text/html,application/xml;q=0.9,*/*;q=0.8'  \
            -H 'Accept-Language: en-us,en;q=0.5'                    \
            -H 'Accept-Encoding: gzip,deflate'                      \
            -H 'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7'     \
            -H 'Keep-Alive: 300'                                    \
            -H 'Connection: keep-alive'                             \
            -H 'Cache-Control: max-age=0'                           \
            -o 2.html
shell> diff -q -s 1.html 2.html 
Files 1.html and 2.html differ

Jackpot! Some subset of those headers is indeed fixing my problem, because now it properly fetched the second page of results! Translating that back into Ruby works as well:

require 'open-uri'

# Headers
HEADERS = {
  'Host'            => 'skreemr.com',
  'User-Agent'      => 'Mozilla/5.0 (...) Firefox/3.0.6',
  'Accept'          => 'text/html,application/xml;q=0.9,*/*;q=0.8',
  'Accept-Language' => 'en-us,en;q=0.5',
  'Accept-Encoding' => 'gzip,deflate',
  'Accept-Charset'  => 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
  'Keep-Alive'      => '300',
  'Connection'      => 'keep-alive',
  'Cache-Control'   => 'max-age=0'
}

# Get the Pages
str1 = open( 'http://skreemr.com/results.jsp?q=test', HEADERS ).read
str2 = open( 'http://skreemr.com/results.jsp?q=test&s=10', HEADERS ).read

# Correctly Prints false
puts str1==str2

Problem Solved

Sure, we have a solution but its not pretty. How much is really needed? Finding that out is just a matter of eliminating the headers that do nothing. Remove one, test, remove another, test, remove another test… As long as it still works after you’ve removed an individual header you know that header wasn’t needed. It took me a few minutes but I narrowed it down to just a single header:

MAGIC = { 'Accept-Language' => 'en-us' }

This was a rather unique problem. It is rare for me to have to dip down to the HTTP Headers to see what is actually happening for such a high level problem. However, as this problem shows, its important to know what is going on under the hood. If I didn’t know about headers, I would not have been able to solve it.

RDoc Introduction

Automatically generating documentation from source code has been available as far back as 1993. Its so common now that its expected to be available in any mainstream programming languages. I’ve seen it most commonly in Object Oriented languages offering nicely formatted descriptions of classs and their public methods/attributes.

Consistency is Nice

The main advantage I see with automatically generated documentation is that it is consistent. Take Javadocs for instance. They are all the same. When a developer wants to work with a Java library, they expect Javadocs. Why? Because they are familiar with them. They can easily navigate them and quickly find whatever it is that they are looking for. Documentation in any other way would require wasting time learning how to use/navigate it searching for what you want to know.

RDoc is Ruby’s documentation generator. You see RDoc generated documentation all over the place in Ruby. See YAML, Hpricot, or even core classes like Array.

So, I felt if I want to continue using Ruby I should at least learn how its handled. It turns out that its easier then I thought. I’m a huge fan of Markdown syntax and RDoc turns out to be pretty close to that. So, here is what I think is all you need to know to handle producing some simple, yet thorough, documentation for a class.

RDoc Resources

Start by updating your rdoc. The latest version at the time of writing is 2.2.1. The gem provides you with the rdoc and ri tools so that you can both generate and display documentation from the command line. Here is how you can install them:

shell> sudo gem install rdoc

The best online resources I found were not surprisingly:

Structure

Here is a basic example that shows the structure of the RDoc as it describes a File, Class, Attributes, and Methods. The placement of the comments is important. RDoc comments are always on top of what they are documenting:

# Documentation for the file itself
# There should be a blank line between this and any class
# definition to separate the documentation about the file
# and the class.  If there is no space then the entire text
# is used for both the file and the class, no different.

# Documentation for the class itself.
# This will appear at the top of the page specific to this
# class, before any other content.
class Dice
  
  # Documentation for an attribute
  # To documentation each attribute you must make individual
  # calls to attr_accessor, attr_reader, and attr_writer.
  # Appears next to the attribute name in the attrs section
  attr_accessor :sides

  # Documentation for the constructor
  # Corresponds to the `new` method
  def initialize(sides)
    @sides = sides
  end

  # Documentation for a method
  def roll(times)
    Array.new(times).map { 1+rand(@sides) }
  end

  # Documentation for a method
  def beat(num)
    roll(1).first > num
  end

end

Running `rdoc` on that file creates this documentation.

Style

Rich documentation makes the important parts stand out. It makes use of HTML’s expressive power and enables lists, headers, links, bold/italics, code, and other presentation helpers. I’ll now document the Dice class and add some style and realistic content.

# == sample.rb
# This file contains the Dice class definition and it runs
# some simple test code on a 16 sided dice.  A 20 dice
# roll fight again the COMPUTER who always rolls 10s!

#
# Multi-sided dice class.  The number of sides is determined
# in the constructor, or later on by accessing the _sides_
# attribute.
#
# == Summary
# 
# A #single_roll returns a single integer from 1 to the
# number of sides, _inclusive_.  However, if you want to
# roll multiple times you can can use the #roll method,
# specifying the number of rolls you want, and you will
# get an Array with the values of all the rolls!
#
# == Example
#
#    dice = Dice.new(8)   # An eight sided dice
#    four = dice.roll(4)  # An Array containing 4 rolls
#    sum  = four.inject(0) { |mem,i| mem+i } # Sum of rolls
#
# == Contact
#
# Author::  Joseph Pecoraro (mailto:joepeck02@gmail.com)
# Website:: http://blog.bogojoker.com
# Date::    Saturday November 29, 2008
#
class Dice

  # Number of sides on the dice
  attr_accessor :sides

  # Create a dice with `sides` of dice.
  # Defaults to 6.
  def initialize(sides=6)
    @sides = sides
  end

  # Returns an array of size `times` containing
  # a group of dice rolls.
  def roll(times)
    Array.new(times).map { single_roll }
  end

  # Returns the value of a single dice roll.  The
  # values are from 1 to @sides _inclusive_.
  def single_roll
    1+rand(@sides)
  end

  # A single roll challenge:
  # * makes a single_roll
  # * returns true if the roll was strictly greater
  #   then the given number
  # * returns false otherwise
  def beat(num)
    single_roll > num
  end

end


# Note that this is a constant, which is special
# and it is documented like a Class Attribute.
# This is in the RDoc generated documentation for
# the file.
COMPUTER = 10

# Note that these comments, for generic code
# are not in the RDoc generated documentation.
dice = Dice.new(16)
winCount = loseCount = 0
20.times do
  if dice.beat(COMPUTER)
    winCount += 1
  else
    loseCount += 1
  end
end

# Output
puts "You won #{winCount} times and lost #{loseCount} times!"
puts "Muhahah.  Try again later!!"           if winCount < loseCount
puts "Well Played.  I'll get you next time." if winCount > loseCount
puts "What a match!  Boy that was fun."      if winCount == loseCount

That generates this documentation.

Specifics

There are some subtle points that make this documentation format nicely. I’ll point them out and explain them. Most of this is straight from the above resources, however some of it I could not find documented anywhere.

  • The file documentation links to the Dice class. Furthermore the Class documentation links down to the single_roll and roll methods. This is because:

    Names of classes, source files, and any method names containing an underscore or preceded by a hash character are automatically hyperlinked from comment text to their description.

    1. sample.rb was a filename and so it was automatically linked.
    2. Dice was the name of a class and so it was automatically linked.
    3. single_roll had an underscore and happened to be a method name so it was automatically linked in a few places.
    4. #roll had a hash character signifying that it should be linked.
  • Sections begin with a “=” or a “==”. I prefer to use double, because it stands out more in the source code. Technically a single “=” becomes a level 1 header, and a double becomes a level 2 header. However, they both display the same.
  • URIs like http://blog.bogojoker.com and mailto:email are automatically turned into links and formatted nicely.
  • Bold, Italics, and Typewriter Text can be quickly formated much like Markdown:

    _italic_ or <em>italic</em>
    *bold* or <b>bold</b>
    +typewriter+ or <tt>typewriter</tt>

  • Code is displayed if each line
  • Tabular Labeled List, like the Contact information, are formatted like:

    label:: description 1
    label2:: both descriptions will line up

  • Formatting source code is like Markdown. The code that you want formatted must be indented with a few spaces. As long as the indention is maintained the text will display as source code in the HTML documentation.
  • Formatting lists is again like Markdown. Just use *’s or -‘s and they will turn into bullet points. For numbered lists just use numbers followed by a dot and they will be formatted automatically.

Final Notes on `rdoc` itself

When I created the final documentation above I used a few of rdoc’s command line switches to customize the output. What I actually used was:

shell> rdoc --title="Dice Documentation" --line-numbers --tab-width=2

The title switch changed the <title> for the documentation page, and the other two deal with formatting the htmlized source code that RDoc shows when you click on the function name to view the source in the documentation. There are plenty of command line switches. To view the full list do:

shell> rdoc --help

A few useful switches are “–ri” to create ri documentation so you can access your classes from the command line. Also you can output to several formats. For instance you can make a PDF using “–format=texinfo” then using `texi2pdf` on the texinfo file. The PDF doesn’t look that bad, here is my example as a PDF.

NOTE: Finding the generators was tricky. I had to check out the rdoc source code and find the different generators. If anyone knows an easier way to check what generators are available, please let me know.

I hope this helps some people using RDoc for their classes. Enjoy.

AtomPub Overview and Curl Reference

Not long ago I had to learn about the Atom Publishing Protocol for my job. I spent about a week learning on my own time all about XML, AtomPub, and even the basics of HTTP. After that week I decided to write down my own personal overview and example code to try and “visually” explain AtomPub as best I could. The result was (and is):

My Visual Guide to AtomPub

Now keep in mind that I wrote that only a few weeks after learning it. The process of writing that guide forced myself to study it in greater detail than normal, actually run tests, and produce realistic output and examples. I know its not perfect (I’d probably be slaughtered for my definition of REST) but over time I’ll be happy to improve and update it. I think the design really improves the content making it readable, fun, and useful to refer to.

I’m linking to it now because I’ve done a number of projects like this (my Unix Tutorial) because I like sites that are strictly focused on one thing and do that one thing very well. I’ll probably spend a little bit of time on remainder of my break from school by cleaning up these small “brain dump” websites. I wanted to make sure they were mentioned and linked to from my blog. Clearly they will be of no use to anyone if they are never linked to!

I decided to include a small `curl` reference on my AtomPub guide. This is because its a very nice tool when working with HTTP requests and an overall generally useful shell program. I think people might find the curl reference useful.

I hope you enjoy this. I’ll be linking to these occasionally as they grow.

Regular Expression Examples

A number of visitors have come to my website using the search terms regex replace. So I thought I would devote an entire article on how to use regular expressions to do a find and replace on a string in some popular languages. Example code is always attractive so lets get to the point! There is example code in Ruby, Perl, Python, Javascript, and Java. [If you have other suggestions let me know or show me in your comments!]

All of the basic examples:

  1. put the string “one two three” into a variable
  2. then use a regular expression and a native function to the language to
  3. transform the original variable’s value to the new string “one 2 three”

Click Here For the Basic Examples

Now you may recognize that in the above examples that regular expressions where not even needed. All we did was find and replace a string and that simple task can be done without regular expressions! So here is a more advanced example without the training wheels.

In the advanced examples:

  1. the string “a1b2c3” [may not need to be stored in a variable] is
  2. manipulated by a [globally replacing] regular expression
  3. resulting in “a11b22c33” [where all numbers, but not letters, are duplicated]
  4. which is stored in a variable

Click Here To Toggle the Advanced Examples

Ruby:

result = 'a1b2c3'.gsub( /(\d)/, '\1\1' )

Perl:

$result = 'a1b2c3';
$result =~ s/(\d)/\1\1/g;

Python:

import re
result = re.sub(r'(\d)', '\\1\\1', 'a1b2c3')

Javascript:

var result = 'a1b2c3'.replace( /(\d)/g, "$1$1" );

Java:

public class RegexTest {
  public static void main(String args[]) {
    String str = "a1b2c3";
    String result = str.replaceAll("(\\d)", "$1$1");
  }
}

Pay strict attention to the number of backslashes required in python, the $1 used in Java and Javascript (however these are also global variables found in Ruby and Perl), and the trailing /g option required in Perl and Javascript for the global replacement. Each language has its own little spin on things.

I hope this helped answer your questions on regular expressions. In case I whet your appetite on Regular Expressions I can point you to my Introductory Article on Regular Expressions and my command line utility rr that allows you to run Ruby regular expression find and replace commands on files, standard input, and even piped input.

search