Debugging HTTP Headers

While writing the skreemr shell the other week I ran into an issue that required me to dig down to one of the lowest levels of web communication… HTTP Headers. I’m always shocked to learn that so many web developers don’t know much about headers. So I thought I would try to reenforce the point that HTTP Headers matter… and that knowing your stuff can help you debug and solve problems. Here is the story of my real world example.

I’m not going to go over HTTP Headers. thats already been done. Instead I’m going to focus on debugging and working with them a little bit. I’m going to assume you have already the general concepts.

The Problem

I wanted to add pagination to the skreemr shell. So you could run a search, then get the next page of results, etc. So a little experimentation with skreemr in my browser produced the following URLs:

http://skreemr.com/results.jsp?q=test
http://skreemr.com/results.jsp?q=test&l=10&s=10
http://skreemr.com/results.jsp?q=test&l=10&s=20

A simple pattern! “q” is the query string, “l” is the number per page, “s” is the number to start at, indexed from 0. So that last URL would produce results 21-30. This all worked well in my browsers, but it wasn’t working in Ruby:

require 'open-uri'

# Grab the pages and read the content
str1 = open( 'http://skreemr.com/results.jsp?q=test'      ).read  # 1-10
str2 = open( 'http://skreemr.com/results.jsp?q=test&s=10' ).read  # 11-20

#=> Should print false... but its printing true!
puts str1==str2

What this was saying was that the content being returned from both of those urls is EXACTLY the same. That couldn’t be… could it?

More Investigation

At this point I thought it was a problem in Ruby. I figured Ruby was doing some caching in the background that I was going to have to disable or work around. (I now know this is not true, but that was my first guess). To test that hypothesis I turned to my trusty friend curl and checked to see if that showed the proper behavior:

shell> curl 'http://skreemr.com/results.jsp?q=test'      -o 1.html
shell> curl 'http://skreemr.com/results.jsp?q=test&s=10' -o 2.html
shell> diff -q -s 1.html 2.html
Files 1.html and 2.html are identical

What!?! That stunned me. Curl was getting the exact same results as Ruby. I took a look at the html files, and indeed 2.html, which should have contained results 11-20 held 1-10. I opened both urls in my browser… they showed the correct results. Something weird was happening!

Take A Step Back

At this point you’ve got to know what is happening. Curl and Ruby (using Ruby’s Net:HTTP under the hood) are just making a simple GET request. Both my browsers Safari and Firefox are sending far more then just a GET request. They are sending a bunch of other headers. Lets take a look at what Firebug says Firefox sent:

firebug

Thats a mighty long list of headers! Its entirely possible that one of those might be influencing the server. Onto the drama!

Time to Act – Literally

The only difference between what curl sent in its request and what Firefox was sending is that header information and possibly any information contained in those cookies (unlikely in this case). So lets make curl act as if it is Firefox by send the same headers with curl! I took a quick peek at my curl reference for the proper switches/formatting and I was ready:

shell> curl 'http://skreemr.com/results.jsp?q=test' -o 1.html
shell> curl 'http://skreemr.com/results.jsp?q=test&s=10'            \
            -H 'Host: skreemr.com'                                  \
            -H 'User-Agent: Mozilla/5.0 (...) Firefox/3.0.6'        \
            -H 'Accept: text/html,application/xml;q=0.9,*/*;q=0.8'  \
            -H 'Accept-Language: en-us,en;q=0.5'                    \
            -H 'Accept-Encoding: gzip,deflate'                      \
            -H 'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7'     \
            -H 'Keep-Alive: 300'                                    \
            -H 'Connection: keep-alive'                             \
            -H 'Cache-Control: max-age=0'                           \
            -o 2.html
shell> diff -q -s 1.html 2.html 
Files 1.html and 2.html differ

Jackpot! Some subset of those headers is indeed fixing my problem, because now it properly fetched the second page of results! Translating that back into Ruby works as well:

require 'open-uri'

# Headers
HEADERS = {
  'Host'            => 'skreemr.com',
  'User-Agent'      => 'Mozilla/5.0 (...) Firefox/3.0.6',
  'Accept'          => 'text/html,application/xml;q=0.9,*/*;q=0.8',
  'Accept-Language' => 'en-us,en;q=0.5',
  'Accept-Encoding' => 'gzip,deflate',
  'Accept-Charset'  => 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
  'Keep-Alive'      => '300',
  'Connection'      => 'keep-alive',
  'Cache-Control'   => 'max-age=0'
}

# Get the Pages
str1 = open( 'http://skreemr.com/results.jsp?q=test', HEADERS ).read
str2 = open( 'http://skreemr.com/results.jsp?q=test&s=10', HEADERS ).read

# Correctly Prints false
puts str1==str2

Problem Solved

Sure, we have a solution but its not pretty. How much is really needed? Finding that out is just a matter of eliminating the headers that do nothing. Remove one, test, remove another, test, remove another test… As long as it still works after you’ve removed an individual header you know that header wasn’t needed. It took me a few minutes but I narrowed it down to just a single header:

MAGIC = { 'Accept-Language' => 'en-us' }

This was a rather unique problem. It is rare for me to have to dip down to the HTTP Headers to see what is actually happening for such a high level problem. However, as this problem shows, its important to know what is going on under the hood. If I didn’t know about headers, I would not have been able to solve it.

2 Responses

1

adam on March 12, 2010 at 4:20 pm  #

I want to suggest this free http testing tool using this tool you can send different http headers and view the Response header from any URL.
http://soft-net.net/SendHTTPTool.aspx
Best
Adam

2

Xavi on August 9, 2010 at 1:51 pm  #

Thanks! I couldn’t figure this out for the life of me.

Add a Comment

search