My Blog Is

tag cloud

That right there is the only tag cloud that I have ever found interesting. The cool part is that you can generate tag clouds like it, with a lot more jazz, at a really great service called Wordle.

How I Did It

For those interested in how I got all the tags for the above cloud (with weights): I decided to grab them myself. Here are the 4 simple steps I took.

How to Build the Tag List

The SQL grabs all of the tags used on blog posts (so not categories) at least once. It also grabs the number of times the tag appears on a post. Using a simple ruby script I take the tags (all single words) and multiply them by the number of times they appear. So a “ruby 3” turns into “ruby ruby ruby.” Once that multiplication has taken place I can just plop them into Wordle and make the magic happen!
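The multiplication step can be sketched in a few lines of JavaScript. This is not the original Ruby script; the function name expandTags and the "tag count" input format are assumptions made for illustration.

```javascript
// Hypothetical helper: turn "tag count" lines into repeated words for Wordle.
function expandTags(lines) {
  return lines
    .map(function (line) {
      var parts = line.trim().split(/\s+/);      // e.g. ["ruby", "3"]
      var tag = parts[0];
      var count = parseInt(parts[1], 10) || 1;   // assume one copy if no count
      return new Array(count).fill(tag).join(" ");
    })
    .join(" ");
}

console.log(expandTags(["ruby 3", "regex 2"]));
// => ruby ruby ruby regex regex
```

The resulting string is what gets pasted into Wordle, where repeated words are drawn larger.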

More clouds here!

SotD – Remove the First N Characters From a Line

I was asked today to remove the first 6 characters from every line in a document. It was known that each line contained at least 6 characters. For me, that means a find and replace. Here is how I did it (in a number of different ways). Can you think of any other ways to do it?

cut -c7-
rr ^.{6}
colrm 1 6
rr s/^.{6}//
sed 's/^.\{6\}//'
perl -pe 's/^.{6}//'
ruby -pe 'sub /^.{6}/,""'
gawk 'BEGIN{FIELDWIDTHS="6 999"}{print$2}'

I was pretty happy with rr taking the prize for least characters. I do want to point out that since rr defaults to multi-line and global replacements the “^” is required. If you wanted to remove the “^” (like you could with sed, perl, and ruby) then you would have to use the “--line” or “-l” and “--notglobal” or “-ng” options. So a single character replaces 7! If you want to grab rr just do:

$ sudo gem install regex_replace

Also, here is a link to gawk – GNU’s awk. This has the very nice FIELDWIDTHS variable, which is extremely useful!

Also keep in mind that the regular expression above, ^.{6}, only works because the lines were known to have at least 6 characters. If we didn’t know that, we would have had to use something like ^.{0,6} to allow up to 6 characters (and even that could be ^.{1,6}, ignoring blank lines, which can’t change anyway). So again the requirements for this challenge were important.
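Here is a minimal JavaScript sketch of the difference between the two patterns. The helper name stripPrefix and its allowShort flag are made up for this example; the “m” flag makes ^ match at the start of every line, much like rr's multi-line default.

```javascript
// Hypothetical helper: strip leading characters from every line of text.
// allowShort=false uses ^.{n}; allowShort=true uses ^.{0,n}.
function stripPrefix(text, n, allowShort) {
  var quantifier = allowShort ? "{0," + n + "}" : "{" + n + "}";
  var pattern = new RegExp("^." + quantifier, "gm"); // g: every line, m: ^ per line
  return text.replace(pattern, "");
}

var lines = "123456hello\nabc\n654321world";

console.log(JSON.stringify(stripPrefix(lines, 6, false)));
// => "hello\nabc\nworld"   (the short line "abc" is left untouched)

console.log(JSON.stringify(stripPrefix(lines, 6, true)));
// => "hello\n\nworld"      (the short line is emptied)
```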

Update!

I came across a neat little unix utility I didn’t know about called “cut”. As you can see, the new command tops the list quite handily too. “cut -c 7-” selects everything from the 7th character onward on each line (counting starts from 1) and writes that selection to stdout. The first 6 characters are left behind, which accomplishes our goal. In the above list I removed the optional space after “-c” to make it just a tad shorter. Pretty neat.
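To make the 1-based counting concrete, here is a rough JavaScript equivalent of what “cut -c7-” does to each line. The function name cutFrom is hypothetical; the only point is that a 1-based start of 7 corresponds to a 0-based slice(6).

```javascript
// Hypothetical helper: keep characters firstKept..end (1-based) of every line.
function cutFrom(text, firstKept) {
  return text
    .split("\n")
    .map(function (line) { return line.slice(firstKept - 1); }) // 1-based -> 0-based
    .join("\n");
}

console.log(cutFrom("123456hello\n654321world", 7));
// => hello
//    world
```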

Then I found “colrm”, which is the most straightforward. This one wins in simplicity. No hacks; it does exactly what you think it does. Very cool.

Double Update!

Since quite a bit of my rr usage has an empty string for the second argument, I figured it would be okay to make that the default case. So now there is a third usage for rr, which is great for pipes, that just strips something out. That brings rr back up to a tie for first place. Very cool.

Why CDATA Matters in XML

You’ve seen it before, but you may not know what it means. Wikipedia describes CDATA as meaning “Character Data”, which makes sense. W3Schools goes one step further and points out that this text should not be parsed by the XML parser. The general idea is that when you want to include straight textual data, without needing to encode characters or having them interpreted by the parser, you can just wrap that data inside of a CDATA tag.

Needless to say this is clearly the ugliest tag currently in existence (let’s leave room for the future though):

<![CDATA[ ... ]]>

I promised to tell you why it mattered

Yes, words in the title become a promise. You can hold me to that in the future. Why does CDATA matter? Well, I’ll actually side-step the question for a minute and show you what looks like a perfectly fine XHTML document (keep in mind that XHTML is an application of XML, so XML’s parsing rules apply):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
  <title>Simple Example</title>
</head>
<body>
  <h1>Welcome</h1>
  <script type="text/javascript">
    document.write('<p>Hello, World</p>');
  </script>
</body>
</html>

Looks simple enough. We can ignore the fact that it’s not really the best way of doing things, but who cares… it’s a Hello World example, right? Well, technically this is not valid XHTML. The validator shows we’ve got a single error:

Line 11, Column 20: document type does not allow element "p" here.
document.write('<p>Hello, World</p>');

The element named above was found in a context where it is not allowed. This could mean that you have incorrectly nested elements — such as a “style” element in the “body” section instead of inside “head” — or two elements that overlap (which is not allowed).

Well, the validator hints at the cause of the error, but it is hard to understand unless you really know your XML! We are inside a “<script>” tag, we’ve got some text, and all of a sudden a “<p>” pops up! An XML parser doesn’t care that it’s in the middle of a string; it sees another tag, and that tag doesn’t make sense there.

So, here is why CDATA matters. You want your Javascript to be left alone by the XML Parser. In HTML, Javascript is interpreted as text, so it can just be left alone as plain text by wrapping it in a CDATA tag:

<script type="text/javascript">
  <![CDATA[
    document.write('<p>Hello, World</p>');
  ]]>
</script>

Don’t get excited yet. Yes, that passes the W3C validator, but the JavaScript fails to run. Why? Well, I actually haven’t got a clue. My guess would be that the CDATA markers are not stripped out of the script, which invalidates the JavaScript when it tries to run. In any event, let’s check our steps… Is it valid XML? Check. Is it being served as XML? Well, actually I don’t think so.

Here is a nice resource that talks about Understanding HTML, XML, and XHTML. If you haven’t read that article, either read it now or once you’re done here; it’s important, no matter how old it is. To pull a quote:

to really send xhtml, an xhtml page must be served as xml and therefore have one of the following Content-Type’s (text/xml, application/xml, application/xhtml+xml) to a browser.

This is a simple one liner in php. I added the following code to the top of the page, and resent it to my browser:

<?php header('Content-type: application/xhtml+xml'); ?>

Doh. Well, we’ve covered all the bases and it still doesn’t work. It’s valid, it’s sent as XHTML, and now everything is left up to the browser, which doesn’t seem to cope. If you know why, drop a comment. Again, my suspicion is that the browsers don’t handle XHTML and CDATA completely. However, there is a pretty nice trick that we can make use of to get this to work and validate (even sent as text/html):

<script type="text/javascript">
  // <![CDATA[
    document.write('<p>Hello, World</p>');
  // ]]>
</script>

Well there you go. A 100% valid page, that runs in all browsers, that properly tells the XML Parser “hey, leave these characters alone” and it works. The problem is identifying when this is necessary. For most people, having the original page, which rendered correctly but didn’t validate would be enough. Browser developers are watching out for you and working around mistakes in HTML and XHTML. However, that isn’t always the case.

Real World Example

Here I’ll pull a real world example. Some XML specifications allow XML under a different namespace to be sent as content inside of an existing XML tag. Some do so in a “pseudo” way. Take a look at the Atom Publishing Protocol (commonly referred to as Atompub or APP for short). Here is a snippet from the RFC describing the Atom Syndication Format, specifically the structure for an atom:title tag with type=”html” within an atom:entry:

...
<title type="html">
  Less: &lt;em> &amp;lt; &lt;/em>
</title>
...

If the value of "type" is "html", the content of the Text construct MUST NOT contain child elements and SHOULD be suitable for handling as HTML [HTML]. Any markup within MUST be escaped; for example, "<br>" as "&lt;br>". HTML markup within SHOULD be such that it could validly appear directly within an HTML <DIV> element, after unescaping. Atom Processors that display such content MAY use that markup to aid in its display.

Okay, sorry for the long setup, but we have finally arrived at the point of this post. That type=”html” element cannot have child elements. The XML parser will identify child elements based on a “<” character. Assuming whatever project you are working on takes that input from the user, you would have to pass it through a filter that encodes HTML characters like ampersands, less-than and greater-than signs, and so on. That operation is expensive and may even cause problems in itself. I ran into a situation just the other day where an ampersand for an encoded character (like the & inside of an &amp;) was causing errors by itself. The solution is to make the XML parser ignore the data by wrapping it in a CDATA tag. Let’s take the above example and show how it could be done much more easily:

<title type="html">
  <![CDATA[ Less: <em> &lt; </em> ]]>
</title>

Easier to understand? You betcha. Less costly for developers? Of course. So CDATA is there to help, not hurt. Don’t look at its ugly face and think of it as a hack, look deeper and you will see its purpose and power. Okay, I admit that sounds a little corny, but it could have been worse.
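To compare the two approaches side by side, here is a small JavaScript sketch. The function names escapeXml and wrapCdata are made up for this example, and neither is a full XML serializer. One caveat worth showing: a CDATA section cannot itself contain the sequence “]]>”, so the sketch splits that sequence across two sections.

```javascript
// Option 1: escape the characters that XML treats as markup.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")   // must run first so the entities added below survive
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Option 2: wrap the raw text in a CDATA section.
// "]]>" would end the section early, so split it across two sections.
function wrapCdata(text) {
  return "<![CDATA[" + text.replace(/\]\]>/g, "]]]]><![CDATA[>") + "]]>";
}

var html = "Less: <em> &lt; </em>";

console.log(escapeXml(html));
// => Less: &lt;em&gt; &amp;lt; &lt;/em&gt;

console.log(wrapCdata(html));
// => <![CDATA[Less: <em> &lt; </em>]]>
```

Either output can sit inside the title element; the CDATA version keeps the HTML readable as typed.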

Side note, Javascript

As managers everywhere throw out buzz words like AJAX and encourage you to participate in new web 2.0 project ideas you’re going to end up sending and receiving XML requests with a server using the good old XMLHttpRequest object. Well if encoding isn’t enough of a problem (and I’m still wrapping my head around it) you might get struck with a problem like the above case and want to make use of your knowledge with CDATA.

Well, you’re in luck. xmlDocument.createCDATASection(…) is part of the Level 2 DOM ECMAScript Spec. Use it just like a createTextNode():

//
// Create an atom:title element with html content
// assume xmlDocument is already an XML Document object
// and entry is an atom:entry element in that document
//
// <entry>
//   <title type="html"><![CDATA[<em> &lt; </em>]]></title>
// </entry>
//
var ATOM_NS = "http://www.w3.org/2005/Atom";
var node = xmlDocument.createElementNS(ATOM_NS, "title");
node.setAttribute("type", "html");
var cdata = xmlDocument.createCDATASection("<em> &lt; </em>");
node.appendChild(cdata);
entry.appendChild(node);

Now all I have to learn is encoding, and how each browser deals with it differently. That is an entirely new realm that I don’t expect to cover in a single week, but I’ll report back with my findings. Until then, don’t get caught up on the little things!

Installing Ruby 1.8.7 on Mac OS X 10.5.3

Yah, this week is all about solving problems. I run into a lot of them, but it gives me something to blog about. But hey, if I can spend 20 minutes writing down a solution to a couple hour problem that I had, then I could save you a few hours too.

Problem

Trying to install Ruby 1.8.7 (or even 1.9) on my Leopard machine produces error messages on “make” that look like this:

gcc -I. -I../../.ext/include/i686-darwin9.1.0 -I../.././include
-I../.././ext/readline -DRUBY_EXTCONF_H=\"extconf.h\"    -fno-common
-g -O2 -pipe -fno-common   -o readline.o -c readline.c
readline.c: In function 'filename_completion_proc_call':
readline.c:659: error: 'filename_completion_function' undeclared
(first use in this function)
readline.c:659: error: (Each undeclared identifier is reported only once
readline.c:659: error: for each function it appears in.)
readline.c:659: warning: assignment makes pointer from integer without a 
cast
readline.c: In function 'username_completion_proc_call':
readline.c:684: error: 'username_completion_function' undeclared
(first use in this function)
readline.c:684: warning: assignment makes pointer from integer without a 
cast
make[1]: *** [readline.o] Error 1
make: *** [all] Error 1

Solution

I originally came across this problem right around new years and eventually found a solution thanks to Han Kessels on the Ruby Forums. It involved the following steps:

  1. Download the newest version of readline (version 5.2 at the time of writing) at GNU.org. You may have to apply the following patch. Thanks to Michael Biven for showing a nice simple way to do this from the command line:

    $ curl -O ftp://ftp.gnu.org/gnu/readline/readline-5.2.tar.gz
    $ tar xzvf readline-5.2.tar.gz
    $ cd readline-5.2
    $ curl -O http://ftp.gnu.org/gnu/readline/readline-5.2-patches/readline52-012
    $ patch -p0 < readline52-012
    $ ./configure --prefix=/usr/local
    $ make
    $ sudo make install
    $ cd ..
  2. Now you can download the 1.8.7 version of ruby, and install it but you should point to the version of readline that you just installed (to /usr/local) like so:

    $ curl -O ftp://ftp.ruby-lang.org/pub/ruby/1.8/ruby-1.8.7.tar.gz
    $ tar xzvf ruby-1.8.7.tar.gz 
    $ cd ruby-1.8.7
    $ ./configure --prefix=/usr/local/ruby1.8.7 --with-readline-dir=/usr/local
    $ make
    $ sudo make install
    $ cd /usr/local/ruby1.8.7/bin
    $ ./ruby -v
    ruby 1.8.7 (2008-05-31 patchlevel 0) [i686-darwin9.3.0]

The End Result

If you followed my terminal commands from above you should have a working Ruby 1.8.7 installed in "/usr/local/ruby1.8.7/bin". If you want ruby in a different directory, change the "--prefix=/usr/local/ruby1.8.7" option on ./configure to point to the directory you want.

Why put it in its own directory at all? Well, I happen to have multiple versions of ruby. By default I have 1.8.6, I have this version of 1.8.7, and I even have a 1.9 version I installed a while ago. It’s easy to remember where each one is if the directory it resides in clearly states the version. This way I can test a script, maybe even run benchmarks, in each version of ruby. However, you might not want to take this extra measure.

A “blog.domain.com” That Works

I seem to have had some really bad luck doing what I thought would be very simple. I finally arrived at a solution that I really don’t feel is optimal, but it works and I’ll monitor it for a while and hope that it continues working. (You’ll know if something ended up changing).

Problem

I own “alpha.com” and I want to create a “blog.alpha.com” that actually pulls pages from “alpha.com/blog/” on the web server but the user should still see “blog.alpha.com” in their browser’s address bar.

How I did it with this WordPress 2.5.1 Blog

Hopefully if you look up at your address bar right now you see blog.bogojoker.com. If not, then this failed and completely ignore me! I ended up having to do three things:

  1. I set up redirection for a subdomain. I used my hosting company’s Control Panel to create “blog.bogojoker.com” and point it to “http://bogojoker.com/blog/”. Please note this change does not start working immediately. It took a few hours for the redirection to propagate across the DNS name servers. Once it did work I went to the next steps. I wouldn’t recommend going to the next steps until this starts working, because I believe they depend on it!

  2. I logged into the WordPress Admin Dashboard. I went to “Settings” and put “http://blog.bogojoker.com” into both the “WordPress address” and “Blog address” fields. This put my browser into an infinite loop which I fixed in the final step.

  3. I FTP’d into my server, went to the “public_html/blog/” directory and commented out these lines from the hidden “.htaccess” file: (Note that adding a “#” to the start of the line comments that line out)

    #RewriteCond %{HTTP_HOST} ^blog.bogojoker.com$ [OR]
    #RewriteCond %{HTTP_HOST} ^www.blog.bogojoker.com$
    #RewriteRule ^(.*)$ http://bogojoker.com/blog/ [R=301,L]

Voila. That is how I did it. I ended up spending far too much time on what I considered “better” solutions that didn’t seem to work. In doing so I found out some things, but I can’t really be sure that I learned anything valuable from it.

Troubleshooting Tricks

If you end up having the same trouble, if you want to experiment on your own, or if you tried something that totally failed hopefully this can help.

At one point during the week everything crashed on me. Not only did the website not redirect properly but the direct paths weren’t working. I chalked it up to me turning the redirection (step 1 above) off. I stumbled upon a temporary solution to fix WordPress in the few hours it would take for that redirection to be reborn through DNS. It involves a little familiarity with mysql or phpMyAdmin and knowing your database schemas.

For those with phpMyAdmin access:

  1. Log into your phpMyAdmin

  2. Select WordPress’s database. Most servers name it username_wp### or the like; you should recognize it pretty easily or can find it from your Control Panel

  3. Browse the “wp_options” table

  4. Modify the “siteurl” value to be the “real” path to the blog subdirectory, NOT what you want it to be but where it really is. For example “http://bogojoker.com/blog/” for me.

  5. Now log into your WordPress by manually going to your domain’s wp-admin page like so: “http://yourDomainHere.com/blog/wp-admin/”

  6. Go to the “settings” tab and change both urls to the url mentioned in step 4.

That should bring your website back up, albeit with the uglier URL structure. Those without phpMyAdmin access should still have some form of access to the database, be it the command line using the “mysql” command or another graphical interface. Just try to follow the steps above as closely as possible.

I hope this helps you out a little if you want to accomplish the same thing. Even better though, come up with a better solution and let me know. I dreamed of solving this in a couple minutes and ended up spending a little over 3 hours taking a number of approaches. So I figured, what better to do but write about it!

JavaScript Sort an Array of Objects

I ran into an interesting problem today. I had an array of objects that I wanted sorted on a certain property. My obvious thought didn’t work! (Update: I got a comment below from Peter Michaux who points out a nicer solution, it is included here:)

// Array of Objects
var obj_arr = [ { age: 21, name: "Larry" },
                { age: 34, name: "Curly" },
                { age: 10, name: "Moe" } ];

// This doesn't work!
obj_arr.sort( function(a,b) { return a.name < b.name; });

// This does work! (Peter's update, very fast)
obj_arr.sort(function(a,b) { return a.name < b.name ? -1 :
                                    a.name > b.name ?  1 : 0; });

That kind of frustrated me. Sorting is one of those things I expect to be available in all languages. I don’t want to have to write a sorting algorithm every time I need to sort. So I looked into things, pulled up a Javascript Quicksort Algorithm and manipulated it to support any compare function.

Now I have the freedom to write a compare function that truly works for objects! I also changed certain parts of the code I found online so that it extends the Array class and keeps the extra functions hidden. Take a look at the sample usage:

// Defaults to (a<=b) sorting.  Great for numbers.
var arr = [1234, 2346, 21234, 3456, 32134, 3456, 1234, 2345, 23, 42523, 1234, 345];

// Object Array
var obj_arr = [ { age: 21, name: "Larry" },
                { age: 34, name: "Curly" },
                { age: 10, name: "Moe" } ];

arr.quick_sort();
// => [23, 345, 1234, 1234, 1234, 2345, 2346, 3456, 3456, 21234, 32134, 42523]

obj_arr.quick_sort(function(a,b) { return a.name < b.name });
// => Curly, Larry, Moe

obj_arr.quick_sort(function(a,b) { return a.age < b.age });
// => Moe (10), Larry (21), Curly (34)

For those who want to see the code, be glad: it’s free. I carried the copyright with it, but it’s rather loose. Grab the JavaScript Source Here! Enjoy:

Array.prototype.swap=function(a, b) {
  var tmp=this[a];
  this[a]=this[b];
  this[b]=tmp;
}

Array.prototype.quick_sort = function(compareFunction) {

  function partition(array, compareFunction, begin, end, pivot) {
    var piv = array[pivot];
    array.swap(pivot, end-1);
    var store = begin;
    for (var ix = begin; ix < end-1; ++ix) {
      if ( compareFunction(array[ix], piv) ) {
        array.swap(store, ix);
        ++store;
      }
    }
    array.swap(end-1, store);
    return store;
  }

  function qsort(array, compareFunction, begin, end) {
    if ( end-1 > begin ) {
      var pivot = begin + Math.floor(Math.random() * (end-begin));
      pivot = partition(array, compareFunction, begin, end, pivot);
      qsort(array, compareFunction, begin, pivot);
      qsort(array, compareFunction, pivot+1, end);
    }
  }

  if ( compareFunction == null ) {
    compareFunction = function(a,b) { return a<=b; };
  }
  qsort(this, compareFunction, 0, this.length);

}

Update

Peter Michaux pointed out something very important. The sort() function can be made to work if it returns numeric output (-1,0,1). His approach is far superior. Here was a benchmark I took:

var obj_arr1 = [];
var obj_arr2 = [];
var filler = [ { age: 21, name: "Larry" },
               { age: 34, name: "Curly" },
               { age: 10, name: "Moe" } ];
for (var i=0; i<5000; i++) {
  var rand = Math.floor( Math.random() * 3 );
  obj_arr1.push( filler[rand] );
  obj_arr2.push( filler[rand] );
}

var s = new Date();
obj_arr1.sort(function(a,b) { return a.name < b.name ? -1 : a.name > b.name ? 1 : 0; });
var e = new Date();
console.log(e.getTime()-s.getTime()); // => 75 ms

s = new Date();
obj_arr2.quick_sort(function(a,b) { return a.name < b.name });
e = new Date();
console.log(e.getTime()-s.getTime()); //  => 4444 ms

That shows drastic differences for arrays as large as 5000 elements (with not too random data). 75 ms versus 4444 ms (over 4 seconds). Doing the math: (4444/75) => 59.253 times better! Moral of the story, don’t rush into thinking something doesn’t exist!
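For what it’s worth, here is a small sketch of why the true/false comparator misbehaves with sort(). The function names boolCompare and threeWay are made up for this example. sort() expects a negative, zero, or positive number; a boolean is coerced to 1 or 0, so the comparator can never say “a comes before b”.

```javascript
var people = [ { age: 21, name: "Larry" },
               { age: 34, name: "Curly" },
               { age: 10, name: "Moe" } ];

// Boolean comparator: true/false is coerced to 1/0, never a negative number.
function boolCompare(a, b) { return a.name < b.name; }

// Three-way comparator: negative, positive, or zero, as sort() expects.
function threeWay(a, b) { return a.name < b.name ? -1 : a.name > b.name ? 1 : 0; }

var larry = people[0], curly = people[1];

console.log(Number(boolCompare(curly, larry))); // => 1  (claims Curly sorts after Larry)
console.log(Number(boolCompare(larry, curly))); // => 0  (claims they are equal)
console.log(threeWay(curly, larry));            // => -1
console.log(threeWay(larry, curly));            // => 1

var names = people.slice().sort(threeWay).map(function (p) { return p.name; });
console.log(names.join(", "));                  // => Curly, Larry, Moe
```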

So if that’s the way to do it, then I want to make it easier on myself. My arrays are generally going to be under 100 in size, and at that size building a function dynamically instead of writing a custom function works just about as well (although if you were using objects, polymorphism and a compare function would be the best way to go). Here is a simple function I can use to quickly build compare functions that sort an array in ascending order on multiple properties!

function buildCompareFunction(arr) {
  if (arr && arr.length > 0) {
    return function(a,b) {
      var asub, bsub, prop;
      for (var i=0; i<arr.length; i++) {
        prop = arr[i];
        asub = a[prop];
        bsub = b[prop];
        if ( asub < bsub )
          return -1;
        if ( asub > bsub )
          return 1;
      }
      return 0;
    }
  } else {
    // sort() needs a numeric result, so compare the values three ways
    return function(a,b) { return a < b ? -1 : a > b ? 1 : 0; };
  }
}

Sample usage would be:

var obj_arr = [
  { name: 'Joe',   age: 20 },
  { name: 'Joe',   age: 10 },
  { name: 'Joe',   age: 30 },
  { name: 'Joe',   age: 40 },
  { name: 'Joe',   age: 20 },
  { name: 'Joe',   age: 15 },
  { name: 'Joe',   age: 35 },
  { name: 'Joe',   age: 25 },
  { name: 'Bill',  age: 5 },
  { name: 'Barry', age: 20 },
  { name: 'Paul',  age: 20 },
  { name: 'Peter', age: 1 },
  { name: 'Smith', age: 25 },
  { name: 'Kary',  age: 30 }
];

obj_arr.sort( buildCompareFunction(['name','age']) );
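To show what a name-then-age sort produces, here is a self-contained sketch on a smaller array. The helper compareBy is a compact, hypothetical restatement of the same idea, included only so the snippet runs on its own.

```javascript
// Compact restatement of the property-list comparator (hypothetical name).
function compareBy(props) {
  return function (a, b) {
    for (var i = 0; i < props.length; i++) {
      if (a[props[i]] < b[props[i]]) return -1;
      if (a[props[i]] > b[props[i]]) return 1;
    }
    return 0; // equal on every listed property
  };
}

var folks = [
  { name: 'Joe',   age: 20 },
  { name: 'Joe',   age: 10 },
  { name: 'Bill',  age: 5 },
  { name: 'Barry', age: 20 },
  { name: 'Peter', age: 1 }
];

var ordered = folks.slice().sort(compareBy(['name', 'age']))
                   .map(function (p) { return p.name + " " + p.age; });

console.log(ordered.join(", "));
// => Barry 20, Bill 5, Joe 10, Joe 20, Peter 1
```

Names are compared first, and age only breaks ties between the two entries named Joe.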

Firebug Feature – Open With Editor

This one was news to me but it just made my day (and not a minute too late)! I used to have so much trouble copying and pasting code from the Firebug (now Firefox 3 compatible) console. The paste used to have no formatting or indentation and sometimes there were line numbers… Well, one more problem has been solved. Check this out:

(Screenshot: Firebug's Open With Editor feature)

Checking the changelogs in the repository for Firebug shows that it was included in ReleaseNotes_1.1.txt. I can’t believe I missed it!

Now all that’s left is copying and pasting from the HTML’s Style textarea on the right. Still, I think this is a nice small step forward.
