Web Analytics Taken to the Next Level

I came up with a neat idea the other night. Using localStorage and sessionStorage, you could theoretically monitor the number of tabs or windows a visitor has open on your site. As far as I know, this has never been possible before. Well, now it is.

Check out this crude example.

Once you open the monitor, leave that tab/window where it is. Use the “Spawn Tab” link to create new tabs and windows. The monitor will be notified and display some simple debug output. Each tab/window sends an update every 10 seconds so the monitor can tell when one closes: the monitor checks every 5 seconds and treats any tab/window whose latest update is 15 or more seconds old as closed. It then displays the total time that tab/window was open, accurate to within about 10 seconds. Correct values are maintained as the tabs/windows browse across pages, as long as they stay on the same domain! Just about everything you’d want or need.

As I mentioned, this is rather crude. The separate monitor tab exists only because I wanted to prototype the idea quickly. Most of the state is stored in localStorage, and each tab/window keeps a single identifier in its sessionStorage so it remembers which id it is as it navigates across multiple pages. Because all of the state lives in Web Storage, this system could become completely distributed: no “monitor” tab would be necessary, and the scripts could determine, and monitor, the existence of all other tabs on their own. Thus, this would be a viable option for the next level of web analytics.
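Because all of the shared state lives in localStorage, a monitor-free version mostly comes down to every tab reading the same keys. Here is a minimal sketch of how a tab might count its live siblings on its own, assuming the key names used by the client code later in this post (“tabs” and “tab<id>_latest”) and the same 15 second staleness cutoff the monitor uses. liveTabIds and makeMemoryStorage are hypothetical helpers, not part of the original scripts; the in-memory stand-in only exists so the logic can run outside a browser.

```javascript
// Sketch: a monitor-free way for any tab to find the live tabs on its domain.
// Assumed key layout (from the client code in this post): "tabs" holds a
// comma-separated id list, "tab<id>_latest" holds the last keepalive in ms.
// `storage` is anything with getItem(): localStorage in a browser, or the
// in-memory stand-in below.
var STALE_MS = 15000; // same cutoff the monitor's purger uses

function liveTabIds(storage, now) {
  var list = storage.getItem("tabs");
  if (!list) return []; // missing or emptied list: no tabs
  return list.split(",").filter(function (id) {
    var latest = parseInt(storage.getItem("tab" + id + "_latest"), 10);
    // A tab counts as alive only if its keepalive is fresh enough
    return !isNaN(latest) && now - latest < STALE_MS;
  });
}

// Hypothetical in-memory Storage stand-in, for experimenting only
function makeMemoryStorage() {
  var data = {};
  return {
    getItem: function (k) {
      return Object.prototype.hasOwnProperty.call(data, k) ? data[k] : null;
    },
    setItem: function (k, v) { data[k] = String(v); }
  };
}

var store = makeMemoryStorage();
store.setItem("tabs", "0,1,2");
store.setItem("tab0_latest", 100000);
store.setItem("tab1_latest", 80000); // 20 s stale: treated as closed
store.setItem("tab2_latest", 95000);
console.log(liveTabIds(store, 100000)); // [ '0', '2' ]
```

Each tab could run this on its own keepalive timer, which is what would let the monitor tab disappear entirely.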

As cool as this is, I don’t think it will add much value to analytics. For the first time, webmasters will be able to know how many windows or tabs a visitor opens (and which pages they open). Webmasters will know more about how their users use the website, but I don’t think this statistic will be a game changer. Who knows!

So, how does it work? Very simple. Each tab includes the client.js code, which updates localStorage and maintains its own “tab_id” in sessionStorage. The data it maintains can be whatever you want; I went with some simple information such as its start time, current URL, and latest keepalive:

localStorage values

Each Tab Client’s first job on every page load is to restore its session information (or create it, if this is a brand new tab):

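// Create or Restore tab_id
var myTabId = sessionStorage.tab;
if ( myTabId === undefined ) {
  var tabs = localStorage.tabs;
  // Treat a missing list and an empty one (the purger can empty it) the same
  if ( !tabs ) {
    myTabId = 0;
    localStorage.tabs = "0";
  } else {
    // Ids are appended in increasing order, so the last one is the largest
    var largest = parseInt( tabs.split(/,/).pop(), 10 );
    myTabId = largest+1;
    localStorage.tabs += "," + myTabId;
  }
  sessionStorage.tab = myTabId;
} else {
  myTabId = parseInt( myTabId, 10 ); // storage hands values back as strings
}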

And to perform its keepalives:

// Update the Latest Timestamp
function setLatest() {
  var key = 'tab'+myTabId+'_latest';
  localStorage.setItem(key, +(new Date())); 
}

// Update Status every 10 seconds
window.setInterval(setLatest,10000);
setLatest();

I put a few more convenience functions in there to help update these localStorage keys and to communicate with the monitor, which is (crudely) done through localStorage as well. That is explained next.

The Tab Monitor, as it stands right now, receives messages through localStorage’s “storage” event. It also checks every tab’s “latest” keepalive to make sure it hasn’t gone stale (15 seconds or older). When a tab has closed, the monitor removes references to that tab and outputs an approximation of how long the tab was open:

// Listener - receive messages from tabs
window.addEventListener('storage', function(e) {
  if ( e.storageArea === localStorage ) {
    if ( e.key == "tab_msg" ) {
      console.log( e.newValue );
      addMsg( e.newValue ); // Appends to the page
    }
  }
});

// Purger - every 5 seconds, clean out tabs whose last keepalive is 15+ seconds old
window.setInterval(function() {
  console.log("purging");
  var now = +(new Date()),
      tabs = localStorage.tabs,
      i, len, tabId;
  if ( tabs ) { // skip a missing or empty list
    var toRemove = [], toKeep = [];
    tabs = tabs.split(/,/);
    for (i=0, len=tabs.length; i<len; i++) {
      tabId = tabs[i];
      var tabLatest = parseInt( localStorage.getItem("tab"+tabId+"_latest"), 10 );
      ( (now-tabLatest)>=15000 ? toRemove : toKeep ).push(tabId);
    }
    localStorage.tabs = toKeep.join(',');
    for (i=0, len=toRemove.length; i<len; i++) {
      tabId = toRemove[i];
      var latest = parseInt( localStorage.getItem("tab"+tabId+"_latest"), 10 ),
          start  = parseInt( localStorage.getItem("tab"+tabId+"_start"), 10 ),
          time   = (latest-start)/1000;
      addMsg( "Tab " + tabId + " Closed after " + time + " seconds!" );
      // Remove the dead tab's keys so they don't pile up in localStorage
      localStorage.removeItem("tab"+tabId+"_latest");
      localStorage.removeItem("tab"+tabId+"_start");
    }
  }
}, 5000);
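The tab-side convenience function that writes to “tab_msg” isn’t shown in this post. Here is a minimal sketch of what it could look like, assuming a hypothetical sendMsg(storage, tabId, text, now) helper. Browsers only fire the “storage” event in other windows, and only when the stored value actually changes, so the sketch prefixes a timestamp to let two identical messages in a row both get through.

```javascript
// Sketch of a tab-to-monitor message over Web Storage (hypothetical helpers).
// setItem() with an unchanged value fires no "storage" event, so a timestamp
// is prefixed to make repeated messages distinct.
function sendMsg(storage, tabId, text, now) {
  var value = now + "|tab " + tabId + ": " + text;
  storage.setItem("tab_msg", value);
  return value;
}

// Monitor side: strip the timestamp back off before displaying.
function readMsg(value) {
  var bar = value.indexOf("|");
  return bar === -1 ? value : value.slice(bar + 1);
}

// In a browser this would be something like:
//   sendMsg(localStorage, myTabId, "opened " + location.href, Date.now());
// Outside a browser, any object with setItem() works:
var fake = { setItem: function (k, v) { this[k] = v; } };
var sent = sendMsg(fake, 3, "hello", 1000);
console.log(readMsg(sent)); // tab 3: hello
```

The monitor’s listener could then pass e.newValue through readMsg before handing it to addMsg.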

This took a little under an hour to get working. There are still minor issues that I didn’t attempt to resolve. If there is interest, this could be developed into completely distributed, peer-to-peer communication between tabs/windows on a single domain. A little warning, though: Web Storage is not set in stone. Not all browsers have implemented it, and the specification could still change at any time. There has been some rather heated debate on the subject of Web Storage recently, with good reason. All I know is that when it’s settled, this functionality will continue to exist!

Let me know what you think.

Handling the tab key in a <textarea>

Traversing through input elements with the tab key is important for accessibility reasons. However, every once in a while you come across a situation where traversal isn’t really important. Instead, you want the tab key to actually do something for you. Or you may want to do something fancy with the tab key, whether it’s replacing it with spaces or something else.

I came across a website today with an interesting idea. You could run some test code on the page to try out their library. Their instructions said, “push tab to evaluate the code.” Sure enough, you could tell it was evaluating the code “onblur” of the textarea. The problem with this was that when you pushed tab, you lost focus.

I thought about it and figured a rather simple trick would let you run some code while keeping focus in the textarea. It goes a little like this:

window.addEventListener('load', function() {
  var textarea = document.getElementById('txt');
  textarea.addEventListener('keydown', function(e) {
    if (e.keyCode === 9) {
      e.preventDefault();
      e.stopPropagation();
      // operation goes here
    }
  });
});

Note that you have to figure out which key was pressed from the event object. There are many ways to do it (keyCode, charCode, which, even keyIdentifier), and you’ll have to mix them to work across all browsers. Basically, 9 is the key code for the tab key. So when you get the tab key, the handler prevents the default behavior (moving focus) and lets you execute whatever code you want: run some functions, eval some code, display something, make an Ajax request, whatever you want. Simple. I think it would improve a few interfaces. It’s a neat way to make the tab key perform a function.
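As a concrete case of “replacing it with spaces,” here is a minimal sketch that inserts spaces at the cursor instead of moving focus. It assumes a browser that supports selectionStart/selectionEnd on textareas and reuses the textarea id “txt” from the snippet above; insertAtSelection and SPACES are illustrative names, and the two-space indent is an arbitrary choice.

```javascript
// Sketch: make Tab insert spaces in a <textarea> instead of moving focus.
// The string logic is a pure function so it runs anywhere; the DOM wiring
// only runs when a document exists (i.e. in a browser).
var SPACES = "  ";

// Replace the selected range [start, end) with `text`; return the new value
// and where the caret should go afterwards.
function insertAtSelection(value, start, end, text) {
  return {
    value: value.slice(0, start) + text + value.slice(end),
    caret: start + text.length
  };
}

if (typeof document !== "undefined") {
  window.addEventListener("load", function () {
    var textarea = document.getElementById("txt");
    textarea.addEventListener("keydown", function (e) {
      // e.key is the modern property; keyCode 9 covers older browsers
      if (e.key === "Tab" || e.keyCode === 9) {
        e.preventDefault(); // keep focus in the textarea
        var result = insertAtSelection(textarea.value,
          textarea.selectionStart, textarea.selectionEnd, SPACES);
        textarea.value = result.value;
        textarea.selectionStart = textarea.selectionEnd = result.caret;
      }
    });
  });
}

console.log(JSON.stringify(insertAtSelection("ab", 1, 1, SPACES))); // {"value":"a  b","caret":3}
```

Keeping the text manipulation separate from the event handler makes it easy to swap in a different action, such as evaluating the code, without touching the key handling.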

You can check out this example of what I mean.

Freenode JSBot Command Line Script

So over my week-long break from college, I spent a bunch of time in ##javascript learning and helping others with Javascript problems. This was to help me prepare for one of the projects I’m working on (still to be announced).

One of the things I really liked in ##javascript was the freenode jsbot, which could do all sorts of things. It was so useful, in fact, that I felt I had to have it for when I’m not on IRC. The website mentioned an API, so I dug in.

I wrote a command line jsbot script and added it to my ~/bin:

jsbot

A clip of the source code (yes, it’s horrible… but it’s so compact!) shows how easy it is to work with JSON in Ruby. Just a few requires and it’s just as easy as Javascript, without the cross-site request issues:

#!/usr/bin/env ruby
# Author: Joseph Pecoraro
# Date: Friday March 6, 2009
# Description: Simple Interface for the
# really neat jsbot!

require 'rubygems'
require 'open-uri'
require 'json'
require 'cgi'
require File.dirname(__FILE__) + '/escape'


class JSBot

  JSON_PREFIX = 'http://fn-js.info/jsbot.xhr?'
  SITE_PREFIX = 'http://js.isite.net.au/jsbot?'

  def search(str)
    uri = url(str, JSON_PREFIX, "search=")
    # URI.open: open-uri's plain open(url) form was removed in Ruby 3.0
    JSON.parse( URI.open( uri ).read )
  end

  def show(str)
    uri = url(str, JSON_PREFIX, "show=")
    JSON.parse( URI.open( uri ).read )
  end

  def url(str, u=SITE_PREFIX, q='q=')
    u + q + CGI::escape(str)
  end

end

...

“Back and Forth” Greasemonkey For The Whole Web

Recently I wrote a Greasemonkey script to add keyboard shortcuts to The Big Picture, improving on some of their existing shortcuts. Once I started using the shortcuts I made, I wanted to use them all over the place on other blogs. This functionality is so tiny, but so useful, that I bundled it into its own script that runs on all web pages!

Grab it here:

//
// ==UserScript==
// @name          Back and Forth
// @namespace     http://blog.bogojoker.com
// @description   Keyboard Shortcut to Jump back and forth on a page. (esc key).
// @include       *
// @version       1.0 - Initial Version - Sunday February 15, 2009
// ==/UserScript==

(function() {
  
  // Global States
  var x = null;
  var y = null;

  // Add a new Global Key Listener for `esc`
  // (keydown: current browsers don't fire keypress for non-character keys like esc)
  document.addEventListener("keydown", function(e) {
    if(!e) e=window.event;
    var key = e.keyCode ? e.keyCode : e.which;
    if ( key == 27 ) {
      var tempx = x;
      var tempy = y;
      x = Math.max(document.documentElement.scrollLeft, document.body.scrollLeft);
      y = Math.max(document.documentElement.scrollTop, document.body.scrollTop);
      if ( tempx != null ) { // First time it should be null
        window.scrollTo(tempx, tempy);
      }
    }
  }, true);

})();

On any webpage, the first time you push the `esc` key, position A gets stored. The next time you push `esc`, position B gets stored and the browser jumps to position A. The next time you push it, A gets stored and you jump to B. So you always jump back to wherever you last pushed `esc`. Hence the name “back and forth.”
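The position juggling is easier to follow with the DOM stripped away. Here is a minimal sketch of the same swap logic as a plain function, where makeBackAndForth is a hypothetical name and positions are simple {x, y} objects standing in for the scroll offsets:

```javascript
// Sketch of the "back and forth" bookkeeping without the DOM.
// Each press saves the current position and returns the previously saved one
// (or null on the very first press), mirroring the userscript's x/y swap.
function makeBackAndForth() {
  var saved = null;
  return function press(current) {
    var target = saved;
    saved = current;
    return target; // where to scroll, or null if nothing was saved yet
  };
}

var press = makeBackAndForth();
var A = { x: 0, y: 100 }, B = { x: 0, y: 900 };
console.log(press(A)); // null  (A stored, no jump)
console.log(press(B)); // { x: 0, y: 100 }  (B stored, jump to A)
console.log(press(A)); // { x: 0, y: 900 }  (A stored, jump to B)
```

In the userscript, the value passed in is the current scroll offset and a non-null result is fed to window.scrollTo.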

This is useful to me when I jump between comments and the content. When I’m reading a comment and want to check back on the article, I just push `esc` to save my position, go back to the article, and when I’m all set I jump back to my saved position (the comments) with `esc`.

Short, Sweet, Simple: The Back and Forth Greasemonkey Script.

Stack Overflow – Edit Summary Quicklinks

I requested a feature to make the few edit summary suggestions clickable. Like many developers, I don’t normally let my hands leave the keyboard. However, this was one case where I felt those suggestions, like “corrected spelling” and “fixed grammar”, should be insertable with a click.

Showing the Usage

Well, the suggestion was declined. I can’t blame the team; nobody upvoted it. But I felt strongly enough about it, and knew it was simple enough to implement, that I whipped up a Greasemonkey script to do it myself. The script runs like a charm and even adds a few extra suggestions to the original three. It handles formatting and commas automatically, so don’t worry about a thing, just click. Enjoy!
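To give a rough idea of the comma handling, here is a minimal sketch. addSuggestion is a hypothetical helper, not the linked script’s actual code: it splits the current summary on commas, drops empty pieces, skips duplicates, and rejoins with “, ”.

```javascript
// Sketch: append a clicked suggestion to an edit summary, handling commas.
// Hypothetical helper; the real script's logic may differ.
function addSuggestion(summary, suggestion) {
  var parts = summary.split(",")
    .map(function (s) { return s.trim(); })
    .filter(function (s) { return s.length > 0; });
  // Don't add the same suggestion twice
  if (parts.indexOf(suggestion) === -1) parts.push(suggestion);
  return parts.join(", ");
}

var summary = "";
summary = addSuggestion(summary, "corrected spelling");
summary = addSuggestion(summary, "fixed grammar");
summary = addSuggestion(summary, "corrected spelling"); // ignored, already there
console.log(summary); // corrected spelling, fixed grammar
```

Normalizing on every click means stray commas or spaces the user typed by hand get cleaned up too.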

Script to Add Edit Summary Quicklinks

Script to Prevent Blank Edit Summaries

UPDATE: Fixed to use keng’s URL, and added plain old stackoverflow.com (without the beta subdomain) in preparation for the launch. Thanks keng!

DOUBLE UPDATE: Sam put out a “wanted ad” for a Greasemonkey script to prevent blank edit summaries. I whipped that script up and linked to it above. Thanks Sam! Quick Demo

Why CDATA Matters in XML

You’ve seen it before, but you may not know what it means. Wikipedia describes CDATA as meaning “Character Data”, which makes sense. W3Schools goes one step further and points out that this text should not be parsed by the XML parser. The general idea is that when you want to include straight textual data, without having to encode characters and without the parser interpreting them, you can just wrap that data in a CDATA section.

Needless to say, this is clearly the ugliest markup currently in existence (let’s leave room for the future, though):

<![CDATA[ ... ]]>

I promised to tell you why it mattered

Yes, words in the title become a promise. You can hold me to that in the future. Why does CDATA matter? Well, I’ll actually side-step the question for a minute and show you what looks like a perfectly fine XHTML document (keep in mind that XHTML is XML):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
  <title>Simple Example</title>
</head>
<body>
  <h1>Welcome</h1>
  <script type="text/javascript">
    document.write('<p>Hello, World</p>');
  </script>
</body>
</html>

Looks simple enough. We can ignore the fact that it’s not really the best way of doing things, but who cares… it’s a Hello World example, right? Well, technically this is not valid XHTML. The validator shows we’ve got a single error:

Line 11, Column 20: document type does not allow element "p" here.
document.write('<p>Hello, World</p>');

The element named above was found in a context where it is not allowed. This could mean that you have incorrectly nested elements — such as a “style” element in the “body” section instead of inside “head” — or two elements that overlap (which is not allowed).

Well, the validator hints at the cause of the error, but it is hard to understand unless you really know your XML! We are inside a “<script>” tag, we’ve got some text, and all of a sudden a “<p>” pops up! An XML parser doesn’t care that it’s in the middle of a Javascript string; it sees another tag, and that tag isn’t allowed there.

So, here is why CDATA matters. You want your Javascript to be left alone by the XML parser. To the parser, the script is just text, so you can tell it to leave that text alone by wrapping it in a CDATA section:

<script type="text/javascript">
  <![CDATA[
    document.write('<p>Hello, World</p>');
  ]]>
</script>

Don’t get excited yet. Yes, that passes the W3C validator, but the Javascript fails to run. Why? Well, I actually haven’t got a clue. My guess would be that the CDATA markers aren’t stripped out, so they end up in the Javascript itself and make it invalid when it tries to run. In any event, let’s check our steps… Is it valid XML? Check. Is it being served as XML? Well, actually I don’t think so.

Here is a nice resource that talks about Understanding HTML, XML, and XHTML. If you haven’t read that article, either read it now or once you’re done here; it’s important, no matter how old it is. To pull a quote:

to really send xhtml, an xhtml page must be served as xml and therefore have one of the following Content-Type’s (text/xml, application/xml, application/xhtml+xml) to a browser.

This is a simple one-liner in PHP. I added the following code to the top of the page and reloaded it in my browser:

<?php header('Content-type: application/xhtml+xml'); ?>

Doh. Well, we’ve covered all the bases and it still doesn’t work. It’s valid and it’s sent as XHTML, so everything is now up to the browser, and it still doesn’t seem to work. If you know why, drop a comment. Again, my suspicion is that browsers don’t fully handle XHTML and CDATA. (One more suspect: once a page is parsed as XML, document.write() itself isn’t supported, so the Hello World example may be what fails here.) However, there is a pretty nice trick we can use to get this to work and validate (even when sent as text/html):

<script type="text/javascript">
  // <![CDATA[
    document.write('<p>Hello, World</p>');
  // ]]>
</script>

Well, there you go. A 100% valid page that runs in all browsers, properly tells the XML parser “hey, leave these characters alone”, and works. The problem is identifying when this is necessary. For most people, the original page, which rendered correctly but didn’t validate, would be enough. Browser developers are watching out for you and working around mistakes in HTML and XHTML. However, that isn’t always the case.

Real World Example

Here I’ll pull a real world example. Some XML specifications allow you to send XML from a different namespace as content inside an existing XML tag. Some do so in a “pseudo” way. Take a look at the Atom Publishing Protocol (commonly referred to as Atompub or APP for short). Here is a snippet from the RFC describing the Atom Syndication Format, specifically the structure of an atom:title tag with type="html" within an atom:entry:

...
<title type="html">
  Less: &lt;em> &amp;lt; &lt;/em>
</title>
...

If the value of "type" is "html", the content of the Text construct MUST NOT contain child elements and SHOULD be suitable for handling as HTML [HTML]. Any markup within MUST be escaped; for example, "<br>" as "&lt;br>". HTML markup within SHOULD be such that it could validly appear directly within an HTML <DIV> element, after unescaping. Atom Processors that display such content MAY use that markup to aid in its display.

Okay, sorry for the long setup, but we have finally arrived at the point of this post. That type="html" element cannot have child elements, and the XML parser identifies child elements by the “<” character. Assuming your project takes that input from the user, you would have to pass it through a filter that encodes HTML characters like ampersands, less-than and greater-than signs, and so on. That step is costly to get right and may even cause problems of its own. I ran into a situation just the other day where the ampersand of an already encoded character (like the & inside of &amp;) was causing errors by itself. The solution is to make the XML parser leave the data alone by wrapping it in a CDATA section. Let’s take the above example and show how it could be done much more easily:

<title type="html">
  <![CDATA[ Less: <em> &lt; </em> ]]>
</title>

Easier to understand? You betcha. Less costly for developers? Of course. So CDATA is there to help, not hurt. Don’t look at its ugly face and think of it as a hack; look deeper and you will see its purpose and power. Okay, I admit that sounds a little corny, but it could have been worse.
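One caveat: a CDATA section can’t contain its own terminator, “]]>”, so user input can still break it (the current DOM standard also makes document.createCDATASection() throw on such data for the same reason). Here is a minimal sketch of the usual workaround, splitting the terminator across two adjacent sections; toCDATA is a hypothetical helper name.

```javascript
// Sketch: wrap arbitrary text in CDATA safely. The only thing a CDATA section
// cannot contain is "]]>", so each occurrence is split across two adjacent
// sections: "]]" ends the first section and ">" starts the next one.
function toCDATA(text) {
  return "<![CDATA[" + text.split("]]>").join("]]]]><![CDATA[>") + "]]>";
}

console.log(toCDATA("Less: <em> &lt; </em>"));
// <![CDATA[Less: <em> &lt; </em>]]>
console.log(toCDATA("a]]>b"));
// <![CDATA[a]]]]><![CDATA[>b]]>
```

An XML parser concatenates the adjacent sections back into the original text, so the title content stays exactly what the user typed.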

Side note, Javascript

As managers everywhere throw out buzzwords like AJAX and encourage you to participate in new Web 2.0 project ideas, you’re going to end up sending and receiving XML with a server using the good old XMLHttpRequest object. Well, if encoding isn’t enough of a problem (and I’m still wrapping my head around it), you might run into a problem like the case above and want to put your knowledge of CDATA to use.

Well, you’re in luck. xmlDocument.createCDATASection(…) is part of the Level 2 DOM ECMAScript Spec. Use it just like createTextNode():

//
// Create an atom:title element with html content
// assume xmlDocument is already an XML Document object
// and entry is an atom:entry element in that document
//
// <entry>
//   <title type="html"><![CDATA[<em> &lt; </em>]]></title>
// </entry>
//
var ATOM_NS = "http://www.w3.org/2005/Atom";
var node = xmlDocument.createElementNS(ATOM_NS, "title");
node.setAttribute("type", "html");
var cdata = xmlDocument.createCDATASection("<em> &lt; </em>");
node.appendChild(cdata);
entry.appendChild(node);

Now all I have to learn is encoding, and how each browser deals with it differently. That is an entirely new realm that I don’t expect to cover in a single week, but I’ll report back with my findings. Until then, don’t get caught up on the little things!

JavaScript Sort an Array of Objects

I ran into an interesting problem today. I had an array of objects that I wanted sorted on a certain property. My obvious thought didn’t work! (Update: Peter Michaux left a comment below pointing out a nicer solution; it is included here:)

// Array of Objects
var obj_arr = [ { age: 21, name: "Larry" },
                { age: 34, name: "Curly" },
                { age: 10, name: "Moe" } ];

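// This doesn't work! The comparator returns true/false (1/0), never a
// negative number, so sort() can't tell when a should come before b.
obj_arr.sort( function(a,b) { return a.name < b.name; });

// This does work! (Peter's update, very fast)
obj_arr.sort(function(a,b) { return a.name < b.name ? -1 :
                                    a.name > b.name ?  1 : 0; });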
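Why the first version fails: sort() expects the compare function to return a number (negative if a comes first, positive if b comes first, zero for a tie), and a boolean only ever becomes 1 or 0. Here is a minimal sketch that turns any “less than” test, like the ones passed to quick_sort below, into a proper comparator; fromLessThan is a hypothetical helper name.

```javascript
// Sketch: build a numeric comparator for Array.prototype.sort from a
// boolean "less than" predicate.
function fromLessThan(lessThan) {
  return function (a, b) {
    if (lessThan(a, b)) return -1; // a sorts first
    if (lessThan(b, a)) return 1;  // b sorts first
    return 0;                      // neither is smaller: a tie
  };
}

var people = [ { age: 21, name: "Larry" },
               { age: 34, name: "Curly" },
               { age: 10, name: "Moe" } ];

people.sort(fromLessThan(function (a, b) { return a.name < b.name; }));
console.log(people.map(function (p) { return p.name; })); // [ 'Curly', 'Larry', 'Moe' ]
```

The cost is up to two predicate calls per comparison, which is usually negligible next to the sort itself.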

That kind of frustrated me. Sorting is one of those things I expect to be available in all languages, and I don’t want to write a sorting algorithm every time I need to sort. So I looked into things, pulled up a Javascript Quicksort implementation, and adapted it to support any compare function.

Now I have the freedom to write a compare function that truly works for objects! I also changed around certain parts of the code I found online to extend the Array class (through Array.prototype) and keep the helper functions hidden. Take a look at the sample usage:

// Defaults to (a<=b) sorting.  Great for numbers.
var arr = [1234, 2346, 21234, 3456, 32134, 3456, 1234, 2345, 23, 42523, 1234, 345];

// Object Array
var obj_arr = [ { age: 21, name: "Larry" },
                { age: 34, name: "Curly" },
                { age: 10, name: "Moe" } ];

arr.quick_sort();
// => [23, 345, 1234, 1234, 1234, 2345, 2346, 3456, 3456, 21234, 32134, 42523]

obj_arr.quick_sort(function(a,b) { return a.name < b.name });
// => Curly, Larry, Moe

obj_arr.quick_sort(function(a,b) { return a.age < b.age });
// => Moe (10), Larry (21), Curly (34)

For those who want to see the code, be glad: it’s free. I carried the copyright with it, but it’s rather loose. Grab the JavaScript Source Here! Enjoy:

Array.prototype.swap=function(a, b) {
  var tmp=this[a];
  this[a]=this[b];
  this[b]=tmp;
}

Array.prototype.quick_sort = function(compareFunction) {

  function partition(array, compareFunction, begin, end, pivot) {
    var piv = array[pivot];
    array.swap(pivot, end-1);
    var store = begin;
    for (var ix = begin; ix < end-1; ++ix) {
      if ( compareFunction(array[ix], piv) ) {
        array.swap(store, ix);
        ++store;
      }
    }
    array.swap(end-1, store);
    return store;
  }

  function qsort(array, compareFunction, begin, end) {
    if ( end-1 > begin ) {
      var pivot = begin + Math.floor(Math.random() * (end-begin));
      pivot = partition(array, compareFunction, begin, end, pivot);
      qsort(array, compareFunction, begin, pivot);
      qsort(array, compareFunction, pivot+1, end);
    }
  }

  if ( compareFunction == null ) {
    compareFunction = function(a,b) { return a<=b; };
  }
  qsort(this, compareFunction, 0, this.length);

}

Update

Peter Michaux pointed out something very important: the built-in sort() works just fine if the compare function returns a number (-1, 0, or 1) instead of a boolean. His approach is far superior. Here is a benchmark I ran:

var obj_arr1 = [];
var obj_arr2 = [];
var filler = [ { age: 21, name: "Larry" },
               { age: 34, name: "Curly" },
               { age: 10, name: "Moe" } ];
for (var i=0; i<5000; i++) {
  var rand = Math.floor( Math.random() * 3 );
  obj_arr1.push( filler[rand] );
  obj_arr2.push( filler[rand] );
}

var s = new Date();
obj_arr1.sort(function(a,b) { return a.name < b.name ? -1 : a.name > b.name ? 1 : 0; });
var e = new Date();
console.log(e.getTime()-s.getTime()); // => 75 ms

s = new Date();
obj_arr2.quick_sort(function(a,b) { return a.name < b.name });
e = new Date();
console.log(e.getTime()-s.getTime()); //  => 4444 ms

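That shows a drastic difference for arrays of 5000 elements (with data that isn’t very random: only three distinct values). 75 ms versus 4444 ms (over 4 seconds). Doing the math: (4444/75) => 59.253 times faster! Part of the gap is likely the data itself: with only three distinct names, quick_sort’s partition sends every element equal to the pivot to the same side, which pushes it toward quadratic time. Moral of the story: don’t rush into thinking something doesn’t exist!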

So if that’s the way to do it, then I want to make it easier on myself. My arrays are generally going to have fewer than 100 elements, and at that size building a compare function dynamically works just about as well as writing a custom one (although if you were using objects, polymorphism and a compare function would be the best way to go). Here is a simple function I can use to quickly build compare functions that sort an array in ascending order on multiple properties!

function buildCompareFunction(arr) {
  if (arr && arr.length > 0) {
    return function(a,b) {
      var asub, bsub, prop;
      for (var i=0; i<arr.length; i++) {
        prop = arr[i];
        asub = a[prop];
        bsub = b[prop];
        if ( asub < bsub )
          return -1;
        if ( asub > bsub )
          return 1;
      }
      return 0;
    }
  } else {
    // Fall back to plain ascending order (sort() needs a number, not a boolean)
    return function(a,b) { return a < b ? -1 : a > b ? 1 : 0; };
  }
}

Sample usage would be:

var obj_arr = [
  { name: 'Joe',   age: 20 },
  { name: 'Joe',   age: 10 },
  { name: 'Joe',   age: 30 },
  { name: 'Joe',   age: 40 },
  { name: 'Joe',   age: 20 },
  { name: 'Joe',   age: 15 },
  { name: 'Joe',   age: 35 },
  { name: 'Joe',   age: 25 },
  { name: 'Bill',  age: 5 },
  { name: 'Barry', age: 20 },
  { name: 'Paul',  age: 20 },
  { name: 'Peter', age: 1 },
  { name: 'Smith', age: 25 },
  { name: 'Kary',  age: 30 }
];

obj_arr.sort( buildCompareFunction(['name','age']) );
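If some keys need to sort in descending order, the same approach extends naturally. Here is a minimal sketch, assuming a hypothetical convention where a leading “-” on a property name flips that key’s direction; buildCompareFunctionDir is an illustrative name, not part of the original code.

```javascript
// Sketch: multi-property compare function where "-prop" sorts that key in
// descending order. The "-" prefix is a hypothetical convention.
function buildCompareFunctionDir(props) {
  var keys = props.map(function (p) {
    return p.charAt(0) === "-" ? { name: p.slice(1), dir: -1 }
                               : { name: p, dir: 1 };
  });
  return function (a, b) {
    for (var i = 0; i < keys.length; i++) {
      var k = keys[i], av = a[k.name], bv = b[k.name];
      if (av < bv) return -k.dir; // ascending: a first; descending: b first
      if (av > bv) return k.dir;
    }
    return 0; // equal on every key
  };
}

var rows = [ { name: "Joe", age: 20 }, { name: "Bill", age: 5 },
             { name: "Joe", age: 40 }, { name: "Joe", age: 10 } ];
rows.sort(buildCompareFunctionDir(["name", "-age"]));
console.log(rows.map(function (r) { return r.name + " " + r.age; }).join(", "));
// Bill 5, Joe 40, Joe 20, Joe 10
```

Because each key returns as soon as it finds a difference, later keys only ever break ties, exactly as in the ascending-only version.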
