Travel Hack Session Diary

April 2, 2009 2 comments

I usually don’t write diaries, but I think this story is worth telling. It’s about a parser, a trip across Europe and practicing Extreme Programming. It all started after my lecture at the University of Szeged.

9:00 am – Coffeehouse in Szeged: I realize that the way back to Karlsruhe will take about eight hours. I ask Istvan whether we should use this time as a hack session. He immediately agrees and even has a cool project we could work on. He wants to use an Earley parser for a project at CAS and is excited to explain it to me. Up to this moment I had never heard of Earley parsers, even though I took the compiler construction course back in university.

13:00 – Intercity from Szeged to Budapest: Istvan tries to explain what an Earley parser is and how it works. He pulls out an empty sheet of paper and writes down the formal definition. I’m really impressed that he has memorized these details. On the other hand it scares me a little since I don’t grasp it yet. It gets a little better after we manually parse the simple arithmetic expression ‘a+(a)’ on a second sheet of paper. 30 minutes later Istvan has managed to teach me the basics of this algorithm.

Train notes: Formal definition of an Earley parser

14:00 – Still in the IC: At this point we decide to write an Earley parser for this kind of arithmetic expression. Since Istvan has chosen the topic of our hack session, I choose the programming language. Of course it is JavaScript with qooxdoo. This ensures that at least the language gets out of my way. So I pull out my MacBook and create a new qooxdoo project. We start with the boilerplate code to manage the grammar and classes to store Earley sets and states. We make heavy use of qooxdoo’s test runner and unit testing framework to make sure that all the code we write actually works as expected. Unintentionally it becomes an intense XP session, as we write all the tests first and have no way to avoid pair programming.

15:00 – Ferihegy airport in Budapest: We have to take a break, check in and pass the security checks. Once at the gate we take the time to continue coding.

Me coding at the gate

17:00 – Germanwings flight to Stuttgart: Arrgh, the seats in economy class aren’t made for 1.95m people like me. To make matters worse they put me in the middle seat. I fail miserably to use my notebook. My elbows are constantly in my neighbor’s face. I have to hand over my MacBook to Istvan. Now I am in the passive role. We are so focused that the flight seems to take no time at all.

19:20 – City Train to Stuttgart central station: Shortly after leaving the airport my battery is almost empty. To keep on working we decide to continue with Istvan’s ThinkPad. We copy the sources to his computer using a USB thumb drive. Within a few minutes we are up to speed again.

20:00 – Intercity to Karlsruhe: We are still using Istvan’s ThinkPad. It’s my turn again. God, I hate working on Windows. I constantly get all the keyboard shortcuts wrong. We both use Eclipse but there are subtle differences in the key bindings between Mac and Windows. Anyway, we are coming along nicely and gradually approach the heart of the parser. We are almost done but unfortunately we are approaching our final destination, Karlsruhe. We could use one or two additional hours on the train.

21:15 – Istvan’s apartment: We can’t stop now. We are too close. We continue for some time but it’s getting harder to concentrate. It doesn’t make sense to keep on hacking. We decide to finish the hopefully small remaining amount of code tomorrow.

The next day 19:00 – Istvan’s apartment: We both come directly from our offices. I bring along two healthy Döner Kebabs to recharge our internal batteries before we work on the code again. We are about to write the acceptor, which forms the heart of this parser. It basically computes the Earley states for each input token. If we fail to build the next set, the input sentence is not accepted. If on the other hand all sets can be computed and the final set contains the completed start rule, we know that the sentence matches the grammar. Shortly after midnight we have it. We can solve the word problem for our minimal arithmetic expression language, which means that we can decide whether the input is a valid arithmetic expression.

One week later 19:00 – Istvan’s apartment: I’m back in Karlsruhe and we have to take the last step to a fully working parser. Once we know that a sentence matches our grammar we need to derive it. This is done by starting with the start rule found in the last Earley set and computing all the ways we could have got there. After implementing the required data structures, implementing the algorithm itself proves to be not that hard. Awesome, we really did it!

Extreme Programming Works

I had never practiced XP in such a pure way before. No line of parser code was written without a failing test first. No line of code was written by either of us alone. Writing the tests first helped a lot to break down the complex task of writing a parser into tiny chunks of code, which could be implemented and tested separately. This paid off at the end when writing the core of the algorithm. We never really had to debug any of our code. On the other hand it does require a certain amount of discipline to stay in this test-code-test loop. This is where pairing comes into play. It’s so much easier to stay disciplined if both keep an eye on it. Furthermore, it’s much easier to stay in flow. I definitely won’t think of checking my mail while working with someone else. The workplace helped as well. It might sound paradoxical, but for me a train or a plane is an extremely productive place to work.

Earley Parser

Parsing is an exciting topic in computer science. Many see it as some kind of black art or voodoo, but it really is not that hard. The Earley parser is interesting because it can work on any context-free grammar, and for ambiguous grammars it can return all possible derivation trees.
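To make the acceptor step from the diary more concrete, here is a minimal sketch of an Earley recognizer in plain JavaScript. It is not the code from the hack session: the grammar, the names GRAMMAR, addState and accepts, and the restriction to grammars without empty rules are all assumptions made for illustration.

```javascript
// Hypothetical grammar for expressions such as "a+(a)" (an assumption,
// not the grammar from the session). It contains no empty rules.
var GRAMMAR = {
  start: "S",
  rules: [
    { lhs: "S", rhs: ["S", "+", "T"] },
    { lhs: "S", rhs: ["T"] },
    { lhs: "T", rhs: ["a"] },
    { lhs: "T", rhs: ["(", "S", ")"] }
  ]
};

// A symbol is a non-terminal if some rule has it on the left-hand side.
function isNonTerminal(grammar, symbol) {
  return grammar.rules.some(function(rule) { return rule.lhs === symbol; });
}

// Adds the state (rule, dot position, origin set) unless it already exists.
function addState(set, rule, dot, origin) {
  var exists = set.some(function(s) {
    return s.rule === rule && s.dot === dot && s.origin === origin;
  });
  if (!exists) {
    set.push({ rule: rule, dot: dot, origin: origin });
  }
}

// Returns true if the token array is a sentence of the grammar.
function accepts(grammar, tokens) {
  var sets = [];
  for (var k = 0; k <= tokens.length; k++) {
    sets.push([]);
  }
  // Seed the first set with all rules of the start symbol.
  grammar.rules.forEach(function(rule) {
    if (rule.lhs === grammar.start) { addState(sets[0], rule, 0, 0); }
  });

  for (var i = 0; i <= tokens.length; i++) {
    var set = sets[i];
    // The set grows while we iterate, so re-check its length each time.
    for (var j = 0; j < set.length; j++) {
      var state = set[j];
      var next = state.rule.rhs[state.dot];
      if (next === undefined) {
        // Completer: advance every state that was waiting for this rule.
        sets[state.origin].forEach(function(parent) {
          if (parent.rule.rhs[parent.dot] === state.rule.lhs) {
            addState(set, parent.rule, parent.dot + 1, parent.origin);
          }
        });
      } else if (isNonTerminal(grammar, next)) {
        // Predictor: add all rules of the expected non-terminal.
        grammar.rules.forEach(function(rule) {
          if (rule.lhs === next) { addState(set, rule, 0, i); }
        });
      } else if (i < tokens.length && tokens[i] === next) {
        // Scanner: the expected terminal matches the current token.
        addState(sets[i + 1], state.rule, state.dot + 1, state.origin);
      }
    }
    // If the next set cannot be built, the input is rejected.
    if (i < tokens.length && sets[i + 1].length === 0) {
      return false;
    }
  }
  // Accept if the last set holds a completed start rule spanning the input.
  return sets[tokens.length].some(function(s) {
    return s.rule.lhs === grammar.start &&
      s.dot === s.rule.rhs.length && s.origin === 0;
  });
}
```

Calling accepts(GRAMMAR, "a+(a)".split("")) walks through one Earley set per character, just like the manual parse on paper. The sketch only decides acceptance; it does not build derivations.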

Of course we are not done yet. We still need a proper tokenizer and the functionality to build the abstract syntax tree. Once this is in place I want to test it on some more complex grammars. JSON and CSS are at the top of the list.

Categories: javascript, qooxdoo

‘typeof’ considered useless – or how to write robust type checks

January 11, 2009 6 comments

In JavaScript it is incredibly hard to do robust type checks. The built-in “typeof” operator is meant to return the type of an expression.

typeof 12; // "number"
typeof "juhu"; // "string"

This looks promising, but in reality it is rarely useful because none of the built-in data types can be reliably detected by “typeof”. According to the ECMAScript specification typeof can return the following values:

Type                                             Result
Undefined                                        "undefined"
Null                                             "object"
Boolean                                          "boolean"
Number                                           "number"
String                                           "string"
Object (native and doesn’t implement [[Call]])   "object"
Object (native and implements [[Call]])          "function"

Let’s examine the possible results of typeof.

  1. Types of “null” and “undefined”

    The type “object” for “null” values is pretty useless most of the time. The type “undefined” for undefined values does make some sense, but it’s more intuitive and faster to do an identity check against the value “undefined”.

    var undef; // value is undefined
    typeof undef == "undefined"; // true
    undef === undefined; // true
    
    var nullValue = null;
    typeof nullValue; // "object"
    nullValue === null; // true
    
  2. “string”, “boolean” and “number” types

    There are two ways to create a string, boolean or number in JavaScript: by using the literal notation, or by instantiating String, Boolean or Number with the new operator. Values created either way behave almost identically, except that values created with new are typeof “object”.

    typeof "juhu"; // "string"
    typeof new String("juhu"); // "object"
    
    typeof 123; // "number"
    typeof new Number(123); // "object"
    
    typeof true; // "boolean"
    typeof new Boolean(true); // "object"
    

    A more robust check for these types could add an additional instanceof check.

    var value = new String("juhu");
    typeof value == "string" || value instanceof String; // true
    
  3. The “function” type

    Now functions should be simple. According to the spec, everything that is callable should be typeof “function”. In reality this is true for all browsers except Safari/WebKit. In Safari regular expressions are considered to be typeof “function” as well. This is probably a browser bug, but it nonetheless makes it problematic to use typeof to detect functions.

    typeof (function() {}); // "function"
    typeof /abc/g; // "object" in IE, Firefox and Opera - "function" in Safari
    

    So instanceof checks are the better alternative.

    (function() {}) instanceof Function; // true
    (/abc/g) instanceof Function; // false in all browsers
    
  4. Detecting Arrays

    Even though arrays are native JavaScript types, the typeof operator does not support an “array” value. All arrays are typeof “object”. In my opinion this is an inconsistency in the language spec. The most common way to check for arrays is to use the instanceof operator.

    typeof [1, 2, 3]; // "object"
    [1, 2, 3] instanceof Array; // true
    

BUT instanceof is Harmful

So it looks like instanceof is almost always the better alternative to typeof. This is true as long as it is guaranteed that variables are always created in the main window and never in a frame or a popup window. As kangax explains in his recent post:

The problems arise when it comes to scripting in multi-frame DOM environments. In a nutshell, Array objects created within one iframe do not share [[Prototype]]’s with arrays created within another iframe. Their constructors are different objects and so both instanceof and constructor checks fail.
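The effect kangax describes can be reproduced outside the browser. The following sketch uses the vm module of Node.js as a stand-in for a second frame; this substitution is an assumption for illustration and not part of kangax’s post. A new vm context has its own set of built-in objects, much like an iframe has.

```javascript
// Node.js only: a fresh vm context plays the role of another frame.
var vm = require("vm");

// This array is created with the other context's Array constructor.
var foreignArray = vm.runInNewContext("[1, 2, 3]");

// Both checks fail, although the value is a perfectly normal array.
var isInstance = foreignArray instanceof Array; // false
var sameConstructor = foreignArray.constructor === Array; // false
```

The array itself works as expected (it has a length of 3 and can be iterated); only the identity of its constructor differs from the Array of the calling context.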

Fortunately kangax comes up with a nice solution to this problem. He has discovered that the “toString” method of the built-in Object prototype reveals the internal class of a value.

function getClass(object) {
    return Object.prototype.toString.call(object).slice(8, -1);
}

This solution not only solves the cross-document issues in detecting arrays, it also solves all the problems with the typeof operator.

Value                     Class      Type
"juhu"                    String     string
new String("juhu")        String     object
1.2                       Number     number
new Number(1.2)           Number     object
true                      Boolean    boolean
new Boolean(true)         Boolean    object
new Date()                Date       object
new Error()               Error      object
[1,2,3]                   Array      object
new Array(1, 2, 3)        Array      object
new Function("")          Function   function
/abc/g                    RegExp     object (function in Safari)
new RegExp("abc", "g")    RegExp     object (function in Safari)
{}                        Object     object
new Object()              Object     object
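Based on the class column above, small type check helpers could look like the following sketch. The helper names isArray, isFunction and isString are chosen for illustration and are not taken from any particular library.

```javascript
// Returns the internal class name, e.g. "Array" for "[object Array]".
function getClass(object) {
  return Object.prototype.toString.call(object).slice(8, -1);
}

// Hypothetical helpers built on top of getClass.
function isArray(value) {
  return getClass(value) === "Array";
}

function isFunction(value) {
  return getClass(value) === "Function";
}

// Matches both string literals and String instances.
function isString(value) {
  return getClass(value) === "String";
}
```

Unlike the typeof and instanceof variants, isString treats "juhu" and new String("juhu") alike, and isFunction does not mistake a regular expression for a function.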

My prediction is that we’ll see many JavaScript frameworks switching to this kind of type check. Dean Edwards has already announced in the comments on kangax’s post that he will use it in the next version of base2, and of course the qooxdoo team will take a close look at it as well.

Update: jQuery has been using this technique since November 2008 for the “isArray” and “isFunction” functions.

Update #2: Ajaxian picked up the topic as well.

Categories: javascript

How to Write a Bookmarklet

January 8, 2009 Leave a comment

Bookmarklets can be very handy. One bookmarklet I use all the time is the “Note in Reader” bookmarklet for Google Reader. If I find an interesting web page I want to archive or share with friends, I just call the bookmarklet from my bookmark bar.
The Google "Note in Reader" bookmarklet in action
This opens a window inside the current web page, which offers to add this page to my Google Reader.

What are Bookmarklets?

All web browsers support the special “javascript” URL protocol. If you type “javascript:” followed by JavaScript code into a browser’s location bar, the JavaScript will be evaluated in the context of the active web page (host page). Such strings are handled by the browsers just like any other URL. You can set them as the “href” of a link tag and of course bookmark them. To get a feeling for this, try typing this into the URL bar:

javascript:alert(document.getElementsByTagName("title")[0].innerHTML);

This will alert the document’s title. The bookmarklet’s JavaScript has full access to the document’s DOM and is run in the same JavaScript namespace as all other JavaScript in the host page.

Bookmarklet Mantra

When writing bookmarklets you have to keep two things in mind:

  1. Don’t make assumptions about the document. You don’t know which box model the document uses, whether it is HTML or XHTML or whether JavaScript libraries are loaded. Bookmarklets have to be programmed very defensively.
  2. Avoid unwanted side effects of your code on the document. If possible don’t add any entries to the global JavaScript namespace and make as few changes to the DOM as possible. The page doesn’t know that you have injected code and you don’t want your bookmarklet to break the page. This means that you should not use global variables or functions. The module pattern can help to achieve this.
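As a minimal sketch of the module pattern mentioned in the second point: all variables and functions live inside an immediately invoked function, so nothing is added to the global namespace. The names clickCount and countClick are made up for this illustration.

```javascript
(function() {
  // Local to the module: the host page cannot see or overwrite these.
  var clickCount = 0;

  function countClick() {
    clickCount += 1;
    return clickCount;
  }

  countClick();
})();

// Outside the module neither name exists.
typeof clickCount; // "undefined"
typeof countClick; // "undefined"
```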

The first bookmarklet

Simple JavaScript can easily be encoded as a string, but as soon as the JavaScript becomes more complex this is no longer an option. It’s much easier to write normal JavaScript and, once confident the code works as expected, turn it into a bookmarklet string. To do this you can leverage a nice JavaScript feature. In JavaScript all functions are also objects and support the toString method. If toString is called on a function, the function code is returned as a string. The above bookmarklet could have been generated by the following code:

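// The code the bookmarklet should run in the host page.
function bookmarklet() {
  alert(document.getElementsByTagName("title")[0].innerHTML);
}

// Wrap the function source in parentheses and call it immediately.
document.getElementById("bookmarklet").href =
  "javascript:(" +
  bookmarklet.toString() +
  ")();";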

This code converts the method “bookmarklet” into a string and sets this string as “href” attribute of a link tag with the id “bookmarklet”. Since the returned string is only the function declaration, the two parentheses are appended to immediately execute the function. This link can easily be dragged into the browser’s bookmark bar.

Using external JavaScript files

The next logical step is to move the JavaScript code into an external JavaScript file. This is possible because we can use DOM methods to dynamically create a script tag and insert it into the document. During development the external JavaScript file can be statically embedded into a test HTML document. A very simple external bookmarklet could look like this:

(function() {

  var Widget = function(id) {
    this.element = document.createElement("div");
    this.element.id = id;
    this.element.innerHTML = "<h1>Juhu Kinners!</h1> click to hide";

    var that = this;
    this.element.onclick = function() { that.element.style.display = "none"; };
  };

  function main() {
    var widgetId = "bookmarkletWidget";
    var widget = new Widget(widgetId);
    document.body.appendChild(widget.element);
  }

  main();
})();

This script follows the module pattern to prevent variables and functions from leaking into the global JavaScript namespace. The main function instantiates a widget, which is basically a wrapper for a DIV element, and inserts it into the document. The widget is hidden if the user clicks on it.

The loader script has to dynamically create a SCRIPT element with the absolute URL to this script and insert it into the body.

function loadBookmarklet() {
  var script = document.createElement("script");
  script.src = "http://juhukinners.com/posts/bookmarklet/widget.js";
  script.type = "text/javascript";
  document.body.appendChild(script);
};

document.getElementById("bookmarklet").href =
  "javascript:(" +
  loadBookmarklet.toString() +
  ")();"

The JavaScript URL is generated the same way as in the previous example.

What about CSS?

The bookmarklet doesn’t look very good yet. It could use some CSS styling. Just like scripts, stylesheet links can be added dynamically to the document. The bookmarklet can be extended like this to use CSS:

(function() {

var ROOT_URL = "http://juhukinners.com/posts/bookmarklet";

function loadCss(url) {
  var el = document.createElement("link");
  el.type = "text/css";
  el.rel = "stylesheet";
  el.href = url;

  var head = document.getElementsByTagName("head")[0];
  head.appendChild(el);
};

...

function main() {
  loadCss(ROOT_URL + "/style.css");
  ...
}

Note that the URL of the stylesheet must be absolute, since relative URLs are resolved relative to the host page and not relative to the bookmarklet’s JavaScript file. To minimize side effects of the loaded stylesheet, all CSS rules should be scoped with the ID of the top level bookmarklet element. Furthermore, it’s a good idea to prefix all CSS class names and IDs with a unique string to prevent name collisions with the host page.

Finishing touches

The demo bookmarklet is now almost ready. We have a link, which can be saved as a bookmark. If the link is clicked the browser dynamically inserts an external JavaScript file into the current page and this script uses CSS to style its UI. But what if the user clicks this link several times? Obviously the bookmarklet’s JavaScript is inserted again and again. Since this is not desired in this case, the loader has to be tweaked a little to only load the bookmarklet once:

function loadBookmarklet() {
  var widgetId = "bookmarkletWidget";
  var widgetElement = document.getElementById(widgetId);
  if (widgetElement) {
    widgetElement.style.display = "block";
    return;
  }

  ...
};

Test this bookmarklet or check out the source code on GitHub.

“With great power comes great responsibility” (Peter Parker)

Since bookmarklets run in the scope of the host page they can read and modify every piece of data on the page. The same is true for all globally accessible JavaScript data. Bookmarklets are even allowed to establish HTTP connections to the domain the host page was served from. In this regard a bookmarklet is very similar to a browser plugin. This implies that every user of a bookmarklet has to trust its author and the author on the other hand must be aware of his responsibility.

Happy coding!

Categories: Tutorial

qooxdoo in the c’t

December 23, 2008 3 comments

The current edition of the famous German computer magazine c’t features a pretty long article about qooxdoo by Andreas Ecker and me. The article first introduces the idea behind qooxdoo and gives an overview of qooxdoo’s architecture. The second part is a tutorial on how to build the interface of a simple calculator application. To demonstrate the new styling capabilities of qooxdoo 0.8 we show how to use the appearance system to style the calculator in a completely different way. The graphics for this styled calculator are taken from a modified calculator widget for Apple’s Dashboard by Jonas Rask. He was kind enough to give us permission to use it.

On the left the unstyled calculator and on the right the styled calculator

For this article we have built a special version of the playground application introduced with the brand new qooxdoo 0.8.1 release. It allows the reader to follow each step without the need to install anything. A fully working version of the calculator application can be downloaded from the “softlink” page of the article.

Online playground for the c't calculator

If you have read the article please don’t hesitate to write a comment to this post. I’m really interested in what you liked, what you didn’t like and about which qooxdoo related topics you would like to read more in the future.

If you understand German and haven’t read it yet, get a copy :-)

Have fun, Fabian

qooxdoo 0.8.1 released

December 19, 2008 Leave a comment

Just in time for Christmas we have released the first bug fix release for the new qooxdoo 0.8 line. We’ve fixed plenty of bugs users have reported for 0.8. It’s a drop-in replacement for 0.8. No API breaking changes were made. Everyone using 0.8 is strongly encouraged to update to 0.8.1. Download and have fun.

Now it’s time for a long Christmas holiday before we start into an exciting new year for qooxdoo.

Merry Christmas and a happy New Year 2009,
Fabian

Categories: qooxdoo

Juhu Kinners

December 16, 2008 4 comments

After years of passive consumption of blogs I finally switched sides and now plan to regularly share my ideas. OK, I already blogged several times on http://qooxdoo.org but that was work.

This blog will cover technical issues around my passion of bringing the desktop to the browser. Of course it will cover qooxdoo and JavaScript, but expect more general coding/software development articles as well.

So what about the funny title? Those who know my code don’t need any explanation. For the rest of you: It’s my way of saying “Hello World” or “foo bar”. I guess every programmer has a repertoire of senseless words, which are used when just any string will do. It’s the kind of word you put in a unit test or in a hello world program. You will find references in any significant code base I’ve been working on. I guess the qooxdoo sources are full of it. What do you use?

Have fun reading,
Fabian

Categories: Uncategorized