Archive for the ‘javascript’ Category

Travel Hack Session Diary

April 2, 2009 2 comments

I usually don’t write diaries, but I think this story is worth telling. It’s about a parser, a trip across Europe and practicing Extreme Programming. It all started after my lecture at the University of Szeged.

9:00 am – Coffeehouse in Szeged: I realize that the way back to Karlsruhe will take about eight hours. I ask Istvan whether we should use this time as a hack session. He immediately agrees and even has a cool project we could work on. He wants to use an Earley parser for a project at CAS and is excited to explain it to me. Up to this moment I had never heard of Earley parsers, even though I took the compiler construction course back at university.

13:00 – Intercity from Szeged to Budapest: Istvan tries to explain what an Earley parser is and how it works. He pulls out an empty sheet of paper and writes down the formal definition. I’m really impressed that he has memorized these details. On the other hand it scares me a little, since I don’t grasp it yet. It gets a little better after we manually parse the simple arithmetic expression ‘a+(a)’ on a second sheet of paper. Thirty minutes later Istvan has managed to teach me the basics of this algorithm.

Train notes: Formal definition of an Earley parser

14:00 – Still in the IC: At this point we decide to write an Earley parser for this kind of arithmetic expression. Since Istvan has chosen the topic of our hack session, I choose the programming language. Of course it is JavaScript with qooxdoo. This ensures that at least the language gets out of my way. So I pull out my MacBook and create a new qooxdoo project. We start with the boilerplate code to manage the grammar and classes to store Earley sets and states. We make heavy use of qooxdoo’s test runner and unit testing framework to make sure that all the code we write actually works as expected. Unintentionally it becomes an intense XP session, as we write all the tests first and have no way to avoid pair programming.

15:00 – Ferihegy airport in Budapest: We have to take a break, check in and pass the security checks. Once at the gate we take the time to continue coding.

Me coding at the gate

17:00 – Germanwings flight to Stuttgart: Arrgh, the seats in economy class aren’t made for 1.95m people like me. To make matters worse they put me in the middle seat. I fail miserably to use my notebook. My elbows are constantly in my neighbor’s face. I have to hand over my MacBook to Istvan. Now I am in the passive role. We are so focused that the flight seems to take no time at all.

19:20 – City Train to Stuttgart central station: Shortly after leaving the airport my battery is almost empty. To keep on working we decide to continue with Istvan’s ThinkPad. We copy the sources to his computer using a USB thumb drive. Within a few minutes we are up to speed again.

20:00 – Intercity to Karlsruhe: We are still using Istvan’s ThinkPad. It’s my turn again. God, I hate working on Windows. I constantly get all the keyboard shortcuts wrong. We both use Eclipse, but there are subtle differences in the key bindings between Mac and Windows. Anyway, we are coming along nicely and gradually approach the heart of the parser. We are almost done, but unfortunately we are approaching our final destination, Karlsruhe. We could use one or two more hours on the train.

21:15 – Istvan’s apartment: We can’t stop now. We are too close. We continue for some time but it’s getting harder to concentrate. It doesn’t make sense to keep on hacking. We decide to finish the hopefully small remaining amount of code tomorrow.

The next day 19:00 – Istvan’s apartment: We both come directly from our offices. I bring along two healthy Döner Kebabs to recharge our internal batteries before we work on the code again. We are about to write the acceptor, which forms the heart of this parser. It basically computes the Earley states for each input token. If we fail to build the next set, the input sentence is not accepted. If, on the other hand, all sets can be computed and the final set contains the completed start rule, we know that the sentence matches the grammar. Shortly after midnight we have it. We can solve the word problem for our minimal arithmetic expression language, which means that we can decide whether the input is a valid arithmetic expression.
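
To make the acceptor idea more concrete, here is a minimal sketch in plain JavaScript. It is not the qooxdoo code from the session. The grammar and the names `GRAMMAR`, `isNonTerminal`, `addState` and `accepts` are assumptions made up for this illustration, and the sketch does not handle empty rules.

```javascript
// Hypothetical grammar for expressions such as "a+(a)".
// It is an assumption for this sketch, not the grammar from the session.
const GRAMMAR = {
  start: "S",
  rules: [
    { lhs: "S", rhs: ["E"] },
    { lhs: "E", rhs: ["E", "+", "T"] },
    { lhs: "E", rhs: ["T"] },
    { lhs: "T", rhs: ["a"] },
    { lhs: "T", rhs: ["(", "E", ")"] }
  ]
};

// A symbol is a non-terminal if it appears on the left side of a rule.
function isNonTerminal(grammar, symbol) {
  return grammar.rules.some(rule => rule.lhs === symbol);
}

// Add a state (rule, dot position, origin set) unless it is already present.
function addState(set, state) {
  const exists = set.some(s =>
    s.rule === state.rule && s.dot === state.dot && s.origin === state.origin);
  if (!exists) set.push(state);
}

// Returns true if the token list is a sentence of the grammar.
// Limitation of this sketch: empty (epsilon) rules are not supported.
function accepts(grammar, tokens) {
  const sets = [];
  for (let i = 0; i <= tokens.length; i++) sets.push([]);

  grammar.rules
    .filter(rule => rule.lhs === grammar.start)
    .forEach(rule => addState(sets[0], { rule, dot: 0, origin: 0 }));

  for (let i = 0; i <= tokens.length; i++) {
    const set = sets[i];
    if (set.length === 0) return false; // the next set could not be built

    // The set grows while we walk it, so use an index loop.
    for (let j = 0; j < set.length; j++) {
      const state = set[j];
      const next = state.rule.rhs[state.dot];

      if (next === undefined) {
        // Completer: advance every state that was waiting for this rule.
        sets[state.origin].forEach(parent => {
          if (parent.rule.rhs[parent.dot] === state.rule.lhs) {
            addState(set, { rule: parent.rule, dot: parent.dot + 1, origin: parent.origin });
          }
        });
      } else if (isNonTerminal(grammar, next)) {
        // Predictor: start all rules for the expected non-terminal.
        grammar.rules
          .filter(rule => rule.lhs === next)
          .forEach(rule => addState(set, { rule, dot: 0, origin: i }));
      } else if (i < tokens.length && tokens[i] === next) {
        // Scanner: the expected terminal matches the current token.
        addState(sets[i + 1], { rule: state.rule, dot: state.dot + 1, origin: state.origin });
      }
    }
  }

  // Accepted if the last set holds a completed start rule spanning the whole input.
  return sets[tokens.length].some(s =>
    s.rule.lhs === grammar.start && s.dot === s.rule.rhs.length && s.origin === 0);
}

console.log(accepts(GRAMMAR, "a+(a)".split(""))); // true
console.log(accepts(GRAMMAR, "a+".split("")));    // false
```

Each character is treated as one token here, which is only good enough for a toy language like this one.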

One week later 19:00 – Istvan’s apartment: I’m back in Karlsruhe and we have to take the last step to a fully working parser. Once we know that a sentence matches our grammar we need to derive it. This is done by starting with the start rule found in the last Earley set and computing all the ways we could have got there. After implementing the required data structures, implementing the algorithm itself proves to be not that hard. Awesome, we really did it!

Extreme Programming Works

I had never practiced XP in such a pure way before. No line of parser code was written without a failing test first. No line of code was written by either of us alone. Writing the tests first helped a lot to break down the complex task of writing a parser into tiny chunks of code, which could be implemented and tested separately. This paid off at the end when writing the core of the algorithm. We never really had to debug any of our code. On the other hand, it does require a certain amount of discipline to stay in this test-code-test loop. This is where pairing comes into play. It’s so much easier to stay disciplined if both keep an eye on it. Furthermore, it’s much easier to stay in flow. I definitely won’t think of checking my mail while working with someone else. The workplace helped as well. It might sound paradoxical, but for me a train or a plane is an extremely productive place to work.

Earley Parser

Parsing is an exciting topic in computer science. Many see it as some kind of black art or voodoo, but it really is not that hard. The Earley parser is interesting because it can work on any context-free grammar, and for ambiguous grammars it can return all possible derivation trees.

Of course we are not done yet. We still need a proper tokenizer and the functionality to build the abstract syntax tree. Once this is in place I want to test it on some more complex grammars. Parsing JSON or CSS is at the top of the list.

Categories: javascript, qooxdoo

‘typeof’ considered useless – or how to write robust type checks

January 11, 2009 6 comments

In JavaScript it is incredibly hard to do robust type checks. The built-in “typeof” operator is meant to return the type of an expression.

typeof 12; // "number"
typeof "juhu"; // "string"

This looks promising, but in reality it is rarely useful because none of the built-in data types can be reliably detected by “typeof”. According to the ECMAScript specification, typeof can return the following values:

| Type | Result |
| --- | --- |
| Undefined | "undefined" |
| Null | "object" |
| Boolean | "boolean" |
| Number | "number" |
| String | "string" |
| Object (native and doesn’t implement [[Call]]) | "object" |
| Object (native and implements [[Call]]) | "function" |

Let’s examine the possible results of typeof.

  1. Types of “null” and “undefined”

    The type “object” for “null” values is pretty useless most of the time. The type “undefined” for undefined values does make some sense, but it’s more intuitive and faster to do an identity check against the value “undefined”.

    var undef; // value is undefined
    typeof undef == "undefined"; // true
    undef === undefined; // true
    
    var nullValue = null;
    typeof nullValue; // "object"
    nullValue === null; // true
    
  2. “string”, “boolean” and “number” types

    There are two ways to create a string, boolean or number in JavaScript: first by using the literal notation, and second by instantiating a String, Boolean or Number instance. Values created either way behave the same in most respects, except that values created using the new operator are typeof “object”.

    typeof "juhu"; // "string"
    typeof new String("juhu"); // "object"
    
    typeof 123; // "number"
    typeof new Number(123); // "object"
    
    typeof true; // "boolean"
    typeof new Boolean(true); // "object"
    

    A more robust check for these types could add an additional instanceof check.

    var value = new String("juhu");
    typeof value == "string" || value instanceof String; // true
    
  3. The “function” type

    Now functions should be simple. According to the spec, everything that is callable should be typeof “function”. In reality this is true for all browsers except Safari/WebKit. In Safari, regular expressions are considered to be typeof “function” as well. This is probably a browser bug, but it nonetheless makes it problematic to use typeof to detect functions.

    typeof (function() {}); // "function"
    typeof /abc/g; // "object" in IE, Firefox and Opera - "function" in Safari
    

    So instanceof checks are the better alternative.

    (function() {}) instanceof Function; // true
    (/abc/g) instanceof Function; // false in all browsers
    
  4. Detecting Arrays

    Even though arrays are native JavaScript types, the typeof operator does not support an “array” value. All arrays are typeof “object”. In my opinion this is an inconsistency in the language spec. The most common way to check for arrays is to use the instanceof operator.

    typeof [1, 2, 3]; // "object"
    [1, 2, 3] instanceof Array; // true
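
The checks from the four points above can be collected into a handful of small helpers. This is only a sketch, and the function names are made up for illustration rather than taken from any library.

```javascript
// Hypothetical helper names; each one wraps a check described in the list above.

// Primitives are caught by typeof, wrapper objects by instanceof.
function isString(value) {
  return typeof value === "string" || value instanceof String;
}

function isNumber(value) {
  return typeof value === "number" || value instanceof Number;
}

function isBoolean(value) {
  return typeof value === "boolean" || value instanceof Boolean;
}

// instanceof avoids the Safari quirk where typeof /abc/g is "function".
function isFunction(value) {
  return value instanceof Function;
}

// typeof reports "object" for arrays, so instanceof is used instead.
function isArray(value) {
  return value instanceof Array;
}

console.log(isString(new String("juhu"))); // true
console.log(isFunction(/abc/g));           // false
console.log(isArray([1, 2, 3]));           // true
```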
    

BUT instanceof is Harmful

So it looks like instanceof is almost always the better alternative to typeof. This is true as long as it is guaranteed that variables are always created in the main window and never in a frame or a popup window. As kangax explains in his recent post:

The problems arise when it comes to scripting in multi-frame DOM environments. In a nutshell, Array objects created within one iframe do not share [[Prototype]]’s with arrays created within another iframe. Their constructors are different objects and so both instanceof and constructor checks fail.
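
The effect can also be reproduced outside a browser. The following sketch assumes Node.js and uses its built-in `vm` module, whose separate global context stands in for an iframe here. That substitution is my assumption and not part of kangax’s post.

```javascript
// Assumes Node.js: a fresh vm context has its own Array constructor,
// much like a separate frame in a browser.
const vm = require("vm");

// An array created in a different global context.
const foreignArray = vm.runInNewContext("[1, 2, 3]");

const sameContext = [1, 2, 3] instanceof Array;               // true
const crossInstanceof = foreignArray instanceof Array;        // false
const crossConstructor = foreignArray.constructor === Array;  // false

console.log(sameContext, crossInstanceof, crossConstructor);
```

The foreign value is a perfectly normal array, yet both the instanceof check and the constructor check reject it.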

Fortunately kangax comes up with a nice solution to this problem. He has discovered that the “toString” method of JavaScript’s built-in Object reveals the internal class of a value.

function getClass(object) {
    return Object.prototype.toString.call(object).slice(8, -1);
}

This solution not only solves the cross-document issues in detecting arrays, it also solves all the problems with the typeof operator.

| Value | Class | Type |
| --- | --- | --- |
| "juhu" | String | string |
| new String("juhu") | String | object |
| 1.2 | Number | number |
| new Number(1.2) | Number | object |
| true | Boolean | boolean |
| new Boolean(true) | Boolean | object |
| new Date() | Date | object |
| new Error() | Error | object |
| [1,2,3] | Array | object |
| new Array(1, 2, 3) | Array | object |
| new Function("") | Function | function |
| /abc/g | RegExp | object (function in Safari) |
| new RegExp("abc", "g") | RegExp | object (function in Safari) |
| {} | Object | object |
| new Object() | Object | object |
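
Building type checks on top of getClass is straightforward. The helper names below are hypothetical and only meant as a sketch of how such checks might look. getClass itself is repeated so that the snippet runs on its own.

```javascript
// getClass as shown earlier in this post.
function getClass(object) {
  return Object.prototype.toString.call(object).slice(8, -1);
}

// Hypothetical helpers that compare against the internal class name.
function isArray(value) {
  return getClass(value) === "Array";
}

function isFunction(value) {
  return getClass(value) === "Function";
}

function isString(value) {
  return getClass(value) === "String";
}

console.log(isArray([1, 2, 3]));           // true
console.log(isString("juhu"));             // true
console.log(isString(new String("juhu"))); // true
console.log(isFunction(/abc/g));           // false
```

Literals and wrapper objects give the same answer here, so no extra instanceof check is needed.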

My prediction is that we’ll see many JavaScript frameworks switching to this kind of type check. Dean Edwards has already announced, in the comments on kangax’s post, that he will use it in the next version of base2, and of course the qooxdoo team will take a close look at it as well.

Update: jQuery has been using this technique since November 2008 for the “isArray” and “isFunction” functions.

Update #2: Ajaxian picked up the topic as well.

Categories: javascript