I usually don’t write diaries but I think this is a story worthwhile talking about. It’s about a parser, a trip across Europe and practicing Extreme Programming. It all started after my lecture at the University of Szeged.
9:00 am – Coffeehouse in Szeged: I realize that the way back to Karlsruhe will take about eight hours. I ask Istvan whether we should use this time as a hack session. He immediately agrees and even has a cool project we could work on. He wants to use an Earley parser for a project at CAS and is exited to explain it to me. Up to this moment I’ve never heard about Earley parsers even though I took the compiler construction course back in university.
13:00 am – Intercity from Szeged to Budapest: Istvan tries to explain what an Earley parser is and how it works. He pulls out an empty sheet of paper and writes down the formal definition. I’m really impressed that he has memorized these details from. On the other hand it scares me a little since I don’t grasp it yet. It gets a little better after we manually parse the simple arithmetic expression ‘a+(a)’ on a second sheet of paper. 30 minutes later Istvan has managed to teach me the basics of this algorithm.
15:00 am – Ferihegy airport in Budapest: We have to take a break, check in and pass the security checks. Once at the gate we take the time to continue coding.
17:00 – Germanwings flight to Stuttgart: Arrgh, the seats in the economy class aren’t made for 1.95m people like me. To make matters worse they put me in the middle row. I fail miserably to use my notebook. My elbows constantly are in my neighbor’s face. I have to hand over my MacBook to Istvan. Now I am in the passive role. We work so concentrated that it seems that the flight takes no time at all.
19:20 – City Train to Stuttgart central station: Shortly after leaving the airport my battery is almost empty. To keep on working we decide to continue with Istvan’s ThinkPad. We copy the sources to his computer using an USB thumb drive. Within a few minutes we are up to speed again.
20:00 – Intercity to Karlsruhe: We are still using Istvan’s ThinkPad. Its my turn again. God, I hate working on Windows. I permanently get all the keyboard shortcuts wrong. We both use Eclipse but there are subtle differences in the key bindings between Mac and Windows. Anyway, we are coming along nicely and gradually approach the heard of the parser. We are almost done but unfortunately we are approaching our final destination Karlsruhe. We could need 1-2 additional hours in the train.
21:15 – Istvan’s apartment: We can’t stop now. We are too close. We continue for some time but its getting harder to concentrate. It doesn’t make sense to keep on hacking. We decide to finish the hopefully small remaining amount of code tomorrow.
The next day 19:00 – Istvan’s apartment: We both come directly from our offices. I bring along two healthy Döner Kebeb to reinforce our internal batteries before we work on the code again. We are about to write the acceptor, which builds the heard of this parser. It basically computes the Earley states for each input token. If we fail to build the next set the input sentence is not accepted. If on the other hand all sets could be computed and the final set contains the start rule we know that the sentence matches the grammar. Shortly after midnight we have it. We can solve the word problem for our minimal arithmetic expression language, which means that we can decide whether the input is a valid arithmetic expression.
One week later 19:00 – Istvan’s apartment: I’m back in Karlsruhe and we have to take the last step to a fully working parser. Once we know that a sentence matches our grammar we need to derive it. This is done by starting with the start rule found in the last Earley set and compute all ways we could have come there. After implementing the required data structures, implementing the algorithm itself proofs to be not that hard. Awesome we really did it!
Extreme Programming Works
I’ve never had practiced XP in such a pure way before. No line of parser code was written without a failing test first. No line of code was written by any of us alone. Writing the tests first helped a lot to break down the complex task of writing a parser into tiny little chunks of code, which could be implemented and tested separately. This payed off at the end when writing the core of the algorithm. We never really had to debug any of our code. On the other hand it does require a certain amount of discipline to stay in this test-code-test loop. At this point pairing comes to play. It’s so much easier to keep disciplined if both keep an eye on it. Further its much easier to stay in flow. I definitely won’t think of checking my mail, while working with someone else. The working place did support it as well. It might sound paradox but for me a train or a plain are extremely productive places to work.
Parsing is an exciting topic in computer science. Many see it as some kind of black art or voodoo but it really is not that hard. The Earley parser is interesting because it can work on any context free grammar and for ambiguous grammars it can return all possible derivation trees.
Of course we are not done yet. We still need a proper tokenizer and the functionality to build the abstract syntax tree. Once this is in place I want to give it a test on some more complex grammars. Parsing JSON or CSS are top of the list.