Balisage 2009 trip report

Balisage (formerly Extreme Markup Languages) seems to be the summer conference of choice for schema architects, developers, visionaries and markup philosophers: in short, those laboring on the underpinnings of XML and related technologies. Conference literature calls us “markup geeks.” Mere users of these technologies might find this conference heavy sledding (it is quite technical), and marketers should be aware that this is not a trade show: there are few opportunities for promoting goods and services. However, idea-pushers will find a welcoming audience at Balisage.

This year, I attended Balisage for the first time, and I found it a highly rewarding experience. I came away full of ideas for new ways to work, new tools to use, and new directions in which to push our XML-based publishing practice. In addition, I made a passel of new friends: the community that has grown up around this conference is very welcoming, surprisingly so given how close-knit it seems to be. The conference is well organized, run by folks from Mulberry Technologies and friends, with a wide array of reviewers vetting papers and presentations. My one complaint is that I was unable to remain for the full schedule and missed some very exciting-looking presentations on Thursday and Friday.

Disclaimer: I was unable to see every talk, so this report represents merely some highlights, as I saw them. This report is biased and unfair, but honest and without the intention of brutality.

Day 1

On Monday, Michael Kay of Saxonica chaired the Symposium on Processing XML Efficiently, a separate but related series of presentations that shared the same logistical backbone with the main Balisage conference.

The surprise hit of this session was Rob Cameron’s presentation on Parallel Bit Stream Technology. At first, I had no clue what was going on in this talk. The basic idea seemed clever: rotate your bytes sideways, packing all the first bits into a wide (64-bit or wider) word, all the second bits into the next word, and so on. But what was the point? When Cameron demonstrated that some simple addition and shifting operations led to a bit pattern that encoded the positions of all the valid XML numeric entities, the “aah” of dawning comprehension rippled through the audience. He went on to show how this could be used to jump-start a tokenizer and accelerate XML parsing, resulting in impressive speedups in a core technology. There are some remaining challenges, though: to make these results available via Java will require some tricky bridgework.
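To make the trick concrete for myself, here is a minimal sketch in Python of the general idea, not Cameron's actual implementation. It assumes a Python integer can stand in for the wide machine word (bit i of each stream corresponds to byte i of the input), and it only looks for decimal references such as `&#38;`. The function names are my own invention.

```python
def transpose(data: bytes) -> list[int]:
    """Turn the bytes sideways: bit i of streams[k] is bit k of data[i]."""
    streams = [0] * 8
    for pos, byte in enumerate(data):
        for k in range(8):
            if (byte >> k) & 1:
                streams[k] |= 1 << pos
    return streams


def match_byte(streams: list[int], value: int, mask: int) -> int:
    """Stream with a 1 at every position whose byte equals `value`."""
    result = mask
    for k in range(8):
        result &= streams[k] if (value >> k) & 1 else ~streams[k]
    return result & mask


def digit_class(s: list[int], mask: int) -> int:
    """Stream marking ASCII digits 0x30..0x39, using only boolean logic."""
    high_nibble_is_3 = ~s[7] & ~s[6] & s[5] & s[4]
    low_nibble_le_9 = ~(s[3] & (s[2] | s[1]))
    return high_nibble_is_3 & low_nibble_le_9 & mask


def find_decimal_refs(data: bytes) -> list[int]:
    """Positions of the ';' that closes each well-formed '&#digits;'."""
    mask = (1 << len(data)) - 1
    s = transpose(data)
    amp, hash_, semi = (match_byte(s, ord(c), mask) for c in "&#;")
    digit = digit_class(s, mask)

    cursor = (amp << 1) & hash_       # a '#' directly after '&'
    cursor = (cursor << 1) & digit    # at least one digit must follow
    # The addition carries each cursor through its whole run of digits,
    # leaving a 1 at the first position after the run.
    cursor = (cursor + digit) & ~digit & mask
    ends = cursor & semi              # the run must be closed by ';'
    return [i for i in range(len(data)) if (ends >> i) & 1]


print(find_decimal_refs(b"a&#38;b &#x; &#12 &#7;"))
```

Every input position is handled at once by a handful of shifts, ANDs and one addition. A real implementation would presumably do the same thing on SIMD registers a block at a time, with carries passed between blocks.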

I was curious to hear about the internals of the much-ballyhooed H2O platform from our competitor, Highwire, presented by James Robinson. Given the context of the conference, it was no big surprise that the workhorse of H2O (Firenze) is an XSLT-based rendering pipeline with built-in caching features. What was most interesting was what it’s not built on top of: a database. The core storage mechanism of the public-facing content delivery system is essentially a hierarchical file-system store. Search features are provided by FAST (acquired last year by Microsoft). Some interesting observations:

  1. The use of XML data and processing models in every step introduced some overhead (for example, the HEAD requests that check the cache went through an inefficient XSLT step and consumed 30% of overall execution time). Resolving these performance bottlenecks requires some special attention and custom optimizations. However, Robinson felt such costs to be outweighed by the benefit of the more convenient programming model.
  2. The LRU cache was distorted by search engine crawlers, which tend to perform broader-spectrum requests than typical users. Robinson plans to optimize for human users by introducing an ARC (adaptive replacement cache), which will take into account the frequency of requests for a given resource when aging cache entries.
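The crawler problem in the second point is easy to reproduce in a toy. The sketch below is my own illustration, not anything from H2O: it compares a plain LRU cache with a simplified two-segment cache that only protects a page once it has been requested twice. That is a much cruder, fixed-split cousin of ARC (which adapts the split dynamically), and the class names, sizes and URLs are all hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Plain least-recently-used cache; counts hits."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)
            self.hits += 1
        else:
            self.entries[key] = True  # stand-in for a rendered page
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)


class TwoSegmentCache:
    """New keys sit in 'probation'; a second request promotes them."""

    def __init__(self, probation_size, protected_size):
        self.probation_size = probation_size
        self.protected_size = protected_size
        self.probation = OrderedDict()
        self.protected = OrderedDict()
        self.hits = 0

    def get(self, key):
        if key in self.protected:
            self.protected.move_to_end(key)
            self.hits += 1
        elif key in self.probation:
            del self.probation[key]
            self.protected[key] = True
            if len(self.protected) > self.protected_size:
                self.protected.popitem(last=False)
            self.hits += 1
        else:
            self.probation[key] = True
            if len(self.probation) > self.probation_size:
                self.probation.popitem(last=False)


def hits_after_crawl(cache):
    """Warm the cache with popular pages, let a crawler sweep, then re-read."""
    popular = [f"/article/{i}" for i in range(4)]
    for _ in range(2):
        for url in popular:
            cache.get(url)
    for i in range(100):              # crawler touches each page once
        cache.get(f"/archive/{i}")
    before = cache.hits
    for url in popular:
        cache.get(url)
    return cache.hits - before


print(hits_after_crawl(LRUCache(8)), hits_after_crawl(TwoSegmentCache(4, 4)))
```

With the same total capacity, the plain LRU cache loses all of the popular pages to the crawler's one-off requests, while the frequency-aware variant keeps them.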

David Lee presented a performance comparison of various scripting techniques for running multi-step XML pipelines. The comparison of xmlsh and XProc, which run XML processes internally, with shell scripts that execute XSLT and XQuery externally was largely a cautionary tale about the cost of repeatedly starting and stopping the Java VM. Although this wasn’t news to many attendees, it is the kind of implementation mistake that is easy to make even if one is theoretically aware of the issue. The talk was also valuable as an introduction to xmlsh and XProc, two scripting technologies that ought to be in every XML hacker’s toolbelt. Lee developed xmlsh, an XML scripting language with syntax based on the Unix shell; XProc is a nascent W3C standard with a few available implementations, including Norm Walsh’s Calabash, which was tested as part of this work.

Balisage, Day 1

After some introductions, Tommie Usdin launched the conference with a dose of tart wisdom. She issued cautions about blind adherence to standards, urging a deeper understanding of requirements. The rest of the conference was peppered with mild disagreement with her plenary point: Standards considered harmful.

Alex Milowski reminded us that Netscape has had sophisticated XML support in the browser since 1999, although the lack of comparable support in Internet Explorer was treated as a minor, if unfortunate, side note. Milowski made a plea for a core set of XML vocabularies including HTML, SVG and MathML. He then demonstrated some pioneering work with audio-enabled eBooks (in the DAISY format) using Firefox extensions, and indicated a similar model is also available in the mobile space (using iPhone and Android). What I learned: you can associate MIME types with Firefox plugins so as to trigger custom behavior for your content type.

Somehow I managed to miss David Birnbaum’s talk on writing optimal XPath and XQuery for eXist’s evaluation engine, which I had intended to see. I guess I’ll have to read his paper when it’s posted. I also sadly missed seeing Michael Sperberg-McQueen’s talks, which, to judge from the questions with which he prodded the speakers, would have been fascinating.

Uche Ogbuji presented his application development platform, Akara, which integrates XML-native operations with Python in clever ways. Akara is currently available in beta, and Python developers working with XML will want to stay abreast of this work from the former developer of the 4Suite toolset.

To wrap up the first day, we had to choose between a fascinating-sounding flight into abstraction from Wendell Piez, and a survey of XML visualization tools from Mohammed Zergaoui. In the end, my practical orientation won out, and I was rewarded with a number of links to follow. In particular I am intrigued by Xopus, an in-browser WYSIWYG XML editor that looks very promising. I got a demo at the conference (from Betty Harvey), and will definitely follow up since this is something we need to embed in our solutions. I still wonder what a “nomic game” is, though.

Day 2

Fabio Vitali presented a new approach to handling overlapping markup. I learned that this is a major area of theoretical interest in markup that has been addressed by numerous proposed schemes. XML’s element tree can’t represent overlap directly, and the nesting of markup such as bold and italic is inherently ambiguous (which tag should be the outer one?). Although new to me, this seemed to be well-trodden ground for many at the conference, and the new contribution was the idea of using RDF and OWL as representation meta-languages.

Norman Walsh shared some data gathered from the “phone home” feature of XML Calabash, his XProc implementation. Every time Calabash runs, it sends some basic profiling information to xmlcalabash.org, unless the user explicitly opts out. Some insights: it appears that many (most) pipelines are streamable, at least from the viewpoint of XProc, although it was less clear what the benefit of streaming might be in these cases. One odd feature of Walsh’s data was that a surprising number of pipelines have only a single step in them; he attributed this to the convenience of executing other XML processes via XProc, or possibly to a pattern of tire-kicking behavior.

The final regularly-scheduled presentation was Michael Kay’s discussion of the polarity of streaming processes. This included a fascinating digression on the merits of co-routines for programming inherently multi-processing programs, their performance benefits relative to multi-threading, and the difficulty of implementing them in single-stack languages such as Java. Remarkably, there was a dearth of questions after Kay’s presentation. This was not a case of lack of interest; rather it seemed as if all the bases had been covered, and in the end there was just nothing more to be said once the master had spoken.
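For anyone who, like me, needs to see "polarity" in code: the sketch below is my own illustration using Python generators, not anything from Kay's talk or from Saxon. It writes the same trivial filter twice, once in pull style (the consumer asks upstream for the next item) and once in push style (the producer hands each item downstream). The function names are made up for the example.

```python
def pull_upper(source):
    """Pull polarity: this stage asks its upstream for each item."""
    for item in source:
        yield item.upper()


def run_pull(items):
    # The consumer (list) drives the pipeline by pulling from the last stage.
    return list(pull_upper(iter(items)))


def collector(out):
    """Push polarity sink: waits to be sent items and stores them."""
    while True:
        item = yield
        out.append(item)


def push_upper(target):
    """Push polarity: this stage is handed each item and pushes it on."""
    while True:
        item = yield
        target.send(item.upper())


def start(coroutine):
    """Advance a coroutine to its first yield so it can receive items."""
    next(coroutine)
    return coroutine


def run_push(items):
    # The producer (this loop) drives the pipeline by pushing into the first stage.
    out = []
    stage = start(push_upper(start(collector(out))))
    for item in items:
        stage.send(item)
    return out


print(run_pull(["a", "b"]), run_push(["a", "b"]))
```

Each generator keeps its own suspended state between items, which is what lets both styles be written as simple loops. In a language without that facility, one of the two ends up turned inside out into callbacks or an explicit state machine.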

After a deeper dive into eBook formats with Alex Milowski, and a presentation on REST and XML that (according to him) came to Kurt Cagle during a sleepless night in the hotel room, it was time for me to drive home. Perhaps next year I’ll be able to stay for the full week, and not miss the XML games that were scheduled.

(note: cross-posted at the ifactory blog)

25 selfish things in 2^10-1 words

1. I live in one of the wealthiest parts of the only inhabited planet I can think of at the moment, but I still sometimes feel jealous of other people who seem to have it better than me.
2. I like to make things, especially things I’ve never made before; here’s a partial list: a guitar, a virtual paper-folding simulation, a cryptic crossword puzzle, the coffee table my feet are resting on, a couple of sweaters, both long gone, a bavarian creme, also gone. Most of these things are misbegotten in one way or another, because perfecting a craft doesn’t happen if you keep skipping around like that.
3. I’d still like to write a novel, but the one I started kept growing and growing until I became afraid of getting lost in it with no way out, ever, and feared spending my whole life there. I saw that I could end up having written some all-consuming banality like Arcadia. Some time I’ll go back there and try again, though.
4. I am sometimes homesick for dirty old grumpy New York, but now everything seems so nice there. That city is too big for anyone to hold on to for more than a moment or two: it keeps getting polished with money and never remains the same.
5. 25 things is a lot to come up with
6. I never forward chain letters, or participate in group writing projects
7. I am not a team player, but people seem to tolerate me anyway. Maybe I am a team player. Or maybe people are actually finding me intolerable, like the woman on the train this evening who didn’t seem to like the way I blew my nose. OK, it was kind of loud, but she was really offended.
8. I like garlic. A lot.
9. I like to destroy things, especially other people’s preconceptions. This has probably tended to tie in to the whole “not a team player” thing.
10. I haven’t killed anybody, or written a symphony, or been to the moon, or made more money than Croesus, or taken over a small country in a bloodless coup.
11. My nose runs a lot, especially in the winter. I am starting to get some aches and pains that I am told will only increase. I have a small black spot on the sole of my foot that makes me feel stupid when I think about cancer and that I chose not to bother to have it cut out of my flesh with a sharp knife. Did I say it’s in the bottom of my foot?
12. The closest I have come to dying is when I drove a car off a cliff into some trees, but that was a long time ago, so I guess with hindsight we can see that I am closer to death now, in fact, as I sit in my cozy chair writing to you with my feet up on the coffee table that I made.
13. I used to think I was messy, but then I started a family and had dogs for a while. Now I am just a lazy neat freak, but the house looks about the same.
14. I don’t believe in God right now, but when I do, God exists. Sorry if that seems solipsistic, but this is 25 things about me, after all, and that’s just how I feel. I’m sure your God is always there for you – don’t get all defensive.
15. I can’t write anything about Ellen here because all the good stuff is secret.
16. I talk too much sometimes and too loudly; apparently it’s a family trait. I am trying to be a better listener.
17. I am very good at giving clear instruction, but terrible at enforcing anything. I am very good at following instructions, but can’t stand being told what to do.
23. Actually, about #8 – garlic is alright, but it’s definitely not one of the 25 most important things about me. I just liked the mood-enhancing short quality of the sentence more than anything else. Sorry, can I take it back?
18. Something about me you might not know is that my heart rate is really low.
19. Being competent has always been very important to me, and knowing the answer, too, but the more I think about that, the less important it seems.
20. I keep thinking there must be more to life than this, and then there is. But maybe I’m not unique in that regard. I think actually, that personality is overrated. Luck has a lot more to do with it. I’ve been very lucky, having made so incredibly many blunders, and yet I keep getting a second chance.
21. I like music that’s so weird that I don’t even like it, but every now and then I just want to sing along to the Beatles.
22. Virtue seems tedious to me. Of all the seven deadly sins, I think Pride is the best, followed closely by Envy. I’d like to invent a new one. Can Frivolity be a sin? Maybe not a deadly one, but it suits me. Humor is the best tonic for a bruised soul, and really the only attitude towards Death I can condone since the alternatives are so dreary. Maybe when someone I love dies, I’ll feel differently, but I think I’ll owe it to them to keep on laughing anyway.
23. When I die and people start rooting around in my stuff, they’ll find a lot of files called “TODO.” I hope someone will bother, but I wouldn’t wish that task on anyone I like.
24. 25 things is not enough to say what I want to
25. I’m right-handed, married, have three children, live in a house, drive a car, take the train to my job where I work with occasional complaints 240 days a year, eat red meat, green vegetables, white rice and drink coffee and beer and wine and water, go sailing and play soccer in the summer, get fat in the winter, stay up too late, get up too early, love my wife, fight with my wife, spoil my children, indulge myself far too much and go on far too long. Basically, I would say, to sum up, that I am just one helluva regular guy!

Puzzle Solver

I’ve posted my first working puzzle solver here. It is based on a flexible framework with a high degree of abstraction, with the idea of adding new puzzle types easily, so performance is not the main goal. So far, it can solve Sudoku and Kakuro type puzzles; it will be fairly straightforward to build brute-force solvers (guess-and-check) for puzzles of the type where you fill a grid with one of a small number of tokens and there is only a single solution.
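As a rough picture of what guess-and-check means here, the following is a minimal sketch in Python. It is not the framework described above (that one is written in Java), and the names `solve` and `latin_ok` are made up for the example. It assumes a flat list of cells with `None` for blanks, and a rule function that accepts or rejects a partly filled grid.

```python
def solve(cells, tokens, consistent):
    """Fill each blank (None) with a token, backtracking when a rule breaks.

    Returns a completed list of cells, or None if no assignment works.
    """
    if None not in cells:
        return list(cells)
    i = cells.index(None)
    for token in tokens:
        cells[i] = token                      # guess
        if consistent(cells):                 # check
            result = solve(cells, tokens, consistent)
            if result is not None:
                return result
    cells[i] = None                           # undo and back up
    return None


def latin_ok(cells, n=4):
    """Example rule: no token repeats within any row or column of an n x n grid."""
    rows = [cells[r * n:(r + 1) * n] for r in range(n)]
    cols = [cells[c::n] for c in range(n)]
    for line in rows + cols:
        filled = [v for v in line if v is not None]
        if len(filled) != len(set(filled)):
            return False
    return True


puzzle = [
    1, None, None, None,
    None, None, 3, None,
    None, 4, None, None,
    None, None, None, 2,
]
solution = solve(list(puzzle), [1, 2, 3, 4], latin_ok)
for r in range(4):
    print(solution[r * 4:(r + 1) * 4])
```

A new puzzle type only needs a different rule function, which is the appeal of the approach; the price is that nothing clever prunes the search.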

At one point I was considering creating a constraint language in which declarative rules could be expressed in a shorthand notation, but this is beginning to seem like a lot of work with a dubious payoff. For now, all the constraints are expressed as Java programs. My current plan is to add more puzzle types (perhaps including some simple word puzzles at some point), and then to evaluate whether there is any benefit in further abstraction.

Scrapple

Play the scrapple word game against the computer, or play a solo game and try to achieve the perfect score. You will need to have installed a Java runtime environment (at least version 1.5) to use this applet-based program. Also, don’t forget about the Sudoku solver and the snowflake creator, all on the toys page.

This is the so-called official s c r a b b l e players dictionary (3) word list: ospd3, used by scrapple to determine allowable plays. It is missing some pretty real words, but apparently is used in tournament play nonetheless. What do they do when someone challenges zen, straddles, or irker?

WORD LIST

This list of words is pretty big (1.7M); it is all lower-case, includes various word forms and excludes proper names.  I wish I could remember where it came from (I think it was called WORDS, if that helps you identify it??)

words

Gloves off

OK – I’ve removed the password-protection from the site so it is now open to the ravages of the internet savages.  Oh yes, let the slings and arrows begin.

in plain sight

in-plain-sight is my very first cryptic crossword. Maybe someday I’ll figure out how to post it in HTML, but for now you’ll need to download the PDF and print out: don’t most people still like to solve that way? The crossword has fairly standard clues with an extra twist worked in. If you enjoy it (or don’t), you can let me know by posting a comment on this site.

Hi! falutin promises

Welcome to falutin.net, a hopefully promising new site.

I hope that if you read this, you will want to return again and again.

I promise not to write very often, so it will be easy to keep up.

I promise not to make this site maudlin or trivial. I promise not to twitter.

I hope this site will be irreverent, relevant, literate and readable.