Is there any better feeling than finally finding an elegant answer to a problem that's previously forced you to use dirty hacks? The freedom that comes from getting to strip out line after line of filthy code and replace it with a single line that does the job properly.
XSLT is my latest flirtation. I've used it a bit before, but I've come back to it to solve a couple of problems at work, and the more I use it, the more powerful and elegant it seems. In my line of work, we do an awful lot of XML processing, especially converting to CSV, and up until now that has largely meant writing and rewriting SAX handlers to do the job. This has been the default option because, well, someone's already done the hard work before, so the quickest route to get something done is to copy another SAX handler, tweak it, and job done. But it's slow and difficult to maintain, even with the best-written code (and believe me, a lot of our SAX handlers are far from it). XSLT is really what should have been used from the very start.
Anyway, that problem. We use a third-party product as our core system, and a large part of what we do is to take reports from that system as XML, and turn them into something that a lowly user could make sense of (read: CSV). I really want to push the use of XSLT to do this, and set about doing some proof-of-concept work. This was all going very well, until I came across a set of reports that appear to use some sort of generic MS database export. The namespace declaration is:
<xml xmlns:s='uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882'
xmlns:dt='uuid:C2F41010-65B3-11d1-A29F-00AA00C14882'
xmlns:rs='urn:schemas-microsoft-com:rowset'
xmlns:z='#RowsetSchema'>
The sharp-eyed amongst you will notice that the declaration of the z namespace uses a relative URI reference pointing at an embedded schema, which is a deprecated practice. That shouldn't be fatal in itself, but given this XML fragment:
<z:row someAttr="someData" />
and this stylesheet fragment:
<xsl:template match="z:row">
Xalan appears to throw a bit of a wobbly:
A node test that matches either NCName:* or QName was expected
It seems this is Xalan-specific: other XSLT processors handle it just fine. So the obvious solution is to switch processors, right? Well, maybe, but dammit, I use Xalan for other stuff, and I'm not prepared to let this stand in the way. Removing the # from the namespace declaration in both the stylesheet and the XML makes it work.
I spent a while looking at this, and finally came up with a dirty old hack - the use of a HashFilterInputStream. That's right, it's that dirty, a FilterInputStream that filters out the hash, so the resulting namespace is simply "RowsetSchema". Dirty, but it worked, as long as you weren't expecting a hash sign to occur anywhere else in the document. Believe me, I had sleepless nights over this one.
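The original filter isn't reproduced here, but a minimal sketch of the idea might look something like the following. It assumes the document is in UTF-8 or another ASCII-compatible encoding, so that a '#' is always the single byte 0x23. The class body and the strip helper are illustrative assumptions, not the code that actually ran.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical reconstruction: drops every '#' byte from the wrapped stream. */
class HashFilterInputStream extends FilterInputStream {

    HashFilterInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b;
        do {
            b = super.read();      // -1 at end of stream
        } while (b == '#');        // skip hashes, keep everything else
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (true) {
            int n = super.read(buf, off, len);
            if (n <= 0) {
                return n;          // end of stream (or nothing read)
            }
            int kept = 0;
            for (int i = 0; i < n; i++) {
                byte b = buf[off + i];
                if (b != '#') {
                    buf[off + kept++] = b;   // compact in place
                }
            }
            if (kept > 0) {
                return kept;
            }
            // The whole chunk was hashes; read again rather than return 0.
        }
    }

    /** Illustrative helper: runs a string through the filter (UTF-8 assumed). */
    static String strip(String xml) {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new HashFilterInputStream(new ByteArrayInputStream(bytes))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The weakness is plain to see in the sketch: the filter has no idea what it is looking at, so a '#' in attribute data or a character reference such as &#10; gets mangled just as readily as the one in the namespace declaration.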
Thankfully, on another pass over the problem, I finally found a much more agreeable solution. Make no mistake, it's still a bit of a hack and no replacement for Xalan actually getting fixed, but at the very least it meant I didn't have to rely on my Java code to do the right thing before processing the transform. The solution is this: where you reference a tag in the dodgy namespace, match any element and explicitly test its local-name() and namespace-uri() as strings in a predicate, rather than letting Xalan do the name resolution.
So where you may have previously used:
<xsl:template match="z:row">
you should instead do the explicit test:
<xsl:template match="*[local-name()='row' and namespace-uri()='#RowsetSchema']">
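To see the string-comparison approach end to end, here is a small sketch that runs such a template through the standard javax.xml.transform API and emits one line per row. The class name, the sample document, and the single-column output are assumptions made for illustration. It also uses whichever XSLT processor the JVM provides by default, which may or may not be the Xalan build that exhibits the error above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Illustrative only: names and sample data are not from the real reports. */
class RowsetToCsv {

    // The stylesheet never declares a prefix for '#RowsetSchema'; the
    // namespace is compared as a plain string inside the predicate.
    static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match=\"*[local-name()='row'"
        + " and namespace-uri()='#RowsetSchema']\">"
        + "<xsl:value-of select='@someAttr'/><xsl:text>&#10;</xsl:text>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical sample input, modelled on the fragments shown earlier.
    static final String SAMPLE =
          "<xml xmlns:rs='urn:schemas-microsoft-com:rowset' xmlns:z='#RowsetSchema'>"
        + "<rs:data>"
        + "<z:row someAttr='someData'/>"
        + "<z:row someAttr='moreData'/>"
        + "</rs:data>"
        + "</xml>";

    /** Applies the stylesheet to the given XML and returns the text output. */
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transform failed", e);
        }
    }

    public static void main(String[] args) {
        // Prints the someAttr value of each z:row, one per line.
        System.out.print(transform(SAMPLE));
    }
}
```

Because the stylesheet never mentions the z prefix, there is no QName for the processor to resolve against the awkward namespace, which is the whole point of the workaround. The cost is readability: every match and select that touches the rowset needs the same long-winded predicate.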
You're welcome.