XML is like a straitjacket


Dan Lyke, Monday August 1st, 2005

XML is like a straitjacket because:

  1. It's dangerous to do anything without assistance. Sure, you can try to get the bottle off the top of the cabinet using only your teeth, but about the time you've nudged the chair over to the shelf, hopped up on it, and finally managed to loop the towel over the neck of the bottle by swinging your head just so, and you think you can catch it in the crook of your neck as it falls... well... something's going to go wrong, and you should have just called for the nice man in the white coat to help you. Even if he won't give you the whisky.

Similarly, with XML, picking things apart with regular expressions or special-casing is a bad idea. Always ask the parser to do the work for you, and always run everything you're going to send through the parser first to make sure it's not going to get you in trouble.
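To make "ask the parser" concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The sample document and the function names are made up for illustration. The regular expression happily finds "items" inside a comment and a CDATA section and leaves `&amp;` undecoded, while the parser returns only the one real element, with the entity already turned back into `&`.

```python
import re
import xml.etree.ElementTree as ET

# A small, made-up document: the comment and the CDATA section both
# contain text that looks like markup but is not.
DOC = """<order>
  <!-- <item>ghost</item> -->
  <item>bottle &amp; glass</item>
  <note><![CDATA[<item>not really an item</item>]]></note>
</order>"""

def items_by_regex(text):
    """The tempting shortcut: grab anything between <item> tags."""
    return re.findall(r"<item>(.*?)</item>", text)

def items_by_parser(text):
    """Let the parser decide what is actually an <item> element."""
    root = ET.fromstring(text)
    return [el.text for el in root.iter("item")]

print(items_by_regex(DOC))   # ['ghost', 'bottle &amp; glass', 'not really an item']
print(items_by_parser(DOC))  # ['bottle & glass']
```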

  2. It severely limits what you can do. How? In XML 1.0, a legal character is one of:

#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

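And that means arbitrary ASCII text cannot be sent via XML: ASCII control characters such as NUL, form feed and ESC simply aren't in that list, and you can't sneak them in as character references like &#x1B; either. Think about it for a moment, and if you don't get it, ask a programmer.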
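The production above translates directly into code. This is a minimal Python sketch; the helper names `is_xml_char`, `illegal_chars` and `parses` are hypothetical. It checks text against the XML 1.0 Char rule, shows that ordinary ASCII control characters (form feed, ESC) fail it, and shows that the expat-based parser behind `xml.etree.ElementTree` rejects ESC even when it is written as a character reference.

```python
import xml.etree.ElementTree as ET

def is_xml_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def illegal_chars(text: str) -> list[str]:
    """List every character in text that XML 1.0 cannot carry at all."""
    return [f"U+{ord(ch):04X}" for ch in text if not is_xml_char(ord(ch))]

def parses(doc: str) -> bool:
    """True if ElementTree's (expat-based) parser accepts doc."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

# Ordinary 7-bit ASCII: a form feed (Ctrl-L) and two ESC characters.
ascii_text = "page one\x0cpage two \x1b[1mbold\x1b[0m"
print(illegal_chars(ascii_text))  # ['U+000C', 'U+001B', 'U+001B']

print(parses("<a>&#x41;</a>"))  # True: 'A' is a legal character
print(parses("<a>&#x1B;</a>"))  # False: ESC is illegal even as a reference
```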

It makes you get your character set stuff right. Copying text off the web into a form and assuming every character in it will be legal... won't work. Reading from an HTML document... won't work. You *must* be clear about which character encoding you're writing your XML data in and which encoding your source document uses, and if they differ, translate between them.
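Here is a minimal Python sketch of "decode from the source encoding, then encode for the output", using only the standard library. The input bytes are invented and assumed to be Windows-1252; in real code the source encoding would come from the HTTP headers or the document itself rather than being hard-coded.

```python
import xml.etree.ElementTree as ET

# Made-up input: bytes assumed to be Windows-1252, where 0x93/0x94 are
# curly double quotes and 0xE9 is "é".
source_bytes = b"\x93Caf\xe9\x94 menu"

# Step 1: decode using the *source* encoding. Guessing wrong here raises
# an error (UTF-8) or silently yields the wrong characters (Latin-1).
text = source_bytes.decode("cp1252")

# Step 2: build the tree from characters, not bytes.
root = ET.Element("title")
root.text = text

# Step 3: let the serializer encode for the *output* side and declare
# that encoding, so the receiver knows what it is reading.
out = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
print(out)
```

Keeping the middle of the pipeline in decoded characters means each encoding decision happens exactly once, at the edges.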

Also, if you're using Perl, this means you should know when regular expressions are operating on "characters" and when they're operating on bytes. If you're using C, note that a "char" is the one data type that you should *not* be using to represent characters.
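The same characters-versus-bytes trap exists in Python, where `str` holds characters, `bytes` holds encoded bytes, and the `re` module works on either. This small sketch is a Python analogue, not Perl: the same pattern counts four characters but five bytes in "café", and truncating by byte count cuts a character in half.

```python
import re

word = "caf\u00e9"             # "café", with é as the single code point U+00E9
utf8 = word.encode("utf-8")    # b'caf\xc3\xa9': é takes two bytes

# On a str, "." matches one character at a time.
print(len(re.findall(r".", word)))   # 4
# On bytes, the same pattern matches one byte at a time.
print(len(re.findall(rb".", utf8)))  # 5

# Cutting by byte count can split a character in half.
truncated = utf8[:4]           # b'caf\xc3', only half of é
print(truncated.decode("utf-8", errors="replace"))  # "caf" + U+FFFD
```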

  3. You have to be careful what you say. If you don't qualify your statements carefully, the aforementioned nice young man might put you in the padded room where you can't talk to anyone. Similarly, aside from the character set issues mentioned above, there are five sacred characters (&, <, >, " and ') that should *always* be entity-escaped (yes, there are exceptions, but if you take advantage of them, they will bite you).
  4. In the straitjacket, your day is structured. You take your pills at the same time every day. You go for your walk at the same time every day. Similarly with XML.

In fact, by the time you get into the straitjacket, someone else has probably thought through the structure for you. With XML, however, it's probably going to be up to you to lead a committee to figure out what that structure should be. So while XML might not be like a straitjacket in this instance, dealing with that committee will eventually send you around the bend, and the nice smiling people with the butterfly nets will be along shortly to help you get dressed.
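Back to the five characters that should always be entity-escaped. Here is a minimal Python sketch using `xml.sax.saxutils.escape` from the standard library; the helper name `escape_all_five` is hypothetical. Note that `escape()` only handles `&`, `<` and `>` on its own, so the two quote characters have to be passed in explicitly.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() always handles &, < and >; the quote characters are only
# replaced if you pass them in, so supply them to cover all five.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}

def escape_all_five(text: str) -> str:
    """Replace &, <, >, " and ' with XML's predefined entities."""
    return escape(text, QUOTE_ENTITIES)

raw = 'Tom & Jerry said "x < y" isn\'t > z'
escaped = escape_all_five(raw)
print(escaped)
# Tom &amp; Jerry said &quot;x &lt; y&quot; isn&apos;t &gt; z

# Pasting raw into markup breaks the parse ("& " starts a bogus entity),
# while the escaped text round-trips to exactly the original string.
try:
    ET.fromstring("<msg>" + raw + "</msg>")
except ET.ParseError as err:
    print("raw text rejected:", err)
print(ET.fromstring("<msg>" + escaped + "</msg>").text == raw)  # True
```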

So, after all this, why use XML? Look at your coworkers and people you're exchanging data with. Wouldn't the world be a better place if they were all in straitjackets?

Thought so.