(Warning: long but not entirely off topic)
David Golden opined:
> XML is NOT "one format". It's a (heinously bloated) means of specifying file
> formats that can be parsed in a particular way. (XML == tree)
To use the appropriate jargon, XML is a meta format - something like a
specification about a specification.
A common mistake is to assume that two programs understand each other just
because both speak XML: if one sends "Amount" while the other expects
"Total", they might as well have sent random byte streams to each other. XML
is not a solution on its own. But it is a start.
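To make the "Amount" versus "Total" point concrete, here is a minimal Python sketch. The element names `invoice`, `Amount` and `Total` and the function `read_total` are invented for illustration; the only thing assumed is the standard library module `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# What the sender produces (element names are hypothetical).
sent = "<invoice><Amount>42.50</Amount></invoice>"

def read_total(document):
    """Receiver that expects a <Total> element; returns None if it is absent."""
    root = ET.fromstring(document)   # parses fine: the XML is well-formed
    node = root.find("Total")        # ...but the expected vocabulary is missing
    return None if node is None else float(node.text)

print(read_total(sent))                                        # None
print(read_total("<invoice><Total>42.50</Total></invoice>"))   # 42.5
```

Both documents are perfectly good XML. Only the second one means anything to this receiver, which is why the two sides still have to agree on a schema.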
As for heinously bloated: it is very redundant, but there is a lot of sense
to having opening and closing tags that match. Especially in text files that
must be:
* human readable - shows what the end of a section applies to, useful in
large documents.
* human writeable - makes it possible for a parser to detect incorrectly
closed tags
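As a rough illustration of the second point, any conforming parser rejects tags that are closed in the wrong order. The sketch below uses Python's `xml.etree.ElementTree`; the helper name `is_well_formed` is my own invention.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return (True, None) if text parses, else (False, parser message)."""
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)

good = "<interface id='eth1'><netmask>255.255.255.0</netmask></interface>"
# Same content, but the two closing tags have been swapped by a careless edit.
bad = "<interface id='eth1'><netmask>255.255.255.0</interface></netmask>"

print(is_well_formed(good))      # (True, None)
print(is_well_formed(bad)[0])    # False
```

The parser message in the failing case also carries a line and column, which is more than most hand-rolled configuration parsers offer.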
If file size is an issue, your average compression program should
efficiently squash any XML file.
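A quick way to check this claim for your own files is to run them through a general-purpose compressor. The sketch below uses Python's `zlib` on a made-up document; the sample is far more repetitive than a real configuration file, so treat the ratio it prints as an upper bound rather than a typical figure.

```python
import zlib

# A deliberately repetitive, made-up document: 500 identical host records.
record = '<host ip="192.168.1.1" name="gateway" />\n'
document = "<hosts>\n" + record * 500 + "</hosts>\n"

raw = document.encode("utf-8")
packed = zlib.compress(raw)
restored = zlib.decompress(packed)

print(len(raw), "bytes raw,", len(packed), "bytes compressed")
print("round trip intact:", restored == raw)
```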
Word documents tend to contain a lot of cruft, and can correctly be
described as heinously bloated. XML is somewhat bloated, but for a reason.
> XML is overrated.
Is it? Compared to what? AFAIK the only thing that comes close is EDS, and
that is not a pretty format by any description.
Look at the vast array of Unix file formats (fstab, aliases, named.conf,
inetd.conf to name a few). There are any number of ways that one can define
such a format: tab separators, comma separators, braces, brackets,
semicolons at the end of a line, no semicolons. How does one escape control
characters? How does one parse a file, include other files? How can a file
be verified?
And do not get me started on that sad Windows .ini file format. Why it is
being adopted by some GUI programmers I will never know.
One of the great successes of Unix is the culture of the command line, which
includes the principle that configurations must be stored in text files. But
there is no common format for such text files. And these files are not
normally designed to be manipulated by anything other than a sentient
being - witness the problems faced by programs such as linuxconf and most
other GUI configuration tools. These must modify files without upsetting the
very picky parsers that use them.
It may be heresy, but I think it would be an excellent idea if the contents
of /etc were converted to XML. However, there would need to be some
restrictions:
1. A simple, efficient and *working* XML parser library needs to be
available. The regexp library would be a good role model.
2. Applications need to have a common approach to defining their schema,
principally deciding which information goes between tags and which
information goes in as attributes.
3. Meaningful command line tools need to be developed: XML equivalents for
grep and sed would be a good start. (Try to extract information from an XML
file using sed and grep.)
4. Useful XML editors need to be available. (vi can work in an emergency,
emacs has an XML mode that needs work.)
5. A conscious decision needs to be taken on how to structure the XML - as
many small documents or a few big documents. It would most probably also be
useful to have a configuration daemon that monitors the configuration files
and stores the information in memory. AFAIK the file pam module already does
something similar.
As an academic exercise, here is an example of such a file:
---------------------------------------------
<config>
  <hostname>bilbo</hostname>
  <domain>shire.org</domain>
  <interfaces>
    <interface id="eth0" ip="bilbo" dhcp="yes" />
    <interface id="eth1" ip="192.168.100.3">
      <netmask>255.255.255.0</netmask>
    </interface>
  </interfaces>
  <names>
    <hosts>
      <files />
      <dns>
        <server ip="192.168.1.3" />
        <domain name="shire.org" />
      </dns>
    </hosts>
  </names>
  <filesystems>
    <filesystem location="/">
      <device>/dev/hda1</device>
    </filesystem>
    <filesystem location="/home" type="nfs">
      <device>mordor:/home</device>
    </filesystem>
    <filesystem location="/mnt/cdrom" mountatboot="no">
      <device>/dev/hdb</device>
      <option>read-only</option>
    </filesystem>
  </filesystems>
  <hosts>
    <host ip="192.168.1.1" name="gateway" />
    <host ip="192.168.1.2" name="server">
      <alias>bilbo</alias>
      <alias>bilbo.shire.org</alias>
    </host>
  </hosts>
  <groups>
    <group name="root" id="0">
      <member>root</member>
    </group>
    <group name="staff" id="1">
      <member>mfrench</member>
    </group>
  </groups>
  <users home="/home/{name}" group="staff" shell="/bin/bash">
    <user name="root" id="0" group="root" home="/usr/root"
          shell="/bin/sh">Super-User</user>
    <user name="mfrench" id="100">Matthew French</user>
    <user name="nobody" id="60001" shell="">Nobody</user>
  </users>
</config>
---------------------------------------------
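One detail of the example worth spelling out is the `users` element: its attributes act as defaults that individual `user` entries can override, and `{name}` is a placeholder for the user name. That is my reading of the example, not an established convention. A minimal Python sketch of how an application might resolve it (the function name `resolve_users` is hypothetical):

```python
import xml.etree.ElementTree as ET

USERS = """
<users home="/home/{name}" group="staff" shell="/bin/bash">
  <user name="root" id="0" group="root" home="/usr/root"
        shell="/bin/sh">Super-User</user>
  <user name="mfrench" id="100">Matthew French</user>
  <user name="nobody" id="60001" shell="">Nobody</user>
</users>
"""

def resolve_users(text):
    """Merge the defaults on <users> into each <user> entry."""
    users = ET.fromstring(text)
    defaults = dict(users.attrib)
    resolved = []
    for user in users.findall("user"):
        entry = {**defaults, **user.attrib}    # per-user attributes win
        # Assumption: {name} in the home path stands for the user name.
        entry["home"] = entry["home"].format(name=entry["name"])
        entry["fullname"] = user.text
        resolved.append(entry)
    return resolved

for u in resolve_users(USERS):
    print(u["name"], u["id"], u["group"], u["home"], repr(u["shell"]))
```

Note that an explicitly empty `shell=""` still overrides the default, which is how the `nobody` entry ends up with no shell.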
XML allows one document to pull in another (through external entities, or
through the separate XInclude specification), which makes it possible to
split a bigger file into many smaller ones but keep the same structure.
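One mechanism for this is XInclude, which is a separate W3C specification rather than part of XML itself. Here is a minimal sketch using Python's `xml.etree.ElementInclude`. The file names and the in-memory `FILES` dictionary are invented for illustration; a real setup would read from disk with the module's default loader.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical split of the big file; a dict stands in for files on disk.
FILES = {
    "config.xml": """
<config xmlns:xi="http://www.w3.org/2001/XInclude">
  <hostname>bilbo</hostname>
  <xi:include href="hosts.xml" />
</config>""",
    "hosts.xml": """
<hosts>
  <host ip="192.168.1.1" name="gateway" />
</hosts>""",
}

def loader(href, parse, encoding=None):
    """Resolve an include from FILES instead of the file system."""
    if parse == "xml":
        return ET.fromstring(FILES[href])
    return FILES[href]

root = ET.fromstring(FILES["config.xml"])
ElementInclude.include(root, loader=loader)   # expands xi:include in place

print([child.tag for child in root])          # ['hostname', 'hosts']
```

After expansion the application sees a single tree, so it does not need to know how the configuration was divided up.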
Command line tools would be something like:
# xmlgrep -l "*/filesystem" -v type=nfs -v "device~=mordor*" config.xml
<filesystem location="/home" type="nfs">
<device>mordor:/home</device>
</filesystem>
# xmlsed -l "*/filesystem/device" -e "s/^mordor:/frodo:/" config.xml
...
<filesystem location="/home" type="nfs">
<device>frodo:/home</device>
</filesystem>
...
# xmlinsert -l "*/users" -f config.xml '<user name="dgolden" id="101">David Golden</user>'
...
<users home="/home/{name}" group="staff" shell="/bin/bash">
<user name="root" id="0" group="root" home="/usr/root"
shell="/bin/sh">Super-User</user>
<user name="mfrench" id="100">Matthew French</user>
<user name="nobody" id="60001" shell="">Nobody</user>
<user name="dgolden" id="101">David Golden</user>
</users>
...
# xmlprint -l "*/filesystem" -f "(location) is (device~)" config.xml
/ is /dev/hda1
/home is mordor:/home
/mnt/cdrom is /dev/hdb
---------------------------------------------
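None of these tools exist as written; they are wishful thinking. As a rough idea of how little code they would need, here is a Python sketch of the grep, print and sed cases using the limited XPath support in `xml.etree.ElementTree`. The embedded `CONFIG` is a cut-down copy of the example file.

```python
import re
import xml.etree.ElementTree as ET

CONFIG = """<config>
  <filesystems>
    <filesystem location="/"><device>/dev/hda1</device></filesystem>
    <filesystem location="/home" type="nfs"><device>mordor:/home</device></filesystem>
    <filesystem location="/mnt/cdrom" mountatboot="no"><device>/dev/hdb</device></filesystem>
  </filesystems>
</config>"""

root = ET.fromstring(CONFIG)

# "xmlgrep": filesystems of type nfs whose device starts with "mordor".
nfs = [fs for fs in root.findall(".//filesystem[@type='nfs']")
       if fs.findtext("device", "").startswith("mordor")]
for fs in nfs:
    print(ET.tostring(fs, encoding="unicode").strip())

# "xmlprint": one formatted line per filesystem.
lines = ["%s is %s" % (fs.get("location"), fs.findtext("device"))
         for fs in root.iter("filesystem")]
print("\n".join(lines))

# "xmlsed": rewrite the text of every device element in place.
for device in root.iter("device"):
    device.text = re.sub(r"^mordor:", "frodo:", device.text)
print(root.findtext(".//filesystem[@location='/home']/device"))   # frodo:/home
```

The point is that the structure is handled by the parser, so the tools only have to deal with selection and substitution.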
Then David Golden suggested:
> Well, dunno about that. Lisp sexps are about half as verbose for the
> pretty much the same amount of information carrying ability
Hmm. That is a matter of taste. I am not sure I find Lisp sexps any easier
to read.
At the end of the day, XML can be converted into other formats - a Python
like indented format comes to mind, or even a variation of the .ini file.
But XML is widely known, widely used and is a very complete specification.
Yes it can be unwieldy. Yes it is wasteful. Yes it is ugly. But so is any
other complex meta-format that I can think of. My personal belief is that it
is better to have one common format for all files that is slightly ugly than to
have to guess the exact format of every single configuration file.
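To show that the conversion to an indented format is mechanical, here is a minimal Python sketch. The output layout (`tag attr="value": text`, two spaces per level) and the function name `to_indented` are my own invention, and the sketch ignores text that follows a child element.

```python
import xml.etree.ElementTree as ET

def to_indented(elem, depth=0):
    """Render an element tree as indented 'tag attr="value": text' lines."""
    attrs = "".join(' %s="%s"' % kv for kv in elem.attrib.items())
    text = (elem.text or "").strip()
    line = "  " * depth + elem.tag + attrs + (": " + text if text else "")
    lines = [line]
    for child in elem:
        lines.extend(to_indented(child, depth + 1))
    return lines

sample = """<hosts>
  <host ip="192.168.1.2" name="server">
    <alias>bilbo</alias>
  </host>
</hosts>"""

print("\n".join(to_indented(ET.fromstring(sample))))
```

Going the other way needs a parser for the indented form, which is exactly the kind of one-off format work that a common meta-format is supposed to spare us.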
And looking at my example I can already think of problems that would not
have happened if Unix configuration files were in XML format. Missing tabs
in fstab for example, or losing a ':' in /etc/passwd.
> A markup language would allow overlapping tagbodys, since "markup"
> should be like a layer of logical highlighter pen over a data stream. That's
> what _markup_ is for. XMLers lost sight of that, and started thinking their
> tags were more important than the text in between them, and have produced a
> mediocre tree-structured data representation format.
I think there are two different uses for markup:
1. Configuration files, where it makes sense to have a strict hierarchical
format.
2. Free flowing text documents (see original thread), where a strict
hierarchy can get in the way.
I agree that in the second case XML has lost its way a little bit. But most
uses of XML I have encountered are for the first case.
Is there an XML version of LaTeX yet? :)
And finally:
> Perhaps XML is a plot so that C++ and Java weenies don't have to admit
> to themselves that the unbearably smug Lispers were right all along... :-)
Thus confirming the smug Lispers are both paranoid and in denial? (Spot the
C++ and Java weenie.)
<big grin />
- Matthew