

Why isn't server-side web made with XML libs? - slashnull

I'm not a professional web developer, far from it. To be fair, I have only
been taught PHP and JSP server-side web programming at school, and I have
poked around with Node.js. So far I have noticed that, outside of Ajax-based
apps (which I reckon are becoming almost ubiquitous), the dominant idiom is to
represent the final HTML page as a text stream and to append text to it
sequentially.

Two variants exist: 1) the continuation-passing model, used by POJO Java
servlets and Node.js, which creates a request object and a response object and
passes them along the pipeline of request handlers, each adding text to the
response stream; and 2) the tag model, which is the essence of PHP, JSP and, if
I understood correctly, ASP, where the page itself is raw HTML, with the option
of adding actual code to express business logic, plus some syntactic sugar to
append to the underlying HTML page, which *is* the text stream.

Now what bugs me is that [X]HTML tries really hard to be a *tree*, and the
current model of treating the HTML response as a fundamentally flat stream
leaves the responsibility of creating a well-formed tree to the developer. XML
is probably the simplest recursive grammar around, which is the cause of the
woes of people who try to process it with regexes. Java almost treats XML like
an actual extension of itself, and most languages offer decent libraries to
read, create and generally mangle XML trees.

So why isn't XHTML created by building XML trees?

Why isn't it mainstream to do stuff like this? (in pseudo-Java)

    Element root = new Element("html");
    Element body = new Element("body").append(new Element("p").text("Hello, world!"));
    root.append(body);

Why don't I ever see that idiom?
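
For comparison, here is a minimal sketch of the same idiom in Python, using the
standard library's xml.etree.ElementTree in place of the pseudo-Java Element
class. The function name build_page is made up for illustration; this is one
way the tree-building approach could look, not an established web idiom.

```python
import xml.etree.ElementTree as ET


def build_page(message: str) -> str:
    """Build a tiny XHTML-like tree node by node and serialize it."""
    root = ET.Element("html")
    body = ET.SubElement(root, "body")
    paragraph = ET.SubElement(body, "p")
    # Text is escaped by the serializer, so "<" or "&" in the message
    # cannot break the well-formedness of the output.
    paragraph.text = message
    return ET.tostring(root, encoding="unicode")


print(build_page("Hello, world!"))
# <html><body><p>Hello, world!</p></body></html>
```

The point of the sketch is that well-formedness falls out of the data structure:
there is no way to forget a closing tag, and text escaping happens at
serialization time.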
======
pjungwir
One templating approach that takes HTML's tree structure more seriously is
Haml. Its syntax (mostly) enforces valid output, so you don't have to worry
about closing/matching tags etc. This also makes it very concise, and (since
its indentation matches the HTML tree) extremely easy to scan & read. It's
basically a DSL optimized for markup trees like HTML.

Composing a DOM by instantiating objects for each node sounds awful, but I
think you're right that flat plain-text templates miss an opportunity to
leverage HTML's tree-like structure. Haml does a great job exploiting that
structure.

------
moocow01
So you are correct that, purely programmatically, it's a bit wonky to construct
an XML document from a bunch of strings, pass that to the client, and pray
that your XML (HTML) is well formed enough.

I think HTML construction is done the way it is because it goes back to when
most web companies had "software engineers" handling the system and "web
developers/designers" who spruced up the data coming out of the system with
HTML. The designers didn't know much deep programming per se, but they knew
HTML and would create HTML templates in
plain text with placeholders to expose anything dynamic. Those templates would
then be handed off to engineering whose sole concern was getting the system
right and didn't really want to care about HTML. This workflow is also what
gained PHP a lot of traction originally.

So ultimately I think stitching together HTML with strings makes the system a
lot more accessible to semi-technical folks, and it's honestly probably a lot
faster to do development by typing out HTML rather than building XML objects.

------
123___
Using DOM implementations like that quickly gets bloated and hard to debug,
not to mention performance overhead from the DOM itself (e.g. PHP).

Personally, I tried working with HTML like that in PHP with a class that
parsed HTML files into a hierarchical array which could be manipulated like:

$var['html']['body']['p']['strong'] = 'etc';

It supported attributes, CDATA and everything, but no matter how seductive it
may be from a programmer's point of view, it's always just more trouble than
it's worth to work like that.

(I ended up just using that xml2array/array2xml crap to create XML trees for
XSLT)

~~~
slashnull
Death of a thousand cuts, then.

Sad.

Thanks for the input.

------
michaelmior
Another reason is that this style of programming makes it very difficult to
improve your TTFB (time to first byte). If the entire document can be
manipulated at any time, you can't start sending anything to the browser until
all your processing is done. This means a lot of unnecessary buffering on the
server side and ultimately a delay in pushing data to the client. Modern
browsers do pretty well at rendering incomplete HTML, so the faster you can
start sending data, the faster the page load will be perceived.
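
To make the buffering point concrete, here is a rough Python sketch (the
function names render_streaming, render_tree and counted are invented for this
example). Both renderers take an iterable of items; the streaming one yields
its first chunk before touching any item, while the tree one has to consume
every item before it can serialize anything.

```python
import xml.etree.ElementTree as ET
from html import escape
from typing import Iterable, Iterator, List


def render_streaming(items: Iterable[str]) -> Iterator[str]:
    # Each chunk could be written to the socket as soon as it is yielded.
    yield "<html><body><ul>"
    for item in items:
        yield f"<li>{escape(item, quote=False)}</li>"
    yield "</ul></body></html>"


def render_tree(items: Iterable[str]) -> Iterator[str]:
    # The whole tree must exist before the first byte can be produced.
    root = ET.Element("html")
    ul = ET.SubElement(ET.SubElement(root, "body"), "ul")
    for item in items:
        ET.SubElement(ul, "li").text = item
    yield ET.tostring(root, encoding="unicode")


def counted(items: Iterable[str], log: List[str]) -> Iterator[str]:
    # Records each item as it is consumed, standing in for slow data access.
    for item in items:
        log.append(item)
        yield item


stream_log: List[str] = []
first_chunk = next(render_streaming(counted(["a", "b", "c"], stream_log)))
print(first_chunk, stream_log)  # first chunk is ready with no items consumed

tree_log: List[str] = []
next(render_tree(counted(["a", "b", "c"], tree_log)))
print(tree_log)  # all items were consumed before the first chunk
```

Whether the first chunk actually reaches the browser early still depends on
the web server and application server not buffering the response themselves.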

~~~
slashnull
I actually didn't know web servers started sending packets before I was done
with writing to the stream.

Interesting.

~~~
michaelmior
This depends on the configuration of both your web server and your application
server. There's a good chance things won't work this way out of the box. But
this is generally something that you want for high performance.

------
wmf
That code is crazy verbose; that's why you don't see people using it. Even
jQuery-like wrappers would still be more verbose than templating. Stuff like
E4X and XQuery is more concise but I think it came out after XML's reputation
had already been destroyed.

~~~
slashnull
Ok, interesting response, thank you.

But there's a point I'd like clarified, if possible: it's not the first time
I've read that XML is bad in some abstract way, so I'd really be interested in
knowing how XML's rep was "destroyed".

~~~
johnbm
An important one is that XML does not map cleanly to any programming
language's typical data structures, even those that explicitly support it. It
always turns into an elaborate object hierarchy with awkward query syntax. The
combination of unique named string-only attributes with ordered, non-unique
named child objects is awkward and ambiguous (i.e. should that property be an
attribute, or a child?). Contrast for example with JSON, which translates
directly into arrays and hashes, structures available in any decent language.

E.g. you have <user name="john" address="123 Oracle Road"></user>

Now you decide you need multiple addresses per user. Do you: a) invent an
ad-hoc string serialization scheme for the address="" property, or b)
completely restructure your XML into:

<user name="john">
  <address>123 Oracle Road</address>
  <address>456 XML Lane</address>
</user>

But in that case, why not just make every property a child?
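
A small Python sketch of what that restructuring does to the reading code,
using the standard xml.etree.ElementTree and json modules. The JSON document
and its "addresses" key are my own assumption for the comparison; they are not
from any real schema.

```python
import json
import xml.etree.ElementTree as ET

# Version 1: the address is an attribute.
user_v1 = ET.fromstring('<user name="john" address="123 Oracle Road"></user>')

# Version 2: addresses are repeated child elements.
user_v2 = ET.fromstring(
    '<user name="john">'
    "<address>123 Oracle Road</address>"
    "<address>456 XML Lane</address>"
    "</user>"
)

# Attributes and children are read through different APIs, so moving a
# property from one to the other changes every piece of code that reads it.
address_v1 = user_v1.get("address")
addresses_v2 = [node.text for node in user_v2.findall("address")]

# The JSON equivalent parses straight into a dict holding a list.
user_json = json.loads(
    '{"name": "john", "addresses": ["123 Oracle Road", "456 XML Lane"]}'
)
addresses_json = user_json["addresses"]

print(address_v1)
print(addresses_v2)
print(addresses_json)
```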

XML also spawned idiotic things like XSLT, which most agree is pretty
terrible. If you try to use XSLT for HTML templating, by definition you have
to escape every meaningful HTML character like "<>&'". If you wish to print
one of those characters in your templates, you have to double escape them. No
designer I know of would touch it with a ten foot pole, and every programmer
I've met who loved XML and its associated technologies was terrible at their
job.

Basically XML is strongly associated with the bloaty enterprise Java world,
who used XML as their hammer to beat down anything remotely nail-like.

~~~
brudgers
As a tree, XML syntax is largely isomorphic with that of Lisp's exposed AST.

~~~
slashnull
Well, yeah, but XML has attributes. I was writing about how that was a problem
and so on, but I came up with this

<look attr="at" attr2="up"> this <awkward> problem </awkward></look>

(look (|attr at) (|attr2 up) this (awkward problem))

just by saving "|" as a meta to annotate an attribute pair.

And the weird problem of lone tags looking like unclosed ones magically flies
away (I think) since

<script type="text/javascript" src="foo.js" {">", "/>", "></script>"}

is just

(script (|type text/javascript) (|src foo.js))

...sometimes I wonder why isn't everything ever made out of S-exprs.
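
As a rough check that the mapping really is mechanical, here is a Python sketch
that turns nested tuples written in that "|" convention into an element tree.
The tuple encoding and the to_element helper are invented for this example, and
whitespace around text is simplified.

```python
import xml.etree.ElementTree as ET


def to_element(sexpr: tuple) -> ET.Element:
    """Convert ("tag", ("|attr", "value"), "text", ("child", ...)) to an Element."""
    tag, *rest = sexpr
    element = ET.Element(tag)
    last_child = None
    for part in rest:
        if isinstance(part, tuple) and part[0].startswith("|"):
            # A pair whose head starts with "|" is an attribute.
            element.set(part[0][1:], part[1])
        elif isinstance(part, tuple):
            # Any other tuple is a nested child element.
            last_child = to_element(part)
            element.append(last_child)
        elif last_child is None:
            # Text before the first child belongs to the element itself.
            element.text = (element.text or "") + part
        else:
            # Text after a child is stored as that child's tail.
            last_child.tail = (last_child.tail or "") + part
    return element


look = ("look", ("|attr", "at"), ("|attr2", "up"), "this", ("awkward", "problem"))
print(ET.tostring(to_element(look), encoding="unicode"))
# <look attr="at" attr2="up">this<awkward>problem</awkward></look>

script = ("script", ("|type", "text/javascript"), ("|src", "foo.js"))
print(ET.tostring(to_element(script), encoding="unicode"))
```

The empty script element needs no special case on the input side; how it is
closed becomes purely a serializer decision.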

~~~
brudgers
Lisp has keywords:

    
    
       (look #:attr "at" #:attr2 "up" this (awkward problem))
    

Or to handle XML on the fly, cons cells are another option:

    
    
       (look '(attr . "at") '(attr2 . "up") this (awkward problem))
    

Going further since it's Lisp and we're just parsing, we could pretty much use
whatever syntax we want by adding a macro:

    
    
        (look (attributes attr "at" attr2 "up") this (awkward problem))

