
> do not use template languages to generate XML.

Small correction: do not use text template languages (Jinja, Mustache, ERB, raw PHP, Smarty, FreeMarker, what have you) to generate XML. ERB seems to be the one used here, judging by `<%= display_date %>`. There are templating languages whose primary use case is to generate markup (including XML)[0] and (unless they're broken to uselessness) they should guarantee the output is valid XML.
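To make the difference concrete, here is a minimal Python sketch using only the standard library. An f-string stands in for a text template (`render_with_text_template` is a hypothetical name, no real engine involved), `xml.etree.ElementTree` stands in for a markup-aware generator, and the post title is made up.

```python
import xml.etree.ElementTree as ET

def render_with_text_template(title: str) -> str:
    # A text template just splices strings together; nothing escapes `title`.
    return f"<item><title>{title}</title></item>"

def render_with_tree(title: str) -> str:
    # Building a tree lets the serializer escape &, < and > in the text.
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    return ET.tostring(item, encoding="unicode")

def is_well_formed(doc: str) -> bool:
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

title = "Q&A: <br> vs <br/>"
print(is_well_formed(render_with_text_template(title)))  # False
print(is_well_formed(render_with_tree(title)))           # True
```

The tree-based version cannot emit a stray `&` or an unclosed tag, because the caller never writes markup by hand.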

> Schema-design-wise, the content:encoded and excerpt:encoded element names are deeply suspect, as if someone looked at RSS 2.0, squinted, shrugged, and invented their own ad hoc analogous namespace prefix, rather than understanding the role of elements in XML.

They seem to be using WordPress's WXR import/export format, hence the wp-namespaced elements. The "content" and "excerpt" namespace garbage comes straight from there, according to http://ipggi.wordpress.com/2011/03/16/the-wordpress-extended...

> <content:encoded> Is the replacement for the restrictive RSS <description> element. Enclosed within a character data enclosure is the complete WordPress formatted blog post, HTML tags and all.

> <excerpt:encoded> This is an unknown element. This is a summary or description of the post often used by RSS/Atom feeds.

Considering the cottage industry around WordPress, aiming for interop was probably a good move (it should let Posterous exports be imported directly into WordPress?). Not sure they succeeded, though.
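On the prefix point: in XML, `content:` is only a local alias for a namespace URI, and consumers match on the URI. A minimal Python sketch, assuming the export binds `content` to the RSS 1.0 content module URI (`http://purl.org/rss/1.0/modules/content/`, which as far as I know is also what WXR uses); the post body is made up.

```python
import xml.etree.ElementTree as ET

# RSS 1.0 content module namespace; "content" is only the local prefix for it.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"
ET.register_namespace("content", CONTENT_NS)

item = ET.Element("item")
# ElementTree names namespaced elements as "{uri}localname".
encoded = ET.SubElement(item, f"{{{CONTENT_NS}}}encoded")
encoded.text = "<p>Post body, HTML tags and all</p>"  # escaped on output, no CDATA needed

doc = ET.tostring(item, encoding="unicode")
print(doc)

# A consumer looks the element up by URI; the prefix in its own map is arbitrary.
body = ET.fromstring(doc).find("c:encoded", {"c": CONTENT_NS}).text
print(body)
```

Because lookup goes through the URI, a file that spelled the element `c:encoded` under the same URI would be read identically, while a `content:` prefix that is never declared makes a namespace-aware parser reject the file outright.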

[0] Genshi, for instance: http://genshi.edgewall.org/




> There are templating languages whose primary use case is to generate markup (including XML)[0] and (unless they're broken to uselessness) they should guarantee the output is valid XML.
Since they are using Rails, they should be using Builder for this: http://api.rubyonrails.org/classes/ActionView/Base.html#labe... https://github.com/jimweirich/builder


> Since they are using Rails, they should be using Builder for this

Indeed. It's really odd that they munged together an XML export in ERB when Builder exists. Does it have some sort of breaking issue with namespaces, or something else that could explain the choice?


Builder can be a bit of a pain if you want to do things that are...let us say "questionable" (e.g. output a tag with inner content which is _not_ XML escaped).


There's no such thing in this case, though: there's a single layer of tags with escaped content inside (the example document uses CDATA, but as others have noted, automated generation is not a good use case for CDATA).
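One concrete reason CDATA is a poor fit for generated output: a CDATA section ends at the first `]]>`, so wrapping arbitrary post bodies in one breaks as soon as a body contains that sequence. A minimal Python sketch (the helper names and the sample text are made up) comparing a template-style CDATA wrapper with plain escaping:

```python
import xml.etree.ElementTree as ET

def naive_cdata(text: str) -> str:
    # What a hand-written template tends to do: wrap content in one CDATA section.
    return f"<description><![CDATA[{text}]]></description>"

def escaped(text: str) -> str:
    # Let the serializer escape &, < and > instead of relying on CDATA.
    elem = ET.Element("description")
    elem.text = text
    return ET.tostring(elem, encoding="unicode")

def round_trips(doc: str, text: str) -> bool:
    # True only if the document parses and gives back exactly the original text.
    try:
        return ET.fromstring(doc).text == text
    except ET.ParseError:
        return False

post = "Never write ]]> inside CDATA"
print(round_trips(naive_cdata(post), post))  # False
print(round_trips(escaped(post), post))      # True
```

Escaping has no special case to remember, which is why it is the safer default for machine-generated documents.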


Agreed; just saying in general. Also, I would guess the reason they used ERB here is simply familiarity, not any type of reasoned decision.


RABL does a pretty decent job at generating XML too.


If you need layouts, RABL falls apart with Ruby 1.9: https://github.com/nesquena/rabl/wiki/Using-Layouts (there's a note at the end of the "Using Rabl" section).


While I don't recommend that people generate XML with Jinja2, it's actually not too bad at it. It will escape properly for you automatically and unlike many other solutions in Python it actually supports streaming.

</biased response>


  Error on line 2: Closing tag for non-existent opening tag "biased"

  Error on line 2: Closing tags cannot have attributes


> It will escape properly for you automatically and unlike many other solutions in Python it actually supports streaming.

True and true, but it does not guarantee the output XML will be valid: as far as Jinja's concerned, it's all just text, is it not? Genshi also supports streaming (using `serialize`), will also properly escape everything and, using the default XML serializer, ensures the output is valid XML.

(edit: I want to note that I wasn't trying to put down Jinja; it's just the first text-based template language I thought of when writing down the list. It's a fine templating language, just not for generating XML.)
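For comparison, Python's standard library can also stream XML without any template engine. This is a swapped-in technique (`xml.sax.saxutils.XMLGenerator`, not Jinja2 or Genshi), and `stream_posts` and the sample posts are made up for illustration. It escapes character data and attribute values as it writes, but it does not check that start and end tags actually match, so it sits somewhere between a text template and Genshi's guarantees.

```python
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import XMLGenerator

def stream_posts(out, posts):
    # Each event is written to `out` as soon as it is produced,
    # so `posts` can be a lazy iterator over a large export.
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("posts", {})
    for title, body in posts:
        # XMLGenerator escapes attribute values and character data.
        gen.startElement("post", {"title": title})
        gen.characters(body)
        gen.endElement("post")
    gen.endElement("posts")
    gen.endDocument()

posts = (("Q&A #1", "a < b && c > d"), ('"Quotes"', "plain text"))
buf = io.StringIO()
stream_posts(buf, posts)
root = ET.fromstring(buf.getvalue().encode("utf-8"))
print([(p.get("title"), p.text) for p in root])
```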


Agreed on your stipulation on Genshi and the like.

And thanks for the further reverse engineering of the likely intent of the export. I wouldn't disagree with most of the WP-centric design choices. But running the output through a real XML parser might have been a good idea as well. (And I note there's a fair bit of complaint on the WP forums about the difficulty of using the data for import.)
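A minimal sketch of that check, assuming the export is available as bytes and using the standard library's expat-backed `xml.etree.ElementTree`; `check_well_formed` is a hypothetical helper name. It only tests well-formedness, not conformance to anything WXR-specific.

```python
import xml.etree.ElementTree as ET

def check_well_formed(doc: bytes):
    """Return None if `doc` parses as XML, else (line, column) of the first error."""
    try:
        ET.fromstring(doc)
    except ET.ParseError as err:
        # ParseError carries the parser's error position as (line, column).
        return err.position
    return None

print(check_well_formed(b"<rss><channel><item></channel></rss>"))
```

Running something like this in the export job's test suite would have caught the problems discussed in this thread before any user downloaded a file.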


Just generated an export from Posterous. It's not just namespaces: the XML files contain unescaped HTML entities (&nbsp;, for example), which XML doesn't define. What a mess.
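XML predefines only five named entities (&amp;, &lt;, &gt;, &quot;, &apos;), so a bare &nbsp; with no DTD declaring it is a fatal error for a conforming parser. As a stopgap on the importing side, one could rewrite HTML named entities as numeric character references before parsing. A blunt Python sketch (`html_entities_to_numeric` is a hypothetical helper); note that it would also rewrite entity-like text inside CDATA sections, where it should really be left alone.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The only named entities XML defines without a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

def html_entities_to_numeric(doc: str) -> str:
    """Rewrite HTML-only named entities such as &nbsp; as numeric references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            # Leave XML's own entities and unknown names untouched.
            return match.group(0)
        return f"&#{name2codepoint[name]};"
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, doc)

broken = "<p>Hello&nbsp;world &amp; friends</p>"
fixed = html_entities_to_numeric(broken)
print(fixed)  # <p>Hello&#160;world &amp; friends</p>
print(ET.fromstring(fixed).text == "Hello\xa0world & friends")  # True
```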



