> I would argue that any self-respecting XML parser should parse it just fine and shouldn't demand that the namespaces be declared at all.

The XML Namespaces specification unambiguously requires that a namespace be declared:

> The namespace prefix, unless it is xml or xmlns, MUST have been declared in a namespace declaration attribute in either the start-tag of the element where the prefix is used or in an ancestor element (i.e., an element in whose content the prefixed markup occurs).

A self-respecting XML parser would follow the spec. A namespace-aware XML parser must fault on undeclared namespace prefixes.

Most XML parsers are namespace-aware.
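For example, Python's stdlib ElementTree (expat underneath) is namespace-aware and refuses an unbound prefix outright (a sketch; the element names and namespace are illustrative):

```python
import xml.etree.ElementTree as ET

# A prefixed element with no matching xmlns:dc declaration anywhere
bad = "<post><dc:title>Hello</dc:title></post>"
msg = ""
try:
    ET.fromstring(bad)
except ET.ParseError as e:
    msg = str(e)  # "unbound prefix: ..."
print(msg)

# Declare the prefix and the very same document parses cleanly;
# the prefix expands into {namespace-uri}localname form
good = ('<post xmlns:dc="http://purl.org/dc/elements/1.1/">'
        '<dc:title>Hello</dc:title></post>')
title = ET.fromstring(good).find("{http://purl.org/dc/elements/1.1/}title")
print(title.text)  # Hello
```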

> I don't think you understand the base concept of XML much.

Pot, meet kettle.

> XML in and of itself doesn't enforce node naming. Sure if you are talking about the official spec it does

Don't you feel like you're contradicting yourself a bit there?

> Well maybe you should look into a parser that just parses as is without attempting to use some specific encoding.

So he should look into parsers which do not parse XML and have no issue mangling the content? What are they going to do, assume the encoding is ascii-compatible anyway and go to town? How wonderfully anglo-centric.
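To make that concrete: here's what ignoring the declared encoding does to a perfectly valid document (a Python sketch; the content is invented):

```python
import xml.etree.ElementTree as ET

# A valid document whose declaration says latin-1: the "é" in "café"
# is a single 0xE9 byte, which is not valid UTF-8
raw = ('<?xml version="1.0" encoding="iso-8859-1"?>'
       '<p>café</p>').encode("iso-8859-1")

# An XML parser honours the encoding declaration and recovers the text
text = ET.fromstring(raw).text
print(text)  # café

# "Just parse it as is", assuming some ASCII-compatible default, blows up
decode_error = ""
try:
    raw.decode("utf-8")
except UnicodeDecodeError as e:
    decode_error = str(e)
print(decode_error)
```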

> Check out XML::Bare on cpan for perl.

XML::Bare is an XML parser in the same sense that XHTML interpreted as text/html is an XML document: not in any way, shape or form. And if that's what you're shooting for, don't pretend to suggest an XML parser; suggest a recovering "soup" parser instead, something like html5lib or BeautifulSoup.
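The difference is easy to demonstrate with stdlib tools alone (a sketch; `html.parser` stands in for the soup parsers named above, and the broken document is invented):

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

broken = "<post><title>Unclosed <b>tag</post>"

# A real XML parser refuses the mismatched nesting outright
well_formed = True
try:
    ET.fromstring(broken)
except ET.ParseError:
    well_formed = False
print(well_formed)  # False

# A recovering soup-style parser shrugs and extracts what it can
class TextCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

collector = TextCollector()
collector.feed(broken)
recovered = "".join(collector.chunks)
print(recovered)  # Unclosed tag
```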

But herein lies the issue: I expect Posterous advertised their export as XML files, not as "encoding-deficient tag soup" (which it apparently is). I'm sure the article's author would have had no such expectations had he been told he was getting garbage in, and would have relied on tagsoup parsing and encoding guessing (using whatever libraries are available for that in his language of choice).

As it stands, he did have the pretty basic and undemanding expectation that he could shove supposedly-XML files into an XML parser and get data.




You seem to know a lot about the XML specification. More than your parent and certainly more than me. That's great, and following specifications is good and all, but citing the spec as requiring that "a namespace-aware XML parser must fault on undeclared namespaces" does not give me any sense for why I would want it to. Put another way - what does the namespace declaration and halting error due to its omission accomplish for me?

Failing to define a content type is obviously dumb, but I can't seem to get riled up about leaving off namespace declarations.


> That's great, and following specifications is good and all, but citing the spec as requiring that "a namespace-aware XML parser must fault on undeclared namespaces" does not give me any sense for why I would want it to.

That's not really relevant though. XML is bondage and discipline.

> Put another way - what does the namespace declaration and halting error due to its omission accomplish for me?

Depends. You could think "nothing", which is basically the same mindset as using a tagsoup parser for "xml" documents: as long as you can get stuff out of it, do you really care?

The other way to think about it, and the way espoused by most XML specs (and really most serialization specs at all), is that if something goes wrong, what guarantee is there that anything else is right? If a namespace prefix is undefined, is it because nobody cares, because there's a typo, or because an unsafe transport flipped some bytes? The parser can't know, so, as is generally done in the XML world, the spec says to just stop and not risk fucking things up (as it does when nesting is invalid, attributes are unquoted or decoding blows up).

What that accomplishes is the assurance that the document was correct as far as an XML parser is concerned, I guess?

If you don't care, you're free to use a tagsoup parser in the first place, after all.

> I can't seem to get riled up about leaving off namespace declarations.

I see it more from a "canary in a coal mine" point of view: the namespace declarations not being there hints that either they're using a non-namespace-aware serializer (unlikely) or they're not using an XML serializer at all, and here dragons lurk. In this case it's been confirmed that they're using ERB text templates to generate their XML, which means the document could be all kinds of fucked up with improper escaping, invalid characters and the like.

Meaning maybe the export can't be trusted to have exported my data without fucking it up.
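That failure mode is easy to reproduce: text-templating XML (the ERB approach) versus letting a serializer escape on the way out (a Python sketch; the post title is invented):

```python
import xml.etree.ElementTree as ET

title = "Fish & Chips <part 2>"

# Text templating, ERB-style: the raw & and < make the output ill-formed
templated = "<post><title>%s</title></post>" % title
parses = True
try:
    ET.fromstring(templated)
except ET.ParseError:
    parses = False
print(parses)  # False

# An actual XML serializer escapes as it writes, so the data round-trips
root = ET.Element("post")
ET.SubElement(root, "title").text = title
serialized = ET.tostring(root, encoding="unicode")
print(serialized)  # <post><title>Fish &amp; Chips &lt;part 2&gt;</title></post>
roundtripped = ET.fromstring(serialized).find("title").text
print(roundtripped == title)  # True
```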


A central idea of XML was that - in reaction to the mess that was HTML - any tool that calls itself XML MUST barf loudly on anything that is not XML, so that you could never have a situation where one tool is happily calling something XML and another tool barfs on it. (Because, given the choice, humans will regularly mess it up but not notice unless their tool tells them so, and what works for one tool won't for another.)

The requirement of strictness everywhere is a precondition for interoperability between diverse toolsets.


"... what does the namespace declaration and halting error due to its omission accomplish for me?"

It encourages generation of proper XML.

If consumers accept invalid XML, guessing at what it's supposed to mean, then producers will become sloppier over time, since there's no penalty for failing to follow the specification. Eventually producers will be so sloppy that consumers will no longer be able to make meaningful guesses.



