
Don't you also have to escape stuff in XML? Like &gt;, which is even worse.



Yes, though many languages have lenient parsers. Most browser parsers, for example, will probably only be lenient if parsing "HTML."

    new XMLSerializer().serializeToString(new DOMParser().parseFromString("<a>hello < </a>", "text/html")) 
Run in my browser console, the above parses without complaint, as expected for HTML. And again, entities are a very dangerous part of XML and friends.

You are correct that if you tell it that the input is XML, the browser will throw it back at you, just as the JSON parser will barf on JSON.parse("{'test':'value'}").
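For what it's worth, the JSON half of that is easy to check in the same console. A minimal sketch, where `tryParse` is a made-up helper name for illustration and not a built-in:

```javascript
// Hypothetical helper: wrap JSON.parse so a failure is reported instead of thrown.
function tryParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: err.name };
  }
}

// Single-quoted strings are not part of the JSON grammar, so this is rejected.
const singleQuoted = tryParse("{'test':'value'}");

// The same data with double quotes parses fine.
const doubleQuoted = tryParse('{"test":"value"}');

console.log(singleQuoted.ok, singleQuoted.error); // false "SyntaxError"
console.log(doubleQuoted.ok, doubleQuoted.value.test); // true "value"
```

The wrapper is only there so both cases can be printed side by side; calling JSON.parse directly on the single-quoted string throws.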


per specifications, json parsing is not lenient, html parsing is lenient


Right, and amusingly, more than a few json parsers are very lenient in this. That or folks abandon ship fairly quickly and go for another spec that is far more friendly.


well json definitely does not accept `{'test':'value'}` as valid input

any parser that behaves otherwise is pretty clearly buggy

json has many problems but parsing ambiguity is not really one of them


Methinks you have never looked at the field. I'd as soon declare CSV an error-free format. That's only true if you ignore the proliferation of applications that get it wrong. In subtle ways, often. Still wrong.


csv is wildly ambiguous, to the frustration of ~every data science engineer in industry

json is not

show me an application that parses `{'a':'b'}` as valid JSON, i'm actually interested, probably there are some which exist, but there is no ambiguity about those applications being wrong



fun doc! it lists many of the undefined behaviors of the spec, and many of the problems in common parsers

afaict none of them permit keys or value strings to be expressed with single quotes


Apologies for the, in retrospect, somewhat lazy posting of an article with no comment. I thought that article had a section about how many of them allow single quotes if you don't "enable strict." I am not seeing it on review, though; so either I made that up in my mind, or I'm remembering another article. Either way, apologies.

I did find https://github.com/json5/json5 on a quick search, which basically says what I asserted about people just jumping to another standard for things that you write by hand. I was probably also thinking heavily about Python's dict syntax. (And I confess, I still don't know when to use single versus double quotes in Python...)


no worries mate


To be pedantic, html parsing is not lenient, it is unambiguously specified.


if that were true then browsers would refuse to render text/html responses that didn't include a closing </html> tag, i guess


No, because the closing </html> tag can be omitted according to the current HTML spec. See https://html.spec.whatwg.org/#optional-tags


this is exactly my point

html is not precisely defined


Sorry I don't understand your argument. HTML is fully and unambiguously defined, as you can see if you follow the link. Some tags are optional in certain contexts, but this is also precisely defined.


I think you're missing the point that it is defined. The current HTML5 spec says that <title> implies the existence of <head>, <body> implies the end of <head>, tags that belong in the body imply both the end of <head> and the start of <body>, etc.

HTML5 is not XHTML.

<!DOCTYPE html> <title>Title</title> <h1>Heading

expands to

<!DOCTYPE html> <head><title>Title</title></head> <body><h1>Heading</h1></body>
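You can see the implied tags for yourself in a browser console. This is a rough sketch, assuming a browser environment where DOMParser exists (it is not defined in plain Node, so the helper returns null there). `impliedMarkup` is a name I made up for illustration. I've written the closing </title> explicitly in the input, since the end tag of <title> is not in the spec's list of optional tags.

```javascript
// Hypothetical helper: parse the source as text/html and serialize the tree
// the parser built. Returns null when DOMParser is unavailable (e.g. in Node).
function impliedMarkup(source) {
  if (typeof DOMParser === "undefined") return null;
  const doc = new DOMParser().parseFromString(source, "text/html");
  return doc.documentElement.outerHTML;
}

const result = impliedMarkup("<!DOCTYPE html> <title>Title</title> <h1>Heading");

// In a browser this logs the full tree, with the html, head and body
// elements that the input never spelled out.
console.log(result);
```

Every conforming parser has to build the same tree from that input, which is the sense in which the omitted tags are defined rather than guessed.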


if `<title>A <h1>Heading` is equivalent to `<head><title>A</title></head> <body><h1>Heading</h1></body>` then this means the language is not precisely defined



