
> Most of the web quickly moved to XHTML.

Most of the web didn't move to XHTML.

A lot of people who were interested in being standards compliant moved to XHTML 1.0 Transitional, which was the HTML-compatibility subset, but they only ever served and validated it as HTML, not XHTML. If you served it as XHTML, a single stray < that someone had forgotten to quote somewhere would break the parsing of the whole page.

The piece written by Hixie was influential because it was a wake-up call: the direction the standards bodies were going in was pretty much fruitless, and there was a much better way to do it, one that wouldn't involve breaking compatibility with all of the existing content and would give web developers and users features that they actually wanted.

> As web developers, we should follow the WHATWG and ignore the W3C, because the W3C have lost the political battle for HTML and we need to get our stuff working on browsers, all of whom follow WHATWG. But that's an unfortunately pragmatic approach that shouldn't amount to acceptance.

I fail to see how there is anything unfortunate about this. What about rewriting everything in XHTML 2.0 (https://www.w3.org/TR/2010/NOTE-xhtml2-20101216/), and having to be extremely conscious of any possible stray < that could sneak into a page without being quoted, would have been preferable to:

1. Consistent parsing support for existing content, and content that might have slight problems like stray <, in all browsers

2. Standardization of things that people actually use to build web apps, like XMLHttpRequest and Canvas

3. Consistent handling of encodings between browsers, including encoding sniffing

4. Consistent handling of quirks mode vs. standards mode between browsers

5. Browsers actually supporting compatibility with vendor-prefixed versions of features, because some widely used browsers introduced prefixed features that web developers had started relying upon

And also, have you ever tried getting involved with the WHATWG process? I have, and I find that they are very receptive to intelligent discussion of issues.

What doesn't work well is to insist that you have a problem and that one particular solution must be used to address it. A lot of the time it's easy to come up with a proposed solution, but it then turns out that it's much more complex in practice, that it doesn't fit in well with the rest of the ecosystem, or that the problem can actually be solved in tooling on top of HTML, without having to change the spec at all and then wait for multiple browser vendors to all independently implement it.




> and having to be extremely conscious of any possible stray < that could sneak in to a page without being quoted, would have been preferable to:

Any system that publishes content that would let this kind of thing pass is incredibly insecure and shouldn't be on the internet. Today it's a stray <. Tomorrow it's a stray <script>.

It's no wonder software is where it is today with attitudes like these.


Not if that < had slipped in because it was in a piece of static text in a string somewhere in the source code.

You can apply mandatory quoting to untrusted input all you want, but there are going to be times when you have trusted strings that can still contain stray characters that will make the resulting markup invalid. And in many cases you don't want to have mandatory quoting for all of that, because these strings may have markup you want to include.

And yeah, you can argue that instead of generating content by appending strings, you should be building up a proper type-safe DOM structure that can be serialized. I'll wait while you go boil the ocean of converting every single web application framework that exists now outside of a couple of obscure type-safe functional programming frameworks, and in the meantime I'll be able to browse the real web without every other page giving me validation errors.
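To make the failure mode concrete, here's a small Rust sketch (hypothetical names) of the kind of string-concatenation templating that's everywhere today; the interpolated string is completely trusted, yet the output is still ill-formed XML:

  // Naive templating: trusted strings are pasted in verbatim.
  fn render(title: &str) -> String {
      format!("<h1>{title}</h1>")
  }

  fn main() {
      // "Results for n < 100" is trusted static copy, not user input,
      // but the bare < makes the output ill-formed as XML. A strict
      // XHTML parser would reject the whole page; an HTML parser
      // recovers and renders the text.
      println!("{}", render("Results for n < 100"));
  }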


To be fair, I only use obscure type-safe functional programming frameworks. That's what I'm employed to do, and this obviously impacts my feelings on the matter. Personally, I think it's irresponsible to use anything that could be this unsafe. This doesn't mean everyone needs to use FP, just that frameworks and libraries should be chosen so as to guarantee safety. There are easy-to-use libraries for all these things in every language.

In no other field of engineering is this attitude okay. If you were a civil engineer and had to hold a license to practice due to the danger your designs could present to society, this attitude would eventually cause you to lose your ability to practice. It's becoming more and more clear that software can have similar levels of impact, and software engineers should practice as such.


I agree with you that we do need to do better at writing more robust software, and type-safe languages are a good way to do that.

But what you're saying is like suggesting that, because the metric system is more consistent and more widely used than English units, I as a bolt distributor should start selling my bolts in metric sizes, despite the fact that the nuts everyone already has are in English sizes.

The browser vendors, at least, are working on implementing their browsers in more type-safe languages (https://github.com/servo/servo), but even then they have to work with content produced by thousands of different languages, frameworks, and tools, and millions of hand-written HTML files, templates, and the like. Just turning on strict XML parsing doesn't make that go away; it just makes your browser fail on most websites.


A good first step in enforcing web standards would be if browsers would detect these rule violations, and -- instead of failing -- put a giant banner on the top of the page warning end users that the site may be compromised and could compromise data.

Soon, every business will be clamoring to fix their buggy software, and users will still be able to access the unsafe websites they so desire.


You can be unsafe even with type-safe builders. See:

  fn build(text_to_show: &str) -> HTML { HTML(Body(H1(text_to_show))) }

What if text_to_show wasn't sanitized? You've got yourself an XSS. And if you do sanitize it (and keep it in a StrSanitized type), what are the chances of an accidental XSS?

Really, what should have been done is a "user supplied tag", which automatically displays everything as plain text, like <user-supplied id="ahdjdh37736xhdhd"> Content </user-supplied id="ahdjdh37736xhdhd">
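Roughly like this Rust sketch (the tag name and the id-carrying closing delimiter are made up; no browser parses anything like this today, and real HTML closing tags can't carry attributes, so it would need a new parsing rule):

  use std::collections::hash_map::RandomState;
  use std::hash::{BuildHasher, Hasher};

  // The random id acts like a heredoc delimiter: untrusted content
  // can't forge the closing tag because it can't guess the id. (A real
  // implementation would use a proper CSPRNG; RandomState is just a
  // dependency-free stand-in.)
  fn wrap_user_content(content: &str) -> String {
      let id = RandomState::new().build_hasher().finish();
      format!("<user-supplied id=\"{id:x}\">{content}</user-supplied id=\"{id:x}\">")
  }

  fn main() {
      // Even a literal </user-supplied> inside the content can't close
      // the element, because it lacks the matching id.
      println!("{}", wrap_user_content("Hi <script>alert(1)</script>"));
  }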


You would generally want the general-purpose string type in your language to always be escaped when serializing, and only allow avoiding that if you opt in explicitly.

So, for instance, you'd have an H1::new(contents: TextNode) constructor, and you'd have to build a TextNode; if you build TextNode::new(text: &str), then it would escape it. If you wanted to explicitly pass in raw HTML, then you'd need something like HTMLFragment::from_str(&str), which would parse it and return an appropriately typed fragment object that could then be used to build larger fragments.

There might be some way to unsafely opt out, like HTMLFragment::from_str_raw(&str), that would give a node that, when traversed, would be dumped raw into the output; but that would be warned against and only used if you wanted to avoid the cost of parsing and re-serializing some large, known-safe fragment. It wouldn't be what you would normally use.
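To make that concrete, here's a minimal Rust sketch of the escape-by-default design (TextNode, H1, and so on are the hypothetical names from above, not any real library's API):

  struct TextNode(String);

  impl TextNode {
      // The general-purpose constructor always escapes.
      fn new(text: &str) -> Self {
          TextNode(
              text.replace('&', "&amp;")
                  .replace('<', "&lt;")
                  .replace('>', "&gt;"),
          )
      }
  }

  struct H1(TextNode);

  impl H1 {
      fn new(contents: TextNode) -> Self {
          H1(contents)
      }

      fn serialize(&self) -> String {
          format!("<h1>{}</h1>", (self.0).0)
      }
  }

  fn main() {
      // Untrusted input is inert by construction: the only way to get a
      // TextNode is through the escaping constructor.
      let h1 = H1::new(TextNode::new("1 < 2 <script>alert(1)</script>"));
      println!("{}", h1.serialize());
  }

The key point is that escaping happens at the type boundary, so forgetting to escape isn't a mistake an individual call site can make.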


Your builder isn't really using types to guarantee safety. You can write untyped programs in a strongly typed language, by just coercing everything to strings, but this isn't what I mean when I say 'type-safety'.



