
> Most of the web quickly moved to XHTML.

Most of the web didn't move to XHTML.

A lot of people who were interested in being standards compliant moved to XHTML 1.0 Transitional, which was the HTML-compatibility subset, but they only ever served and validated it as HTML, not XHTML. If you served it as XHTML, a single stray < that someone had forgotten to quote somewhere would break the parsing of the whole page.

The piece written by Hixie was influential because it was a wake-up call: the direction the standards bodies were going in was pretty much fruitless, and there was a much better way to do it, one that wouldn't involve breaking compatibility with all of the existing content and would give web developers and users features that they actually wanted.

> As web developers, we should follow the WHATWG and ignore the W3C, because the W3C have lost the political battle for HTML and we need to get our stuff working on browsers, all of whom follow WHATWG. But that's an unfortunately pragmatic approach that shouldn't amount to acceptance.

I fail to see how there is anything unfortunate about this. What about rewriting everything in XHTML 2.0 (https://www.w3.org/TR/2010/NOTE-xhtml2-20101216/), and having to be extremely conscious of any possible stray < that could sneak into a page without being quoted, would have been preferable to:

1. Consistent parsing support for existing content, and content that might have slight problems like stray <, in all browsers

2. Standardization of things that people actually use to build web apps, like XMLHttpRequest and Canvas

3. Consistent handling of encodings between browsers, including encoding sniffing

4. Consistent handling of quirks mode vs. standards mode between browsers

5. Browsers actually supporting compatibility with vendor-prefixed versions of features, because some widely used browsers introduced prefixed features that web developers had started relying upon

And also, have you ever tried getting involved with the WHATWG process? I have, and I find that they are very receptive to intelligent discussion of issues.

What doesn't work well is to insist that you have a problem and that one particular solution must be used to address it. A lot of the time it's easy to come up with a proposed solution, but it then turns out that it's much more complex in practice, that it doesn't fit in well with the rest of the ecosystem, or that the problem can actually be solved in tooling on top of HTML, without having to change the spec at all and then wait for multiple browser vendors to all independently implement it.




> and having to be extremely conscious of any possible stray < that could sneak in to a page without being quoted, would have been preferable to:

Any system that publishes content that would let this kind of thing pass is incredibly insecure and shouldn't be on the internet. Today it's a stray <. Tomorrow it's a stray <script>.

It's no wonder software is where it is today with attitudes like these.


Not if that < had slipped in because it was in a piece of static text in a string somewhere in the source code.

You can apply mandatory quoting to untrusted input all you want, but there are going to be times when you have trusted strings that can still contain stray characters that will make the resulting markup invalid. And in many cases you don't want to have mandatory quoting for all of that, because these strings may have markup you want to include.

And yeah, you can argue that instead of generating content by appending strings, you should be building up a proper type-safe DOM structure that can be serialized. I'll wait while you go boil the ocean of converting every single web application framework that exists now outside of a couple of obscure type-safe functional programming frameworks, and in the meantime I'll be able to browse the real web without every other page giving me validation errors.
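To make the failure mode concrete, here's a small Rust sketch (hypothetical names) of the kind of string-concatenation templating that's everywhere today; the interpolated string is completely trusted, yet the output is still ill-formed XML:

  // Naive templating: trusted strings are pasted in verbatim.
  fn render(title: &str) -> String {
      format!("<h1>{title}</h1>")
  }

  fn main() {
      // "Results for n < 100" is trusted static copy, not user input,
      // but the bare < makes the output ill-formed as XML. A strict
      // XHTML parser would reject the whole page; an HTML parser
      // recovers and renders the text.
      println!("{}", render("Results for n < 100"));
  }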


To be fair, I only use obscure type-safe functional programming frameworks. That's what I'm employed to do, and this obviously impacts my feelings on the matter. Personally, I think it's irresponsible to use anything that could be this unsafe. This doesn't mean everyone needs to use FP, just that frameworks and libraries should be chosen so as to guarantee safety. There are easy-to-use libraries for all these things in every language.

In no other field of engineering is this attitude okay. If you were a civil engineer and had to hold a license to practice due to the danger your designs could present to society, this attitude would eventually cause you to lose your ability to practice. It's becoming more and more clear that software can have similar levels of impact, and software engineers should practice as such.


I agree with you that we do need to do better at writing more robust software, and type-safe languages are a good way to do that.

But what you're saying is like suggesting that, because the metric system is more consistent and more widely used than English units, I as a bolt distributor should start selling my bolts in metric sizes, despite the fact that the nuts everyone already has are in English sizes.

The browser vendors, at least, are working on implementing their browsers in more type-safe languages (https://github.com/servo/servo), but even then they have to work with content produced by thousands of different languages, frameworks, and tools, and millions of hand-written HTML files, templates, and the like. Just turning on strict XML parsing doesn't make that go away; it just makes your browser fail on most websites.


A good first step in enforcing web standards would be if browsers would detect these rule violations, and -- instead of failing -- put a giant banner on the top of the page warning end users that the site may be compromised and could compromise data.

Soon, every business will be clamoring to fix their buggy software, and users will still be able to access the unsafe websites they so desire.


You can be unsafe even with type-safe builders. See:

  fn build(text_to_show: &str) -> HTML { HTML(Body(H1(text_to_show))) }

What if text_to_show wasn't sanitized? You've got yourself an XSS. And if you do sanitize it (and keep it in a StrSanitized type), what are the chances of an accidental XSS?

Really, what should have been done is a "user supplied tag", which automatically displays everything as plain text, like <user-supplied id="ahdjdh37736xhdhd"> Content </user-supplied id="ahdjdh37736xhdhd">
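Roughly like this Rust sketch (the tag name and the id-carrying closing delimiter are made up; no browser parses anything like this today, and real HTML closing tags can't carry attributes, so it would need a new parsing rule):

  use std::collections::hash_map::RandomState;
  use std::hash::{BuildHasher, Hasher};

  // The random id acts like a heredoc delimiter: untrusted content
  // can't forge the closing tag because it can't guess the id. (A real
  // implementation would use a proper CSPRNG; RandomState is just a
  // dependency-free stand-in.)
  fn wrap_user_content(content: &str) -> String {
      let id = RandomState::new().build_hasher().finish();
      format!("<user-supplied id=\"{id:x}\">{content}</user-supplied id=\"{id:x}\">")
  }

  fn main() {
      // Even a literal </user-supplied> inside the content can't close
      // the element, because it lacks the matching id.
      println!("{}", wrap_user_content("Hi <script>alert(1)</script>"));
  }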


You would generally want the general-purpose string type in your language to always be escaped when serializing, and only allow avoiding that if you opt in explicitly.

So, for instance, you'd have an H1::new(contents: TextNode) constructor, and you'd have to build a TextNode; if you build TextNode::new(text: &str), then it would escape it. If you wanted to explicitly pass in raw HTML, then you'd need something like HTMLFragment::from_str(&str), which would parse it and return an appropriately typed fragment object that could then be used to build larger fragments.

There might be some way to unsafely opt out, like HTMLFragment::from_str_raw(&str), that would give a node that, when traversed, would be dumped raw into the output; but that would be warned against and only used if you wanted to avoid the cost of parsing and re-serializing some large, known-safe fragment. It wouldn't be what you would normally use.
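To make that concrete, here's a minimal Rust sketch of the escape-by-default design (TextNode, H1, and so on are the hypothetical names from above, not any real library's API):

  struct TextNode(String);

  impl TextNode {
      // The general-purpose constructor always escapes.
      fn new(text: &str) -> Self {
          TextNode(
              text.replace('&', "&amp;")
                  .replace('<', "&lt;")
                  .replace('>', "&gt;"),
          )
      }
  }

  struct H1(TextNode);

  impl H1 {
      fn new(contents: TextNode) -> Self {
          H1(contents)
      }

      fn serialize(&self) -> String {
          format!("<h1>{}</h1>", (self.0).0)
      }
  }

  fn main() {
      // Untrusted input is inert by construction: the only way to get a
      // TextNode is through the escaping constructor.
      let h1 = H1::new(TextNode::new("1 < 2 <script>alert(1)</script>"));
      println!("{}", h1.serialize());
  }

The key point is that escaping happens at the type boundary, so forgetting to escape isn't a mistake an individual call site can make.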


Your builder isn't really using types to guarantee safety. You can write untyped programs in a strongly typed language, by just coercing everything to strings, but this isn't what I mean when I say 'type-safety'.



