Hacker News

Those specifications from 1997 are still relevant. That's why we end up with things like quirks mode:


And on the subject of the WHATWG, all of its specs were excluded from the word count, as were the JavaScript spec and nearly all of the JavaScript APIs browsers are implementing. Things omitted include WebGL, Web Bluetooth and Web USB, the native filesystem API, WebXR, the Speech APIs... And the informative notes you mentioned are (1) a rounding error when compared to the specs, and (2) also included in the word counts for POSIX, C11, and so on.

And the word count I gave in the article is half of the real count I ended up with, and I didn't even finish downloading all of the specs to consider.

My full write-up on the methodology is here:


Anyone who thinks that the web isn't hundreds or thousands of times more complicated than almost anything else out there is lying to themselves.

I just poked through 50+ of the first ~4000 things in that list. Of them, every single one was either

- unrelated to an actual web standard, such as a guide for authors of web pages (www.w3.org/TR/html5-author/dimension-attributes.html) or a guide on how to create a PDF for a W3C event (https://www.w3.org/TR/2016/NOTE-WCAG20-TECHS-20160317/pdf_no...)

- a raw xml file (www.w3.org/TR/2012/WD-its20-20121023/examples/xml/EX-locale-filter-selector-2.xml)

- a diff (www.w3.org/TR/prov-dm/diff.html)

- an error (www.w3.org/TR/unicode-xml/index.html)

None of them were actual signal relating to the web's specifications.

>such as a guide for authors of web pages

I explained why I included these in my methodology doc: the W3C felt they were necessary to document, so I counted them. The same is true of the other specs I compared against, such as POSIX.

>a raw xml file

This XML file is 18 words according to my measurement. The total words I claim in my article are 113 million. Do you really think that this changes anything?

>a diff

Okay, I should have caught that. There are ~700 of these, and I am now computing the difference they make to the word count. I expect it will fall within the >100M-word margin I left on these figures. [Edit: 28M words came from diffs, which eats up about 25% of the 100M-word budget I allocated for errors.]
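For what it's worth, the exclusion itself is mechanical once the corpus is on disk. A minimal sketch, assuming the specs have already been extracted to plain-text `.txt` files and that diffs are identifiable by filename (both assumptions of mine, not the author's stated pipeline):

```python
import re
from pathlib import Path

def word_count(text: str) -> int:
    # Whitespace-delimited tokens, a rough proxy for "words" in a spec.
    return len(re.findall(r"\S+", text))

def recount(root: Path) -> tuple[int, int]:
    """Return (words_excluding_diffs, words_in_diffs) over extracted
    plain-text files, treating any filename containing 'diff' as a diff."""
    total = diffs = 0
    for path in root.rglob("*.txt"):
        n = word_count(path.read_text(errors="ignore"))
        total += n
        if "diff" in path.name.lower():
            diffs += n
    return total - diffs, diffs
```

The filename heuristic would miss diffs that don't advertise themselves as such, so this bounds the correction rather than nailing it.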

>an error

123 words. See my XML comment.

Out of curiosity, is it your intention to also look for flaws in my approach to word-counting the non-web specs I compared against?

I don't think you fully appreciated my comment. I've now looked at 100+ documents from that list, and not a single one has had actual content related to a web standard.

I was finally able to find one, by looking elsewhere: https://www.w3.org/TR/css-grid-1/. You include 8 copies of the css-grid-1 standard in your count. So of the small fraction of documents that are actually web standards, you're miscounting by an order of magnitude. In other words, I expect that the actual count here is off by 2 orders of magnitude and that the real size of the "relevant" web standard is 1-2 million words, and the rest is just bad measurement.

> Out of curiosity, is it your intention to also look for flaws in my approach to word-counting the non-web specs I compared against?

No, I think pointing out a 2-3 order of magnitude mistake in your methodology speaks for itself.

> They felt this necessary to document, so I included it. The same is true of other specs I compared against, such as POSIX.

The POSIX spec includes examples and docs, yes. But so do the actual web specs (see again the css-grid spec doc). What the POSIX spec doesn't include is a parallel version of the docs meant entirely for POSIX users, wholly irrelevant to people who are building a POSIX shell. Again, you're counting a guide to which PDF readers to use when testing the accessibility of a PDF you're writing as part of an analysis of web standards.


For an even more egregious example, https://www.w3.org/TR/2013/CR-xpath-datamodel-30-20130108/ is one of eighty versions of the xpath datamodel spec that you count, and xpath isn't even an officially supported browser thing.
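The duplicate-version problem is also mechanically fixable: W3C /TR/ URLs embed a spec shortname, so dated snapshots can be collapsed to one entry per spec before counting. A rough sketch; the status prefixes and regexes below are my guesses at the URL conventions, not an exhaustive or official list:

```python
import re

# Dated snapshots look like /TR/2013/CR-xpath-datamodel-30-20130108/;
# the "latest version" form looks like /TR/css-grid-1/. Both name one spec.
DATED = re.compile(r"/TR/\d{4}/(?:WD|CR|CRD|PR|PER|REC|NOTE)-(.+)-\d{8}/?")
UNDATED = re.compile(r"/TR/([^/]+)/?$")

def shortname(url: str):
    """Extract the spec shortname from a W3C /TR/ URL, or None."""
    m = DATED.search(url)
    if m:
        return m.group(1)
    m = UNDATED.search(url)
    return m.group(1) if m else None

def dedupe(urls):
    """Keep the first URL seen per shortname, so each spec counts once."""
    seen = {}
    for u in urls:
        key = shortname(u)
        if key is not None and key not in seen:
            seen[key] = u
    return list(seen.values())
```

With something like this in the pipeline, eight copies of css-grid-1 or eighty of the xpath datamodel would collapse to one each before any words are counted.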

I think I was extremely generous with my margins and went to lengths to be selective with my inclusion criteria. I didn't even catalogue everything under those criteria, and I omitted huge swaths of web standards on the basis that (1) it was more forgiving to the W3C and (2) they would be difficult to compare on the same terms. At most you've given a credible suggestion that the count might be an order of magnitude off, but even if it were, it changes the conclusions very little. I explained all of that and more in my methodology document, and I stand by it. If you want to take the pains to come up with an objective measure yourself and provide a similar level of justification, I'm prepared to defer to your results, but not when all you have is anecdotes from vaguely scanning through my dataset looking for problems to cherry-pick.

No, I've given credible reasons for two orders of magnitude:

1. The majority of the documents you are including are not reasonably considered web standards

2. Of those that are, you are counting each one 5-50 times.

That's two orders of magnitude.

All your analysis has proven is that it's (ironically) difficult to machine-parse the W3C data, and that you did so in a way that justifies your preconceptions.
