And the word count I gave in the article is half of the real count I ended up with, and I didn't even finish downloading all of the specs to consider.
My full write-up on the methodology is here:
Anyone who thinks that the web isn't hundreds or thousands of times more complicated than almost anything else out there is lying to themselves.
- Unrelated to an actual web standard, such as a guide for authors of web pages (www.w3.org/TR/html5-author/dimension-attributes.html) or a guide on how to create a PDF for a W3 event (https://www.w3.org/TR/2016/NOTE-WCAG20-TECHS-20160317/pdf_no...)
- a raw xml file (www.w3.org/TR/2012/WD-its20-20121023/examples/xml/EX-locale-filter-selector-2.xml)
- a diff (www.w3.org/TR/prov-dm/diff.html)
- an error (www.w3.org/TR/unicode-xml/index.html)
None of these are actual signal relating to the web's specifications.
I explained why I included these in my methodology doc. These felt necessary to document, so I included them. The same is true of other specs I compared against, such as POSIX.
> a raw xml file
This XML file is 18 words according to my measurement. The total words I claim in my article are 113 million. Do you really think that this changes anything?
Okay, I should have caught that. There are ~700 of these and I am computing the difference these make to the word count now. I expect it will be within the >100M word margin I left on these figures. [Edit: 28M words from diffs, which eats up about 25% of the 100M word budget I allocated for errors]
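The recomputation above can be sketched like this (a minimal sketch: the URLs and word counts are made-up placeholders, and the "diff page" heuristic is my own assumption, not the actual methodology):

```python
# Hypothetical corpus of (url, word_count) pairs -- not the real dataset.
corpus = [
    ("www.w3.org/TR/prov-dm/diff.html", 40000),
    ("www.w3.org/TR/prov-dm/", 38000),
    ("www.w3.org/TR/css-grid-1/", 90000),
]

def is_diff(url: str) -> bool:
    # Crude heuristic: flag any page whose last path segment mentions "diff".
    name = url.rstrip("/").rsplit("/", 1)[-1]
    return "diff" in name.lower()

# Words contributed by diff pages vs. the grand total.
diff_words = sum(n for url, n in corpus if is_diff(url))
total_words = sum(n for _, n in corpus)
```

Subtracting `diff_words` from `total_words` gives the adjusted figure to check against the error budget.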
123 words. See my XML comment.
Out of curiosity, is it your intention to also look for flaws in my approach to word-counting the non-web specs I compared against?
I was finally able to find one, by looking elsewhere: https://www.w3.org/TR/css-grid-1/. You include 8 copies of the css-grid-1 standard in your count. So of the small fraction of documents that are actually web standards, you're miscounting by an order of magnitude. In other words, I expect that the actual count here is off by 2 orders of magnitude and that the real size of the "relevant" web standard is 1-2 million words, and the rest is just bad measurement.
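A quick way to see the duplication: collapse dated /TR/ URLs down to a spec shortname and count each spec once. This is a sketch; the pattern `/TR/<year>/<STATUS>-<shortname>-<YYYYMMDD>/` is the usual W3C dated-URL convention, but real data will have exceptions.

```python
import re

def shortname(url: str) -> str:
    # Dated snapshot, e.g. /TR/2013/CR-xpath-datamodel-30-20130108/
    m = re.search(r"/TR/\d{4}/[A-Z]+-(.+)-\d{8}/?", url)
    if m:
        return m.group(1)
    # Undated "latest version" URL, e.g. /TR/css-grid-1/
    m = re.search(r"/TR/([^/]+)/?", url)
    return m.group(1) if m else url

urls = [
    "https://www.w3.org/TR/2013/CR-xpath-datamodel-30-20130108/",
    "https://www.w3.org/TR/xpath-datamodel-30/",
    "https://www.w3.org/TR/css-grid-1/",
]
# Three URLs, but only two distinct specs.
distinct = {shortname(u) for u in urls}
```

Deduplicating like this before summing word counts is what keeps each spec from being counted once per published snapshot.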
> Out of curiosity, is it your intention to also look for flaws in my approach to word-counting the non-web specs I compared against?
No, I think pointing out a 2-3 order of magnitude mistake in your methodology speaks for itself.
> They felt this necessary to document, so I included it. The same is true of other specs I compared against, such as POSIX.
The POSIX spec includes examples and docs, yes. But so do the actual web specs (see again the css-grid spec). What the POSIX spec doesn't include is a parallel version of the docs meant entirely for POSIX users, which is wholly irrelevant to people building a POSIX shell. Again, in an analysis of web standards, you're including a guide to which PDF readers to use when testing the accessibility of a PDF you're writing.
For an even more egregious example, https://www.w3.org/TR/2013/CR-xpath-datamodel-30-20130108/ is one of eighty versions of the xpath datamodel spec that you count, and xpath isn't even an officially supported browser thing.
1. The majority of the documents you are including are not reasonably considered web standards
2. Of those that are, you are counting each one 5-50 times.
That's two orders of magnitude.
All your analysis has proven is that it's (ironically) difficult to machine-parse the W3C data, and that you did so in a way that justifies your preconceptions.