

Tim Bray on the style chosen for writing the HTML5 and WebSockets standards - bensummers
http://www.tbray.org/ongoing/When/201x/2010/02/15/HTML5

======
axod
>> So I went and read the Web Socket protocol and my reaction was more or less
the opposite. I like the protocol and I gather it’s already been implemented
and works. But I found the spec hard to read, amazingly long and complex for
such an admirably simple protocol, and missing information that seemed
important.

Completely agree. I found it very hard to get to the 'meat' of what the data
format actually is.

Here's a sample from the spec:

    
    
       The server must run through the following steps to process the bytes
       sent by the client.  If at any point during these steps a read is
       attempted but fails because the Web Socket connection is closed, then
       abort.
    
       1.  _Frame_: Read a byte from the client.  Let /type/ be that byte.
    
       2.  If /type/ is not a 0x00 byte, then the server may disconnect from
           the client.
    
       3.  If the most significant bit of /type/ is not set, then run the
           following steps:
    
           1.  Let /raw data/ be an empty byte array.
    
           2.  _Data_: Read a byte, let /b/ be that byte.
    
           3.  If /b/ is not 0xFF, then append /b/ to /raw data/ and return
               to the previous step (labeled _data_).
    
           4.  Interpret /raw data/ as a UTF-8 string, and apply whatever
               server-specific processing is to occur for the resulting
               string (the message from the client).
    
           Otherwise, the most significant bit of /type/ is set.  Run the
           following steps.  This can never happen if /type/ is 0x00, and
           therefore these steps are not necessary if the server aborts when
           /type/ is not 0x00, as allowed above.
    
           5.   Let /length/ be zero.
    
           6.   _Length_: Read a byte, let /b/ be that byte.
    
           7.   Let /b_v/ be integer corresponding to the low 7 bits of /b/
                (the value you would get by _and_ing /b/ with 0x7F).
    
           8.   Multiply /length/ by 128, add /b_v/ to that result, and
                store the final result in /length/.
    
           9.   If the high-order bit of /b/ is set (i.e. if /b/ _and_ed
                with 0x80 returns 0x80), then return to the step above
                labeled _length_.
    
           10.  Read /length/ bytes.
    
           11.  Discard the read bytes.
    
       4.  Return to the step labeled _frame_.
    
    

I think this could be written better as:

    
    
      Data is framed either as a 0xff-terminated UTF-8 string
      (if the first byte of the frame has a 0 high bit) or as
      a length-prefixed byte array (if the first byte has a
      1 high bit). The length uses a 7-bit variable-length
      encoding, most significant group first, with the last
      length byte having a 0 high bit.
    
      Example: 0x00 | 0x01 0x02 0x03 0x04 | 0xff
      UTF-8 string (0x01,0x02,0x03,0x04)
    
      Example: 0x80 | 0x82 0x03 | 0x55 0xaa 0x55 ...
      Frame type 0x80, data length = 0x103 (259), data bytes follow.
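
To make the comparison concrete, here is a minimal Python sketch of the framing as summarized above. It is not taken from the spec: the names `decode_frames` and `encode_length` are made up for illustration, it assumes the whole byte stream is already in memory, and it assumes a frame-type byte (0x80 here) precedes the length bytes. The quoted spec text discards length-prefixed data; this sketch returns it so the result can be inspected.

```python
def encode_length(n: int) -> bytes:
    """Encode n as 7-bit groups, most significant first.

    Every byte except the last has its high bit set.
    (Hypothetical helper, named here for illustration only.)
    """
    if n < 0:
        raise ValueError("length must be non-negative")
    groups = [n & 0x7F]          # last byte: high bit clear
    n >>= 7
    while n:
        groups.append((n & 0x7F) | 0x80)  # earlier bytes: high bit set
        n >>= 7
    return bytes(reversed(groups))


def decode_frames(data: bytes):
    """Split a complete byte string into (kind, payload) frames.

    kind is "text" for 0xFF-terminated UTF-8 frames and "binary" for
    length-prefixed frames. Assumes no partial frames at the end.
    """
    frames = []
    i = 0
    while i < len(data):
        frame_type = data[i]
        i += 1
        if frame_type & 0x80 == 0:
            # High bit clear: read up to the 0xFF terminator.
            # 0xFF never occurs inside valid UTF-8, so the first one ends the frame.
            end = data.index(0xFF, i)  # ValueError if the frame is unterminated
            frames.append(("text", data[i:end].decode("utf-8")))
            i = end + 1
        else:
            # High bit set: accumulate the length 7 bits at a time.
            length = 0
            while True:
                b = data[i]
                i += 1
                length = length * 128 + (b & 0x7F)
                if b & 0x80 == 0:
                    break
            if i + length > len(data):
                raise ValueError("truncated length-prefixed frame")
            frames.append(("binary", data[i:i + length]))
            i += length
    return frames
```

For example, `decode_frames(bytes([0x00, 0x68, 0x69, 0xFF]))` yields one text frame containing `"hi"`, and `encode_length(259)` yields the two bytes 0x82 0x03. A real server would read incrementally from the socket and, as the spec allows, could simply disconnect on any frame type other than 0x00.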

~~~
mhansen
Goddamn do I hate specs that try to write code in English. Specifications are
for WHAT I have to implement, not HOW.

For another offender, see ECMA-262, the JavaScript specification.

------
hypermatt
Read Ian's response in the comments, it's brilliant. I love it when the author
of the spec casually comments on some blog ;)

~~~
RyanMcGreal
Also worth noting is Dorian Taylor's amusing troll.

~~~
doriantaylor
I have more where that came from.

------
mark_l_watson
A good read, with good points criticizing the spec. At first, I was dead set
against HTML5 because I had been waiting for wide XHTML+RDFa adoption. That
isn't going to happen now. Still, if HTML5 implementations are good and
support web app portability between browsers, then I am happy enough with it.
I also liked Bray's point about bringing patent troll problems with video
codecs more into public view.

