

Accept-Charset Is No More - charliesome
http://hsivonen.iki.fi/accept-charset/

======
TorKlingberg
"There is no way for any non-UTF-8 encoding to represent something more, since
everything gets converted to UTF-16 internally."

Internally in what? Certainly not servers, but do all major browsers use
UTF-16 internally?

Also, does Unicode now include all characters from the various East Asian
encodings?

~~~
hoppipolla
The DOM and ECMAScript both assume that strings are sequences of UTF-16 code
units. So while a browser could use a non-UTF-16 encoding internally, it
wouldn't help much, because you would have to convert to UTF-16 in all the
externally-facing APIs anyway.

~~~
robin_reala
Nearly 100% true, but not quite:
<http://mathiasbynens.be/notes/javascript-encoding>

~~~
obtu
'𝌆'.length == 2

ಠ_ಠ They did the same kludge as the old narrow builds of Python.
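To spell out the kludge: '𝌆' is U+1D306, a code point outside the Basic
Multilingual Plane, so UTF-16 stores it as a surrogate pair of two 16-bit code
units, and JavaScript's `length` and index-based APIs count those units rather
than code points. A quick sketch (the code-point-aware bits assume an
ES2015+ engine):

```javascript
// '𝌆' is U+1D306, outside the Basic Multilingual Plane, so UTF-16
// stores it as a surrogate pair of two 16-bit code units.
const s = '\u{1D306}';

console.log(s.length);                      // 2 — counts UTF-16 code units
console.log(s.charCodeAt(0).toString(16));  // "d834" — high surrogate
console.log(s.charCodeAt(1).toString(16));  // "df06" — low surrogate

// Code-point-aware APIs added in ES2015 see a single character:
console.log(s.codePointAt(0).toString(16)); // "1d306"
console.log([...s].length);                 // 1 — iteration walks code points
```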

------
shocks
Chrome 16.0.912.77 m is sending Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3
for me.

~~~
hsivonen
I was wrong to claim Chrome didn't send Accept-Charset.

Before I wrote the post, I opened another charset-related test case in Chrome
(on two operating systems, even). After opening that URL and the relevant URL
in various browsers, I thought I had opened the relevant test URL in all the
browsers I was testing, but apparently I had never opened it in Chrome on any
OS.

Very embarrassing. Sorry.

So 4/5 OK, 1/5 still to go. Not quite newsworthy yet. :-(

------
NelsonMinar
Does anyone know the historical justification for charset negotiation in the
first place? I guess the idea was that the browser and the server would agree
on what encoding they'd use and the server would auto-translate for the web
visitor. But I can't imagine that ever being used in practice.

~~~
pilif
The idea wasn't necessarily for the server to translate because back when this
was invented, there was no Unicode and no UTF-8, so there might very well have
been no way to losslessly translate.

The client would send an Accept-Charset header to tell the server which
encodings it supported, and the server would respond either with the content
in an acceptable encoding (if available) or with a "406 Not Acceptable" error.
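Sketched as a wire exchange (hypothetical host, path, and charset values;
the q-value syntax follows HTTP/1.1):

```http
GET /page HTTP/1.1
Host: example.org
Accept-Charset: iso-8859-5, koi8-r;q=0.8

HTTP/1.1 406 Not Acceptable
Content-Type: text/plain; charset=us-ascii

No representation of the resource is available in an acceptable charset.
```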

~~~
NelsonMinar
I think Unicode predates Accept-Charset. Unicode 1.0 was published late 1991.
Accept-Charset was formalized in the HTTP 1.0 RFC (1996). Usage presumably
predates that, but it's not described in the HTTP 0.9 docs (which say ASCII
only!) <http://www.w3.org/Protocols/HTTP/AsImplemented.html>

But what really matters is widespread acceptance of Unicode and UTF-8 and that
was definitely later in coming. Thanks to dchest below for referencing a
transcoding server. It's interesting that the HTTP/1.0 spec characterizes
Accept-Charset as a way for the client to signal that it could handle
something other than ASCII and ISO-8859-1.

Google started favoring UTF-8 in search result pages somewhere around 2002 or
2003.

------
Eduard
Huh, is this a hoax? My installation of Chrome still sends Accept-Charset for
every HTTP GET request.

~~~
hoppipolla
It seems like a weird thing to perpetrate a hoax about :)

A more reasonable conclusion might be that the author made a mistake when
testing Chrome. The reports that Chrome still produces the header have since
been passed on, and the author has verified that this is indeed the case. I
guess he will update the article.

------
nodata
Interesting. Related: Can anyone explain to me how different versions of
Unicode are handled?

~~~
jensnockert
It isn't handled specially; Unicode versions are backwards compatible.

