
Is the use of “utf8=✓” preferable to “utf8=true”? - tomse
http://programmers.stackexchange.com/questions/168751/is-the-use-of-utf8-preferable-to-utf8-true
======
ollysb
Sorry to be so meta, but what on earth was the point of extracting
programmers.stackexchange.com from stackoverflow.com? Is this why so many
questions get closed as being "off topic" on stackoverflow now? </rant>

~~~
xhrpost
I believe the guidelines divide the two as objective vs. subjective.

Stackoverflow (Objective): Why does this code give me a syntax error?

Programmers (Subjective): What programming methodology best fits my project
and team?

~~~
mratzloff
Usually subjective questions are now closed as "not constructive".

~~~
xhrpost
Looking at their current faq, it seems that they've better defined (and
perhaps somewhat re-defined) what is to be asked there. Understandable, as I'd
stopped browsing the site because the questions got annoying. Compare the old
faq:
[http://web.archive.org/web/20100912194040/http://programmers...](http://web.archive.org/web/20100912194040/http://programmers.stackexchange.com/faq)

------
jerf
So, of course, the opposite of that is utf8="✘", right?

Hmmm... there's something wrong with that idea, but I can't quite put my
finger on it....

~~~
buro9
You didn't read the article.

This isn't a config file, this is the query string of a URL, or more
importantly the POST data of a form.

From the article:

> By default, older versions of IE (<=8) will submit form data in Latin-1
> encoding if possible. By including a character that can't be expressed in
> Latin-1, IE is forced to use UTF-8 encoding for its form submissions, which
> simplifies various backend processes, for example database persistence.

> If the parameter was instead utf8=true then this wouldn't trigger the UTF-8
> encoding in these browsers
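
The mechanism is easy to verify: U+2713 simply has no Latin-1 code point, so a
form containing it cannot be submitted as Latin-1 at all. A minimal Ruby
illustration (my own sketch, not Rails code):

```ruby
# "✓" (U+2713) cannot be represented in Latin-1, which is what forces
# old IE to fall back to UTF-8 for the whole form submission.
check = "\u2713"

begin
  check.encode("ISO-8859-1")
rescue Encoding::UndefinedConversionError
  puts "no Latin-1 encoding for U+2713"
end

# In UTF-8 it is the three-byte sequence E2 9C 93:
puts check.bytes.map { |b| format("%02X", b) }.join(" ")  # => E2 9C 93
```

A plain-ASCII value like "true" encodes identically in both charsets, so it
gives the browser no reason to switch.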

~~~
Sumaso
"Hmmm... there's something wrong with that idea, but I can't quite put my
finger on it...."

I think you're missing the sarcasm...

~~~
jemfinch
No, the sarcasm is obvious. It's just not very funny, and doesn't positively
contribute to the level of discourse here.

~~~
anjc
Of course it positively contributes. It's the first comment to say that it's a
silly way of encoding the form as utf-8 when utf8="✘" will also do the same
thing, even though it's counterintuitive.

~~~
jerf
That is actually what I was going for; the humor attempt was a bonus.

~~~
anjc
ConstructiveHumour=✓

Wait, true...true

------
grey-area
I've often wondered if they could get rid of this entirely in rails by
enclosing it in conditional comments, so that it is only included in forms
sent by older IE:

<!--[if lt IE 9]><input name="utf8" type="hidden" value="&#x2713;"
/><![endif]-->

Has anyone experimented with doing that?

~~~
riffraff
what would the gain be?

~~~
simonw
It would be less likely to leak out into a GET string and confuse people.

~~~
Dylan16807
It also gives people more info to search with.

------
jasonlingx
Correct me if I'm wrong but I think forms in Rails do this by default.

~~~
mosburger
They do. IIRC they used to use a Unicode snowman instead of a checkmark, but
it was changed as the snowman wasn't deemed "enterprisey enough" or something.

~~~
nicholassmith
Maybe when a service/product/framework etc hits a specific point in its
lifecycle (when it's trying to be enterprisey, for example), we can say that
it's melted the snowman.

~~~
xanadohnt
I mean ... this is one of the funniest comments I've read on HN. I will be
diligent in adopting this into my nerd vernacular.

~~~
nicholassmith
If it ends up as a thing I'll be surprised and proud.

------
IgorPartola
Under what case would IE use Latin 1 when there are UTF-8 characters that
should be encoded? I seem to be missing the actual effect it's having.

~~~
jerf
Yes, that took me a moment too, but Latin-1 is an 8-bit ASCII, and UTF-8 only
encompasses 7-bit ASCII. Note I have to say _an_ 8-bit ASCII, because there
are numerous 8-bit ASCII encodings.

~~~
derleth
> UTF-8 only encompasses 7-bit ASCII

What? This is wrong. UTF-8 encodes a lot more than just ASCII.

UTF-8 is compatible with ASCII in that all of the characters ASCII and Unicode
have in common are represented the same way in ASCII and UTF-8. Going beyond
ASCII involves the introduction of multi-byte representations in UTF-8, and
that takes you smoothly (that is, no surrogate pairs) out into the entire rest
of Unicode. As a bonus, it's always possible to verify that a given string of
bytes is valid UTF-8, given that there is a nontrivial structure imposed on
UTF-8 multi-byte encodings that is very unlikely to occur by chance in any
non-UTF-8 sequence of bytes.

~~~
Firehed
I think the point was that anything above 7-bit ASCII will be represented
differently in Latin-1 vs UTF-8; i.e. ¢ (U+00A2) is rendered as 0xA2 in Latin1
and 0xC2A2 in UTF-8 - and 0xC2A2 in Latin1 will be displayed as Â¢.

It gets far worse with 3-byte UTF-8 characters, but I don't believe any of
them exist natively in Latin-1 (see: the euro symbol)

Assuming I'm reading these various character tables right, at least ;)

So a more accurate version of what you quoted would be "UTF-8 and Latin-1 only
overlap for 7-bit ASCII"
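
Ruby makes the byte-level difference easy to check (a quick illustration of
the above, not from the article):

```ruby
cent = "\u00A2"  # ¢

# Latin-1 encodes ¢ as the single byte 0xA2...
raise unless cent.encode("ISO-8859-1").bytes == [0xA2]

# ...while UTF-8 needs the two bytes 0xC2 0xA2:
raise unless cent.bytes == [0xC2, 0xA2]

# Reading those UTF-8 bytes as if they were Latin-1 gives the classic mojibake:
puts cent.dup.force_encoding("ISO-8859-1").encode("UTF-8")  # => Â¢
```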

~~~
kbolino
Not to detract from your points, all good, but:

0xC2A2 will be rendered as Â¢ only if it's encoded in UTF-16/UCS-2 _big
endian_ and misinterpreted as ISO-8859-1/Windows-1252.

If it's encoded in _little endian_ (much more common on Intel x86 computers),
then it would be rendered as ¢Â when misinterpreted.

~~~
kelnos
That doesn't really make sense. If someone's intending to encode ¢ in utf-8,
endianness does not come into play, as it's a stream of octets, not of
anything larger that you can chunk such that you could swap bytes.

At any rate, if you were to encode ¢ in UTF-16BE, it would be 0x00a2, not
0xc2a2. If a piece of software then misinterpreted it as latin1, likely you'd
get nothing at all due to the embedded NUL.

    
    
      $ echo -n ¢ | iconv -f UTF-8 -t UTF-16BE | hexdump -C
      00000000  00 a2                                             |..|

~~~
kbolino
"That doesn't really make sense."

Indeed. I either completely misread the parent post, or else it said something
different when I responded to it (knowing myself, I'm going with the former).

------
gweinberg
If the point of the field is just to make ie work correctly, wouldn't it be
more appropriate to leave utf8 out of the name and write something like
"ie=💩"?

~~~
Zakharov
If you want to check whether a browser supports a feature, it's preferable to
check whether the browser supports the feature instead of whether the browser
is a browser that supports the feature. The latter can cause problems when
dealing with an unexpected browser.

------
aviraldg
Best way to detect a Ruby on Rails app ;)

~~~
charliesome
Another good one is sending a query string of '?a=1&a[a]=1'. 500s Rack
applications.

------
tlrobinson
Does IE not respect the "accept-charset" attribute on form elements?

~~~
gpvos
Only partially; read the other comments.

------
bitwize
Ummmm, false. I'm going to go with false.

