
OMG Ponies (Aka Humanity: Epic Fail) - gthank
http://msmvps.com/blogs/jon_skeet/archive/2009/11/02/omg-ponies-aka-humanity-epic-fail.aspx
======
arohner
With every passing year, I'm more and more convinced a language approaching
Haskell's pedantic-ness is A Good Thing. A language where Joda is the most
convenient option, where all numerical quantities have units, and you have to
explicitly opt in to do "unsafe" things. Naturally, the goal of the
language is to do as few unsafe operations as possible.

I think Lisp is a good example: most Lisps have a full numeric tower that
automatically converts between ints and bignums and ratios as appropriate. You
don't have to think about the floating-point representation of 0.3 unless you
wrote (double 0.3) in your code somewhere. Correctness should be the default,
because it's much easier to trade some correctness for performance. Going the
other way is much harder, and most people don't need performance.
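
A rough Python illustration of the same trade-off. This is only a sketch: it assumes the standard-library `fractions` module as a stand-in for a Lisp-style ratio type, since Python does not promote a literal like 0.3 to an exact value on its own.

```python
from fractions import Fraction

# Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly,
# so the "obvious" equality fails.
float_sum = 0.1 + 0.2
float_matches = (float_sum == 0.3)

# Exact rationals behave the way schoolbook arithmetic does.
exact_sum = Fraction(1, 10) + Fraction(2, 10)
exact_matches = (exact_sum == Fraction(3, 10))

# Python integers grow to arbitrary precision instead of overflowing.
big = 2 ** 64 + 1

print(float_matches, exact_matches, big)
```

The exact types cost some speed, which is the "trade correctness for performance later" direction described above.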

This goes a long way, but now I notice the biggest obstacle is "legacy
systems". The only time I get in trouble with the Clojure numeric tower is
when I want to store a Ratio in the DB.

~~~
RyanMcGreal
>a language approaching Haskell's pedantic-ness

I believe the word you're looking for is "pedantry".

/pedant

------
seldo
"due to rainfall thousands of miles away, my unit test had moved Greenland
into Argentina. Fail."

Argentina's 11-day notice for changing their DST rules wreaked all sorts of
unexpected havoc. It's sort of awesome (small worlds are more fun) and also
totally exasperating.

------
kylec
The video, for those who missed the link in the second paragraph:
<http://vimeo.com/7403673>

~~~
KevBurnsJr
WAAAAY more entertaining w/ actual sock puppets.

------
blasdel
_Unicode has its own special line terminator character as well, just for
kicks_

<clippy>It looks like you're trying to refer to NEL, which is in Unicode
because it was distinct in EBCDIC, and who doesn't want lossless round-trips?
</clippy>

~~~
die_sekte
Unicode also has separate Line Breaks and Paragraph Breaks somewhere around
U+2000.
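
For what it's worth, a small Python 3 sketch of how these terminators can bite. The sample string is made up; the code points are NEL (U+0085), LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029).

```python
# Hypothetical sample text mixing four different line terminators.
text = "one\u0085two\u2028three\u2029four\nfive"

# Splitting only on "\n" misses NEL, LINE SEPARATOR and PARAGRAPH SEPARATOR.
naive_lines = text.split("\n")

# str.splitlines() treats all of them as line boundaries.
unicode_lines = text.splitlines()

print(len(naive_lines), len(unicode_lines))
```

Code that counts lines with one method and indexes them with the other will disagree with itself on input like this.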

------
mumrah
It's important to note that choice of language affects a lot of these issues.
For example, the Unicode issue in Python:

    
    
      >>> s = unicode('Les Misérables','utf-8')
      >>> print s[::-1]
      selbarésiM seL

~~~
blasdel
Nope, Python fucks this up just the same, even in Python 3:

    
    
      >>> print u'Les Mise\u0301rables'[::-1] #2
      >>> print('Les Mise\u0301rables'[::-1]) #3
      selbaŕesiM seL
    

Almost no implementation will fuck up LATIN SMALL LETTER E WITH ACUTE U+00E9,
but nearly all programming languages will royally fuck up a COMBINING ACUTE
ACCENT U+0301 even in much easier cases like string length. Almost all
implementations that claim to be UTF-16 are actually UCS-2, and can't handle
surrogates in the slightest.
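
A quick Python 3 sketch of the string-length point. `utf16_units` is a hypothetical helper written for this illustration, not a standard function; it counts the code units a UTF-16 implementation would see.

```python
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT
horse = "\U0001F40E"     # HORSE, outside the Basic Multilingual Plane


def utf16_units(s):
    """Number of UTF-16 code units needed to encode s (hypothetical helper)."""
    return len(s.encode("utf-16-le")) // 2


# Python 3 counts code points, so the two spellings of "é" differ in length
# even though they render the same.
lengths = (len(composed), len(decomposed))

# The horse is one code point but a surrogate pair in UTF-16, so a
# UCS-2 style implementation reports two "characters".
horse_counts = (len(horse), utf16_units(horse))

print(lengths, horse_counts)
```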

~~~
simonw
You can get the correct result by normalizing the string first:

    
    
        >>> import unicodedata
        >>> print unicodedata.normalize('NFC', u'Les Mise\u0301rables')[::-1]
        selbarésiM seL

~~~
fh
That's in some sense even more broken: not every combining sequence has a
precomposed (NFC) form, so the result is even less predictable. (I don't have
an example ready.)
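
One candidate example, sketched in Python 3: q with an acute accent has no precomposed code point, so NFC leaves it as two code points and a naive reverse still moves the accent onto the wrong letter. `reverse_clusters` is a hypothetical helper that only approximates grapheme clusters by gluing combining marks to the preceding base character; full segmentation (UAX #29) handles more cases.

```python
import unicodedata


def reverse_clusters(s):
    """Reverse s while keeping combining marks attached to their base.

    Approximation only: a mark is any code point with a nonzero
    canonical combining class.
    """
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch      # attach the mark to the previous base
        else:
            clusters.append(ch)     # start a new cluster
    return "".join(reversed(clusters))


word = "q\u0301a"                           # q + COMBINING ACUTE ACCENT, then a
nfc = unicodedata.normalize("NFC", word)    # unchanged: nothing to compose into
naive = nfc[::-1]                           # accent now follows the "a"
careful = reverse_clusters(word)            # accent stays on the "q"

print(len(nfc), naive == careful)
```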

------
nomoresecrets
I saw this presentation at the London DevDays where Jon gave it.

I was quite far back, and when he walked on stage with the sock puppet on, I
assumed it was one of those wrist support gloves that you use when you get RSI
because you won't just stop typing all day long, even though you're harming
yourself.

After all, I thought, it's Jon Skeet - he never stops typing, surely? :-)

------
gchpaco
I'm embarrassed. I admit it, I forgot Turkish upcases 'i' to İ. But the core
teaching, which is "if it involves internationalization or talking to humans,
find a library written by a legitimate expert and use it instead", is true
gold.
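
To make the Turkish case concrete, a Python 3 sketch. Python's built-in case methods are locale-independent, and `turkish_upper` is a hypothetical helper for illustration only; a real application would more likely lean on a library such as ICU.

```python
# The built-in mappings ignore locale, so "i" never becomes Turkish "İ".
default_upper = "i".upper()
dotless_upper = "\u0131".upper()            # LATIN SMALL LETTER DOTLESS I
round_trip = "\u0131".upper().lower()       # the dotless i does not survive


def turkish_upper(s):
    """Uppercase with the Turkish i/İ and ı/I pairs (hypothetical helper)."""
    return s.replace("i", "\u0130").replace("\u0131", "I").upper()


title = turkish_upper("istanbul")

print(default_upper, dotless_upper, round_trip, title)
```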

~~~
iron_ball
Yeah, but who expects to need a special library for uppercasing some seemingly
Latin text?

------
jmatt
The article really was entertaining. These are all common problems for new C#
programmers.

Doubles in .NET are IEEE 64-bit (8-byte) double-precision floating-point
numbers. I've written a few different comparers to deal with problems like
the one he demoed. He should have typed it as a decimal if it was currency
or if he wanted it to have an exact value instead of being a double-precision
floating-point number.
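
The same idea sketched in Python, with `decimal.Decimal` standing in for .NET's `decimal`. That substitution is an assumption for illustration; the two types differ in precision and range, but both store base-10 digits.

```python
from decimal import Decimal

# Three items at 0.10 each, summed as binary doubles.
double_total = 0.10 + 0.10 + 0.10
double_exact = (double_total == 0.30)

# The same sum with a base-10 decimal type. The values are built from
# strings so no binary rounding sneaks in at construction time.
decimal_total = Decimal("0.10") + Decimal("0.10") + Decimal("0.10")
decimal_exact = (decimal_total == Decimal("0.30"))

print(double_exact, decimal_exact)
```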

As for reversing the string: he reversed the chars, which in .NET are UTF-16
code units, not whole Unicode characters. If you want to play nice with
Unicode in .NET, you need to process the string as Unicode text elements. It's
a hugely common mistake for all of us who have done some internationalization
in .NET.

I remember all sorts of trouble with java.util.GregorianCalendar back in the
day. Ugh.

I'm not defending .NET or C#. I'm just not surprised he ran into these newbie
problems. You guys should see the evilness that is Access; the inconsistencies
there make this look like nothing. And it's a black art with almost no
documentation anywhere.

~~~
barrkel
Jon didn't run into most of these problems so much as answer others who did
run into them on stackoverflow.com or the MS newsgroups.

------
nazgulnarsil
He sums it up quite well at the end.

 _if you write a lot of code under a set of assumptions which then changes,
you're in trouble._

I think the key question is: do my assumptions differ from my customers'
assumptions? If so, where?

------
kuda
I've gotten to the point where I automatically dislike anything containing the
phrase "Epic Fail."

~~~
brown9-2
Same here, but you should really look past that in this case.

