
 A bug I won't forget - paulasmuth
http://paulasmuth.com/blog/a_bug_i_wont_forget
======
pilif
"I hadn't noticed this before since the parser is written in a way that it
will ignore everything that doesn't look like JSON."

And this is precisely why you want to fail hard if you encounter invalid
input. Yes, it's annoying in the cases of "nearly valid" input or "valid input
but with some garbage". Yes, it's more work to deal with the error.

But it also means that something like this blows up before you end up in a
"sometimes it works, sometimes it doesn't" situation.

Yes. I have dealt with and even preferred the silently failing kind of
functionality, but over the years I've been bitten by it too many times to
still prefer it in good conscience.

You might still spend more time overall dealing with bitchy libraries, but at
least you will hopefully never have to deal with bugs that happen only
sometimes, as those are really hard to track down and fix (if that's possible
at all).

Sure. Sometimes you can get away with "yeah - it fails at times - that's an
unfortunate fact of life", but the moment that rarely appearing issue costs
your customers or you money, it all becomes really important and "it fails at
times" just doesn't do. Of course, by then, the problem needs to be fixed
_right then_ - which just doesn't go very well with "it usually works".

At that point you spend the hours it takes to track the problem down and you
will curse the decision you once made to fail silently.
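
To make the difference between the two styles concrete, here is a minimal sketch in Java. Everything in it is hypothetical and not taken from the blog post: the class and method names are made up, and `looksLikeJsonObject` is only a crude stand-in for real JSON validation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class StrictInput {
    // Hypothetical stand-in for real validation: after trimming, the payload
    // must start with '{' and end with '}'.
    static boolean looksLikeJsonObject(String s) {
        String t = s.trim();
        return t.length() >= 2 && t.charAt(0) == '{' && t.charAt(t.length() - 1) == '}';
    }

    // Silently-failing style: anything that doesn't look right is dropped,
    // and the caller never learns that something was wrong.
    static List<String> acceptLenient(List<String> payloads) {
        List<String> out = new ArrayList<>();
        for (String p : payloads) {
            if (looksLikeJsonObject(p)) {
                out.add(p.trim());
            }
        }
        return out;
    }

    // Fail-hard style: the first invalid payload aborts with an error that
    // says which payload was bad.
    static List<String> acceptStrict(List<String> payloads) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < payloads.size(); i++) {
            String p = payloads.get(i);
            if (!looksLikeJsonObject(p)) {
                throw new IllegalArgumentException("payload " + i + " is not a JSON object: " + p);
            }
            out.add(p.trim());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> payloads = Arrays.asList("{\"a\":1}", "garbage", "{\"b\":2}");
        // The lenient variant returns two items; the garbage just disappears.
        System.out.println(acceptLenient(payloads).size());
        try {
            acceptStrict(payloads);
        } catch (IllegalArgumentException e) {
            // The strict variant reports the offending payload right away.
            System.out.println(e.getMessage());
        }
    }
}
```

With the lenient variant the garbage is only noticed once its absence (or presence) breaks something downstream. The strict variant surfaces it at the boundary, where the cause is still obvious.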

~~~
pimeys
I'm currently having issues with em-http-request, resque and resque-retry.
It's still sometimes dropping work before the retry limit and not behaving
nicely with the retry timespans. Also, the async http request randomly ignores
the timeout value...

It only happens under heavy traffic; about 0.01% of it just fails in the wrong
way. It's not much, but it's still our and our customers' money. I hope
tomorrow's get-together to solve this problem helps.

~~~
alttab
I've noticed the same issue with that gem. It only happens randomly and with
high density asynchronous traffic.

------
ardillamorris
This bug reminds me of what my dad taught me when I got my driver's license.
He taught me that knowing how to drive carefully wasn't enough to prevent
accidents. I had to drive for myself and for everyone else. I clearly remember
"you don't know what kind of drunk will be blowing a red light".

In this bug, Paul was driving carefully - he relied on the parser to do a good
job. But relying on the parser is like crossing on a GREEN light without
checking. 99.9% of the time you should be OK. Until that one time with the
drunk blowing the RED light.

My dad was right. Had Paul not relied entirely on the parser and done accurate
memory allocation (checked for that drunk blowing the RED light) - everything
would have been fine.

~~~
RyanMcGreal
Sidenote: I'm pretty sure the author is Paul Asmuth, not Paula Smuth.

~~~
ScottBurson
Ah, thanks.

I wonder how many people even know that dashes are legal in DNS names. (I
mean, of course, the ASCII character that serves as hyphen, en-dash, and minus
sign.)

I think there are lots of domain names that would benefit from a well-placed
dash -- the most amusing example I've seen being Pen Island's.

~~~
gjm11
Expert Sex Change. (Familiar to almost everyone here, I think.) Power
Genitalia. (An Italian battery company.) Whore Presents. (A service for
finding out about people's publicity agents, etc.)

------
jwr
It's a pity I can't read this blog post. On an iPad it specifically prevents
me from zooming and the font is too small.

Please don't use "ipad-specific" or "mobile" themes. They break the web.

~~~
derwildemomo
Zooming works fine here, ipad3, safari.

------
FrankBooth
Would it be awfully smug to point out that Valgrind would've pointed this bug
out in mere minutes? That's exactly why I make a habit of running my tests
under Valgrind regularly during development; there's no point wasting hours
debugging the classes of problem that tools can pinpoint in minutes.

~~~
Domenic_S
Tangent -- it's crazy to me that _println_ passes as debugging.

------
kevingadd
So, wait. When you allocate a new array in the JVM, it's filled with random
data instead of zeroes? That seems like a fundamental security model error. Or
are these 'buffers' special native IO primitives that break all the Java
security rules and guidelines? I haven't used Java in a while...

~~~
paulasmuth
Heh no, this was the actual bug (that it was reading "random" data from memory
on the first iteration). I just hadn't noticed the issue until this "random
memory" contained fragments of invalid json.

~~~
cpeterso
Were you or the network library reusing buffer objects (to avoid reallocating
them), so the random data was left over from an earlier socket read? I'm
surprised the JVM would allocate a new buffer object with non-zero data.

~~~
fizx
Yeah, he was almost certainly reusing his byte[]s. My takeaway is that if you
program a high-level language as if it's C, expect C-like bugs.

~~~
i386
Likely reusing his directly allocated ByteBuffer and not checking the number
of bytes that he filled it with.

From what I remember, directly allocated ByteBuffers are not guaranteed to be
zeroed.
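
A minimal sketch of the reuse scenario discussed in this subthread, under stated assumptions: it uses a plain `byte[]` and an in-memory stream rather than a direct ByteBuffer and a socket, and all names are hypothetical, not taken from the post's code. A freshly allocated `byte[]` is zero-filled on the JVM, so the stale bytes here come purely from reusing the buffer and ignoring how many bytes the read actually filled.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class ReusedBufferDemo {
    // Buggy: ignores the return value of read() and decodes the whole buffer,
    // including whatever an earlier read left behind.
    static String readChunkBuggy(InputStream in, byte[] buf) throws IOException {
        in.read(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }

    // Fixed: decodes only the bytes that this read actually filled.
    static String readChunk(InputStream in, byte[] buf) throws IOException {
        int n = in.read(buf);
        if (n <= 0) {
            return ""; // end of stream or nothing read
        }
        return new String(buf, 0, n, StandardCharsets.UTF_8);
    }

    // Helper for the demo: an in-memory stream standing in for a socket.
    static InputStream stream(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        byte[] buf = new byte[16]; // one buffer, reused for every read

        // The first message fills 15 of the 16 bytes.
        readChunk(stream("{\"id\":12345678}"), buf);

        // The second message is only 7 bytes long, so the buggy variant also
        // returns the tail of the first message that is still in the buffer.
        System.out.println(readChunkBuggy(stream("{\"a\":1}"), buf));

        // The fixed variant returns just the 7 bytes that were read.
        System.out.println(readChunk(stream("{\"a\":1}"), buf));
    }
}
```

In this sketch the buggy variant yields the second message followed by fragments of the first, which is the kind of leftover data a parser that skips anything that doesn't look like JSON would usually hide.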

------
CookWithMe
Just out of curiosity: How much Scala are you using at DaWanda? And for what
use cases?

~~~
paulasmuth
We currently use it extensively for analytics, our product recommendation
engine (blog on that soon!) and we are writing a custom Scala-based HTTP proxy
for our new API. In general, we are trying to progressively do more Scala and
less Ruby.

If you (or anybody else) are interested in hacking with us, please drop me a
note (link to your GitHub profile is enough) at paul@dawanda.com :)

~~~
SimHacker
Oh, Paul. ;( Less Ruby??! What happened? Are you angry at her? Did she cheat
on you? Did you catch her in bed with Mikael? I warned you about him.

~~~
paulasmuth
Oh, Don. No, everything is fine with ruby, I just got bored. And scala... she
is so much faster! See you soon in AMS; I'll bring Mikael for fun and profit
;)

