

Data Processing -- Be Afraid, Be Very Afraid - tonystubblebine
http://dev.hubspot.com/bid/6916/Data-Processing-Be-Afraid-Be-Very-Afraid

======
smhinsey
Interesting. I think where the author writes "reentrancy" they may mean
idempotency instead.

I haven't thought about the "kill the whole pipeline" approach before, but it
doesn't really seem that different from one of my standard techniques, which
is to flag any data that was gathered under error conditions so it can be kept
but reviewed for cause. Shutting down the whole process seems like it might be
a hard sell to non-technical stakeholders, but I would absolutely operate that
way in all non-production environments.
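
A minimal sketch of the flagging approach described above, next to the fail-fast alternative. The record shape, the `parse_amount` helper, and the `error` field are assumptions made for illustration; nothing here comes from the article itself.

```python
def parse_amount(raw):
    """Hypothetical parser: returns (value, error_message_or_None)."""
    try:
        return float(raw), None
    except (TypeError, ValueError) as exc:
        return None, str(exc)


def process(rows, fail_fast=False):
    """Keep every row; mark rows gathered under error conditions for review.

    With fail_fast=True the first error aborts the run instead, which is the
    "kill the whole pipeline" behaviour.
    """
    results = []
    for row in rows:
        amount, error = parse_amount(row.get("amount"))
        if error is not None and fail_fast:
            raise ValueError(f"bad row {row!r}: {error}")
        # The row is kept either way; the error field records why it is suspect.
        results.append({**row, "amount": amount, "error": error})
    return results


rows = [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "n/a"}]
processed = process(rows)
needs_review = [r for r in processed if r["error"] is not None]
```

Here `needs_review` holds only the row whose amount could not be parsed, while `processed` still contains both rows. Passing `fail_fast=True` is one way to get the stricter behaviour in non-production environments.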

~~~
tonystubblebine
What's the difference between reentrancy and idempotency?

~~~
smhinsey
In my mind, reentrancy is associated with threading, and implies that the code
could be executing in multiple locations simultaneously, whereas an idempotent
operation is one that can be repeated multiple times without negative side
effects.
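
To make the idempotency half of that concrete, here is a small sketch. The `store` dictionary and both function names are hypothetical; the point is only that replaying the second operation changes nothing.

```python
def append_event(store, event):
    """Not idempotent: replaying the same event adds a duplicate."""
    store.setdefault("events", []).append(event)


def upsert_event(store, event):
    """Idempotent: replaying the same event leaves the store unchanged."""
    store.setdefault("by_id", {})[event["id"]] = event


event = {"id": 42, "clicks": 3}
store = {}
for _ in range(2):  # simulate a job that is re-run after a failure
    append_event(store, event)
    upsert_event(store, event)

duplicates = len(store["events"])  # 2: the append ran twice
unique = len(store["by_id"])       # 1: the upsert converged
```

A pipeline step written like `upsert_event` can be re-run after a crash without double counting, which is probably the property the article is after.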

~~~
gchpaco
Reentrancy comes up with recursion and signal handling as well, both of which
can come up in nominally single-threaded programs (nominal because signals
usually, but not always, come from outside the process).
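
A single-threaded sketch of the recursion case, assuming a made-up `flatten` helper. The first version keeps its working state in a shared module-level list, so re-entering it through recursion clobbers the outer call; the second keeps its state local.

```python
_scratch = []  # shared state: this is what makes the first version non-reentrant


def flatten_shared(items):
    """Not reentrant: every call, including nested ones, resets _scratch."""
    _scratch.clear()
    for item in items:
        if isinstance(item, list):
            # Re-enters before the outer call has finished with _scratch.
            _scratch.extend(flatten_shared(item))
        else:
            _scratch.append(item)
    return list(_scratch)


def flatten_local(items):
    """Reentrant: each call works on its own local list."""
    out = []
    for item in items:
        if isinstance(item, list):
            out.extend(flatten_local(item))
        else:
            out.append(item)
    return out


nested = [1, [2, 3], 4]
good = flatten_local(nested)   # [1, 2, 3, 4]
bad = flatten_shared(nested)   # differs: the nested call wiped the outer state
```

No threads are involved, yet `flatten_shared` returns the wrong answer as soon as it is entered a second time before the first call completes.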

~~~
smhinsey
You are of course correct; I have been working in a multi-threaded environment
for long enough to have let that slip my mind.

------
rjurney
This is the same issue you have when you're building your own analytic system
using SQL and you're building big queries to do more complex descriptive and
inferential statistical shenanigans.

No matter the technology, automated tests on known data with expected results
are the only way to be sure what you're getting is right. And you better be
sure, because people have zero tolerance for inaccurate analytics. They will
immediately ignore your tool, permanently.

For instance... if your slot machine analytic system does not include slot
machines with apostrophes in the name in its 'revenue total' column... you
are so screwed. Even once you fix the bug, your stuff is tainted. Don't ask me
how I know ;)
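
A rough sketch of what a known-data test for that case could look like, using Python's built-in `sqlite3` module. The schema, the machine names, and the `revenue_total` query are all invented for illustration; how the real bug arose is not stated here.

```python
import sqlite3


def revenue_total(conn):
    """Hypothetical report query: total revenue across all machines."""
    row = conn.execute(
        "SELECT COALESCE(SUM(p.amount), 0) "
        "FROM machines m JOIN plays p ON p.machine_id = m.id"
    ).fetchone()
    return row[0]


def build_fixture():
    """Small in-memory data set whose correct total is known by hand."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE machines (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE plays (machine_id INTEGER, amount INTEGER)")
    # Bound parameters (?) keep the apostrophe from breaking the statement.
    conn.executemany(
        "INSERT INTO machines (id, name) VALUES (?, ?)",
        [(1, "Lucky 7"), (2, "Pirate's Gold")],
    )
    conn.executemany(
        "INSERT INTO plays (machine_id, amount) VALUES (?, ?)",
        [(1, 100), (2, 250), (2, 50)],
    )
    return conn


def test_revenue_total_includes_apostrophe_names():
    conn = build_fixture()
    # 100 + 250 + 50, including the machine with an apostrophe in its name.
    assert revenue_total(conn) == 400


test_revenue_total_includes_apostrophe_names()
```

The fixture deliberately includes the awkward name, so a query or loader that silently drops such rows would fail the assertion instead of shipping a wrong total.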

------
ntoshev
Great post, and the comment (only one at the time I post this) doubles its
value. Be sure not to miss it!

------
harry
Good article - I'm happy to find out that other DBRs have the same healthy
fear & utter suspicion for each analysis they write as I do.

------
staunch
Great article and advice. I'd love to see more like it!

