
Netflix open sources resilience engineering library - jondot
https://github.com/Netflix/Hystrix/wiki
======
splicer
I used roughly the same philosophy recently at work in dealing with a library
that could block for intolerable periods of time (sometimes indefinitely).
Basically, I used a pthread_mutex_trylock and a pthread_cond_signal to pass
data to a worker thread that interacted with the library. If the worker thread
was still holding the mutex when the main thread wanted to send data to the
worker thread, I simply said "f*ck it" and generated an error, rather than queueing up
requests or locking up the system. The reason why it was okay to do this is
that the library calls were to gather non-critical metrics, and calls into the
library were fairly infrequent.

(I just drank an entire bottle of wine, so please excuse any apparent lack of
reasoning, typing ability, or general coherence in the above comment.)
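
The try-lock handoff described in this comment can be sketched roughly as
follows. This is a Python analogue, not the commenter's C code: the
non-blocking `acquire` stands in for pthread_mutex_trylock, the one-slot queue
stands in for the condition-variable signal, and the names `FailFastWorker`,
`WorkerBusy`, and `blocking_call` are hypothetical.

```python
import queue
import threading


class WorkerBusy(Exception):
    """Raised when the previous request is still in flight."""


class FailFastWorker:
    """Hands requests to one worker thread; rejects instead of queueing."""

    def __init__(self, blocking_call):
        # blocking_call is the (assumed) library call that may stall.
        self._blocking_call = blocking_call
        self._busy = threading.Lock()          # held while a request is in flight
        self._inbox = queue.Queue(maxsize=1)   # at most one pending item
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, data):
        # Non-blocking acquire: fail immediately if the worker is still busy.
        if not self._busy.acquire(blocking=False):
            raise WorkerBusy("worker still busy; request dropped")
        self._inbox.put(data)  # wakes the worker

    def _run(self):
        while True:
            data = self._inbox.get()
            try:
                self._blocking_call(data)
            except Exception:
                pass  # the metrics are non-critical, so errors are swallowed
            finally:
                self._busy.release()  # a plain Lock may be released by another thread
```

As in the comment, this only makes sense when dropped requests are acceptable:
a call that never returns leaves the lock held, so every later `submit` fails
fast instead of piling up.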

------
ryandotsmith
This is a great set of reading materials. It is great to find out how teams
are building in resiliency at the micro level. That being said, I would never
actually use this project for building things. IMHO: Libraries > Frameworks.

~~~
benjchristensen
Glad to hear the documentation is thought-provoking - it was intended to not
just explain the project but also communicate our learnings in operating at
scale and how the Hystrix library is used by Netflix.

I also agree that libraries are preferable to frameworks and Hystrix is in
fact just a Java library that can be used as little or as much as one wishes.

It purposefully tries to have a small number of dependencies, so it should be
easy to pull in without significant impact.

@benjchristensen

------
elq
I've had the "pleasure" of using the predecessor of this library at work.

The code written by someone who's just completed reading the GoF book feels
remarkably similar...

~~~
jondot
From the 10 minutes I glanced at the code I kinda got lost in the abstractions
and that's where I decided it's worth a deep look later, but I wouldn't say
it's over-engineered.

Can you elaborate a bit?

~~~
elq
My problem with this library (well, actually the internal version, I haven't
looked much at the code in github) is that it was written from the perspective
of an edge service - the Netflix API. The needs of that system are quite
different from the needs of middle-tier systems.

The API has a hell of a lot more surface area but is trivial in complexity
compared to other systems at Netflix, and therefore this library has some huge
gaps in design.

The two biggest issues IMHO are putting the throttling/fallback handling at
the outermost edge of an external service rather than at the lowest level
(i.e. the actual rpc) and a very C/errno like method of handling errors.

I'm also quite unhappy with the API. It requires creating boilerplate classes
to implement the commands. Yuck. A bit of magic with annotations or code gen
would've been much cleaner and much less prone to errors caused by programmer
fatigue or boredom.
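
To illustrate the kind of declarative wrapping this comment asks for, here is
a minimal sketch using a Python decorator as a stand-in for a Java annotation.
It is not Hystrix's API; `with_fallback` and `fetch_recommendations` are
hypothetical names, and the sketch covers only the fallback part, not
throttling or timeouts.

```python
import functools


def with_fallback(fallback):
    """Hypothetical decorator: if the wrapped call raises, return
    fallback(*args, **kwargs) instead of propagating the error."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                return fallback(*args, **kwargs)
        return wrapper
    return decorate


# One line of decoration replaces a hand-written command class.
@with_fallback(lambda user_id: [])
def fetch_recommendations(user_id):
    # Stand-in for a remote call that is currently failing.
    raise ConnectionError("recommendation service unavailable")
```

Calling `fetch_recommendations(42)` here returns the empty list from the
fallback rather than raising.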

~~~
quotemstr
Thanks - you expressed the uneasy feeling I had more eloquently than I could
have. Between the library's architecture, the (IMHO) architectural inversion,
and the (let's be honest) stilted writing style in the documentation, this
project gives me an "I'm just out of college" feeling, which in turn usually
makes me run like hell.

------
quotemstr
So it's a wrapper that aborts an entire request if an API call made by the
code processing that request times out? What am I missing?

~~~
JOnAgain
I was reading it trying to tease out its core purpose as well. I reached a
similar conclusion. The only piece that seems incremental is the way it
remembers past performance and will fail-fast if the service is down or
failing rather than keep sending every request to the downstream service. Like
client-side back-off logic or throttling.
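
The fail-fast behavior described here is usually called a circuit breaker. A
minimal sketch, assuming a simple consecutive-failure count and a fixed
cool-down (Hystrix's actual trip conditions are configured differently; see
its wiki), with hypothetical names `CircuitBreaker` and `CircuitOpenError`:

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected without contacting the service."""


class CircuitBreaker:
    """Remembers recent failures and fails fast while the service looks down.
    Not thread-safe; a real implementation would need locking."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("failing fast; service marked unhealthy")
            # Cool-down elapsed: let this call through as a trial.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure history.
        self._failures = 0
        self._opened_at = None
        return result
```

While the circuit is open, callers get `CircuitOpenError` immediately and the
downstream service receives no traffic, which is the back-off effect the
comment points out.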

~~~
quotemstr
Right. This behavior, and a few other things, is nice, but I'd be hesitant to
say it belongs in its own library. If this kind of baroque architecture (and
the stilted writing demonstrated by the documentation) is de rigueur at
Netflix, I don't think I'd ever want to work there.

~~~
buddycasino
Care to elaborate? I think Amazon does it similarly, so what's so baroque
about it?

~~~
quotemstr
I agree that the behavior Hystrix provides is a Good Thing, but I'm not
convinced that packaging up this behavior into a separate library and open
source project is architecturally elegant. Instead, the base-level web
services client library should provide this functionality. I'm wary of adding
too many modules and too many dependencies to a system.

------
dschiptsov
Reinventing Erlang in Java for investor's money?)

People who don't know CS are doomed to poorly reinvent Lisp or Erlang again
and again..))

But why not, if someone pays for it..

And the whole idea of using Java for serving media content, while there is a
specialized, well-engineered solution, created especially for this purpose in
the telecom world, is such a brilliant management decision.. In Java we
trust.)

~~~
jlouis
With due respect to the library, there are certain parts of it which are not
supported out of the box in Erlang. On the other hand, building tools like
what the library has is not going to take a long time.

~~~
davidw
What in particular jumps out at you? I don't know Erlang or this library well
enough to pick it out right away, but I think it'd be an interesting thing to
look at. Erlang is often cited, but probably not as widely used, so it
sometimes gets some 'magic properties' associated with it.

