
Circuit Breaker - henrik_w
http://martinfowler.com/bliki/CircuitBreaker.html
======
defcon84
I like the Circuit Breaker as a stability pattern. It is also explained very
well in "Release It!" by Michael T. Nygard
([http://pragprog.com/book/mnee/release-it](http://pragprog.com/book/mnee/release-it)).

~~~
tigeba
"Release It!" is a great book. Netflix has also written a bit on this subject.
[http://techblog.netflix.com/2011/12/making-netflix-api-more-resilient.html](http://techblog.netflix.com/2011/12/making-netflix-api-more-resilient.html)

------
eldavido
This seems like a problem that only manifests in environments with heavyweight
concurrency primitives: threads, processes, etc.

If you're going to hit this kind of problem often, I'd suggest using a
language with better concurrency primitives, either actors (Erlang/Akka in
Scala) or a reactor (node.js).

Granted, it's often not up to a late-stage maintainer which language was used,
but the state of the art in language design is giving us much more power to
deal with the kind of problems this "design pattern" solves.

~~~
yummyfajitas
I don't understand - how does using actors/reactor allow you to skip using
circuit breakers?

You might implement it differently, for example the circuit breaker might just
be a wrapper around your original function `Request => Future[Response]`. But
I don't see how using actors saves you from thinking about the consequences of
throwing a normal workload at a failing service.
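A minimal sketch of such a wrapper in Python (the decorator name, thresholds, and exception type here are illustrative, not from any particular library):

```python
import functools
import time

class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped entirely."""
    pass

def circuit_breaker(failure_threshold=5, reset_timeout=30.0):
    """Trip after `failure_threshold` consecutive failures, then fail fast
    until `reset_timeout` seconds have passed (then allow one trial call)."""
    def decorator(func):
        state = {"failures": 0, "opened_at": None}

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if state["opened_at"] is not None:
                if time.monotonic() - state["opened_at"] < reset_timeout:
                    raise CircuitOpenError("circuit open: failing fast")
                state["opened_at"] = None  # half-open: allow a trial call
            try:
                result = func(*args, **kwargs)
            except Exception:
                state["failures"] += 1
                if state["failures"] >= failure_threshold:
                    state["opened_at"] = time.monotonic()  # trip the breaker
                raise
            state["failures"] = 0  # a success closes the circuit again
            return result

        return wrapper
    return decorator
```

The point being: the breaker is orthogonal to the concurrency model, since the wrapped call could just as well return a `Future[Response]`.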

~~~
eldavido
Using a reactor or actors doesn't completely mitigate the problem, but it does
make the symptoms considerably less harmful.

Some hypothetical examples:

Java/Ruby: 8 threads, mean service time 50ms (20 requests/s per thread). A
system like this has an upper-bound throughput of 160 requests/s. If one
request is allowed to time out at, say, 5s, this effectively reduces the
system to 7 threads, as one is "locked" servicing the timed-out request. It
doesn't take many requests like this to significantly degrade performance, and
the only remedy (absent circuit breakers) is to throw a ton more
hardware/workers at the problem = $$$.
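The arithmetic above, spelled out (all numbers taken from the example):

```python
threads = 8
mean_service_time = 0.050                    # 50 ms per request
per_thread_rate = 1 / mean_service_time      # 20 requests/s per thread
max_throughput = threads * per_thread_rate   # 160 requests/s upper bound

# One thread stuck on a 5 s timeout leaves 7 threads serving traffic:
degraded_throughput = (threads - 1) * per_thread_rate    # 140 requests/s
capacity_lost = 1 - degraded_throughput / max_throughput # 12.5% per stuck thread
```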

Consider the problem in, say, node.js. If a single request times out, that
request will cause problems, but it won't have _nearly_ the same capacity-
starving effects as in the thread-bound example above; the request will simply
time out, and other requests won't be starved, because the number of in-flight
concurrent requests isn't limited by the thread/process count of the system.
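A rough illustration of this in Python's asyncio (the 5s/50ms figures mirror the example above; this is a sketch, not a benchmark):

```python
import asyncio
import time

async def upstream(delay):
    # Stand-in for a call to a remote supplier.
    await asyncio.sleep(delay)
    return delay

async def main():
    start = time.monotonic()
    # One hung request (5 s) alongside 100 normal ones (50 ms each).
    slow = asyncio.create_task(upstream(5.0))
    fast = await asyncio.gather(*(upstream(0.05) for _ in range(100)))
    elapsed = time.monotonic() - start
    slow.cancel()  # give up on the hung call rather than wait it out
    # The fast requests all finish in roughly 50 ms: the hung call holds
    # no worker thread, so it cannot starve the others of capacity.
    return len(fast), elapsed

count, elapsed = asyncio.run(main())
```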

I'm realizing as I write this that I'm straying a little off-topic here, but
the point is, using process/thread-based concurrency in a high-performance
system where failure is likely is a bad idea. It's just too easy to get the
kind of failures Fowler describes: "What's worse if you have many callers on a
unresponsive supplier, then you can run out of critical resources leading to
cascading failures across multiple systems."

~~~
cromwellian
Which Java systems these days use 1 thread per request? In my experience, Node
scales far less well than typical Java setups.

~~~
pmahoney
Typical synchronous servlet containers (Tomcat, Jetty, etc.) all maintain a
thread pool and dedicate a single thread to each request for that request's
lifetime. These thread pools can easily hold hundreds of threads on everyday
hardware (vs. something like forking Unicorn, where 8 Ruby processes consume
quite a bit of memory).

This works well for many workloads. It allows a straightforward blocking-IO
model, and you don't typically need to worry about a few slow requests bogging
everything down (which you do if you only have 4-8 Unicorn processes). I'd say
in many apps, the database becomes a bottleneck before the pool runs out of
threads.

~~~
galaxyLogic
For Tomcat-8 "The default HTTP and AJP connector implementation has switched
from the Java blocking IO implementation (BIO) to the Java non-blocking IO
implementation (NIO)."

@ [https://tomcat.apache.org/migration-8.html](https://tomcat.apache.org/migration-8.html)

------
beat
Nice to see "Release It!" get a shout-out from Martin Fowler. That book is one
of the most valuable and terrifying reads I've ever encountered for those of
us working on big, complex systems.

------
fooyc
If I understand correctly, the goal is to avoid too many calls being blocked
for $timeout seconds at the same time, by failing fast once too many calls
have _already_ failed.

Am I correct that blocked calls can still accumulate before $threshold calls
time out? E.g. if 1000 calls are initiated before 5 of them time out, there
are still 1000 blocked calls.

It seems that this problem can also be resolved by limiting the number of in-
flight calls: If too many calls are already waiting for a response, reject
further calls.
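A sketch of that in-flight limit in Python, using a non-blocking semaphore so excess callers are rejected rather than queued (the class and exception names are illustrative):

```python
import threading

class TooManyInFlight(Exception):
    """Raised when the in-flight limit is already reached."""
    pass

class InFlightLimiter:
    def __init__(self, limit):
        # Each in-flight call holds one slot; acquiring is non-blocking,
        # so callers are rejected instead of piling up behind the supplier.
        self._slots = threading.BoundedSemaphore(limit)

    def call(self, func, *args, **kwargs):
        if not self._slots.acquire(blocking=False):
            raise TooManyInFlight("rejecting call: in-flight limit reached")
        try:
            return func(*args, **kwargs)
        finally:
            self._slots.release()
```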

~~~
SonicSoul
This would set an unnecessary throughput limit if the supplier could support a
larger number of concurrent connections.

------
Fasebook
Now you have two problems: unresponsive services and managing breakers.

