
How the end-to-end back-pressure mechanism inside Wallaroo works - scottlf
https://blog.wallaroolabs.com/2018/04/how-the-end-to-end-back-pressure-mechanism-inside-wallaroo-works/
======
scottlf
Howdy. I'm the author of both parts of this two-part series on overload
mitigation. Let me know if I've overlooked an important technique for handling
too-big workloads, or if you have questions (e.g., how we've stitched together
TCP's sliding-window flow control with Wallaroo's internal flow
control).

-Scott

~~~
petesoder
I hope we hear something about this in your talk at DataEngConf SF!

~~~
scottlf
I don't recall which of us is planning on attending DataEngConf SF, but I'm
asking around now.

EDIT: Ah, it will be Vid Jain, the CEO of Wallaroo Labs.

------
colanderman
Backpressure gets more "interesting" when you throw QoS into the mix. If the
client has hard latency requirements, the server can't afford to "stop the
world" when it gets backed up (à la TCP).

Keeping your queues shallow can be a solution, but only if your system doesn't
need the benefits of batching that deep queues can bring. You then have to be
a bit smarter about crediting the client, only handing credits out when you
know you'll be able to complete that work within the QoS requirements, instead
of simply whenever you have space in your queues. (Simply rate-limiting the
source will work in a pinch too, at the cost of potential throughput.)
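A minimal sketch of that QoS-aware crediting idea: grant credits based not just on free queue space, but on how many items can be drained within the latency budget. All names and numbers here are hypothetical, not from any real system:

```python
from collections import deque

class QosCreditManager:
    """Credit-based flow control that caps grants by both queue space and
    an estimated-latency budget, instead of queue space alone."""

    def __init__(self, queue_capacity, latency_budget_ms, est_service_time_ms):
        self.queue = deque()
        self.capacity = queue_capacity
        self.budget_ms = latency_budget_ms
        self.service_ms = est_service_time_ms  # avg time to process one item

    def credits_to_grant(self):
        # Space-based limit: never overfill the queue.
        space = self.capacity - len(self.queue)
        # QoS-based limit: only admit work that can still finish in budget,
        # given what is already queued ahead of it.
        qos_limit = int(self.budget_ms / self.service_ms) - len(self.queue)
        return max(0, min(space, qos_limit))
```

With a 100 ms budget and ~10 ms per item, an empty queue earns the client 10 credits even if the queue could physically hold far more; as the queue fills, grants shrink toward zero before space runs out.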

~~~
scottlf
No disagreement from me.

It's definitely not easy to keep queues shallow end-to-end. In my experience
with Erlang systems, queue management is an area where the BEAM VM's runtime
isn't as actively helpful as it could be. One anecdote: within the last few
months, the weak scheme that the BEAM used for runtime back-pressure was
removed because it wasn't effective enough to justify the complexity of its
code.

When a system does have the ability to keep queues shallow end-to-end, then I
think it's a good base to build on, adding additional features to allow deeper
queues where and when we want them.
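As a rough illustration of how a shallow bounded queue propagates back-pressure between stages (a generic Python sketch, not Wallaroo code): when the consumer slows down, the producer's `put` blocks, so pressure flows upstream automatically.

```python
import queue
import threading

def run_pipeline(items, depth=2):
    """Two-stage pipeline joined by a shallow bounded queue."""
    q = queue.Queue(maxsize=depth)  # shallow: producer blocks when full
    results = []

    def consumer():
        while True:
            item = q.get()
            if item is None:  # sentinel: no more work
                break
            results.append(item * 2)  # stand-in for real processing

    t = threading.Thread(target=consumer)
    t.start()
    for it in items:
        q.put(it)  # blocks when the queue is full -> back-pressure
    q.put(None)
    t.join()
    return results
```

The key property is that no stage needs explicit rate negotiation: a bounded queue plus blocking `put` is the whole mechanism, which is what makes shallow queues a good base to layer smarter policies on top of.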

Regarding Wallaroo: Today, you can use Kafka as a data source, allowing Kafka
to be your deep-as-you-wish buffer upstream and/or downstream of Wallaroo.
Tomorrow (a.k.a. vapor, though we've discussed the feature internally), it's
certainly feasible to add an option to (for example) `TCPSource` that would
add a large buffer at the entry to a Wallaroo cluster ... with the flexibility
to queue in RAM, on disk, or elsewhere. It would also make failure recovery
more complex, which is one reason we've deferred implementing it.
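One way such a spill-capable entry buffer could look, purely as a sketch of the "queue in RAM, disk, or elsewhere" option (hypothetical names and policy; not a planned Wallaroo API, and it ignores the failure-recovery complications mentioned above):

```python
import os
import pickle
import tempfile
from collections import deque

class SpillableBuffer:
    """FIFO buffer that keeps up to `ram_limit` items in memory and
    appends overflow to a disk file. Spilled items are newer than
    in-RAM items, so draining RAM first preserves FIFO order."""

    def __init__(self, ram_limit):
        self.ram = deque()
        self.ram_limit = ram_limit
        fd, self.spill_path = tempfile.mkstemp()
        os.close(fd)
        self.spilled = 0

    def push(self, item):
        if len(self.ram) < self.ram_limit:
            self.ram.append(item)
        else:
            # RAM is full: append the item to the spill file on disk.
            with open(self.spill_path, "ab") as f:
                pickle.dump(item, f)
            self.spilled += 1

    def pop(self):
        if self.ram:
            return self.ram.popleft()
        if self.spilled:
            # Simplest possible policy: reload all spilled items into RAM.
            with open(self.spill_path, "rb") as f:
                for _ in range(self.spilled):
                    self.ram.append(pickle.load(f))
            os.remove(self.spill_path)  # recreated on the next spill
            self.spilled = 0
            return self.ram.popleft()
        raise IndexError("empty buffer")
```

A real implementation would also need bounded reload batches, crash-safe file handling, and integration with recovery, which is exactly where the extra complexity comes from.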

