
Centrifuge: a reliable system for delivering billions of events per day - bretthoerner
https://segment.com/blog/introducing-centrifuge/
======
mnutt
I'm currently investigating a very similar problem (very high throughput
webhooks, specified by customers, to unreliable endpoints) and was considering
an architecture involving a set of queues partitioned by webhook response time
and/or failure rate.

So if your webhook is bucketed into the 0-100ms queue and your responses start
to exceed 100ms, you'd be bumped up to the 100-500ms queue which is more
likely to have periodic queuing delays, and upwards from there depending on
your response time / failure rate. If your API later recovered and started
responding faster you'd be moved back into the faster queue. That way we
could offer different (soft) SLAs for different classes of response times, and
scale the workers independently per queue.

I'm curious if there are known issues that people have run into with this
approach? The main unknown was how many workers we'd be willing to throw at
consistently slow endpoints to try to keep the slower queues from backing up
too much, and possibly some flapping as endpoints could respond quickly under
low throughput but slow down as soon as they moved back to the faster queues.
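
The bucketing idea above can be sketched roughly as follows. This is a toy illustration, not anything from the post: the tier names, thresholds, and rolling-average window are all invented for the example.

```python
from collections import deque

# Upper latency bound (ms) for each tier; the slowest tier is unbounded.
TIERS = [("fast", 100), ("medium", 500), ("slow", float("inf"))]

class EndpointStats:
    def __init__(self, window=20):
        self.samples = deque(maxlen=window)  # recent response times in ms

    def record(self, response_ms):
        self.samples.append(response_ms)

    def tier(self):
        """Pick the first tier whose bound covers the rolling average."""
        if not self.samples:
            return "fast"  # optimistic default for new endpoints
        avg = sum(self.samples) / len(self.samples)
        for name, bound in TIERS:
            if avg <= bound:
                return name
        return TIERS[-1][0]

stats = EndpointStats()
for ms in [40, 60, 80]:
    stats.record(ms)
print(stats.tier())  # low averages land in the "fast" tier

for ms in [900] * 20:
    stats.record(ms)
print(stats.tier())  # sustained slow responses demote to "slow"
```

A rolling window rather than a single sample is one way to reduce the flapping concern: an endpoint has to be slow for a while before it changes tier.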

~~~
lpr22
I really like this approach. You could even automate it where the 0–100ms
queue's workers could sever the connection after 100ms and re-queue the
message on the next level up, while also incrementing the timeout counter. You
could do really interesting things by incrementing and decrementing a timeout
counter... e.g., it gets decremented when the transaction completes under the
line for the queue it's in. That could help with flapping.
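
The increment/decrement counter could look something like this. All the thresholds and names here are made up for illustration; clamping the counter is one (hypothetical) way to keep a long streak from taking forever to recover from.

```python
PROMOTE_AT = 5    # timeouts above this move the endpoint to a slower queue
DEMOTE_AT = -5    # sustained fast responses move it back toward the faster one

class TimeoutCounter:
    def __init__(self):
        self.value = 0

    def record(self, timed_out):
        # Up on a severed/timed-out delivery, down on a completion under
        # the queue's line; clamp so the counter can swing back quickly.
        self.value += 1 if timed_out else -1
        self.value = max(DEMOTE_AT, min(PROMOTE_AT, self.value))

    def decision(self):
        if self.value >= PROMOTE_AT:
            return "promote"   # re-queue on the next (slower) level up
        if self.value <= DEMOTE_AT:
            return "demote"    # move back toward the faster queue
        return "stay"

c = TimeoutCounter()
for _ in range(5):
    c.record(timed_out=True)
print(c.decision())  # "promote" after a run of timeouts
```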

------
ryanwaggoner
Ugh. This is only tangentially related, but: Segment has always looked great,
yet it is so incredibly, ridiculously overpriced for web analytics.

If you want to slap it on your website for basic analytics, you better not get
any traffic to speak of, because they charge you about $.01 per monthly
tracked user, anonymous or identified.

If you get 100k visitors per month (not even THAT much), it's $1,125/mo. Their
pricing estimator stops at $2,375 / month for 225k monthly tracked users.

God forbid you put it on your site or in your mobile app and then anything you
do goes viral. It'll bankrupt you.

Who on earth is this aimed at? Is this only suitable for tracking logged in
users? Or only suitable for enterprise companies?

Better yet, what is it about what they're doing that is so difficult and expensive, when
many of the places you'd be sending these analytics (which will be handling
the same number of events) are free or cheap at this level of use?

Segment has always seemed cool, but overpriced for web analytics by a factor
of 20-50x.

OK, rant over.

~~~
tomnipotent
Segment isn't a web analytics tool so much as a data dispatching service.
Send data to Segment, which in turn sends it to the services you have
configured.

Think of it as a central data store for your business that has built-in
integrations to move that data to other providers such as Google Analytics,
Salesforce, Mailchimp etc.

~~~
ryanwaggoner
Yep, that’s what makes it so cool and useful.

But if one of those services is Google Analytics, for example, then you
probably want to use it with all visitors, which is excruciatingly expensive
if you get hardly any traffic at all.

------
manigandham
>> To implement per source-destination queues with full isolation, we’d need
hundreds of thousands of different queues. Across Kafka, RabbitMQ, NSQ, or
Kinesis–we haven’t seen any queues which support that level of cardinality
with simple scaling primitives.

I've been posting this a lot recently, but it keeps coming up as the relevant
solution: Apache Pulsar supports millions of topics without much overhead and
offers the log semantics of Kafka with better scaling and per-message
acknowledgement:
[https://pulsar.incubator.apache.org](https://pulsar.incubator.apache.org)

------
calvinfo
Hey HN, author of the post here. A number of Segment engineers are hanging
around today, and we are happy to answer questions in the comments. Thanks in
advance for any feedback and thoughtful discussion!

~~~
ThePhysicist
This is a very interesting blog post and some great engineering, congrats!

I fail to understand the following aspects though, maybe you can clarify them
a bit:

* How does a director recover from a failure? From my understanding it would require fetching all job IDs from jobs that are in an active state via the job transactions table (which sounds expensive) and then loading the associated meta-data from the jobs table? Is that correct?

* Do you assume that the director will have archived all non-completed jobs when deleting the database? Do you try to gracefully shut down the director first then? From my understanding it seems you perform a "drop table" statement on a given database and then regenerate the tables, but this would require being sure that all the jobs have been processed or archived.

~~~
achille-roussel
On your first point, you’re correct. Directors scan their databases on start
to rebuild their cache and reschedule the jobs that need to get retried. The
scan is usually quick for data that was recently inserted, since it’s likely
still in cache, but it may take a couple of minutes to scan everything.
Because we keep the databases small, we can cap how long this operation takes.
We also scan the database in reverse order (using the primary key, thanks to
the rough ordering of KSUIDs) to help reschedule the most recent jobs first,
which is possible because the scan can happen concurrently with the job
processing.
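
The recovery scan described above can be illustrated with a toy example (using SQLite in place of MySQL; the schema, column names, and fake IDs are invented, not Segment's actual tables). Because KSUID-like IDs sort lexicographically in rough creation order, a reverse primary-key scan yields the most recent pending jobs first:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE jobs (id TEXT PRIMARY KEY, state TEXT)")

# Fake stand-ins for KSUIDs: a larger prefix means "created later".
rows = [
    ("0001-job-a", "completed"),
    ("0002-job-b", "awaiting-retry"),
    ("0003-job-c", "awaiting-retry"),
]
db.executemany("INSERT INTO jobs VALUES (?, ?)", rows)

# Reverse primary-key scan: reschedule the most recent pending jobs first.
to_reschedule = [
    job_id
    for (job_id,) in db.execute(
        "SELECT id FROM jobs WHERE state != 'completed' ORDER BY id DESC"
    )
]
print(to_reschedule)  # most recent pending job comes first
```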

On your second point, archiving actually happens in the “drainers”, not the
director. The drainer is the component that picks up unused databases and
flushes their data back into the set of running directors. We initially built
archiving into the directors, but it turned out to steal too many resources
away from job processing, so we moved it out into the drainers, which actually
gives us opportunities to do it more efficiently (for example creating larger
archives, which helps with compression as well).

Sorry if we cut some details off of the post, there is plenty more to tell
about this system but it’s a lot for a single blog post ;)

------
jbs40
I wonder if this would have been built on Apache Pulsar
([https://pulsar.apache.org](https://pulsar.apache.org)) if it had been open
source and on the Segment team's radar at the time they started on
Centrifuge. I work with one of the architects of Pulsar, and his first thought
on seeing the blog was that Segment's scenario had a lot of similarities to
what he and the team at Yahoo set out to do when they first built Pulsar
there several years ago.

------
whalesalad
My spidey sense is telling me that this could have been achieved with a few
Erlang VMs and a lot less moving parts.

It's a supervisor (Director) with a bunch of actors communicating with
different 3rd party API's. The state mechanism could ostensibly get abstracted
away so the underlying DB is irrelevant.

~~~
otterley
One of the challenges for us is that our downstream API integrations are
almost all JavaScript (and often provided by the integration vendors
themselves). Rewriting the hundreds of integrations we provide into Erlang
wasn't a viable option for us.

~~~
whalesalad
Interesting. How does the Go code interface with that?

~~~
otterley
The integration gateways are built as HTTP services, so the Director is an
HTTP client to those gateways. (The image under "All Together Now" omitted our
integrations gateways -- architecturally, they lie between the Director and
the actual third-party API endpoints.)

~~~
temuze
I don't know much about Erlang - wouldn't it be able to interface with HTTP in
the same way?

~~~
otterley
Erlang certainly can be used to implement an HTTP client, but once you step
outside of Erlang's actor model and virtual machine, you lose the benefits
it provides in terms of message-passing behavior.

~~~
jhgg
(the state of http clients on Erlang is also pretty abysmal.)

------
temuze
Great post!

> To keep the ‘small working set’ even smaller, we cycle these JobDBs roughly
> every 30 minutes. The manager cycles JobDBs when their percentage of filled
> data is about to exceed available RAM.

I'm confused - why does JobDB's memory trickle up over time? Isn't it a
database? Are you using MySQL's memory storage engine or something?

~~~
achille-roussel
There are no issues with memory utilization of the MySQL databases, other than
that once the working data set doesn’t fit in memory anymore, it has to be
fetched from disk on cache misses, which slows down fetches. Disk utilization
of the job databases increases over time because we don’t do any deletes, to
avoid extra write operations (a delete is basically a write). Once a
database’s disk utilization reaches a defined threshold it is swapped for a
spare, then drained, destroyed, and reinitialised to be reused later on.
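
The swap-for-a-spare lifecycle can be sketched as a tiny state machine. The state names and the threshold here are illustrative, not from the post:

```python
DISK_THRESHOLD = 0.8  # fraction of disk considered "full enough to rotate"

class JobDB:
    def __init__(self, name):
        self.name = name
        self.state = "spare"
        self.disk_utilization = 0.0

def rotate_if_needed(active, spares):
    """Swap a filled-up active database for a spare; return the new active."""
    if active.disk_utilization < DISK_THRESHOLD:
        return active
    replacement = spares.pop(0)
    replacement.state = "active"
    # The old database heads off to be drained, destroyed, and reinitialised.
    active.state = "draining"
    return replacement

a, b = JobDB("db-a"), JobDB("db-b")
a.state = "active"
a.disk_utilization = 0.9
current = rotate_if_needed(a, [b])
print(current.name, a.state)  # db-b becomes active while db-a drains
```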

I hope this gives some clarity, let me know if you have further questions.

------
Zaheer
Awesome write-up! Curious on the decision to choose MySQL since it's a write-
heavy load with minimal querying. Would something like Redis be better suited?
I'm not super familiar with Redis so just curious about what other DBs were
considered.

~~~
jlisam13
The biggest problem with Redis compared to MySQL in this use case is
persistence. You could technically run Redis using the different persistence
options
[https://redis.io/topics/persistence](https://redis.io/topics/persistence),
but it is not going to be as safe as storing things in a RDBMS.

------
sjeanpierre
Hi, thanks for the great write up. Can you provide some more details about how
you handle the RDS side of the rotation process and maintaining the spares?

------
siscia
Thanks for the post! It was really a good read.

In a similar position I would have tried MQTT, which should handle a great
number of topics quite well.

Did you guys try that protocol?

~~~
achille-roussel
Correct me if I’m wrong, but MQTT is just a network protocol, it doesn’t solve
storage, retries, failure resilience, etc... it’s well suited for pub/sub
operations over TCP, not so much for ensuring “exactly once” delivery of
messages.

~~~
siscia
Yes, you are correct. MQTT is a network protocol.

You build your infrastructure on top of it to support stuff like storage,
retries, resiliency, etc...

However, it does give you some guarantees, namely that messages will obey
their QoS level (at most once, at least once, exactly once).

On top of these guarantees, you build whatever you need.
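
As one example of building on top of those guarantees: with QoS 1 (at least once) the broker may redeliver a message, so a consumer can track processed message IDs to make handling idempotent. This is a generic sketch of the idea, not tied to any particular MQTT client library:

```python
class IdempotentConsumer:
    def __init__(self):
        self.seen = set()       # IDs of messages already handled
        self.processed = []     # side effects we actually performed

    def handle(self, msg_id, payload):
        if msg_id in self.seen:
            return False  # duplicate redelivery, safe to drop
        self.seen.add(msg_id)
        self.processed.append(payload)
        return True

c = IdempotentConsumer()
c.handle("m1", "signup-event")
c.handle("m1", "signup-event")  # broker redelivered under QoS 1
print(c.processed)  # the event is only processed once
```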

Honestly, I am quite glad that you are writing these posts, it is a great
service for the community and if you end up open sourcing it, the project will
bring a lot of value to everybody.

Thanks for your posts :)

------
tekmaven
By age 35 you should have written your own queuing system at least once.

~~~
whalesalad
Exactly.

------
scrollaway
Ok, I've read through most of this (really cool post btw!) and I still can't
figure out if this project is using the Centrifuge stack or not:

[https://github.com/centrifugal](https://github.com/centrifugal)

It looks like not, but that's a hell of a naming overlap.

I looked at Centrifugo a few years ago to deliver live Hearthstone games (in
game replay format) through the web. It's a pretty sweet project.

~~~
otterley
It is not. The naming resemblance is strictly coincidental :)

~~~
firebacon
Well I for one think that it's not really "cool" to just copy the name for
more or less the same thing.

BTW, it also appears there is an active US trademark (#78010053) for
"centrifuge" in the computer database context. Isn't that an issue, too?

~~~
otterley
By coincidental, I mean that no copying was intended. It was strictly an
accident.

~~~
firebacon
Ah ok; I have made a habit of typing "<name> {software,github}" into google as
well as the WIPO trademark search when picking a name for a new project.

I started doing that after having to go through two iterations of renaming a
project after finding out the old name was somehow encumbered... of course
after we had already used it in front of other people; it was very
embarrassing.

~~~
otterley
Honestly, I don't think any of us had heard of this other project when we
coined the name internally. If someone had raised it up the flagpole, we
certainly would have considered it.

Still, do keep in mind that it's not the name of a product offering - it's
solely an internal name for the technology.

~~~
achille-roussel
That’s correct, we were multiple months into the project and already using the
name in production when I came across (the other) Centrifuge. We chose to just
go with it because it wasn’t worth changing all the code and everyone’s habit
of referring to the project by this name.

------
carapace
What is with the rash of bad naming decisions these days?

It started with "Cucumber" or "Celery" or something a few years ago didn't it?

I'll skip ranting about how "Go" was a terrible name for a PL (and only
mention parenthetically how they gaffled that name from a different PL!) They
have "Grumpy", and something called "Thanos" (good luck searching for that
until the hype for that comic book movie dies down.) I feel like I've seen
several other projects recently that have been named after other things.

~~~
littlekosh
Yes, fortunately Ruby, Python, Julia, Java, Delphi, Elixir, Lisp, Forth,
Groovy, and Rust are all unique words with no other meaning.

~~~
carapace
Oh hey everyone, sarcasm! You don't see a lot of that every day. ;-P

And Elm, Swift, Logo, Pascal and Haskell and Ada, Icon and Self, Opal and
Occam and Maple...

There's a lot of them:
[https://en.wikipedia.org/wiki/List_of_programming_languages](https://en.wikipedia.org/wiki/List_of_programming_languages)

If all your friends name their projects after something and then jump off a
bridge, it's still a dumb thing to do.

~~~
littlekosh
Yes, sarcasm but, much like jumping off bridges recreationally, naming
programming things after existing words is an old practice. Lisp is 60 years
old. Complaining about Celery and Centrifuge seems as disingenuous as sarcasm.
We can agree on Go though. Some words are simply too common.

~~~
carapace
Well met. And you're right, the practice is older than I said (and not even
confined to computer stuff for that matter.) I just feel like I've seen a rash
of generically-named projects recently and that's why I went off when I saw
this one. (I've also been on imgur this last week and I think it's affecting
my communication style.)

