
Drivetribe’s Modern Take on CQRS with Apache Flink - wints
https://data-artisans.com/drivetribe-cqrs-apache-flink/
======
pat2man
I like this architecture a lot but I don't know how "modern" it is. Event
sourcing around a message bus is relatively common.

In fact you don't even need that many moving pieces. Akka persistence does
most of this out of the box:
[http://doc.akka.io/docs/akka/current/scala/persistence.html](http://doc.akka.io/docs/akka/current/scala/persistence.html)
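The core of what Akka persistence (and event sourcing generally) gives you — an append-only journal of events, with current state recovered by replay — can be sketched in a few lines. This is an illustrative Python sketch, not Akka's API; all names and event shapes are hypothetical:

```python
# Minimal event-sourcing sketch: state is never stored directly; it is
# recomputed by folding over an append-only event log. (Akka persistence
# does this per-actor, with snapshots and journal plugins; the names and
# event shapes here are hypothetical.)

journal = []  # append-only event log (a Kafka topic / journal in production)

def persist(event):
    journal.append(event)

def replay(events):
    """Rebuild current state by folding over the full event history."""
    state = {"followers": 0}
    for event in events:
        if event["type"] == "Followed":
            state["followers"] += 1
        elif event["type"] == "Unfollowed":
            state["followers"] -= 1
    return state

persist({"type": "Followed", "user": "a"})
persist({"type": "Followed", "user": "b"})
persist({"type": "Unfollowed", "user": "a"})

print(replay(journal))  # {'followers': 1}
```

Because the log is the source of truth, crash recovery and state rebuilds are the same operation: replay from the start (or from a snapshot).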

Blue green deployments are a good addition here that many systems don't seem
to be able to handle but the article doesn't really get into the details here.

------
adamkl
It's nice to see documentation on a real-world implementation of this
architecture, and if anyone is interested in more detail, I'd suggest this
book[1] on my to-read list (it details a very similar approach).

I've done SOA and n-tier DDD, and I'd love a chance to build something like
this, but as an enterprise developer I have to operate quite a bit behind the
curve (13 years later, and 90% of my co-workers have never heard of Evans'
book). Still, selling new approaches to management gets easier if you can
point to existing production systems like this, so thanks for posting this,
wints.

[1] [https://www.manning.com/books/functional-and-reactive-domain-modeling](https://www.manning.com/books/functional-and-reactive-domain-modeling)

------
aub3bhat
Honestly this sounds like overcomplicating a system just for the sake of it.
Drivetribe can be implemented reliably in Django or Rails without any of these
fancier "architectures". But then considering that the product itself is not
innovative, perhaps they needed something to sell to their naive recruits.

~~~
ariskk
Hi. I am the author of the article. Thank you for spending time to read this.
The combined reach of the co-founders is very large, thus being able to
provably handle scale was an essential prerequisite. Additionally, the
requirements of the platform extend way beyond a simple content server.
Content performance is tracked in real-time and this is fed to multiple
ranking and recommendation models. Those frequently change, thus we need a way
to retroactively process our data. Flexibility is key when trying to build an
intelligent platform. Thus, we decided early on to invest time in the ability
to quickly iterate and experiment on algorithms, in real time over live data.
You are right that the API fleet could be implemented using the aforementioned
technologies; we use Scala and thus decided to use Akka HTTP instead. The
challenging part is how you manage the state behind that.
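The "retroactively process our data" point is the property that keeping raw events buys you: a new ranking model can be run over the same history without a schema migration. A hedged sketch of the idea (plain Python; the event shapes and scoring weights are hypothetical, not Drivetribe's actual models):

```python
# Because raw engagement events are retained, a *new* ranking model can be
# applied retroactively to the full history. (Hypothetical event shapes and
# weights, for illustration only.)

events = [
    {"post": "p1", "kind": "view"},
    {"post": "p1", "kind": "like"},
    {"post": "p2", "kind": "view"},
    {"post": "p2", "kind": "view"},
    {"post": "p2", "kind": "view"},
]

def rank(events, weights):
    """Score each post by folding over the event history, highest first."""
    scores = {}
    for e in events:
        scores[e["post"]] = scores.get(e["post"], 0) + weights.get(e["kind"], 0)
    return sorted(scores, key=scores.get, reverse=True)

v1 = {"view": 1, "like": 1}    # original model: raw engagement volume
v2 = {"view": 0.1, "like": 5}  # revised model: likes weighted much higher

print(rank(events, v1))  # ['p2', 'p1']
print(rank(events, v2))  # ['p1', 'p2']
```

Swapping `v1` for `v2` reorders the entire history's rankings with no change to the stored data, which is the experiment-and-iterate loop being described.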

~~~
aub3bhat
Don't get me wrong, I am not saying that a CQRS/stream-processing style
approach is never useful. Rather, it is unsuitable for this particular
problem.

In my experience all these features sound nice on paper. But you quickly run
into practical issues that are far easier when you know approximate
information about the state.

E.g. developing a model? You might just want a subset/batch of the data. Doing
BI/analytics? Are you going to continuously tax your server to recompute? The
argument about recommender systems is also honestly flimsy; I have built and
applied such systems to live traffic at very large scale (more than hundreds
of millions of users). There is only a small advantage in being able to
quickly reconfigure flows. In most cases you have a single baseline model
which you compare against for a small fraction of the traffic. The real
complexity/gains in recommender systems lie in the choice of algorithm/hyper-
parameters/features, not in continuous multi-armed bandits with 1000 different
models applied simultaneously while waiting an infinite amount of time to
produce any statistically meaningful answer. In fact, for a website like this
one, recommender systems can only provide so much advantage.

There are actually several really good specialized use cases, e.g. Google
secmon-tools uses a system like this one.

[1]
[https://web.stanford.edu/class/cs259d/lectures/Session11.pdf](https://web.stanford.edu/class/cs259d/lectures/Session11.pdf)

~~~
ariskk
You mention the word "batch" when talking about models, and also
"BI/Analytics". Since Django/Rails applications support neither of the two,
another sort of system would be needed. This is the point where, having built
everything on Django with no foresight whatsoever about future requirements,
we would have ended up creating DataFrames from SQL tables in Spark. Our BI
guys have no experience with Spark, so we would need to load data into a
DW-like solution, like BigQuery/Redshift/Impala/Presto/you-name-it. Instead of
another sink in Flink, we would need to implement and schedule ETL jobs. Even
at our current load, computing counters (e.g. likes) at read time would be
slow and inefficient, which means we would need a way to pre-aggregate them.
Maybe another service, possibly behind a queue? You can see where I am going.
As requirements evolve, systems evolve, and with no planning beforehand,
people end up with spaghetti architectures. We knew we were funded well enough
to run for a couple of years. We knew the site would have traffic. We were
tasked with delivering an algorithmically-driven product, and this is the
solution we came up with.

I really do not understand how such a strong set of conclusions can be drawn
out of so little information.
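The counter point above (pre-aggregating likes at write time instead of counting rows at read time) is essentially the read side of CQRS: a projection consumes events and maintains a materialized view, so reads become O(1) lookups. A minimal sketch with hypothetical names (in the article's stack this projection would be a Flink job writing to a sink):

```python
# CQRS read-model sketch: instead of a COUNT(*) over a likes table on every
# page load, a projection consumes the event stream and keeps counters up to
# date. (Hypothetical event shapes; the dict stands in for Redis/ES etc.)

like_counts = {}  # the materialized read model

def project(event):
    """Apply one event to the read model, keeping counters current."""
    post = event["post"]
    if event["type"] == "Liked":
        like_counts[post] = like_counts.get(post, 0) + 1
    elif event["type"] == "Unliked":
        like_counts[post] = like_counts.get(post, 0) - 1

for e in [{"type": "Liked", "post": "p1"},
          {"type": "Liked", "post": "p1"},
          {"type": "Unliked", "post": "p1"}]:
    project(e)

print(like_counts["p1"])  # 1
```

The trade-off is the classic CQRS one: reads are cheap and the write side stays simple, at the cost of eventual consistency between the event log and the view.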

------
UK-AL
Event Sourcing + CQRS is an extremely scalable, and flexible system, but at
the same time it can overcomplicate things. I'm not sure it was required here.

Interesting to see it used outside the .Net world.

------
BMarkmann
"This is fairly straightforward; professional software developers of all
levels usually have some experience building and scaling stateless APIs."

I would question this, and in fact suggest the complete opposite. Given that
they say they're a team of Scala developers, I think there is definitely some
selection bias going on -- in some subset of the wider population of
developers that might be the case, but certainly not globally.

------
ible
I'm not clear on why you would use redis to get transactions. Wouldn't sending
a single message to kafka that represents the whole transaction with acks=all
have the same effect? Then, instead of pulling from Redis as if it were a
queue, you could just do that from Kafka.

~~~
ariskk
This is exactly what we did initially. Really early on (before any rate
limiting was in place), a few spam accounts followed 100K people, created lots
of spam content, etc. Encapsulating those deletions started yielding messages
bigger than the default max Kafka message size (1MB). Additionally, this
method had a few side effects on the downstream processors. We could of course
increase the limit, but we decided to deal with the problem at its core.
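One common way to deal with this at its core (the comment doesn't spell out the exact fix, so this is a guess at the shape) is to split a large logical operation into many small per-item events sharing a transaction id, so no single message grows with the size of the account:

```python
# Sketch: instead of one giant "unfollow everything" message (which can
# exceed Kafka's default ~1 MB message size limit), emit one bounded-size
# event per affected item, tied together by a transaction id, plus a final
# marker so downstream consumers can detect completion.
# (Hypothetical event shapes -- not necessarily what Drivetribe did.)
import uuid

def encapsulated(user, followees):
    """The problematic form: one message whose size grows with the account."""
    return {"type": "MassUnfollow", "user": user, "followees": followees}

def split(user, followees):
    """Bounded-size form: one small event per followee, plus a commit marker."""
    tx = str(uuid.uuid4())
    events = [{"type": "Unfollowed", "tx": tx, "user": user, "followee": f}
              for f in followees]
    events.append({"type": "TxCommitted", "tx": tx, "count": len(followees)})
    return events

events = split("spammer", [f"user{i}" for i in range(100_000)])
print(len(events))         # 100001
print(events[-1]["type"])  # TxCommitted
```

The commit marker restores "all-or-nothing" visibility for consumers that need it, while every individual message stays far below the broker's size limit.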

------
snambi
Seems over complicated to me. Why do they need so many components?

