
Spotify's Kafka-Based Event Delivery System - vgt
https://labs.spotify.com/2016/02/25/spotifys-event-delivery-the-road-to-the-cloud-part-i/
======
serialpreneur
Interesting blog post.

I don't have a good idea of the requirements at Spotify, but looks to me that
using a streaming system like Storm or Spark Streaming would solve the 30 min
event delivery delay they are experiencing w/ unstructured text -> Avro
conversion. The latency for delivery would go down to sub-second levels.

~~~
raphar
Also the article doesn't say WHY they didn't use Kafka to persist the events.
Kafka is designed to do exactly that.

Once persisted, the consumers can just read the Kafka data and send it to
Hadoop, with less latency. Or you can plug Storm or Spark in as you said and do
the analysis there in real time. Or both.

I'm just intrigued why.

~~~
RickHull
They do talk about this:

> _When the system was built, one of the missing features from Kafka 0.7 was
> the ability of the Kafka Broker cluster to behave as a reliable persistent
> storage. This influenced a major design decision to not keep persistent
> state between the producer of data, Kafka Syslog Producer, and Hadoop. An
> event is considered reliably persisted only when it gets written to a file
> on HDFS._

------
kinofcain
Interesting to look at the chart of event volumes at the end. Did Spotify just
accidentally reveal the effect Apple Music has had on their growth rate?

Steep drop around mid-2015, some recovery since then. Hard to glean anything
concrete, but interesting nonetheless.

~~~
brazzledazzle
I would avoid reading into that, since the text surrounding that chart
indicates that the system was struggling with the load. That drop could simply
have been them prioritizing and removing certain event sources in the clients
while they figured out how to fix things.

------
pheeney
I wish they went more in-depth into the particular events. They seem to imply
the events are user actions, which I took to mean things like "user played this
song", "user viewed this playlist", and other analytics.

However, do they, or would you, use a system like Kafka as an event source
instead of the db? So you would also capture events like "user added this song
to their playlist", which would get persisted to the event-source db and then
eventually to a view model, instead of directly to a relational db. It doesn't
sound like they do that due to the delays, but I can't find many real-usage
blog posts about how much to put into something like Kafka.

~~~
dundun
You certainly can use a db for an event source. This article does a really
good job of explaining how:
[http://www.confluent.io/blog/turning-the-database-inside-out-with-apache-samza/](http://www.confluent.io/blog/turning-the-database-inside-out-with-apache-samza/)

As mentioned in the post, we've pushed Kafka to at least 700,000 events per
second. We have room to push it much further, but stay tuned for posts 2 and
3 to see what we're doing instead.

~~~
pheeney
That is actually the post that got me into event sourcing / streams. As far as
user-analytics-type events go, this makes complete sense to me. What I haven't
been able to discern is whether it's useful to use this architecture at a much,
much smaller scale for things that may not be user events.

I love the thought of throwing everything into a stream and populating the
read models, analytics, search index, etc. with the data. However, if, for
example, you had a CMS / e-commerce system for a smaller organization, should
the admin actions also be events? If you have an event-source db, they would
have to be, and you get all the benefits outlined in the article.

At what point do you decide what to put in the stream and what to build
without? Are there events that should never be in a stream? Those are the
questions I have been researching but I haven't found a lot of resources or
discussions around making these decisions.

~~~
pheeney
My current thinking is that you use a relational db like Postgres with JSON
support to go from hobby / early startup to traction, where you would need to
start being concerned with scaling. At that point you switch to Kafka or
related hosted tools.

As for the data you put into the stream, I would think it could be everything,
even admin actions, if you treated all data as immutable. The only thing that
seems up in the air is transactions.
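To make the idea concrete, here's a minimal sketch of an append-only event
table with JSON payloads, standing in for "a relational db with JSON support
as the event store". The table layout, stream/type naming, and event shapes
are all my own invention for illustration (using sqlite in place of Postgres
so the example is self-contained):

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE events (
        id      INTEGER PRIMARY KEY AUTOINCREMENT,
        stream  TEXT NOT NULL,     -- e.g. "user:42" or "admin:7"
        type    TEXT NOT NULL,     -- e.g. "SongAddedToPlaylist"
        payload TEXT NOT NULL      -- JSON blob; rows are never updated
    )
""")

def append(stream, type_, payload):
    """Append-only write: events are inserted, never mutated."""
    db.execute("INSERT INTO events (stream, type, payload) VALUES (?, ?, ?)",
               (stream, type_, json.dumps(payload)))

def read(stream):
    """Read a stream's events back in insertion order."""
    rows = db.execute(
        "SELECT type, payload FROM events WHERE stream = ? ORDER BY id",
        (stream,))
    return [(t, json.loads(p)) for t, p in rows]

# User actions and admin actions go through the same append path:
append("user:42", "SongAddedToPlaylist", {"song": "s1", "playlist": "p9"})
append("admin:7", "ProductPriceChanged", {"sku": "x", "price": 999})
```

Read models, analytics, and a search index would all be fed from `read()`-style
scans of this one log.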

That is as far as I got, though. I don't work for a company with the kind of
scale to use this, but I'd like to start working with it.

~~~
macca321
I built a CRM/CMS application where every single controller action call is
event sourced.

The whole application lives in memory as a single object aggregate, which gets
rebuilt on startup. I started off writing JSON to the file system, moved to
compressing and appending to a log file, and then moved to Azure cloud tables.

It's awesomely fast to respond to requests (15ms), and to add new features,
but you do get interesting new problems. Along the way I had to:

- come up with a way of migrating events (as my storage formats changed as I
improved my frameworks)

- find a good way to do fast full-text search against in-memory objects, as I
had no SQL or ElasticSearch infrastructure (ended up using Linq against
in-memory Lucene RAMDirectories)

- deal with concurrency issues in a fairly novel manner (as all users are
acting against a single in-memory aggregate)
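The "rebuilt on startup" part is just replaying the log from the beginning.
A tiny sketch of the idea (the event names and playlist shape are invented;
in practice the log would come from files or cloud tables, not a list):

```python
# A stored event log, oldest first.
log = [
    {"type": "PlaylistCreated", "id": "p1", "name": "Focus"},
    {"type": "SongAdded", "id": "p1", "song": "s1"},
    {"type": "SongAdded", "id": "p1", "song": "s2"},
    {"type": "SongRemoved", "id": "p1", "song": "s1"},
]

def rebuild(events):
    """Replay every event in order to reconstruct current in-memory state."""
    playlists = {}
    for e in events:
        if e["type"] == "PlaylistCreated":
            playlists[e["id"]] = {"name": e["name"], "songs": []}
        elif e["type"] == "SongAdded":
            playlists[e["id"]]["songs"].append(e["song"])
        elif e["type"] == "SongRemoved":
            playlists[e["id"]]["songs"].remove(e["song"])
    return playlists

state = rebuild(log)  # state["p1"]["songs"] == ["s2"]
```

Once rebuilt, every request is served from `state` directly, which is where
the fast response times come from.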

I'm hoping this architecture will start to become more popular - I think we
are in need of a framework equivalent to Rails to take it mainstream.

~~~
pheeney
That is very interesting. I am guessing this is a closed-source application?
Did you do something along the lines of CQRS (the Command/Query split) or just
write directly to the event source? At what point did appending to a log file
stop working, causing the switch to the cloud (or was that for unrelated
reasons)?

I am also hoping it will become more popular, as the pros seem to vastly
outweigh the cons. But I think you are right about the framework. From my
research, it seems to be medium to large enterprises that would typically be
best suited to using and developing something like Kafka, and those
enterprises typically would not open source their applications. So I
definitely think a framework from a company that is using it at scale would be
huge.

Until then, I suppose I will keep reading up and learning all I can and figure
out how to implement this on a much smaller scale.

~~~
macca321
Cloud storage was just used so I didn't have to manage backups myself.

I absolutely didn't separate command and query - the commands themselves are
actions which execute against the domain model, and that domain is used to
build responses.

My project is here:
[Sourcery](https://github.com/mcintyre321/Sourcery), but I think a more mature
project you might like to look into is [OrigoDB](http://origodb.com/).

Another thing that gets tricky is making your application deterministic: any
calls to the current time, random number or GUID generation, or to 3rd-party
services, have to be recorded and replayable in the correct order for when you
reconstruct your application instance. This can get tricky if you refactor
your application or change its logic later.

It's worth reading up on Prevalence/MemoryImage, and looking into NEventStore
also.

------
shockzzz
Nice!

