
Stream processing, Event sourcing, Reactive, CEP… and making sense of it all - suprgeek
http://blog.confluent.io/2015/01/29/making-sense-of-stream-processing/
======
presty
Yes, good article, but it's already been discussed here:
[https://news.ycombinator.com/item?id=8966852](https://news.ycombinator.com/item?id=8966852)

------
jpatte
I'm currently in the process of redesigning the whole server architecture of
our SaaS application to embrace event sourcing - we were using a classic
relational database until now. I'm excited by all the new possibilities this
will give us, but it's a lot of work to rewrite just about everything and to
migrate old data into event streams. My advice to anyone who considers event
sourcing for future projects: use it from the beginning. Transitioning from a
more traditional model is hard.

Btw, the author seems to confuse events with commands. These
are not the same thing: a command represents an action (by the user, another
system or an internal process), while an event represents a change in the
data. A command may generate one or several events during its processing. The
command itself is not saved in a data store, but it can be replayed in case of
failure if retry policies are in place.

~~~
boothead
I always think that event sourcing is missing the concept of commands.

If you have the idea of commands (which I think are mentioned in some of the
literature) it becomes a lot easier. Commands are intentions to change the
world that might fail, and have access to the current state when they are
executed. If they succeed when they're run, they will emit events. Doing it
like this means that you are guaranteed a sane event stream, and it makes it
much easier to map from an existing system to events.

I've written both a web scraper and a database exporter that work this
way: as the legacy system is traversed you try to run commands. If the command
succeeds and produces events you know that the events are compliant with your
business logic. If the command fails you have the option to fail the whole
operation or continue without the parts of the old system that don't meet your
business logic.

 _edit_ updated as you mentioned commands above.
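
The import approach described above can be sketched roughly as follows. This is only an illustration under assumptions: `CustomerRegistered`, `register_customer`, `import_legacy` and the email rules are hypothetical names and rules made up for the example, not taken from any of the systems mentioned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRegistered:
    """Event (hypothetical): a fact that has already been accepted."""
    customer_id: int
    email: str


class CommandRejected(Exception):
    """Raised when a command violates the business rules."""


def register_customer(state, customer_id, email):
    """Command handler: reads current state, may fail, returns events on success."""
    if "@" not in email:
        raise CommandRejected(f"invalid email for customer {customer_id}")
    if email in state:
        raise CommandRejected(f"duplicate email {email}")
    return [CustomerRegistered(customer_id, email)]


def import_legacy(rows, strict=False):
    """Traverse legacy rows, running one command per row."""
    state, events, rejected = set(), [], []
    for customer_id, email in rows:
        try:
            new_events = register_customer(state, customer_id, email)
        except CommandRejected as err:
            if strict:
                raise  # option 1: fail the whole import
            rejected.append(str(err))  # option 2: skip the record and continue
            continue
        for event in new_events:
            state.add(event.email)  # apply the event to the current state
            events.append(event)
    return events, rejected


# Made-up legacy data: one valid row, one malformed row, one duplicate.
legacy_rows = [(1, "a@example.com"), (2, "not-an-email"), (3, "a@example.com")]
events, rejected = import_legacy(legacy_rows)
```

Because only events returned by a successful command reach the stream, every imported event has passed the same rules as live traffic; the `strict` flag picks between the two failure options.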

~~~
jpatte
The concept of commands is very useful but I don't think it belongs to the
event sourcing pattern. Event sourcing is just a way to write and read data,
it doesn't say anything about how you process inputs and how that processing
can lead to a change of data. For that concern there exist other patterns like
CQRS which introduces the concept of command. These patterns are
complementary, and using both CQRS + ES is actually common practice.

------
ghc
Some of this is very good, and it's a good beginner's introduction, but there
is significant misunderstanding about the applications of FRP and the actor
model in the area of complex event processing. These are fertile areas of CS
research (to which I made some small contribution in the CS department at
Yale), not just "loosely coupled ideas" or industry buzzwords. They have real
academic meaning, even if the terminology is sometimes co-opted to make
unrelated software sound cutting edge.

~~~
anebg
Care to point me to a better place to learn about the applications of FRP and
the actor model, beyond the buzz around them?

------
grandalf
It's great to see the world finally rediscovering this stuff. CEP is a superb
paradigm that makes reasoning about so many kinds of complex, asynchronous
systems far easier. One just has to get over any aversion to storing a massive
event store. The good news is that great datastores exist for this purpose and
one can usually bootstrap by storing events in whatever relational DB you are
already using until size/perf becomes an issue.
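
As a rough sketch of that bootstrap step, an append-only events table in a relational database might look like the following. It uses Python's standard `sqlite3` module with an in-memory database as a stand-in for whatever DB is already in place; the table layout, the `append_event` and `read_stream` helpers and the cart events are assumptions for illustration only.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the existing relational DB
conn.execute(
    """
    CREATE TABLE events (
        seq        INTEGER PRIMARY KEY AUTOINCREMENT,  -- global ordering
        stream_id  TEXT NOT NULL,                      -- e.g. one stream per entity
        type       TEXT NOT NULL,
        payload    TEXT NOT NULL                       -- JSON-encoded event data
    )
    """
)


def append_event(stream_id, event_type, payload):
    """Append only: events are inserted, never updated or deleted."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO events (stream_id, type, payload) VALUES (?, ?, ?)",
            (stream_id, event_type, json.dumps(payload)),
        )


def read_stream(stream_id):
    """Return a stream's events in the order they were appended."""
    rows = conn.execute(
        "SELECT type, payload FROM events WHERE stream_id = ? ORDER BY seq",
        (stream_id,),
    )
    return [(event_type, json.loads(payload)) for event_type, payload in rows]


append_event("cart-42", "AddedToCart", {"prod": 1, "quantity": 2})
append_event("cart-42", "RemovedFromCart", {"prod": 1, "quantity": 1})

# Current state is derived by replaying the stream.
quantity = 0
for event_type, payload in read_stream("cart-42"):
    if event_type == "AddedToCart":
        quantity += payload["quantity"]
    else:
        quantity -= payload["quantity"]
```

Keeping the table insert-only is what makes it possible to move the same rows to a dedicated event store later without changing how readers replay them.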

This is a great article. I really like the work from Stanford on Rapide (a
declarative, logic style language for event pattern rules).

[http://complexevents.com/stanford/rapide/](http://complexevents.com/stanford/rapide/)

The site is a bit outdated but the language is awesome.

Also check out these books on CEP:

[http://www.amazon.com/Power-Events-Introduction-
Processing-D...](http://www.amazon.com/Power-Events-Introduction-Processing-
Distributed/dp/0321951832/ref=sr_1_3?ie=UTF8&qid=1423071897&sr=8-3&keywords=complex+event+processing)

and

[http://www.amazon.com/Event-Processing-Action-Opher-
Etzion/d...](http://www.amazon.com/Event-Processing-Action-Opher-
Etzion/dp/1935182218/ref=sr_1_2?ie=UTF8&qid=1423071897&sr=8-2&keywords=complex+event+processing)

------
kiyoto
A great article. A few open source projects for CEP/event processing:

1\. Esper ([http://www.espertech.com](http://www.espertech.com)) has been
around for a while. This is _the_ CEP engine a lot of people are familiar with.

2\. Norikra ([http://norikra.github.io/](http://norikra.github.io/)) is a
schema-less event processing engine, often used with Fluentd
([https://www.fluentd.org](https://www.fluentd.org)) as the data collector
(Fluentd can do stream processing as well).

3\. Apache Storm ([https://storm.apache.org](https://storm.apache.org)) has
been popular in the Hadoop community, often with Kafka as the event source.

------
sleazebreeze
Storm and Kafka are great building blocks for event streaming and processing,
but if you're interested in a very well designed and non-opinionated framework
for Java that helps you build CQRS applications, Axon[1] is worth looking
into. The creator and maintainer, Allard Buijze, is a great guy (and he is paid
to maintain it) and the code base is very solid with improvements being made
on a regular basis.

[1] [http://www.axonframework.org/](http://www.axonframework.org/)

------
tristanz
This architecture seems great for analytics apps, but when expanded beyond
that (e.g., Pete Hunt's Full Stack Flux talk) I never see explanations of
basic patterns like validation. Where does validation happen and how are
errors sent back to users?

For instance:

AddToCart(prod=1, quantity=1) -> Transactionally check that there is still
inventory, return error if there isn't, add to stream if there is.

~~~
jpatte
This is where the distinction between commands and events is important.
"AddToCart" is a command (which might fail), while "AddedToCart" is an event
that will result from processing only if the command passed validation. You
should store events, not commands.
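
A minimal sketch of that flow, assuming a single-process service where a lock stands in for a real transaction. `CartService`, `OutOfStock` and the inventory dict are hypothetical names invented for the example; only `AddToCart` and `AddedToCart` come from the discussion above.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class AddToCart:
    """Command: a request that may be rejected."""
    prod: int
    quantity: int


@dataclass(frozen=True)
class AddedToCart:
    """Event: a fact recorded only after validation passed."""
    prod: int
    quantity: int


class OutOfStock(Exception):
    """Reported back to the caller; never written to the stream."""


class CartService:
    def __init__(self, inventory):
        self._inventory = dict(inventory)  # prod -> units left (derived state)
        self._stream = []                  # stored events only, no commands
        self._lock = threading.Lock()      # stand-in for a real transaction

    def handle(self, cmd):
        # Check and append under one lock so two buyers cannot both
        # take the last unit.
        with self._lock:
            available = self._inventory.get(cmd.prod, 0)
            if cmd.quantity > available:
                raise OutOfStock(f"only {available} left of product {cmd.prod}")
            event = AddedToCart(cmd.prod, cmd.quantity)
            self._stream.append(event)
            self._inventory[cmd.prod] = available - cmd.quantity
            return event

    @property
    def stream(self):
        return list(self._stream)


service = CartService({1: 1})
service.handle(AddToCart(prod=1, quantity=1))      # succeeds, event stored
try:
    service.handle(AddToCart(prod=1, quantity=1))  # rejected, nothing stored
except OutOfStock as err:
    error_for_user = str(err)                      # what the user would see
```

The error travels back on the command path as an ordinary exception or response, while the stream only ever contains `AddedToCart` facts.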

------
jnaour
Really interesting blog post by one of the guys behind Kafka related to this
topic: [http://engineering.linkedin.com/distributed-systems/log-
what...](http://engineering.linkedin.com/distributed-systems/log-what-every-
software-engineer-should-know-about-real-time-datas-unifying)

