
Getting Started with Event Sourcing in Postgres - srt32
https://www.backerkit.com/blog/getting-started-with-event-sourcing-in-postgres/
======
karmajunkie
Before anyone goes building their very own eventstore based on this blog
article—even calling this implementation of the pattern eventsourcing is a
stretch. You don't "query" the event stream. Your projections subscribe to it.
(Also, there's no projection in this.) And while its not the first time i've
heard of someone using a table per event type, its definitely something I'd
regard as an antipattern.

There are MUCH better articles out there about eventsourcing, even in Rubyland
where its still fairly uncommon. This one is just cargo-culting, poorly.

~~~
codebeaker
I'm currently writing a book on this (ES/CQRS), but I have come to the
conclusion through some of our projects that a "subscription" model is
actually less ideal as you end up falling back to keeping a lot of state in
the application heap.

We rather prefer to have a subscriber who re-runs the projections which are
purely functional, and dump their state.

This mandates that you have some way of querying relevant events from the
underlying store. Sometimes the best store for this is a DBMS, sometimes it's
something simpler like CAS linked lists in Redis or similar (lists of hashes
pointing to keys, similar to git's ref/tag model)

Either way, this idea that you subscribe to a stream makes reconnection and
place-holding (cursor) semantics difficult, and simply calling as-frequently-
as-makes-sense your update function is simpler to reason about, and helps
highlight performance issues by making it easier to see the perf of frequently
called short lived processes vs dealing with any VM bloat of your choice of
language's runtime managing long lived subscribers.

Caveat: naturally this is all very subjective, just my 2 ¢

~~~
fnord123
>Either way, this idea that you subscribe to a stream makes reconnection and
place-holding (cursor) semantics difficult

This is true. It's also a problem if you drop a packet - you need to build up
state again on the client side.

If you're writing a book about event sourcing, I should think you'll spend a
good deal of time covering trading systems and order books since these are an
area where event sourcing (AIUI) has been used for decades.

------
amluto
All this "event sourcing" stuff is fun to watch. I've been using essentially
this paradigm in production for years. The difference is that I use a simple
custom append-only database for it. The consumer side can read, parse, and
handle well over 1M events per second on a single core. Take that, Postgres.
If the code wasn't such a mess, I'd open source it.

I'm a bit mystified as to why it's called event sourcing, though. Everything
sources events. The difference is that this data model has everything
consuming events instead of consuming state.

~~~
fweespeech
I'm not sure but the term is pretty old.

[https://martinfowler.com/eaaDev/EventSourcing.html](https://martinfowler.com/eaaDev/EventSourcing.html)

This is an article I have from 10+ years ago and its still roughly the same
way I do it now. I've never needed to scale it beyond something like Postgres
or Redis.

~~~
fnord123
Isn't Event Sourcing the same as anything using a Write Ahead Log (WAL)? i.e.
it's been around for decades.

~~~
karmajunkie
No, a WAL is an implementation detail of a database—you could only consider
that to be ES if you're building a database. ES refers to the primacy of
domain-related events as the source of truth about your application state.
Your application consumes event from individual streams (events belong to one
or more streams—usually one) and changes state with each event. You can think
of it as a functional data pattern.

~~~
fnord123
>you could only consider that to be ES if you're building a database.

I poke around in the guts of databases, so that's what I meant. The WAL is ES
in the context of databases. So I guess ES is the
generalization/abstraction(?).

~~~
tomnipotent
Probably more appropriate to think of ES as relying on a WAL for managing an
ordered event stream. When getting the current state of a business object e.g.
"SELECT * FROM table WHERE pk=313", you would replay the event log to get
current state and discard it when you're done. Mutation is only done by adding
an event to the stream.

It's not uncommon to see RDBMS's still used to improve read performance (but
not as a source of truth, just a cache with SQL).

~~~
fnord123
Right, and then, if the domain allows it, you can compact the logs into the
current state so the compacted state is the single source of truth. e.g. HDFS
works this way by having an fsimage and the edit logs. After a compaction, the
fsimage is the HDFS namespace. The RDBMS that you mention is just a coarse
tool for doing the same thing (IIUC).

------
friendzis
I see a lot of event sourcing tutorials are based on ordering by timestamp. If
events (state transitions) are generated concurrently that would lead to huge
problems reconstructing actual flow (and breaking state machines). For example
when user has multiple sessions open and makes concurrent changes in both
sessions.

Wouldn't it be better to store identifier of previous state and reconstruct
the event tree using CONNECT BYs/WITH RECURSIVEs?

~~~
piaste
If your event payload contains the whole updated state of the system, then
yes, you need to know the ID of the original state in order to detect
concurrent updates to the same original state. But if you want to keep things
smooth for the user you'd have to merge those concurrently updated states,
which may well be a massive pain, depending on your data model.

A better option is to only include the changes in the event payload, in which
case you don't need to know the original state and you can simply project them
all in order - and if one makes the next impossible, it's first come, first
serve. Again, though, depending on your data model, generating an event
payload that only includes the changed elements may also be a pain.

~~~
friendzis
But you do not necessarily need to merge events. Contrived example, but say
you have some kind of transactional flow (e.g. accounting) that is in the end
signed/sent/processed by a central authority/job. There are two concurrent
transactions modifying the same object. If the flow is to commit the latest
state, then user1 can accidentaly commit user2's edits. This can easily happen
if accountants connect with company credentials.

\--- edit addition ---

For example there is an error in submitted (not yet committed) invoice for
green and blue pens - invoice is for 5 boxes of blue and 5 boxes of green
pens, but the courier departed with 11 total boxes, because it was a quick
call directly to warehousing. Accounting does not particularly care for pen
colour and get request to modify the invoice to total to 11 boxes. Accountant
1 modifies green pens count to 6, accountant 2 modifies blue pens. There is
now high chance to sign the invoice with 12 boxes.

I understand that problem in example is solvable with checks and full state
changesets, but it is a problem nevertheless. That's why I am proponent of
history trees.

~~~
piaste
In your example, I think ending up with a 12-box invoice is absolutely the
correct behaviour for the program.

------
bruce_one
The CTE in the final query seems like it should be replaced with a subquery
(or view?) because the CTE is an optimisation fence, so it's going to seq scan
email every time, even if there's a where condition which could use the
message_id index.

------
Yuioup
I've inherited a project with CQRS+Event Sourcing but I can't wait to get rid
of it. It's genuinely depressing to work with the code.

~~~
skyde
Could you elaborate about what you don't like about it

~~~
Yuioup
It's to do with the fact that the original implementors decided to apply this
design pattern for the entire system, including places where it's not really
useful like CRUD.

A lot of redundant plumbing in the code with classes mapped to other classes.
The data structure they chose does not allow inheritance so there are a lot of
classes that look exactly alike where sub-classes would be useful.

This notion of Command that generates an Event that generates more Commands
that can generates more Events asynchronously with code for eventual
consistency. Lovely concept but database records don't exist. You can't just
query a database. You have to build your views based on playing back your
events. I just find it completely confusing.

Frustrating as well because certain things that are trivial in an RDMS end up
costing an incredible amount of development time in CQRS+Event Store. I must
admit unfamiliarity with the architecture is definitely a factor - except that
my predecessors wrote everything from scratch down to the JSON format of the
data structure. One mistake and the microservices crash causing the data to be
in an invalid state. In order to fix the data, you need to replay all the
events - and there are millions of them. Nothing ever gets deleted because a
'delete' is a new event.

pffff ....

~~~
karmajunkie
I don't blame you for being soured on it—sounds like there are a lot of
antipatterns in play there!

> including places where it's not really useful like CRUD.

Probably the most common mistake made by architects doing it for the first
time.

> You can't just query a database. You have to build your views based on
> playing back your events.

Its pretty common to have projections that build database tables so you can do
exactly that. It sounds more like the original architects violated (or had
poor delineation) of service boundaries, which just creates a tightly coupled
set of microservices, which is worse than the monolithic architecture it seeks
to replace.

> In order to fix the data, you need to replay all the events - and there are
> millions of them. Nothing ever gets deleted because a 'delete' is a new
> event.

Its also common to archive streams periodically and "declare bankruptcy" with
an initial state setting event, proceeding forward from there. Snapshotting is
also a thing.

You might find this helpful in dealing with some of those issues:
[https://leanpub.com/esversioning/read#leanpub-auto-
status](https://leanpub.com/esversioning/read#leanpub-auto-status)

------
olalonde
For those of you who use event sourcing: how do you scale horizontally? Do you
have a single process that handles all events sequentially? Maybe I'm
overthinking this but I'm worried this might become a bottleneck eventually.
Otherwise, how do you deal with concurrency issues (e.g. 2 different visitors
who signup with the same username concurrently)?

~~~
karmajunkie
The first rule of eventsourcing is don't eventsource EVERYTHING.

Eventsourcing requires really, really detailed analysis of your boundaries
(and no small bit of experimentation) to get right, even when you've done it
before. It should really only be applied to the areas of your application that
are going to see a real benefit from it. My proxy for measuring this is "how
much money does this group of features make for me?" If the answer is some
version of "none" or "none, but its a requirement to have it anyway" then I
don't try to apply ES to that area of the application. I _never_ use ES on
user records anymore, for example.

I'm sidestepping your actual question, but mostly because your use-case is
something I consider to be a bit of a canard. To answer it though, as others
noted, "it depends"—I model projections with actors, which means all events DO
come through a single process handling events sequentially (and this is a very
common way to do it.) But because I've got tons of these actors (one per
stream that's actively running), there's no real bottleneck—this is also one
way you guarantee ordering within a stream.

Higher up in the stack, at the eventstore level, you'll probably have a pool
of processes handling writes and routing events to the correct
handlers/projections, but this is an implementation detail that can vary
widely. The advice to not get bogged down in the details early on is good—the
thing i see almost every developer new to ES do is get mired in the basic "how
do i implement this" question, and that's really not what's going to kill your
project. Not understanding nearly as much about your business and the software
boundaries in your system is where most projects go wrong.

~~~
Vinnl
> The first rule of eventsourcing is don't eventsource EVERYTHING.

I've heard this rule so often now, but never what exactly it is that you _can_
safely event source. I'm afraid it's something you can only get a feeling for
by doing it a few times and making mistakes, but that way of learning is hard
to justify in real projects.

~~~
karmajunkie
> I've heard this rule so often now, but never what exactly it is that you
> _can_ safely event source. I'm afraid it's something you can only get a
> feeling for by doing it a few times and making mistakes, but that way of
> learning is hard to justify in real projects.

see above:

>> My proxy for measuring this is "how much money does this group of features
make for me?"

That said, i think you're also correct—there's a lot of trial and error in
developing your instincts for this architecture. That's also true of
monoliths, web MVC, or really any architectural style. Experience, as they
say, is that thing you only get AFTER you needed it.

------
Scarbutt
Is this the same as what datomic does by default?

------
felixge
Maybe I'm missing something, but the final query in the article seems
complicated/inefficient for no apparent reason. IMO this one should be better:

    
    
      SELECT DISTINCT ON (message_id)
        message_id,
        status
      FROM emails
      ORDER BY message_id, created_at;

~~~
srt32
Great feedback! You are right and the post has been updated to reflect this
feedback!

------
fred256
It's late and I'm tired and far from a SQL expert, so this may be a silly
question, but why is the dense_rank needed instead of just using the event's
timestamp directly in the 2nd query?

~~~
pulisse
To break ties when there are multiple events with the same timestamp, I would
guess.

~~~
sudhirj
With or without a dense rank a second sort order would be necessary to break
ties.

