
How do you cut a monolith in half? - dedalus
http://programmingisterrible.com/post/162346490883/how-do-you-cut-a-monolith-in-half
======
andrewwharton
After reading Enterprise Integration Patterns [1] a few years ago, I've been
under the impression that message queues are the holy grail for decoupling
systems that you want to pull apart, but this offers a really nice perspective
on where they might not be a panacea. Thank you!

There are a couple of nice ideas in here (the "message pump", for one) that
I'm going to steal and use at work. It's also comforting to know that we're
not completely crazy for using a DB, rather than a message queue, to store
tasks/processes that need to be run and retried if necessary.

[1]
[https://en.wikipedia.org/wiki/Enterprise_Integration_Patterns](https://en.wikipedia.org/wiki/Enterprise_Integration_Patterns)

~~~
balfirevic
I know what you meant by saying you use a DB _instead_ of a message queue,
but I think it points to a broader point: a message queue is a concept, and
the technology used to implement it is just an implementation detail. It just
turns out that databases have all kinds of useful and powerful operations that
make implementing message queues relatively easy.

~~~
vkjv
If you don't need high throughput and otherwise don't have a reason to add a
separate technology to the stack just to have a queue, SQL can make a great,
simple and reliable queue!

* PostgreSQL and Oracle with SKIP LOCKED

* MS SQL Server with READPAST

* DB2 with SKIP LOCKED DATA

If you use MySQL, it's going to be a bit more difficult.
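The claim-one-task pattern those clauses enable can be sketched in a few lines. This is a minimal, single-connection sketch using Python's sqlite3 (table and column names are invented for illustration); the PostgreSQL form with SKIP LOCKED is shown in a comment, since SQLite has no equivalent clause:

```python
# DB-as-queue sketch. On PostgreSQL the claim step would be one locking query:
#
#   SELECT id, payload FROM tasks
#   WHERE status = 'queued'
#   ORDER BY id
#   FOR UPDATE SKIP LOCKED
#   LIMIT 1;
#
# SQLite has no SKIP LOCKED, so this single-connection demo claims a task
# by updating its status column instead.
import sqlite3

def enqueue(conn, payload):
    conn.execute("INSERT INTO tasks (payload) VALUES (?)", (payload,))
    conn.commit()

def claim(conn, worker):
    # Pick the oldest queued task and mark it as owned by this worker.
    row = conn.execute(
        "SELECT id, payload FROM tasks WHERE status = 'queued' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE tasks SET status = ? WHERE id = ?", (worker, row[0]))
    conn.commit()
    return row

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (id INTEGER PRIMARY KEY, payload TEXT,"
    " status TEXT DEFAULT 'queued')"
)
enqueue(conn, "send-email")
enqueue(conn, "resize-image")
```

The SELECT-then-UPDATE here is only safe on a single connection; SKIP LOCKED is exactly what makes the claim safe under concurrent workers on PostgreSQL.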

~~~
burkemw3
I hadn't heard of SKIP LOCKED before, and very much enjoyed this article
walking through it for Postgres:
[https://blog.2ndquadrant.com/what-is-select-skip-locked-for-in-postgresql-9-5/](https://blog.2ndquadrant.com/what-is-select-skip-locked-for-in-postgresql-9-5/)

~~~
swah
Love the headline "Most work queue implementations in SQL are wrong". That
kind of stuff makes me read an article...

This short screencast was also very helpful:
[https://www.pgcasts.com/episodes/7/skip-locked/](https://www.pgcasts.com/episodes/7/skip-locked/)

------
eldavido
I've been working in a related space for a few years and wanted to offer a few
counterpoints to the article.

First, if you're doing request/response over messaging, you're probably doing
it wrong. Pub/sub and request/response are totally different animals. I, for
one, consider it both reasonable and necessary to use both side by side, in
the same infrastructure. (Is this view uncommon?)

In our technology stack, which is a monolith becoming microservices, we use
both pub/sub and request/response side by side. The general rule: when
service A calls service B, if the nature of the interaction is such that B's
response can preempt/interrupt/influence A, the call needs to be made inline,
in service A. If the interaction is more "advisory", use pub/sub.

Examples (from the hotel booking space): (a) When a reservation gets canceled,
we publish a cancellation event. The reservation is then CANCELED, officially.
A separate service sees the cancellation event and frees the associated held
room inventory; that's a pub/sub interaction. (b) When a reservation wants to
check in to a room, we check whether the room is already occupied. This has to
be done using request/response (in our case, gRPC) because if the room is
occupied, that's a hard gate on the success of the checkin.
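That rule can be sketched in a few lines of Python; everything here (Bus, RoomService, the topic name) is invented for illustration, not taken from the commenter's actual stack:

```python
class Bus:
    """Minimal in-process pub/sub: advisory, fire-and-forget."""
    def __init__(self):
        self.subscribers = {}

    def subscribe(self, topic, handler):
        self.subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic, event):
        for handler in self.subscribers.get(topic, []):
            handler(event)  # no return value flows back to the caller

class RoomService:
    def __init__(self):
        self.occupied = set()
        self.held = {"101"}  # inventory held by a reservation

    # Request/response: the answer gates the caller's next step.
    def is_occupied(self, room):
        return room in self.occupied

    # Pub/sub handler: reacts to a cancellation event; the publisher never waits.
    def on_cancellation(self, event):
        self.held.discard(event["room"])

rooms = RoomService()
bus = Bus()
bus.subscribe("reservation.cancelled", rooms.on_cancellation)

# (b) request/response: check-in proceeds only if the room is free
if not rooms.is_occupied("101"):
    rooms.occupied.add("101")

# (a) pub/sub: cancelling frees held inventory as an advisory side effect
bus.publish("reservation.cancelled", {"room": "101"})
```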

Second, pub/sub != work queues.

Pub/sub is about distributing small bits of information all over the system
and letting things be advised of stuff. Using a messaging system as a work
queue is overall pretty stupid. I know it's common to use a message broker
like RabbitMQ for task distribution, but it's silly. It's _really_ silly when
the tasks themselves contain huge binary objects inline, as part of the
message payload. Store that shit in S3 or a proper system, and keep the
message payloads light.

I guess that's all for now.

~~~
omegaworks
What do you use for task distribution?

~~~
eldavido
Surprisingly, we don't have any long-running tasks.

------
sametmax
This article is not just very well written, it's also very funny. Some gold:

> a message broker is a service that transforms network errors and machine
> failures into filled disks

...

> mark it as required in the database, and wait for something else to handle
> it.
>
> Assuming that something else isn’t a human who has been paged.

...

> Systems grow by pushing responsibilities to the edges

...

> A distributed system is something you can draw on a whiteboard pretty
> quickly, but it’ll take hours to explain how all the pieces interact.

I love it when you can mix technical depth with not taking yourself seriously.

Although, be careful with using a DB as a task queue. Concurrency is a b* and
message brokers are very good at it. AMQP was created because its authors
started with a DB-based message broker and it didn't work.

A task queue is a message broker + persistence + status. Celery does that very
well in the Python world, and works with RabbitMQ, Redis, Postgres, etc.

What amazes me the most is autobahn + crossbar.io. It does pub/sub, RPC, load
balancing and all that stuff for Python, JS, PHP, C#, Java... And it works
even in the browser. Cool stuff.

~~~
marcosdumay
> Concurrency is a b*

I have to say. I spent way too much time trying to understand what b* trees
have to do with concurrency, and what system you are using that implements
them.

~~~
102030485868
I interpreted that as, "Concurrency is a bitch...".

~~~
marcosdumay
Yeah, that's the correct interpretation. I got there, eventually.

------
bjflanne
A follow-up from a reader on this article:
[http://bravenewgeek.com/smart-endpoints-dumb-pipes/](http://bravenewgeek.com/smart-endpoints-dumb-pipes/)

------
BenoitEssiambre
It seems to me that there are great benefits to be had if you can keep the
transactional integrity of direct connections to a relational database like
PostgreSQL. Performance and scalability are often better than people expect,
because the data stays close to the CPUs and you avoid moving too much of it
between nodes over a relatively slow and unreliable network.

In a lot of cases, there are natural separation lines such as between groups
of customers you can use to shard and scale things up. Unless you are building
something like a social network where everything is connected to everything,
you don't need a database that runs over a large cluster or clustered queues
in between components. These are often just more moving parts that can break.

~~~
smilliken
The benefits of transactional queues in your database are hard to overstate;
commit the result of the task in the same transaction as you commit the queue
update. Don't worry about idempotency, lost messages, or duplicate messages.

I suspect the advice to avoid it for performance reasons has become invalid
for all but extreme use cases. My company has dozens of high-activity queues
of 1-100M items each in single PostgreSQL databases. It works great.
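A minimal sketch of that idea, using Python's sqlite3 instead of PostgreSQL and an invented schema: the task's result row and the queue-status update commit, or roll back, as one transaction, so a failed task leaves no partial state behind:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE queue (id INTEGER PRIMARY KEY, payload TEXT, done INTEGER DEFAULT 0);
    CREATE TABLE results (task_id INTEGER, value TEXT);
    INSERT INTO queue (payload) VALUES ('ok'), ('boom');
""")

def process(conn, task_id, payload):
    try:
        with conn:  # one transaction: result row + queue update, atomically
            if payload == "boom":
                raise RuntimeError("task failed")
            conn.execute("INSERT INTO results VALUES (?, ?)",
                         (task_id, payload.upper()))
            conn.execute("UPDATE queue SET done = 1 WHERE id = ?", (task_id,))
    except RuntimeError:
        pass  # rolled back: task stays queued, no orphaned result row

for task_id, payload in conn.execute("SELECT id, payload FROM queue").fetchall():
    process(conn, task_id, payload)
```

Because the two writes share a transaction, there is no window where the result exists but the queue still shows the task pending (or vice versa), which is exactly why idempotency and duplicate-message handling stop being the consumer's problem.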

------
eddd
A queue is not the holy grail, especially if you want to put something on the
queue during a DB transaction. Be aware that once you do I/O (some RPC or
queue interaction) inside the transaction, you lose the ACID guarantees of
the DB and make the system much less reliable.

The way to do this is to publish to the queue _after_ the transaction, while
also making sure the action won't be lost in the process.

[https://martin.kleppmann.com/2015/04/23/bottled-water-real-time-postgresql-kafka.html](https://martin.kleppmann.com/2015/04/23/bottled-water-real-time-postgresql-kafka.html)

[http://dataintensive.net/](http://dataintensive.net/)
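One common way to publish after the transaction without losing the action is an outbox table: the event is written in the same transaction as the business change, and a relay publishes it after commit. A minimal sketch in Python's sqlite3, with all names invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY, event TEXT, sent INTEGER DEFAULT 0);
""")

def cancel_order(conn, order_id):
    # Business change and outbox row commit atomically: no broker I/O here.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, 'cancelled')", (order_id,))
        conn.execute("INSERT INTO outbox (event) VALUES (?)",
                     (f"order.cancelled:{order_id}",))

def relay(conn, publish):
    # Runs after commit; retried until the broker accepts, so events are
    # delivered at-least-once but never lost along with the transaction.
    for row_id, event in conn.execute(
            "SELECT id, event FROM outbox WHERE sent = 0").fetchall():
        publish(event)
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))

published = []
cancel_order(conn, 7)
relay(conn, published.append)
```

If the process crashes between commit and relay, the unsent outbox row survives and is published on the next relay pass; consumers just need to tolerate the occasional duplicate.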

------
nurettin
The term SOA is always mentioned with the underlying connotation that a
client requests a resource and somehow there is an extra step of "service
discovery", which has its own set of problems that branch out into various
fields of physics and mathematics.

When I see such branching complexity, I often suspect the architecture is
somehow backwards, and try a simple reversal of responsibilities. In this
case, the services would be looking for a job to complete, in effect turning
the aforementioned step into "job discovery", which is indeed the kind of
architecture I've been applying for the past decade.

Seems to be working well so far. Back pressure is handled at the front gate
when none of the services pick up a job; involuntary synchronization is still
a problem, but avoidable by cleverly re-ordering the job queue. Job
completion is communicated back to the front gate through pub/sub, and the
anecdotal evidence so far has been great.

~~~
sciurus
Isn't your architecture exactly what the author is arguing against?

~~~
nurettin
Because I use pub/sub instead of polling the database? I thought that was just
common sense.

------
Terr_
Dealing with fairly low-volume stuff, my concern about using the database
alone has to do with who owns what schema and how changes are managed.

Being able to send off a rich asynchronous message is nice because it means
you do not need to have some co-owned table in a shared database that two
different components are reading and writing from.

Or, worse, a widespread pattern of every service exposing a piece of its
database to other services, with an unsatisfying level of logging or control
over what really happens.

~~~
woodrowbarlow
Not all databases require rigid schemas; a filesystem, for instance.

~~~
Terr_
I don't understand what distinction you're trying to raise.

My point is that a message queue allows you to decouple systems, whereas
having two systems share the same database tables has its own kind of peril.

------
asmosoinio
Money quote in my humble opinion:

> In practice, a message broker is a service that transforms network errors
> and machine failures into filled disks.

~~~
Terr_
Of course, the alternative before that may have been "any network error or
machine failure brings everything down".

When it's asynchronous, it's easier to tolerate random errors or downtime,
but the cost is that you have to store the message _somewhere_...

------
mavhc
Expected story about 2001, was disappointed.

Good overview from my perspective of not knowing much about such systems.

~~~
cpeterso
I was expecting a geometry puzzle about subdividing a 1:4:9 monolith into
smaller 1:4:9 monoliths. :)

------
FigmentEngine
just before the "l"

------
douche
Hmm, I submitted this yesterday, but it got lost in the scrum.

It's always interesting to hear real reports from the trenches that aren't
essentially ads for technology XYZ. I'm afraid that all too often, we make
things more complicated for bad reasons; whether ignorance, chasing the latest
trend, or resume-driven-development...

~~~
robertlagrant
Message-oriented stuff has been around a long time, so I don't think it's a
fad.

It's basically taking the concept of "integration" (or API) itself and
creating a product for it, just as a database is a product for the concept of
persistence. Thus just as not every application has to reinvent a database,
with a messaging product not every product has to reinvent queueing up
integration calls if the target system isn't available.

The article also claims that request-reply is "what you really want". I think
that is (a) not true, since a lot of the time you can fire and forget, and
(b) when you do need it, these products generally provide a request-reply API
on top of their lower-level APIs. No need to reinvent.

~~~
scaryclam
I got the impression that the author didn't really want a message-based
system at all, but rather a request-response system that they tried to
implement on top of a message broker. Of course that's going to create more
headaches than it solves; it's the wrong solution.

Both message brokers and request-response code have their places in
distributed systems, but you really need to learn when each is appropriate.

I agree, saying that request-reply is "what you really want" was kind of
silly, especially after the opening paragraph that states "it depends".

~~~
ambicapter
Can you elucidate the differences between message-broker and request-response
use cases in your eyes?

~~~
marcosdumay
In my eyes (where the GP makes perfect sense), a message broker is
asynchronous: there's no implicit wait while your consumers work on your
request. A request-response interface stops the producer until the consumer
is done.
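That contrast can be sketched in a few lines of Python, with the stdlib `queue.Queue` standing in for the broker (all names invented):

```python
import queue
import threading

def consumer_work(x):
    return x * 2

# Request/response: the producer blocks until the consumer's answer arrives.
def request_response(x):
    return consumer_work(x)  # the call stops the producer until done

# Message broker: the producer enqueues and moves on; a worker drains later.
broker = queue.Queue()
results = []

def worker():
    while True:
        item = broker.get()
        if item is None:  # sentinel: shut the worker down
            break
        results.append(consumer_work(item))

t = threading.Thread(target=worker)
t.start()
broker.put(1)   # returns immediately: no implicit wait on the consumer
broker.put(2)
broker.put(None)
t.join()

answer = request_response(3)  # producer is paused for the duration
```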

