Hacker News new | past | comments | ask | show | jobs | submit login

Having spent 7 year of my life implementing an Exactly-Once-In-Order (EOIO) messaging system in SQL Server Service Broker[0] (SSB), I take somehow exception to the author claim.

Here is how SSB achieves EOIO:

- initiator establishes intent to communicate using BEGIN DIALOG[1] statement (SSB dialogs are the equivalent of a durable, long lived, TCP session). This creates the necessary state in the database by creating a row in sys.conversation_endpoints, with initial send_sequence_number 0.

- sender issues SEND[2] statement. The message is assigned the current send_sequence_number (0), the sys.conversation_endpoint send_sequence_number is incremented to 1, and the message is inserted in sys.transmission_queue

- after commit of SEND transaction a background transmitter reads the message from sys.transmission_queue, connects to destination, delivers the message over wire (TCP).

- target reconstructs message from wire, in a single transaction creates a row in it sys.conversation_endpoints (receive_sequence_number is 0) and delivers the message into the destination queue

- after commit of the message delivery, the target constructs an acknowledgement message and sends it back to initiator over the wire

- the sender gets the acknowledgement of message 0 and deletes the message from sys.transmission_queue

- sender may retry delivery periodically if it does not receive the ack

- target will send back an ack immediately on receipt of a duplicate (message sequence number is less than current receive_sequence_number)

What this protocol achieves is idempotency of delivery, hidden from the messaging application. Database WAL ensures stability in presence of crashes. Eg. if the target crashes in the middle of enqueueing the message into destination queue then the entire partial processing of the enqueue is rolled back on recovery and next retry from sender will succeed. If target crashes after processing is committed but before sending the ack then on recovery the enqueue is successful and the next retry from initiator will immediately send ack an ack, allowing initiator to delete the retried message and make progress (send the next message in sequence). Note that there is no two-phase-commit involved.

Retries of unacknowledged messages occur for the dialog lifetime, which can be days, months, even years. Databases are good at keeping state for so long. SSB uses logical names for destination (from 'foo' to 'bar') and routing can be reconfigured mid-flight (ie. the location where 'bar' is hosted can be changed transparently). Long lived state and routing allow for transparent reconfiguraiton of network topologies, handle downtime, manage disaster (target is lost and rebuild from backups). Most of the time this is transparent to the SSB application.

Furthermore, the guarantees can be extended to application semantics as well. Applications dequeue messages using RECEIVE[3] statement. In a single transaction the application would issue a RECEIVE to dequeue the next available message, lookup app state perteining the message, modify the state, send a response using SEND[2], commit. Again WAL guarantees consistency, after a crash everything is rolled back and the application would go again through exactly the same sequence (the response SEND cannot communicate anything on the wire until after commit, see above).

So EOIO is possible.

One has to understand the trade offs implied. Something like SSB will trade off latency for durability. Applications need not worry about retries, duplicates, missing messages, routing etc as long as they are capable of handling responses comming back hours (or maybe weeks) after the request was sent. And application processing of a message is idempotent (RECEIVE -> process -> crash -> rollback -> RECEIVE -> re-process) only as long as the processing is entirely database bound (update states in some app tables, not make REST calls). Yet such apps are not unusual: they use some database to store state and communicate with some other app that also uses a database to store state. Unlike most messaging systems, SSB stores the messages in the database thus achiving WAL consitency along with the app state. Many SSB applications are entirely contained in the database, the code itself is contained. They use SSB activation [4] to react to incoming message, without keeping any state in memory.

In SSB both the initiator and the sender are monolitic, SMP systems (not distributed). Together the two form a distributed system. It trades of availability over partitioning, but one has to understand how this trade off occurs. In case of parittioning (target is unreacheable) the application continues to be available locally (the SEND statement succeeds). If the netowrk paritioning is not resolved over the lifetime of the dialog, then the applicaiton will see an error. If the paritioning is resolved then message flows resumes and the applicaiton layer responses start showing up in the queue. Again, activation and durable state make this easy to handle, as long as a latency of potentially days makes sense in the business. Shorter lifetimes (hours, minutes) are certainly possible and in such cases, if network partitioning is not resolved in time, the timeout error will occur sooner.

  [0] https://msdn.microsoft.com/en-us/library/bb522893.aspx
  [1] https://msdn.microsoft.com/en-us/library/ms187377.aspx
  [2] https://msdn.microsoft.com/en-us/library/ms188407.aspx
  [3] https://msdn.microsoft.com/en-us/library/ms186963.aspx
  [4] https://technet.microsoft.com/en-us/library/ms171617.aspx

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact