We've started using message queues to decouple legacy code from new code in order to migrate from one platform to another. I modelled our implementation after AMQP so we could trivially switch to either RabbitMQ, Azure Service Bus or similar down the line.
However, one thing that seems to be missing from the standard is retry policies. Sure, you could just abandon the message and rely on the timeout, but I'd prefer an exponential backoff so things don't bog down when some external service is down for an extended period.
Are there some standard ways I've missed, or do folks rely on proprietary extensions or extra services for this?
It's not exponential backoff, but I've done this in RabbitMQ with some queue weirdness.
I have the main queue with a reasonably short timeout (30 seconds). That queue is set up with a dead-letter queue, which is where failed messages that don't get ACKed are moved.
The dead-letter queue has a TTL of ~5 minutes, and its own dead-letter queue is the original queue.
So basically, if a message fails in a worker, it gets kicked over to the dead-letter queue, which then moves it back to the main queue once the TTL expires. This does mean a crashing message will fail forever (so you have to keep a careful eye on how many messages are in the dead-letter queue), but I've managed to work around this so far. Or you can use proprietary extensions (x-delivery-attempts).
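For anyone wanting to try the dead-letter loop described above, here is a rough sketch of the queue arguments involved. The queue names (`work`, `work.retry`) and the helper `retry_topology` are made up for illustration. The `x-dead-letter-exchange`, `x-dead-letter-routing-key` and `x-message-ttl` arguments are RabbitMQ-specific, so this part is not portable AMQP.

```python
def retry_topology(main_queue="work", retry_queue="work.retry",
                   retry_ttl_ms=5 * 60 * 1000):
    """Return {queue_name: declare_arguments} for a two-queue retry loop.

    Queue names and this helper are hypothetical; the argument keys are
    RabbitMQ extensions. The empty string names the default exchange,
    which routes by queue name.
    """
    return {
        # Rejected/failed messages leave the main queue for the retry queue.
        main_queue: {
            "x-dead-letter-exchange": "",
            "x-dead-letter-routing-key": retry_queue,
        },
        # Nothing consumes the retry queue; messages sit there until the
        # TTL expires and are then dead-lettered back to the main queue.
        retry_queue: {
            "x-message-ttl": retry_ttl_ms,
            "x-dead-letter-exchange": "",
            "x-dead-letter-routing-key": main_queue,
        },
    }


if __name__ == "__main__":
    for name, args in retry_topology().items():
        print(name, args)
```

With a client such as pika, each entry would presumably be passed along the lines of `channel.queue_declare(queue=name, durable=True, arguments=args)`. Note the delay here is fixed, not exponential.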
> Are there some standard ways I've missed, or do folks rely on proprietary extensions or extra services for this?
As a hack, you can always have your library run its own retry by doing an atomic ack-and-resend-to-the-future (though the message needs to carry a retry count if you want exponential backoff). And there are situations where it doesn't work well, such as when the message handler itself crashes too hard on failure to do the resend.
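A minimal sketch of the bookkeeping side of that approach, assuming the retry count travels in a message header. The header name `x-retry-count`, the function names and the default limits are all invented for illustration; the actual ack and delayed resend depend on your broker and are left out.

```python
def backoff_delay_ms(attempt, base_ms=1000, cap_ms=5 * 60 * 1000):
    """Exponential backoff: base * 2^attempt, capped at cap_ms."""
    return min(cap_ms, base_ms * 2 ** attempt)


def plan_retry(headers, max_attempts=5):
    """Decide what to do with a failed message.

    Returns (new_headers, delay_ms) if the message should be re-sent,
    or None if it has used up its attempts and should be parked.
    "x-retry-count" is a hypothetical application-level header.
    """
    attempt = int(headers.get("x-retry-count", 0))
    if attempt >= max_attempts:
        return None
    new_headers = dict(headers)  # copy so the caller's headers stay untouched
    new_headers["x-retry-count"] = attempt + 1
    return new_headers, backoff_delay_ms(attempt)


if __name__ == "__main__":
    headers = {}
    while (plan := plan_retry(headers)) is not None:
        headers, delay = plan
        print(f"attempt {headers['x-retry-count']}: resend after {delay} ms")
```

On failure the handler would ack the original and publish a copy with `new_headers`, delayed by `delay_ms` using whatever delay mechanism the broker offers. Keeping the count in the message keeps the logic broker-neutral, but it does nothing for the case where the handler dies before it can resend.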
I mean yeah, I can do a lot myself given it's my implementation, but I was hoping to keep our messaging code fairly generic so it'd be easy to use either RabbitMQ or Service Bus, depending on whether a customer wants an on-prem or hosted installation, for example.