
Show HN: Sandglass – Distributed, scalable, persistent time-sorted message queue - celrenheit
https://github.com/celrenheit/sandglass
======
menacingly
It raises my eyebrow when a highly available message queue claims either
exactly-once delivery or reliable time sorting.

I'm not saying I wouldn't give this project a closer look, but I would much
rather a product make it painfully obvious what compromises were made to offer
its availability.

My instinct is: either you need strict constraints on ordering and delivery,
in which case you use rabbit, or you need at-least-once-no-matter-what
semantics, in which case you use something else and make your app less
fragile.
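
The usual way to live with at-least-once semantics is an idempotent consumer. A minimal sketch, assuming the producer attaches a unique message ID (the names and in-memory set here are made up for illustration; in practice you'd use a Redis set with a TTL or a DB unique constraint):

```python
# Deduplicate on a producer-assigned message ID so that redeliveries
# under at-least-once semantics are harmless.
processed = set()  # stand-in for durable dedup storage

def handle(message):
    """Process a message at most once per ID, even if the queue redelivers it."""
    if message["id"] in processed:
        return "duplicate-skipped"
    processed.add(message["id"])
    # ... do the actual (side-effecting) work here ...
    return "processed"

print(handle({"id": "m1", "body": "charge card"}))  # processed
print(handle({"id": "m1", "body": "charge card"}))  # duplicate-skipped
```

The point being: once consumers are idempotent, the queue only needs to promise at-least-once, which is the easy guarantee to keep while staying highly available.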

~~~
meebs
Hey, what would "something else" be in this scenario?

~~~
menacingly
Usually it would advertise "at-least-once" somewhere prominently, like SQS,
NSQ, etc.

It's really hard to be extremely available while also guaranteeing there won't
be dupes or out-of-order messages. As with databases, anyone claiming to offer
both is usually hiding a trade-off.

EDIT: I think here "really hard" is a polite way of saying "impossible"

------
manigandham
Looks interesting, might be a good replacement for the more enterprise
message/queue systems that have all the typical ack/redelivery/scheduling
features as seen here.

It's worth mentioning the new Apache Pulsar messaging system which can replace
Kafka with pub/sub and queueing semantics while providing better scalability
and per-message acks, probably better suited to those who want a combined
system.

~~~
arcbyte
I checked out Pulsar and got completely lost in the multiple hierarchical
Zookeeper clusters.

~~~
manigandham
Pulsar supports multiple regions natively which requires separate Zookeeper
clusters for each region to manage the global and local cluster state (ie:
replicating messages from DC1 => DC2 but not DC3).

If you don't need/want that, then it's just a single ZK cluster as with Kafka or
anything else. ZK + Brokers + Bookies = Pulsar.

------
adrinavarro
This looks promising.

I'm currently dealing with a queuing-related issue.

I have a series of processes running across servers that consume from a queue
and run tasks.

Often, these tasks die mid-execution (but can be resumed by any other server).
So, the queue is a database, and the running tasks "touch" a timestamp in the
database if they are still executing. When a database document hasn't been
updated for a while, the "consumption query" makes it so that it is
'redelivered' to an available server listening to the "queue".

Of course, this is subpar, but we haven't yet come across an elegant (and not
too over-engineered) way to replace this.
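
The scheme described above can be sketched in a few lines. This is a toy model with made-up table and column names, using SQLite purely for illustration (a real multi-worker version would need the claim step to be atomic, e.g. a transaction or an `UPDATE ... RETURNING`):

```python
# Toy model of the "touch a timestamp" redelivery pattern: a task whose
# heartbeat goes stale is handed to another worker.
import sqlite3
import time

LEASE_SECONDS = 30  # silence longer than this means the worker likely died

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE tasks (
        id        INTEGER PRIMARY KEY,
        payload   TEXT,
        heartbeat REAL,   -- last time a worker touched this task
        worker    TEXT    -- NULL means unclaimed
    )
""")
db.execute("INSERT INTO tasks (payload) VALUES ('resize-images')")

def claim(worker, now=None):
    """The 'consumption query': grab a task that is unclaimed, or whose
    heartbeat is stale because its worker died mid-execution."""
    now = now if now is not None else time.time()
    row = db.execute(
        "SELECT id FROM tasks WHERE worker IS NULL OR heartbeat < ? LIMIT 1",
        (now - LEASE_SECONDS,)).fetchone()
    if row is None:
        return None
    db.execute("UPDATE tasks SET worker = ?, heartbeat = ? WHERE id = ?",
               (worker, now, row[0]))
    return row[0]

def touch(task_id, now=None):
    """Called periodically by a worker that is still executing the task."""
    db.execute("UPDATE tasks SET heartbeat = ? WHERE id = ?",
               (now if now is not None else time.time(), task_id))

t0 = 1000.0
print(claim("worker-A", now=t0))       # 1: worker-A gets the task
print(claim("worker-B", now=t0 + 10))  # None: still leased to worker-A
print(claim("worker-B", now=t0 + 60))  # 1: worker-A went silent, redelivered
```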

~~~
taspeotis
> we haven't yet come across an elegant (and not too over-engineered) way to
> replace this.

It's built into some RDBMS. SQL Server has READPAST [1, 2], so you can do:

    
    
        BEGIN TRANSACTION;
        WITH NextMsg AS (SELECT TOP (1) * FROM QueueTable
                         WITH (ROWLOCK, READPAST) ORDER BY QueueId)
        DELETE FROM NextMsg OUTPUT deleted.*;
        -- Do your work
        COMMIT TRANSACTION;
    

And if your process dies midway through, the transaction is rolled back and
the row is immediately visible to another worker.

[1] [https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-table](https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-table) _READPAST is primarily used to reduce locking contention when
implementing a work queue that uses a SQL Server table. A queue reader that
uses READPAST skips past queue entries locked by other transactions to the
next available queue entry, without having to wait until the other
transactions release their locks._

[2] [https://docs.microsoft.com/en-us/sql/t-sql/queries/output-clause-transact-sql#queues](https://docs.microsoft.com/en-us/sql/t-sql/queries/output-clause-transact-sql#queues)

~~~
macdice
Also known as SKIP LOCKED (Oracle, PostgreSQL, MySQL).

------
eddd
These buzzword product descriptions are terrible. Documentation should state
first what this piece of code does and when I should use it. The actual
rationale for this project is buried in the middle of the documentation, in a
sparse three sentences.

> The first is to be able to track each message individually (i.e. not using a
> single commit offset) to make suitable for asynchronous tasks.

> The second is the ability to schedule messages to be consumed in the future.
> This make it suitable for retries.

That's a start - but I'd love to see how you solved the problem. How does your
solution compare to other similar products? Why should I care about the things
you're mentioning? It's not an issue of an immature product - I think one
should _start_ with defining exactly and precisely what is being solved here.

When solving a technical problem you always have to tailor your solution to a
specific set of requirements and it is never like: "distributed, horizontally
scalable, persistent, time sorted message queue." So please stop using such
buzzwords.

~~~
aquadrop
Those aren't buzzwords when used right. They're dense, searchable technical
terms and are perfect for a title. Do you think a title should consist of two
paragraphs?

~~~
falsedan
I think that product pages should clearly communicate what the product is to
the intended audience. These are 100% buzzwords, so the message I get is, "you
will need to spend more time with this project to see whether it is worth
looking into".

The most convincing technologies have clear, simple value that's immediately
apparent (like zeromq not needing a central coordinator/exchange), or some
real-world use-cases which have been improved by using this technology
(actual, already-happened use-cases, not hypothetical: I'm more convinced by
words from people who've integrated with the product rather than the authors +
their innate bias).

I would have put the title as, 'open-source proof-of-concept messaging queue
(Go, single author)'

------
jwr
Aphyr's Jepsen or it didn't happen :-)

------
agnivade
Seems similar to faktory ?
[https://github.com/contribsys/faktory](https://github.com/contribsys/faktory)

~~~
richardknop
Obligatory plug of my own job queue in Go:
[https://github.com/RichardKnop/machinery](https://github.com/RichardKnop/machinery)

~~~
espadrine
Faktory is RocksDB, Machinery is Redis/Memcache/MongoDB, Sandglass is a custom
Raft on top of RocksDB…

I'd love to see comparative stress tests, Jepsen-like, to assess the ability
to survive partitions, corruption, node restart and node loss, to better
estimate the probability that a job gets lost.

~~~
richardknop
Machinery is basically a Go implementation of Celery (a popular Python task
queue).

It has two core components: broker (AMQP or Redis supported) and backend
(Memcache, Redis or MongoDB or even no backend if you don't care about storing
task states).

I think comparison with Sandglass would not be valid as Machinery is a higher
level job/task queue while Sandglass is a lower level library (basically a
message queue such as RabbitMQ which would be just a component of Machinery).

Faktory vs Machinery could be compared as they are on more or less same level
of abstraction.

------
nicois
I would think the imminent Redis streams data type would provide this better.
It is battle-tested and allows great customisation to a range of use cases.

~~~
skrebbel
How can it be both imminent and battle-tested?

------
dvdplm
What does Sandglass use for persistence? Is it using something like Rocksdb
under the hood or is the WAL and VL "homegrown"?

~~~
debarshri
It seems like it uses RocksDB for persistence.

[https://github.com/celrenheit/sandglass/blob/dev/storage/roc...](https://github.com/celrenheit/sandglass/blob/dev/storage/rocksdb/rocksdb.go)

------
tmp123tmp123
Nodejs server? A segfault will cause data loss.

~~~
erulabs
er? This appears to be all Go, but regardless, dunno how a segfault means more
or less data loss for Node than for anything else?

------
lloydatkinson
Why does everyone keep alluding to this replacing Kafka? If anything it is
more similar to RabbitMQ.

------
velodrome
How does this compare with NATS?

~~~
manigandham
NATS is a purely pub/sub system. It has no persistence or message-queuing
semantics beyond a "queue group", which is simple round-robin delivery of
messages sent to a topic.

~~~
piotrkubisa
Note: There is NATS streaming [1] which implements persistence and redelivery.

[1]: [http://nats.io/documentation/streaming/nats-streaming-intro/](http://nats.io/documentation/streaming/nats-streaming-intro/)

[2]: [https://github.com/nats-io/nats-streaming-server](https://github.com/nats-io/nats-streaming-server)

------
ninjamayo
Looking good. The question is how it's going to convince people to move over
from Kafka.

~~~
buro9
> Question is how is it going to convince people to move over from Kafka

That's a bit unfair, given that the project doesn't mention Kafka in its
README, and different products have different suitability at different scales.
This could simply be a "if your traffic is low and you need this
functionality, this will suffice" thing, or just an academic interest in
producing an ordered distributed message queue.

But since you're asking: proven ability to consume ~5+ million messages per
second with similar or lower hardware requirements than Kafka, and high
reliability. A well-documented set of edge cases / compromises where
applicable, and a high degree of observability. Well-understood operational
requirements and SRE runbooks (or just a lot of GitHub issues that go into how
to handle various scenarios). An active community of people to assist, and
more than one committer.

That's the "off the top of my head" thing. YMMV.

~~~
moreless
I have no opinion about either of these projects, but this caught my eye:

> A well-documented set of edge cases / compromises where applicable, and a
> high degree of observability. Well-understood operational requirements and
> SRE runbooks (or just a lot of GitHub issues that go into how to handle
> various scenarios). An active community of people to assist, and more than
> one committer.

Are you talking about Sandglass or Kafka? Because Sandglass seems to be 3
months old, has 1 contributor, and is featured here as a "Show HN"... so it
probably isn't as mature a solution as Kafka. Or am I missing something?

------
yassinebenyahia
This is awesome. Is it meant to replace Kafka?

~~~
ddorian43
No. From the README:

> The first is to be able to track each message individually (i.e. not using a
> single commit offset) to make suitable for asynchronous tasks.

> The second is the ability to schedule messages to be consumed in the future.
> This make it suitable for retries.
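
To make the README's first point concrete, here is an illustrative toy model (my own sketch, not Sandglass's actual implementation) of why a single commit offset fits poorly with async tasks that finish out of order:

```python
# Compare a Kafka-style single commit offset with per-message acks
# when tasks complete out of order.

class OffsetConsumer:
    """Kafka-style: one committed offset, so only a contiguous prefix
    of messages can ever be acknowledged as done."""
    def __init__(self):
        self.committed = 0   # everything below this offset counts as done
        self.done = set()
    def ack(self, offset):
        self.done.add(offset)
        # The offset only advances while the done prefix is contiguous.
        while self.committed in self.done:
            self.committed += 1
    def pending(self, total):
        return [o for o in range(total) if o >= self.committed]

class PerMessageConsumer:
    """Individual acks: any message can be marked done independently."""
    def __init__(self):
        self.done = set()
    def ack(self, offset):
        self.done.add(offset)
    def pending(self, total):
        return [o for o in range(total) if o not in self.done]

# Tasks 1 and 2 finish quickly, but slow task 0 is still running:
oc, pc = OffsetConsumer(), PerMessageConsumer()
for o in (1, 2):
    oc.ack(o)
    pc.ack(o)
print(oc.pending(3))  # [0, 1, 2]: a restart would redeliver 1 and 2 too
print(pc.pending(3))  # [0]: only the genuinely unfinished task remains
```

With a single offset, one slow task pins the offset and forces redelivery (or buffering) of everything behind it; per-message tracking avoids that, at the cost of storing per-message state.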

