
Show HN: Workq – Job Server in Go - iamduo
https://github.com/iamduo/workq
======
AdamN
Workq would be far better if it followed the AMQP
([https://www.amqp.org/](https://www.amqp.org/)) or SQS
([https://aws.amazon.com/sqs/](https://aws.amazon.com/sqs/)) specs and
terminology. There's already a large body of work on those two protocols. SQS
is much simpler than AMQP so it's a good first step.

~~~
AdamN
Just to clarify my own comment, I would want to see attributes like TTL made
part of the queue, not part of the job.

~~~
gtaylor
Google PubSub has TTLs on each message, with the queue having the default
value for new messages. I've found this to work really well in practice.

------
brianolson
I wrote one of these, almost exactly, at my last $dayjob; I had to check and
see if it was them open-sourcing it. Alas not. Oh wait, found it, here's
another Go task-queue-with-priority-and-stuff daemon:
[https://github.com/diffeo/go-coordinate](https://github.com/diffeo/go-coordinate)

------
chrishacken
How would you specify a job with multiple parameters? You use "ping" as the
example. If I wanted to run "ping -c 20 10.10.10.10", how would I accomplish
that? It's not immediately obvious if that is possible.

From going through the source, it looks like the payload is the cmd. Can those
be multiple words, or will it read each word as a separate argument?

handler := s.Router.Handler(cmd.Name)

reply, err := handler.Exec(cmd)

~~~
noselasd
This is more or less a message queue: one side (client) sends a message, the
other side (worker) receives and acts on it. But it is structured around the
common use case of someone submitting a "job" to be performed (whenever
possible, or at a certain time) and workers picking up and performing the job,
reporting back the outcome.

"ping" here is just a message, not the name of a system command/executable -
in this case the worker receives the "ping" request, and just replies with a
"pong" message - both sides are code you need to write.

You can encode whatever you want in the payload, json, simple plaintext - it's
up to the client and the worker to agree on the meaning.

~~~
chrishacken
Thank you, that clears up my confusion.

------
bryanlarsen
The synchronous job processing is something we had to add to our beanstalkd
client, but none of the other enhancements are useful to us.

The biggest limitation of beanstalkd IMO is the fact that robustness features
have to be handled by the client. That's why we're considering switching to
disque or alternatives. It doesn't look like workq supports this, and it's
likely something that should be designed in from the ground floor.

But beanstalkd's proven history is a huge mark in its favour that we'd be
loathe to give up.

~~~
iamduo
Can you clarify the missing robustness features?

~~~
bryanlarsen
failover

------
alooPotato
Would it be possible to model dependencies between jobs using this? I.e. only
run job X if job Y succeeds?

We're building an in house CI system (we have some weird requirements and
can't use off the shelf ones) and we'd love to add an entire job graph to this
queue and be able to query the state of it.

~~~
iamduo
It is possible, but explicitly through workers. You can have a worker block on
the "result" command (it allows a wait-timeout), wait for the successful
completion of Job X, and then enqueue Job Y.

There is no way to _define_ the dependency automatically, but the workers can
create any type of workflow, including handling the case where Job X fails.

------
andrewfromx
What about graceful restarts? Like can I kill -USR2 the pid of this and have
it stop listening on that port, launch another version of itself (new binary)
that does net.FileListener(f) vs. net.Listen("tcp", url) but keep running
until all current jobs are done?

~~~
iamduo
A big yes to this! Signal handling will be covered in a future update. This is
critical to Workq since it is intended to run as a standalone server, so
zero-downtime deployment is required. There will be a more in-depth roadmap in
the repo soon. Someone earlier just asked for this as well.

~~~
andrewfromx
cool, i just wrote all this go code for paradise if u wanna steal it :)
[https://github.com/andrewarrow/paradise_ftp/blob/master/serv...](https://github.com/andrewarrow/paradise_ftp/blob/master/server/starter.go)

------
jalfresi
This appears to be similar to beanstalkd but with some nice additions
(scheduled time for a job to run, job max existence time).

One of the things I always wanted beanstalkd to have was an atomic move-tube
command, so you could emulate a state machine using queues and tubes.

~~~
iamduo
That is correct, beanstalkd was an inspiration for this project. I've even
credited it here:
[https://github.com/iamduo/workq#credits](https://github.com/iamduo/workq#credits).
Could you describe the use case of the move-tube command? More specifically,
what was it trying to accomplish within the state machine?

Internally, however, Workq does not have separate "tubes", for simplicity.
A job is just pinned by its name.

~~~
jalfresi
The idea was that you would reserve a job in a tube, do some work, then move
that job to another tube upon success atomically, without the need to delete
the job, then create the job in the new target tube. The problem is that if
you create the job in the new tube before deleting, the TTL could kick in and
return the job to the original tube, meaning you have the same job in both
tubes. The other alternative is to delete the job then put it in the target
tube. However, if your worker process dies during this step, you may end up
losing the job.

~~~
snovv_crash
It sounds like what you need is some sort of transaction support so that
deletes and adds only happen if both succeed.

~~~
jalfresi
Which would make things a lot more complicated, vs. issuing a move command on
a job id where the server would be responsible for making sure the atomic move
succeeded.

It's not a difficult feature to implement (in fact, if I recall, there is a
pull request open for this feature in beanstalkd), and IMHO it would open up a
lot of interesting use cases.

~~~
iamduo
Is there a reason why there aren't 2 types of jobs, one for each stage? You
mentioned that in the first stage some work is performed; then it sounds like
you need to pass the work on to another worker for the second stage.

------
stevewilhelm
May I suggest NATS Queuing as an alternative. [1]

[1] [http://nats.io/documentation/tutorials/nats-queueing/](http://nats.io/documentation/tutorials/nats-queueing/)

~~~
fasteo
We evaluated NATS some years ago, but the lack of delayed messages stopped us
from replacing our beloved beanstalkd.

Looked very good though

------
caleblloyd
Very nice work, I love the simplicity. With a reliable persistence layer, this
could be a viable option for a production job queue.

What are the plans for persistence? Persist to disk? Or pluggable storage
backends? Disk, Redis, and SQL options would be cool!

~~~
iamduo
Thanks. Simplicity was the main goal and I took many passes eliminating cruft.
The initial plan for persistence is to disk, in the form of a command log,
similar to VoltDB's Command Log or Redis' AOF. A simple approach to
durability.

There will be some sort of interface for the storage, and I'll keep
pluggability in mind; it is something Gearman had back in the day also[0].
Most likely it will be persistence to disk for some time, and once things
become clearer, possibly pluggability.

[0]
[http://gearman.org/manual/job_server/](http://gearman.org/manual/job_server/)

------
iamd3vil
If I am not wrong, a job is a message you send so that different workers can
pick it up. Correct me if I am wrong. What's the difference between something
like RabbitMQ and this? Genuinely curious.

~~~
iamduo
RabbitMQ can do much of what Workq can do from a purely messaging standpoint,
since the input and output look about the same.

Workq is built on the higher-level concept of a job, so the feature set is
refined around what a job is. In Workq, a job must successfully complete or
fail, and optionally a result is passed back to the client. A job can be
retried when it has timed out, or even when it has explicitly failed outright
(maybe there was a temporary error with your API provider, etc.). You can say:
retry the job if it has timed out up to 5x, BUT let it explicitly fail only
once. These small refinements help streamline the concept of processing a job
fully, not just passing around a message blob.

There are also some other key things, such as time-based job scheduling, which
doesn't exist in RabbitMQ but is usually offered in libraries such as
DelayedJob, etc.

------
karmakaze
Just curious what advantages a TCP text interface has over HTTP with JSON
which would typically be my default.

~~~
iamduo
The text interface primarily has simplicity on its side. The goal was to
implement a set of commands with a small footprint. HTTP would have provided
"too much" for me to worry about in terms of designing the commands. HTTP2 was
considered at one point, but it proved to be "too much" at the time.

From my own experience, the text commands are easier to test against,
especially its boundaries (inputs to the server) which makes client
development significantly easier.

~~~
karmakaze
I can see how designing the command language can be simpler and cleaner in
text than say XML. HTTP offers so much tooling, I might have gone with text
over HTTP or JSON with a `command` entry.

As for HTTP2, isn't that handled by the HTTP server/client implementations,
and isn't it the same as HTTP from the application code's perspective?

~~~
iamduo
I can definitely agree on the HTTP tooling!

As for the HTTP2 portion, the server details would be abstracted out
especially with HTTP2 support in Go 1.6. At the time I looked, HTTP2 clients
for various languages were still popping up and stabilizing (I think they
still are). I didn't want that to be a factor when I was developing clients
outside of Go (for example PHP). In addition, an important goal was to develop
extremely small clients, where I understood exactly what was going over the
wire.

If there are enough direct tooling benefits that HTTP2 can offer, it would be
fun to experiment with it as an alternative interface. Funny enough, the first
prototype name of the project was "httpq".

------
spriggan3
> In-memory only for now, disk backed durability is on the roadmap.

> Job payload & results are limited to 1 MiB each.

> Workq servers are standalone and do not speak to each other.

i.e. don't use it in production. This is a nice proof of concept, but let's
not pretend that it is a professional grade product at the moment.

~~~
saghul
I'm not the author, but I don't think anyone here pretended that it's a
"professional grade product". From the actual README: "Workq is in alpha
status and not yet stable."

~~~
bryanlarsen
Sure, but it looks like he intends it to eventually be a professional grade
product. Disk-backed durability can be added, but adding distributed/failover
capability should be designed in from the beginning.

And nobody's ever going to trust your distributed capability if you implement
Raft yourself. You need to build upon something trusted or convince Aphyr to
run Jepsen on your implementation. etcd is a common thing to build upon, and
even it is only partially trusted.

~~~
eitland
> And nobody's ever going to trust your ...

I guess that could be said for a number of things that people successfully do?

Edit: for one great example of someone who didn't get discouraged by
naysayers, check caddyserver.

~~~
bryanlarsen
Great example. Nobody's going to use caddyserver in production for a major
site for a couple years for that reason. Right now it's widely used on 'hobby'
sites. A couple of years of good track record on hobby sites will let it be
trusted enough to run on more mission critical sites.

And caddyserver isn't a distributed service, so the level of trust required is
much lower.

Distributed services are difficult to get right for a wide variety of reasons,
as shown by Aphyr's Jepsen tests.

~~~
eitland
It is, however, already starting to get developer traction and mindshare, and
even a few donations if I am representative of the userbase; something it
would never do if it sat waiting for the perfect time to release a perfect
product.

Also, I think the README of workq didn't mention anything about Raft, which is
fine; you can go very far without it.

My point is: encourage people to write code! Don't infect people with
paralysis-by-analysis.

------
fasteo
It seems to offer the same feature set as beanstalkd.

We have been using beanstalkd for years to process gazillions of jobs without
an issue and I guess many readers are in the same position.

Why would I choose Workq over beanstalkd ?

~~~
iamduo
Workq is similar to beanstalkd and was modeled after many of its concepts
especially TTR and reserve
([https://github.com/iamduo/workq#credits](https://github.com/iamduo/workq#credits)).

The one feature which may not be obvious yet is the ability for workers to
mark a job successfully completed or failed with a result and then retrieve it
later. The workflow looks like this:

* Client A: Backgrounds a Job A

* Client A: Backgrounds a Job B

* Client A: Backgrounds a Job C

\----

* Worker A: Picks up Job A + Completes

* Worker B: Picks up Job B + Completes

* Worker C: Picks up Job C + Completes

\----

* Client A: Picks up the result for Job A,B,C.

This allows a single client to concurrently process multiple jobs within a
single process and retrieve their results. This is what I like to call
"Gearman mode"[0], as it was modeled after that project also. It is useful in
languages that do not have well-defined concurrency. This is a niche use case
and may not be needed by everyone, but it is very useful as soon as you need
it. This will become more obvious when I have clients for these languages.

Lastly there are some subtle enhancements such as retry support and
synchronous processing (submit and wait for result).

Thanks for the great question. This is a very popular question and I will FAQ
it.

[0] [http://gearman.org](http://gearman.org)

------
deforciant
thanks, I'm currently working on a hobby project and skipped the job queue
part because I thought it would need some more consideration :) Now it looks
like this could fill that gap. Regarding persistence: how about adding
[https://github.com/docker/libkv](https://github.com/docker/libkv)? For a
single node the BoltDB backend is more than enough, and once you want to go
distributed, just switch to Consul/Etcd.

------
hendler
related: go message queue at [http://nsq.io/](http://nsq.io/)

------
tbarbugli
Why did you decide to build the queuing/messaging part? There are plenty of
solid options out there (eg. RabbitMQ)

------
sfrailsdev
Hmm...are we at the point now where someone is going to reimplement the redis
server in Go?

~~~
iamduo
Heh yes... [http://ledisdb.com](http://ledisdb.com)

------
amelius
Can it stream data from jobs? For example, to monitor progress?

~~~
iamduo
There is no individual progress streaming at the moment. It was on the drawing
board, but was slashed for simplicity and time. Just curious, what would you
use it for in your case?

~~~
amelius
I would use it to show the progress to a user :)

For example: 68% done

Or: ETA: 15 minutes

------
igtztorrero
I like it

------
ilackarms
How does this differ from Mesos or Docker Swarm?

~~~
cdnsteve
It's a job queue with a server and clients.

