
A dull, resilient stream processor in Go - ngaut
https://github.com/Jeffail/benthos
======
jeffail
Hey everyone, author here. We use Benthos as a general stream swiss army knife
for all the dull tasks, but you can also use it as a framework for writing
your own stream processors in Go.

It doesn't provide any tools for idempotent calculations yet, just
at-least-once delivery guarantees. But when you use it as a framework you get
the benefit of all the configurable processors in your new service.

~~~
techno_modus
After reading the section about processors it is not clear whether it can do
stateful computations, for example, running sum or moving average. Can I
define sliding windows? Can I join two or more streams?

~~~
asavinov
For stateful processing you need something like:

* [https://kafka.apache.org/documentation/streams/](https://kafka.apache.org/documentation/streams/) \- Kafka Streams (and its KTable)

* [https://flink.apache.org/](https://flink.apache.org/) \- Flink

* [https://spark.apache.org/streaming/](https://spark.apache.org/streaming/) \- Spark Streaming

* [https://github.com/asavinov/bistro](https://github.com/asavinov/bistro) \- Bistro Streams

The focus of Benthos seems to be performance and resilience (implemented
without persistence).

~~~
spooneybarger
Or to toot the horn of the project I work on:

* [https://github.com/wallaroolabs/wallaroo](https://github.com/wallaroolabs/wallaroo) \- Wallaroo

------
gbrayut
Interesting! We use NSQ at work to collect logs/messages from various systems
and forward them to various endpoints (Kafka, ELK, archives, Prometheus). This
looks like a more advanced/flexible system, whereas we just use a handful of
single-purpose daemons written in Go (nsq2kafka, nsq2es, nsqarchive,
nsqstream-metrics).

I recently gave a quick intro-to-NSQ talk at our local Go meetup. If you
haven't used NSQ before I highly recommend giving it a try.

[https://docs.google.com/presentation/d/1e9yIm-0aNba_H1gX_u7D...](https://docs.google.com/presentation/d/1e9yIm-0aNba_H1gX_u7D1Qe5VWeU26FCjg7EEzoUztE/edit?usp=sharing)

------
devbug
Talk about timing!

I'm in the process of rejigging our live telemetry pipeline (dealing with
petabytes) to improve flexibility and resiliency and have been looking for
something to replace our similar systems that have grown organically. This
fits the bill exactly; it's designed exactly how I imagined our new system
should look.

I'm wondering if HN has their personal favorite for these sorts of problems.

~~~
retzkek
We've been using logstash as our go-to stream processing glue. Its large
selection of input, output, and filter plugins lets it handle just about
anything, reliably. My one complaint would be the use of the JVM, particularly
the slow startup time it causes during development.

------
ngaut
Supported Sources & Sinks: Amazon (S3, SQS), Elasticsearch (output only),
File, HTTP(S), Kafka, MQTT, Nanomsg, NATS, NATS Streaming, NSQ, RabbitMQ (AMQP
0.9.1), Redis, Stdin/Stdout, Websocket, ZMQ4.

------
xer0x
I must be missing something. It doesn't look dull.

------
clon
Looks like this could be useful for webhooks delivery, a problem our team is
working on at the moment.

We have hundreds of tenants, each of which could get their own stream with some
delivery guarantees set. If one tenant's endpoint is down or another tenant
fills the stream with 1M messages, it should not affect delivery rates of
other tenants. Seems to fit the bill.

I see the HTTP output mentions some retries, but I guess these run as part of
the delivery step, blocking this goroutine as opposed to rescheduling the
message? Sometimes it takes hours for clients to restore their receiving
systems, and it would be great if messages from the past N hours were still
delivered.

~~~
jeffail
Hey, the HTTP output has a fixed number of retries. After that you can either
have some mechanism in place to fall back on or, by default, it will simply
start the retries again whilst blocking upstream. You might also be
interested in running Benthos in streams mode:
[https://github.com/Jeffail/benthos/tree/master/docs/streams](https://github.com/Jeffail/benthos/tree/master/docs/streams)

Streams mode lets you run as many isolated stream pipelines as you want in the
same process, which in your case could be a simple queue -> webhook bridge.
You can manage these pipelines either statically in config files, or
dynamically through a REST API.
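
To make the retry behaviour described above concrete, here is a minimal Go
sketch. It is not Benthos's actual code: `sendFunc`, `deliver`, and `failFirst`
are hypothetical names made up for illustration, and backoff between attempts
is left out. It tries a fixed number of times per round, then either hands the
message to a fallback or starts another round, which blocks the caller (and so
everything upstream of it).

```go
package main

import (
	"errors"
	"fmt"
)

// sendFunc is a hypothetical stand-in for one delivery attempt,
// for example a single HTTP POST to a webhook endpoint.
type sendFunc func(msg string) error

// deliver makes up to maxRetries attempts per round. If a whole round fails
// and fallback is non-nil, the message goes to fallback instead. With no
// fallback it starts another round and keeps blocking the caller until the
// endpoint accepts the message. It returns the number of attempts made.
func deliver(msg string, send sendFunc, maxRetries int, fallback sendFunc) (int, error) {
	if maxRetries < 1 {
		maxRetries = 1
	}
	attempts := 0
	for {
		for i := 0; i < maxRetries; i++ {
			attempts++
			if err := send(msg); err == nil {
				return attempts, nil
			}
		}
		if fallback != nil {
			return attempts, fallback(msg)
		}
	}
}

// failFirst returns a sendFunc that fails its first n calls, then succeeds.
func failFirst(n int) sendFunc {
	calls := 0
	return func(string) error {
		calls++
		if calls <= n {
			return errors.New("endpoint down")
		}
		return nil
	}
}

func main() {
	// No fallback: the first round of 3 fails, the second round succeeds.
	attempts, err := deliver("hello", failFirst(4), 3, nil)
	fmt.Println(attempts, err) // 5 <nil>

	// With a fallback: give up on the endpoint after one round and park it.
	var parked []string
	park := func(m string) error { parked = append(parked, m); return nil }
	attempts, err = deliver("hello", failFirst(100), 3, park)
	fmt.Println(attempts, err, parked) // 3 <nil> [hello]
}
```

In the multi-tenant webhook case, the fallback is where a message could be
parked for later redelivery instead of stalling the stream.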

~~~
clon
That seems really promising, especially configuring pipelines with API calls.
Many wheels may be left uninvented.

Thank you for making your work available!

------
guywhocodes
This seems pretty similar to Wallaroo but with more focus on connecting to
everything. And of course Go vs Pony.

~~~
jeffail
Hey, Benthos doesn't yet provide any general tooling for exactly-once
processing like Wallaroo does, that's possibly a goal for the future.

My main focus has been providing general purpose stateless processors. So you
can build your own stream processor focusing on what makes it unique, and
then the moment it's compiled and packaged it can read and write to anything,
and convert any kind of payload to anything else just through configuration.

~~~
spooneybarger
Hi, Wallaroo developer here.

I'm curious, does Benthos manage state for the user/application like we do in
Wallaroo or is it purely stateless computations at this point?

~~~
jeffail
Processor implementations are able to carry their own state, or share state
across worker threads or deployments using their own mechanisms, but Benthos
doesn't provide any tooling for that. None of the processors you get out of
the box need computational state. You currently get an at-least-once (ALO)
stream (provided you use ALO protocols), vertical & horizontal scaling as per
your config, and any glue you need between services.

It would be a nice stretch goal to have standard tooling within Benthos to
share distributed state, perhaps with some ability to do exactly-once
processing, but that's not the focus of the project right now.
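
As an illustration of a processor carrying its own state, here is a minimal
Go sketch of the moving average asked about earlier in the thread. It is plain
Go and does not use any Benthos API: `MovingAverage` and `NewMovingAverage` are
hypothetical names, and wiring it into a real processor is left out. It is also
not safe for concurrent use, so sharing it across worker threads would need a
mutex or similar.

```go
package main

import "fmt"

// MovingAverage keeps the most recent values in a fixed-size ring buffer
// and maintains their running sum. Hypothetical helper, not part of Benthos.
type MovingAverage struct {
	window []float64 // ring buffer holding the last len(window) values
	next   int       // index that the next value will overwrite
	count  int       // number of values currently stored
	sum    float64   // sum of the stored values
}

// NewMovingAverage returns an average over the last size values.
// Sizes below 1 are treated as 1.
func NewMovingAverage(size int) *MovingAverage {
	if size < 1 {
		size = 1
	}
	return &MovingAverage{window: make([]float64, size)}
}

// Add records v and returns the average of the values now in the window.
func (m *MovingAverage) Add(v float64) float64 {
	if m.count == len(m.window) {
		// Window is full: drop the oldest value before overwriting it.
		m.sum -= m.window[m.next]
	} else {
		m.count++
	}
	m.window[m.next] = v
	m.sum += v
	m.next = (m.next + 1) % len(m.window)
	return m.sum / float64(m.count)
}

func main() {
	avg := NewMovingAverage(3)
	for _, v := range []float64{3, 6, 9, 12} {
		fmt.Println(avg.Add(v)) // prints 3, 4.5, 6, 9 on separate lines
	}
}
```

Note that this state lives only in memory, so it is lost on restart. That is
the gap the stateful systems listed earlier in the thread are built to fill.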

------
jdormit
Is "dull" a technical term or does it literally just mean boring here?

~~~
jeffail
Yeah just boring. Benthos is mostly a collection of standard processors for
doing boring stuff.

------
advanced__pizza
In what situations would a tool like this be useful? Asking honestly. Thanks.

------
packetized
This seems quite a bit like Heka. Very excited.

------
kanwisher
Super exciting, I can imagine using this instead of Kafka for simpler use
cases. Wonder if they will add a full logging system so we can dump Kafka
completely.

~~~
pram
That doesn’t seem to fit with the project goal, it’s not Kafka-centric. It’s
more like an alternative for Kafka Connect in that regard imo.

------
agallego
Did you think of using the Go Apache Beam API? Any comparisons with OSS
alternatives?

