Hey everyone, author here. We use Benthos as a general stream swiss army knife for all the dull tasks, but you can also use it as a framework for writing your own stream processors in Go.
It doesn't provide any tools for idempotent calculations yet, just at-least-once delivery guarantees. But when you use it as a framework you get the benefit of all the configurable processors in your new service.
Now that is how you advertise at me. I've seen more "minimal flexible generic configurable simple performant powerful easy extensible non-bloated" libraries than I can shake a stick at. But "this helps with the dull stuff"... that's a sales pitch I can get behind!
(A constructive hint to those describing their libraries: At this point, use of pretty much any of those words in your summary sentence is a massive negative to me. They are of null content, and you just burned the most valuable marketing space you had on these null-content words. Everybody says that about their libraries. Tell me what it does in the summary. If you really want to make the point for one of those words, do it later in the documentation, because if your summary sounds like you copy/pasted my quoted bit above, I'm not even going to read your extended documentation. And when you do, provide evidence; if you're "performant", show me competitive benchmarks. I get that they're going to be biased, but it at least shows me what's important. If you're "simple" show me in the sample code. Etc.)
As someone who is getting into Golang quite a bit I found it very interesting to peer into the thought process that led to Benthos and how it was/is implemented.
After reading the section about processors it is not clear whether it can do stateful computations, for example a running sum or a moving average. Can I define sliding windows? Can I join two or more streams?
All Benthos processors are stateless, except for a few such as batch, combine, etc., which do basic tasks such as building up batches from consecutive messages. It's possible to create sliding-window processors on top of Benthos, but there aren't any general tools for doing that yet.
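Stateful computations like a moving average can therefore be layered on top as your own processor logic. Here's a minimal sketch of the windowing itself in plain Go, with no Benthos API involved (the type and method names are hypothetical, just to illustrate the idea):

```go
package main

import "fmt"

// movingAverage keeps a fixed-size sliding window of values and
// yields the running mean as each new value arrives.
type movingAverage struct {
	window []float64
	size   int
	sum    float64
}

func newMovingAverage(size int) *movingAverage {
	return &movingAverage{size: size}
}

// add pushes a value into the window, evicting the oldest value once
// the window is full, and returns the current mean.
func (m *movingAverage) add(v float64) float64 {
	m.window = append(m.window, v)
	m.sum += v
	if len(m.window) > m.size {
		m.sum -= m.window[0]
		m.window = m.window[1:]
	}
	return m.sum / float64(len(m.window))
}

func main() {
	ma := newMovingAverage(3)
	for _, v := range []float64{1, 2, 3, 4, 5} {
		fmt.Printf("%.2f\n", ma.add(v)) // 1.00, 1.50, 2.00, 3.00, 4.00
	}
}
```

The catch, as mentioned above, is that this state lives in one process; sharing it across workers or deployments is up to you.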
Interesting! We use NSQ at work to collect logs/messages from various systems and forward them to various endpoints (Kafka, ELK, archives, Prometheus). This looks like a more advanced/flexible system, whereas we just use a handful of single-purpose daemons written in Go (nsq2kafka, nsq2es, nsqarchive, nsqstream-metrics).
I gave a quick intro-to-NSQ talk at our local Go meetup recently. If you haven't used NSQ before I highly recommend giving it a try.
I'm in the process of rejigging our live telemetry pipeline (dealing with petabytes) to improve flexibility and resiliency and have been looking for something to replace our similar systems that have grown organically. This fits the bill exactly; it's designed exactly how I imagined our new system should look.
I'm wondering whether HN has a personal favorite for these sorts of problems.
We've been using Logstash as our go-to stream processing glue; its large selection of input, output, and filter plugins lets it handle just about anything, reliably. My one complaint would be the use of the JVM, particularly the slow startup time it causes when doing development.
Looks like this could be useful for webhooks delivery, a problem our team is working on at the moment.
We have hundreds of tenants, each of which could get its own stream with some delivery guarantees set. If one tenant's endpoint is down, or another tenant fills the stream with 1M messages, it should not affect delivery rates for other tenants. Seems to fit the bill.
I see the HTTP output mentions retries, but I guess these run as part of the delivery step, blocking that goroutine rather than rescheduling the message? Sometimes it takes hours for clients to restore their receiving systems, and it would be great if messages from the past N hours would still be delivered.
Hey, the HTTP output has a fixed number of retries, after which it will either fall back to whatever mechanism you have in place or, by default, simply continue retrying whilst blocking upstream. You might also be interested in running Benthos in streams mode: https://github.com/Jeffail/benthos/tree/master/docs/streams
Streams mode lets you run as many isolated stream pipelines as you want in the same process, which in your case could be a simple queue -> webhook bridge. You can manage these pipelines either statically in config files, or dynamically through a REST API.
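For a rough sense of what a single queue -> webhook bridge stream could look like, here's a sketch of a per-tenant config. The field names are illustrative assumptions from memory and may differ between Benthos versions, so check the docs before copying:

```yaml
# Hypothetical stream config: bridge a Kafka topic to one tenant's webhook.
# Field names are illustrative and may differ in your Benthos version.
input:
  kafka:
    addresses:
      - localhost:9092
    topics:
      - webhooks_tenant_a
output:
  http_client:
    url: https://tenant-a.example.com/webhook
    retries: 3
```

Since each stream is isolated, a slow or dead tenant endpoint only blocks its own pipeline.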
Hey, Benthos doesn't yet provide any general tooling for exactly-once processing like Wallaroo does, that's possibly a goal for the future.
My main focus has been providing general purpose stateless processors. So you can build your own stream processor focusing on what makes it unique, and the moment it's compiled and packaged it can read and write to anything, and convert any kind of payload to anything else purely through configuration.
Processor implementations are able to carry their own state, or share state across worker threads or deployments using their own mechanisms, but Benthos doesn't provide any tooling for that. None of the processors you get out of the box need computational state. You currently get an ALO stream (provided you use ALO protocols), vertical & horizontal scaling as per your config, and any glue you need between services.
It would be a nice stretch goal to have standard tooling within Benthos to share distributed state, perhaps with some ability to do exactly-once processing, but that's not the focus of the project right now.
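In the meantime, a common way to approximate effectively-once processing on top of an at-least-once stream is to deduplicate on a message ID at the consumer. A rough single-process sketch in plain Go (the names are hypothetical; a real deployment would bound the memory, e.g. with a TTL cache, and share the seen-set via an external store):

```go
package main

import "fmt"

// dedup tracks message IDs already processed so that redeliveries
// under at-least-once semantics are dropped rather than reapplied.
type dedup struct {
	seen map[string]bool
}

func newDedup() *dedup {
	return &dedup{seen: make(map[string]bool)}
}

// process applies fn only the first time an ID is seen, and reports
// whether the message was actually processed.
func (d *dedup) process(id string, fn func()) bool {
	if d.seen[id] {
		return false
	}
	d.seen[id] = true
	fn()
	return true
}

func main() {
	d := newDedup()
	count := 0
	// Simulate the broker redelivering message "a" once.
	for _, id := range []string{"a", "b", "a"} {
		d.process(id, func() { count++ })
	}
	fmt.Println(count) // prints 2: the duplicate "a" is dropped
}
```

This only gives you exactly-once *effects* for operations you can key by ID; true distributed exactly-once processing is a much bigger problem, which is why it's a stretch goal.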
That is a very interesting prospect actually, any plans on making Python integration simple? Some parts of my pipelines rely on very large Python modules that I don't see being rebuilt in anything else in the next few years.
Super exciting. I can imagine using this instead of Kafka for simpler use cases. I wonder if they will add a full logging system so we can dump Kafka completely.