Show HN: Benthos Studio – A modern take on Yahoo Pipes (benthos.dev)
145 points by mihaitodor on Nov 23, 2022 | 30 comments
Benthos Studio lets you plug and play various components to build a Data Streaming pipeline through a graphic interface.

It also allows you to mock inputs to emit dummy data and run the rest of the pipeline to inspect the output of each step.

The project is running https://www.benthos.dev/ under the hood.




I've been using Benthos for some one-off bulk ETL needs and it's been very easy to write a config to shovel data from here-to-there.

I'll keep an eye on Benthos Studio, though I confess I'd prefer a self-hostable version at some point.


Didn’t look too much into Benthos but what’s the difference to Nifi?


I don't know much about Nifi, but from re-reviewing the documentation, it looks like Nifi is a stateful application with orchestration and multi-user access, etc.

Benthos requires something else to handle orchestration, source control, multi-user access, and the conditions that trigger an ETL run.

On the plus side, the stateless nature of Benthos means there isn't any big setup process or cluster of containers to log in to or anything. You just call benthos from the CLI and pass in a config file.

Last week I needed to get 120k rows of data from an Oracle view that would crash on a few bad rows, and shove rows that worked into a SQL Server table.

The Benthos config that ticks through every id, fetches the relevant rows one at a time, annotates them with batch information, and drops error rows and error messages in a log and healthy rows into SQL Server came to about 120 lines of YAML, all told. It took roughly 3 hours to write while consulting the extensive documentation, and performance tweaking until I could saturate my network connection (by increasing threads to 256) took only another 30 minutes.

Tweaking that YAML to cover a similar second scenario as a new config then was only another 15 minutes.

The week before that, I was trying Benthos at home to stress test MQTT on a Raspberry Pi with synthetic data to see which messages got dropped or mis-ordered.

(Depends on your QoS and in-flight settings, obviously)

I've never had "basic" performance testing be so simple.
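
A rough idea of what "see which messages got dropped or mis-ordered" can look like once the subscriber side has logged what it received: the sketch below assumes every synthetic message carries a sequence number from 1 to N, which is an assumption of this example and not something stated in the comment. `Report` and `Analyze` are hypothetical names, and no MQTT client is involved.

```go
package main

import "fmt"

// Report summarises what happened to a run of messages numbered 1..sent.
type Report struct {
	Missing    []int // sequence numbers never received
	OutOfOrder []int // numbers that arrived after a higher number had already been seen
	Duplicates []int // numbers received more than once
}

// Analyze compares the received sequence numbers against the expected range 1..sent.
func Analyze(sent int, received []int) Report {
	var r Report
	seen := make(map[int]bool, len(received))
	highest := 0
	for _, seq := range received {
		if seen[seq] {
			r.Duplicates = append(r.Duplicates, seq)
			continue
		}
		seen[seq] = true
		if seq < highest {
			r.OutOfOrder = append(r.OutOfOrder, seq)
		} else {
			highest = seq
		}
	}
	for seq := 1; seq <= sent; seq++ {
		if !seen[seq] {
			r.Missing = append(r.Missing, seq)
		}
	}
	return r
}

func main() {
	// Pretend the publisher sent 1..8 and the subscriber logged this arrival order.
	r := Analyze(8, []int{1, 2, 4, 3, 5, 5, 8})
	fmt.Println("missing:", r.Missing)
	fmt.Println("out of order:", r.OutOfOrder)
	fmt.Println("duplicates:", r.Duplicates)
}
```

Duplicates are tracked separately because at-least-once delivery (MQTT QoS 1) is allowed to redeliver a message, so a repeat is not the same kind of problem as a drop.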

If you are stuck on a Windows platform, it even works there, which is nice for one-off runs for dev work, troubleshooting, or break-fix work.


Nice, thanks for sharing this. I’ll have a deeper look into Benthos, sounds interesting and I certainly like the simple/easy part :-)


Really glad to hear the Oracle driver I added to it is proving useful! <3


Let me tell you Benthos is a real breath of fresh air after fighting a few different ETL tools that are more GUI, chrome and administrivia than function.

Thanks for contributing to Benthos!


Thanks for sharing, this is awesome to hear!


Benthos is much simpler, since it's stateless and it's a single static binary written in Go. I don't really know much about NiFi, but if you need to use a message bus with Benthos, such as Kafka, you can. However, you don't have to.


Benthos looks like a cool project but thanks for turning me on to Nifi. Nifi has a lot more processors out of the box.

Edit: I’m speechless but see below


Which ones would you need? Happy to add more to Benthos. Feel free to open issues here: https://github.com/benthosdev/benthos/issues


Google Sheets, Drive and Slack. I'm not comfortable asking for it because I can't contribute.


OK, that's not a problem. There are a bunch of other channels that you can use to propose new features. Feel free to reach out via https://www.benthos.dev/community

LE: I took note of those 3 and they should be quite straightforward to add. Thanks!


I liked Yahoo! Pipes and the idea of being able to glue things together on the internet, but unfortunately many web sites simply don't have good APIs to enable this. In my experience Pipes workflows were also brittle since any site could easily break them.

The lack of stable API support on (many/most) web sites (and no incentive to provide it), the likely low user/developer base, the apparent lack of killer apps, and Yahoo! itself probably all combined to prevent Pipes from becoming a big thing.


That was my takeaway after building API Blocks. The goal was to display arbitrary data from APIs on a dashboard.

APIs just aren't standardized or stable enough, and it took too much effort to maintain the API library. The UX just wasn't up to scratch because almost every week a block on your dashboard would be broken because the API changed, or your auth expired or broke. The cost to maintain was not worth it.

An interconnected public web as data would be incredible but we just haven't built that.

Not to detract from the concept here with Benthos Studio, the use case is a bit different. But it is something to keep in mind: the end user might not be technical, and constantly changing APIs might not be something they expect.


Smart contracts do provide a stable (immutable!) API which is designed for composable functionality, although it’s not used across different people’s contracts that much. A Pipes-like UI that binds things together could be interesting. If the connective contracts are also on chain you could in theory develop flash bots that execute everything in the same block.


At least some sites provide somewhat-stable and documented APIs nowadays... Definitely a far cry from what was being promised 15 years ago, but at least most of them moved away from SOAP. I guess it's still used in some places, unfortunately.


I like the playful vibe of the site, but it’s incredibly unclear what this actually is. Needs more specific description of the problems being solved.


Benthos Studio is an application that provides visual editing capabilities for the Benthos (https://www.benthos.dev) stream processor. It lets you craft and test YAML-based configurations that you can then run using Benthos.

Benthos itself is a stateless command line (CLI) app written in Go. It supports quite a few types of "macro" building blocks (aka components) which are various flavours of inputs, outputs, processors, caches, rate limits, buffers, metrics, tracers and loggers. The most important processor is the `mapping` one which lets you execute Bloblang code against each message which passes through it. Bloblang is a functional programming language embedded in Benthos as a DSL for manipulating structured data. You can read more about it over here: https://www.benthos.dev/docs/guides/bloblang/about Also, if you'd like to use it outside of Benthos, you can import it as a library: https://pkg.go.dev/github.com/benthosdev/benthos/v4@v4.10.0/...

Since I mentioned importing Bloblang as a library, you can import the entire Benthos framework as a library and inject your own custom plugins to create a custom Benthos build with whatever components and extra functionality you need. It's also a great way to slim down the existing distribution and only import the components that you require. See some examples here: https://github.com/benthosdev/benthos-plugin-example


why is Pulsar disabled?


Some history here https://github.com/benthosdev/benthos/issues/1184#issuecomme... but the TL;DR version is that the client library has caused a bit of friction in the past. It had several releases where one of its dependencies produced various compiler warnings, and there was a dbus zombie process issue which caused some concern... However, if you wish to run this at scale in production, you might want to look at https://github.com/benthosdev/benthos-plugin-example and craft your own custom Benthos build, where you import just the components you need, such as the Pulsar ones.


Yeah that's my bad, the site assumes you already know what Benthos is, as it's an early-stage UI for it. The best place to start is https://www.benthos.dev, or if you like dumb videos: https://youtu.be/88DSzCFV4Ng


This looks nice! Which library are you using for rendering the node based UI? Or is it a custom solution? (just added benthos here: https://github.com/wbkd/awesome-node-based-uis)


Hey! The visuals are all via https://reactflow.dev/, which has been pretty awesome to use.


I am glad to hear that :)


I believe I've seen this before, but did this use to be called blobfish? Or am I misremembering that due to the mascot (which is excellent)?


It was called Benthos from the very first commit (Aug 25th, 2015): https://github.com/benthosdev/benthos/commit/e4933adf7c07460...

LE: I should've mentioned that the official mascot is called The Benthos Blobfish: https://www.benthos.dev/blobfish/


> I should've mentioned that the official mascot is called The Benthos Blobfish.

The mascot really really creeps me out.

This is entirely subjective, of course, and I really don't want to be mean about it, but I thought it might be a helpful data point for you to collate in case others feel the same way: If I were forced to use benthos, and had that ugly thing on my screen all day long due to working with benthos documentation etc. I really don't know whether I could handle it.


Sorry to hear that, unfortunately you can't please everyone whilst also having fun, and my open source work is unapologetically fueled by fun.


What is the difference to https://dagster.io/ ?


It's really hard to tell without doing a proper deep dive into Dagster, but even if there is a lot of overlap, there's a lot of reading that one must do before even starting with basic workflows. Is there a one-click demo UI that I can run which produces a valid config that I can just copy / paste and then do something like `dagster -c config.yaml` to run it?



