Show HN: Gaia – Build pipelines in any programming language (gaia-pipeline.io)
231 points by michelvocks 4 months ago | 67 comments



I agree with other commenters that it would be really helpful to understand exactly what you mean by "pipeline", and how they can help/improve a project or workflow. What problem does this solve, and how does it solve that problem better than other solutions?

Also, very minor nitpick, but your programming language icons shouldn't be in <a> tags. They appear to be clickable, but do nothing, which gives the appearance that the site is not working as intended.


Looks like it runs a series of (small) self-contained applications in a specific order, with consolidated state tracking and logs. The applications can be written in any language as long as they implement the gRPC interface.

The individual steps are jobs and the whole thing is a pipeline. Since it executes arbitrary code, it has the flexibility to do anything from ETL to CI.
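To make the job/pipeline distinction concrete, here is a minimal sketch in plain Python. It does not use Gaia's SDK or gRPC; the `Job` class, the `run_pipeline` function and the job names are hypothetical and only illustrate the idea of jobs executed in dependency order with one shared log.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Job:
    """One step of a pipeline (hypothetical model, not Gaia's SDK)."""
    title: str
    handler: Callable[[], None]
    depends_on: List[str] = field(default_factory=list)


def run_pipeline(jobs: List[Job]) -> List[str]:
    """Run every job once, each one only after its dependencies.

    Returns the consolidated log as a list of lines.
    """
    by_title: Dict[str, Job] = {job.title: job for job in jobs}
    done: Set[str] = set()
    in_progress: Set[str] = set()
    log: List[str] = []

    def run(job: Job) -> None:
        if job.title in done:
            return
        if job.title in in_progress:
            raise ValueError(f"dependency cycle at {job.title!r}")
        in_progress.add(job.title)
        for dep in job.depends_on:
            run(by_title[dep])  # raises KeyError for an unknown dependency
        job.handler()
        in_progress.discard(job.title)
        done.add(job.title)
        log.append(f"{job.title}: success")

    for job in jobs:
        run(job)
    return log


if __name__ == "__main__":
    # The declaration order does not matter; dependencies decide the order.
    pipeline = [
        Job("deploy", lambda: print("deploying"), depends_on=["test"]),
        Job("build", lambda: print("building")),
        Job("test", lambda: print("testing"), depends_on=["build"]),
    ]
    for line in run_pipeline(pipeline):
        print(line)
```

In this toy model the "arbitrary code" is just a Python callable; in the real tool each job would live in a separately built application and be invoked over gRPC.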


Pretty good description. Seems like I still have a lot to learn on how to present an idea right. :-)


Thanks for the feedback. You are absolutely right.

I think we have answered those questions on our github readme page (https://github.com/gaia-pipeline/gaia) but somehow missed that on our webpage. We will work on that!

Edit: Oh and thanks. I will fix the links of the language icons. :-)


Not sure your Github readme covers it all that well, either:

>What is a pipeline? A pipeline is a real application with at least one function (we call it Job). Every programming language can be used as long as gRPC is supported. We offer SDKs to support the development.

What is this and how would I use it? More importantly, why do I want to use it? Every application is "real", but it's not clear what the function takes as input and outputs, or how it's dispatched, or... well, anything.


> but somehow missed that on our webpage.

Every web page I've ever seen that looks like yours has had the same problem.


You are right. Often I'm so deep in what I'm doing that I lose the overview. I will do better in the future!


For what it's worth: I'm the target audience (DevOps engineer) and understood it within moments of opening your landing page...

And after looking at the gif, I resolved to take a look in a few weeks for a personal project.

However, I'm missing a blurb about the developer(s) of this project.

Who is building it and why? How likely is development to be abandoned? Is there paid support available or planned in the short term?

From what I could tell from the contributors, it's pretty much a solo endeavour of yours, with some help from @skarlso in June/July. Are you planning to create a GmbH and sell this as a product, or is this just a personal project you're doing on the side?


I think it will be an alternative to Github's Actions, except it is not tied to git events.


This looks great. To echo other folks’ sentiment - I think you mean CI-like pipelines specifically, though it could be extended to do some other stuff. You probably want to list some concrete use cases on the main site (e.g. CI, cron jobs, ETL?).

I’m curious what differences/trade-offs there are in Gaia vs something like Argo (https://github.com/argoproj/argo) or Buildkite (https://buildkite.com/). It looks like at least one difference is an actual API for steps rather than bash commands. Is there anything interesting in terms of cross-job caching (e.g. saving npm install data) or persistent runners? And obviously - what else am I not thinking about?


Really good idea. The website definitely needs something like that. I will work on this!

Argo is pretty cool but I think it's different. Imagine developing an automation workflow with 20-30 different steps. In Argo you basically have to develop 20-30 different applications, compile them, build a separate docker image and push all these images to a registry.

In my opinion that's too much overhead and a configuration management nightmare as well. Additionally, what if one step needs to share information with another one? What if this information is not trivial, like a binary for example?


Some differences at a glance: argo appears to take a dependency on k8s and buildkite appears to be closed source.


Buildkite agent, Cloudformation stacks, etc. are open source.

Buildkite controller and UI are closed source.


Raises the question "what is a pipeline"? I'm assuming you don't mean the "prog | sort" kind of pipeline, but I was no clearer after reading it what you do mean.

Also, make the page display something without 2 different Javascript sources having to be enabled.


Thanks for the feedback. I will add that!

To answer your question, a pipeline is a real application which consists of at least one function (we call it a job). You can compare it with a Jenkins Pipeline (https://jenkins.io/solutions/pipeline/). In other words, a pipeline is one flow of automation tasks which consists of one or more steps.


"a pipeline is a real application which consists of at least one function"

So.... every application is a pipeline? This really clears very little up for someone not already familiar with the concept. The Jenkins link is about testing and deploying code. Is that the intended use case for this tool as well?


I think they meant it is a replacement for existing CI/CD pipelines. As someone who deals with them every day, I like the simplicity of Gaia pipelines. But it seems it only supports running on the host that Gaia is installed on, whereas Jenkins and Gitlab CI will have the ability to run on multiple agents/build slaves.

Edit: gRPC for events is dope too.


It can be used as a replacement for existing CI/CD tools, but not exclusively. In my opinion Gaia can be used for every possible automation task. :-)

We are already working on "agents/build slaves" (we call them workers): https://github.com/gaia-pipeline/gaia/issues/107


Very cool. Will definitely keep an eye on your progress.


Yeah, you definitely need a "What are pipelines?" section. From looking over the examples, my impression is this is some kind of job scheduler, like sidekiq.


Thanks for the feedback. You are absolutely right, this needs some more work!

We have already defined a "Q&A" section in our github README file (https://github.com/gaia-pipeline/gaia#questions-and-answers-...). Does this briefly answer your question or is it still too vague?


Some might opine that having to search through upwards of 75% of the length of a page before finding out what problem a tool addresses is a situation possessed of wonderful and bountiful opportunity to become even clearer.


Thanks for the feedback. I will work on that! :-)


"any programming language" is * Golang * Java * Python * C++ ?


Those are the languages for which they provide an SDK.

To quote the github readme: "any programming language as long as gRPC is supported"

You can use gaia without an official sdk. You just need to handle the gRPC integration yourself.


Gaia is just a few months old so we currently only officially support those languages. Feel free to open an issue on our github page to help us figure out which languages are missing/needed.


I really don't get it.

From looking through the front page of the site, I see a bunch of code examples, but no explanation of how this differs from using Unix with a bunch of programs that each read from stdin and write to stdout.

I'm sure there's actually something useful about this product, but the website really doesn't give any indication of what that useful thing is.


Like a line of pipes.


Looks like a great idea. What's the roadmap for any programming language SDKs in your "Develop Pipelines" section, such as for Rust?


We have just added support for C++ and are currently looking at Rust, but it's a bit tricky. There is no official gRPC SDK for Rust but I think we will manage it soon. :-)



I may have misunderstood the readme, but what I got was

> integration of services is hard because not everything uses grpc.

> this assumes everything uses grpc


Would it be correct to compare this with airflow?


Yes, except it's more lightweight and supports basically any programming language and not only python.


Providing a warning about the alpha edition in the README was a responsible thing to do. As Gaia is being actively promoted, I assume that the project has reached a certain level of maturity since the warning was written. Is this the case? In other words, does the warning still apply?


Echoing what others have said regarding 'what is this' ... is this Jenkins? Jenkins but instead of writing a Jenkinsfile (groovy) you write it in your own language.


Ok, this is really good, but I would put the two sections here:

https://github.com/gaia-pipeline/gaia#motivation

https://github.com/gaia-pipeline/gaia#how-does-it-work

On the main website.


YAML in DevOps is essentially being used as a declarative language. Declarative languages do have their place in domain-specific applications, and can often be quite powerful.

While being able to write in any language gives you flexibility, I do wonder about maintainability down the line. How would this be addressed? Or does it not matter, as this sort of code tends to be thrown away?


The issue with YAML is that the abstraction ceiling is way too low for its use cases. How do you maintain a 500 line YAML file? You have no tools to simplify it apart from super basic replacements

Meanwhile if I give you a 500 line python function you have a lot of tools at your disposal to refactor to improve maintainability (you could actually write basic tests for your CI pipeline config instead of having failures when the config gets pushed out)
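As a rough illustration of that last point, assume the pipeline is described as ordinary Python data. The `make_steps` and `validate` names and the step layout are made up for this sketch and are not taken from any CI tool. A basic test can then catch a broken config before it gets pushed out:

```python
from typing import Any, Dict, List

Step = Dict[str, Any]


def make_steps(services: List[str]) -> List[Step]:
    """Generate build/test steps per service instead of copy-pasting YAML."""
    steps: List[Step] = []
    for name in services:
        steps.append({"id": f"build-{name}", "needs": []})
        steps.append({"id": f"test-{name}", "needs": [f"build-{name}"]})
    # A final step that waits for every test step.
    steps.append({"id": "deploy", "needs": [f"test-{n}" for n in services]})
    return steps


def validate(steps: List[Step]) -> List[str]:
    """Return a list of problems; an empty list means the config is sane."""
    problems: List[str] = []
    ids = [step["id"] for step in steps]
    for dup in sorted({i for i in ids if ids.count(i) > 1}):
        problems.append(f"duplicate step id: {dup}")
    known = set(ids)
    for step in steps:
        for need in step["needs"]:
            if need not in known:
                problems.append(f"{step['id']} needs unknown step {need}")
    return problems


def test_generated_config_is_valid() -> None:
    assert validate(make_steps(["api", "web"])) == []


def test_typo_in_dependency_is_caught() -> None:
    broken = [{"id": "deploy", "needs": ["tset-api"]}]
    assert validate(broken) == ["deploy needs unknown step tset-api"]
```

The two `test_` functions follow the naming convention that pytest collects, so they could run on every commit to the config itself.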


I remember people doing this for seismic data analysis, piggybacking on UNIX pipes, 40 years ago. They would send and transform JSON-like metadata down the pipe while the massive seismic data would remain in disk files or shared core memory. I think variants of these packages are the most widely used seismic freeware to this day.
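The pattern described here, small metadata records flowing through a pipe while the bulk data stays in files, could look roughly like the following filter in modern terms. This is only a sketch: the record fields (`path`, `samples`, `gain`) and the `apply_gain` step are invented for illustration and do not come from any actual seismic package.

```python
import json
import sys
from typing import Any, Dict, Iterable, Iterator


def apply_gain(record: Dict[str, Any], factor: float) -> Dict[str, Any]:
    """Return a copy of the metadata with an updated gain.

    The trace data itself is not touched; it stays in the file that
    record["path"] points to.
    """
    updated = dict(record)
    updated["gain"] = float(record.get("gain", 1.0)) * factor
    return updated


def filter_lines(lines: Iterable[str], factor: float) -> Iterator[str]:
    """Read one JSON record per line and emit the transformed record."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        yield json.dumps(apply_gain(json.loads(line), factor), sort_keys=True)


if __name__ == "__main__" and not sys.stdin.isatty():
    # Acts as a filter only when something is piped in, e.g.
    #   producer | python gain.py | consumer
    for out in filter_lines(sys.stdin, factor=2.0):
        print(out)
```

Each stage only rewrites a few bytes of metadata, so the pipe stays cheap no matter how large the files behind `path` are.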


So, a task/process queue? Is that what this is? How does this vary from the quite mature task/process queues used by media production companies? (Examples include Deadline, and dozens of tested-by-production proprietary queues at every production studio.)


Hey bsenftner. Could you provide a link or an example of what you mean? :-) I googled "Deadline" but what I found doesn't look right.

I usually compare Gaia with tools like Jenkins and Spinnaker. For example, many people use Jenkins Pipeline (https://jenkins.io/solutions/pipeline/) which allows you to write CI/CD tasks in Groovy. In my opinion, Gaia fulfills this job way better because it doesn't force you to use a specific language. It's also super fast and provides features like the automatic (re)build of your pipelines.


> Gaia fulfills this job way better because it doesn't force you to use a specific language

CI typically seems much more dependent on bash/unixy sorts of tools to get things done. This seems to not really support that workflow, requiring code to define pipelines instead.

If it's intended to do CI, how do you deal with CI-style tasks, like shuffling around files between pipeline steps? Or the corollary, what does this do that makes it easier to do CI in practice than with typical unix command based workflows? Inherently, it seems like "Create a golang script that can start a subprocess that runs a test suite" is more overhead than "run a test suite".

At first glance, it looks like more of a competitor to, say, AWS step functions, but it doesn't sound like that's what you're targeting.


In the past you simply had to compile your application, package it and push it to a remote server. Nowadays, it's not that simple anymore. You often find yourself writing scripts to create Kubernetes resources, manage remote APIs to create services which are needed by your application, talk to remote services (like HashiCorp Vault) to get credentials or secrets. Gaia does a great job here because you can directly use client SDKs in your preferred language to communicate with those remote APIs.

Have a look at the Kubernetes deployment tutorial, this might clear things up for you: https://medium.com/@michelvocks/automatic-kubernetes-deploym...


Here's Deadline: https://deadline.thinkboxsoftware.com/ I used it in the past, along with the render queues at a few VFX studios. Very feature rich process scheduling and scaling managers are in heavy use by media productions.


Thanks. In the end it is similar to Gaia (a task scheduler) but also comparable with Jenkins/TravisCI/CircleCI and probably hundreds more schedulers. :-) The basic idea is not new.

In my opinion Gaia is perfect for programmers who are "forced" to write automation tasks. It allows you to write automation tasks in your preferred programming language and makes it super easy for you to schedule them because Gaia comes with an automatic build feature (just provide the git url of your source code and Gaia does the rest). Additionally, Gaia is super fast, lightweight and open-source.


There was a fair amount of discussion on this about 6 months ago too: https://news.ycombinator.com/item?id=17495732

It looks like the project has progressed quite a bit!


How does this compare to airflow? Is it free or how does it generate revenue?


It does not generate any revenue. It's free and open-source. :-)

In my opinion it's comparable with airflow except it's more lightweight and supports basically any programming language and not only python.


I agree with the questions about what a pipeline is. Also, have you considered whether this competes with Airflow/Luigi, which are used more for ETL pipelines?


How does it compete with something like Apache Flink?


After checking the README, this actually looks like a pretty cool build tool. I'd love to have Rust support in the future


How does this manage processes? From the example code it looks like your pipeline keeps running?


Good question! :-) Gaia is basically a scheduler. It automatically starts your pipeline, executes the functions defined in the pipeline and is also responsible for terminating the pipeline process.


Is Gaia designed modular enough for someone to swap grpc for thrift, or support both?


Yes, that is indeed possible and shouldn't be a huge problem. Would love to talk about this idea in a separate github issue! :-)


Build powerful pipelines in any programming language we support?

Perhaps it's a cultural thing, but I find people gushing about themselves suspicious and distasteful.

Yet another "but what about X":

[X] https://www.commonwl.org/


Is there an example video somewhere on how this looks in usage?


Hey simplify. Yes, actually when you open the website (https://gaia-pipeline.io) it should automatically start the video. Otherwise, have a look here: https://gaia-pipeline.io/video/gaia.mp4


The text in the video is unreadable and I can hardly follow what is happening in it: it just jumps through some CRUD screens without commentary and with no way to pause. Dunno, maybe a "Use cases"/"Case study" section could show some killer real-project usage.


Oh right, I saw that. The reason I forgot about it is because I couldn't find a connection between that and the example code written in several languages.


How does this compare to Apache Airflow?


In the end Gaia is a task scheduler. You can compare it with AWS stepflow but also with Jenkins/CircleCI/TravisCI/Bamboo/Codeship and many more.

In my opinion Gaia is perfect for programmers who are "forced" to write automation tasks. It allows you to write automation tasks in your preferred programming language and makes it super easy for you to schedule them because Gaia comes with an automatic build feature (just provide the git url of your source code and Gaia does the rest). Additionally, Gaia is super fast, lightweight and open-source.


Is there any equivalent for C#?


Gaia is only a few months old so we currently only support Go, Java, Python and C++. Feel free to open an issue on our github page to raise awareness that C# support would be appreciated. :-)


like aws stepflow?


In the end Gaia is a task scheduler. You can compare it with AWS stepflow but also with Jenkins/CircleCI/TravisCI/Bamboo/Codeship and many more.

In my opinion Gaia is perfect for programmers who are "forced" to write automation tasks. It allows you to write automation tasks in your preferred programming language and makes it super easy for you to schedule them because Gaia comes with an automatic build feature (just provide the git url of your source code and Gaia does the rest). Additionally, Gaia is super fast, lightweight and open-source.



