
Show HN: Gaia – Build pipelines in any programming language - michelvocks
https://gaia-pipeline.io/
======
mtrpcic
I agree with other commenters that it would be really helpful to understand
exactly what you mean by "pipeline", and how they can help/improve a project
or workflow. What problem does this solve, and how does it solve that problem
better than other solutions?

Also, very minor nitpick, but your programming language icons shouldn't be in
<a> tags. They appear to be clickable, but do nothing, which gives the
appearance that the site is not working as intended.

~~~
michelvocks
Thanks for the feedback. You are absolutely right.

I think we have answered those questions on our github readme page
([https://github.com/gaia-pipeline/gaia](https://github.com/gaia-pipeline/gaia))
but somehow missed that on our webpage. We will work on that!

Edit: Oh and thanks. I will fix the links of the language icons. :-)

~~~
AnIdiotOnTheNet
> but somehow missed that on our webpage.

Every web page I've ever seen that looks like yours has had the same problem.

~~~
michelvocks
You are right. Often I'm so immersed in what I'm doing that I lose sight of
the big picture. I will do better in the future!

~~~
y4mi
for what it's worth: i'm the target audience (devops engineer) and understood
it within moments of opening your landing page...

And after looking at the gif, i resolved to take a look in a few weeks
for a personal project.

However, I'm missing a section about the developer(s) of this project.

Who is building it and why? how likely is development to be abandoned? is
there paid support available or planned in the short term?

from what i could tell from the contributors, it's pretty much a solo endeavour
from you, with some help from @skarlso in June/July. Are you planning to
create a GmbH and sell this as a product, or is this just a personal project
you're doing on the side?

------
NightMKoder
This looks great. To echo other folks’ sentiment - I _think_ you mean CI-like
pipelines specifically, though it could be extended to do some other stuff.
You probably want to list some concrete use cases on the main site (e.g. CI,
cron jobs, ETL?).

I’m curious what differences/trade offs there are in Gaia vs something like
Argo ([https://github.com/argoproj/argo](https://github.com/argoproj/argo)) or
Buildkite ([https://buildkite.com/](https://buildkite.com/)). It looks like at
least one difference is an actual API for steps rather than bash commands. Is
there anything interesting in terms of cross job caching (e.g. saving npm
install data) or persistent runners? And obviously - what else am I not
thinking about?

~~~
weberc2
Some differences at a glance: argo appears to take a dependency on k8s and
buildkite appears to be closed source.

~~~
paulddraper
Buildkite agent, Cloudformation stacks, etc. are open source.

Buildkite controller and UI is closed source.

------
rwmj
Raises the question "what is a pipeline"? I'm assuming you don't mean the
"prog | sort" kind of pipeline, but I was no clearer after reading it what you
do mean.

Also, make the page display something without 2 different Javascript sources
having to be enabled.

~~~
michelvocks
Thanks for the feedback. I will add that!

To answer your question, a pipeline is a real application which consists of at
least one function (we call it a job). You can compare it with a Jenkins
Pipeline
([https://jenkins.io/solutions/pipeline/](https://jenkins.io/solutions/pipeline/)).
Therefore, a pipeline is one flow of automation tasks which consists of one or
more steps.
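
To make that definition concrete, here is a minimal sketch of the idea in plain Python. It is not the Gaia SDK: `run_pipeline`, the `Job` type and the three example jobs are hypothetical names chosen for illustration, and it assumes a job signals failure by raising an exception.

```python
from typing import Callable, List

# Hypothetical type for illustration; this is not part of the Gaia SDK.
Job = Callable[[], None]


def run_pipeline(jobs: List[Job]) -> List[str]:
    """Run each job in order and return the names of the jobs that completed.

    Assumption: a job signals failure by raising, and the pipeline stops
    at the first failed job.
    """
    completed = []
    for job in jobs:
        try:
            job()
        except Exception:
            break
        completed.append(job.__name__)
    return completed


def fetch_sources() -> None:
    pass  # placeholder for real work


def run_tests() -> None:
    raise RuntimeError("tests failed")  # simulate a failing step


def deploy() -> None:
    pass  # placeholder for real work
```

With these definitions, `run_pipeline([fetch_sources, run_tests, deploy])` stops after the first job, because the second one raises.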

~~~
lalaithion
"a pipeline is a real application which consists of at least one function"

So.... every application is a pipeline? This really clears very little up for
someone not already familiar with the concept. The Jenkins link is about
testing and deploying code. Is that the intended use case for this tool as
well?

~~~
0x8BADF00D
I think they meant it is a replacement for existing CI/CD pipelines. As
someone who deals with them every day, I like the simplicity of Gaia
pipelines. But it seems it only supports running on the host that Gaia is
installed on, whereas Jenkins and Gitlab CI will have the ability to run on
multiple agents/build slaves.

Edit: gRPC for events is dope too.

~~~
michelvocks
It can be used as a replacement for existing CI/CD tools, but not exclusively.
In my opinion Gaia can be used for every possible automation task. :-)

We are already working on "agents/build slaves" (we call them workers):
[https://github.com/gaia-pipeline/gaia/issues/107](https://github.com/gaia-pipeline/gaia/issues/107)

~~~
0x8BADF00D
Very cool. Will definitely keep an eye on your progress.

------
sam0x17
Yeah, you definitely need a "What are pipelines?" section. From looking over
the examples, my impression is this is some kind of job scheduler, like
sidekiq.

~~~
michelvocks
Thanks for the feedback. You are absolutely right, this needs some more work!

We have already defined a "Q&A" section in our github README file
([https://github.com/gaia-pipeline/gaia#questions-and-answers-...](https://github.com/gaia-pipeline/gaia#questions-and-answers-qa)).
Does this briefly answer your question or is it still too vague?

~~~
Kalium
Some might opine that having to search through upwards of 75% of the length of
a page before finding out what problem a tool addresses is a situation
possessed of wonderful and bountiful opportunity to become even clearer.

~~~
michelvocks
Thanks for the feedback. I will work on that! :-)

------
otherflavors
"any programming language" is * Golang * Java * Python * C++ ?

~~~
certifiedloud
Those are the languages for which they provide an SDK.

To quote the github readme: "any programming language as long as gRPC is
supported"

You can use gaia without an official sdk. You just need to handle the gRPC
integration yourself.

------
peterkelly
I really don't get it.

From looking through the front page of the site, I see a bunch of code
examples, but no explanation of how this differs from using Unix with a bunch
of programs that each read from stdin and write to stdout.

I'm sure there's actually something useful about this product, but the website
really doesn't give any indication of what that useful thing is.

~~~
Terretta
Like a line of pipes.

------
jph
Looks like a great idea. What's the roadmap for any programming language SDKs
in your "Develop Pipelines" section, such as for Rust?

~~~
michelvocks
We have just added support for C++ and are currently looking at Rust, but it's
a bit tricky. There is no official gRPC SDK for Rust but I think we will
manage it soon. :-)

~~~
demoray
[https://crates.io/crates/grpc](https://crates.io/crates/grpc)

------
gcb0
I may have misunderstood the readme, but what I got was

> integration of services is hard because not everything uses grpc.

> this assumes everything uses grpc

------
Dowwie
Would it be correct to compare this with airflow?

~~~
michelvocks
Yes, except it's more lightweight and supports basically any programming
language and not only python.

~~~
Dowwie
Providing a warning about the alpha edition in the README was a responsible
thing to do. As Gaia is being actively promoted, I assume that the project has
reached a certain level of maturity since the warning was written. Is this the
case? In other words, does the warning still apply?

------
whalesalad
Echoing what others have said regarding 'what is this' ... is this Jenkins?
Jenkins but instead of writing a Jenkinsfile (groovy) you write it in your own
language.

------
dana321
Ok, this is really good, but i would put the two sections here:

[https://github.com/gaia-pipeline/gaia#motivation](https://github.com/gaia-pipeline/gaia#motivation)

[https://github.com/gaia-pipeline/gaia#how-does-it-work](https://github.com/gaia-pipeline/gaia#how-does-it-work)

On the main website.

------
iamwil
YAML in devops is essentially being used as a declarative language.
Declarative languages do have their place in domain-specific applications, and
can often be quite powerful.

While being able to write in any language gives you flexibility, I do wonder
about maintainability down the line. How would this be addressed? Or it
doesn't matter as this sort of code tends to be thrown away?

~~~
rtpg
The issue with YAML is that the abstraction ceiling is way too low for its use
cases. How do you maintain a 500 line YAML file? You have no tools to simplify
it apart from super basic replacements.

Meanwhile if I give you a 500 line python function you have a lot of tools at
your disposal to refactor to improve maintainability (you could actually write
basic tests for your CI pipeline config instead of having failures when the
config gets pushed out)
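
A small sketch of what "basic tests for your CI pipeline config" could look like when the config is generated by code. Everything here is hypothetical: `build_steps`, the step dictionaries, and the `./run_tests.sh` and `./deploy.sh` commands are made up for illustration and do not belong to any particular CI tool.

```python
from typing import Dict, List


def build_steps(branch: str, versions: List[str]) -> List[Dict[str, str]]:
    """Generate CI steps in code instead of repeating them by hand.

    The commands are hypothetical placeholders for real scripts.
    """
    steps = [
        {"name": f"test-{version}", "run": f"./run_tests.sh {version}"}
        for version in versions
    ]
    # Only the main branch gets a deploy step.
    if branch == "main":
        steps.append({"name": "deploy", "run": "./deploy.sh"})
    return steps


def test_deploy_only_on_main() -> None:
    """Check the config before it is pushed, rather than after it fails."""
    main_names = [step["name"] for step in build_steps("main", ["3.11"])]
    assert main_names == ["test-3.11", "deploy"]
    feature_steps = build_steps("feature", ["3.11", "3.12"])
    assert all(step["name"] != "deploy" for step in feature_steps)
```

The loop over `versions` is the kind of abstraction a flat YAML file cannot express, and the test catches a misplaced deploy step locally.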

------
peter303
I remember people doing this for seismic data analysis, piggy-backed on UNIX
pipes, 40 years ago. They would send and transform JSON-like metadata down the pipe
while the massive seismic data would remain in disk files or shared core
memory. I think variants of these packages are the most widely used seismic
freeware to this day.
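
A rough sketch of that pattern, under stated assumptions: the bulk data is written to a file once, and only a small metadata record travels between stages. The function names and metadata fields (`path`, `samples`, `gain`) are hypothetical and are not taken from any actual seismic package.

```python
import json
import os
import tempfile
from typing import Dict, List


def write_trace(values: List[float]) -> Dict[str, object]:
    """Store bulk data in a file and return small metadata describing it."""
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as handle:
        json.dump(values, handle)
    return {"path": path, "samples": len(values)}


def add_gain(metadata: Dict[str, object], gain: float) -> Dict[str, object]:
    """Transform only the metadata; the bulk file is left untouched."""
    return {**metadata, "gain": gain}


def to_pipe_line(metadata: Dict[str, object]) -> str:
    """Serialize metadata as one JSON line, as it would travel down a pipe."""
    return json.dumps(metadata)
```

Each stage reads one line, updates the record, and writes one line, so the pipe stays cheap no matter how large the referenced file is.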

------
bsenftner
So, a task/process queue? Is that what this is? How does this vary from the
quite mature task/process queues used by media production companies? (Examples
include Deadline, and dozens of tested-by-production proprietary queues at
every production studio.)

~~~
michelvocks
Hey bsenftner. Could you provide a link or an example of what you mean? :-) I
googled "Deadline" but what I found doesn't look right.

I usually compare Gaia with tools like Jenkins and Spinnaker. For example,
many people use Jenkins Pipeline
([https://jenkins.io/solutions/pipeline/](https://jenkins.io/solutions/pipeline/))
which allows you to write CI/CD tasks in Groovy. In my opinion, Gaia fulfills
this job way better because it doesn't force you to use a specific language.
It's also super fast and provides features like the automatic (re)build of
your pipelines.

~~~
bpicolo
> Gaia fulfills this job way better because it doesn't force you to use a
> specific language

CI typically seems much more dependent on bash/unixy sorts of tools to get
things done. This seems to not really support that workflow, requiring code to
define pipelines instead.

If it's intended to do CI, how do you deal with CI-style tasks, like shuffling
around files between pipeline steps? Or the corollary, what does this do that
makes it easier to do CI in practice than with typical unix command based
workflows? Inherently, it seems like "Create a golang script that can start a
subprocess that runs a test suite" is more overhead than "run a test suite".
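
For a sense of how much overhead that wrapper is, here is a minimal sketch in Python rather than Go. `run_command_job` is a hypothetical helper, not part of any Gaia SDK, and the "test suite" is a stand-in child process.

```python
import subprocess
import sys
from typing import List


def run_command_job(command: List[str]) -> None:
    """Run a command as a pipeline job; raise if it exits non-zero."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"command failed with exit code {result.returncode}: "
            f"{result.stderr.strip()}"
        )


if __name__ == "__main__":
    # Stand-in for a real test suite: a child Python process that succeeds.
    run_command_job([sys.executable, "-c", "print('tests passed')"])
```

The wrapper is a handful of lines, but it is still code to write and maintain on top of the command itself, which is the trade-off being questioned here.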

At first glance, it looks like more of a competitor to, say, AWS step
functions, but it doesn't sound like that's what you're targeting.

~~~
michelvocks
In the past you simply had to compile your application, package it and push it
to a remote server. Nowadays, it's not that simple anymore. You often find
yourself writing scripts to create Kubernetes resources, manage remote APIs to
create services which are needed by your application, talk to remote services
(like HashiCorp Vault) to get credentials or secrets. Gaia does a great job
here because you can directly use client SDKs in your preferred language to
communicate with those remote APIs.

Have a look at the Kubernetes deployment tutorial; this might clear things up
for you:
[https://medium.com/@michelvocks/automatic-kubernetes-deploym...](https://medium.com/@michelvocks/automatic-kubernetes-deployment-with-gaia-and-hashicorp-vault-73b882e40741)

------
azhenley
There was a fair amount of discussion on this about 6 months ago too:
[https://news.ycombinator.com/item?id=17495732](https://news.ycombinator.com/item?id=17495732)

It looks like the project has progressed quite a bit!

------
iblaine
How does this compare to airflow? Is it free or how does it generate revenue?

~~~
michelvocks
It does not generate any revenue. It's free and open-source. :-)

In my opinion it's comparable with airflow except it's more lightweight and
supports basically any programming language and not only python.

------
rahulbhatiak
I agree with the questions about what a pipeline is. Also, have you considered
whether this competes with Airflow/Luigi which are used more in terms of etl
pipelines?

------
damsdu78
How does it compete with something like Apache Flink?

------
neurotrace
After checking the README, this actually looks like a pretty cool build tool.
I'd love to have Rust support in the future

------
ah-
How does this manage processes? From the example code it looks like your
pipeline keeps running?

~~~
michelvocks
Good question! :-) Gaia is basically a scheduler. It automatically starts your
pipeline, executes the functions defined in the pipeline and is also
responsible for terminating the pipeline process.
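
The lifecycle described here (start the process, let it run, make sure it ends) can be sketched generically. This is not how Gaia is implemented; `run_pipeline_process` and its timeout parameter are assumptions made for illustration.

```python
import subprocess
import sys
from typing import List, Optional


def run_pipeline_process(command: List[str], timeout_seconds: float) -> Optional[int]:
    """Start a pipeline process, wait for it, and kill it if it overruns.

    Returns the exit code, or None if the process had to be killed.
    """
    process = subprocess.Popen(command)
    try:
        return process.wait(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        process.kill()
        process.wait()  # reap the child so it does not linger as a zombie
        return None


if __name__ == "__main__":
    # A stand-in pipeline process that exits immediately.
    print(run_pipeline_process([sys.executable, "-c", "pass"], 30.0))
```

The key point is that the scheduler, not the pipeline, owns the process lifetime, so a pipeline that "keeps running" is still cleaned up.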

------
Dowwie
Is Gaia designed modular enough for someone to swap grpc for thrift, or
support both?

~~~
michelvocks
Yes, that is indeed possible and shouldn't be a huge problem. Would love to
talk about this idea in a separate github issue! :-)

------
tejtm
Build powerful pipelines in any programming language we support?

perhaps a cultural thing but I find people gushing on themselves suspicious
and distasteful.

yet another but what about X

[X] [https://www.commonwl.org/](https://www.commonwl.org/)

------
simplify
Is there an example video somewhere on how this looks in usage?

~~~
michelvocks
Hey simplify. Yes, actually when you open the website
([https://gaia-pipeline.io](https://gaia-pipeline.io)) it should automatically
start the video. Otherwise, have a look here:
[https://gaia-pipeline.io/video/gaia.mp4](https://gaia-pipeline.io/video/gaia.mp4)

~~~
imhoguy
The text in the video is unreadable and I can hardly understand what actually
happens there; it just jumps through some CRUD screens without commentary and
with no way to stop. Dunno, maybe some "Use cases"/"Case study" section could
show some killer real project usage.

------
black-tea
How does this compare to Apache Airflow?

~~~
michelvocks
In the end, Gaia is a task scheduler. You can compare it with AWS stepflow but
also with Jenkins/CircleCI/TravisCI/Bamboo/Codeship and many more.

In my opinion Gaia is perfect for programmers who are "forced" to write
automation tasks. It allows you to write automation tasks in your preferred
programming language and makes it super easy for you to schedule them because
Gaia comes with an automatic build feature (just provide the git url of your
source code and Gaia does the rest). Additionally, Gaia is super fast,
lightweight and open-source.

------
melenaos
Is there any equivalent for C#?

~~~
michelvocks
Gaia is only a few months old so we currently only support Go, Java, Python and
C++. Feel free to open an issue on our github page to raise some awareness that
C# support would be appreciated. :-)

------
xmly
like aws stepflow?

~~~
michelvocks
In the end, Gaia is a task scheduler. You can compare it with AWS stepflow but
also with Jenkins/CircleCI/TravisCI/Bamboo/Codeship and many more.

In my opinion Gaia is perfect for programmers who are "forced" to write
automation tasks. It allows you to write automation tasks in your preferred
programming language and makes it super easy for you to schedule them because
Gaia comes with an automatic build feature (just provide the git url of your
source code and Gaia does the rest). Additionally, Gaia is super fast,
lightweight and open-source.

