Show HN: Gaia – Build powerful pipelines in any programming language (github.com)
133 points by michelvocks 11 days ago | 40 comments





"Any programming language" but only Go is currently available and it's not clear whether other languages need anything besides gRPC support in order to be eligible.

Cool idea, early code, we'll see whether it goes anywhere.


Hello myWindoonn. First of all, sorry for the late reply.

Actually, gRPC is the only requirement for pipelines. I hope we can make this clearer in the future! :-)


The screenshot has Go, Java, C++, and Node.js as choosable languages...

From the first few paragraphs of the README, "Develop pipelines with the help of SDKs (currently only Go) and simply check-in your code into a git repository." If they've added those others in the intervening time, then great!

https://github.com/gaia-pipeline/gaia/issues/15 2 days ago

> Sadly, gaia is in alpha phase and we currently only support pipelines written in go.


The language used in the README is extremely odd. What do you mean by pipeline? The term 'priority' is confusing because what you describe is not a classical interpretation of priority. That sounds more like job ordering with a limited DAG form of dependencies (sequential, fan out, fan in). How does data flow from one job to the next and how does it get combined for a fan-in? What if I have two jobs that get fanned in to? More documentation coverage of the basics of what you're talking about would help.

Hey jsd1982. Sorry for the late reply. You are right, I didn't describe the term "pipeline" explicitly. A pipeline is a compilation of functions which do "something". We use "priority" to determine the order of execution of these functions. We already noticed that priority is probably a bad name and we actually need something different (https://github.com/gaia-pipeline/gaia/issues/19). We are currently working hard on the documentation part (https://gaia-pipeline.io). The project is really early. I hope we can improve a lot within the next weeks/months! :-)
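
To make the "priority" idea above concrete, here is a minimal sketch of functions being run in priority order. It is not Gaia's SDK: the `Job` class, the `run_by_priority` helper, the job names, and the rule that lower numbers run first are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Job:
    # Hypothetical job record for illustration; not a Gaia SDK type.
    title: str
    priority: int  # assumption: lower numbers run first
    handler: Callable[[], None]


def run_by_priority(jobs: List[Job]) -> List[str]:
    """Run jobs in ascending priority order and return the titles in run order."""
    executed = []
    # sorted() is stable, so jobs sharing a priority keep their input order.
    for job in sorted(jobs, key=lambda j: j.priority):
        job.handler()
        executed.append(job.title)
    return executed


log: List[str] = []
jobs = [
    Job("deploy", 20, lambda: log.append("deploying")),
    Job("build", 0, lambda: log.append("building")),
    Job("test", 10, lambda: log.append("testing")),
]
order = run_by_priority(jobs)
print(order)  # ['build', 'test', 'deploy']
```

Note that a single number per job can only express a linear order; it says nothing about which job actually needs the output of which other job.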

There's a pretty interesting Apache project called NiFi that's kind of similar; it's for data flow across different software systems. Apparently it was developed and is still used at the NSA.

https://nifi.apache.org/docs/nifi-docs/html/overview.html


Very interesting. The provenance tracking is a nice feature that I haven't seen a lot of in other systems.

Any examples of this being used in the wild?


Can anyone give an example of what this would be useful for? I don't really understand its use case and the one sort of example in an image, of creating a k8s namespace and deployment, is covered by our CI/CD system. If this is meant to replace that and be part of the CI/CD I don't see anything talking about how to trigger the pipelines.

I feel like I am missing a small but critical piece to understand how this should be used and where.


Hey regnerba. Sorry for the late reply. I don't know which CI/CD system you use, but please let me explain my current experience. :-) At my company, we often had the requirement to do complex stuff besides the deployments. For example, we wanted to offer developers a service that lets them deploy our big monolith into Kubernetes from their code fork via "One-Button-Click". To get this working, we had to use Spinnaker and Jenkins. We had a really poor experience with Spinnaker because everything needs to be configured manually and there is no way to actually "code" it. Jenkins supports this via Jenkins Pipelines. Those pipelines are actually nice, but they force you to write them in Groovy. Gaia solves this problem. Any programming language can potentially be used to develop such pipelines.

@michelvocks -- the Armory team would love to hear more about the challenges you had with Spinnaker. (We're commercializing an enterprise version of OSS Spinnaker).

Also, we've created a 'pipelines as code' feature called Dinghy that may have helped. And our installer & configurator provides a much smoother install & configure experience. Details at www.Armory.io

Hit us up at hello@armory.io if you have more specific feedback (our exec team reads emails to that addy).

DROdio


This could be used as an ETL process; one task is to fetch the data, another task to clean and do quality control, and then branching tasks to put the data in its final destination.

Really any task that can be broken into concrete and independent steps could be made into a pipeline for scaling and reliability.
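
As a rough illustration of the ETL shape described above (fetch, then clean and quality-check, then branch to several destinations), here is a small Python sketch. Every function name, the sample rows, and the two in-memory "destinations" are made up for the example; a real pipeline would read from and write to actual systems.

```python
from typing import Dict, List


def fetch() -> List[str]:
    # Stand-in for reading from a real source; the rows are made-up sample data.
    return [" carol ", "", "BOB", " alice"]


def clean(rows: List[str]) -> List[str]:
    # Quality control: trim whitespace, normalize case, drop empty rows.
    return [r.strip().lower() for r in rows if r.strip()]


def load_warehouse(rows: List[str], sink: Dict[str, List[str]]) -> None:
    # Hypothetical destination 1: keep rows in arrival order.
    sink["warehouse"] = list(rows)


def load_search_index(rows: List[str], sink: Dict[str, List[str]]) -> None:
    # Hypothetical destination 2: keep rows sorted.
    sink["search_index"] = sorted(rows)


def run_etl() -> Dict[str, List[str]]:
    sink: Dict[str, List[str]] = {}
    rows = clean(fetch())
    # Branching step: the same cleaned rows go to each destination independently.
    for load in (load_warehouse, load_search_index):
        load(rows, sink)
    return sink


result = run_etl()
print(result["warehouse"])     # ['carol', 'bob', 'alice']
print(result["search_index"])  # ['alice', 'bob', 'carol']
```

Because the two load steps do not depend on each other, they are the natural candidates to run in parallel or to retry separately.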


Wait, didn't we have BPMN more than a decade ago? BTW, if "any" programming language necessarily means Go for you, here is one BPMN implementation in Go: https://zeebe.io

BPMN is the most underappreciated technology in our field IMO.

Agreed. And the ideal language to build a backend interpreter for BPMN (I am doing so) is Elixir IMHO (a language for building massively scalable soft real-time systems with high-availability requirements; see e.g. ds.cs.ut.ee/courses/course-files/To303nis%20Pool%20.pdf).

I think "any" is signalling: "hey, we want to support any language, contributions welcome". That is better than wandering onto the project and thinking "oh, it's for Go, never mind, I need it for Python" :)

AFAICS this still requires a Java broker. Only the client is in go.

Zeebe is indeed a Java application, but it's possible to use Zeebe without writing any Java. Broker operations can be managed with a CLI (https://github.com/zeebe-io/zbctl). Along with the Go client, there are experimental Node, Ruby, and C# clients, but these are community contributions that aren't yet officially supported by the project.

Hello smarx007. Sorry for the late reply. Gaia is not BPMN. You can actually compare it with Jenkins Pipelines but with the advantage that you can basically use any programming language. And no, "any" programming language does not mean Go. Gaia is alpha, that is the reason why we currently only support Go. We will support other languages soon. :-)

From the Zeebe README: "Zeebe is currently a tech preview and not meant for production use".

I'm on the team that's building Zeebe and want to mention that we release new "dev preview" versions on average once per month and are aiming to have a production ready 1.0 by the end of 2018. If anyone has questions or ideas, or wants to build a prototype with Zeebe, it'd be great to hear from you. There are various ways to contact us here: https://zeebe.io/community/

Is this basically a CI tool? It looks very similar to Gitlab CI or Concourse or Jenkins Flow. If so, odd to not see that initialism anywhere; "pipeline" is pretty generic.

'Jenkins Flow' is now called 'Jenkins Pipeline'. Gitlab CI also uses pipelines in its terminology (https://docs.gitlab.com/ee/ci/pipelines.html). I think the use is appropriate here.

Thank you for that link. Gaia's README assumes readers are familiar with what they mean by 'pipeline' but, because I'd never seen the word used in this way, I was scratching my head wondering whether they were talking about stages of data processing or similar.

Hey rahimnathwani. Sorry for the late reply. You are absolutely right, this is my fault. I should have explained the term "pipeline" a bit more. I will work on this!


I’d prefer to have a dependency graph to a priority system, personally. It’s much easier to reason about and maintain.

Hey empath75. Sorry for the late reply. We also discussed this (https://github.com/gaia-pipeline/gaia/issues/19) and you are right. We will soon switch to a dependency system instead of a priority system. :-)
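
For comparison, a dependency-based ordering can be sketched with Python's standard `graphlib` module (Python 3.9+). This only illustrates the general technique, a topological sort over declared dependencies; it is not Gaia's planned design. The job names and the `execution_stages` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical job names; each key maps to the set of jobs it depends on.
dependencies = {
    "build": set(),
    "unit_tests": {"build"},          # fan out from build
    "lint": {"build"},                # fan out from build
    "package": {"unit_tests", "lint"},  # fan in
}


def execution_stages(deps):
    """Group jobs into stages; every job in a stage has all its dependencies met."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        # Jobs returned together do not depend on each other.
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = execution_stages(dependencies)
print(stages)  # [['build'], ['lint', 'unit_tests'], ['package']]
```

Each inner list is a set of jobs that could run concurrently, so sequential, fan-out, and fan-in shapes all fall out of the declared dependencies without anyone assigning numbers by hand.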

This looks nice enough. It may be worth mentioning that the real challenge in creating a generalized pipeline is rarely the control flow. Usually, performance falls short because of poor intermediate data locality and poor system utility balancing.

Unless pipeline code has the ability to match estimated job utility to device capabilities - it won't be useful in many non trivial cases.

Unless there is an automated way to store intermediate assets such that data locality between stages is (at least somewhat) optimal, significant amounts of all time will be spent in process migration.


I can't give an unbiased review because I love Concourse. As I see it, the pros are the availability of the type system and testing ecosystem.

I don't mind YAML myself, but could I use something like Enaml[0] and get the same benefits as you see them?

[0] https://github.com/enaml-ops/enaml


any references/comparisons for Concourse?

Could you elaborate on what you're after?

The canonical reference is the main website: https://concourse-ci.org/


Looks like there is no way to download the results of any pipeline that has executed. For example, if I wanted to write a pipeline that would:

* Compile a program.
* Run some tests.
* Create a Debian binary package.

It looks like I'd have to write a final task to upload to a staging repository in the pipeline itself.


Hey stevekemp. Sorry for the late reply. You are right, Gaia is alpha so this feature is currently missing. I hope we can provide this soon. :-)

The design on the product screenshot is too complex. Why that fat curvy font on the page title? We need more brutalism

Hey rs86. Sorry for the late reply. I actually like the design a lot. I think many developers enjoy a fresh, modern design, too. :-)

Thanks. Looks like something I've been looking for. Interesting idea! Will evaluate that.

Hey oldgun. Sorry for the late reply. Good to hear that. Let me know if you have any questions! :-)

It would be nice to see this mature as an open-source alternative to Informatica PowerCenter.


