Show HN: Free hosted open-source alternative to Zapier/Airflow (titanoboa.io)
279 points by newcrobuzon on Sept 8, 2020 | hide | past | favorite | 79 comments



I've used Zapier a fair amount and I wrote an article about Airflow so I have a fairly good understanding of that too.

I would never really consider them alternatives though? To me Zapier is a low/no code tool that offers a bazillion integrations and Airflow is a workflow orchestration tool.

So comparing to both of them confuses me. I guess choosing one would give you a more niche audience, but one that you can connect with better as well.


Very true. We really should stop overloading the term Workflow in software.

As of now these are the broad categories that abuse the term workflow.

1. State Machines

For example, in Jira a bug/ticket moves through different states to reach a final stage. This type of state machine can be found in a lot of different software - most CMSes/bug trackers/CRMs, where only the entity is different (document/bug/lead).

The motive of these systems that call themselves workflow engines is to provide structure to an otherwise ad-hoc movement of entities, so that a lead/manager is able to ensure a process and collect statistics.

2. Automations

For example, the "Apple Workflow" app, Zapier, or Zoho Flow. These tools define a sequence of steps that are triggered when an event occurs in the system.

The motive of these systems is to enable automation and integrations between different software components with zero code (thus, no-code).

3. Process Designers (Bad term but can't think of anything better at the moment)

For example Airflow/Camunda. These systems are not necessarily low-code, but they mostly deal with arranging individual components of code so that a process can be assembled as quickly as possible. These systems are usually accompanied by a visual designer like the one Zapier has, but the intention is mostly to ease the process, rather than to be a complete no-code tool for creating automations. However, their marketing tries to sell them as no-code platforms for business folks.

The motive is not yet very clear to me, but from my initial intuition they can be used to initiate some data-processing pipeline, I guess? If anybody can shed more light, please leave a reply.

Now as you can see, much like how a "Process" can mean many things in many different contexts, the term "Workflow" can mean a lot depending on the context. Any software that calls itself the ultimate workflow solution is just lying. It's like calling something an "ultimate process engine" - it doesn't make sense.


I'd like you to listen to this file, which is your first 'headline' read by `say`: https://www.dropbox.com/s/fanpqs8lv2d9nvl/say.m4a?dl=0 Now imagine yourself in the shoes of a vision-impaired person relying on a technology like VoiceOver. Do you understand how unbelievably frustrating that is?


Thanks for pointing this out. Never knew this would break screen readers. Sorry. I'm changing the font right now.


[flagged]


What is your point? zacwest isn't asking them to completely rewrite the page—the problem they pointed out is trivial to solve and simply needed to be recognized. If you can spend 20 minutes to drastically improve the usability of your product for "dozens" of users, why wouldn't you?


I am not saying it is bad.

I was just commenting that the scale is offset. Get it?


"However, their marketing tries to sell themselves as no-code platform for business folks."

Huh? I've never seen Airflow described as no code, or seen it try to sell itself that way; in fact, all the pipelines are written in Python and you can do some really complex orchestration.

I get that you're not saying Airflow is no code, just that the category you've put it in is typically low code or marketed as low code - but then I don't think Airflow belongs in that category. Or rather, and maybe more accurately, no/low code is not really a major defining quality of the bucket you're calling "Process Designers".

I've also been calling them "Process Schedulers", because typically it involves translating a more manual, but well-defined, process into its automated form.


I would agree.

The no-code is an illusion in the enterprise realm - before you know it, you are waist-deep in custom code.

No-code can really work only for small businesses imo.

I come from an enterprise background, and that is one of the reasons I built Titanoboa - to make something that makes it easy to rapidly prototype new integrations on the fly.

I summed up some of my thoughts on this topic here: https://www.titanoboa.io/repl.html

The main point I am trying to test with Titanoboa is this however:

State Machines <-> Process Designers is a spectrum and one product could handle the entire spectrum (or part of it).

Titanoboa makes it possible to pre-define workflow steps and make it "no code", while also making complex custom integrations possible from the same environment with the same concepts. Plus, distributed data handling is in the mix.

I guess now the challenge is how to market this versatility or whether it could create more confusion...


I see what you are getting at. Yes, State Machines to Process Designers is a spectrum, and there is quite a bit of overlap.

For example these are the lowest common denominators I see.

#1 Graph: All of these systems allow you to visually design/represent the process as a graph. You yourself have abstracted these into graph problems and come up with a simpler, non-verbose BPMN alternative - which is great.

#2 Computability: Since the base is a definitive graph, a graph that can execute is essentially a finite automaton. That is, all of these systems put the power back in the end users' hands to create their own machines (without actually coding), hence the relatability with low-code. So in the end, even the motive broadly aligns from a computability perspective.
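That "executable graph = finite automaton" point can be sketched in a few lines. The following Java is purely illustrative (a hand-rolled transition table, not any of these products' actual engines):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class GraphMachine {
    public static void main(String[] args) {
        // A user-drawn graph, e.g. a bug tracker's states: each node names what comes next.
        Map<String, String> next = Map.of(
                "new", "triaged",
                "triaged", "in-progress",
                "in-progress", "done");

        // "Executing" the graph is just walking the transitions - a finite automaton.
        String state = "new";
        List<String> trace = new ArrayList<>(List.of(state));
        while (next.containsKey(state)) {
            state = next.get(state);
            trace.add(state);
        }
        System.out.println(trace); // [new, triaged, in-progress, done]
    }
}
```

The same table could just as well come out of a visual designer, which is exactly why the low-code angle keeps showing up across all these categories.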

But I'm still not convinced these denominators justify an all-in-one, one-size-fits-all solution to this spectrum. I'm not saying one product shouldn't attack them all, but it's better to appropriately categorize them and develop unique features on top of each. At least that is what I feel at this point in time.


This is not true in many use cases. There are tons of ways to handle low code/no code. It is a very hard problem to solve, and vendors end up building basic "low-code" wrappers around API endpoints, and that's why it looks like a lost cause.

Done right (we are living proof it can be done, at Syncari), one can get a lot done without any coding.


Do you have any links to some blog posts discussing it (the "doing it right" approach)? I would agree that it definitely depends on the use case.

I will definitely check out Syncari - just opened the landing page and it looks great!


Thanks! Haven't had a chance to write a lot about this (heads down building and selling), but https://syncari.com/a-brief-history-of-todays-data-woes-and-... touches on it a bit.

For us, it is about:

* implementing deep integrations that are commonplace

* not spreading ourselves too thin in the quest to support hundreds of systems

* thinking from a data model/data/eventually consistent system perspective

* completely dropping the reactive/trigger based/if-this-then-that point-to-point model.


Thanks Neelesh for the link, I like the way of thinking with the focus on data.

It is similar to what I am seeing - i.e. lots of older integration systems are terrible with data simply because they force you to use some particular way of data modeling (e.g. their OOP data models, WSDLs/XSDs etc.), while the newer ones just rely on JSON, which is good but can lack the (sometimes necessary) complexity. Doing some data cleansing along the way then seems like an unachievable task (there is certainly such a thing as an overdose on XSLT ;) ).

I also like the approach you took with the centralized data dictionary - it certainly is something the industry might need. I do wonder how it impacts change management though (especially in bigger companies).

Wishing you good luck!


Which bucket do temporal.io and Azure Durable Functions, which support `workflow as code`, fall into?


This is a very good point.

My current line of thinking is: a workflow is a workflow is a workflow. Titanoboa is built in such a way that pretty much everything it shuffles is data - and either the data can be big (Airflow) or the steps can have side effects (iPaaS/Zapier).

The "no code" is achievable since adding more predefined steps is not that hard (there are not that many at the moment though) - see for instance https://github.com/mikub/titanoboa/wiki/Getting-Started-with... - you don't code, you just fill in properties, so it is very straightforward, pretty much like Zapier.

My aim in the iPaaS space is more at the enterprise level, where the predefined steps won't do anymore and you have to custom-develop them anyway - so in that area I think Titanoboa shines, since you really can rapidly prototype steps on the fly, a bit like Repl.it and Zapier together...

I agree that it might be good to pick one audience; at this stage I am just experimenting to see whether I can market to both without messing up and confusing both groups.


Keep converging and building.

I have been experimenting with combinations of Huginn, Camunda, Airflow and others to try and achieve integrated workflow/state/process management.

There exists a gap in the enterprise space between all these tools, and it has been further exacerbated by the disruption introduced to many industries by covid. There is a real opportunity for small and medium businesses to be able to access tooling beyond what a baseline Zap, etc. can accomplish.

Would be happy to connect offline.


Thanks! Will reach out when the traffic subsides :) Or feel free to shoot me an email, my email address is on the web page.


There's also a problem with describing your product as "an alternative to". I don't know what Zapier and Airflow are, but now I want to find out, which pushes me towards the competition - rather than you just explaining what your product does and having me view the competition as the alternative.


Hi HN peeps! I single-handedly built Titanoboa (a workflow automation for JVM) and now I am experimenting with hosting dockerized Titanoboa instances - so I built a free hosting service. If you want to play with Titanoboa in your browser without installing anything, give it a try.

Feel free to check out Titanoboa on Github: https://github.com/mikub/titanoboa

Also: This is an early beta so please do let me know if something breaks or you spot a bug. Atm I have load-balancers set up only on West coast and in Europe, so apologies to folks from down under & similar locations, let me know if it's too laggy :)


Why do you describe it as "automation for JVM" instead of "automation that runs on JVM"?


Good point, not a native speaker so I thought this would roughly mean the same.

The main message I wanted to convey was that this runs on the JVM, and multiple JVM languages (Java and Clojure) can be used.


Wow, this has blown up a bit, so I have added a few more servers (now two servers on the West coast and two in Europe), but ultimately I think there is a cap of 100 parallel Titanoboa instances in each geography.

So if we break that level please don't be mad if you don't get your instance :)

Instead give me a star on github and come back later :)

Cheers Miro


Ok, I have just reached 100 concurrent containers in Europe, which is the max at this stage, and I don't think I can change the AWS limit on the fly (will check though), so I might start killing the oldest instances.

Apologies for the inconvenience, folks, but I am sure people rarely play with each instance for the whole 3 hours anyway at this stage.

Honestly did not expect this kind of "public beta" rollout, still surprised how well the service holds so far under the load.


Approaching the 100-container cap in the US, so if something does not work, don't be mad :)


If anybody's curious about this "public beta hosting - HN bear hug" performance load test:

I just (sequentially) restarted all 4 servers behind my 2 load balancers - it seems that on reaching 100 containers in each geography the provisioning threads may have died after many (unsuccessful) retries, so I need to restart the thread pools.

That is one more error I need to handle (or I need to increase my limit in AWS).


Heya - just a note that I build these kinds of PaaS platforms for a living - if you need a hand building a super-massive cluster for lots of users, let me know!


Wishing you success, fellow Miro.


First impression: rename it now while the cost is low. The name is unwieldy.


Ti-ta-no-bo-a

5 syllables.

Agreed that this is unwieldy to say in normal speech.


I mostly considered naming to be a distraction from getting the real work done.

I also am probably not good with names as my other alternative was Megalodon (4 syllables) - I just wanted to have some megafauna name.

Happy to hear what you folks would suggest :)


Gdoc form for suggestions. One mail to all users (small enticement, reward for the winner - pro account for life or something else really nice). Remove any lewd suggestions. Then pick one you love, or, if you can't choose, run a poll.


Titano is nice, but may have some trademark issues. Tanobo is easy to say, if a bit generic. Just playing with syllables.


"Dinornis" or "Moa" in maori.


If you're going with the former, you might as well spell it "deinornis", it looks cooler and is more correct.


A fauna focused synonym for canopy?


Ti-tan-o-bo-a


Nice work!

How does it compare to n8n? [0]

n8n is the closest OSS alternative to Zapier I've seen so far.

[0] https://n8n.io/


I just tried Titanoboa and it's not even close to Zapier. n8n comes pretty close, and it's much more convenient to hack on with nodejs.

n8n is pretty neat as a low-code or no-code tool and comes close to Zapier in many cases.

As someone with zero Java experience, this tool seems to have a very steep learning curve.

Zapier requires zero coding knowledge for the most part and is a great no-code tool.


A bit similar, but on JVM. The main focus of Titanoboa is however as follows:

- make rapid prototyping of new steps possible during runtime, without any need to restart/redeploy

- focus on distributed processing, where in master-less Titanoboa cluster you can have pretty much any number of nodes


Its license (fair code) sucks.


That is a fair criticism, I am aware of the AGPL license shortcomings.

I just picked up the more restrictive license at the beginning - being a sole funder and not working on this full time etc. I simply did not want somebody (e.g. a big company with a big team) grabbing my code along the way and running away with it.

Since Titanoboa has now got to the shape I envisioned it to be in, I am starting to focus more on adoption - so yes, I am definitely thinking about switching to a less restrictive license, since it will probably help.

Also, at the beginning I was not aware of how badly AGPL is perceived (I always thought that if it was good for Mongo it could work for me, but I may have been wrong).


You misunderstood, it's not you who's being criticized.

n8n used to write "open source" on their webpage while publishing the product with a proprietary license. A lot of discussions ensued.

https://github.com/n8n-io/n8n/issues/40


Oh really? Wasn't aware of that. Thanks for the link!


>>I just picked up the more restrictive license at the beginning - being a sole funder and not working on this full time etc. I simply did not want somebody (e.g. a big company with a big team) grabbing my code along the way and running away with it.

Sounds like a great reason to use the AGPL! Can always switch the license for later releases as you gain traction.


Why exactly? I myself have been researching the options that exist to write open source software but at the same time prevent industrial leeches from benefiting from it (aka. avoiding Amazon playing an Elasticsearch move EDIT - or was it Mongo?).

So far I've stumbled upon using dual AGPL + commercial for those who don't like the copyleft; using something like Mongo's SSPL; MariaDB's BSL; and now, Commons Clause.

On the surface (I haven't yet studied carefully the intricacies of all the options), all these look to me like a great way to publish code and contribute to the whole of our common knowledge, while at the same time being able to maybe make a living from it - something for which it's important to prevent some bad actors from bundling it and profiting from it on your behalf. Otherwise that code wouldn't really exist at all to start with...

I might write an Ask HN because this topic is complex.


@newcrobuzon you have some kind of comment string output at the top of your html body tag. Looks like something you do not want exposed.

Congratulations on reaching public beta and for submitting to HN! Looks very promising!


Thanks! If it is the reCAPTCHA string then it is just an old public string for local testing.

But I would upvote you 100x or send you a medal or something if I could, this is really super nice of you letting me know!


Impressive feat of engineering for one person!

But is it really an alternative to Zapier? I mean, while Zapier is obviously about automation, it always felt that their big selling point was the sheer number of integrations they provide.


My aim is to make this easier to customize - i.e. to develop custom steps on the fly and have an instant feedback loop as you develop or customize existing integrations.

Target audience might be slightly different ultimately, but we'll see - I would envision this to be more useful in an enterprise environment where you pretty much always have to customize the integrations that were provided out of the box.


This looks very interesting. The visual component in particular seems very well done.

I have a deep interest in DAG-structured ETL tooling and had a couple of questions that the documentation didn't seem to address...

1. Can I execute workflows without a server running? Something like:

  $ java -jar titanoboa.jar MyWorkFlowName arg1 arg2 ...

...and then my workflow executes, as a program, on my machine, until it's done and then exits? Or does every workflow always execute within the context of a running server?

2. Is there any notion of resuming a partially failed workflow? As a point of comparison, Luigi structures its DAG concept using Tasks which create Targets, and invoking a Task whose Target already exists is a no-op, so if you have a big execution graph that gets 80% finished and then dies, you can easily restart it. I find that many competing tools are missing this concept.


Now that it's Saturday I've had time to play with this and I believe I've found the answers to my questions...

1. Based on the Clojure REPL example in the main README, I think the answer is YES, though not exactly the way I had imagined. It seems what you would need to do is write a top-level script (or java "main") that starts a "system", starts "workers", runs your job using that system, then stops the system and exits. A little clunky compared to how Luigi does it, but usable.

2. As best as I can tell, the answer is NO. Neither the documented API, nor the implementation of the API in src/clj/titanoboa/handler.clj, contains any hint of an ability to operate on a job id, beyond retrieving the result of its execution.

Additional commentary:

1. Resuming failed jobs

As implied by my question above, the ability to resume a failed job is essential. One of the major reasons to adopt DAG-structured code is parallel execution, and Titanoboa has that. But the OTHER major reason is to allow partially-failed computation to retry/resume without repeating already-completed work. In particular in the ETL space, we often have job graphs composed of hundreds of nodes, with total runtimes measured in hours. If my 100-node job graph fails due to an error in node #78, preventing an additional 15 downstream nodes from running, I don't want to run all 100 nodes again after I fix the problem. I want to resume executing my graph at #78, and expect only the 16 total affected nodes to execute, since everything else ran correctly the first time (and presumably persisted their outputs). Luigi gets this one right. Airflow sorta tries, but it's clunky and you can tell it's not a priority.
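The skip-if-output-exists behavior described above (in the spirit of Luigi's Targets - an illustrative Java sketch, not Titanoboa's or Luigi's actual API) can be as simple as:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class ResumableStep {
    // Stands in for durable outputs (files, tables): if a step's output
    // already exists, re-invoking the step is a no-op.
    static Map<String, String> outputs = new HashMap<>();

    static String runIfMissing(String stepId, Supplier<String> work) {
        if (outputs.containsKey(stepId)) {
            System.out.println(stepId + ": already done, skipping");
            return outputs.get(stepId);
        }
        String result = work.get();
        outputs.put(stepId, result);
        System.out.println(stepId + ": ran");
        return result;
    }

    public static void main(String[] args) {
        runIfMissing("extract", () -> "raw");
        runIfMissing("transform", () -> "clean");
        // Simulate a rerun after a downstream failure: completed steps are no-ops,
        // so only the work that never finished actually executes.
        runIfMissing("extract", () -> "raw");
        runIfMissing("load", () -> "loaded");
    }
}
```

With persisted outputs, rerunning the whole 100-node graph naturally executes only the affected nodes.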

2. Flow/Dependency direction

When designing a workflow, either in the GUI or as EDN, you tell Titanoboa what jobs are "next". This is intuitive because it comports with our notion of execution flow through a graph of jobs, but it gets things backwards. That is, when we write A->B->C, we are thinking that A will execute, and then B, and then C (perhaps results will be passed from step to step). It is often better, though, to describe this as A<-B<-C, which reads as: C depends on B, B depends on A, and A depends on nothing. Structuring our thinking in this way focuses the mind on what inputs a node requires in order to perform its effect or compute its output, rather than on what operations should follow it in time. Luigi and Airflow both get this one right.
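The inversion is easy to see in code. This hypothetical Java sketch (names and structure are mine, not any tool's API) declares "depends on" edges and derives the execution order from them, rather than following "next" pointers:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FlowDirection {
    public static void main(String[] args) {
        // A <- B <- C, declared as "node -> what it depends on".
        Map<String, List<String>> deps = new HashMap<>();
        deps.put("A", List.of());
        deps.put("B", List.of("A"));
        deps.put("C", List.of("B"));

        // Execution order falls out of the dependencies (DFS post-order).
        List<String> order = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        for (String node : deps.keySet()) visit(node, deps, visited, order);
        System.out.println(order); // [A, B, C]
    }

    static void visit(String node, Map<String, List<String>> deps,
                      Set<String> visited, List<String> order) {
        if (!visited.add(node)) return;
        for (String dep : deps.get(node)) visit(dep, deps, visited, order);
        order.add(node);
    }
}
```

Note the declaration never says what runs "next"; the schedule is computed, which is part of what makes resuming and partial execution natural.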

3. Properties

The way Titanoboa defines workflow-level "properties", into which job-level properties are merged, and the way properties flow along the path of execution, is very nice. A constant problem with Luigi is how to flow values from one Task to the next without using an excessive number of Parameters. I can't say for sure that Titanoboa's properties construct doesn't have the same problems, without taking the time to actually use it to build a large project, but on the surface it looks good.
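Assuming the merge behaves like an ordinary map merge (an assumption on my part - this is an illustrative Java sketch, not Titanoboa's implementation), step-level properties would simply override workflow-level defaults:

```java
import java.util.HashMap;
import java.util.Map;

public class PropertyMerge {
    public static void main(String[] args) {
        // Workflow-level defaults...
        Map<String, Object> workflow = new HashMap<>();
        workflow.put("bucket", "my-data");
        workflow.put("retries", 3);

        // ...with step-level properties merged on top.
        Map<String, Object> step = new HashMap<>();
        step.put("retries", 5); // the step overrides the workflow default

        Map<String, Object> effective = new HashMap<>(workflow);
        effective.putAll(step);
        System.out.println(effective.get("bucket") + " " + effective.get("retries"));
        // prints: my-data 5
    }
}
```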

4. Logging

I noticed that when a step's function returns a map, to be integrated into "properties", that return value is not logged. The message in the log is like "Step [my-cool-step-name] finshed with result []", which is both unhelpful and not even literally true, as it most certainly did have a result! When a step returns a scalar value, it does get logged. I found this inconsistency frustrating.

Also, the stdout/stderr of each step function apparently goes to /dev/null. I find this odd as the placeholder function when you build a new workflow is (println "Hello World!") but if you actually execute that you'll discover that our classic greeting vanishes into the void. This is a major shortcoming. As a point of comparison, one of the biggest value-adds of using Jenkins as a job scheduler is how it automatically captures the output of anything you run, saves it in a durable log file, AND lets you view it in real time. Job orchestration systems that don't match that level of log-friendliness drive me nuts.

5. Versioning

The built-in versioning system is great. Two thumbs up. I don't know how it would work if I were writing my jobs in proper Clojure or Java code in their own repo, but I kinda don't care, because the value of storing and versioning what I do in the UI is so great.

6. UI -> data

I love the way the interactive UI is just there to generate EDN. In a way this mirrors how Jenkins' UI builds its job XML files, but you have to go hunting for those and they're hard to read (because XML). Being able to see what EDN is generated by your actions in the UI, _right there in the UI_, is fantastic.

7. UI issues

The UI is great but it has quite a few low-hanging-fruit improvements that could be made:

- the run job popup forces you to choose a system every time, even if there is only one

- being able to draw arrows in the visualization is cool, but I could not figure out how to delete them there. Needs work.

- the UI doesn't lay out well on small screens (I'm on an old 13" Air); I had to zoom to 80% just to be able to see the X to delete a property. It would help if the Workflows panel on the left (the least important UI element by far!) could be collapsed (edit: it can be collapsed, but the collapse button is on the other side of the screen, which makes no sense)

- the box that pops up after starting a job has nothing clickable in it. I have to close it and go to the jobs tab

- the jobs tab doesn't refresh when it loads, even if I just started a job, which needlessly adds clicks to the main workflow

- the jobs tab has an "archived" sub-tab but no apparent way to actually move a job to the archive

Overall, there's a lot of promise here, and it's amazing to me that you built this by yourself. Still, it has a long way to go. I recommend spending some time with Luigi, which I still think is the best general way of DAG-structuring real world workflow code, and with Jenkins which remains far and away the best UI-driven job orchestration system. It seems you're already familiar with Airflow, but I would recommend you treat it mainly as an example of what not to do.


Really appreciate your time looking into this, and apologies - I missed your original post. You got the answers right. Re 2 - it shouldn't be that hard to add, since everything is just data. It is, however, partially covered by the retry property on each step. Will read through your additional comments and respond tomorrow (busy day, plus it's midnight here in Europe)! Cheers, Miro



To be honest, Prefect only got on my radar last week, so I will need to look at it more closely; I was not aware of them till then.

Obviously this is for the JVM; plus, I am not sure how Prefect addresses the following two points I mainly focus on:

- make rapid prototyping of new steps possible during runtime, without any need to restart/redeploy

- focus on distributed processing, where in master-less Titanoboa cluster you can have pretty much any number of nodes


Prefect takes like an hour just to put up the Kubernetes environment required for it. The way you have it, you can just do a simple docker run and start coding - that's the comparison. And you're coding on the web, so no need to set up VSCode or anything; it's fast.


This looks really well done, newcrobuzon. Very impressed.

I especially love your web UI. It makes it very easy to start experimenting with workflows without the overhead of having to set up a local development environment.

I am curious how you store secrets (e.g. AWS access key id, secret access key). There isn't a login wall and it's not clear that the values will be protected from you, so I am loathe to put my credentials on there.

Great work overall. Wish you the best.


Thanks for the kind words!

Yes, I would still be careful with passwords in the public beta.

It goes via SSL and you get a unique UUID URL, so all in all it is kinda secured (plus the time-to-live of the instance is only 3 hours, which limits the time to hack it), but still this is not 100% secure and I would not recommend it for any kind of production use (including any use of passwords you don't want exposed).

The free instances are not (yet) password protected (other than by the UUID) - this, on the other hand, is useful if you want to share the environment with somebody: just send them the link... This is just the beginning of the public hosting, so I will need to think through further improvements to the hosting, security-wise and in other aspects as well.

My main objective at this stage for the free hosted instances was to give people a way to play with Titanoboa or quickly test something, especially if something is not working in their local environment (say, dependencies) and they want to quickly re-test in a vanilla environment.

If you want to add aws secrets or what not I would suggest to just download/install your local version, it is super easy: https://github.com/mikub/titanoboa#installation

Or if you need a public IP for integrations, just grab the docker image and spin it up in your AWS ECS or something.


After clicking logout, I had no way of logging back in. :)


Ooops, good catch, I will fix that :)


Very interesting! What library did you use for the SVG powered diagram tool?


I think this blog post answers my question: https://titanoboa.io/visualizing-workflows.html

Very cool to see how you use clojurescript for this.


I am using D3. AFAIK nobody has used it for workflow visualization so far: https://www.titanoboa.io/visualizing-workflows.html


Luigi does some workflow visualization with d3

https://luigi.readthedocs.io/en/stable/index.html


Thanks, wasn't aware of them!


It's a very code-forward one, but I've certainly found it useful and easier to understand than something like Airflow.


Being a JVM-friendly tool, it would be great to unlock other JVM languages to allow scripting in any language: Jython for Python, Quercus for PHP, JRuby for Ruby, etc.


Yes, I am considering this and it should not be that hard (also Kotlin, Nashorn JavaScript...), but at this stage Titanoboa is just a one-man show - so patience, I might get there ;)


FWIW, I would have to check all the ones listed, but if they're sane scripting languages on the JVM then they'll implement the JSR-223 interface, allowing one to pick them at runtime without a huge amount of drama: https://jcp.org/aboutJava/communityprocess/final/jsr223/inde...
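A minimal sketch of that runtime lookup with plain javax.script (which engines actually turn up depends entirely on the JVM version and classpath, hence the fallback):

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;

public class Jsr223Demo {
    public static void main(String[] args) throws Exception {
        ScriptEngineManager mgr = new ScriptEngineManager();

        // List whatever JSR-223 engines are discoverable on this JVM.
        for (ScriptEngineFactory f : mgr.getEngineFactories()) {
            System.out.println(f.getEngineName() + " -> " + f.getNames());
        }

        // Pick an engine by name at runtime; null if it isn't available.
        ScriptEngine js = mgr.getEngineByName("javascript");
        if (js != null) {
            System.out.println(js.eval("'hello from ' + 'jsr223'"));
        } else {
            System.out.println("no javascript engine on this JVM");
        }
    }
}
```

Dropping a language implementation onto the classpath (JRuby, Jython, etc.) makes it show up in the factory list with no further ceremony.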


newcrobuzon, you're a genius, my friend - someone should give you an award. I loved your idea and how you approached it; it's fantastic for developers. I'm a root Java dev and I have to say you did fantastic work to make developers' lives easier. You didn't take any existing approach - you made your own, and this is awesome. If you get Javascript working, I'll feel like I'm getting back to the AppJet days! This is fantastic stuff. If you show me how, I would be glad to help you get the javascript stuff going, since I have adopted NodeJS recently. Like you said, being a one-man army you have to be efficient, and JS makes me more efficient.


Thanks ramon! Appreciate your kind words! Feel free to shoot me an email or reach me on twitter and let's stay in touch!


For sure. Being a one-man army has its perks and challenges. I hope this project has a community form around it.

The JVM is more underrated/unknown than it should be in the startup space, tools like what you're making make it far more accessible.



Are there some recipes or common scenarios one can 'import' to check it out?



Honestly I really like this, I can't wait to use it, but my concerns are around certain features being behind a paywall, especially HA and Clustering on the self-hosted version.

I understand that the cloud-hosted version should cost more for these features, but without a way to get pricing for the self-hosted HA and clustered version, it will be a tough sell for me to deploy it, use it, and then have to scrap it in the future.


I liked the video demo - really nice - and I would say this should be called not low-code or no-code but rather aPaaS, which is Application Platform as a Service.

So the title should be "Host your own aPaaS" instead.


How does it compare with Cadence/Temporal?


[flagged]


This comment breaks HN's rules badly. Please don't post like this again, regardless of how someone's website is.

https://news.ycombinator.com/newsguidelines.html

https://news.ycombinator.com/showhn.html


[flagged]


Well yes, this is for JVM, so maven ecosystem, sorry.

I have been contemplating also doing a clone on node.js (titanoboa is written in clojure so it would just be a move from clojure to clojurescript), but I did hear that the npm ecosystem is a bit uninviting too...

What dependency tools would you like to see?



