I've used Zapier a fair amount and I wrote an article about Airflow so I have a fairly good understanding of that too.
I would never really consider them alternatives though? To me Zapier is a low/no code tool that offers a bazillion integrations and Airflow is a workflow orchestration tool.
So comparing yourself to both of them confuses me, and I guess choosing one would give you a more niche audience, but one that you can connect better to as well.
Very true. We really should stop overloading the term Workflow in software.
As of now these are the broad categories that abuse the term workflow.
1. State Machines
For example, in Jira a bug/ticket moves through different states to reach a final stage. This type of state machine can be found in a lot of different software - most CMSes/bug trackers/CRMs, where only the entity differs (document/bug/lead).
The motive of these systems - which call themselves workflow engines - is to provide structure to an otherwise ad-hoc movement of entities, so that a lead/manager is able to ensure a process is followed and collect statistics.
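To make it concrete: the state machine here is nothing more than a set of allowed transitions that the software enforces. A trivial sketch (states and transitions made up):

    # Allowed transitions for a bug ticket, as a minimal state machine.
    TRANSITIONS = {
        "open":        {"in-progress"},
        "in-progress": {"in-review", "open"},
        "in-review":   {"done", "in-progress"},
        "done":        set(),
    }

    def move(state, new_state):
        # Enforce the process: only whitelisted transitions are legal.
        if new_state not in TRANSITIONS[state]:
            raise ValueError(f"illegal transition: {state} -> {new_state}")
        return new_state

    state = move("open", "in-progress")   # ok
    # move("open", "done")                # raises - the process is enforced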
2. Automations
For example "apple workflow" app or Zapier or Zoho Flow. These softwares define a sequence of steps that are triggered when an event occurs in the system.
The motive of these systems is to enable automation and integrations between different software components with zero code (thus, no-code).
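In essence they are an event bus with pre-registered sequences of steps. Something like this sketch (event names and payload are made up):

    # Hypothetical Zapier-style automation: an event fires, and the
    # pre-registered steps run against its payload, in order.
    handlers = {}

    def on(event_name):
        def register(fn):
            handlers.setdefault(event_name, []).append(fn)
            return fn
        return register

    @on("new-lead")
    def send_welcome_email(payload):
        print(f"emailing {payload['email']}")

    @on("new-lead")
    def add_to_crm(payload):
        print(f"adding {payload['email']} to the CRM")

    def fire(event_name, payload):
        for step in handlers.get(event_name, []):
            step(payload)

    fire("new-lead", {"email": "jane@example.com"})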
3. Process Designers (Bad term but can't think of anything better at the moment)
For example Airflow/Camunda. These systems are not necessarily low-code, but they mostly deal with arranging individual components of code so that a process can be assembled as quickly as possible. They are usually accompanied by a visual designer like Zapier's, but the intention is mostly to ease the process, rather than to be a complete no-code tool for creating automations. However, their marketing tries to sell them as no-code platforms for business folks.
The motive is not yet very clear to me, but my initial intuition is that they can be used to initiate some data-processing pipeline, I guess? If anybody can shed more light, please leave a reply.
Now as you can see, much like how a "Process" can mean many things in many different contexts, the term "Workflow" can mean a lot depending on the context. Any software that calls itself the ultimate workflow solution is just lying. It's like calling something an "ultimate process engine" - it doesn't make sense.
I'd like you to listen to this file, which is your first 'headline' read by `say`: https://www.dropbox.com/s/fanpqs8lv2d9nvl/say.m4a?dl=0 Now imagine yourself in the shoes of a vision-impaired person relying on a technology like VoiceOver. Do you understand how unbelievably frustrating that is?
What is your point? zacwest isn't asking them to completely rewrite the page—the problem they pointed out is trivial to solve and simply needed to be recognized. If you can spend 20 minutes to drastically improve the usability of your product for "dozens" of users, why wouldn't you?
"However, their marketing tries to sell themselves as no-code platform for business folks."
Huh? I've never seen Airflow described as no-code or seen it try to sell itself that way; in fact, all the pipelines are written in Python and you can do some really complex orchestration.
I get that you're not saying Airflow is no-code, but that the category you've put it in is typically low-code or marketed as low-code. But then I don't think Airflow belongs in that category - or rather, and maybe more accurately, no/low-code is not really a major defining quality of the bucket you're calling "Process Designers".
I've also been calling them "Process Schedulers", because typically the work involves translating a more manual, but well-defined, process into its automated phase.
The no-code is an illusion in the enterprise realm - before you know it, you are waist-deep in custom code.
No-code can really work only for small businesses imo.
I come from an enterprise background, and that is one of the reasons I built Titanoboa - to make something that makes it easy to rapidly prototype new integrations on the fly.
The main point I am trying to test with Titanoboa, however, is this:
State Machines <-> Process Designers is a spectrum and one product could handle the entire spectrum (or part of it).
Titanoboa makes it possible to pre-define workflow steps and make them "no code", while also making complex custom integrations possible from the same environment, with the same concepts. Plus distributed data handling is in the mix.
I guess now the challenge is how to market this versatility or whether it could create more confusion...
I see what you are getting at. Yes, State Machines to Process Designers is a spectrum, and there is quite a bit of overlap.
For example these are the lowest common denominators I see.
#1 Graph: All of these systems allow you to visually design/represent the process as a graph. You yourself have abstracted these into graph problems and come up with a simpler, non-verbose BPMN alternative - which is great.
#2 Computability: Since the base is a definitive graph, a graph that can execute is essentially a finite automaton. That is, all of these systems put the power back in end-users' hands to create their own machines (without actually coding) - hence the relatability with low-code. So in the end, broadly, even the motive aligns from a computability perspective.
But I'm still not convinced these denominators justify an all-in-one, one-size-fits-all solution to this spectrum. I'm not saying one product shouldn't attack them all, but it's better to appropriately categorize them and develop unique features on top of each. At least that is how I feel at this point in time.
This is not true in many use cases. There are tons of ways to handle low code/no code. It is a very hard problem to solve, and vendors end up building basic "low-code" wrappers around API endpoints, and that's why it looks like a lost cause.
Done right (we at Syncari are living proof it can be done), one can get a lot of stuff done without any coding.
Thanks Neelesh for the link; I like that way of thinking, with its focus on data.
It is similar to what I am seeing - i.e. lots of older integration systems are terrible with data, simply because they force you to use some particular way of data modeling (e.g. their OOP data models, WSDLs/XSDs etc.), while the newer ones just rely on JSON, which is good but can lack the (sometimes necessary) complexity. Doing some data cleansing along the way then seems like an unachievable task (there is certainly such a thing as an overdose of XSLT ;) ).
I also like the approach you took with the centralized data dictionary - it certainly is something the industry might need. I do wonder how it impacts change management though (especially in bigger companies).
My current line of thinking is: a workflow is a workflow is a workflow. Titanoboa is built in such a way that pretty much everything it shuffles is data - and either the data can be big (Airflow) or the steps can have side effects (iPaaS/Zapier).
The "no code" is achievable since adding more predefined steps is not that hard (there's not that many at the moment though) - see for instance https://github.com/mikub/titanoboa/wiki/Getting-Started-with... - you dont code, you just fill in properties, so it is very straightforward, pretty much like Zappier.
My aim in the iPaaS space is more at the enterprise level, where the predefined steps won't do anymore and you have to custom-develop them anyway - so in that area I think Titanoboa shines, since you really can rapidly prototype steps on the fly, a bit like Repl.it and Zapier together...
I agree that it might be good to pick one audience; at this stage I am just experimenting to see if I can market to both without messing up and confusing both groups.
I have been experimenting with combinations of Huginn, Camunda, Airflow and others to try and achieve integrated workflow/state/process management.
There exists a gap in the enterprise space between all these tools, and it has been further exacerbated by the disruption COVID introduced to many industries. There is a real opportunity for small and medium businesses to be able to access tooling that goes beyond what a baseline Zap, etc. can accomplish.
There's also a problem with describing your product as "an alternative to". I don't know what Zapier and Airflow are, but now I want to find out, which pushes me towards the competition - rather than just explaining what your product does and having me view the competition as the alternative.
Hi HN peeps! I single-handedly built Titanoboa (a workflow automation platform for the JVM) and now I am experimenting with hosting dockerized Titanoboa instances - so I built a free hosting service. If you want to play with Titanoboa in your browser without installing anything, give it a try.
Also: this is an early beta, so please do let me know if something breaks or you spot a bug. Atm I have load balancers set up only on the West coast and in Europe, so apologies to folks from down under & similar locations - let me know if it's too laggy :)
Wow, this has blown up a bit, so I have added a few more servers (now two on the West coast and two in Europe), but ultimately I think there is a cap of 100 parallel Titanoboa instances in each geography.
So if we go over that limit, please don't be mad if you don't get your instance :)
Instead give me a star on github and come back later :)
OK, I have just reached 100 concurrent containers in Europe, which is the max at this stage, and I don't think I can change the AWS limit on the fly (will check though), so I might start killing the oldest instances.
Apologies for the inconvenience folks, but I am sure people rarely play with an instance for the whole 3 hours at this stage anyway.
Honestly, I did not expect this kind of "public beta" rollout - still surprised how well the service holds up under the load so far.
If anybody's curious about this "public beta hosting - HN bear hug" performance load test:
I just (sequentially) restarted all 4 servers behind my 2 load balancers - it seems that on reaching 100 containers in each geography, the provisioning threads may have died after many (unsuccessful) retries, so I need to restart the thread pools.
That is one more error I need to handle (or I need to increase my limit in AWS).
Heya - just a note that I build these kinds of PaaS platforms for a living - if you need a hand building a super-massive cluster for lots of users, let me know!
Gdoc form for suggestions. One mail to all users (small enticement, reward for the winner - pro account for life or something else really nice). Remove any lewd suggestions. Then pick one you love, or if you can't choose, run a poll.
That is a fair criticism, I am aware of the AGPL license shortcomings.
I just picked the more restrictive license at the beginning - being the sole funder and not working on this full-time etc., I simply did not want somebody (e.g. a big company with a big team) grabbing my code along the way and running away with it.
Since Titanoboa has now gotten to the shape I envisioned, I am starting to focus more on adoption - so yes, I am definitely thinking about switching to a less restrictive license, since it will probably help.
Also, at the beginning I was not aware of how badly AGPL is perceived (I always thought that if it was good for Mongo it could work for me, but I may have been wrong).
>>I just picked the more restrictive license at the beginning - being the sole funder and not working on this full-time etc., I simply did not want somebody (e.g. a big company with a big team) grabbing my code along the way and running away with it.
Sounds like a great reason to use the AGPL! Can always switch the license for later releases as you gain traction.
Why exactly? I myself have been researching the options that exist for writing open source software while at the same time preventing industrial leeches from benefiting from it (aka avoiding Amazon pulling an Elasticsearch move - EDIT: or was it Mongo?).
So far I've stumbled upon: dual-licensing AGPL + commercial for those who don't like the copyleft; something like Mongo's SSPL; MariaDB's BSL; and now, the Commons Clause.
On the surface (I haven't yet studied the intricacies of all the options carefully), all of these look to me like a great way to publish code and contribute to our common knowledge, while at the same time maybe being able to make a living from it - something for which it's important to prevent bad actors from bundling it and profiting from it on your behalf. Otherwise that code wouldn't really exist at all to start with...
I might write an Ask HN, because this topic is complex.
But is it really an alternative to Zapier? I mean, while Zapier is obviously about automation, it always felt like their big selling point was the sheer number of integrations they provide.
My aim is to make this easier to customize - i.e. to develop custom steps on the fly and have an instant feedback loop as you develop new integrations or customize existing ones.
The target audience might ultimately be slightly different, but we'll see - I would envision this being more useful in an enterprise environment, where you pretty much always have to customize the integrations that were provided out of the box.
This looks very interesting. The visual component in particular seems very well done.
I have a deep interest in DAG-structured ETL tooling and had a couple of questions that the documentation didn't seem to address...
1. Can I execute workflows without a server running? Something like...
$ java -jar titanoboa.jar MyWorkFlowName arg1 arg2 ...
...and then my workflow executes, as a program, on my machine, until it's done and then exits? Or does every workflow always execute within the context of a running server?
2. Is there any notion of resuming a partially failed workflow? As a point of comparison, Luigi structures its DAG concept using Tasks which create Targets, and invoking a Task whose Target already exists is a no-op, so if you have a big execution graph that gets 80% finished and then dies, you can easily restart it. I find that many competing tools are missing this concept.
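For reference, the Luigi pattern I mean looks roughly like this (task and file names made up) - a Task whose Target already exists is simply skipped on re-run:

    import luigi

    class Extract(luigi.Task):
        def output(self):
            # If this Target already exists, Luigi considers the Task done.
            return luigi.LocalTarget("data/extract.csv")

        def run(self):
            with self.output().open("w") as f:
                f.write("raw data\n")

    class Transform(luigi.Task):
        def requires(self):
            # Declares the dependency; Luigi runs Extract first if needed.
            return Extract()

        def output(self):
            return luigi.LocalTarget("data/transform.csv")

        def run(self):
            with self.input().open() as src, self.output().open("w") as dst:
                dst.write(src.read().upper())

    # Re-running `luigi --module mymodule Transform` after a crash only
    # executes Tasks whose Targets are missing - finished work is skipped.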
Now that it's Saturday I've had time to play with this and I believe I've found the answers to my questions...
1. Based on the Clojure REPL example in the main README, I think the answer is YES, though not exactly the way I had imagined. It seems what you would need to do is write a top-level script (or java "main") that starts a "system", starts "workers", runs your job using that system, then stops the system and exits. A little clunky compared to how Luigi does it, but usable.
2. Best as I can tell, the answer is NO. Neither the documented API nor the implementation of the API in src/clj/titanoboa/handler.clj contains any hint of an ability to operate on a job id, beyond retrieving the result of its execution.
Additional commentary:
1. Resuming failed jobs
As implied by my question above, the ability to resume a failed job is essential. One of the major reasons to adopt DAG-structured code is parallel execution, and Titanoboa has that. But the OTHER major reason is to allow partially-failed computation to retry/resume without repeating already-completed work. In particular in the ETL space, we often have job graphs composed of hundreds of nodes, with total runtimes measured in hours. If my 100-node job graph fails due to an error in node #78, preventing an additional 15 downstream nodes from running, I don't want to run all 100 nodes again after I fix the problem. I want to resume executing my graph at #78, and expect only the 16 total affected nodes to execute, since everything else ran correctly the first time (and presumably persisted their outputs). Luigi gets this one right. Airflow sorta tries, but it's clunky and you can tell it's not a priority.
2. Flow/Dependency direction
When designing a workflow, either in the GUI or as EDN, you tell Titanoboa what jobs are "next". This is intuitive because it comports with our notion of execution flow through a graph of jobs, but it gets things backwards. That is, when we write A->B->C, we are thinking that A will execute, and then B, and then C (perhaps results will be passed from step to step). It is often better though to describe this as A<-B<-C, which reads as C depends on B, B depends on A, and A depends on nothing. Structuring our thinking in this way focuses the mind on what inputs a node requires in order to perform its effect or compute its output, rather than on what operations should follow it in time. Luigi and Airflow both get this one right.
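A contrived sketch of the two framings (the structure here is hypothetical, not any tool's actual schema):

    # Forward ("next") style: each node names what FOLLOWS it in time.
    flow_forward = {
        "A": {"next": ["B"]},
        "B": {"next": ["C"]},
        "C": {"next": []},
    }

    # Backward ("requires") style: each node names what it DEPENDS ON.
    # This is the Luigi framing: to run C, first satisfy B, and so on.
    flow_backward = {
        "A": {"requires": []},
        "B": {"requires": ["A"]},
        "C": {"requires": ["B"]},
    }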
3. Properties
The way Titanoboa defines workflow-level "properties", into which job-level properties are merged, and the way properties flow along the path of execution, is very nice. A constant problem with Luigi is how to flow values from one Task to the next without using an excessive number of Parameters. I can't say for sure that Titanoboa's properties construct doesn't have the same problems, without taking the time to actually use it to build a large project, but on the surface it looks good.
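Roughly the behavior I mean, sketched in Python (names made up): workflow-level properties form a base map, each step's returned map is merged in, and the merged result flows to the next step:

    def run_workflow(workflow_properties, steps):
        # Accumulate properties along the execution path: each step sees
        # everything merged so far and may return a map of new values.
        props = dict(workflow_properties)
        for step in steps:
            result = step(props)
            if isinstance(result, dict):
                props.update(result)
        return props

    steps = [
        lambda p: {"user-count": 42},
        lambda p: {"report": f"found {p['user-count']} users"},
    ]
    print(run_workflow({"env": "dev"}, steps))  # second step sees the first's output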
4. Logging
I noticed that when a step's function returns a map, to be integrated into "properties", that return value is not logged. The message in the log is like "Step [my-cool-step-name] finished with result []", which is both unhelpful and not even literally true, as it most certainly did have a result! When a step returns a scalar value, it does get logged. I found this inconsistency frustrating.
Also, the stdout/stderr of each step function apparently goes to /dev/null. I find this odd as the placeholder function when you build a new workflow is (println "Hello World!") but if you actually execute that you'll discover that our classic greeting vanishes into the void. This is a major shortcoming. As a point of comparison, one of the biggest value-adds of using Jenkins as a job scheduler is how it automatically captures the output of anything you run, saves it in a durable log file, AND lets you view it in real time. Job orchestration systems that don't match that level of log-friendliness drive me nuts.
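Capturing per-step output isn't conceptually hard - in Python terms, something like this (a sketch of the technique, not Titanoboa's internals):

    import contextlib, io

    def run_step_capturing_output(step_fn, *args):
        # Redirect whatever the step prints into a buffer so it can be
        # written to a durable log instead of vanishing into the void.
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            result = step_fn(*args)
        return result, buf.getvalue()

    result, out = run_step_capturing_output(print, "Hello World!")
    # `out` would then go to the job's log file / live log view.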
5. Versioning
The built-in versioning system is great. Two thumbs up. I don't know how it would work if I were writing my jobs in proper Clojure or Java code in their own repo, but I kinda don't care because the value of storing and versioning what I do in the UI is so great.
6. UI -> data
I love the way the interactive UI is just there to generate EDN. In a way this mirrors how Jenkins' UI builds its job XML files, but you have to go hunting for those and they're hard to read (because XML). Being able to see what EDN is generated by your actions in the UI, _right there in the UI_, is fantastic.
7. UI issues
The UI is great, but it has quite a bit of low-hanging fruit - improvements that could easily be made:
- the run job popup forces you to choose a system every time, even if there is only one
- being able to draw arrows in the visualization is cool, but I could not figure out how to delete them there. Needs work.
- the UI doesn't lay out well on small screens (I'm on an old 13" Air), I had to zoom to 80% just to be able to see the X to delete a property. It would help if the Workflows panel on the left (the least important UI element by far!) could be collapsed (edit: it can be collapsed, but the collapse button is on the other side of the screen which makes no sense)
- the box that pops up after starting a job has nothing clickable in it. I have to close it and go to the jobs tab
- the jobs tab doesn't refresh when it loads, even if I just started a job, which needlessly adds clicks to the main workflow
- the jobs tab has an "archived" sub-tab but no apparent way to actually move a job to the archive
Overall, there's a lot of promise here, and it's amazing to me that you built this by yourself. Still, it has a long way to go. I recommend spending some time with Luigi, which I still think is the best general way of DAG-structuring real world workflow code, and with Jenkins which remains far and away the best UI-driven job orchestration system. It seems you're already familiar with Airflow, but I would recommend you treat it mainly as an example of what not to do.
Really appreciate your time looking into this, and apologies - I missed your original post. You got the answers right. Re 2 - it shouldn't be that hard to add, since everything is just data. It is, however, partially covered by the retry property on each step.
Will read through your additional comments and will respond tomorrow (busy day, plus it's midnight here in Europe)!
Cheers
Miro
Prefect takes like an hour just to put up the Kubernetes environment required for it.
The way you have it, you can just do a simple docker run and start coding - that's the comparison. And you're coding on the web, no need to set up VSCode or anything; it's fast.
This looks really well done, newcrobuzon. Very impressed.
I especially love your web UI. It makes it very easy to start experimenting with workflows without the overhead of having to set up a local development environment.
I am curious how you store secrets (e.g. AWS access key id, secret access key). There isn't a login wall, and it's not clear that the values will be protected from you, so I am loath to put my credentials on there.
Yes, I would still be careful with passwords in the public beta.
It goes via SSL and you get a unique UUID URL, so all in all it is kinda secure (plus the time-to-live of an instance is only 3 hours, which limits the time to hack it), but still, this is not 100% secure and I would not recommend it for any kind of production use (including any use of passwords you don't want exposed).
The free instances are not (yet) password-protected (other than the UUID) - this, on the other hand, is useful if you want to share the environment with somebody: just send them the link...
This is just the beginning of the public hosting, so I will need to think through further improvements to it, security-wise and in other aspects as well.
My main objective at this stage for the free hosted instances was to give people a way to play with Titanoboa or quickly test something, especially if something is not working in their local environment (say, dependencies) and they want to quickly re-test in a vanilla environment.
Being a JVM-friendly tool, it would be great to unlock other JVM languages to allow scripting in any of them: Jython for Python, Quercus for PHP, JRuby for Ruby, etc.
Yes, I am considering this, and it should not be that hard (also Kotlin, Nashorn JavaScript...), but at this stage Titanoboa is just a one-man show - so patience, I might get there ;)
FWIW, I would have to check all the ones listed, but if they're sane scripting languages on the JVM then they'll implement the JSR-223 interface, allowing one to pick them at runtime without a huge amount of drama: https://jcp.org/aboutJava/communityprocess/final/jsr223/inde...
newcrobuzon, you're a genius my friend - someone should give you an award. I loved your idea and how you approached it; it's fantastic for developers. I'm a root Java dev, and I have to say you did fantastic work to make developers' lives easier. You didn't take any existing approach, you made your own - this is awesome.
If you get JavaScript working, I'll feel like I'm back in the AppJet days! This is fantastic stuff.
If you show me how I can help get the JavaScript stuff going, I would be glad to help you with this initiative, since I have adopted NodeJS recently. Like you said, being a one-man army you have to be efficient, and JS makes me more efficient.
Honestly, I really like this and I can't wait to use it, but my concerns are around certain features being behind a paywall, especially HA and clustering in the self-hosted version.
I understand that the cloud-hosted version should cost more for these features, but without a way to get pricing for the self-hosted HA/clustered version, it will be a tough sell for me to deploy it and use it, only to have to scrap it in the future.
I liked the video demo - really nice - and I would say this should be called not low-code or no-code, but rather aPaaS: Application Platform as a Service.
So the title should be "Host your own aPaaS" instead.
Well yes, this is for the JVM, so the Maven ecosystem - sorry.
I have been contemplating also doing a clone on Node.js (Titanoboa is written in Clojure, so it would just be a move from Clojure to ClojureScript), but I did hear that the npm ecosystem is a bit uninviting too...