
Vito is at dagger.io now, so hopefully we can expect some good stuff in the CI space there.



Sadly, Dagger doesn't get it either. It's so focused on portability across underlying infrastructure providers - on not being the infrastructure provider - that it doesn't solve the real problem, which is the underlying infrastructure provider itself.

(a) Consider Dagger's integration with GitHub Actions: https://docs.dagger.io/cookbook#github-actions where you still need to run setup-node, npm ci, etc. just to start the Dagger pipeline. So Dagger isn't saving you from having to deal with GitHub Actions' caching layer and always-blank starting point - it's unavoidable. Well, if I can't avoid it, why should I use Dagger in the first place - why not embrace it?

(b) Consider a use case where I want to parallelize the computation onto a number of machines chosen dynamically at run-time. Maybe I want to allow a test suite to run on an increasing number of machines without having to periodically bump the machine count in a configuration file by hand, or maybe I'm using Terraform workspaces and want to run terraform apply for each workspace on a different VM so the number of workspaces can scale horizontally. This is fundamentally impossible with something like Dagger (also impossible in GitHub Actions) because it would require Dagger to communicate with the infrastructure provider to tell it to scale up compute to handle the parallel jobs, and then scale down once those jobs finish.

This was achievable with Concourse by having Concourse pipelines generate other Concourse pipelines, and by running the underlying Concourse workers as an autoscaling Kubernetes StatefulSet/Deployment, combined with other Kubernetes machinery like the cluster autoscaler.


Hi! Dagger co-founder here. I thought I’d share a few clarifying points - and acknowledge that we should explain certain aspects of Dagger’s design better, to avoid having to clarify in the first place!

> you still need to run setup-node, npm ci, etc. just to start the Dagger pipeline

You do need to write CI configuration to run a Dagger pipeline, but it's a very small and standardized snippet ("install dagger, then run this command") and it typically replaces a much larger custom mess of yaml and shell scripts.

The main benefit though is that your pipeline logic is decoupled from CI altogether. You can run the same Dagger pipeline post-push (in CI) but also pre-push. Similar to running a makefile or shell script, except it’s real code.
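To make that concrete, a minimal pipeline written against the Node SDK looks roughly like this (a sketch from memory, so treat the exact package name and method names as approximate):

    // ci.mjs - the same file runs pre-push on a laptop and post-push in CI,
    // e.g. via "dagger run node ci.mjs"
    import { connect } from "@dagger.io/dagger"

    connect(async (client) => {
      // mount the repository into a Node container and run the test suite there
      const source = client.host().directory(".", { exclude: ["node_modules"] })
      const output = await client
        .container()
        .from("node:20-slim")
        .withDirectory("/src", source)
        .withWorkdir("/src")
        .withExec(["npm", "ci"])
        .withExec(["npm", "test"])
        .stdout()
      console.log(output)
    })

The CI configuration shrinks to installing the dagger CLI and invoking that one command.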

> Dagger isn't saving you from having to deal with GitHub Actions' caching layer and always-blank starting point

Dagger most definitely does do that :) We use Dagger and GitHub Actions ourselves, and have completely stopped using GHA's caching system. Why bother, when Dagger caches everything automatically?
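As a hedged illustration of what that looks like in practice (the cache volume name and paths are mine, not from the docs): a pipeline can mount a Dagger cache volume wherever a tool keeps its cache, and that volume persists across runs on the same engine, on top of the engine's own per-operation caching.

    // npm's cache directory lives in a cache volume that survives across runs,
    // so repeated "npm ci" invocations stay fast with zero actions/cache config
    const npmCache = client.cacheVolume("npm-cache")

    const tests = client
      .container()
      .from("node:20-slim")
      .withDirectory("/src", client.host().directory("."))
      .withWorkdir("/src")
      .withMountedCache("/root/.npm", npmCache)
      .withExec(["npm", "ci"])
      .withExec(["npm", "test"])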

> Well, if I can't avoid it, why should I use Dagger in the first place - why not embrace it?

I think that's your Stockholm syndrome talking. The terrible experience that is CI - the "push and pray" development loop; the drift between post-push yaml and pre-push scripts; the primitive caching system; the lack of a good composition system; the total impossibility of testing your pipelines - that pain is avoidable, and you shouldn't have to embrace it. You deserve better!

> This is fundamentally impossible with something like Dagger (also impossible in GitHub Actions) because it would require Dagger to communicate with the infrastructure provider to tell it to scale up compute to handle the parallel jobs, and then scale down once those jobs finish.

This is possible with Dagger. It certainly shouldn't be a core feature of the engine, but the beauty of a programmable system is that you can build infinite capabilities on top of it. You do need a truly programmable system though, which GitHub Actions is not.

> This was achievable with Concourse by having Concourse pipelines generate other Concourse pipelines

Dagger pipelines can dynamically run new pipelines, at arbitrary depth. In other words, nodes in the DAG can add more nodes at runtime.
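A rough sketch of what that enables (the directory layout and commands are illustrative only): the pipeline discovers work at runtime, fans out one container per item, and the engine runs them concurrently.

    // discover the work at runtime...
    const suites = await client.host().directory("tests/suites").entries()

    // ...then add one node to the DAG per suite and let them run in parallel
    await Promise.all(
      suites.map((suite) =>
        client
          .container()
          .from("node:20-slim")
          .withDirectory("/src", client.host().directory("."))
          .withWorkdir("/src")
          .withExec(["npm", "test", "--", `tests/suites/${suite}`])
          .sync()
      )
    )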

Give Vito some credit: he is an incredibly talented engineer who built a product that you love. Maybe he saw in Dagger the potential to build something that you will love too :) He blogged about his thought process here: https://dev.to/vito/why-i-joined-dagger-43gb

I will concede that Dagger's clustering capabilities are not great yet. Which is why we piggyback on CI infrastructure for that part… for now!


> Why bother, when Dagger caches everything automatically?

The fear with needing to run `npm ci` (or better, `pnpm install`) before running Dagger is about how long that step takes. Sure, in the early days, trying out toy examples where the only dependency is Dagger itself, it takes very little time at all. But what happens when I start pulling more and more dependencies from the Node ecosystem to build the Dagger pipeline? Your documentation includes examples like pulling in `@google-cloud/run` as a dependency: https://docs.dagger.io/620941/github-google-cloud#step-3-cre... and similar for Azure: https://docs.dagger.io/620301/azure-pipelines-container-inst... . The more dependencies brought in, the longer `npm ci` is going to take on GitHub Actions. And it's pretty predictable that, in a complicated pipeline, the list of dependencies is going to get pretty big - at least one dependency per infrastructure provider we use, plus inevitably all the random Node dependencies that work their way into any Node project, like eslint, dotenv, prettier, testing dependencies... I think I have a reasonable fear that `npm ci` just for the Dagger pipeline will hit multiple minutes, and then developers who expect linting and similar short-run jobs to finish within 30 seconds are going to wonder why they're dealing with this overhead.

It's worth noting that one of Concourse's problems was that, even with webhooks set up for GitHub to notify Concourse to begin a build, Concourse's design required it to discard the contents of the webhook and query the GitHub API for the same information (whether there were new commits) before starting a pipeline and cloning the repository (see: https://github.com/concourse/concourse/issues/2240 ). And that was in a CI/CD system where, for all YAML's faults, one of its strengths is certainly that it doesn't require running `npm ci`, with all its associated slowness. So please take it on faith that, if even a relatively small source of latency like that was felt in Concourse, the latency from running `npm ci` will certainly be felt, and Dagger's users (DevOps) will be put in the uncomfortable position of having to defend the choice of Dagger to their users (developers), who go home and build a toy example on AlternateCI that runs what they need much faster.

> I will concede that Dagger’s clustering capabilities are not great yet

Herein lies my argument. It's not that I'm unconvinced that building pipelines in a general-purpose programming language is a better approach than YAML; it's that building pipelines is tightly coupled to the infrastructure that runs them. One aspect of that is scaling up compute to meet the requirements dictated by the pipeline. But another aspect is that `npm ci` should be run not before submitting the pipeline code to Dagger, but after. Dagger should be responsible for running `npm ci`, just like Concourse was responsible for doing all the interpolation of the `((var))` syntax (i.e. you didn't need to run some kind of templating before submitting the YAML to Concourse). If Dagger is responsible for running `npm ci` (really, `pnpm install`), then it can maintain its own local pnpm store / pipeline dependency cache, which would be much faster, and overcome any shortcomings in the caching system of GitHub Actions or whatever else is triggering it.


> I think I have a reasonable fear that `npm ci` just for the Dagger pipeline will hit multiple minutes, and then developers who expect linting and similar short-run jobs to finish within 30 seconds are going to wonder why they're dealing with this overhead.

I was going to reply that you misunderstand how Dagger works: that your pipeline logic typically requires very few dependencies, if any, since it can already have Dagger build, download, and execute any container it needs - with free caching. So your "npm ci", although technically not free, is a drop in the bucket and easily offset by the 2x or even 5x speedup that is typical when daggerizing an existing pipeline.

All the above is true… But I realize that the documentation you link to contradicts it. Reading these guides, I understand your fear. Just know that these guides are, in that respect, giving you the wrong idea. In a more representative example, instead of importing a GCP client library, the script would have Dagger run a GCP client in a container, with the appropriate input flags, config files and, if needed, artifacts. We will fix those guides accordingly, thank you for bringing this to my attention.
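For instance, instead of `npm install @google-cloud/run`, a guide could drive the gcloud CLI from a container along these lines (the image tag, secret handling and deploy flags here are illustrative, not lifted from the docs):

    // pass the service-account key as a secret rather than plain text
    const saKey = client.setSecret("gcp-sa-key", process.env.GCP_SA_KEY)

    await client
      .container()
      .from("google/cloud-sdk:slim")
      .withMountedSecret("/tmp/key.json", saKey)
      .withExec(["gcloud", "auth", "activate-service-account", "--key-file=/tmp/key.json"])
      .withExec([
        "gcloud", "run", "deploy", "my-service",
        "--image=us-docker.pkg.dev/my-project/my-repo/app:latest",
        "--region=us-central1",
      ])
      .sync()

The Node project then needs no cloud SDK dependencies at all; the cloud-sdk image is pulled once and cached by the engine.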

Also, soon Dagger will do exactly what you were expecting: it will execute the "npm ci" itself, in a container. So whether that script's overhead is 3 seconds in the typical scenario I described to you, or a few minutes in the scary scenario you described to me: either way it will get cached, and either way it will have no host dependency on npm or any other language tooling - just the Dagger CLI.


> We will fix those guides accordingly, thank you for bringing this to my attention.

Great :D

I read a lot more of the documentation this morning, and in general I really like what I see, but the clustering bit is the key missing piece for me at the moment. My current employer produces an Electron desktop application - we need to run 90% of the tests on Linux (most cost-effective), and 5% each on a Windows and a macOS machine. I see the Dagger CLI runs on both macOS and Windows, but the architecture as it stands today expects the Dagger pipeline to run all its workloads on the same VM. If I want to run pipeline tasks on other VMs, I can do it from Dagger, treating the Dagger Engine as the orchestration layer that makes some kind of IaaS or PaaS call to schedule the workload on a separate macOS or Windows VM, outside the Dagger architecture. But... today I expect that workload scheduling to happen within the CI/CD architecture (it's trivially expressible within GitHub Actions, particularly as GitHub Actions hosts both Windows and macOS runners), and needing to hoist it outside the CI/CD architecture in order to use Dagger is a needless complication.



