Hi! I've browsed the docs quickly, and I have a few questions.
It seems to assume that all CI/CD workflows follow a single-container-at-a-time pattern. What about testing where I need to spin up an associated database container for my e2e tests? Is that possible, and just omitted from the documentation?
I'm not familiar with CUE, but can I import/define a common action that is used across multiple jobs? For example, on GitHub I have to duplicate the dependency installation/caching/build steps across various jobs. (Yes, I'm aware you can now cobble together a composite action on GitHub for reuse.)
Can you do conditional execution of actions based on a passed-in input value or environment variable?
Any public roadmap of upcoming features?
> It seems to assume that all CI/CD workflows follow a single-container-at-a-time pattern.
Dagger runs your workflows as a DAG, where each node is an action running in its own container. The dependency graph is detected automatically, and all containers that can be parallelized (based on their dependencies) will be parallelized. If you specify 10 actions to run, and they don't depend on each other, they will all run in parallel.
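Roughly, a minimal sketch assuming the 0.2-era universe.dagger.io packages (the actions themselves are placeholders):

    package main

    import (
        "dagger.io/dagger"
        "universe.dagger.io/docker"
    )

    dagger.#Plan & {
        actions: {
            // Both actions below depend on this pulled image...
            _base: docker.#Pull & {
                source: "alpine:3.15"
            }

            // ...but not on each other, so they can run in parallel.
            lint: docker.#Run & {
                input: _base.output
                command: {name: "echo", args: ["linting..."]}
            }
            test: docker.#Run & {
                input: _base.output
                command: {name: "echo", args: ["testing..."]}
            }
        }
    }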
> What about testing where I need to spin up an associated database container for my e2e tests? Is that possible, and just omitted from the documentation?
It is possible, but not yet convenient (you need to connect to an external Docker engine, via a docker CLI wrapped in a container). We are working on a more pleasant API that will support long-running containers (like your test DB) and more advanced synchronization primitives (wait for an action, terminate it, etc.)
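The current workaround looks roughly like this (a rough sketch only; treat the exact client/network and docker/cli field names as approximations):

    package main

    import (
        "dagger.io/dagger"
        "universe.dagger.io/docker/cli"
    )

    dagger.#Plan & {
        // Expose the host's Docker socket to the plan.
        client: network: "unix:///var/run/docker.sock": connect: dagger.#Socket

        actions: {
            // Run the docker CLI in a container, pointed at the external
            // engine, to start a long-running database for e2e tests.
            startTestDB: cli.#Run & {
                host: client.network."unix:///var/run/docker.sock".connect
                command: {
                    name: "docker"
                    args: ["run", "-d", "--name", "testdb", "-p", "5432:5432", "postgres:14"]
                }
            }
        }
    }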
> For example, on GitHub I have to duplicate the dependency installation/caching/build steps across various jobs. (Yes, I'm aware you can now cobble together a composite action on GitHub for reuse.)
Yes, code reuse across projects is where Dagger really shines, thanks to CUE and the portable nature of the Buildkit API.
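Concretely, you define the shared action once in a CUE package and import it from every project's plan. A minimal sketch (the example.com/ci module path and #NodeBuild name are made up):

    // In a shared package, e.g. example.com/ci:
    package ci

    import (
        "dagger.io/dagger"
        "universe.dagger.io/docker"
    )

    // One reusable install/build recipe instead of copy-pasted job steps.
    #NodeBuild: {
        // Each project supplies its own source tree.
        src: dagger.#FS

        build: docker.#Build & {
            steps: [
                docker.#Pull & {source: "node:16-alpine"},
                docker.#Copy & {contents: src, dest: "/app"},
                docker.#Run & {
                    workdir: "/app"
                    command: {name: "sh", args: ["-c", "npm ci && npm run build"]}
                },
            ]
        }
    }

A project's plan then imports that package and writes something like build: ci.#NodeBuild & {src: ...} instead of restating the steps.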
Note: you won't need to configure caching though, because Dagger automatically caches all actions out of the box :)
> Can you do conditional execution of actions based on a passed-in input value or environment variable?
Yes, that is supported.
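Because plans are plain CUE, you get this with ordinary if guards on fields. A minimal sketch (RUN_LINT and the lint action are made up for illustration):

    package main

    import (
        "dagger.io/dagger"
        "universe.dagger.io/docker"
    )

    dagger.#Plan & {
        // Read a flag from the host environment (defaults to "").
        client: env: RUN_LINT: string | *""

        actions: {
            _base: docker.#Pull & {source: "alpine:3.15"}

            // The action only exists in the DAG when RUN_LINT=true.
            if client.env.RUN_LINT == "true" {
                lint: docker.#Run & {
                    input: _base.output
                    command: {name: "echo", args: ["linting..."]}
                }
            }
        }
    }

An unset or false flag simply means the lint field never materializes in the DAG, so there is nothing to skip at runtime.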
> Any public roadmap of upcoming features?
For now we rely on raw GitHub issues, with some labels for crude prioritization. But we've started using the new GitHub Projects beta (which is a layer over issues), and plan to open that to the community as well.
Generally, we develop Dagger in the open. Even as a team, we use public Discord channels (text and voice) by default, unless there is a specific reason not to (confidential information, etc.)
Thank you for the detailed response. I appreciate you taking the time. One last question/note.
> Note: you won't need to configure caching though, because Dagger automatically caches all actions out of the box :)
Is this strictly because it's using Docker underneath and layers can be reused? If so, unless those intermediary layers are somehow pushed/pulled by the Dagger GitHub Action (or the equivalent for any associated CI/CD tool), the experience on a hosted CI server is going to be slow.
Sidenote: around 2013 I worked on a hacky custom container automation workflow in Jenkins for ~100 projects, and spent considerable effort setting up policies to prune intermediary images.
Thus, on certain types of workflows without any pruning, a local development machine can be polluted with hundreds of images, unless the user is specifically made aware of the stale ones. Does/will Dagger keep track of the images it builds? I think a command like git gc could make sense.
> > Note: you won't need to configure caching though, because Dagger automatically caches all actions out of the box :)
> Is this strictly because it's using Docker underneath and layers can be reused?
Not exactly: we use Buildkit under the hood, not Docker. When you run a Dagger action, it is compiled to a DAG and run by Buildkit. Each node in the DAG has content-addressed inputs. If the same node has already been executed with the same inputs, Buildkit reuses the cached result. This is the same mechanism that powers caching in "docker build", but generalized to any operation.
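To make that concrete, a sketch in the same style as a normal plan (field names per the 0.2-era docs, so treat them as approximate):

    package main

    import (
        "dagger.io/dagger"
        "universe.dagger.io/docker"
    )

    dagger.#Plan & {
        client: filesystem: "./src": read: contents: dagger.#FS

        actions: {
            _base: docker.#Pull & {source: "golang:1.18-alpine"}

            // This node's cache key is derived from its inputs: the base
            // image and the contents of ./src. Re-running with identical
            // inputs is a cache hit; changing any input (a source file,
            // the image, the command) yields a new key and a re-run.
            build: docker.#Run & {
                input: _base.output
                mounts: source: {
                    dest:     "/src"
                    contents: client.filesystem."./src".read.contents
                }
                workdir: "/src"
                command: {name: "go", args: ["build", "./..."]}
            }
        }
    }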
The Buildkit cache does need to be persisted between runs for this to work. It supports a variety of storage backends, including a POSIX filesystem, a Docker registry, or even proprietary key-value services like the GitHub storage API. If Buildkit supports it, Dagger supports it.
Don't let the "Docker registry" option confuse you: Buildkit cache data isn't the same as Docker images, so it doesn't carry the same garbage collection and tag pruning problems.
> Don't let the "Docker registry" option confuse you: Buildkit cache data isn't the same as Docker images, so it doesn't carry the same garbage collection and tag pruning problems.
IIRC, doesn't Buildkit store its cache data as fake layer blobs plus a manifest?
I don't see how it can avoid the garbage collection and tag pruning problems, since those are limitations of the registry implementation itself.
You still need to manage the size of your cache, since in theory it can grow infinitely. But it’s a different problem than managing regular Docker images, because there are no named references to worry about: just blobs that may or may not be reused in the future. The penalty for removing the “wrong” blob is a possible cache miss, not a broken image.
Dagger currently doesn’t help you remove blobs from your cache, but if/when it does, it will work the same way regardless of where the blobs are stored (except for the blob storage driver).
> In computing, memoization or memoisation is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again.
Re: SBOM (Software Bill of Materials), OSV (CloudFuzz), CycloneDX, LinkedData, ld-proofs, sigstore, and software supply chain security:
"Podman can transfer container images without a registry"
https://news.ycombinator.com/item?id=30681387
Can Dagger cache the (layer/task-merged) SBOM for all of the {CodeMeta, SEON OWL} schema.org/Thing s?