GitHub Actions could be so much better (yossarian.net)
412 points by woodruffw on Sept 22, 2023 | 230 comments



There are two types of github actions workflows you can build.

1) Program with github actions. Google "how can I send an email with github actions?" and then plug in some marketplace tool to do it. Your workflows grow to 500-1000 lines and start having all sorts of nonsense like conditionals, the YAML becomes disgusting and hard to understand, Github actions becomes a nightmare, and you've invited vendor lock-in.

2) Configure with github actions. Always ask yourself "can I push this YAML complexity into a script?" and do it if you can. Send an email? Yes, that can go in a script. Your workflow ends up being about 50-60 lines as a result and very rarely needs to be changed once you've set it up. Github actions is suddenly fine and you rarely have to do that stupid push-debug-commit loop because you can debug the script locally.
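
Concretely, the whole workflow ends up looking something like this (just a sketch; the script names are made up, the point is that all the logic lives in the scripts):

    name: ci
    on: [push, pull_request]

    jobs:
      build:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          # all the real logic lives in the repo and runs the same way locally
          - run: ./ci/build.sh
          - run: ./ci/test.sh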

Every time I join a new team I tell them that 1 is the way to madness and 2 is the sensible approach and they always tepidly agree with me and yet about half of the time they still do 1.

The thing is, the lack of debugging tools provided by Microsoft is also really not much of a problem if you do 2, vendor lock-in is lower if you do 2, and debugging is easier if you do 2, but still nobody does 2.


This is a great perspective, and one I agree with -- many of the woes associated with GitHub Actions can be eliminated by treating it just as a task substrate, and not trying to program in YAML.

At the same time, I've found that it often isn't sufficient to push everything into a proper programming language: I do sometimes (even frequently) need to use vendor-specific functionality in GHA, mark dependencies between jobs, invoke REST APIs that are already well abstracted as actions, etc. Re-implementing those things in a programming language of my choice is possible, but doesn't break the vendor dependency and is (IME) still brittle.

Essentially: the vendor lock-in value proposition for GHA is very, very strong. Convincing people that they should take option (2) means making a stronger value proposition, which is pretty hard!


No, you're right, it's not necessarily a good idea to be anal about this rule. E.g. if an action is simple to use and already built I use it - I won't necessarily try to reimplement, say, the upload-artifacts step in code.

Another thing I noticed is that if you do 1, sophisticated features like build caching and parallelization often become completely impractical, whereas if you default to 2 you can probably do it with only a moderate amount of commit-push-debug.


I use YAML and GH Actions to prepare the environment, define jobs and their dependencies, and for git operations; everything else goes into scripts.


Option 2 also makes it easier for developers to run their builds locally, so you're essentially using the same build chain for local debugging as you do for your Test/Staging/Prod environments, instead of maintaining two different build processes.

It's not just true for GHA, but for any build server really: The build server should be a script runner that adds history, artifact management, and permissions/auditing, but should delegate the actual build process to the repository it's building.


Locally or if for some reason you need to move off of Github and have to use Jenkins or some other CI tool.


Good perspective. Unfortunately (1) is unavoidable when you're trying to automate GH itself (role assignments, tagging, etc.). But at this point, I would rather handle a lot of that manually than deal with GHA's awful debug loop.

FWIW, there's nektos/act[^1], which aims to duplicate GHA behavior locally, but I haven't tried it yet.

[^1]: https://github.com/nektos/act


> Unfortunately (1) is unavoidable when you're trying to automate GH itself (role assignments, tagging, etc.)

Can't you just use the Github API for that? The script would be triggered by the YAML, but all logic is inside the script.
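
Roughly like this (a sketch; the script name is hypothetical, and inside it you can call the REST API however you like, e.g. with the preinstalled gh CLI):

    jobs:
      automate:
        runs-on: ubuntu-latest
        permissions:
          contents: write
        steps:
          - uses: actions/checkout@v4
          # ./ci/tag-release.sh is hypothetical; inside it you can hit the REST API directly,
          # e.g. gh api "repos/$GITHUB_REPOSITORY/git/refs" -f ref=refs/tags/v1.2.3 -f sha="$GITHUB_SHA"
          - run: ./ci/tag-release.sh
            env:
              GH_TOKEN: ${{ github.token }}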

But `act` is cool, I've used it for local debugging. Thing is its output is impossibly verbose, and they don't aim to support everything an action does (which is fine if you stick to (2)).


Yeah, I've done quite a bit of Github scripting via octokit and it's pretty simple. Using GHA's built-in functionality might turn a five line script into a one-liner, but I think being able to run the script directly is well worth the tradeoff.

The main thing that you can't decouple from GHA is pushing and pulling intermediate artifacts, which for some build pipelines is going to be a pretty big chunk of the logic.


How DO you debug your actions? I spend so long in the commit-action-debug-change loop it’s absurd. I agree with your point re: 2 wholeheartedly though, it makes debugging scripts so much easier too. CI should be runnable locally and GitHub actions, while supported with some tooling, still isn’t very easy to work with like that.


Using the same commit-push-debug loop you do. It just isn't painful if I do 2.


My GH Actions debugging usually devolves into `git commit -m "wtfqwehsjsidbfjdi"`


you could always do git commit -m "" --allow-empty


    git commit --amend --no-edit && git push -f


We may be splitting hairs given what this thread is going on about, but I strongly advocate for `--force-with-lease` as a sane default versus `-f` so that one does not blow away unexpectedly newer commits to the branch

The devil's in the details, etc, etc, but I think it's a _much_ more sane default, even for single-user setups/branches because accidents can happen and git DGAF


git commit -m "--allow-empty"


You can even allow empty messages.


There are ways to run GHA locally. I've tried out one or two of the tools. [0]

- [0] https://github.com/nektos/act


I tried Act at one point but couldn't get it to run the whole pipeline correctly, it might have improved since though so I'll try it out again soon


Act works pretty well to debug actions locally. It isn't perfect, but I find it handles about 90% of the write-test-repeat loop and therefore saves my teammates from dozens of tiny test PRs.


> saves my teammates from dozens of tiny test PRs

May have misread this but you know you can push to one branch and then run the action against it? Would reduce PRs if you're doing that to then check the action in master. You have to add a workflow_dispatch to the action: https://docs.github.com/en/actions/using-workflows/manually-...
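
For reference, the trigger addition is tiny (sketch):

    on:
      push:
        branches: [main]
      workflow_dispatch:    # adds a "Run workflow" button; you pick which branch to run it against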


Yeah most of the time that is a good way to test. There are some specific actions that aren't easily tested outside of the regular spot though. Mostly deployment related pieces due to the way our infrastructure is set up.


And if you're working on workflows that need to be in PRs, you can make a PR from your fork _to_ your fork.


I too wish I could find a nicer way than this to debug.


The main reason I aim for (2) is that I want to be able to drive my build locally if and when GitHub is down, and I want to be able to migrate away easily if I ever need to.

I think of it like this:

I write scripts (as portable as possible) to be able to build/test/sign/deploy/etc. They should always work locally.

GitHub is there to automate setting up the environments where I can run those scripts, and then actually running them.


Totally get what you're saying. I once switched our workflow to trigger on PRs to make testing easier. Now, I'm all about using scripts — they're just simpler to test and fix.

I recommend making these scripts cross-platform for flexibility. Use matrix: and env: to handle it. Go for Perl, JavaScript, or Python over OS shells and put file tasks in scripts to dodge path issues.
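
Something like this is what I mean (a sketch; the script name is made up):

    jobs:
      build:
        strategy:
          matrix:
            os: [ubuntu-latest, windows-latest, macos-latest]
        runs-on: ${{ matrix.os }}
        env:
          BUILD_DIR: build            # keep paths in env/scripts rather than scattered across steps
        steps:
          - uses: actions/checkout@v4
          - run: python ci/build.py   # hypothetical cross-platform entry point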

I've tried boxing these scripts into steps, but unless they're super generic for everyone, it doesn't seem worth it.


> still nobody does 2.

They don't seem to grasp how bad their setup is, and consequently are willing to put up with awful programming conditions. Even punch cards were better, as those people had the advantage of working with a real programming language with defined behaviour. "when exactly is this string interpolation step executed? in the anchor or when referenced? (well, it depends)". No, it's black-box tinkering (you might as well be prompt engineering)

the C in IaC is supposed to stand for code. Well, if you're supposed to code something you need to

   - be able to assert correctness before you commit, 
   - be able to step through the code
If the setup they give you doesn't even have these minimal requirements you're going to be in trouble regardless of how brilliant an engineer you are.

(sorry for the rant)


I agree overall, but you oversimplify the issue a bit.

> can I push this YAML complexity into a script?

- what language is the script written in?

- will developers use the same language for all those scripts?

- does it need dependencies?

- where are we going to host scripts used by multiple github actions?

- if we ended up putting those scripts in repositories, how do we update the actions once we release new version of the scripts?

- how do you track those versions?

- how much does it cost to write a separate script and maintain it versus locking us in with an external github action?

These are just the first questions that pop into my mind, but there are more. And some answers may not be that difficult, yet it's still something to think about.

And I agree with the core idea (move logic outside pipeline configuration), but I can understand the tepid reaction you may get. It's not free and you compromise on some things.


I think they framed it accurately and you are instead over complicating. Language for scripts is a decision that virtually every team ends up making regardless. The other questions are basically all irrelevant since the scripts and actions are both stored in repo, and therefore released together and versioned together.

I think the point about maintenance cost is valid, but the thesis of the comment that you are responding to is that the prebuilt actions are a complexity trap.


I think you are still envisioning a fundamentally incorrect approach. Build scripts for a project are part of that project, not some external thing. The scripts are stored in the repository, and pulled from the branch being built. Dependencies for your build scripts aren't any different from any other build-time dependencies for your project.


This is a whole lot of overthinking for something like

    #!/usr/bin/env bash
    set -ex

    aws ses send-email ...


Default to bash. If the task is too complex for bash, then use python or node. Most of these scripts aren't going to change very often once stable.


Default to babashka.


If build scripts or configuration is shared it might be one of the only times a git submodule is actually useful.


I've reached the same conclusion with Jenkins. It also helps if you ever have to port between CI systems.

A CI "special" language is almost by definition something that can't be run locally, which is really inconvenient for debugging.


I have a few open source projects that have lasted for 10+ years, and I can’t agree more with approach #2.

Ideally you want your scripting to handle all the weird gotchas of different versions of host OSes, etc. Granted, my work is cross-platform, so it is compounded.

So far I’ve found relying on extensive custom tooling has allowed me to handle transitions from local, to Travis, to AppVeyor, to CircleCI and now also GitHub Actions.

You really want your CI config to specify the host platform and possibly set some env vars. Then it should invoke a single CI wrapper script. Ideally this can also be run locally.


There’s a curve. Stringy, declarative DSLs have high utility when used in linear, unconditional, stateless programming contexts.

Adding state? Adding conditionals? Adding (more than a couple) procedure calls?

These concepts perform poorly without common programming tools: testing (via compilation or development runtime), static analysis, intellisense, etc etc

Imagine the curve:

X axis is (vaguely) LinesOfYaml (lines of DSL, really). Y axis is tool selection: the positive region of the axis is "use a DSL", the lower region is "use a GeneralPurposeProgrammingLanguage".

The line starts at the origin, has a SMALL positive bump, then plummets downwards near vertically.

Gets it right? Tools like ocurrent (contrasted against GH actions) [1], cdk (contrasted against TF yaml) [2]

Gets it wrong? Well, see parent post. This made me so crazy at work (where seemingly everyone has been drinking the yaml dsl koolaide) that i built a local product simulator and yaml generator for their systems because “coding” against the product was so untenable.

[1] https://github.com/ocurrent/ocurrent/blob/master/doc/example... [2] https://docs.aws.amazon.com/cdk/v2/guide/getting_started.htm...


Your advice is sane and I can tell speaks from experience. Unfortunately, now that Github Actions are being exposed through Visual Studio, I fear that we are going to see an explosion of number 1, just because the process is going to be more disconnected from Github itself (no documentation or Github UI visible while working within Visual Studio).


Option 1 is required if you want to have steps on different runners, add approval processes, etc.

I always opt for option 2 where possible though.


I try to do (2), but I still run into annoyances. Like I'll write a script to do some part of my release process. But then I start a new project, and realize I need that script, so I copy it into the new repo. Then I fix a bug in that script, or add some new functionality, and I need to go and update the script in the other repo too.

Maybe this means I should encapsulate this into an action, and check it in somewhere else. But I don't really feel like that; an action is a lot of overhead for a 20-line bash script. Not to mention that erases the lack of lock-in that the script alone gives me.

I guess I could check the script into a separate utility repo, and pull it into my other repos via git submodules? That's probably the least-bad solution. I'd still have to update the submodule refs when I make changes, but that's better than copy-pasting the scripts everywhere.
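
A sketch of what that looks like on the consuming side, assuming a hypothetical shared ci-scripts repo added with `git submodule add`:

    steps:
      - uses: actions/checkout@v4
        with:
          submodules: recursive   # pulls in ci-scripts/ at whatever ref the submodule pins
      - run: ./ci-scripts/release.sh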


I agree, but of course all CI vendors build all their documentation and tutorials and 'best practices' 100% on the first option for lock-in and to get you to use more of their ecosystem, like expensive caching and parallel runners. Many github actions and circleci orbs could be replaced by a few lines of shell script.

Independent tutorials unfortunately fall in the same bucket, as they first look at official documentation to try to follow so-called best practices or just to get their things working, and, I would say, also because shell scripts seem more hacky to many people (unfairly).


That's true for all CI services, do as little as possible in yaml, mostly just use it to start your own scripts, for the scripts use something like python or deno to cover Linux, Mac and Windows environments with the same code.


When GitHub actions came out, I felt bad about myself because I had no desire to learn their new programming language of breaking everything down into multiple small GitHub actions.

I think you explained quite well what I couldn't put my finger on last time: Building every simple workflow out of a pile of 3rd party apps creates a lot of unnecessary complexity.

Since then, I have used GitHub actions for a few projects, but mostly stayed away from re-using and combining actions (except for the obvious use cases of "check out this branch").


Github Actions basically only became usable once they started copying features from Gitlab CI. Before that it was an incomprehensible mess.

Compared to Gitlab CI, GH Actions still feels like a toy, unfortunately.


YAML is perfect for simple scenarios. But users produce really complex use cases with it.

Is it possible to write a Python package that, based on a YAML specification, produces a Python API? The user would code in Python and YAML would be the output.

I was working on a YAML syntax for creating UI. I converted it to a Python API and I'm happy. For example, dynamic widgets in YAML were hard; in Python they are straightforward.


Absolutely agreed. Well said and I'll be stealing this explanation going forward. Hell, just local running with simplicity and ability to test is a massive win of #2, aside from just not dealing with complex YAML.


> our workflow ends up being about 50-60 lines as a result and very rarely needs to be changed once you've set up.

As in, use GitHub Actions as a YAML wrapper around bash/zsh/sh scripts?


It can be any scripting language; Python or TypeScript via Deno are good choices because they have batteries-included cross-platform standard libs and are trivial to set up.

Python is actually preinstalled on Github CI runners.


1 is to build utilities for 2, IMO. A utility shouldn't have repository-specific information inside and should be easily usable in other workflows.


Exactly, I showed here how we just write plain shell scripts. It gives you "PHP-like productivity", iterating 50 times a minute. Not one iteration every 5 minutes or 50 minutes.

https://lobste.rs/s/veoan6/github_actions_could_be_so_much_b...

Also, seamlessly interleaving shell and declarative JSON-like data -- without YAML -- is a main point of http://www.oilshell.org, and Hay

Hay Ain't YAML - https://www.oilshell.org/release/0.18.0/doc/hay.html


Github actions calling make commands is my bread and butter.


Turns out the real SaaS is Scripts as a Service.


I appreciate this perspective; however, after spending 6 months on a project that went (2) all the way, never again. CI/CD SHOULD NOT be using the same scripts you build with locally. Now we have a commit that every dev must apply to the makefile to build locally, and if you accidentally push it, CI/CD will blow up (requiring an interactive rebase before every push). However, you can't build locally without that commit.

I won’t go into the details on why it’s this way (build chain madness). It’s stupid and necessary.


This comment is hard to address without understanding the details of your project, but I will at least say that it doesn't mirror my experience.

Generally, I would use the same tools (e.g. ./gradlew build or docker build) to build stuff locally as on CI, and config params are typically enough to distinguish what needs to be different.

My CI scripts still tend to be more complicated than I'd like (due to things like caching, artifacts, code insights, triggers, etc.), but the main build logic at least is extracted.


Agreed. I want my builds reproducible. The CI binaries should be bit-for-bit identical to the locally-built ones.


The git commit, push, wait loop is terrible UX. Users deserve portable pipelines that run anywhere, including their local machines. I understand Act [1] goes some way to solving this headache but it's by and large not a true representation.

There are many pipelines you can't run locally, because they're production, for example, but there's no reason why we can't capture these workflows to run them locally at less-critical stages of development. Garden offers portable pipelines and then adds caching across your entire web of dependencies. Some of our customers see 80% or higher reductions in run times, and with our Garden Workflows devs get immediate feedback on which tests are failing or passing without pushing to git first.

We're OSS. [2]

[1] https://github.com/nektos/act

[2] https://docs.garden.io


If folks just had actions target make or bash scripts instead of turning actions into bash scripts none of this would be an issue. Your CI/CD and your devs should all use the same targets/commands like `make release`.


I'm actually confused and scared on how often this isn't the case? What are people doing in their actions that isn't easily doable locally?


A huge portion of my actions are for things like caching or publishing artifacts, which are unique to actions itself.


I'd assume you would be able to publish and deploy locally before setting up actions. Such that those are likely targets in your build system?

Caching, I can mostly understand as unique there. Though, I think I'm living with whatever the default stuff in actions is. Slow for builds that don't happen often, of course, but not so slow that I care.


Unfortunately, my team has some builds that take ~25 min without caching and maybe 2 min with caching.

I'm still not entirely sure why it's the case, but the connection to the package registry is incredibly slow, so downloading all dependencies takes forever.


I'm fortunate that worrying about 25 minute builds just doesn't matter. The long pole on all builds is still the code review that goes with it, such that I just don't care about getting that time too low here.

That is, I am assuming that a CI build is not on the immediate dev loop, such that the person pushing it doesn't have to wait for the build before they prepare a review on it.


Why should caching in the cloud be any different than caching locally?


There isn’t any locally in GHA after the runner exits


It's the cloud. Runners are ephemeral (pretend, but still) with no persistent storage. This makes you either rebuild everything in every release stage (bad) or put artifacts in s3 or whatever (also bad) - this is especially painful for intermediate artifacts like dependency bundle caches etc.

As much as I like make it just doesn't work with the typical cloud stateless by default configs. If it works for you, your project is small enough and try to keep it this way.


Rebuilding at every stage shouldn't be too bad, with pinned dependencies. I can see problems with it, of course. That said, using a private code publishing location seems the correct path? That isn't too difficult to setup, is it?

That said, I'm still not clear on what difficulties folks are worried about. I'm also not clear I care on the mess of commits getting things working. The initial commits of getting anything working are almost always a mess. Such that worrying about that seems excessive.


> with no persistent storage

There's https://github.com/actions/cache though?


They're saying that unless you use actions, you don't get the cohesive cache and artifacts support. That replicating that in the cloud or locally is a PITA. Thus people are using the GH actions vendor specific tooling in that way.


Just run the GitHub cache action on your build directory and then run make inside it?
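
Roughly (a sketch; the path and cache key are illustrative):

    jobs:
      build:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/cache@v4
            with:
              path: build
              key: build-${{ runner.os }}-${{ hashFiles('Makefile', 'src/**') }}
          - run: make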


All the linting checks and end to end tests I don’t want to bother setting up locally for every repo I touch.


Aren't these just other targets in whatever build system you are using, though?


This is how it should be done. It was trivial to port my company's CI from Jenkins to Gitlab because we did this.

Confusion arises when developers don't realise they are using something in their local environment, though. It could be some build output that is gitignored, or some system interpreter like Python (especially needing a particular version of Python).

Luckily, with something like Gitlab CI it's easy to run stuff locally in the same container as it will be run in CI.


Well… yeah?

My GitHub Actions workflow consists of calls to make lint, make test, make build, etc. Everything is usable locally.

There’s just some specificities when it comes to boot the dependencies (I use a compose file in local and GitHub action services in CI, I have caching in CI, etc.) but all the flows use make.

This is not a technical problem, you’re just doing it wrong if you don’t have a Makefile or equivalent.


Yeah, it seems like we lost a lot of the "CI shouldn't be a snowflake" when we started creating teams that specialize in "DevOps" and "DevOps tools." Once something becomes a career, I think you've hit the turning point of "this thing is going to become too complicated." I see the same thing with capital-A Agile and all the career scrum masters needing something to do with their time.


Act's incompleteness has had me barking up the wrong tree many times. At this point I've temporarily abandoned using it in favor of the old cycle. I'm hoping it gets better in time!


I don't get why GitHub doesn't adopt it and make it a standard. Especially the lack of caches is annoying.


We need Terraform for build pipelines and God help you if you use Bitbucket lol


FYI garden.io’s landing page appears to be broken on iOS. It runs off the page to the right.


Thanks for flagging! We'll fix that.


I couldn't agree more with the pain of debugging a GH Actions run. The /only/ tool you have is the ability to re-run with debug on. That's it. I have so many "trash" commits trying to fix or debug a pipeline and so much of it's just throwing stuff at the wall to see if it sticks.

Very basic things, like having reusable logic, are needlessly complex or poorly documented. Once I figured out how to do it, it was fairly easy, but GitHub's docs were terrible for this. They made it seem like I had to publish an action to get any reusability or put it in a different repo. Turns out you can create new yaml files with reusable logic, but make sure you put them in the root of the workflows folder or they won't work, go figure.
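
For anyone else stuck on this, the reusable piece is just a workflow with a workflow_call trigger, and it has to sit directly in .github/workflows/ (sketch; the file and input names are made up):

    # .github/workflows/build.yml -- the reusable piece
    on:
      workflow_call:
        inputs:
          target:
            type: string
            required: true
    jobs:
      build:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - run: make ${{ inputs.target }}

    # any other workflow can then call it as a job
    jobs:
      ci:
        uses: ./.github/workflows/build.yml
        with:
          target: release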

It's just incredibly painful to work on GH Actions but once you have them working they are such a joy. I really wish there was some kind of local runner or way to test your actions before committing and pushing.


> I have so many "trash" commits trying to fix or debug a pipeline and so much of it's just throwing stuff at the wall to see if it sticks.

One tool is to use draft PRs for this - you can run changes to your action YAML from the draft PR. When you are happy just squash the commits as you see fit on a "real" PR to merge the changes in without the mess.

I've found draft PRs for debugging/developing GH action logic to be pretty reasonable.


Since some actions depend on the branch/tag you are on, this is not always possible.


Indeed. I have sometimes made release workflows, hardcoded to the main branch.

You don't want to experiment too much on main because it dirties your commit history with 20 "Fix typo"-esque commits.

Or, if you try to emulate the main branch with a fake main branch (so you can squash it later), you're still going to have some test commits when you do the find-replace back to main.

Neither are great.


Sometimes forking and using the main branch on the fork (or tags and releases) can help. And if you're on a team, nobody else needs to be aware of the noise that is you throwing trivial changes at the wall.

It gets painful if there are things you've only got on the main repo (e.g. custom runners, credentials, etc.) though.


Check this out. It doesn't do everything but it's decent https://github.com/nektos/act


If I'm fixing CI I always put it on a feature branch and do a squash merge once I'm done. Because it's never just one quick fix, it's always 3-10 commits.


> If I'm fixing CI I always put it on a feature branch and do a squash merge once I'm done. Because it's never just one quick fix, it's always 3-10 commits.

The problem is GA also does not allow you to commit a new workflow in a branch. It must first exist on your primary branch and then you may tweak it in another.


If I have to commit several trash commits, I’m happy to squash locally and then do

    git push --force-with-lease origin main


Not cool if you work in a bigger team. Especially if there are some people that are not super experienced with git, and have no idea how to fix that locally.

But if you're working alone or in a smaller team, that's perfectly fine.


I've tried running the GitHub runner image (or maybe an imitation) and it was really painful to setup and to get some things working. I just let it go after 2 days.

And it's not just Github. The others big CI platform are not really better in terms of workflow and integration.

Now I just script everything to the maximum.


This is the main reason we built Earthly: run your builds locally, and get consistency with the CI.


If only competitors could do better...

https://gitlab.com/gitlab-org/gitlab-runner/-/issues/2797


yeah... https://github.com/firecow/gitlab-ci-local is a good workaround but should be built-in. How do developers at GitLab/Github debug their workflows?


GitHub Actions is a horrible CI/CD system. You cannot run steps in parallel on the same VM; container-based workloads are a second-class citizen. The first problem means that setting up local credentials and other environment dependencies cannot be parallelized (I'm looking at you, google-github-actions/setup-gcloud, with your 1m+ runtime... grrr), the second makes it quite difficult to put a Dockerfile in a repository to represent setup of the CI environment, and have both (a) the CI rebuild the container images when the image would change, pausing workflows depending on that image until the image is rebuilt, (b) not attempting to rebuild the image when its contents did not change, and immediately running workflows inside that image, including all dependencies already installed.

No, in GitHub Actions, you will attempt to repopulate from cache on every single run. Of course, sometimes the cache isn't found, particularly because there's a 5 GB cache size limit (which cannot be enlarged, not even for payment) which cycles out FIFO. So if you go over the 5 GB cache, you might as well not have one.

I deeply miss Concourse. Too bad Pivotal dismantled the team and it barely gets even maintenance work. Parallel tasks. Custom pipeline triggers. SSH into the CI environment to debug. Run one-off tasks in CI without needing to set up a pipeline. Transfer directory contents from one job to another without needing to create artifacts (which cost money to store if you're not careful about how long they should stick around for).

GitHub Actions is a bastardized toy CI/CD system that only got popular because GitHub made it as simple as uploading a file to .github/workflows in any repository - no additional signup, no in-house ops required, everything you could want is just there. So let's be very clear about what GitHub Actions is good at and what it's bad at - it's good at getting people to sign up, but it's not nearly powerful enough to be the "best" system once you start to outgrow the early features.


> Of course, sometimes the cache isn't found, particularly because there's a 5 GB cache size limit (which cannot be enlarged, not even for payment) which cycles out FIFO. So if you go over the 5 GB cache, you might as well not have one.

Looks like I can move on that "build caching mysteriously broken" issue now. Thanks for the heads up!


Concourse rocks. I didn't know the team had been dismantled; this sucks. Vito's communication style was the best.


Vito is at dagger.io now so hopefully we can expect some good stuff in the CI space there.


Sadly, Dagger doesn't get it either. It's so focused on portability between the underlying infrastructure providers, on not being the underlying infrastructure provider, and therefore it doesn't solve the real problem, which is the underlying infrastructure provider.

(a) Consider Dagger's integration with GitHub Actions: https://docs.dagger.io/cookbook#github-actions where you anyway need to run setup-node, npm ci, etc. just to start the Dagger pipeline. So Dagger isn't saving you from having to deal with GitHub Actions' caching layer and always-blank starting point - it's unavoidable. Well, if I can't avoid it, why should I use Dagger in the first place - why not embrace it?

(b) Consider a usecase where I want to parallelize the computation onto a number of machines dynamically chosen at run-time. Maybe I want to allow a test suite to run on an increasing number of machines without needing to periodically manually increase the number of machines in a configuration file, or maybe I'm using Terraform workspaces where I want to run terraform apply for each workspace on a different VM to let the number of workspaces scale horizontally. This is fundamentally impossible with something like Dagger (also impossible in GitHub Actions) because it would require Dagger to communicate with the infrastructure provider to tell it to scale up compute to handle the parallel jobs, and then scale down once those jobs finish.

This was achievable with Concourse by having Concourse pipelines generate other Concourse pipelines, and running the underlying Concourse workers as an autoscaling Kubernetes statefulset/deployment, combined with other Kubernetes implements like cluster autoscaler.


Hi! Dagger co-founder here. I thought I’d share a few clarifying points - and acknowledge that we should explain certain aspects of Dagger’s design better, to avoid having to clarify in the first place!

> you anyway need to run setup-node, npm ci, etc. just to start the Dagger pipeline

You do need to write CI configuration to run a dagger pipeline, but it's a very small and standardized snippet (“install dagger, then run this command”) and it typically replaces a much larger custom mess of yaml and shell scripts.

The main benefit though is that your pipeline logic is decoupled from CI altogether. You can run the same Dagger pipeline post-push (in CI) but also pre-push. Similar to running a makefile or shell script, except it’s real code.

> Dagger isn't saving you from having to deal with GitHub Actions' caching layer and always-blank starting point

Dagger most definitely does do that :) We use Dagger and Github Actions ourselves, and have completely stopped using GHA’s caching system. Why bother, when Dagger caches everything automatically?

> Well, if I can't avoid it, why should I use Dagger in the first place - why not embrace it?

I think that’s your Stockholm syndrome talking. The terrible experience that is CI - the “push and pray” development loop; the drift between post-push yaml and pre-push scripts; the primitive caching system; the lack of good composition system; the total impossibility of testing your pipelines - that pain is avoidable, and you shouldn’t have to embrace it. You deserve better!

> This is fundamentally impossible with something like Dagger (also impossible in GitHub Actions) because it would require Dagger to communicate with the infrastructure provider to tell it to scale up compute to handle the parallel jobs, and then scale down once those jobs finish.

This is possible with Dagger. It certainly shouldn’t be a core feature of the engine, but the beauty of a programmable system is that you can build infinite capabilities on top of it. You do need a truly programmable system though, which Github Actions is not.

> This was achievable with Concourse by having Concourse pipelines generate other Concourse pipelines

Dagger pipelines can dynamically run new pipelines, at arbitrary depth. In other words nodes in the DAG can add more nodes at runtime.

Give vito some credit, he is an incredibly talented engineer who built a product that you love. Maybe he saw in Dagger the potential to build something that you will love too :) He blogged about his thought process here: https://dev.to/vito/why-i-joined-dagger-43gb

I will concede that Dagger's clustering capabilities are not great yet. Which is why we piggyback on CI infrastructure for that part… for now!


> Why bother, when Dagger caches everything automatically?

The fear with needing to run `npm ci` (or better, `pnpm install`) before running dagger is on the amount of time required to get this step to run. Sure, in the early days, trying out toy examples, when the only dependencies are from dagger upstream, very little time at all. But what happens when I start pulling more and more dependencies from the Node ecosystem to build the Dagger pipeline? Your documentation includes examples like pulling in `@google-cloud/run` as a dependency: https://docs.dagger.io/620941/github-google-cloud#step-3-cre... and similar for Azure: https://docs.dagger.io/620301/azure-pipelines-container-inst... . The more dependencies brought in - the longer `npm ci` is going to take on GitHub Actions. And it's pretty predictable that, in a complicated pipeline, the list of dependencies is going to get pretty big - at least a dependency per infrastructure provider we use, plus inevitably all the random Node dependencies that work their way into any Node project, like eslint, dotenv, prettier, testing dependencies... I think I have a reasonable fear that `npm ci` just for the Dagger pipeline will hit multiple minutes, and then developers who expect linting and similar short-run jobs to finish within 30 seconds are going to wonder why they're dealing with this overhead.

It's worth noting that one of Concourse's problems was, even with webhooks setup for GitHub to notify Concourse to begin a build, Concourse's design required it to dump the contents of the webhook and query the GitHub API for the same information (whether there were new commits) before starting a pipeline and cloning the repository (see: https://github.com/concourse/concourse/issues/2240 ). And that was for a CI/CD system where, for all YAML's faults, for sure one of its strengths is that it doesn't require running `npm ci`, with all its associated slowness. So please take it on faith that, if even a relatively small source of latency like that was felt in Concourse, for sure the latency from running `npm ci` will be felt, and Dagger's users (DevOps) will be put in an uncomfortable place where they need to defend the choice of Dagger from their users (developers) who go home and build a toy example on AlternateCI which runs what they need much faster.

> I will concede that Dagger’s clustering capabilities are not great yet

Herein my argument. It's not that I'm not convinced that building pipelines in a general-purpose programming language is a better approach compared to YAML, it's that building pipelines is tightly coupled with the infrastructure that runs the pipelines. One aspect of that is scaling up compute to meet the requirements dictated by the pipeline. But another aspect is that `npm ci` should not be run before submitting the pipeline code to Dagger, but after submitting the pipeline code to Dagger. Dagger should be responsible for running `npm ci`, just like Concourse was responsible for doing all the interpolation of the `((var))` syntax (i.e. you didn't need to run some kind of templating before submitting the YAML to Concourse). If Dagger is responsible for running `npm ci` (really, `pnpm install`), then it can maintain its own local pnpm store / pipeline dependency caching, which would be much faster, and overcome any shortcomings in the caching system of GitHub Actions or whatever else is triggering it.


> I think I have a reasonable fear that `npm ci` just for the Dagger pipeline will hit multiple minutes, and then developers who expect linting and similar short-run jobs to finish within 30 seconds are going to wonder why they're dealing with this overhead.

I was going to reply that you misunderstand how Dagger works: that your pipeline logic typically requires very few dependencies, if any, since it can already have dagger build, download and execute any container it needs - with free caching. So your “npm ci”, although technically not free, is a drop in the bucket and easily offset by the 2x or even 5x speedup that is typical when daggerizing an existing pipeline.

All the above is true… But I realize that the documentation you link to contradicts it. Reading these guides I understand your fear. Just know that these guides are, in that respect, giving you the wrong idea. In a more representative example, instead of importing a GCP client library, the script would have dagger run a GCP client in a container, with the appropriate input flags, config files and, if needed, artifacts. We will fix those guides accordingly, thank you for bringing this to my attention.

Also, soon Dagger will do exactly what you were expecting: it will execute the “npm ci” itself, in a container. So whether that script’s overhead is 3 seconds in the typical scenario I described to you, or a few minutes in the scary scenario you described to me: either way it will get cached, and either way it will have no host dependency on npm or any other language tooling: just the dagger CLI.


> We will fix those guides accordingly, thank you for bringing this to my attention.

Great :D

I read a lot more of the documentation this morning, in general I really like what I see, but the clustering bit is the key missing point for me at the moment. My current employer produces an Electron desktop application - we need to run 90% of the tests on Linux (most cost-effective), and 5% each on a Windows and macOS machine. I see the Dagger CLI runs on both macOS and Windows, but the architecture as it stands today expects that the Dagger pipeline will run all its workloads on the same VM. If I want to run pipeline tasks on other VMs, I can do it from Dagger, treating the Dagger Engine as the orchestration layer which makes some kind of an IaaS or PaaS call to schedule the workload on a separate macOS or Windows VM, outside the Dagger architecture. But... today I expect that workload scheduling to happen within the CI/CD architecture (it's trivially expressible within GitHub Actions, particularly as GitHub Actions hosts both Windows and macOS runners), and needing to hoist it outside the CI/CD architecture in order to use Dagger is a needless complication.


Less Pivotal and more VMWare post acquisition dismantling I'd say. There was a lot of love internally for Concourse (I left before the acquisition though).

SSH debugging and one off tasks absolutely dreamy.


Even SourceHut’s Spartan CI supports SSH debugging


Have you seen Tekton? (https://tekton.dev/)


solatic -- I have an existing solution that accounts for a lot of these issues you bring up. Would it be possible to pick your brain? Can you share your email or shoot me an email? lawnchair@lawnchair.net.


Try the sourcehut build server


It's a public alpha and clearly targeted for small hobbyists / FOSS work as a result. I'm looking for something where the creator has more confidence in its reliability...


> Because GitHub fails to distinguish between fork and non-fork SHA references, forks can bypass security settings on GitHub Actions that would otherwise restrict actions to only “trusted” sources (such as GitHub themselves or the repository’s own organization).

How is this not resolved?

Easily bypassing security controls is a major security issue.

Yes, you need to convince someone to use your SHA, but social engineering is usually the easy part.


I mean, it's embarrassing how bad it is.

- (unrelated) build failures just randomly notify the latest maintainer who happened to merge something? (Imagine finding this out when your newly added maintainer pings you on Matrix and tells you 1: about this behavior, and 2: that your update/builds have been failing for a week without you knowing?!?!)

- The cache action is horribly, trivially observably broken with seemingly no maintainer?

- Can't tail the build log of a step if their UI poops or your tab unloaded after it started?

- The complete lack of abstraction that might actually make workflows portable across worker nodes? pfft.

- the default image is a travesty. I thought it was obnoxious how bloated it was and then I started digging in and realizing "Oh, some Microsoftie that didn't know Linux was put in charge of this". (saying this as a former FTE that knows). And there's no effort to allow a more slimmed down image for folks that, you know, use Nix? Or even just Docker?

I'm in the process of migrating off GitHub and it's mostly because Actions shocked me to my senses. Too bad MS can't figure out how to retain virtually any Linux talent, and not just their cuffed-acqui-hires or Windows-devs-cosplaying. Even the well compensated ones head for the door.

And I'll just say, I don't program in YAML because YAML is a disgrace wrought upon us by Go+Yaml enthusiasts that don't know any better fueled by senseless VC money shoveled at an overall ecosystem incognizant of modern, actually useful technology.

edit: removing some of the blatantly identifying former-FTE trauma. Knowing what I know I should sell all my MSFT, but the market thinks differently.


Wasn't it obvious that something along these lines would happen when Microsoft took over Github?


I would like to give a strong recommendation for https://pre-commit.ci (no relation, just a happy user).

The idea is that you write hooks (or use pre-existing ones) that check (and fix, where possible) your code before you commit, and then once you've pushed, the CI will check again, including automatically committing fixes if necessary.
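
A minimal .pre-commit-config.yaml looks something like this (the hooks here are just common examples):

    repos:
      - repo: https://github.com/pre-commit/pre-commit-hooks
        rev: v4.4.0
        hooks:
          - id: trailing-whitespace
          - id: end-of-file-fixer
      - repo: https://github.com/psf/black
        rev: 23.9.1
        hooks:
          - id: black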

Anyway, it works brilliantly - unbelievably fast thanks to good automatic caching, and using the exact same setup on your local development machine as in CI negates a lot of these debugging problems. It's free for open source.

It only does checks, not things like automatic releases, but it does them well.


This combined with tox is great for Python projects in particular. Tox automates creating virtualenvs for the right Python versions you want to test with, then running your tests in them. It can also run static checks by setting `skip_install = True`, because you want to check the source code itself. You just need to run this in a Python container that has tox installed as a globally available tool and all versions of Python available in it (like https://github.com/georgek/docker-python-multiversion <-- not maintained but easy to update).

Here's some boilerplate to do all that:

    [tox]
    envlist = py{310,311}

    [testenv]
    passenv =
            PIP_CACHE_DIR
    deps = coverage   # deps only for tox
    extras = testing  # testing extras include pytest
    commands = pytest ...

    [testenv:check]
    passenv =
            PIP_CACHE_DIR
            PRE_COMMIT_HOME
    skip_install = True
    deps = pre-commit
    commands = pre-commit run --all-files --show-diff-on-failure


Been through that git commit; git push; repeat cycle too much as well, until I discovered https://github.com/mxschmitt/action-tmate which gives a shell in between steps. It does not help with all problems, but it sure makes things less painful at times.
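
For anyone curious, it's just a step you drop in (sketch; the step above it is made up):

    steps:
      - run: ./ci/test.sh              # whatever step you're actually debugging
      - uses: mxschmitt/action-tmate@v3
        if: ${{ failure() }}           # only opens the SSH session when something above failed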


Thanks, that looks super useful. :)


My personal wish is for the ability to attach HTML reports to the action runs without having to use the current actions/upload-artifact etc.

Particularly for test builds, I very often just want to quickly view the output HTML report. The current approaches I am familiar with are the aforementioned upload-artifact, or using GH pages, but GH pages is not great when you have multiple different reporting output for the same repo, or you wanna quickly view historical reporting rather than latest.

I'd love a simple "attach-report" action that just put a link to an HTML report on the job summary, and clicking on it renders the html.

Other automated CI/CD style systems have much richer support out of the box for dealing with HTML report capturing and viewing.


Granted, it's not HTML, but you can append Markdown output directly to $GITHUB_STEP_SUMMARY without dealing with artifact uploads etc.
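
e.g. something like this (the numbers are obviously placeholders):

    - run: |
        {
          echo "### Test results"
          echo ""
          echo "| suite | passed | failed |"
          echo "| ----- | ------ | ------ |"
          echo "| unit  | 198    | 2      |"
        } >> "$GITHUB_STEP_SUMMARY"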


Thanks, I'm aware of this option too, but if you say have an HTML report with the results of 200 tests, no one has time to write some logic to parse the results to a markdown summary frankly. Just give me the link!

Similar too if the reporting captures video/screenshots of a given bug - the original report UI is far easier to deal with. Many test frameworks already natively put out an HTML report UI etc, it's just harder to get to than it should be with GH actions today.


> an HTML report with the results of 200 tests, no one has time to write some logic to parse the results to a markdown summary frankly.

Not sure how GHA treats it, but Markdown is a superset of HTML, so by default all HTML pages are valid Markdown documents. No conversion required.


They just don’t show half of the stuff inside correctly.


I mean GH Actions is basically a re-brand of Microsoft's "Azure Pipelines". As somebody who used all previous incarnations of TFS/VSTS/AzDO build and release pipelines: they are not good at this. This is not a team with a record of success. That Azure Pipelines is moderately usable only happened because they failed literally every other approach they tried.

There was a project to allow you to run the pipelines locally so you could do the edit-run-debug loop on your own private environment without committing. It was, of course, canned.

https://github.com/microsoft/azure-pipelines-agent/pull/2687...

However, there are tools to improve QOL. For example:

https://marketplace.visualstudio.com/items?itemName=ms-azure...

A vscode extension that's syntax-aware.

Now, I'll be a bit controversial: if they'd used XML instead of YAML, you could have an xmlns declaration up-top that would give you validation in most decent code editors without user intervention. XML is awful, but it has a lot of useful features that we gave up when we threw the baby out with the bathwater.


> I mean GH Actions is basically a re-brand of Microsoft's "Azure Pipelines". As somebody who used all previous incarnations of TFS/VSTS/AzDO build and release pipelines: they are not good at this. This is not a team with a record of success. That Azure Pipelines is moderately usable only happened because they failed literally every other approach they tried.

I was under the impression (which might be wrong!) that GHA was an independent project within GitHub that was well underway before the acquisition. Are you saying that GHA was rebuilt on top of AzP, that it's just a relabeling of AzP, or something else?

(I have no particular dog in it being one way or the other, but I'm curious about the history here.)


They share a lot of code. My understanding is that it was an MS project first, but I might have that backwards.

> GitHub Workflows execute on runners. The runner code is essentially a fork of the Azure Pipelines code, so it's very similar. It's also cross-platform and you can also use hosted or self-hosted runners.

https://learn.microsoft.com/en-us/dotnet/architecture/devops...


Thanks for the link -- I knew that GHA workflow runs ran on Azure, but I didn't know the workflow runner itself was a fork of Azure's runner/instrumentor. That's interesting context!


if you look at the source of the GHA runner, you can see where they regex-replace all references to Azure Pipelines with GitHub Actions lol


the original GHA implementation was shitcanned


Is that a nice way of saying we are still stuck with the original GHA implementation


> GH Actions is basically a re-brand of Microsoft's "Azure Pipelines"

Probably even moreso than most people think - a large portion of the AzDo team got moved over after the acquisition to work on GitHub Actions/Projects.


You had me up til that last paragraph.

Call me crazy but xml just hurts my eyes. I'll always take a nicely formatted yaml doc with all the pains that come with it over the horrors of angle brackets and camel case.


XML syntax is bad, but yaml doesn't have a proper blessed batteries-included built-in schema language afaik. A well formed XML with an xmlns will have validation and autocomplete in a good editor thanks to this. JSON can do it too. Never seen it done with yaml.

It's a pretty big frustration in a language that is so slow to test -- commit push run wait find the error oh I made a typo. Better code-time validation fixes this but the tooling for that in yaml is weak.


This is why I want projects like Earthly to succeed: https://github.com/earthly/earthly

I want to be able to run all of my CI workflows on my local machine.

I agree with other commenters that most of what CI does should be abstracted out into scripts or other non-CI tools. Unfortunately it's not easy to do that for large pre-existing CI setups, especially if a different team is the one maintaining your CI workflows.


Isn't Earthly out of the picture now? https://earthly.dev/blog/shutting-down-earthly-ci/


As @vladaionescu mentioned, Earthly is alive and well. We stopped the CI product exactly so we could focus on use cases like making Earthly in GitHub Actions better.


Nope. Only Earthly CI is.


It's hard for me to articulate exactly why but I really dislike the information layout in GH Actions. Compared to other CI/CD tools, it's harder to debug problems or get a sense of the state of a pipeline. Part of the problem is the number of clicks it takes to look at various step logs. Some of the controls are unintuitive, but maybe that's just me.


This could be said for a lot of GitHub. The fact that you need to open a disclosure menu to edit a PR is wild.


Their comment system just doesn’t feel right compared to Bitbucket’s either - I think it’s the distinction between comments that are part of a review, versus comments that are ‘just’ comments


A linter can catch some obvious errors: https://github.com/rhysd/actionlint. But yes, I agree, it's not a fun debugging experience.


Yep, actionlint is great! I've used it successfully both to lint my own workflows, and to lint third-party workflows for (basic) security issues.

Unfortunately, it can't lint actions themselves, only workflows that call actions[1]. This is a substantial deficiency, especially for users (like me) who write and maintain a decent number of actions.

[1]: https://github.com/rhysd/actionlint/issues/46


act would also be helpful here in terms of debugging and development of Actions workflows.

https://github.com/nektos/act


The single thing that I most detest about GitHub Actions is how they, by design, completely miss the point of containers. Having "actions" that are just "install language foo" is barely better than just publishing shell scripts or even the days of Travis.

As a result, few GitHub workflows benefit from the immutability or reproducibility guarantees that can be provided by containers, and most workflows I interact with spend more than half their time running ridiculous installer scripts.


Are you unaware that GH Actions can run jobs in containers??

https://docs.github.com/en/actions/using-jobs/running-jobs-i...
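
e.g. (a sketch; the image is just an example):

    jobs:
      test:
        runs-on: ubuntu-latest
        container:
          image: node:20-bookworm   # every step below runs inside this image, not on the raw VM
        steps:
          - uses: actions/checkout@v4
          - run: npm ci && npm test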


Founder of Earthly here - besides the build debugging difficulty, I would add that modern CI/CD repeats a lot of steps: downloading, installing and configuring dependencies, making things much slower than they should be.

We built Earthly [1] to tackle these two problems specifically. We're open-source (10k stars).

[1]: https://earthly.dev


We recently converted our build pipeline from Jenkins to GitHub Actions (a complete rewrite). Jenkins had its issues and warts, but holy Beelzebub, what a monster GitHub Actions is. It's pure downright evil and creates all sorts of headaches. Opaque, zero clues on what broke. Glacially slow even if you throw the biggest instance at it. And this is all enterprise scale, a big company with 1000s of repositories.

I’m spending more time finding bugs in the build pipeline than the time I spend doing other things. And it’s all GitHub actions fault. I hate it with a passion.


Struggling with this today, and numerous other days. It's so bad. Stop trying to build an operating system out of YAML. I'll always use and recommend Gitlab from now on.

And what on earth is an "action" anyway? How on earth are simple bash functions not just as suitable here? Instead you have some weird YAML scripting language. It's so bad. Why? Somebody please tell me. I'm losing my mind. It is a good reflection of the rest of the world though, and why the world's infrastructure is crumbling in many places.


Executing bash statements/scripts/functions is the thing I struggle least with in GH actions personally, it's remarkably easy to execute shell steps. If you really want your entire build to be a shellscript the action executes, you can do just that with very little YAML.


Except it doesn't execute quite like in bash, and every single command is its own little island.

So exporting an environment variable doesn’t work for example.


What are you talking about? You can put a whole Bash script into a single step.


The feature I'd love is history.

I'd like to know if my builds are getting slower over time. I'd be able to detect flaky tests automatically.

It seems basic, but I know third-party solutions exist for this. It's out of the box in Circle CI and Buildkite and feels like it should be here.


I was with a team that moved from Jenkins to Actions. Jenkins has a lot of issues, but if you knew what you were doing, at least it was easy to see the history and at a glance know the project pipelines were green/healthy. It was super useful for seeing where flake was in tests or observing changes to coverage over time.

After the team migrating to Actions swept through, we ran without this basic stuff for years because nobody had the time to figure out what 3rd party tool to use, the money to pay for it, or the capacity to re-implement the functionality. It made dealing with test failures or flake that crept in awful.


I'm really not sure if we are using CI correctly. Sometimes I think all those CI templates should be replaced by just one executable that does everything, like a modern alternative to Makefiles (and there are a lot of build tools).

So the CI pipeline would only call the build tool, like "./build containers push-to-registry release:1.0.0 run-tests"

Those scripts can be tested and debugged everywhere. Also migrating to a different CI platform would be really easy.


https://dagger.io/

may be an alternative. You can run it in GitHub actions somehow as well.


Thanks for the tip, I should try that.

I was using https://nuke.build for one or two projects. It's in .NET and probably not so useful if you need to build non .net code bases. Stopped using it because some of their code is proprietary, and it's not as useful as I thought.


How does that integrate with everyone else's tooling? At least in enterprise there are a ton of things like "SCA, SBOM, compliance reports, etc" that tie in with plugins and such.

Also, why would a large commercial CI want to have their environment too open?


They don’t want to have their environments too open, that’s why we are at this point.

Those enterprise tools I know of (like SonarQube, for example) are mostly CLI tools at their core. They can be integrated anywhere.


While this is totally possible, you lose a lot of the things that make GitHub Actions nice: good logs, annotations, individual steps that went wrong, being able to see the status at a glance. Seeing one step fail while the other 13 pass is nice.


The custom build tool would need to handle all of those things. It needs more or less the same functionality as GitHub Actions and the marketplace. It would also need to integrate with different CI platforms to some extent for reporting and user interactions.


A tight feedback loop is everything. I will not use tools that require long waits between tests. I want to edit, test, wait a matter of seconds to see the results.

Seeing people push hundreds of commits to their CI pipeline trying to fix it by trial and error makes me cringe. I do not understand how some people accept such a workflow without thinking it could be better.

When I was planning the CI set up at my job I made sure it could all be run locally and hammered this point home to developers. CI isn't magic, it's just some other computer running your shit.

We use Gitlab CI which runs your script in a container. It's trivial to run the container locally with docker to test exactly what will happen in CI (although you do have to make a fresh clone of your repo for the mount; I thought about scripting this, but in practice it was never needed that much).

GitHub Actions seems much harder to test locally. There seem to be some projects to do it, but they are too heavy for me to want to install. For this reason alone I don't like Actions.


Can I trigger a workflow from a different workflow yet? That seems very basic, and last time I looked I had to create a token that gives read/write access to every repo in every organization to do this. And then it didn't work because the docs for the trigger API are apparently wrong.


As long as they’re in the same repository, yeah.
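
Assuming reusable workflows are what you're after, the same-repo case looks roughly like this (file names and inputs are made up):

    # .github/workflows/deploy.yml (the callee)
    on:
      workflow_call:
        inputs:
          environment:
            type: string
            required: true
    jobs:
      deploy:
        runs-on: ubuntu-latest
        steps:
          - run: echo "deploying to ${{ inputs.environment }}"

    # .github/workflows/ci.yml (the caller)
    on: push
    jobs:
      call-deploy:
        uses: ./.github/workflows/deploy.yml
        with:
          environment: staging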


Different repo same org


I created https://github.com/typesafegithub/github-workflows-kt and https://github.com/typesafegithub/github-actions-typing to address some of the mentioned issues. Some problems remain unsolved because it's just a Kotlin DSL that generates the YAML, but it does catch some issues early. Related blog post: https://dev.to/jmfayard/github-actions-a-new-hope-in-yaml-wa...


People really need to stop writing scripts in yaml. Not to be dismissive, but the issues outlined in this article, aside from that security footgun, are non-issues as soon as you start using github actions only for what it should be used for - to invoke your scripts in parallel in response to an event. We've banned all but actions/checkout in our org and have a very healthy dev experience as a result. People are naturally guided towards writing scripts that they can run locally instead of for github actions specifically.
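
Roughly what that looks like in practice (script paths are placeholders): the only marketplace action is checkout, jobs run in parallel, and everything else is a script you can run locally:

    on: pull_request
    jobs:
      lint:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v3
          - run: ./ci/lint.sh
      test:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v3
          - run: ./ci/test.sh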

Some real issues with github actions:

1. MacOS runners have ridiculously inconsistent IO performance causing 200x slowdown in some cases

2. Getting charged to the nearest minute is asinine and punishes highly parallelised (fast) workloads

3. GitHub OIDC endpoints constantly timing out


Yaml is the worst possible programming language, beating out even ant’s xml abomination.

Unfortunately most of the work in GitHub Actions is configuring the runners and setup data, which requires you to use lots of yaml to invoke unvetted public repositories (actions) written in JavaScript to do what is more easily done as shell scripts, and then use shell scripts to work around the limitations of actions.

State is impossible, data is even more impossible, and caching is completely broken. And if you use your own runners, GitHub Actions don’t even clean up their own garbage. I suspect it’s the same on public runners, just statistically less likely to affect you, and a potential security vulnerability.

I’ve been working on 2 things:

1. A front end that crafts GHA workflow yamls from a sane configuration with common tasks baked in.

2. An event and reporting system so you can see what is happening during a workflow — not after — without having to scroll through GitHub’s hideous log output in html or download and unzip the full log.


GitLab’s CI is so much better than GH it’s not even funny


Most of my problems would vanish if there was an official way to run workflows locally, something like:

shell: run_workflow name=MyJob in=MyWorkflow.yml params={}


I guess you have to ask yourself what's in MyWorkflow.yml that can't be in a script and run locally?


Well, that's why the word "official" is in there. Obviously you could mock up the entire github actions scaffolding locally and have it inject environment variables and support all the actions and everything, but keeping that up to date will be a nightmare if you're doing it on your own.

Obviously you could put the whole CI/CD into a bash script and not use any features or functionality provided by github actions but there are plenty of nice things that it does and it would be a shame to not use any of them.


I didn't mention it in my reply but I was referring to the complaint in the article that the author wants to be able to debug locally without having to push changes to GitHub first. The top comment by MoreQARespect and others highlight the benefits of scripting as much as possible so that processes like builds and cutting releases can be run and tested locally.

> Obviously you could mock up the entire github actions scaffolding locally

Not sure if anyone's advocating for implementing GA entirely. You can at least automate project-specific bits as much as possible using scripts, and then have the CI environment use the same automation. That allows for more local debugging than overusing GA.


> And so, the question: why are there so few of them? Here is just a smattering of the official actions that don’t exist[...]

This part of the article half triggered me and half made me laugh. I was recently on a GitHub webinar to listen to the pitch for CoPilot, but of course they also talked about Actions and GHAS. Anyway, during the overview of Actions, the presenters made a comment about how Actions was "backed by the community." It felt to me like a glaring admission that the product isn't mature. But in true Microsoft fashion, they charge you like it is.


The best approach is to develop completely local build scripts and add a few options to plug GitHub Actions' exclusive features into them. Make a build script that can take a cache folder's path as an argument. Make it take a test report output path as another argument, etc.

So your GHA workflow is: set up secrets (AWS and co.), set up Python, download cache to path X if possible, run the build in X and send reports to Y, publish reports in Y. And put as much work as possible inside the build script. We have a whole Python project for that.
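
A rough sketch of that shape (paths, versions and cache keys are invented; the real thing obviously has more in it):

    on: push
    jobs:
      build:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v3
          - uses: actions/setup-python@v4
            with:
              python-version: "3.11"
          # GHA-exclusive plumbing stays in the workflow...
          - uses: actions/cache@v3
            with:
              path: .build-cache
              key: build-${{ hashFiles('requirements.txt') }}
          # ...while all the real work happens in the build script
          - run: python build.py --cache-dir .build-cache --report-dir reports
          - uses: actions/upload-artifact@v3
            with:
              name: reports
              path: reports/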

This is truly painless after the first development effort.


gitlab-runner allows you to run scripts locally for GitLab, but even that is lacking (any env secrets stored in GitLab? Script is going to fail).

I really, really wish there was a way to clone the GitHub Actions or GitLab runner environment, spin up a runner locally and test. That would shave off 95% of the wait time


This is a real problem. I think the problem stems from the unstated assumption that declarative YAML is not really coding/debugging. The root problem is the assumption that infrastructure specifications are second-class citizens when it comes to managing a software ecosystem. Due to this, one rarely sees any sort of strong tooling support for creating, updating, debugging and extending various infrastructure activities. The unpleasant truth is that we live in the dark primitive days of infrastructure management.


As someone who has recently sunk a considerable amount of time in modernizing some pipelines, workflows, what have you, I feel this deep in my bones. The commit-test-fail-recommit test loop is painful and something I’ve experienced all too well. Trying to work around product limitations brought about by whoever concocted the system on GitHub’s side to save customers from themselves and declared it good is painful. Trying to debug simple issues is painful. I’m glad I’m not alone.


My experience with GitHub Actions is mostly managed by way of certain Azure Portal interactions these days.

When I create a new Function app, it gives me the opportunity to enable GitHub integration. The experience for this is flawless, IMO. You provide your GH credentials, select the org/repo/branch, and then it will create the workflow file for you and push it automatically. It will also update the secrets in your repository settings to match what Azure expects on its end for deployment.

By the time you get to look at your GitHub repo, the action is already running and will 100% complete successfully if you followed a standard/default project structure. The automatically-generated workflow files aren't perfect, but they're so close that it becomes trivial to tweak for additional build args or project arrangements. Just getting the secrets & related boilerplate configured makes the difference between me doing it right now vs maybe never. The consequence of always having proper CI/CD from day zero, even for the most trivial projects, seems profound to me.

There exist some really happy paths now wrt GH actions, but you gotta be willing to get pretty hammered on the Microsoft koolaid to explore them.


One serious problem not mentioned in the article is the feature that looks for magic strings in the output and treats them as special commands [0], meaning that you can't safely output anything that's not completely trusted or carefully filtered.

At least it's deprecated now, but they apparently decided to postpone the removal because too many people are still using it. Maybe if we all do our part in migrating to the new way [1], they will eventually close this giant security hole.

[0] https://github.blog/changelog/2022-10-11-github-actions-depr...

[1] https://docs.github.com/en/actions/using-workflows/workflow-...
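
For reference, the migration is basically: stop echoing the magic ::set-output string and write to the file GitHub exposes as GITHUB_OUTPUT instead (sketch):

    steps:
      # old, deprecated: any echoed ::set-output string is parsed as a command
      - id: old
        run: echo "::set-output name=version::1.2.3"
      # new: outputs go through an environment file, not stdout parsing
      - id: new
        run: echo "version=1.2.3" >> "$GITHUB_OUTPUT"
      - run: echo "version is ${{ steps.new.outputs.version }}"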


We moved our CI to this orchestrator called Netflix Conductor. It’s highly customized internally, with workers capable of running different kinds of tasks, etc. But since we did this, CI has been a wonderful experience. Things run smoothly, history can be tracked, and it's easy to build variants by modifying or creating new workflows. We aim to open source this extension soon.


I think most developers don't need to or don't have the time to dive deeply into their build systems, but having had to do so as part of a project to migrate to Gitea (which turned into a larger project to enhance/fix its act-based github actions clone), my take is:

1. Github actions embodies the wrong abstractions. I'm not 100% sure what the right ones would be but it feels all wrong.

2. It has the feel of an unfinished/undocumented project. Someone's project to get a promotion, perhaps, that they handed off to interns after securing said promotion. A product that forces me to expend brain cycles reverse engineering its behavior just makes me want to design it out of my life. I may as well develop my own thing: at least I'll understand that.

3. A build system that can only be run in a proprietary SaaS environment is a terrible idea. I mean, great idea for locking in customers I suppose.


Another tool that tackles GitHub Action debugging head on: https://www.ci-debugger.io/

Wrap your GH Actions in a debugger breakpoint and connect into the live broken GH Action to inspect the machine, re-run commands, etc.

At its heart: real-time debugging for GitHub Actions.


It's not really context switching. The author is debugging GitHub Actions; that's the context in which a few window changes seem minor. Context switching would be getting dragged to a meeting, or getting asked by a colleague to explain how to set up GitHub Actions for their project.


I am thinking about porting my CI/CD pipelines from Jenkins to GitHub Actions to leverage autoscaling self-hosted runners, for which I don't think there is an exact equivalent with Jenkins nodes, at least not one that is already implemented where I work.

The pipelines, though, are all configured using Ansible. Applying the same principle I did to Jenkins (using it strictly for execution automation, logs, permissions and the like, and leaving the heavy lifting to a more trusted tool), I think I can avoid both major vendor lock-in and the language's idiosyncrasies and nonsense.

It boggles my mind how unrefined the GitHub actions flow is. It doesn't offer a good experience by a long shot.


One thing that actually makes this slightly less horrible is the gh CLI. You can use gh run watch to tail your logs in the terminal and at least not have to click through a million things just to see output.


I wrote a command-line tool that streamlines retrieving test results from GitHub Actions even further. Essentially it parses jest/tsc/eslint errors in GHA jobs' logs for the current branch's PR. https://github.com/raine/ghtool


Do you know a way to stream the gh run watch output?


You’d think they could at least train GitHub Copilot to test run the workflow and point out immediate issues.

We have to add an action to kill duplicate runs on triggers.

The caching is marginal, at least on Windows runners, because it takes forever to expand the tarball. I'm not even sure from their docs/issues on the cache action whether they finally moved it to use GNU tar.

Having some way to get an interactive shell on fail would be a big step up for debugging issues. Otherwise we are back to print debugging in the actions or uploading artifacts at each step so we can inspect them.


My number one complaint is that they still don't have managed ARM runners. Considering how overpriced runners are (compared to cloud instances), I'm sure they would make a ton of money here.

My choices here are either to use a different CI solution (but I quite like Actions, and they are better integrated in the Github UI than alternatives), or to use custom self-hosted autoscaling runners (but there doesn't appear to be a good solution available at the moment, and I definitely don't want to manage infra for CI.)


Check out https://dime.run/ they manage workers for you


It's clear we've all wrestled with GitHub actions and Jenkins in some shape or form. Debugging them can feel like being in an escape room, without any clues.

I work at Trunk.io and we have a tool called CI Debugger. It helps you understand your workflows. Instead of getting lost in logfiles and commits, you can pinpoint where things are bogging down or going haywire. It can give you a clearer picture of what's happening under the hood. Like having a heat map for the CI/CD processes.


One of my biggest gripes with Actions is that it doesn't allow you to trigger workflows for a specific SHA. Only the branch HEAD.

This is pretty much possible in all other CIs.


You can pass a sha into your workflow and then check it out.
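
i.e. something like this (a sketch; the input name is made up):

    on:
      workflow_dispatch:
        inputs:
          sha:
            description: "commit to build"
            required: true
    jobs:
      build:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v3
            with:
              ref: ${{ inputs.sha }}
          - run: ./build.sh

Note the run itself still shows up attached to the ref you dispatched it on, which I think is part of the parent's complaint.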


This is my normal CI/CD workflow -- even for GitLab. Debugging why CI/CD doesn't work is one of the worst headaches you can imagine.


The current state of GA is much better than Jenkins. I don't like yaml, but this is still better than mucking around in Jenkins configs. Fin


Github Actions has been a pleasant introduction to CI/CD for me. I used to build locally and then git push and/or rsync everything directly to a VPS (Digitalocean Droplets in my case). But for collaboration, this workflow breaks down. Then, using Github as the "origin" repository becomes compelling, and Github Actions fit very nicely with that workflow.


re: `pull_request_target` and fork PRs - You can get around the !Even More Fun! limitations this has (GITHUB_TOKEN being read-only) by having a workflow called by another workflow.

For example, I wrote a set of workflows[1] to automatically apply a label after 2 'review approvals'.

First, a 'dummy' workflow triggering off PRs (in that context) that uploads an artifact: the PR number

Second, the 'real' workflow that runs in the context of the actual repository, set to be `on: workflow_run: workflows: - Final Review Labeler` - this pulls in the artifact, runs a GraphQL query, and applies the label if applicable.

[1]: https://github.com/goonstation/goonstation/blob/master/.gith...
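
A stripped-down sketch of that two-workflow shape (the real files in [1] do more; names here are illustrative):

    # workflow 1: runs in the untrusted fork context and only records the PR number
    name: Final Review Labeler
    on: pull_request_review
    jobs:
      record:
        runs-on: ubuntu-latest
        steps:
          - run: echo "${{ github.event.pull_request.number }}" > pr_number.txt
          - uses: actions/upload-artifact@v3
            with:
              name: pr_number
              path: pr_number.txt

    # workflow 2: runs in the base repo context with a writable token
    on:
      workflow_run:
        workflows: ["Final Review Labeler"]
        types: [completed]
    jobs:
      label:
        runs-on: ubuntu-latest
        steps:
          # download the pr_number artifact, run the GraphQL query, apply the label
          - run: echo "labeling logic goes here"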


I wish individual actions could be specified as Dockerfiles instead of yaml files. That would address a lot of issues that I've run into around build environments being slightly different than locally.

Yes, I can wrap it with a simple docker action. It would be cool if that were automatic.
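
For what it's worth, the wrapper is at least tiny; for a Docker container action, an action.yml pointing at the Dockerfile is about all it takes (sketch, names invented):

    # action.yml, next to your Dockerfile
    name: "my-container-action"
    description: "Runs the step inside the image built from the Dockerfile"
    runs:
      using: "docker"
      image: "Dockerfile"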


Most of my frustrations with GHA arise when doing something useful conflicts with someone’s idea of security. For example, branch protection rules intended to stop devs from yoloing commits blocking me from pushing a version bump commit during a release workflow.


The lack of exceptions for Actions is bizarre. And then they built repository rules or whatever, and made the same mistake again! You can at least exclude GitHub Apps now, but then you need to run all your repo actions with an app installation key :/


My biggest pain point is not being able to do proper ternary statements in workflows. And passing arrays to called workflows. My ‘runs-on’ statements need more than one label!

You can fake it by passing a stringified JSON object, but that’s bizarre.
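
For the record, the workarounds look roughly like this (input names invented): fromJSON() to turn the stringified array back into labels, and the && / || trick as a stand-in for a ternary:

    on:
      workflow_call:
        inputs:
          runner_labels:
            type: string            # e.g. '["self-hosted", "linux", "x64"]'
            required: true
          prod:
            type: boolean
            default: false
    jobs:
      build:
        # arrays can't be passed between workflows, so parse a JSON string
        runs-on: ${{ fromJSON(inputs.runner_labels) }}
        steps:
          # fake ternary; it breaks if the "true" branch happens to be falsy
          - run: echo "target is ${{ inputs.prod && 'prod' || 'staging' }}"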


I wish Github Actions were Lua instead of YAML. One could use a restricted environment populated with predefined functions, a restricted set of modules and all the advantages of syntax checking and language help.


Congratulations, you just reinvented Jenkins but using Lua instead of Groovy.


Not a bad thing. Also, Lua is smaller, saner, and (with LuaJIT) more performant than Groovy.


one thing to be aware of: if you use windows and submodules in a repo, ci will fail to clone over 50% of the time (with internal retries) with early eof errors

it's apparently a fixed problem in an upstream openssl beta that isn't bundled yet, but i think a custom runner would be better too (but not an option at my org)

still, some days it gets in a rut and quickly fails for 10+ runs, all billed at full rate. the entire eng group is severely annoyed by it because we cross-test all OSes on each commit, so it's becoming an expensive problem. basically 3x the billing on windows ci


I feel this pain. So much that now, when I need to set up new GitHub actions pipeline, I always create a new branch to push there until it's working. And then squash merge as a single commit.


For validating YAML I love CUE. It takes some work to define the schema — would def be nice if GitHub provided that.

I found Nektos act pretty usable for local testing when I was doing a lot of GA work.


The VSCode extension is quite good and solves some of the problems mentioned concerning detecting errors without needing to go through the commit-push-run cycle.


Concourse has it right: You can ssh into a failed container, and debug in the exact environment where the failure occurred.

I don’t understand why people settle for less.


Very good points, especially re. the security footguns


Just give me yaml anchors!

I'm so sick of having to duplicate pieces of my actions constantly


What's the technology behind the blog? It's clean, minimal and beautiful.


It's a Jekyll site. I originally built it off of a popular theme (back in 2014 or so), but these days the theme is just a custom thing that I've cobbled together.

(So, in a sense, there's no real technology "behind" it. It's just Markdown with a little HTML templating, with Jekyll as the SSG.)


Thanks for the reply. Is the code available by any chance?


Unfortunately not -- it used to be in a public repo, but I also track drafts and other things now that I don't want to be public until they're ready.

I'll see if I can clean it up into a public repository, but no guarantees :-)


Right click -> view page source.


Jekyll


I don't see how you love something that makes you jump through these hoops:

> In this particular case, it took me 4 separate commits (and 4 failed releases) to debug the various small errors I made: not using ${{ ... }} where I needed to, forgetting a needs: relationship, &c


We use Github Actions and we just don't have any issues with it outside the first time we set it up for each repo. Then we make 100s of commits a week and it does its thing and our work goes live a few seconds later. That's why I love it.

Could things be better? Of course; that's how software is--and this should resonate with most folks on this site. But just because some product isn't infallible doesn't mean we can't love it too.


>Then we make 100s of commits a week and it does its thing and our work goes live a few seconds later.

Wow, computers doing what they're supposed to do. Pretty impressive ...

The UX on GH actions is crap, other CI/CD solutions give you way more control and tooling. GH really needs to step up their game on this one.


I am not sure why you’re being so snarky, but I did address that it could be better.

The thing for us is that we’re already paying for GitHub and Actions are included. I know, for a fact, GH is/was working with stakeholders using their software to get feedback on what could be better. So they are trying!

I’m not here being a fanboy by any means; I’m just saying it does what I need it to do and I’m an active user who is pleased with the service provided.

YMMV.


> we just don't have any issues with it outside the first time we set it up for each repo

But that's exactly what a lot of commenters complain about.


There are tools like Act[0] that try to solve this, but this has been an issue with CI systems since they were invented.

[0] https://github.com/nektos/act


Also reused by gitea for their CI runner. I was quite impressed by that feat, pretty neat.

https://gitea.com/gitea/act_runner


Isn't that just programming? That's not much different than saying you forgot a bracket and had to make another commit to make it work. Granted, it would be nice if they had some linter.


Generally programming these days does not require you to submit build jobs or jump through other hoops to catch trivial mistakes. Although it depends on what you do, embedded developers and game programmers writing for consoles might disagree.

It really feels like a throwback to the punch card era.

I too don't enjoy writing CI scripts (despite knowing Linux administration and shell scripting quite well); it always takes an inordinate amount of time, but it also saves much more over the long term.


This is a use case where ChatGPT works great! I usually paste this kind of ops DSL into GPT and it will find my mistakes. GitHub Actions and GraphQL code, oof, they just click for me.


I'm not gonna post my employer's proprietary code into ChatGPT.


Don't fixate on the details, fixate on the larger point. Your specific situation of having code self-hosted and an employer that cares deeply about their proprietary code is a detail beside the point. LLMs shine at fixing up DSLs that you're unfamiliar with. ChatGPT is to LLMs what Kleenex is to facial tissue.


Yeah, I wouldn’t want to use any automation that you can’t also easily and quickly test locally.


Fun fact: Microsoft had a plan to provide that!

They canned it.

https://github.com/microsoft/azure-pipelines-agent/pull/2687...


Act can do most of it locally

https://github.com/nektos/act


More official or blessed packages for sure. I agree with that.



