Hacker News
Write Gitlab CI Pipelines in Python Code (gitlab.com/dbsystel)
127 points by DrSarez on April 29, 2021 | 95 comments



Another step in the endless cycle of configuration vs code.

It's not an accident that we are in a deep cycle of constrained configuration languages (YAML/JSON/etc.). People chose to go there because before that we had a cycle of using programming languages, and people hated it, for all sorts of good reasons.

Now I see we are on the way back into adding wrappers around the static config files, to turn them back into programming languages. This is happening all over the place - because guess what, it turns out, people hate static config files too, for all sorts of good reasons.

I am not sure if we will ever reach a compromise here or if we are just going to have to put up with endless change and churn because nobody is ever happy or at least willing to just settle for things that are "ok" but not "perfect".


The big difference is that config files are, at some point after interpolation and templating, static data structures composed of primitive data types.

Further, these configs usually act as declarative languages, giving directives to pipelines, controllers, etc., typically with idempotent side-effects.

This is just good software engineering. You have some static data structure which describes some result you want to exist in the world, and your pipeline/program makes that outcome manifest. This is the essence of boundary programming/hexagonal architecture.

https://www.destroyallsoftware.com/talks/boundaries
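The "static data in, idempotent effects out" shape described above is easy to sketch. This is a toy illustration only; all names are invented, not taken from the talk:

```python
from dataclasses import dataclass

# Toy illustration of the boundary idea: the desired outcome is plain,
# static data; a pure function computes what must change; only an outer
# imperative shell would perform side effects.
@dataclass(frozen=True)
class DesiredService:
    name: str
    replicas: int

def plan(current_replicas: int, desired: DesiredService) -> int:
    # Pure core: returns the delta, touches nothing.
    return desired.replicas - current_replicas

# The imperative shell would then apply the delta (start/stop instances,
# call an API, ...); re-running with the same inputs yields the same plan.
```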


Exactly! And this is perfectly achievable with high-level expressive languages too.

The important part is that the interface has to have the same semantics as a config file: atomic/transactional. For example the builder pattern does this. You can enjoy all the expressiveness and power of the high-level language, but at some point you create the final config/pipeline/job-graph, call whatever needs to be called, and the provider of the interface needs to validate that.
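As a rough sketch of what such a builder-style interface can look like (everything here is hypothetical, not any real CI API): arbitrary host-language logic runs up front, but validation happens once, at the single point where the final job graph is produced.

```python
from dataclasses import dataclass, field

@dataclass
class Pipeline:
    jobs: list = field(default_factory=list)

class PipelineBuilder:
    def __init__(self):
        self._jobs = []

    def job(self, name, script):
        # Chainable; the full power of the host language is available here.
        self._jobs.append({"name": name, "script": script})
        return self

    def build(self) -> Pipeline:
        # The single "atomic" step: validate, then hand over a static artifact.
        names = [j["name"] for j in self._jobs]
        if len(names) != len(set(names)):
            raise ValueError("duplicate job names")
        return Pipeline(jobs=list(self._jobs))

pipeline = PipelineBuilder().job("build", "make").job("test", "make test").build()
```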

That's why Terraform's semantics are better than Ansible's.


Yup. My main gripe with Ansible is exactly this. Too much mutable state, too imperative, too much reliance on "gathering facts". There's no singular source of truth. It tries to be declarative, but every Ansible deployment I've maintained (granted, not many) always accretes corner cases and config drift. Maybe that's just the nature of the domain - mutating live systems will always be more imperative than (re)deploying the whole thing from scratch.

I'd much rather use python/jinja/gomplate/what-have-you, do all my fancy expressive stuff with code I can check in, and get a single snapshot of plain, structured text (which I can also check in). Not to mention locking down your invariants (I can't recall how many times I've gotten bit by interpolating docker image uris with the wrong TAG). It's actually pushed me towards first Make and then later python cli tools (cause Makefile syntax is...ugh) to generate configs, rather than relying on any runtime business logic, cause it's just so much saner, traceable, auditable, etc.
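A minimal sketch of that generate-and-snapshot approach (the registry, tag set, and field names are all made up): the generator enforces the invariant, and its output is plain structured text you can check in.

```python
import json

# Single checked-in source of truth for allowed image tags (hypothetical).
KNOWN_TAGS = {"1.4.2", "1.4.3"}

def render_config(tag: str) -> str:
    # Lock down the invariant at generation time, not at runtime:
    if tag not in KNOWN_TAGS:
        raise ValueError(f"unknown image tag: {tag}")
    config = {"image": f"registry.example.com/app:{tag}"}
    # Emit a plain, structured snapshot suitable for checking in and diffing.
    return json.dumps(config, indent=2, sort_keys=True)
```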


... again, exactly. I just use Bash, sacrifice the usual few virgins at certain midnights per year and get on with my life. It's much faster to just type it, easier to debug, no problems with python modules, no YAML.

Sure, it is still Bash, it still requires selling your soul, but it's not like I was going to Haskell heaven anyway.


I think the only exit to this cycle is something like dhall, which adds features like functions and imports, but is "total" / forbids side-effects. https://dhall-lang.org/


This is deeply misguided, I think. Dhall is a neat hack, but it's still a hack.

Any and all attempts to munge YAML or to turn YAML into something Turing complete are signals that a scripting language would have been a better fit for the problem.

The exit to this cycle is:

* to use configuration languages for things that are genuinely configuration now and always

* scripting language APIs for things that are genuinely not configuration (e.g. build systems).

* provide both for domains that are genuinely mixed (a lot of kubernetes probably fits into this category - some things are configuration for some people and dynamic or generated for others).

It sounds simple but it is hard - the correct border between config and code is usually hard to perceive and how your tools end up being used is inherently unpredictable.


You just described the cycle, not a path to exit it. There is no such thing as "genuinely configuration now and always", nor "genuinely not configuration".

Systems change, and the requirements of those systems change. Now someone wants to do more with what used to be 'just configuration': scale it up and make it work across many instances by scripting configuration templates with parameters. Then: look, all of this scripted automation is doing the same things, which can be neatly divided into buckets, so can we simplify it and turn it into a few configuration options? And so the cycle repeats. The churn is not reducible to labeling it as pointless make-work; it's born of changing and growing systems.


Dhall is not turing-complete. That's the whole point.


That was incidental to dhall. Dhall doesn't, and was never intended to, help clean up the mess made by having ansible code be Turing complete (for example).

I'm pretty sure the whole point of dhall is to DRY out configuration files and to hack on type guarantees to config files.

I played with it for a bit and those are the only two use cases I could find. Those two use cases usually come encumbered with several others which it does not help with at all, but a scripting language would.

A good, refactored-to-be-DRY config file that is typesafe completely obviates any need for dhall and if you've got a Turing complete YAML monstrosity it's not going to help much.

It reminds me a bit of XSLT (except that was accidentally Turing complete of course).


This comment doesn't make any sense to me

> A good, refactored-to-be-DRY config file that is typesafe completely obviates any need for dhall and if you've got a Turing complete YAML monstrosity it's not going to help much.

Dhall is a "refactored-to-be-DRY config file that is typesafe", and the host of safety-related features ( https://docs.dhall-lang.org/discussions/Safety-guarantees.ht... ) make this practical to implement, even while evaluating untrusted & potentially malicious dhall code. These guarantees are much stronger than what nearly any other config or scripting language provides.


The point is you could all get that with YAML without plonking dhall on top if you had a properly designed schema.

Hence it's a hack to circumvent badly designed / no schema.


Let's look at a specific example. Take Kubernetes: everything is YAML, with complete schemas, all the way down. From your perspective this is configuration utopia, right? Meanwhile, back in reality, k8s is the poster child of "yaml hell". From the day it was released, people took one look at it, gave it a giant NOPE, and instantly spawned half a dozen templating languages. The most popular of these is Helm, which has a terrible, no good, very bad design: full of potential injection attacks from purely textual string substitution, manually specified indentation to embed parameterized blocks, virtually no intermediate validation, no way to validate unused features, etc. etc.

Compare to dhall which publishes a complete set of dhall-k8s schema mappings which enables you to factor out any design you want down to as few configuration variables as you like, while validating the configuration generators themselves at design time. https://github.com/dhall-lang/dhall-kubernetes#more-modular-...


>Let's look at a specific example. Take Kubernetes: everything is YAML, with complete schemas, all the way down. From your perspective this is configuration utopia, right?

LOL! No, their schemas are horribly designed and half of their YAML should really be APIs.

But it's hella popular.

>Compare to dhall

Which isn't.


Your argument is akin to "why do you need a high-level language? just write it in assembly, C is just a hack to circumvent badly designed assembly". If a 'properly designed yaml schema' was sufficiently powerful for all these use-cases then why does everyone and their third cousin come up with new configuration languages every week? Clearly they do, thus I must reject the null hypothesis and conclude that plain yaml schemas are not sufficient.

These arguments aren't holding any water.


>If a 'properly designed yaml schema' was sufficiently powerful for all these use-cases then why does everyone and their third cousin come up with new configuration languages every week?

Coz they fucked up/didn't fix their schema designs and people keep trying to work around that?

I thought I was being very explicitly obvious about this.

Programming is a history of bad designs being worked around with awkward hacks. This is nothing new.


Ah, so your argument is that they just aren't doing it right, that a true yaml schema designer wouldn't make this mistake. It's just a mistake... which keeps being made over and over by everyone who does configuration. Sounds like a no-true-scotsman and "it's the children who are wrong" to me. Can you give any examples of a popular, correctly designed yaml schema?


I'm saying that a schema designer's reaction to their users using jinja2 or dhall on their YAML ought to be to fix their schema so it's not necessary.

There are plenty of properly designed YAML schemas where people never feel the need to use something like dhall or jinja2.


I would put my money on lisp.

The structure of data and code is the same.

The result is that you start with configuration, but you always have a nice escape hatch.


Too much escape hatch. I would prefer my configuration to be less than turing complete, tyvm.


ytt is pretty nice too; it basically glues Starlark and YAML together.


Via comments! YAML is an abomination in itself (spaces for logic?!); the last thing it needed was comments for extra logic.


Not actually via comments; they just use the same syntax. I'm pretty sure the pre-templated code is not actually valid yaml.


Proper spacing makes code readable. Forcing it is a good thing.


It's a good thing, until you ask a non-technical user to edit the config file. Then they become confused because a tab is visually, but not functionally, equivalent to a number of spaces.


I wouldn't ask a non-technical person to write any of these config files.

If I want non-technical people to configure something, I need to write a GUI for self-service.


Indeed it does, when done via something like go fmt. But in the case of YAML, a pedantic level of "proper spacing" is used for logic. Two or three spaces shouldn't change the meaning and order of things.


Or cuelang


In my opinion, generating configuration from Python (or any language with a robust standard library) is still a win if it means fewer Ops and DevOps engineers are stringing together convoluted bash scripts that torture sed/awk/jq/yq to obscene lengths.


While it is true that Python scripts are far superior to cryptic bash scripts, those scripts shouldn't be on the CI's side. Nothing wrong with having build.py and test.py and the others, but there are no good reasons for them to be executable only by the CI. Debugging CI scripts is just asking for non-stop pager duty.


Almost all configuration specifications have a limit unless they are Turing complete. Complex systems easily evolve beyond static configurations, but the question is: can you leverage configuration generation to minimize logic inside your CI?

Orchestration ought not to be the province of your build scripts; that should be the role of CI. If the configuration isn't capable of variadic functionality but is capable of orchestration... what do you do?


What is stopping you from making a project dedicated to orchestration/configuration/deployment and using test containers to make those tests? How do you debug those "complex systems" when they fail?

I've made a project in one of my old jobs dedicated to configuration and deployment tools used by CI jobs. At first it got some criticism about "testing the CI that tests the software" and "testing the tests", but when the number of fires we had to take care of was cut in half it quickly became the norm.


You can call your own Python script from the pipeline, instead of using sed/awk/etc., right?


You make a good point.

As a developer who also dabbled in devops, I hate CI-side scripting with a passion.

The last place I worked, a few years back, had about 20 interdependent Jenkins jobs per project. During a build, a job would break another job, which would break another job, but not fast enough to intervene, so the majority of the devops guru's time was spent making hacks and fixes everywhere, thus fueling the endless cycle of madness. Middle management was aware of the problem but could never say no to new aggressive timelines, so there we were. But I digress.

For me a CI server is a way to automate tasks so that 1) devs' time is saved 2) all versions are release worthy 3) human mistakes are kept to a minimum 4) a public record is kept 5) long and boring tasks such as packaging releases, generating code coverage and running the static code analyzer do not bother devs and are done periodically.

What CI should not do: 1) do things that devs can't do 2) crash 3) run nondeterministic builds and tests 4) be understandable by only one guy named Brent (wink wink Phoenix Project).

A typical CI script should limit itself to a simple workflow: checkout revision, get dependencies, build, test, release; and those should all be things that can be done, tested, modified and debugged in a common development environment. Anything else is just borrowing time from the future.


One important thing about typical config languages is that they are much simpler to do static analysis on than a full-fledged programming language.

Static analysis, even just simple parsing, is a quite common need when you have to migrate, or gather statistics about, all the configurations in a given code base.


I think the main issue with using programming languages to run your CI/CD is that the whole point of CI/CD is to test and verify that your code works. Now you have code that may or may not be buggy running your CI/CD, so you'll have to set up CI/CD to test your test-running code, ad nauseam.

People might hate constrained configuration, but the idea is that you don't have to prove that what you're configuring is actually what's happening.


> Now you have code that may or may not be buggy

Doesn't your solution of replacing the possibly buggy code with configuration leave you with configuration that may or may not be buggy?


It's mainly a question of it being deterministic and reproducible, I think. The presumption is that you then go and test a bunch of stuff based on whatever was built, so no, it will not be buggy.

But if the code that actually goes to production is different from what is tested (because you rebuilt it based on a config with some runtime behavior that executed differently), you are compromising the whole thing and all bets are off.

People will try to split the difference and generate a static config with code and test that as a reproducible entity, but then you have another set of tradeoffs (not least, complexity).


Template languages are meant to change a lot less than programming languages and adapt when they do. I see companies using programming languages like this and then in a year the libraries are no longer maintained, they have some churn and their new employees then want to rewrite everything. I rarely think it is worth writing CI/CD in a programming language unless you are committed to supporting it for its lifetime. Most companies will not be able to and they'll end up back at the standard templates sooner or later.


I get your instinct, but there is one thing missing. In theory, my CI isn't really being run in a known environment. I can't write a shell script to power the CI, because how am I to know the CI launcher even has a shell? Of course, once an environment has been deployed, I am safe to switch to writing shell scripts, calling system commands, etc. But it does make some sense to be completely static at the start, when I haven't even picked an operating system.


The middle ground between flexibility of a programming language and constraints of a static config syntax appears to be right about where Dhall’s niche is.


The line between configuration and code is fuzzy in mental space, so it makes sense we can't decide how to resolve it in digital space.

I believe the answer is "it depends on how you're using it" and generally speaking, customer needs override any pre-existing consensus on how it's being used.


I quite like how Helm charts solve the problem. You have a values.yaml, which is plain yaml, but at the same time you have the Helm template language, which allows you to create more complex configuration templates with functions and flow control.
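Roughly, the split looks like this (chart fields invented for illustration): plain data on one side, template logic on the other:

```yaml
# values.yaml -- plain, static, declarative data
replicaCount: 3

# templates/deployment.yaml -- Go-template logic layered on top
# (shown here as comments so this stays a single valid file):
#   spec:
#     replicas: {{ .Values.replicaCount }}
```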


> before that we had a cycle of using programming languages

When was that?


> nobody is ever happy or at least willing to just settle for things that are "ok" but not "perfect"

I hope we never settle for things that are "ok" but not "perfect".


Several weeks ago I poked around GitHub's CI workflow references. It seems that there's a clear path for crafting a workflow that simply grabs the text of each issue (as it's submitted) and appends it to the documentation before closing the issue, perhaps marking it as "expected behavior". In short, an evening of tinkering could automatically turn all of a project's bugs into features for the foreseeable future.

This would represent a quantum leap in software quality assurance.


At GitLab we take the 'Release notes' section of each issue and use it to automatically generate the release post. For an example of a release post see https://about.gitlab.com/releases/2021/04/22/gitlab-13-11-re...


What I would really like to see is a CI system that lets me write a script in a language of my choice instead of defining a pipeline config file. That way I can run the pipeline locally, put breakpoints in, etc.

Nuke [1] gets close but there are still a lot of tasks that don't have C# bindings, such as publishing build artifacts and uploading test results.

While I'm dreaming about my perfect CI, I'd also like the ability to download benchmark results from previous commits so that I can generate trend graphs and publish them in the build results.

To do this right the CI system would have to have an API using REST, GraphQL, gRPC or some such API format that generates clients in many languages. That way they don't have to maintain bindings in every language.

[1] https://nuke.build/


I think this is more of an issue with people misusing CI systems; your GitLab CI/Jenkins/Concourse/etc. shouldn't have any real logic in it, it's just glue code.

Bazel/Gradle/Maven/etc. tend to push you in the right direction; bash etc. don't.

Or to put it succinctly, you need both a build system and a CI system, they are not the same thing.


I think the main issue is that most build systems are not expressive enough to do the complex packaging, code gen, etc. that most projects need. I've been using Bazel for a few projects/teams and it's worked really well. Once your build system is as expressive as you need, you get a lot of freedom. Bazel also lets you execute completely local builds and remote builds triggered locally. If you set this up, debugging CI issues is amazingly simple.


Totally agree. I've never used Bazel; it sounds a lot like Nuke. I had a coworker whose job was to maintain the CI. His commit messages would look like this sometimes:

> Fix CI issue with blah blah blah

> Hmm that didn't work lets try something from stackoverflow

> Build fix

> Build fix please work

> Please

> I hate my life

> I am tired and hungry, I want to go home

> Stupid yaml

The CTO would call him like a week later to make sure he was okay. This happened often before we switched to Nuke. It rarely happened afterwards. It's awesome being able to debug your CI locally.


The problem in C++ land (my land) is all your dependencies are probably using different build systems.

Do you want to migrate all your third-party libs to Bazel, or just hack up a few lines of CI yaml to call the author's build system?

I lean on gitlab yaml quite heavily and it feels quite effortless


We vendored all of our third party dependencies into our Bazel build. There were a lot and it was a giant PITA but over a multi-year horizon it was worth it. That doesn't mean it's the right choice for every business but it paid off for us.


Vendoring all your third party dependencies is fine, but you still have to build them.

Personally I use the native build system that upstream uses for most of them, and just hack in my compiler flags/options where necessary. Then I use CI triggers to rebuild my projects when the dependencies are rebuilt.


Yep, and if you get everything working smoothly enough developers can use the same very polished tooling for their own workflow.


Check out tekton CI (https://tekton.dev/), it's a Kubernetes operator to run a CI pipeline that's defined as commands running inside any container. Use any language, any commands, etc--as long as you can get a container image, you're good to go. There's a growing set of community created and curated actions to do common things too: https://github.com/tektoncd/catalog

Yeah you need a k8s cluster, but even a simple kind dev cluster that you spin up in 30 seconds with one command on your laptop will work.

I like it a lot because it enforces very little structure on you and doesn't reinvent everything. Stuff like storage (either ephemeral or existing volumes), secrets, configuration, etc. are already modeled and supported by Kubernetes, and tekton can use all of that natively. And since it's all k8s-native stuff you have all of the power of k8s, like its entire API for manipulating and managing execution, exposing services, etc. There is very little cognitive overhead or new things to learn once you know k8s.

If you're really averse to k8s though, check out drone. It has a local execution mode that is similar and just runs whatever pipeline commands you want in docker containers. https://github.com/drone/drone Batect is another even more minimal tool that's effectively just a docker-based workflow system: https://github.com/batect/batect


We use Tekton to manage our CI pipeline and I agree that the way it enforces very little structure is a strength. On the other hand it's new enough that if you need to stretch its capabilities you are going to have to get creative. The primitives it has are nice, but they have their limits.

For instance, if running a bunch of parallel tasks, collating results on a PV is out the window unless your cluster supports multiple writer volume types, which GKE does not. You have to bring in NFS volume types or something like that for it. In the early days of tekton they had a results primitive which synchronized an output dir to GCS, but they decommissioned that. So you are left pushing that logic into your task command. Running gsutil is easy enough, but it means you are pushing logic into your scripts and not declaring steps in the pipeline definition. You could make that command a step but I see little benefit in that.

Additionally there is no way to loop in the configuration to generate tasks, much less loop with an ordinal value. We end up just programmatically generating the resource definitions with ruby erb templates. All of our pipeline specs (including task runs, etc.) create a 2MB yaml file. We push dozens and dozens of these into k8s daily. It works, but at the same time our usage of Tekton is more or less as a glorified alternative to batch jobs, which works because batch jobs _still_ don't have a proper sidecar capability and also because we rely on the DAG to order dependent taskruns.

If your pipeline is simple, look at Tekton. But if your pipeline is complex... still look at Tekton, but expect to do some work. Once you get a good workflow, though, you can scale your pipelines as easily as you can a deployment in k8s. We use node autoscaling and preemptibles (Tekton can retry if a task disappears due to node reclamation) to manage our CI costs quite effectively.


> Running gsutil is easy enough, but it means you are pushing logic into your scripts and not declaring steps in the pipeline definition.

Some other comments here have argued for pushing as much logic as possible into your scripts, so that they can be tested without the CI system. What's the downside of doing this?


I've been meaning to give Tekton a try, but how is the visualization (pipeline graphs etc.) and GitHub/GitLab CI integration for returning results and viewing build output?


There's a super basic dashboard: https://github.com/tektoncd/dashboard Tekton is really more of a lower level engine for CI though and doesn't focus on building UI, etc. You might check out jenkinsx which uses tekton but builds a whole new jenkins experience on it: https://jenkins-x.io/


> What I would really like to see is a CI system that lets me write a script in a language of my choice instead of defining a pipeline config file. That way I can run the pipeline locally, put breakpoints in, etc.

Every tool I've used allows you to do this. Just write the script in the language of your choice, then run it in the pipeline.

A pipeline should be considered like "markup" around your scripts. Everything should work independently and on any system. The pipeline config just tells the scheduler which order to run it in, which bits can run in parallel, what artifacts to keep etc.
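For example, a `.gitlab-ci.yml` in that spirit stays pure "markup" (the script paths here are made up); each step is an ordinary script you can also run and debug locally:

```yaml
stages: [build, test]

build:
  stage: build
  script:
    - python scripts/build.py      # same command works on a dev machine

test:
  stage: test
  script:
    - python scripts/run_tests.py
```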


In practice the YAML seems to be used to hook things up - uploading assets, sending messages on slack, etc.

If GitLab/GitHub could just provide hooks to do this stuff for a range of scripting languages that could be easily dry-run, this would solve the problem.


My main problem is running the job/step locally, not waiting for CI steps to validate some change.

My solution is: https://github.com/rosineygp/mkdkr

I can write something like this:

  py:
   @$(dkr)
   instance: python:3.8
   run: pip install requests
   export MKDKR_SHELL=python
   run: 'import requests
      r = requests.get(f"https://api.isevenapi.xyz/api/iseven/2/")
      print(r.json())'


GCP takes a really flexible approach with their Cloud Build CI/CD offering. Each build step is just a docker image URL, entrypoint, and args.

Want to do something fancy and complex? Just pull in a python image and call your python script. But you can still implement simple build steps directly in the build file. No need for a multitude of language bindings, everything you want to do is probably implemented with simple shell commands.


It's almost like we've turned yaml into some sort of primitive assembly file that must be compiled by a higher-order language. What was once created for human readability, has now turned into a machine generated and machine parsed format.


It was created for human readability, but always with machine interop in mind; otherwise English (or any natural language) would probably do just fine in most cases, especially if sprinkled with some markup.


> We started this project because of Gitlab CI yaml files growing over thousands of lines.

Is this typical? All of our Gitlab CI files are well under 100 lines. What sorts of things are these pipelines doing that require so much configuration?

Our CI steps are basically:

* Build

* Run some static analysis

* Test

* Publish build artifacts

With each step taking only a few lines. Most of the “heavy lifting” is managed by other tools like npm, or some scripts we have checked into the project, and our CI process just kicks off those steps.


My largest one is 200 lines. But yeah, most of the heavy lifting should be done by other tools. The most common "script" for a build step should be simply "make". For Python, almost everything is run through tox, so I have "tox" for tests, "tox -e wheel" for the packaging etc.


If you've got thousands of repositories deploying services written in different languages to many different environments and multiple clusters within each environment then your CI files can get quite lengthy. Using Docker helps standardize things, but our shared pipeline repo is still 3.8k lines of YAML.


I think if we're going that way, it would be nice to have a single library that can generate configurations for multiple CI systems (GitLab CI, GitHub Actions, Jenkins, CircleCI, etc)
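In the spirit of that idea, a sketch (the model and emitters are invented here, not an existing library): one neutral job model, serialized per backend.

```python
import json

# Hypothetical neutral model of a CI job, independent of any one CI system.
class Job:
    def __init__(self, name, commands):
        self.name = name
        self.commands = commands

def to_gitlab(jobs):
    # GitLab CI shape: top-level job names mapping to a script list.
    return {j.name: {"script": j.commands} for j in jobs}

def to_github_actions(jobs):
    # GitHub Actions shape: jobs with a runner and per-command steps.
    return {"jobs": {j.name: {"runs-on": "ubuntu-latest",
                              "steps": [{"run": c} for c in j.commands]}
                     for j in jobs}}

jobs = [Job("test", ["make test"])]
# Each emitter renders the same model; dumped as JSON (a YAML subset) here.
print(json.dumps(to_gitlab(jobs)))
```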


It seems like a lot of people don't understand that having a DECLARATIVE language/configuration/whatever is such a huge advantage it's insane. It's so easy to write, you can never make logical mistakes, you avoid all kinds of bugs, it's just easy to learn and you can just write the desired result.

I never understood these projects where somebody takes a very easy declarative thing and makes it imperative.

Programming in Python is an order of magnitude "harder" and more error prone than writing a simple YAML file.


> Programming in Python is an order of magnitude "harder" and more error prone than writing a simple YAML file.

For a simple config, sure.

But as soon as you have something more complicated, you end up with:

- a frankenstein-monster DSL, with pseudo flow control + funcs baked in through a patched-up templating system

- weird error messages, no stack trace, and type errors that accumulates

- no DRY

- no tooling: type checks, linting, debugger, logger. Forget about it.

Case in point: ansible playbooks.

So unless your config is going to stay under 20 lines, just use a real programming language. If you don't want to use Python, fine. Use dhall, nix, jsonnet or something else.


Slightly relevant: I made a hack to write Ansible playbooks in Python [1]. It doesn't fix everything, for example runtime control flow still has to be described as attributes of tasks, but it does help with parametrising. It's a question whether it's worth it to add such a hack or if I should just make do with YAML.

[1]: https://gitlab.com/-/snippets/2112382


It's a great hack actually.


You have linting and a logger in Ansible. Type checks, it kind of depends.

Debugging is definitely an area where I'd also like to see something better come out of Ansible than what is currently available.


Have you seen the kind of "config files" YAML is being used for nowadays? Gitlab CI, Ansible, Saltstack, various low-code tools... It's no better than a scripting language, because it turns out that if what you need is to encode logic and control flow, you can't get away from it by shouting "DECLARATIVE" and pretending the problems don't exist. Pretty much all of these complex tools that use YAML don't use it for declarative purposes, but as a kind of hybrid AST/DSL. You're not avoiding anything except maybe the need to parse a proper language.


I have made something similar for Tekton (Kubernetes) with Jsonnet, https://mustafaakin.dev/posts/2020-04-26-using-jsonnet-to-ge... but the eternal cycle of configuration vs code keeps bothering me every few years and makes me question myself.


This is cool, but I'd rather my CI jobs are just k8s or nomad or systemd jobs. Then the code we deploy is the same as the code we use to build, and it doesn't matter WHAT you do in build land, go nuts and do whatever you want.

CI systems will either grow into general purpose code runners or they will wither and die.


Gitlab CI is already basically that


Can I run random long-lived jobs, like say a PostgreSQL instance, under it, and keep it monitored and sane?


Reminds me of the original Atlassian Bamboo pipeline-as-code implementation that had you write the pipeline in Java or Groovy, can't remember specifically. They got dunked on for it not being a declarative file.

Now we’ve got a pipeline being defined by a declarative file being generated by code.


We use Bamboo at my job, and it's Java that generates a YAML file that is then shoved into Bamboo. It's a complete shitshow. Why would I use a Turing-complete language to generate a declarative YAML file containing Turing-complete shell scripts? Why not just write the shell script in the first place?


Because the shell scripts are only run from _within_ pipeline steps. Gitlab CI yaml has no way of conditionally creating the pipeline steps, or the step's contents.
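As a minimal illustration of what the static YAML can't express, a generator can decide at build time whether a job exists at all. Job names and scripts here are hypothetical; JSON is valid YAML, so the output can be handed to GitLab as-is:

```python
import json

def build_pipeline(run_integration_tests: bool) -> dict:
    """Assemble GitLab CI jobs conditionally in ordinary Python."""
    jobs = {
        "build": {"stage": "build", "script": ["make build"]},
        "unit-test": {"stage": "test", "script": ["make test"]},
    }
    if run_integration_tests:
        # This job simply does not exist in the generated config otherwise.
        jobs["integration-test"] = {
            "stage": "test",
            "script": ["make integration-test"],
        }
    return {"stages": ["build", "test"], **jobs}

print(json.dumps(build_pipeline(run_integration_tests=False), indent=2))
```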


That is interesting. In fact I googled it and found a blog post with really nasty Java code defining a pipeline. Of course, in code you can do nasty things and create insane pipelines. Thus I think it is good to have both: the declarative base and a generator on top. The declarative base ensures there is an easy way into the pipeline mechanics. But if you plan to write really complex pipelines, the pipeline-as-code approach jumps in. At that point you're firm with the basics and know what you're doing in code.

Amazon is doing the same with the Cloud Development Kit (CDK). With CDK you can code your infrastructure in a number of languages (Java, C#, Python, TypeScript), which is synthesized into CloudFormation to be finally deployed. For smaller projects and teams not firm with one of those languages, plain CFN may be much better. However, after learning CDK you won't create any infrastructure without it.


It reminds me of troposphere [1]: similarly using an imperative language (also Python) to generate a declarative file (CloudFormation).

[1] https://github.com/cloudtools/troposphere


JetBrains took that approach in Space, also, only choosing Kotlin in order to side-step(?) the "groovysayswhat?" ambiguity and anti-discoverability of Jenkinsfile


Gitlab can load its CI file from a remote web server. What I have done is generate the YAML programmatically and return it from the web server. Less than ideal, but it at least allows dynamic creation.


You can just have a CI job generate a CI config file and have GitLab CI load that into a child pipeline. Look up dynamic pipelines.
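The parent config for that pattern, per the GitLab parent/child pipeline docs, looks roughly like this (job names and the generator script are hypothetical):

```yaml
generate-config:
  stage: build
  script:
    - python generate_ci.py > child-pipeline.yml
  artifacts:
    paths:
      - child-pipeline.yml

run-child:
  stage: deploy
  trigger:
    include:
      - artifact: child-pipeline.yml
        job: generate-config
```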


I came across this great blog post: https://www.objectif-libre.com/en/blog/2021/02/23/a-new-era-...

Documentation: https://docs.gitlab.com/ee/ci/parent_child_pipelines.html

GitLab 13.10 also added support for parallel matrix job execution in child pipelines, speeding up the execution once more.

https://about.gitlab.com/releases/2021/03/22/gitlab-13-10-re...

There's more to dynamic pipeline creation, collecting blog post ideas in https://gitlab.com/gitlab-com/www-gitlab-com/-/issues/11122#... :)


Our shared pipeline used to create child pipelines and we wouldn't recommend them.

If all the shared pipeline jobs show up in your consumer project pipeline, you can easily extend them or overwrite them entirely.


Ya, as I understand it, that's all this Python CI lib is doing. I find it a little annoying that you first have to start a worker to create the pipeline file, and then again to run the pipeline's steps.


In this day and age it's not that weird. GitHub's own staff are spawning CI jobs just to add a label to each new issue: https://github.blog/2021-04-28-use-github-actions-manage-doc...


I feel like there were some weird edge cases I was having with child pipelines. Maybe it was something with artifacts, don't remember.


Artifacts, passing environment variables from the main pipeline to the child pipeline, expanding variable references in the child pipeline and getting unexpected results...

We're migrating our shared pipeline away from child pipelines. Maybe one day we'll do a write-up. :)


> “We use Bamboo at my job, and it's Java that generates a YAML file that is then shoved into Bamboo. It's a complete shitshow.”

> “What I have done is generate the yaml programmatically and this gets returned from the web server. Less than ideal but it at least allows dynamic creation”

The amount of polarization in this comment section is making me dizzy.



