
It's weird that people keep building DSLs or YAML-based languages for build systems. It's not a new thing, either - I remember using whoops-we-made-it-Turing-complete Ant XML many years ago.

Build systems inevitably evolve into something Turing complete. It makes much more sense to implement build functionality as a library or set of libraries and piggyback off a well-designed scripting language.
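
A minimal sketch of that idea, assuming a hypothetical `task` helper (illustrative only, not a real package): build rules become plain functions, and the host language supplies all the control flow.

  # Build-as-a-library sketch: rules are plain Python functions plus
  # their dependencies; `task` and `run` are hypothetical helpers.
  import subprocess

  TASKS = {}

  def task(*deps):
      def register(fn):
          TASKS[fn.__name__] = (deps, fn)
          return fn
      return register

  def run(name, done=None):
      done = set() if done is None else done
      if name in done:
          return
      deps, fn = TASKS[name]
      for dep in deps:
          run(dep, done)
      fn()
      done.add(name)

  @task()
  def compile_objects():
      subprocess.run(["cc", "-c", "main.c"], check=True)

  @task("compile_objects")
  def link():
      subprocess.run(["cc", "main.o", "-o", "app"], check=True)

  run("link")  # loops, conditionals, etc. come free with the language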




> Build systems inevitably evolve into something Turing complete.

CI systems are also generally distributed. You want to build and test on all target environments before landing a change or cutting a release!

What Turing complete language cleanly models some bits of code running on one environment and then transitions to other code running on an entirely different environment?

Folks tend to go declarative to force environment-portable configuration. Arguably that's impossible and/or inflexible, but the pain that drives them there is real.

If there is a framework or library in a popular scripting language that does this well, I haven't seen it yet. A lot of the hate for Jenkinsfile (allegedly a Groovy-based framework!) is fallout from not abstracting the heterogeneous environment problem.


>What Turing complete language cleanly models some bits of code running on one environment and then transitions to other code running on an entirely different environment?

Any language that runs in both environments with an environment abstraction that spans both?

>Folks tend to go declarative to force environment-portable configuration.

Declarative is always better if you can get away with it. However, it inevitably hamstrings what you can do. In most declarative build systems some dirty Turing-complete hack will inevitably need to be shoehorned in to get the system to do what it's supposed to. A lot of build systems have tried to pretend that this won't happen, but it always does eventually, once a project grows complex enough.


> Any language that runs in both environments with an environment abstraction that spans both?

Do you have examples? This is harder to do than it would seem.

You would need an on-demand environment setup (a virtualenv and a lockfile?) or a homogeneous environment and some sort of RPC mechanism (transmit a jar and execute). I expect either is possible, though the required verbosity and rigor would impede significant adoption.

Basically, I think folks are unrealistic about the ability to be pithy, readable, and robust at the same time.


>Do you have examples? This is harder to do than it would seem.

Examples of cross platform code? There are millions.

>You would need an on demand environment setup (a virtualenv and a lockfile?) or a homogeneous environment and some sort of RPC mechanism (transmit a jar and execute). I expect either to be possible, though I expect the required verbosity and rigor to impede significant adoption.

Why need it be verbose? A high-level requirements file, a lockfile, and one or two code files ought to be sufficient for most purposes.


We're not talking about a program that could run on any given platform (cross-platform). We're talking about one program that is distributed across several machines in one workflow. That's a form of distributed, (usually) heterogeneous computation. And typically with a somewhat dynamic execution plan. Mix in dynamic discovery of (or configuration of) executing machines and you have a lot of complexity to manage.

This is why I wanted to see some specific examples. I haven't seen much success in this space that is general purpose. The closest I have seen is "each step in the workflow is a black box implemented by a container", which is often pretty good, though it isn't a procedural script written in a well known language. And it does make assumptions about environment (i.e. usually Linux).


I call this the fallacy of apparent simplicity. People think what they need to do is simple. They start cobbling together what they think will be a simple solution to a simple problem. They keep realizing they need more functionality, so they keep adding to their solution, until just "configuring" something requires an AI.


Scripting languages aren't used directly because people want a declarative format with runtime expansion and pattern matching. We still don't have a great language for that. We just end up embedding snippets in some data format.


Who are the "people" who really want that, are responsible for a CI build, and are not able to use a full programming language?

I used Jenkins Pipeline for a while, with Groovy scripts. I wish it had been a type-checked language, to avoid failing a build after 5 minutes because of a typo, but it was working.

Then, somehow, the powers that be decided we had to rewrite everything as a declarative pipeline. I still fail to see the improvement, but doing "build X, build Y, then if Z build W" is now hard to do.


People used to hate on Gradle a lot, but it was way better than dealing with YAML IMO. Add in the ability to write build scripts in Kotlin and it was looking pretty good before I started doing less Java.

I think a CI system using JSON configured via TypeScript would be neat to see. Basically the same thing as Gradle via Kotlin, but for a modern container-based (i.e. Docker) CI system.

I can still go back to Gradle builds I wrote 7-8 years ago, check them out, run them, understand them, etc. That's a good build system IMO. The only thing it could have done better was pull down an appropriate JDK, but I think that was more down to licensing / legal issues than technical, and I bet they could do it today since the IntelliJ IDEs do that now.


Uhh, I don't know. All the Groovy knobs on Jenkins (especially the CloudBees enterprise one) and Nexus enabled a ridiculous amount of customisation, which, while it made me a load of consultancy money, taught me the lesson that most of the time it's better to adapt your apps to your CI and infra than to try and adapt your CI and infra to your apps.

I much prefer GitLab + k8s to the nightmare of Groovy I provided over the last decade anyway...


It's funny. If you stick around this business long enough you see the same cycles repeated over and over again. When I started in software engineering, builds were done with Maven and specified using an XML config. If you had to do anything non-declarative you had to write a plugin (or write a shell script which called separate tasks and had some sort of procedural logic based on the outputs). Then it was Gradle (or SBT for me when I started coding mostly in Scala), which you could kind of use in a declarative way for simple stuff but which also allowed you to just write code for anything custom you needed to do. And one level up, you went from Jenkins jobs configured through the UI to Jenkinsfiles. Now I feel like I've come full circle with various GitOps-based tools. The build pipeline is now declarative again, and for any sort of dynamic behavior you need to either hack something or write a plugin of some sort for the build system which you can invoke in a declarative configuration.


It's so true. I used Ant > Maven > Gradle. The thing that I think is different about modern CI is there's no good, standard way of adding functionality. So it's almost never "write a plugin" and always "hack something together". And none of it is portable between build systems, which are (almost) all SaaS, so it's like getting the absolute worst of everything.

I'll be absolutely shocked if current CI builds still work in 10 years.


This is kind of why I like keeping them as dumb as possible. Let each of your repos contain a ./build, ./test and ./run, and the CI does stuff based on those assumptions...

You're switching from RPMs -> k8s? Actually nothing has to change per repo for this.

Also it creates a nice standard that is easily enforced by your deployment pipelines: no ./run? Then it's undeployable. kthxbye etc..

This becomes important when you have >50 services.
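
A minimal sketch of the runner side, assuming only that contract (the Python shape here is illustrative):

  # Deliberately dumb CI runner: the whole interface is that each repo
  # ships executable ./build and ./test, plus ./run if it's deployable.
  import os
  import subprocess
  import sys

  def ci(repo):
      for step in ("./build", "./test"):
          subprocess.run([step], cwd=repo, check=True)  # fail fast
      if not os.access(os.path.join(repo, "run"), os.X_OK):
          sys.exit(repo + ": no ./run, marking undeployable")

  ci(sys.argv[1])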


Haha. I'd be surprised if they work NEXT year.


That's sweet. It's Thursday morning here. They probably already don't work any more.


Is Maven old now... oh uh... gotta get with the cool kids


I was waiting for Jai to see how the build scripts are basically written in... Jai itself.

It seems that Zig [1] already does it. Hoping to try that someday...

[1] https://ziglearn.org/chapter-3/


You can activate type checking in Groovy with @CompileStatic. It's an all-or-nothing thing though (for the entire file).


Joe Beda (k8s/Heptio) made this same point in one of his TGI Kubernetes videos: https://youtu.be/M_rxPPLG8pU?t=2936

I agree 100%. Every time I see "nindent" in yaml code, a part of my soul turns to dust.


> Every time I see "nindent" in yaml code, a part of my soul turns to dust.

Yup. For this reason it's a real shame to me that Helm won and became the lingua franca of composable/configurable k8s manifests.

The one benefit of writing in static YAML instead of dynamic <insert-DSL / language> is that, regardless of primary programming language, everyone can contribute; more complex systems like ksonnet start exploding in first-use complexity.


I wouldn't say Helm has won, honestly. The kubectl tool integrated Kustomize and it's sadly way too underutilized. I think it's just that the first wave of k8s tutorials that everyone learned from were all written when Helm was popular. But now, with some years of real use, people are less keen on Helm. There are tons of other good options for config management and templating -- I expect to see it keep changing and improving.


Are there good Kustomize bases for things like Redis, Postgres, MySQL, etc.? My impression is that most "cloud-native" projects ship raw manifests and Helm charts, or just Helm charts. By "won" I just mean in terms of community mind-share, not that they built the best thing. But I could be out of date there.

I do like kustomize, but the programming model is pretty alien, and they only recently added Components to let you template a commonly-used block (say, you have a complex Container spec that you want to stamp onto a number of different Deployments).

Plus last I looked, kustomize was awkward when you actually do need dynamic templating, e.g. "put a dynamic annotation onto this Service to specify the DNS name for this review app". Ended up having to use another program to do templating which felt awkward. Maybe mutators have come along enough since I last looked into this though.


Some of us use Make + envsubst[0] (and more recently Make + Kustomize[1]) in defiance.

I haven't found time to take a look at Helm 3 yet, though; it might be worth switching to.

[0]: https://www.vadosware.io/post/using-makefiles-and-envsubst-a...

[1]: https://www.vadosware.io/post/setting-up-mailtrain-on-k8s/#s...


You can just default to something like

  """
  apiVersion: v1
  appName: "blah"
  """.FromYaml().Execute()
or something.


I wish more people who for some reason are otherwise forced to use a textual templating system would remember that every JSON object is a valid YAML value, so instead of fiddling with indent you just ".toJson" or "| json" or whatever your syntax is, and you'll get something less brittle.

(Or use a structural templating system like jsonnet or ytt.)
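
Concretely (a minimal Python illustration of the trick, independent of any particular template engine): YAML's flow style is a superset of JSON, so a serialized object can be dropped in on one line at any indentation depth.

  # Any JSON value is valid YAML, so emitting an object via json.dumps
  # sidesteps nindent-style indentation fiddling entirely.
  import json

  labels = {"app": "web", "tier": "frontend"}

  # One line of flow-style JSON parses the same at any nesting depth:
  print("metadata:\n  labels: " + json.dumps(labels))
  # metadata:
  #   labels: {"app": "web", "tier": "frontend"}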


Things like Rake always made more sense to me - have your build process defined in a real programming language that you're actually using for your real product.

Then again, grunt/gulp was a horrible, horrible shitshow, so it's not a silver bullet either...


The way I would categorize build systems (and by extension, a lot of CI systems) is semi-declarative. That is to say, we can describe the steps needed to build as a declarative list of source files, the binaries they end up in, along with some special overrides (maybe this one file needs special compiler flags) and custom actions (including the need to generate files). To some degree, it's recursive: we need to build the tool to build the generated files we need to compile for the project. In essence, the build system boils down to computing some superset of Clang's compilation database format. However, the steps needed to produce this declarative list are effectively a Turing-complete combination of the machine's environment, user's requested configuration, package maintainers' whims, current position of Jupiter and Saturn in the sky, etc.

Now what makes this incredibly complex is that the configuration step itself is semi-declarative. I may be able to reduce the configuration to "I need these dependencies", but the list of dependencies may be platform-dependent (again with recursion!). Given that configuration is intertwined with the build system, it makes some amount of sense to combine the two concepts into one system, but they are two distinct steps and separating those steps is probably saner.

To me, it makes the most sense to have the core of the build system be an existing scripting language in a pure environment that computes the build database: the only accessible input is the result of the configuration step, no ability to run other programs or read files during this process, but the full control flow of the scripting language is available (Mozilla's take uses Python, which isn't a bad choice here). Instead, the arbitrary shell execution is shoved into the actual build actions and the configuration process (but don't actually use shell scripts here, just something equivalent in power to shell scripts). Also, make the computed build database accessible both for other tools (like compile_commands.json is) and for build actions to use in their implementations.
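
A toy version of that pure core (the restricted-globals trick below is an assumption for illustration, not Mozilla's actual mechanism):

  # Evaluate a build description with no builtins, no imports and no
  # I/O; the only input is the configuration, the only output is data.
  DESCRIPTION = """
  sources = ["main.c", "util.c"]
  programs.append({"name": "app", "srcs": sources, "cflags": CONFIG["cflags"]})
  """

  def compute_build_database(config):
      env = {"__builtins__": {}, "CONFIG": config, "programs": []}
      exec(DESCRIPTION, env)  # pure: no open(), no subprocess, no imports
      return env["programs"]

  print(compute_build_database({"cflags": ["-O2"]}))
  # [{'name': 'app', 'srcs': ['main.c', 'util.c'], 'cflags': ['-O2']}]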


I think what you are getting at is a "staged execution model", and I agree.

GNU make actually has this, but it's done poorly. It has build STEPS in the shell language, but the build GRAPH can be done in the Make language, or even Guile scheme. [1]

----

I hope to add the "missing declarative part" to shell with https://www.oilshell.org.

So the build GRAPH should be described as you say. It's declarative, but you need metaprogramming. You can think of it like generating a Ninja file, but using reflection/metaprogramming rather than textual code generation.

And then the build STEPS are literally shell. Shell is a lot better than Python for this use case! e.g. for invoking compilers and other tools.
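
As a hedged sketch of that two-stage split (file names and rules here are made up): Python metaprograms the GRAPH, every STEP is a one-line shell command, and Ninja executes the result.

  # Stage 1: compute the build graph with real loops and conditionals.
  # Stage 2: each build step is a plain shell command executed by Ninja.
  modules = ["foo", "bar"]

  with open("build.ninja", "w") as f:
      f.write("rule cc\n  command = cc -c $in -o $out\n")
      f.write("rule link\n  command = cc $in -o $out\n")
      for m in modules:
          f.write("build %s.o: cc %s.c\n" % (m, m))
      objs = " ".join(m + ".o" for m in modules)
      f.write("build app: link %s\n" % objs)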

I hinted at this a bit in a previous thread: https://news.ycombinator.com/item?id=25343716

And this current comment https://lobste.rs/s/k0qhfw/modern_ci_is_too_complex_misdirec...

Comments welcome!

[1] aside: Tensorflow has the same staged execution model. The (serial) Python language is used for metaprogramming the graph, while the highly parallel graph language is called "Tensorflow".


What is a well-defined "scripting" language? Lua, Python, Ruby?

I do agree it'd be nice with a more general-purpose language and a lib like you say, but should this lib be implemented in Rust/C so that people can easily integrate it into their own language?

Many unknowns but great idea.


Literally any real language would be better. Even if I have to learn a bit of it to write pipelines, at least I'll end up with some transferable knowledge as a result.

In comparison, if I learned GitHub Actions syntax, the only thing I know is... GitHub Actions syntax. Useless and isolated knowledge, which doesn't even transfer to other YAML-based systems because each has its own quirks.


On the theme of the posted article: “literally any real language would be better” is how I feel every time I write CMake/autotools/etc.


I'd say it's not about the capabilities of the language, but the scope of the environment. You need a language to orchestrate your builds and tests (which usually means command execution, variable interpolation, conditional statements and looping constructs), and you need a language to interact with your build system (fetching code, storing and fetching build artifacts, metadata administration).

Lua would be a good candidate for the latter, but its standard library is minimal on purpose, and that means a lot of the functionality would have to be provided by the build system. Interaction with the shell from Python is needlessly cumbersome (especially capturing stdout/stderr), so of those options my preference would be Ruby. Heck, even standard shell with a system-specific binary to call back to the build system would work.
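
For comparison, the Python incantation being alluded to (standard library only), versus shell's one-liner `sha=$(git rev-parse HEAD)`:

  # Capturing a command's stdout in Python takes several flags to get
  # decoded text, collected output and an exception on failure.
  import subprocess

  result = subprocess.run(
      ["git", "rev-parse", "HEAD"],
      capture_output=True,  # collect stdout/stderr instead of inheriting
      text=True,            # decode bytes to str
      check=True,           # raise CalledProcessError on non-zero exit
  )
  sha = result.stdout.strip()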


People hate on it, but do you know what language would be perfect these days?

Easy shelling - check.

Easily embeddable - check

Easily sandboxable - check.

Reasonably rich standard library - check.

High level abstractions - check.

If you're still guessing what language it is, it's Tcl. Good old Tcl.

It's just that its syntax is moderately weird and the documentation available for it is so ancient and creaky that you can sometimes see mummies through the cracks.

Tcl would pretty much solve all these pipeline issues, but it's not a cool language.

I really wish someone with a ton of money and backing would create "TypeTcl" on top of Tcl (à la TypeScript and JavaScript) and market it to hell and back, create brand-new documentation for it, etc.


> I really wish someone with a ton of money and backing would create "TypeTcl"

They did. Larry McVoy of BitMover created Little, a typed extension of Tcl [0]. Didn't do the marketing bit, though.

[0] https://wiki.tcl-lang.org/page/Little


I'd rather see a Lua successor reach widespread adoption. Wren pretty much solved all my gripes with Lua, but there is no actively maintained Java implementation.


How is the Windows support? One of my big needs for any general-purpose build system is that I can get a single build that works on both Windows and POSIX. Without using WSL.

That said, you're right: at least at first blush, Tcl is an attractive, if easy-to-forget, option.


Windows support? Great as far as I can tell. ActiveTcl, the Windows distribution, has been a thing for several decades now. I remember using it back in 2008.


The IDE/editor that ships with Python for Windows is built on Tkinter, which is a wrapper for Tcl/Tk. So it seems to work fine.


Tcl. We already have that language and it's been around for decades, but it's not a cool language. Its community is also ancient, and it feels like it.


Kotlin. It has good support for defining DSLs and can actually type check your pipeline.


They probably mean an interpreted language or at least something distributed as text. But honestly, you could come up with some JIT scheme for any language, most likely.


> Build systems inevitably evolve into something Turing complete. It makes much more sense to implement build functionality as a library or set of libraries and piggyback off a well-designed scripting language.

This is so true. That's why I hate and love Jenkins at the same time.


The problem with arbitrary operations is that they are not composable. I can’t depend on two different libraries if they do conflicting things during the build process. And I can’t have tooling to identify these conflicts without solving the halting problem.


Pulumi and CDK come to mind; they look very interesting compared to YAML/DSL approaches.


Fully agreed -- Pulumi got this space right out of the gate, while CDK is a relative newcomer (both in general and inside the walled garden of AWS). AWS has contributed to the bloodshed with CloudFormation for a long time.

Also, don't forget that CDK for Terraform now exists[0] as well.

[0]: https://www.hashicorp.com/blog/cdk-for-terraform-enabling-py...


That's kind of what Bazel does. Skylark (since renamed Starlark) is a Python dialect.


Why is an entirely new dialect necessary? Why couldn't it just have been a Python library?


It started as that at Google and was a nightmare in the long run. People would sneak in dependencies on non-hermetic or non-reproducible behavior all the time. The classic "this just needs to work, screw it, I'm pulling in this library to make it happen" problem. It just kept getting more and more complex to detect and stop those kinds of issues. Hence a new language with no ability to skirt around its hermetic, non-Turing-complete nature.


Mozilla uses Python, but executes the code in a very limited environment so you can't import anything not on an allowlist, can't write to files, etc. But it's just the regular Python interpreter executing stuff. It produces a set of data structures describing everything that are then used by the unrestricted calling Python code.

It seems to work pretty well, though it feels a little constraining when I'm trying to figure something out and I can't do the standard `import pdb; pdb.set_trace()` thing. There's probably a way around that, but I've never bothered to figure it out.
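
A guess at the general mechanism (not Mozilla's actual code): exec the file against a stripped-down `__builtins__` whose `__import__` enforces the allowlist.

  # Hypothetical sketch: sandboxed exec with an import allowlist.
  ALLOWED = {"textwrap"}

  def allowlisted_import(name, *args, **kwargs):
      if name.split(".")[0] not in ALLOWED:
          raise ImportError("import of %r is not on the allowlist" % name)
      return __import__(name, *args, **kwargs)

  sandbox = {"__builtins__": {"__import__": allowlisted_import, "print": print}}
  exec("import textwrap\nprint(textwrap.dedent('  ok'))", sandbox)  # prints: ok
  exec("import os", sandbox)  # raises ImportError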


Starlark is an extremely limited dialect of Python. It is intentionally not Turing complete, to prevent unbounded computation.

I think the rationale for making it syntactically similar to Python was: we want to add a macro processing language to our build system, and we want to support common features like integers, strings, arithmetic expressions, if-statements, for-loops, lists, tuples and dictionaries, so why not base our DSL off of Python that has all that stuff, so that people familiar with Python will have an easy time reading and writing it?

Then they implemented a very limited but very fast interpreter that supports just the non-Turing-complete DSL.
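
To make that feature set concrete, here is a Starlark-flavored macro that happens to be runnable Python as well (the rule shape is invented): strings, lists, dicts, if and for are all available, while unbounded constructs like while-loops and recursion are deliberately absent.

  # Valid in both Python and Starlark-style macros: bounded for-loops,
  # comprehensions and dicts -- no while, no recursion, no classes, no I/O.
  def test_suite(srcs, size="small"):
      timeout = 60 if size == "small" else 300
      return [
          {"name": src.replace(".cc", "_test"), "src": src, "timeout": timeout}
          for src in srcs
      ]

  print(test_suite(["foo.cc", "bar.cc"]))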


To limit what can be done, to make it easier to reason about: https://docs.bazel.build/versions/master/skylark/language.ht...


This is sorta Gradle's approach.


Gradle's approach is commendable, but it's too complicated, and they built Gradle on top of Groovy. Groovy is not a good language (and it's also not a good implementation of that not-good language).


It's good enough for me. It's unlikely to be "replaced" by the new hotness because it was created way before the JVM language craze, when there were dozens of competing JVM languages. Sure, that means it has warts, but at least I don't have to learn a dozen sets of different warts.



