“this is the language that powers Cloud Firestore / Cloud Storage security rules”
This really needs to be stated on its front page. Right now it’s all “Hows” and no “Why”.
The first question anyone looking at it asks is: what problem does it solve? What need does it fill? A real-world use case provides an easy, relatable answer.
Incidentally, with existing links to protobuf and no halting problem to worry about, it sounds like you’re halfway to having a remote query language a-la SQL too.
I actually looked at this thinking it was pretty cool and could come up with use cases immediately - it is a great way to do simple query/matching in a consistent way. If I'm not mistaken, this could be used for something like Gmail's advanced search? There are tons of interfaces where everyone designs their own ad-hoc expressions, and having something like this would be very useful in those cases.
I'm an ardent supporter of executable config languages, especially for the infrastructure-as-code space (the only thing special about this space is that configs tend to be very large, so you're more likely to run into reuse issues). That space markets itself as "it's just YAML!", but inevitably all of that copy/pasted YAML becomes unwieldy and you want reusability. At that point, you have a few distinct options:
1. Build an AST on top of YAML a la CloudFormation. Now you're programming in YAML, hurray!
2. Extend your static language with executable features a la Terraform/HCL, basically reinventing (and very badly, at that) more traditional language features
3. Use text templates, a la Helm--now you can generate syntactically invalid configuration! (and absolutely trivially, at that)
4. Use an expression language (familiar ergonomics) a la Pulumi, Starlark, Nix, Nickel, Dhall, etc.
Note that in these conversations, someone inevitably shouts "use the simplest tool for the job!" ignoring that static configuration languages (and options 1-3 above) are strictly more complex for the reusability use cases outlined above.
EDIT: Pulumi isn't an expression language; rather, it lets you use real languages to generate configuration, and these languages often include powerful expression features. AWS's CDK is also in this category.
> someone inevitably shouts "use the simplest tool for the job!"
I think one problem is that configuration needs start out simple and evolve to complex - as opposed to it being obvious from the start that you'll need dozens of discrete components to deploy. At that early stage, anything beyond a few lines of YAML seems like definite overkill.
Eventually, it becomes clear that it's very hard to maintain, but by then there's thousands of lines of hard-knocks battle-tested, working production config to try and basically recreate from scratch, without breaking anything.
Until you've gone through this process once (or maybe a couple times..) it's hard to see why you should make that initial leap to a much more complex, tools-required workflow. "But eventually, we will probably need..." is a tough sell against someone arguing to do the simplest thing.
That’s where experience comes in. We should know that certain domains (e.g., Kubernetes configs) are going to get unruly fast, and we oughtn’t waste time with YAML. I don’t think this will be a controversial opinion in a few years’ time.
Also; more generally: it makes sense to KISS, and the key risk here isn't somebody using yaml or json or whatever initially - even where experience shows that's insufficient, it's just not that costly either. The question is what to do when that becomes unwieldy. And I think it's pretty clear that kinda-sorta-programming that tries to incrementally extend stuff like static config languages - but only slightly - doesn't work well and is a bad idea. It's inconvenient; it results in many of the same issues as a full programming language, and it's often really inconsistent in its expressiveness - as in, for any given application thereof you're likely to run into limitations.
I think it's wise to try and skip as many of those intermediate stages as possible. Of course; that's not a clear-cut solution strategy either; because what's "as possible"? Exactly how high up the language chain do you need to go; conversely which language (and environment) features are too powerful, rendering the language difficult to contain?
> I'm an ardent supporter of executable config languages
Me too. I feel like there should be some eponymous law about this. Every declarative language that starts out trumpeting "simplicity and not being Turing complete is a feature!" ultimately grows features until it is an imperative language, or gets replaced by one that is.
If you're gonna get there anyway, you may as well design for that instead of bolting on features poorly after the fact.
What about the option of just writing a regular, one-off program in a regular programming language, the output of which is your baked YAML config; and then having a pipeline that involves running that config-generator program, piping its output to your orchestrator of choice?
Nearly every programming language has a YAML serialization library†. And before that serialization happens, your config can be expressed using the regular-ass coding features of your program, however you like. (For optimal clarity, I personally would suggest creating a builder DSL and using it.)
† Technically a language doesn't even need a YAML serialization library to emit valid YAML; because valid JSON is also valid YAML. You can just serialize to JSON on your end, and feed the result into anything that's expecting YAML.
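The JSON-as-YAML trick can be sketched in a few lines of Python (the config keys here are invented for illustration):

```python
import json

# Every valid JSON document is also valid YAML (YAML 1.2 is a superset
# of JSON), so a generator can emit JSON and feed any YAML consumer.
config = {
    "replicas": 3,
    "ports": [80, 443],
    "labels": {"app": "web", "env": "prod"},
}

yaml_compatible_text = json.dumps(config, indent=2)
print(yaml_compatible_text)
```

Whatever reads this downstream never needs to know it wasn't hand-written YAML.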
At that point why use YAML at all? If it's generated by a program and fed to a program, you're better off using protobuf or something like that. In fact, since you're probably using the same language on both ends, why not just write a regular value in your language?
This probably sounds like a strawman, but it's not. It's how a lot of e.g. Python projects are configured - the "config" file is just a normal bit of code that gets run to produce a value. Unless you're using a programming language that absolutely sucks at expressing plain values (e.g. C or Java), it's much better than separate config files, IMO.
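As a concrete sketch of that pattern (module and key names are made up), the "config file" is just ordinary code that evaluates to a value:

```python
# config.py -- the whole "config file" is a plain module producing a value.
# Ordinary variables give you the reuse that static formats lack.
base_timeout_s = 30

config = {
    "db":    {"host": "db.internal",    "timeout_s": base_timeout_s},
    "cache": {"host": "cache.internal", "timeout_s": base_timeout_s * 2},
}
```

The application then just does `from config import config` at startup.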
Ideological answer: For the same reason HTTP/2.0’s binary protocol didn’t instantly obviate/deprecate HTTP/1.0’s text protocol. Text has advantages: text is debuggable, and prototypable. If the interface between two programs is a text based declarative language, you can audit that text, diff that text, edit that text to see how changes affect the result, mock one side or the other by producing or consuming that text, etc. “GitOps” style config management would never work if config was all opaque binary blobs. These are all reasons that major software projects standardize on YAML or other widely-supported textual data serialization formats for their config.
Pragmatic answer: because we’re talking about production configuration management here, which is, 99% of the time, about configuring and managing the third-party black-box components in your stack, not your own components. Your own business layer usually can be configured conventionally, with minimal explicit config, for your use case, since you built it to work idiomatically for that use case. It’s all the third-party stuff that has an impedance mismatch with your use-case assumptions, translating to needing tons of config to do what you need.
And, obviously, if you don’t control the other end, you don’t decide how the other end does its config. Usually, these days, it’s YAML (or TOML) — for the ideological reasons mentioned above.
Example: Kubernetes. Big consumer of complex YAML. Many people try to template that YAML. Much simpler and less error-prone to just write a program to generate said YAML. No reason to assume you’re writing in whatever language the k8s orchestrator is written in. (In fact, there are multiple orchestrators, written in different languages, and the shared YAML resource spec is the only formal interface they share.)
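A minimal sketch of the write-a-program approach (the helper function is hypothetical, though the Deployment fields are the standard `apps/v1` ones): build the resource as a plain value, then emit JSON, which every YAML consumer accepts:

```python
import json

def deployment(name: str, image: str, replicas: int = 1) -> dict:
    """Build a Kubernetes Deployment as a plain dict; functions and
    variables replace copy/pasted YAML blocks."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }

# JSON is valid YAML, so this could be piped straight into the orchestrator.
print(json.dumps(deployment("web", "nginx:1.25", 3), indent=2))
```

Note the label dict is defined once and reused in all three places it must agree, which is exactly the kind of invariant hand-edited YAML gets wrong.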
> Ideological answer: For the same reason HTTP/2.0’s binary protocol didn’t instantly obviate/deprecate HTTP/1.0’s text protocol. Text has advantages: text is debuggable, and prototypable. If the interface between two programs is a text based declarative language, you can audit that text, diff that text, edit that text to see how changes affect the result, mock one side or the other by producing or consuming that text, etc.
I can see the argument for using a textual format (although I think it's weaker than you say; if we're generating this config with code then we don't want to diff or edit the generated config), but YAML seems like a singularly poor choice if you want reliable diffs and editing; it's like picking tag-soup HTML. Straight JSON (ideally with a schema), TOML or even XML seems like a better bet if you're generating it programmatically.
> And, obviously, if you don’t control the other end, you don’t decide how the other end does its config.
Right, in that case it's all moot. I took GP to be talking about what formats these tools should use. IMO if the tool is intended to consume a machine-generated config then it would be better to use a machine-oriented config format. I think the option of something like protobuf (which is language-independent) is underappreciated, but even restricting ourselves to textual options, something stricter than YAML seems like a better bet.
But the third-party tool frequently isn’t intended to (only) consume machine-generated config. It’s usually built to consume a format that could equally be machine-generated or hand-authored. Usually with an emphasis on hand-authoring, where machine-generation is an automation over hand-authoring that will only need to happen as one scales; and so high-complexity machine-generation will only be relevant to the most enterprise-y of integrators.
Other examples of formats like this, that are hand-authored in the small but generated in the large: RSS, SQL, CSV.
Again, Kubernetes is a prime example of this. K8s config YAML is designed with the intention of being hand-authored and hand-edited. It’s only when devs or their tools need to auto-generate entire k8s cluster definitions that you begin needing to machine-generate this YAML. This generated YAML is expected to still be audited by eye and patched by hand after insertion, though, so it still needs to be in a format amenable to those cases, rather than in a format optimal for machine consumption.
> if we're generating this config with code then we don't want to diff or edit the generated config
Look more into GitOps. The idea behind it is that whatever tooling you’re using to generate config is run and the resulting config is committed to a “deployment” repo as a PR; ops staff (who don’t necessarily trust the tooling that generated the config) can then audit the PR, and the low-level changes it describes, before accepting it as the new converged system state. It puts a human veto in the pipeline between machine-generated config and continuous deployment; and allows for debugging when upstream tweaks aren’t having the low-level side-effects on system state one would expect.
In most programming languages you can hand author a value just fine - that part isn't an advantage to something like YAML or json. Given the use of variables and a few other similar simple techniques, I dare say many programming languages are more amenable to hand-authoring static config objects than most static config languages.
I think the real issue is reproducibility; and that boils down to purity. Fully fledged languages all come with lots of APIs and features to interact with the rest of the world, and it's quite unclear which APIs have such dependencies and which do not - and it's seductively easy to do something actually useful in a "real" programming language that will make the whole configuration process unwieldy later - like, say, reading parts of the config from disk, getting some service's public key off the internet, embedding a timestamp, or even writing some computed config like a random key to a bit of storage for a later config process to consume. And once you do that, the whole thing gets flaky, fast.
If you can rigorously avoid that, there's not too much advantage to a static config language.
> In most programming languages you can hand author a value just fine
But keep in mind that we’re not inherently talking about programming languages here — nor are we necessarily talking about people capable of programming as our configurators. We’re talking about third-party components that need to be configured by ops people, who may or may not be DevOps people. Usually they’re not — most ops people are just pure ops, and don’t know any programming languages. As well, most amateur integrators (e.g. a person setting up their own blog) aren’t programmers either.
The goal of these systems, when choosing a configuration solution, is twofold: to give pure-ops and amateur integrators a config language they can author directly, in a text editor, without learning programming; while also making that language formal/structured enough that it’s easy to machine-generate from your programming runtime of choice, if you do have those skills, and a rigorous mindset.
Sure, programming languages don’t necessarily require you to use the full-fledged expression syntax they enable, and so can “reduce” to a configuration-language-like subset of themselves.
But remember, again — ops people and amateur integrators. What do such people tend to do, to create their config? Read the reference config schema? No. They tend to look up tutorials with samples, or StackOverflow “solutions”, from arbitrary places on the Internet.
And what do the creators of those samples have in abundance? Cleverness and a desire for clarity of meaning. Traits that cause them to use the expressive features of whatever the configuration language is, in order to make their answers more “pithy”.
Which means that, to wield these “pithy” samples/solutions, the ops people and amateur integrators now have to understand how to “patch” one arbitrary piece of complex code into another increasingly-arbitrary piece of complex code.
The thing a static data-serialization format gets you, is that the rules for merging any two expression-nodes in it are very simple to learn, because there just aren’t that many types of expressions. There’s no way to be “pithy” with the configuration that requires people to learn entirely-new-to-them syntax.
By choosing to configure your system in YAML, you’re guaranteeing that the samples these ops people and amateur integrators find and attempt to glue together, will also just be pure YAML. And since their existing config file, and each new sample, are pure YAML, they’ll likely succeed at doing this gluing-together.
Meanwhile, DevOps people and enterprise integrators can create their own programs to generate the YAML — but since there’s no first-party framework for doing this, there won’t be much value in sharing these programs around, and so the samples the pure-ops people and amateur integrators find will never be given “in terms of” writing code for such a framework, but rather only in terms of the config YAML itself.
> I think the real issue is reproducibility; and that boils down to purity. [...] If you can rigorously avoid that, there's not too much advantage to a static config language.
Individual users might be able to rigorously avoid that (though expecting a rigorous approach to formal expression from non-programmers is a bit much.) But often it's the system itself that needs purity and reproducibility.
Remember, config formats are usually something executed at every startup — in other words, they're durable state that happens to be human-modifiable. (Think: the Windows Registry.) As the designer of a system, you don't want the same state you serialized today to deserialize to something else tomorrow; and you especially don't want the meaning of your state to depend contextually on the environment. You want to "pin down" your state.
A good example: programming-language package-ecosystem "lock files." In most languages, dependency-constraint specification is done in a programming language, such that the generation of those constraint expressions has access to Turing-complete features. But once you lock those constraints down to a baked set of choices, the lockfile itself — the predetermined set of choices, which should be environment-independent — is not expressed in a Turing-complete language (in any runtime I know of, at least) but rather is always expressed in its own little static declarative language; or at most in a limited "data-expressions only" subset of the parent language (e.g. Erlang's `file:consult/1` format.)
In this case, dep-constraints are the inputs to a config-generator program; while the lockfile is the config format itself. The config format is a necessary intermediate here; it'd be impossible for the runtime to make the same static guarantees about package management if it wasn't! (In fact, see e.g. Python's setup.py, where exactly that problem stymies any package-manager the Python ecosystem introduces from pre-determining dependency graphs before actually downloading and attempting installation of the dependencies.)
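The shape of that pipeline, sketched with toy names and a deliberately dumb resolver: the constraints are evaluated by code, but the lockfile that falls out is pure data:

```python
import json

# Toy illustration: the package names, versions, and resolver are all
# made up. The point is the split between executable constraints and
# the static lockfile they bake out.
constraints = {"left-pad": ">=1.0", "right-pad": ">=2.0"}
available = {"left-pad": ["1.0.0", "1.3.0"], "right-pad": ["2.0.1"]}

def resolve(constraints, available):
    # Stand-in resolution: just pick the newest listed version.
    return {pkg: sorted(available[pkg])[-1] for pkg in constraints}

lock = resolve(constraints, available)
# The result contains no expressions left to evaluate -- it's a value,
# so writing it out and reading it back is trivially reproducible.
print(json.dumps(lock, sort_keys=True))
```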
Yeah, there's something to be said for a format that makes it hard to shoot yourself in the foot, essentially. That point is somewhat orthogonal to the issue of how easy it is to author a config value, however.
By the way, you conflate purity with turing completeness; but the two are not really all that strongly related. It's possible to have a turing incomplete language that is nevertheless impure (public I/O without unconstrained repetition), and conversely a turing complete language that is pure (i.e. keep your tape private).
I'd argue that turing completeness isn't as relevant as people make it out to be here. It's not a good thing, mind you, but it's just not that problematic either; externally imposed termination and storage limitations can render any turing-complete system into a turing-incomplete system - that's easy - but a system with uncontrolled side effects is almost intrinsically hard to manage. In fact, even technically turing-incomplete systems may well need to impose similar limitations anyhow, because a technically turing-incomplete language that allows (say) nested loops or iteration - albeit bounded - may well not practically terminate, or may nevertheless cause too much I/O. Some languages are really limited, and perhaps then you can get away without externally imposed resource constraints, but it's not clear to me how realistic that scenario is.
The real problem (to my mind) in general-purpose languages when it comes to using them for config-specification is not turing completeness, it's purity (i.e. reproducibility). And that's not even really a language issue alone, it's because those languages tend to come with large, pervasively used libraries, to the point that it's not trivial to just take some code off stackoverflow (say) and reliably tell whether it's pure or not - because that depends on the internals of all of those library methods too.
That's irrelevant, right? The point is that it's reproducible. Whether you define purity to include non-termination or not is beside the point; the point is to avoid side effects. Lack of side effects matters in the context of configuration, non-termination does not (and see the thread you're replying to for an argument as to why that is). That's kind of the whole point of the argument.
> That's irrelevant right? The point is that it's reproducible.
It's not reproducible if it's not a value. The point of a pure function is that you can replace it with the value that it evaluates to.
If you include nontermination as a value in your language then your language becomes almost impossible to reason about as you break almost every equivalence property you could think of. E.g. you can no longer say x * 0 = 0.
> Lack of side effects matters in the context of configuration, non-termination does not (and see the thread you're replying to for an argument as to why that is).
I don't find "but terminating code may still take a long time" to be a convincing argument that nontermination isn't important; rather it's an argument that code taking a long time might also be important (at least to the extent that it actually comes up in practice, which I'm not convinced of).
I think it's pretty reasonable to say that technically you might not be able to equivalently replace x * 0 with 0. Note that if it's replaceable, it's reliably replaceable. This is essentially how pretty much all functional languages work, incidentally - functions may not terminate, and in theory that can cause issues, but in practice not so much. Part of the saving grace here is that:
(A) - you're exceedingly unlikely to run into this issue in the first place. Non-termination limits aren't set to things you're likely to hit without a runaway loop or recursion.
(B) - when you do hit a forced-termination issue, such replacements are usually irrelevant, i.e. if your algebraic rewrite doesn't affect the recursion or loop, it won't affect the bail-out either. Depending on how you implement forced termination, you can likely guarantee this, but it's not very valuable to do so.
(C) - The alternative isn't real if you allow theoretically bounded loops with high limits. You can decide not to specify forced termination, but that doesn't mean you don't have it; it simply means the OS or user will terminate the process instead, and reasoning about that is much, much harder. A system with small limits is possible, but those are much less practical to start with. And there have been quite a few systems over the years that tried to impose such limits by design, and then it turned out that there were escape hatches that could be abused to nevertheless impose huge load (if you can do any kind of doubling in an iteration, you don't need many iterations to cause denial of service).
(D) - Although it's possible an algebraic rewrite could affect termination, the scope for this is pretty constrained. Either it works, or the whole system fails to terminate; there's no middle ground. That means if you simply assume termination will occur and deal with the code as if it were pure, you'll either end up with a functioning system or a clearly non-functioning system, but without corruption or any unacceptable uncertainty. (It's possible to shoot yourself in the foot here, but I don't think it's possible to do so accidentally.)
I mean, if you want to make the argument that all of this is tricky - yes, it sure is, and there are a few risks and some complexity to all this! But simultaneously, I don't think you're going to do a lot better if you want the kind of flexibility that recursion and looping allow. These complexities are pretty manageable in practice; the risks limited. And if you need even tighter guarantees, you're going to need to lose most recursion and loops, likely even bounded loops. I've never used it, but earlier in this thread somebody mentioned Starlark - and while they clearly tried to avoid turing completeness (loops are bounded, and no recursion by the looks of it), they do allow nested loops with large bounds; i.e., given whatever time you think your OS or user will be willing to wait before pulling the plug, you won't hit those bounds: in terms of reasoning, you cannot rely on termination.
But I think that's a decent trade-off. The restrictions needed to reliably terminate in some bounded amount of resources are just too onerous to leave room for a language that can come close to one that does not have those restrictions. As such, it's fine to have either a deterministic, side-effect free language with the risk of (practical) non-termination, or a language with very, very limited looping (e.g. no nesting, and perhaps only constructs like map as opposed to iterating over ranges) - but not much room for anything in between.
Again, the context is the kind of languages you might consider for configuration. And in that context I don't think that turing completeness is all that relevant, compared to determinism and no side-effects, assuming termination. It's those latter aspects that really have a huge impact, and termination mostly in theory, not in practice.
Things like AWS Cloudformation require YAML input, so there's no real choice on what you emit.
But writing the YAML is fiddly and annoying, so that's a good example of something where it is better to generate it via troposphere (a python module) or some similar system.
To be less specific, I guess the answer is that sometimes you don't control both ends - the part that emits and the part that consumes - and having fought Ansible and similar tools, I'd never want to write YAML by hand for non-trivial purposes if I could script it instead.
Just write JSON and pretend it's YAML. YAML is a superset of JSON so there's no need to generate "nice" YAML if there isn't a human reading or writing it.
It’s still good for humans to be able to debug it, and there’s no downside to generating YAML over JSON (I say this as someone who typically prefers JSON).
This is even more true for Ruby. The language is famous for the ease of creating DSLs because of block passing and optional parentheses. Examples: Puppet, Chef, Vagrant, Rails' configuration files. I still remember the joy of not configuring a project with XML, coming from Java Struts in 2006.
After trying to write complex loop statements and conditionals in yaml for ansible I had this thought as well. It's nice when declarative configurations work, but once they don't and you have to try to write a real program in yaml you'll want to pull your hair out.
This was my exact experience as well. I was an early user of Ansible and loved that it was YAML. Then I had to deal with Jinja inside YAML. And finally a syntax for loops appeared!
I have run into the same type of issues with salt.
> I'm an ardent supporter of executable config languages
I agree, but it becomes very important to limit scope. For example, Azure templates allow looping and conditionals and all sorts of fancy stuff. Approaching a typical library of ARM templates is a massive undertaking: reading a JSON (or YAML) if statement is not ergonomic in any way - causal relationships can be multiple screens (or files) apart because of the sheer amount of JSON required to represent executable code.
It should be kept relatively lightweight, with stuff like CloudFormation GetAtt to glue deployed things together. Anything more complex should be solved with tooling designed for computation, i.e. programming languages (that emit config, e.g. Pulumi).
I haven't used Lua, but I've used Starlark extensively and I will say that static typing is a boon, especially in the infra-as-code space where the feedback loop can be very long.
I sort of agree with this, except that I don't think it's square one. The ability to change config without rebuilding your artifact is one of the advantages of separate config files, and using an embedded language like lua wouldn't remove this advantage.
Completely agree. Executable configuration in a language with strong declarative programming support is superb. I've had great success embedding Lua into a C++ application for exactly this purpose.
The moment you encode alternation in the configuration file, it's time to think about biting the whole Turing-complete bullet.
Starlark is Python (thanks Guido!), while CEL is designed specifically to not be Turing complete or have constructs like loops, etc.
"CEL evaluates in linear time, is mutation free, and not Turing-complete. This limitation is a feature of the language design, which allows the implementation to evaluate orders of magnitude faster than equivalently sandboxed JavaScript."
As mentioned, the goals are security policies (it was first used internally as the Security Rules for Cloud Storage for Firebase and Cloud Firestore) and proto contracts (e.g. you could define addons to your proto to specify that the data matched certain behavior):
I forget the exact syntax for the contract, but it looked something like this...
```
message Person {
@contract(matches(/* RE2 phone number regex */))
string phone_number = 1;
...
}
```
That contract could enforce client-side checks as well as be used server-side (in different implementation languages).
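A rough idea of what a generated client-side check for such a contract might look like (the helper and the simplified regex are both hypothetical; the real annotation would carry an actual RE2 phone-number pattern):

```python
import re

# Hypothetical sketch of a generated validator for the
# @contract(matches(...)) annotation above. The pattern here is a
# deliberately simplified stand-in for a real phone-number regex.
PHONE_RE = re.compile(r"^\+?[0-9][0-9\-\s]{6,14}$")

def validate_person(person: dict) -> None:
    """Raise if the message's phone_number field violates its contract."""
    phone = person.get("phone_number", "")
    if not PHONE_RE.fullmatch(phone):
        raise ValueError(f"phone_number {phone!r} violates contract")

validate_person({"phone_number": "+1 555-0100"})  # passes silently
```

The same annotation could drive an equivalent check generated for each server-side language, keeping client and server validation in agreement.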
> Starlark is Python (thanks Guido!), while CEL is designed specifically to not be Turing complete or have constructs like loops, etc.
Sort of. Starlark doesn't (or at least didn't originally) support recursion or while loops or a number of other structures. There's also a few other differences that make starlark "better" for configs (some immutability is different, there's no such thing as a `class`, etc.)
I still support loops in a configuration language:

```
for x in sequence:
    generate_complex_thing(x)
```

or

```
[generate_complex_thing(x) for x in seq]
```

These are better than a lot of the more declarative approaches (such as the various contextual approaches of a number of alternative langs), which get hard to reason about because they represent implicit global state.
"Executable" configuration languages (most "non-executable" configuration language parsers are push down automatons that execute the configuration) are handy, but without strong coding standards and good discipline, the line between business logic and configuration tends to blur over time.
Cartesian product, map, and reduce operations over finite sets and lists are really handy in configuration. ("For each server in SetA look at each path in SetB and ...")
But, if you find yourself starting to write general loops (as opposed to loops implementing map, reduce, and Cartesian product in languages that don't have them built-in), it's a sign you're starting to blur the line between configuration and business logic.
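The "each server in SetA, each path in SetB" case really is just a Cartesian product over finite sets; a sketch with invented names:

```python
from itertools import product

# Config-style map/product: enumerate every (server, path) pair once,
# instead of copy/pasting a stanza per combination.
servers = ["web1", "web2"]
paths = ["/health", "/metrics"]

checks = [
    {"server": s, "path": p, "interval_s": 30}
    for s, p in product(servers, paths)
]
# Always exactly |servers| * |paths| entries -- and guaranteed to
# terminate, unlike a general loop.
```

Because the inputs are finite sets, this stays on the configuration side of the line: no general loop, no accumulating state.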
Unit-testing configurations is difficult, especially if they can be non-deterministic (depend on date/time/random()) and aren't modular.
In some sense, all programs with configuration files are really interpreters for the language of their configuration files. (As mentioned before, many of these abstract machines are just push down automata.) Taken too far, the configuration becomes the real program.
I've seen a (now retired) automated trading system with a powerful XML-based configuration language where a few times people got themselves into trouble (and caused trading losses) when their complex tower of configuration fell over. Part of the problem was there existed a few people who weren't trusted to write application logic, but who were trusted to "just update configurations". When the only tool some of your people are allowed to use is a hammer, hammer marks start mysteriously showing up everywhere. Additionally, this was over 10 years ago, and prior to these trading losses, configuration underwent less stringent review. I don't think my experience was atypical.
I've also seen configuration loading get stuck because someone added some code to the config to hit a REST endpoint in the middle of the configuration file. Ideally, you'd leave any I/O to the main program logic, where it's easier to perform the I/O asynchronously, or otherwise non-blocking.
Deterministic non-Turing-complete immutable "executable" configuration languages (or at least ones where it's difficult to get unbounded recursion) tend to be a happy medium. Also, declarative rather than imperative configuration languages tend to be easier to read.
Back when I was a developer in web search infra at Google, I vaguely remember once or twice using a language (maybe Borg's config language, borgconfig) that completely lacked mutability and essentially used object prototyping (A is created as a copy of B, with differences specified at object creation time.)
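That prototyping style is easy to sketch (this helper is a guess at the flavor, not the actual borgconfig semantics):

```python
from copy import deepcopy

def prototype(base: dict, **overrides) -> dict:
    """Object prototyping, sketched: a new config is a deep copy of a
    base config with the differences applied at creation time. The base
    is never mutated. (Hypothetical helper, not real borgconfig.)"""
    child = deepcopy(base)
    child.update(overrides)
    return child

base_job = {"cpu": 1, "ram_gb": 4, "replicas": 1}
prod_job = prototype(base_job, replicas=20, ram_gb=8)
```

Because there is no mutation anywhere, every derived config is fully determined by its base plus its stated differences, which keeps large config trees auditable.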
> When the only tool some of your people are allowed to use is a hammer, hammer marks start mysteriously showing up everywhere.
This ... yes. Being the ops guy backing up second line support at an ISP for a while brought me many examples of this and inspired many in-house tools for them to use instead.
The creator of Apache Ant, James Duncan Davidson, wrote about choosing XML as the “language”:
> Now, I never intended for the file format to become a scripting language—after all, my original view of Ant was that there was a declaration of some properties that described the project and that the tasks written in Java performed all the logic. The current maintainers of Ant generally share the same feelings. But when I fused XML and task reflection in Ant, I put together something that is 70-80% of a scripting environment. I just didn't recognize it at the time. To deny that people will use it as a scripting language is equivalent to asking them to pretend that sugar isn't sweet.
I've never used this particular library, but I did put together my own simple evaluation engine and have found it very useful for a range of purposes.
Initially it was designed to process incoming slack messages, and sometimes trigger a notification to an on-call engineer, but over time I've found uses for it processing email, scripting simple actions on my desktop, and more.
These kinds of things are pretty simple to write, but sometimes I almost think it's a shame there isn't something more standard. (Lua was kind of winning that embedded-logic role for a long time, but nowadays we still have a mixture of YAML, HCL, and other niche-specific languages and filters, and I imagine the time has passed to pick one standard.)
Nesting them can lead to hard-to-understand logical expressions; furthermore, if the parts between ? and : are long enough, it can also noticeably reduce readability.
Instead of `<cond> ? <left> : <right>` I prefer `if <cond> { <left> } else { <right> }`. The additional brackets noticeably improve readability, and you can extend it to support `if <cond> { <a> } else if <cond2> { <b> } else { <c> }` instead of `<cond> ? <a> : <cond2> ? <b> : <c>`.
(Oh, and that last example might need parentheses depending on operator precedence...)
Though if you don't nest it, it doesn't matter. (And because it's expression evaluation, `else` is not optional but required, as you need a value for the expression to resolve to.)
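To make the readability point concrete, here is a sketch of a nested ternary in CEL itself (assuming a host-supplied `user` variable with an `age` field; I believe the conditional operator is right-associative, but parenthesizing removes any doubt):

```cel
// Without parentheses, the reader has to recall associativity rules:
user.age < 13 ? "child" : user.age < 18 ? "teen" : "adult"

// The parenthesized equivalent makes the "else if" chain explicit:
user.age < 13 ? "child" : (user.age < 18 ? "teen" : "adult")
```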
Ah. The biggest difference between ?/: and if/else is that ?/: is an expression (it returns a value) while if/else is a statement (a step/command/declaration/etc.). You can build statements on top of CEL (many embedders do), but the core Common Expression Language (CEL) doesn't actually have them.
Note that python uses if/else for the ternary expression form as well: `a = b if c else d`
Though personally, I like to have the condition in the front, instead of in the middle.
I use both in different projects. OPA (and its language Rego) is a good matching and policy engine with declarative blocks, modules, and expressive functions for HTTP headers, JWTs, etc. It's great for security, and for building up abstractions and testing arbitrary JSON against complex, pre-defined policies. Testing is built in, which is great. You can create "functions" and your own DSL for matching/evaluation.
If you want ABAC, use OPA. In every case.
CEL, and CEL-Go, is entirely different. It allows you to evaluate arbitrary expressions against arbitrary data. Think of a search (e.g. the LinkedIn API's crappy search, or log searching, or arbitrary predicates).
You would not define complex policies in CEL like you would in OPA. Well, I would not - you can define arbitrary macros and functions in CEL but it is not made for that scale. OPA is more suited for that.
Some examples:
- In OPA, you can define a policy that matches RBAC, ownership/acl, and ABAC in one file. With multi-tenancy. Think: "as a patient, I can see my data", and "as the patient's guardian, if they're under 18, I can see their data". And "As a doctor in the patient's clinic, I can see their data". And "as a clinical director in sudo mode, I can see their data". All in the same policy package, with tests.
- OPA supports "partial evaluation". For example, if you only have a subset of data available, you can evaluate an OPA policy and have OPA tell you whether the policy evaluates to true or what data is missing. This is quite powerful for building up complex auth layers.
- In CEL, you can say "all users > 30 days old". Simple, easy, filtering. EG, with a custom date macro, `date(users.created_at) > duration("30d")`.
In short, use both. OPA for security and complex policies. CEL for user-defined "expressions".
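As a sketch of the "30 days old" filter above without a custom date macro: CEL's built-in `duration()` only accepts hour/minute/second units (no "d"), so 30 days would be written as 720 hours. Assuming the host exposes `request.time` and a `user` message with a timestamp field `created_at`:

```cel
// "All users created more than 30 days ago" — timestamp subtraction
// yields a duration, and 30 days must be spelled as 720h in core CEL.
request.time - user.created_at > duration("720h")
```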
Thanks for breaking this down -- that makes a ton of sense. I've seen OPA used alongside k8s but haven't seen much use of it outside k8s yet. It seems like almost a special case.
My biggest problem with a lot of these generic computation languages (you could view OPA as generic computation, but with a focus on auth) is that they bring their own DSLs -- I'd love to see something like CEL that's based on regular programming languages, and the only way I can think of doing that right now is through WASM.
I'll let Tristan or Torin comment more authoritatively, but back in 2017/8 CEL partnered with OPA and I believe CEL was used as the basis for expressions in their new version of Rego.
I left the team about that time, so I don't know what exactly happened after that, but I wouldn't be surprised if the two are fairly close. My assumption is that's why CEL is polished up and OSS (I think we first published it a few years ago, why'd it get posted now?)
OPA Rego and CEL are distinct, but you can see similar thinking in OPA Gatekeeper and CEL Policy Templates (https://github.com/google/cel-policy-templates-go) which are aimed at separating config from policy in order to create a better user experience. Note, the CEL Policy Templates are early in development, but build upon the abstractions provided by CEL.
A lot of the time, engineers at Google will open source libraries or tools they have worked on, which go under the Google GitHub org but ship with that disclaimer language attached. This is basically saying that it is owned by Google but is not something Google officially supports. It may continue to get updates, it may not. I've definitely seen libraries open sourced from Google that stopped being pushed externally once the primary driver behind them left Google or moved on to other projects.
If it's not officially supported, it probably means it's just one or two people that open sourced it, and it would fall on them to keep it in sync with any internal work.
Officially supported means it's actually owned by some team. They'll dedicate resources to it, meaning it will be accounted for on any project planning or resource management the management needs to do.
"Official" is mostly about who they consider the customers of the product when making a decision.
If the project is officially open-sourced, that will be taken into consideration when project priorities, re-orgs, and direction are set at higher levels. "How does this affect our commitment to the community?" is a question to address. If it isn't official, then it's best-effort by the people who pushed for it to go open-source.
An open source maintainer is never obligated to address your bugs / issues. However it is good open source citizenship to be clear about what level of support people can and can't expect.
Unless the open source is part of a paid offering (like Firebase CLI), it's required to be listed as not being an Official Google Product. That said, CEL is used in a number of publicly supported Google Cloud services which means that it's well supported with dedicated maintenance. Case in point, I'm the CEL lead at Google.
Google puts that on most of its open source products including both employee personal projects and projects where Google has employees whose job it is to contribute to it.
Yes, for example the Firebase Tools CLI is open source (MIT), but it does not include the waiver, because it is an official Google product: https://github.com/firebase/firebase-tools
I can't escape the feeling that emacs got this right. Nobody wants their config to be lisp, but it fits the bill for what you needed. Especially combined with the custom sections. So nice.
While I don't disagree, you can get rather far by limiting what you allow in the evaluation. There's no reason you have to evaluate it directly in your current environment.
And then you have an easy mechanism to allow some configs from trusted parties to be a bit more capable, if they need it.
Though in that case you definitely want to disable macros for this config (as they allow for exponential time/space) and be very careful with any additional functions you expose.
If you’re constantly fighting YAML, consider Jsonnet. It’s another project by Google and similarly not Turing complete. It works wonderfully for generating templates.
The goal of CEL is fast, scalable, and portable expression evaluation.
Fast - CEL runs without the need for sandboxing, making it much faster than sandboxed solutions like WebAssembly, Lua, and embedded JavaScript.
Scalable - Features like variables and functions would make CEL more expressive, but also less scalable as it's easy to write a few lines of code with functions that consume exponential amounts of memory and compute. CEL is simply the expression and nothing more.
Portable - CEL is implemented in Go[0], C++[1], and Python[2] with Java open sourcing in development. There is a public codelab[3] available for Go if anyone is interested. There is also a conformance suite in CEL-Spec to ensure consistent behavior between runtimes and environments. Our objective is to make it possible to bring CEL to K8s, J2EE apps, and C++ proxies. Evaluate at line-rate everywhere. Personally, I hope someone tries to make CEL work on IoT devices some day too.
Where? - CEL is usually embedded into larger projects rather than being the one stop shop for solving a particular kind of problem. For example, CEL Policy Templates[4] has an opinionated way of using CEL to validate/evaluate YAML configs. Most of the time CEL is part of a service API.
In addition to being used in Firebase's Cloud Firestore / Cloud Storage security rules, it is also used in several other Google Cloud services:
- Cloud Armor[5]
- IAM Conditions[6]
- Cloud Healthcare Consents[7]
- Cloud Build Notifiers[8]
- Security Token Service[9]
- Access Levels[10], and more.
CEL is also used in some prominent open source projects like Envoy RBAC[11], Caddyserver[12], Krakend.io[13], and Cloud Custodian[14].
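To give a flavor of what these embeddings look like in practice, here is an illustrative IAM-Conditions-style expression (the `request.time` and `resource.name` attributes follow the documented IAM Conditions attribute names; the project path is made up):

```cel
// Grant access only before an expiry date and only on a resource subtree.
request.time < timestamp("2023-01-01T00:00:00Z") &&
  resource.name.startsWith("projects/example-project/")
```

The host service supplies the environment (which variables and functions exist), so the same core language reads naturally in each product.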
I'd be a bit suspicious of the claim that it is not Turing complete. To be fair, I can't yet find a way to allow arbitrary computation (though it seems easy to add one with fairly innocuous features). You can, however, get it to solve 3-SAT, though only for some predefined number of variables (which the spec assures can be at least 32). Combinatorics tasks like printing all possible sudokus also seem like they should be feasible.
Don't expect your config files to terminate when they use macros, that's all I'm saying.
There's a difference between pathologically high complexity functions and Turing completeness. Sub-Turing languages generally don't allow recursion or unbounded loops - your program is always making progress. Solving 3-SAT doesn't sound like it precludes sub-Turing completeness, since you'd have a finite number of solutions you're iterating over.
It's still useful for a config language because it makes it harder to accidentally make a config that (in practice) never terminates, and usually allows for easier static analysis and refactoring of the config files through immutability and purity.
On the one hand you're right, though at some point 'arbitrarily long' and Turing complete become pretty similar. In fact an ordinary computer isn't entirely a Turing machine either, as its memory is limited.
Also it means you need to be careful about malicious input, you need to take countermeasures when you evaluate an expression from an untrusted source.
Recursion in a pushdown automaton (the machine equivalent to a CFL) is bounded by the input words being consumed, since each state transition consumes one input token. Since all input words are finite, indefinite recursion is excluded.
> Although you can get it to solve 3-SAT, though only for some predefined number of variables (which it assures can be at least 32). Combinatorics stuff like printing all possible sudokus also seems like it should be feasible.
How is this compatible with their claim that "CEL evaluates in linear time"?
The only place I can find that makes the 3-SAT claim is this comment thread.
Nevertheless, there are two answers I can think of here. The first is simple (but IMO probably not right): 3-SAT is (probably) exponential, but the language definition document qualifies the linear claim by saying that macro expansion can also be exponential.
The second is probably more accurate, though. General 3-SAT is (probably) exponential, but we're dealing with 3-SAT for a constant number of variables. You can solve 3-SAT by producing all combinations of values for the variables and testing the expression for each one, but while testing an expression should be linear there's an exponential number of combinations. If your combination count is constant, though, the whole complexity becomes linear ... with a shockingly bad constant factor.
Macros are considered an optional feature of the language because they (a) can easily be disabled, and (b) have no dedicated syntax beyond that defined for core CEL.
CEL supports subsetting and extension, which means you can turn off built-in features in order to guarantee a bound on the compute/memory impact an expression might have, while still augmenting the core feature set to tailor it to your use case.
Bounded iteration is possible via macros, and such iterations can be nested; thus, you can have high polynomial time expressions, but only if you choose to permit them and many use cases (like IAM Conditions) don't. This is different from OPA Rego or HashiCorp Sentinel in that these features are baked into the syntax and impossible to turn off with 100% certainty.
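A minimal sketch of the nested-macro cost described above, using the standard `all`/`exists` comprehension macros over a host-supplied list `xs` (the variable name is made up):

```cel
// Two nested comprehension macros: evaluating this does O(n^2) work
// for a list of length n, even though each iteration is bounded.
xs.all(x, xs.exists(y, y >= x))
```

Disabling macros in the environment removes this entire class of expression while leaving the rest of the language intact.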
Since you can't declare functions or variables within CEL, the environment of an expression (the variables and functions it can use) is completely controlled by the host process. The environment acts like a sandbox of sorts, but one specifically chosen by the application and not a general purpose mechanism like a hypervisor or sandbox like WebAssembly.
Hi, I'm Jim (@JimLarson), another CEL maintainer. Yes, the full language spec is clearer about the performance limits, and macros can easily result in exponential time or space complexity. The original claim could be for either the macro version or the equivalent expanded version, e.g. "[true, false].exists(v1, [true, false].exists(v2, ... [true, false].exists(vN, (v1 || !v2 || v3) && (v2 || v3 || !v4) && ...)...))", etc. But it doesn't require much power in the language to express this kind of "solver" - there's still no general recursion.
Specifically, in our rules, everything after the "if" is Common Expression Language.
See: https://firebase.google.com/docs/firestore/security/rules-co...
The efficiency and safety of CEL enables us to put security rules in the critical path of every database request.