StrictYAML

IshKebab · 2025-03-07T12:26:35 1741350395

Well it's definitely better than YAML but I'm not sure that's saying much. I wish we just had universal support for JSON5.

solatic · 2025-03-07T13:00:10 1741352410

I get the argument that YAML is here to stay and we need to live with it as a configuration language. I can buy an argument that environment variables are best for configuration with less than 10 knobs, and YAML is fine when there's a simple hierarchy, let's say within 250 knobs.

The issue here is that schema validation is expressed in Python. The author contradicts himself when he argues, on the one hand, that Python shouldn't be used for configuration because it's too powerful: https://hitchdev.com/strictyaml/why-not/turing-complete-code... and on the other hand, that Python is really powerful for building schemas: https://hitchdev.com/strictyaml/why-not/json-schema/ .

Schema validation, IMO, only really gets necessary for much larger configurations, the kinds that are the size of Kubernetes manifests or CI/CD pipelines. And we do have less powerful languages, like CUE, that prove that one doesn't need a Turing-complete language to have expressive schemas.

If you have to support YAML, that's one thing. But ideally, if you're at a scale where it really matters, you should be looking at a more modern configuration language.

hitchdev · 2025-03-07T14:49:40 1741358980

>The issue here is that schema validation is expressed in Python. The author contradicts himself when he argues, on the one hand, that Python shouldn't be used for configuration because it's too powerful: https://hitchdev.com/strictyaml/why-not/turing-complete-code... and on the other hand, that Python is really powerful for building schemas: https://hitchdev.com/strictyaml/why-not/json-schema/ .

Disclosure: I'm the author.

The difference is that full validation/parsing can rarely be accomplished with JUST a non-Turing-complete schema. Every time I use JSON Schema I have to add additional validation on top, written in Turing-complete code.

This happened to me literally just an hour ago when I wanted to put a DSL in a field in a config file. json-schema (the "config" schema) doesn't let me write code to validate this and reject it. It's a string or it's not. With StrictYAML schemas written in code it's pretty straightforward to create a parser/validator that rejects invalid DSL with a meaningful error.
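A minimal sketch of that idea in plain Python. It is not StrictYAML's actual API, and the `retry` field with its tiny `<count>x<seconds>s` DSL is hypothetical. A declarative type rule can only say "this is a string"; a validator written in code can parse the embedded DSL and reject it with a pointed error:

```python
import re

# Hypothetical mini-DSL for a "retry" field: "<count>x<seconds>s", e.g. "3x10s".
RETRY_DSL = re.compile(r"(\d+)x(\d+)s")


class ConfigError(ValueError):
    """Raised with the path of the offending field in the message."""


def parse_retry(value, path="retry"):
    # Structural check: roughly what a declarative "type: string" rule expresses.
    if not isinstance(value, str):
        raise ConfigError(f"{path}: expected a string, found {type(value).__name__}")
    # Semantic check: parse the embedded DSL and reject anything invalid.
    match = RETRY_DSL.fullmatch(value)
    if match is None:
        raise ConfigError(f"{path}: expected '<count>x<seconds>s', found {value!r}")
    count, seconds = int(match.group(1)), int(match.group(2))
    if count == 0:
        raise ConfigError(f"{path}: retry count must be at least 1")
    return {"count": count, "seconds": seconds}
```

Both failure modes surface as the same `ConfigError` with the same path prefix, which is the "validation error consistency" point.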

You might argue that "these rules bolted on top aren't part of the schema" or "this is validation that you can do after the json schema validates" but there is benefit to combining them - namely, code coherence and validation error consistency.

(There are also downsides, namely that JSON Schema can be used from multiple languages. Strictness comes at the expense of reusability.)

In practice almost every schema I build I want to have stricter validation rules that are not enforceable with something like json-schema alone.

These are both instances of the law of least power. There are plenty of languages which are too powerful for the task at hand and plenty which are not powerful enough and people hack around and even rage against both. There are other "goldilocks" languages that are just right for the task at hand.

solatic · 2025-03-07T20:31:36 1741379496

> This happened to me literally just an hour ago when I wanted to put a DSL in a field in a config file. json-schema (the "config" schema) doesn't let me write code to validate this and reject it.

You can embed DSLs in CUE. It's a bit unwieldy because you have to essentially reproduce the DSL grammar in CUE, and it may not be performant, but yeah, it's doable. Can you provide more details?

> You might argue that "these rules bolted on top aren't part of the schema" or "this is validation that you can do after the json schema validates" but there is benefit to combining them - namely, code coherence and validation error consistency.

I would argue that it's a slippery slope. Consider v1 where an enum is statically defined as Employee or Manager. Then in v2 we add VP and CEO. Then in v3 actually the list of permitted titles needs to be fetched from a database populated by HR. Is it still correct to put this in configuration validation? What if the person writing the configuration doesn't have permissions to read from HR's database? So nothing should work?

hitchdev · 2025-03-08T09:51:31 1741427491

>You can embed DSLs in CUE

CUE lets you embed functions too, it looks like. It's almost a programming language itself.

The closer a configuration language gets to a programming language the less of a reason I see for it to exist.

>I would argue that it's a slippery slope. Consider v1 where an enum is statically defined as Employee or Manager. Then in v2 we add VP and CEO. Then in v3 actually the list of permitted titles needs to be fetched from a database populated by HR. Is it still correct to put this in configuration validation?

No, coupling to a database would be bad design IMO, but grabbing those enums from other config files in the same folder that are parsed earlier I have done a lot.

I've also used libraries that provide lists of timezones and country codes as enums and plugged them into the parser so you couldn't invent your own country code.

And I've written validators that reference other bits of the config (e.g. the list of permitted titles is in another part of the config).

All of these things I would argue are good and useful and not worth sacrificing in exchange for preventing possible misuse (like coupling parsers to a DB).
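A rough sketch of the last case, in plain Python rather than any particular library's API. The `titles` and `staff` keys are hypothetical; the point is only that the permitted enum values come from another part of the same config, with no database involved:

```python
def validate_staff(config):
    """Check each person's title against the titles declared elsewhere in the config."""
    permitted = set(config["titles"])
    errors = []
    for name, person in config["staff"].items():
        if person["title"] not in permitted:
            errors.append(
                f"staff.{name}.title: {person['title']!r} is not one of {sorted(permitted)}"
            )
    return errors


config = {
    "titles": ["Employee", "Manager"],
    "staff": {
        "alice": {"title": "Manager"},
        "bob": {"title": "Emperor"},
    },
}

for error in validate_staff(config):
    print(error)
```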

I actually wrote this parser in the first place because I wanted to create a good metalanguage for tersely defining strongly typed executable specifications in YAML (i.e. Gherkin done right). Tons of stuff I wanted to strictly validate wouldn't have been possible with config-based schema validation and with YAML's weak, implicit typing it was a fucking mess.

benrutter · 2025-03-07T13:47:31 1741355251

When you say more modern configuration, are you thinking of any specifics? TOML?

anotherhue · 2025-03-07T12:21:58 1741350118

I'm a fan of anything that moves us away from stringly typed nonsense.

See also Dhall (which can render to yaml). I like the idea but found the veneer broke a little too often and left me squinting at Haskell.

https://dhall-lang.org/

sevensor · 2025-03-07T13:55:29 1741355729

My problem with libraries like this in Python is that, because nobody wants to have to hand write semantic validation code, they end up mashing semantics into the parser. What I want is for all the different file formats to be different syntaxes for Data, defined recursively as

    Data = str | list[Data] | dict[str, Data]

Leave validation to other libraries. You can get very far simply by reflecting on type annotations.
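A minimal sketch of what "reflecting on type annotations" could look like, assuming Python 3.9+ and the `Data` shape above. The `coerce` helper and the `Server` dataclass are hypothetical names, and only `list[...]`, dataclasses, and a few scalar types are handled:

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import get_args, get_origin, get_type_hints


def coerce(data, target, path="$"):
    """Convert parsed Data (str | list | dict) into `target`, guided by annotations."""
    if get_origin(target) is list:
        if not isinstance(data, list):
            raise TypeError(f"{path}: expected a sequence")
        (item_type,) = get_args(target)
        return [coerce(v, item_type, f"{path}[{i}]") for i, v in enumerate(data)]
    if is_dataclass(target):
        if not isinstance(data, dict):
            raise TypeError(f"{path}: expected a mapping")
        hints = get_type_hints(target)
        names = {f.name for f in fields(target)}
        unknown, missing = set(data) - names, names - set(data)
        if unknown or missing:
            raise TypeError(
                f"{path}: unknown keys {sorted(unknown)}, missing keys {sorted(missing)}"
            )
        return target(**{k: coerce(v, hints[k], f"{path}.{k}") for k, v in data.items()})
    if not isinstance(data, str):
        raise TypeError(f"{path}: expected a scalar")
    if target is bool:
        if data not in ("true", "false"):
            raise TypeError(f"{path}: expected true or false, found {data!r}")
        return data == "true"
    try:
        return target(data)  # int, float, str, ...
    except ValueError:
        raise TypeError(f"{path}: cannot read {data!r} as {target.__name__}") from None


@dataclass
class Server:
    host: str
    port: int
    tags: list[str]


print(coerce({"host": "example.org", "port": "8080", "tags": ["a", "b"]}, Server))
```

The parser only ever produces strings, lists, and dicts; every typing decision lives in the annotations.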

karel-3d · 2025-03-07T13:17:20 1741353440

I wonder how it compares to YAML 1.2, YAML 1.1 (which are not compatible with each other), and the weird mix of 1.2 and 1.1 that go-yaml/yaml (the one used in k8s, helm, docker) uses.

https://github.com/go-yaml/yaml?tab=readme-ov-file#compatibi...

vander_elst · 2025-03-07T15:38:19 1741361899

What would be the advantages over something like textproto?

AlecBG · 2025-03-07T13:08:09 1741352889

Why OrderedDict when dict is already ordered?

loloquwowndueo · 2025-03-07T13:15:00 1741353300

That’s relatively new (introduced in Python 3.6). You wouldn’t believe how many production code bases are still in 3.5 or lower.

The change log for 3.6 states:

> The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon (this may change in the future, but it is desired to have this new dict implementation in the language for a few releases before changing the language spec to mandate order-preserving semantics for all current and future Python implementations; this also helps preserve backwards-compatibility with older versions of the language where random iteration order is still in effect, e.g. Python 3.5).

Therefore it’s advisable to use OrderedDict if there’s even a chance this code might be used with older versions.

whalesalad · 2025-03-07T13:51:37 1741355497

Yeah, I recently learned about this after giving another engineer PR feedback that dicts are not ordered and we cannot expect the intended behavior. Joke's on me. Old habits die hard.

porridgeraisin · 2025-03-07T13:17:46 1741353466

OrderedDict = you can choose and manipulate the order

Normal dict = insertion order only (a CPython implementation detail in 3.6, guaranteed by the language since Python 3.7)

Haven't looked into why they need key ordering here though.
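A short illustration of the difference using only the standard library. It says nothing about why StrictYAML itself uses `OrderedDict`:

```python
from collections import OrderedDict

plain = {"a": 1, "b": 2}
ordered = OrderedDict(plain)

# OrderedDict can be reordered in place; a plain dict has no equivalent method.
ordered.move_to_end("a")
print(list(ordered))  # ['b', 'a']

# Equality between two OrderedDicts is order-sensitive; between plain dicts it is not.
print(OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1))  # False
print({"a": 1, "b": 2} == {"b": 2, "a": 1})            # True
```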

tempay · 2025-03-07T13:14:49 1741353289

It wasn’t when strictyaml was made and then it becomes a backwards compatibility issue.

BiteCode_dev · 2025-03-07T12:30:08 1741350608

I hope one day we will all move to something like Cuelang.

anotherhue · 2025-03-07T12:32:30 1741350750

Assuming you meant Cue: https://cuelang.org/

See also Dhall.

ansc · 2025-03-07T13:13:17 1741353197

It's nice, but not a lot of bindings from what I can tell.

xelxebar · 2025-03-07T13:16:41 1741353401

Okay, permit me a curmudgeonly rant. As someone who has implemented a YAML parser and spent way too much time dissecting the spec along with analyzing various implementations, StrictYAML smacks of simple ignorance.

The page describing the project's raison d'être [0] is mostly a collection of incredulous statements, sprinkled with lovely factual errors. Heck, the point about implicit typing even links to the YAML 1.2 spec, claiming that implicit types are intended behavior, while the spec explicitly makes the opposite clear.

That said, a lot of the common complaints about YAML are rooted in the fact that almost all end user libraries are stuck at YAML 1.1. This is mostly because everyone (including PyYAML) relies on libyaml, the primary culprit. IMHO, YAML 1.2 is quite nice, and I wish we could fix libyaml instead of everyone and their dog inventing their own half-baked language to scratch an itch.

Or perhaps even better, the primary inventor of YAML is currently avidly working on YAMLScript[1], which is a much more radical idea on programming and config language design, all while remaining backwards-compatible with YAML.

[0]:https://hitchdev.com/strictyaml/features-removed/

[1]:https://yamlscript.org/
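To make the 1.1 versus 1.2 gap concrete, here is a small sketch that compares only the boolean patterns: the YAML 1.1 bool type definition against the YAML 1.2 core schema. It is not a parser, and real libraries differ in which subset they implement:

```python
import re

# Plain (unquoted) scalars that resolve to booleans under each definition.
YAML_1_1_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)
YAML_1_2_CORE_BOOL = re.compile(r"true|True|TRUE|false|False|FALSE")


def is_implicit_bool(scalar, pattern):
    """True if an unquoted scalar would be resolved as a boolean under `pattern`."""
    return pattern.fullmatch(scalar) is not None


for scalar in ("NO", "on", "true"):
    print(
        scalar,
        is_implicit_bool(scalar, YAML_1_1_BOOL),
        is_implicit_bool(scalar, YAML_1_2_CORE_BOOL),
    )
```

Under the 1.1 pattern the country code `NO` is a boolean; under the 1.2 core schema it stays a string.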

Ygg2 · 2025-03-07T13:46:46 1741355206

> YAML 1.2 is quite nice, and I wish we could fix libyaml instead of everyone and their dog inventing their own half-baked language to scratch an itch.

YAML 1.2 is categorically better than YAML 1.1.

That said, it still suffers from many of the things that suck about YAML. From a user perspective, it suffers from anchors being a thing (billion laughs attack), duplicate keys, complex objects as keys (WTF is this feature even), and loading YAML to objects.

From an implementation perspective, the quoted scalars, several string forms, and huge number of corner cases really make parsers difficult to write.

xelxebar · 2025-03-07T15:18:49 1741360729

I totally feel you on the implementation side. The formalization used in the spec leaves a lot to be desired, and the spec devs seem to recognize this well. One of the aims of my dayaml[0] project is to explore a completely different formalization that rationalizes out the non-fundamental sharp edges.

That said, there really aren't that many special cases. A lot of them are UI features, arguably endemic to the problem space in one form or another.

The deficiencies you see in the language, however, don't ring true with me. They are mostly implementation deficiencies:

- Anchors are just pointers. YAML can efficiently represent generic object graphs, but if a loader copies those pointers into a billion laughs, that's an issue with the implementation decision. Usually, it comes down to assuming that YAML graphs are always trees, which will always turn cyclic graphs into infinite tree unfoldings.

- Keys are explicitly specced as unique [1]. Not really sure why libyaml and friends get this wrong.

- Loading YAML to objects is explicitly designed to represent the native data structures of your language. It's a serialization format by design. That's why it loads into objects. That's why it has complex keys.
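The first point can be sketched without any YAML library, using Python lists as a stand-in for alias nodes. The helper names are hypothetical. Kept as pointers, a billion-laughs-shaped graph is tiny; the blow-up only appears when a loader unfolds it into a tree:

```python
def build_laughs(depth, width=10):
    """Mimic nested aliases: each level is a list of `width` references to the level below."""
    node = ["lol"]
    for _ in range(depth):
        node = [node] * width  # `width` pointers to one shared object, not copies
    return node


def count_shared(node, seen=None):
    """Count distinct list objects in the graph, following each pointer once."""
    seen = set() if seen is None else seen
    if not isinstance(node, list) or id(node) in seen:
        return 0
    seen.add(id(node))
    return 1 + sum(count_shared(child, seen) for child in node)


def count_unfolded(node):
    """Count list nodes as if every pointer were copied out into a tree."""
    if not isinstance(node, list):
        return 0
    return 1 + sum(count_unfolded(child) for child in node)


graph = build_laughs(depth=5)
print(count_shared(graph))    # 6
print(count_unfolded(graph))  # 111111
```

Each extra level multiplies the unfolded size by roughly `width`, while the shared graph grows by one node.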

We can certainly discuss whether YAML is appropriate as a configuration language or not, but YAML is first and foremost designed as a language-agnostic textual representation of object graphs. The spec goes well out of its way to make this clear [2], and viewing YAML for what it is, instead of a configuration language, really makes the apparent oddities disappear IMHO.

[0]:https://github.com/xelxebar/dayaml

[1]:https://yaml.org/spec/1.2.2/#mapping

[2]:https://yaml.org/spec/1.2.2/#11-goals

Ygg2 · 2025-03-07T15:44:45 1741362285

> Anchors are just pointers.

> That's why it has complex keys.

Yes. And it leads to exploits or errors. The dumber the serialization format, the better. JSON has thrived without these mis-features. And most stuff I've seen in the wild doesn't use the exotic features anyway.

The format user should worry about converting nodes to references, and complex keys are something few users ask for.

xelxebar · 2025-03-08T04:05:18 1741406718

> it leads to exploits or errors

You are blaming implementation failures on the language design. That's kind of my whole gripe. Instead of wasting cycles on inventing new languages, I wish we'd pool resources into fixing libyaml.

> most stuff I've seen in the wild doesn't use the exotic features anyway

Mostly due to lack of awareness. Ruby, Python, JavaScript, Rust, Java, etc. all allow mostly arbitrary objects for keys. It's only confusing if you conflate dictionaries/maps with objects.

In the scientific computing community, it's not unheard of to see lists as keys in YAML documents, which is really convenient if that serializes to exactly the data model you're working with.
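For illustration, a small Python sketch with a hypothetical table of readings keyed by grid coordinates. Python needs a hashable tuple where YAML would show a sequence, and JSON cannot express the structure without re-encoding it:

```python
import json

# Hypothetical measurement table keyed by (row, column) grid coordinates.
readings = {(0, 0): 1.5, (0, 1): 2.25}
print(readings[(0, 1)])  # 2.25

# JSON objects only allow string keys, so json.dumps rejects the mapping as-is.
try:
    json.dumps(readings)
except TypeError as error:
    print("json cannot encode it:", error)

# A common workaround: flatten the mapping into a list of [key, value] pairs.
encoded = json.dumps([[list(key), value] for key, value in readings.items()])
print(encoded)  # [[[0, 0], 1.5], [[0, 1], 2.25]]
```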

FriedrichN · 2025-03-07T12:48:38 1741351718

I'll just come out and say that I hate every single configuration language. All of them suck in their own unique way and every time a new one comes out it fixes some issues of the language it's supposed to supersede but never without introducing new problems. And eventually you're left thinking that you should've just used a .ini file.

solatic · 2025-03-07T13:14:05 1741353245

"Configuration" languages are fundamentally necessary and it's tragic that people don't understand why.

There are declarative and imperative paradigms. Software engineering layers them on top of each other. Frontend devs write imperative TypeScript to manipulate declarative JSX, which instructs a React library to imperatively decide how to layout declarative HTML, which instructs a web browser to imperatively decide how to render, and so on. The frontend sends an imperative API call to a declaratively-specified API gateway, which imperatively forwards a declarative request body to a backend service, which imperatively goes through validation, authorization, etc. before submitting a declarative SQL SELECT to a database, which imperatively plans out a query over declarative representations of data on the disk, sending imperative system calls to the kernel/disk controller, etc.

Python, JavaScript, Rust, Go.... these are all fine programming languages that allow expressing an imperative paradigm.

But we have fewer languages for declarative paradigms. So-called "configuration" languages are attempts to build higher-level declarative paradigms. Nothing more, nothing less. We need higher-level declarative paradigms to build on top of the current imperative paradigms. It is the next step in the march towards more power and expressiveness, and therefore more productivity and ease of maintenance.

masom · 2025-03-07T13:19:33 1741353573

yup.

TOML gets pretty close to a `.ini` file as a standardized parser, taking the original format idea a little bit further.

https://toml.io/en/

XorNot · 2025-03-07T13:29:11 1741354151

I hate TOML. It's even worse at expressing maps and sequences than YAML, which is actually quite good at it.

demurgos · 2025-03-07T13:44:05 1741355045

What is the issue with TOML maps and sequences compared to YAML?

My main gripe was that inline tables had to be single line, but the restriction is lifted in TOML 1.1.0.

Ygg2 · 2025-03-07T13:34:10 1741354450

Seeing how YAML allows for keys to be objects (and often mutable at that), I can't agree with that sentiment that YAML is good at expressing maps.

loloquwowndueo · 2025-03-07T13:11:11 1741353071

Well .ini files also suck so what are you going to use now? :)

FriedrichN · 2025-03-07T13:15:45 1741353345

They absolutely do. But I usually don't have to explain the config language and it's widely supported, that is an absolute upside of .ini files.

NeutralForest · 2025-03-07T13:26:00 1741353960

Agreed and they don't have good escape hatches. What was supposed to be declarative always ends up requiring some logic here and there and then you're stuck with a terrible language like YAML and you have to turn to templating, references to anchors and whatnot.

What I don't like is when you need to use a configuration language like Bicep or Terraform when the underlying architecture cannot be represented declaratively. You can create resources and provision them, that's fine. But any time you need forking paths, specific conditions, iterations over some resources, etc., you're done for, unless the configuration language has built a command or keyword for your specific use case. You can always tell me that I'm holding it wrong but when the platform requires me to use those config files or the SDKs for whatever languages are useless, it's infuriating.

Side note, not about configuration languages themselves but about how they're used on $cloudProvider: when you declare resources or operations that are legal in the language but invalid on the platform, I die a little bit inside. The platform has all the knowledge about the existence of resources, policies, behavior; there's a whole class of problems that shouldn't exist before you even try to run a pipeline!

rad_gruchalski · 2025-03-07T12:57:14 1741352234

I like how you shifted the goal post from “I” to “you” to justify your point of view. I don’t care, give me yaml, toml, json, jsonnet, ansible, who cares. It’s a tool. I’m not married to it.

FriedrichN · 2025-03-07T13:08:06 1741352886

I'll use what I'll have to use, it's a tool like you said. But I don't have to love it. Configuration is a necessary evil and whatever I end up using, I'm never fully satisfied with the end result.

rad_gruchalski · 2025-03-07T13:14:28 1741353268

There’s no need to have any emotional connection to any tool.

FriedrichN · 2025-03-07T13:19:47 1741353587

You are right. However, it is my work and I do have an emotional connection to my work and my frustrations with certain technologies are very real.

monkey_monkey · 2025-03-07T14:09:33 1741356573

Similarly, there's no need to comment on this.

rad_gruchalski · 2025-03-07T15:01:35 1741359695

Don’t comment then. You may not agree with me but I feel it is important to comment because it’s important to share an opinion shaping the path of new engineers joining the field. Tools are tools. Some are better than others. There’s no reason to have an emotional connection to them. Five years from now there will be new tools we never imagined we need. We are paid for getting the job done, not for an emotional opinion.

monkey_monkey · 2025-03-07T17:24:23 1741368263

You seem to be the emotional one here, but hey, if it makes you feel better trying to police how other people react to stuff, then you do you, sister.

Also, what's this "we", just because you're being paid to be a robot doesn't mean everyone else is.

rad_gruchalski · 2025-03-08T20:53:49 1741467229

Woosh.