StrictYAML

js8 · on July 3, 2022

It's a mystery to me why people won't start using properly designed configuration languages, like Dhall (https://dhall-lang.org/) or Cue (https://cuelang.org/). Even using csexps (https://en.wikipedia.org/wiki/Canonical_S-expressions) would be progress.

technion · on July 3, 2022

Github actions is the one that surprised me. It's a recent enough invention that it was built long after all these discussions, issues raised, and alternatives proposed.

I've had far more issues getting these YAML files correctly formatted than I should admit, and every test means pushing a commit and waiting to see what happens.

cobbal · on July 3, 2022

And one of the root keys in a GitHub action is "on", which a compliant parser will turn into True if it's unquoted.

Master_Odin · on July 4, 2022

Only if you use a yaml 1.1 parser. Yaml 1.2 changed the behavior of how booleans are parsed, but it's only been 13 years since it was released, so I get why people might not have migrated to it yet.
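To make the 1.1 vs 1.2 difference concrete, here's a minimal Python sketch of how a resolver decides whether an unquoted scalar is a boolean. The patterns follow the YAML 1.1 `bool` type and the YAML 1.2 core schema; `resolve_bool` is a hypothetical helper, and real parsers differ in details (PyYAML, for instance, leaves out the single-letter `y`/`n` forms).

```python
import re

# Plain-scalar boolean patterns: the YAML 1.1 "bool" type definition
# and the YAML 1.2 core schema, respectively.
YAML11_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)
YAML12_BOOL = re.compile(r"true|True|TRUE|false|False|FALSE")
YAML11_TRUE = {"y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}


def resolve_bool(text: str, version: str):
    """Return True/False if an unquoted scalar is a boolean, else the text itself.

    Only booleans are considered; ints, floats and nulls are out of scope.
    """
    if version == "1.1" and YAML11_BOOL.fullmatch(text):
        return text in YAML11_TRUE
    if version == "1.2" and YAML12_BOOL.fullmatch(text):
        return text.lower() == "true"
    return text


print(resolve_bool("on", "1.1"))  # True
print(resolve_bool("on", "1.2"))  # on
```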

technion · on July 4, 2022

This sounds like an example of the problem: it's not up to me to migrate. We're using a third-party service and I have no idea what version of YAML GitHub Actions' parser uses. I read this guide and can't find a version referenced anywhere:

https://docs.github.com/en/actions/using-workflows/workflow-...

It refers you to this YAML guide, which itself never mentions what version it's helping you learn:

https://learnxinyminutes.com/docs/yaml/

Epa095 · on July 3, 2022

> and every test means pushing a commit and waiting to see what happens.

I use nektos/act to run workflows locally. It doesn't support reusable workflows, but it sure helped me get the yaml right.

edflsafoiewq · on July 3, 2022

YAML lets you easily embed sh scripts

  run: |
    mkdir build && cd build
    cmake ..

etc. This is great for CI.
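For reference, the `|` above is YAML's literal block style: line breaks are kept, the block's indentation is stripped, and with the default "clip" chomping exactly one trailing newline survives. A simplified Python sketch of that rule, assuming the block's lines have already been cut out of the document (the hypothetical `literal_block` ignores explicit indentation indicators and the `-`/`+` chomping modes):

```python
def literal_block(lines: list[str]) -> str:
    """Approximate the value of a `|` block scalar with default (clip) chomping."""
    # Detect the block's indentation from its first non-empty line.
    first = next((ln for ln in lines if ln.strip()), "")
    indent = len(first) - len(first.lstrip(" "))
    # Strip that indentation from every line; blank lines become empty strings.
    body = [ln[indent:] if ln.strip() else "" for ln in lines]
    # Clip: drop trailing empty lines, then keep a single final newline.
    while body and body[-1] == "":
        body.pop()
    return "\n".join(body) + "\n" if body else ""


script = literal_block(["    mkdir build && cd build", "    cmake .."])
print(repr(script))  # 'mkdir build && cd build\ncmake ..\n'
```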

andybak · on July 3, 2022

Depends who the intended audience is. I wouldn't inflict curly braces on anyone other than software developers and certainly not where you expect people to do a lot of manual data entry.

Even I tend to write "pseudo-YAML" and then add the JSON line noise programmatically if I'm entering a lot of data.

Ygg2 · on July 3, 2022

I wouldn't inflict YAML on anyone either.

- Oh you used a tab instead of a space.

- What do you mean country code can be 'no'?

- Sure we are JSON compatible**!

- Yeah, 1e3 is actually a string. If it's not, we're not compatible with YAML 1.1.

- Yes. I will import your remote exploit.
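The `1e3` point comes down to the float patterns. In YAML 1.1 a base-10 float needs a `.` and an exponent with an explicit sign; the 1.2 core schema makes both optional, so `1e3` is a float there. A sketch of just that check (hypothetical `looks_like_float`; ints, infinities and NaN are ignored, and a real resolver tries ints before floats):

```python
import re

# Base-10 float pattern from the YAML 1.1 float type: "." is mandatory
# and an exponent must carry an explicit sign.
YAML11_FLOAT = re.compile(r"[-+]?([0-9][0-9_]*)?\.[0-9.]*([eE][-+][0-9]+)?")
# Float pattern from the YAML 1.2 core schema: "." and the exponent sign are optional.
YAML12_FLOAT = re.compile(r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?")


def looks_like_float(text: str, version: str) -> bool:
    pattern = YAML11_FLOAT if version == "1.1" else YAML12_FLOAT
    return pattern.fullmatch(text) is not None


print(looks_like_float("1e3", "1.1"))  # False: stays the string "1e3"
print(looks_like_float("1e3", "1.2"))  # True: resolves to a float
```

Parsers that follow 1.1 (PyYAML is the usual example) therefore hand back the string `"1e3"`.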

neilv · on July 3, 2022

People who've learned a Lisp family language offer our hopes and prayers that people using these configuration languages don't miss a comma, space, newline, or optional quote. :)

    ((name        "Ford Prefect")
     (age         42)
     (possessions "Towel"))

    (:name        "Ford Prefect"
     :age         42
     :possessions ("Towel"))

    (countries "GB" "IE" "FR" "DE" "NO")

    (countries gb ie fr de no)  ; if country codes are identifiers in a DSL

    ((first-name "Christopher")
     (surname    "Null"))

    (:first-name "Christopher"
     :surname    "Null")
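The point here is that quoting, not guesswork, decides the type. A minimal reader sketch in Python (the `parse_sexpr` and `Symbol` names are hypothetical, and escape sequences are not processed) shows that `"Null"` can only ever come back as a string, while a bare `no` stays a symbol:

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """A bare (unquoted) atom such as `name` or `:surname`."""
    name: str


# One token: a ;-comment, "(", ")", a double-quoted string, or a bare atom.
TOKEN = re.compile(r'\s*(?:(;[^\n]*)|(\()|(\))|"((?:[^"\\]|\\.)*)"|([^\s()";]+))')


def parse_sexpr(text: str):
    """Parse one s-expression into nested Python lists.

    Quoted text always becomes a Python str (escape sequences are left
    as written), integers become int, and any other bare atom a Symbol.
    """
    text = text.rstrip()
    stack: list = [[]]
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        pos = m.end()
        comment, opened, closed, string, atom = m.groups()
        if comment:
            continue
        if opened:
            stack.append([])
        elif closed:
            if len(stack) == 1:
                raise SyntaxError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        elif string is not None:
            stack[-1].append(string)
        elif re.fullmatch(r"-?[0-9]+", atom):
            stack[-1].append(int(atom))
        else:
            stack[-1].append(Symbol(atom))
    if len(stack) != 1 or len(stack[0]) != 1:
        raise SyntaxError("expected exactly one complete expression")
    return stack[0][0]


person = parse_sexpr('(:first-name "Christopher" :surname "Null")')
print(person[3] == "Null")  # True: quoted text never turns into a null
```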

mike_hock · on July 3, 2022

> You can import Dhall expressions from URLs that support CORS

We seem to have very different ideas of what a "configuration language" should be capable of and what its parser should (be allowed to) do.

infogulch · on July 3, 2022

To add to the sibling comment, dhall can only load resources from URLs if they're annotated with the hash of their content. The "language" itself is explicitly not Turing complete, and is fully deterministic.

Look a little closer, dhall is probably the best option I've seen that preserves the properties I want out of a configuration language (correctness, determinism, turing incompleteness, easy projection into other formats, etc).

mike_hock · on July 3, 2022

Ok, I don't want the language to be Turing-complete (checked), but I also don't want the parser to open network sockets.

infogulch · on July 4, 2022

That's reasonable. Please note:

> However, when you protect an import with a semantic integrity check the import is permanently locally cached after the first request, so subsequent imports will no longer make outbound HTTP requests.

Also this PR to nixpkgs from 2020: https://github.com/NixOS/nixpkgs/pull/79900

> Many users have requested Dhall support for "offline" packages ... The goal of this change is to document what is the idiomatic way to implement "offline" Dhall builds ... The trick to implementing offline builds in Dhall is to take advantage of Dhall's support for semantic integrity checks. ... The offline nature of the builds are enforced by compiling the Haskell interpreter with the -f-with-http flag ...

https://docs.dhall-lang.org/discussions/Safety-guarantees.ht...
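The caching behaviour quoted above can be sketched as a content-addressed cache: the expected hash is the cache key, so a pinned import is fetched at most once. This is a simplified stand-in, not Dhall's implementation: Dhall hashes the normalized expression in its binary encoding rather than the raw bytes, and the hypothetical `fetch` callable is injected so the example never opens a socket.

```python
import hashlib
from typing import Callable, Dict


def load_pinned(url: str, expected_sha256: str,
                fetch: Callable[[str], bytes],
                cache: Dict[str, bytes]) -> bytes:
    """Return the content for `url`, trusting only bytes whose hash matches.

    The cache is keyed by hash, so once a pinned import has been seen,
    later loads never call `fetch` (i.e. never touch the network).
    """
    if expected_sha256 in cache:
        return cache[expected_sha256]
    data = fetch(url)
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise ValueError(f"integrity check failed for {url}: got {actual}")
    cache[expected_sha256] = data
    return data


calls = []


def fake_fetch(url: str) -> bytes:
    # Stand-in for an HTTP request; records every call it receives.
    calls.append(url)
    return b"{ port = 8080 }"


pin = hashlib.sha256(b"{ port = 8080 }").hexdigest()
cache: Dict[str, bytes] = {}
load_pinned("https://example.com/config.dhall", pin, fake_fetch, cache)
load_pinned("https://example.com/config.dhall", pin, fake_fetch, cache)
print(len(calls))  # 1: the second load is served from the cache
```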

js8 · on July 3, 2022

Not really, Dhall is strongly normalizing, it cannot execute typical programs.

HelloNurse · on July 4, 2022

Both XML and s-expressions have a great formal advantage over JSON or YAML and most of their derivatives: the ease of being explicit by means of using names and types allows applications to process their configuration freely rather than according to inappropriately high level conventions.

  (server-addresses)

  (server-addresses (ipv4 "10.255.5.5:801") (localhost) (ipv6 "1234::5678") (ipv4 "10.255.25.25:8001"))

  (list 
   (server (address (ipv4 "10.255.5.5")) (port "801") (allowed-countries (list "se" "no" "dk" "fi")) )
   (server  (name "nil") (allowed-countries nil) (address (ipv6 "1234::5678") )))

chipotle_coyote · on July 3, 2022

It may be because for many use cases -- probably most, statistically, although that's just an intuition -- what you're really looking for is a configuration data format, not a full-blown language. You need key-value pairs, probably nested, and... that may very well be it. It may not even matter if the data format can explicitly specify data type.

Dhall and Cue are no doubt excellent and amazing for complicated use cases, but there are a lot of use cases where using them is akin to taking the VTOL jet out to pick up a few things at Trader Joe's.

fesc · on July 3, 2022

I disagree.

Yeah, you always start out with K/V pairs but it never takes long until you need the same configuration but just with a slight tweak here or there. For example consider the same configuration but for different running environments.

People always come up with custom solutions per tool with some sort of metaprogramming, again in YAML. This is just bonkers IMO.

So why not just use a proper language from the beginning?
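A minimal sketch of the per-environment case in plain Python (the keys and environment names are made up): nested dicts are merged key by key, and anything else, lists included, is replaced wholesale, which is one of several reasonable merge policies.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict with `override` applied on top of `base`, recursing into nested dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


BASE = {"db": {"host": "localhost", "port": 5432}, "debug": True}
OVERRIDES = {
    "staging": {"db": {"host": "staging-db"}},
    "production": {"db": {"host": "prod-db"}, "debug": False},
}

prod = deep_merge(BASE, OVERRIDES["production"])
print(prod)  # {'db': {'host': 'prod-db', 'port': 5432}, 'debug': False}
```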

BiteCode_dev · on July 3, 2022

I'm a big fan of Cue, but it's stuck in Golang.

If I were to use it in Python, I would have to code a parser, then implement the entire type logic myself.

And I bet this is why it's not more used: JSON or YAML are comparatively easy to implement because you just need the parser. You don't have a full featured language with a set based typing system on top.

pkrumins · on July 3, 2022

The answer to this mystery is that no one has heard about these esoteric languages.

BiteCode_dev · on July 3, 2022

I did, and they are very underrated. CUE in itself is brilliantly designed: just enough power to be useful (conditionals, loops), but not enough to be dangerous (not Turing complete). The idea that you can define data types and data values with the same language makes for very ergonomic configuration files, and it is compatible with YAML and JSON, being able to export to and validate them.
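The "types and values in one language" idea rests on unification. Here's a toy Python model of it, not CUE's semantics or API: Python types stand in for CUE type constraints, dicts for structs, and the hypothetical `Bottom` exception for CUE's bottom value (`_|_`). Unifying a type with a conforming value yields the value; two different concrete values conflict.

```python
class Bottom(Exception):
    """Raised when two values conflict (the toy's version of CUE's _|_)."""


def unify(a, b):
    """Toy unification of types, concrete values and structs (dicts)."""
    if isinstance(a, dict) and isinstance(b, dict):
        # Structs unify field by field; fields present on one side are kept.
        out = dict(a)
        for key, value in b.items():
            out[key] = unify(out[key], value) if key in out else value
        return out
    if isinstance(a, type) and isinstance(b, type):
        if a is b:
            return a
        raise Bottom(f"{a.__name__} & {b.__name__}")
    if isinstance(a, type):
        a, b = b, a  # put the concrete value first
    if isinstance(b, type):
        # A type acts as a constraint that the concrete value must satisfy.
        if isinstance(a, b):
            return a
        raise Bottom(f"{a!r} is not {b.__name__}")
    if a == b:
        return a
    raise Bottom(f"{a!r} & {b!r}")


schema = {"name": str, "port": int}
values = {"name": "web", "port": 8080}
print(unify(schema, values))  # {'name': 'web', 'port': 8080}
```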

Any new tech is esoteric at first.

When I started Python 20 years ago, there were no job offers for it in my country.

oreilles · on July 3, 2022

> Dhall is a programmable configuration language that you can think of as: JSON + functions + types + imports

So like, Typescript but worse ?

HelloNurse · on July 3, 2022

More along the lines of Haskell but Javascript.

formerly_proven · on July 3, 2022

I have a soft spot for S-exprs and tend to use them when nobody will be looking. There's just something very nice and right-looking about them: the "correct" pretty-printing is algorithmically very simple, parsing is easy, and when indented they're easy to read and edit even if the file gets a bit longer, etc.
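The claim about pretty-printing holds up: one common rule is "print the form flat if it fits, otherwise put the head on the first line and each remaining element on its own line, one column deeper". A Python sketch of that rule, assuming expressions are nested lists, bare symbols are wrapped in a hypothetical `Sym` class, and plain Python strings print as quoted strings:

```python
class Sym(str):
    """A bare symbol such as `server`; plain `str` values print as quoted strings."""


def flat(expr) -> str:
    """Render an expression on a single line."""
    if isinstance(expr, list):
        return "(" + " ".join(flat(e) for e in expr) + ")"
    if isinstance(expr, Sym):
        return str(expr)
    if isinstance(expr, str):
        return '"' + expr.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return str(expr)


def pretty(expr, column: int = 0, width: int = 40) -> str:
    """Print flat if the form fits before `width`; otherwise put the head on
    the first line and each remaining element on its own line, one column deeper.
    (Re-flattening at every level is quadratic, which is fine for config files.)"""
    one_line = flat(expr)
    if not isinstance(expr, list) or not expr or column + len(one_line) <= width:
        return one_line
    head, *rest = expr
    child_col = column + 1
    lines = ["(" + pretty(head, child_col, width)]
    lines += [" " * child_col + pretty(item, child_col, width) for item in rest]
    return "\n".join(lines) + ")"


server = [Sym("server"),
          [Sym("address"), [Sym("ipv4"), "10.255.5.5"]],
          [Sym("port"), 801],
          [Sym("allowed-countries"), [Sym("list"), "se", "no", "dk", "fi"]]]
print(pretty(server))
# (server
#  (address (ipv4 "10.255.5.5"))
#  (port 801)
#  (allowed-countries
#   (list "se" "no" "dk" "fi")))
```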

samatman · on July 3, 2022

Why pay for what you're not using?

Sometimes you want to let a user edit some fields in a map/dict/whatever, and that's it. I use TOML here, personally, but YAML has some advantages for more complex data, especially if string keys are long.

Not everyone is trying to set up a Kube cluster.

teekert · on July 3, 2022

Yaml is nice; it's a shame MS Word does not support it (with syntax checking, of course), or its adoption would be even greater. It just looks nice and non-coders easily get the hang of it.

js8 · on July 3, 2022

You're the 2nd person in the replies who worries about non-coders.

Honest question - what is the example use case here? Which configuration files in YAML are currently being handled by people who don't know any coding, and would be impeded by curly braces (and anything like JSON or XML)? I think this is only an imagined problem.

Also, I have met a non-programmer lady from HR who was able to download a VB script into Outlook and adapt it to her needs (some kind of automation). Richard Stallman made a similar observation about Emacs configuration. I think you quite underestimate what non-programmers can do.

AlphaSite · on July 3, 2022

I don’t know much about dhall, but Cue is one of the worst pieces of software I’ve ever used. It’s clearly been designed by someone who’s so proud of their own cleverness that they never considered if any of it is actually useable or not.

lytedev · on July 3, 2022

I've been using cue for all my personal config, so I'm curious! What are these terrible issues you've encountered?

jitl · on July 3, 2022

My YAML secret weapon is that JSON is more or less valid YAML. So, if you want YAML with a “safer syntax”, just write JSON instead. You can sprinkle in comments if needed.

pyrolistical · on July 3, 2022

With some caveats https://stackoverflow.com/a/44617419/21838

Dylan16807 · on July 3, 2022

Most of which disappear with 1.2

Don't use YAML 1.1

alexeldeib · on July 3, 2022

From a sibling: json can’t have the header for yaml versions, so 1.2 arguably doesn’t solve this.

If you force 1.2 without the header I suppose it works, but breaks compatibility?

Ygg2 · on July 3, 2022

There aren't many options for it.

The YAML spec is huge, with a bunch of gotchas. E.g. C# has two listed parsers. One is 1.2-compatible and unmaintained.

rurban · on July 3, 2022

So you get insecure YAML, in contrast to YAML 1.1.

Don't use YAML 1.2, use the secure subset 1.1. Or even better StrictYAML

Dylan16807 · on July 3, 2022

The secure subset?

I mean, things like "NO" were fixed later.

rurban · on July 4, 2022

Yes, but objects, refs and tags all came in 1.2.

StrictYAML got rid of all these, as do all the safer YAML variants. E.g. perl5 still uses 1.0 for its CPAN metadata abstraction, and still has to restrict these. I maintain the https://metacpan.org/pod/YAML::Safe module, which allows whitelisting of certain objects.

jmillikin · on July 3, 2022

https://john-millikin.com/json-is-not-a-yaml-subset

GlitchMr · on July 3, 2022

This is an issue with Ruby's YAML implementation. YAML 1.2 processors should interpret documents without YAML directive as if they were YAML 1.2 documents - see https://yaml.org/spec/1.2.2/#681-yaml-directives.

jmillikin · on July 3, 2022

See the last section -- a parser that interprets YAML as v1.2 by default will break for YAML v1.1 documents.

There's no way to determine whether a `.yaml` file is YAML v1.1 or v1.2 without a version directive, and most YAML documents are v1.1 because most YAML parsers default to v1.1 semantics.

I used Ruby as an example since it's easily available on most platforms, but you could also use Python or C++ or Swift or whatever language[1] you prefer. The underlying issue -- YAML not being a subset of JSON -- is universal.

[1] Note that some libraries, such as go-yaml, do their own thing and don't conform to either v1.1 or v1.2 semantics.

0x69420 · on July 3, 2022

on the front page right now:

- a git explainer centered around git internals, serving as an indictment of git's ux

- a parser for a restricted subset of yaml, serving as an indictment of yaml's excesses

on the front page yesterday:

- an rsync explainer centered around rsync internals, where part 1 details how rsync is wrapped in a dockerfile and a perl script in order to be made useful

- a sad thinkpiece on how baroque the web has become

on the front page the day before:

- a go utility weighing in at tens of source files that implements what should be a built-in feature of AWS

- an article on encapsulation in rust, the buried lede of which is "you must audit your transitive dependency graph in order to retain the benefits of rust"

why do so many technologies feel like self-harm?

brigandish · on July 4, 2022

Possibly it's because it's hard to get things right the first time, but sometimes it's better than the old way so there's a big shift to it, and once we know a better way it's hard to shift people to the better thing because the old thing was just good enough.

Come to think of it, this probably applies to any product.

kriz9 · on July 3, 2022

Exploration? Who knows, maybe we will all replace our yaml with perl at some point. Maybe not. But it is definitely good to see these somewhat crazy experimentations. That is how we move forward.

rexpop · on July 4, 2022

I don't understand what, in your perspective, would be the most convivial technology.

Please elucidate?

HelloNurse · on July 4, 2022

Convivial technology consists of tools to eat and drink: knives, chopsticks, crustacean access instruments, sporks, sushi conveyor belt systems...

Did you mean something else?

rexpop · on July 4, 2022

I meant it in the sense of Ivan Illich's "Tools for Conviviality" [0], in which he addresses even your misunderstanding[1].

> My purpose is to lay down criteria by which the manipulation of people for the sake of their tools can be immediately recognized, and thus to exclude those artifacts and institutions which inevitably extinguish a convivial life style. Paradoxically, a society of simple tools that allows men to achieve purposes with energy fully under their own control is now difficult to imagine.

> The hypothesis was that machines can replace slaves. The evidence shows that, used for this purpose, machines enslave men. Neither a dictatorial proletariat nor a leisure mass can escape the dominion of constantly expanding industrial tools

> The crisis can be solved only if we learn to invert the present deep structure of tools; if we give people tools that guarantee their right to work with high, independent efficiency, thus simultaneously eliminating the need for either slaves or masters and enhancing each person’s range of freedom. People need new tools to work with rather than tools that “work” for them. They need technology to make the most of the energy and imagination each has, rather than more well-programmed energy slaves.

0. https://archive.org/details/illich-conviviality

1. "After many doubts, and against the advice of friends whom I respect, I have chosen “convivial” as a technical term to designate a modern society of responsibly limited tools. In part this choice was conditioned by the desire to continue a discourse which had started with its Spanish cognate. The French cognate has been given technical meaning (for the kitchen) by Brillat-Savarin in his Physiology of Taste: Meditations on Transcendental Gastronomy. This specialized use of the term in French might explain why it has already proven effective in the unmistakably different and equally specialized context in which it will appear in this essay. I am aware that in English “convivial” now seeks the company of tipsy jollyness, which is distinct from that indicated by the OED and opposite to the austere meaning of modern “eutrapelia”, which I intend. By applying the term “convivial” to tools rather than to people, I hope to forestall confusion." (Illich)

pyrolistical · on July 3, 2022

My favorite kind of strict yaml is just to say no.

ainar-g · on July 3, 2022

There are basically two types of YAML file I've seen:

1. Application configuration. That would be better off as environment variables and/or JSON(5) for structured data.

2. CI/CD configuration. Those should really be scripts, because basically every such configuration file I've seen invents its own way of making conditionals anyway.
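A sketch of the first option in Python, assuming hypothetical `APP_*` variable names: scalars come from environment variables with explicit conversions (nothing is guessed), and the one structured value is passed as JSON, where `"no"` can only be written as a quoted string.

```python
import json
import os
from dataclasses import dataclass, field


@dataclass
class AppConfig:
    port: int
    debug: bool
    # Structured data that doesn't fit a flat variable is passed as JSON.
    allowed_countries: list = field(default_factory=list)


def config_from_env(env=os.environ) -> AppConfig:
    """Build config from environment variables with explicit, non-guessing parsing."""
    debug_raw = env.get("APP_DEBUG", "false")
    if debug_raw not in ("true", "false"):
        raise ValueError(f"APP_DEBUG must be 'true' or 'false', got {debug_raw!r}")
    return AppConfig(
        port=int(env.get("APP_PORT", "8080")),
        debug=debug_raw == "true",
        allowed_countries=json.loads(env.get("APP_ALLOWED_COUNTRIES", "[]")),
    )


cfg = config_from_env({"APP_PORT": "9000", "APP_ALLOWED_COUNTRIES": '["se", "no"]'})
print(cfg)  # AppConfig(port=9000, debug=False, allowed_countries=['se', 'no'])
```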

masklinn · on July 3, 2022

> CI/CD configuration. Those should really be scripts, because basically every such configuration file I've seen invents its own way of making conditionals anyway.

Hear hear. Every ci/cd is “what if we half-assed a terrible programming language in yaml?” And apparently the world is ok with that.

BiteCode_dev · on July 3, 2022

There should be a law stating that if you write a new DSL, you are forced to write all the plugins for all the IDEs, and a debugger for it, before you are allowed to publish it.

For 3 months I have been working with a team leader that had the project of making all entrypoints in the program accessible from a JSON based DSL. I tried to suggest we first expose a nice regular ergonomic API, and if we can't satisfy our requirement, we then add the DSL on top.

Nope.

Got the PR this week. Now he leaves next month, so we will have to maintain that.

jeroenhd · on July 3, 2022

Having used yaml for anything from configuration to Gitlab CI pipelines, I can't see how JSON could possibly improve the situation. It's an ugly subset of YAML more readable by computers at the expense of being readable by humans.

Yes, if you want, you can write some terrible YAML. It's your project though, you don't need to make your YAML terrible. The same is true for everything else about your configuration, code, and infrastructure. I've seen some terrible JSON configuration that relied on some JSON parsers letting a repeated key overwrite its earlier occurrences, and used those earlier occurrences as documentation, for example.
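Python's standard `json` module is one parser with that last-one-wins behaviour; RFC 8259 only says object names SHOULD be unique. An `object_pairs_hook` can turn the silent overwrite into an error (the config text below is made up):

```python
import json

doc = '{"timeout": "seconds before giving up", "timeout": 30}'

# The json module keeps the last occurrence of a repeated key,
# which is what the "documentation" trick relies on.
print(json.loads(doc))  # {'timeout': 30}


def reject_duplicates(pairs):
    """object_pairs_hook that refuses repeated keys instead of silently overwriting."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


try:
    json.loads(doc, object_pairs_hook=reject_duplicates)
except ValueError as err:
    print(err)  # duplicate key: 'timeout'
```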

I know that in YAML 1.2 you can just enter JSON into a YAML file and it'll work if that's what you want. I haven't found a real life example of why that would be better, though.

If you're going configuration format purist, use XML. It's actually good at requiring you to follow the schema and there are XML parsers for absolutely any platform. You can write conditions in an accompanying XSLT file for "active" configuration.

masklinn · on July 3, 2022

> Having used yaml for anything from configuration to Gitlab CI pipelines, I can't see how JSON could possibly improve the situation.

How did you even get that idea from GP’s comment?

Ci/cd is literally a different item from the json/config file one.

woojoo666 · on July 3, 2022

Most docker compose files I've seen are not complex enough to warrant scripting, but look way better as YAML than JSON. In those cases I think StrictYaml would be a great fit.

Also scripts are very different from configs. Configs are declarative, (most) scripts are imperative. If your config is complex enough then maybe you should move some logic into imperative code, but you can still have a config on top of that.

kaichanvong · on July 3, 2022

I like XML or the no. Going off to do something, entirely non-computer based could be nice. further onwards from that you might have a nice chat in one of those hidden inter-racial dimensional pubs with neither futuristic or westworld themes.

smcl · on July 4, 2022

What on earth does this mean

corytheboyd · on July 3, 2022

You mean just to say false

shaicoleman · on July 3, 2022

StrictYAML removes features that might be useful for some use cases, such as Node anchors+Refs and Flow Style.

I don't think the cost of an additional standard is worth it in this case.

While YAML has issues, they aren't much of a problem if you use a linter, such as yamllint [1]. This can be enforced as part of the CI process.

1. https://github.com/adrienverge/yamllint

Dylan16807 · on July 3, 2022

So the documentation talks about the problem of 「null」being confused with 「"null"」.

But the solution is to make it impossible to write 「null」 at all? I don't think I would have suggested that one...

hombre_fatal · on July 3, 2022

It shows on the docs how to decode ‘a: null’ with the schema so I’m missing the part where you can’t use null.

Dylan16807 · on July 3, 2022

Oh I think I found the page you mean? https://hitchdev.com/strictyaml/using/alpha/scalar/empty/

It looks like you need to use an empty string and you can tell it to translate to None. That's better than nothing, but it's still basically an inability to use actual null here.

hombre_fatal · on July 4, 2022

I think I'm misunderstanding you, but StrictYAML's biggest departure from YAML is that it doesn't try to guess types at all. Everything is decoded as a string by default until you specify otherwise with its schema system.

Here you can use `NullNone` to parse `null` into None.

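    from strictyaml import Map, NullNone, Int, load

    # NullNone() maps the text "null" to None; anything else falls through to Int().
    schema = Map({"a": NullNone() | Int()})
    assert load("a: null", schema).data == {"a": None}
    assert load("a: 7777", schema).data == {"a": 7777}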

Dylan16807 · on July 4, 2022

It doesn't guess, but also the data can't tell it. It's not just that you get a string by default, it's that everything starts as a string and then goes through post-processing. If you want to distinguish between 'any string' and 'not a string', you can't.

> Here you can use `NullNone` to parse `null` into None.

That's the worst possible outcome for poor Christopher Null.

The best you can do for him is use "a:" to mean null but at that point you're not really dealing with nulls, you've just gone with strings and used an empty string.
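The objection is easiest to see side by side. JSON's data distinguishes `null` from `"Null"`; in a strings-first pipeline a null rule can only inspect text. The `null_to_none` validator below is a hypothetical stand-in, and its case-insensitive match is an assumption made for illustration, not a claim about how StrictYAML's `NullNone` compares text.

```python
import json

# In JSON the data itself says which is which: null versus the string "Null".
typed = json.loads('{"surname": "Null", "middle_name": null}')


def null_to_none(raw: str):
    """Hypothetical 'null means None' rule applied to already-stringified scalars."""
    # Assumption: case-insensitive comparison, chosen only for illustration.
    return None if raw.lower() == "null" else raw


# Strings-first: both values arrive as text, whether or not they were quoted.
untyped = {"surname": "Null", "middle_name": "null"}
validated = {key: null_to_none(value) for key, value in untyped.items()}

print(typed)      # {'surname': 'Null', 'middle_name': None}
print(validated)  # {'surname': None, 'middle_name': None}
```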

benibela · on July 3, 2022

Simple solution: Make it impossible to write 「"null"」

vsajip · on July 3, 2022

I developed a configuration format which is similar to, and a superset of, the JSON format. It's not new - it dates from well before its first announcement in 2008 - and has the following aims:

* Allow a hierarchical configuration scheme with support for key-value mappings and lists.

* Support cross-references between one part of the configuration and another.

* Provide a string interpolation facility to easily build up configuration values from other configuration values.

* Provide the ability to compose configurations (using include and merge facilities).

* Provide the ability to access real application objects safely, where supported by the platform.

* Be completely declarative.

It's similar to newer formats such as JSON5, HJSON, HOCON and similar but offers a number of features [0] which they don't, as indicated by the above list. It's not intended to occupy the niche where you find things like Cue, Jsonnet, Dhall and similar.

It was just never especially publicised when first implemented for use in Python projects, but it now also has implementations for the JVM, .NET, Go, Rust, D, JavaScript [1], Ruby and Elixir (all BSD-3-Clause licensed) and it would be great to get feedback on the project from the HN community.

[0]: https://docs.red-dove.com/cfg/intro.html - description of features and comparison with other similar systems

[1]: https://docs.red-dove.com/cfg/playground.html - uses the JS implementation to create an interactive playground
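To illustrate the cross-reference and interpolation ideas from the feature list in general terms, here's a small resolver over nested dicts. The `${dotted.path}` syntax and the `interpolate` helper are assumptions for this sketch, not the CFG library's syntax or API; the linked docs describe the real format.

```python
import re

REF = re.compile(r"\$\{([A-Za-z_][\w.]*)\}")


def lookup(config: dict, path: str):
    """Follow a dotted path like 'paths.root' through nested dicts."""
    node = config
    for part in path.split("."):
        node = node[part]
    return node


def interpolate(config: dict, value, seen=()):
    """Resolve ${...} references inside string values, recursively.

    `seen` tracks the chain of paths being expanded so cycles raise
    instead of recursing forever.
    """
    if isinstance(value, dict):
        return {k: interpolate(config, v, seen) for k, v in value.items()}
    if isinstance(value, list):
        return [interpolate(config, v, seen) for v in value]
    if not isinstance(value, str):
        return value

    def expand(match):
        path = match.group(1)
        if path in seen:
            raise ValueError(f"reference cycle through {path!r}")
        return str(interpolate(config, lookup(config, path), seen + (path,)))

    return REF.sub(expand, value)


raw = {
    "paths": {"root": "/srv/app", "logs": "${paths.root}/logs"},
    "logging": {"file": "${paths.logs}/app.log"},
}
print(interpolate(raw, raw)["logging"]["file"])  # /srv/app/logs/app.log
```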

Kab1r · on July 3, 2022

The need for StrictYAML makes me realize how unnecessarily complex YAML is. I think choosing to make YAML a superset of JSON was a smart idea to encourage adoption, but as YAML has grown in popularity it has outgrown the feature. It reminds me that there's a constant tradeoff between legacy compatibility and simplicity that applies beyond configuration formats.

HelloNurse · on July 3, 2022

Since JSON is already terrible on its own, making YAML a superset of JSON is a great example both of "putting lipstick on a pig" and of considering "it works in the common case, let's try something even harder" an acceptable design standard.

adontz · on July 3, 2022

Last few years I was playing with configuration files, and my final thought was always: why not have a Python module imported for configuration, just like Django does?

Sometimes I need to substitute variables.

Sometimes I need to generate similar blocks from templates.

Sometimes I need to read part of configuration from environment variable.

Simply importing Python module was always easier.
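Roughly what that looks like, in the spirit of a Django `settings.py` (all names here are hypothetical): substitution, templated blocks and environment lookups are ordinary Python, and other code would just import the module and read its attributes.

```python
import os

# Read from the environment, with a default for local development.
ENV = os.environ.get("APP_ENV", "development")

# Substitute variables into other values.
BASE_URL = f"https://{ENV}.example.com"

# Generate similar blocks from a template instead of copy-pasting them.
QUEUES = ["high", "default", "low"]
WORKERS = [
    {"name": f"worker-{queue}", "queue": queue, "url": f"{BASE_URL}/{queue}"}
    for queue in QUEUES
]
```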

mixmastamyk · on July 3, 2022

It works quite well until you need a non-developer to change it. Or not using Python, or multiple technologies.

adontz · on July 3, 2022

So you believe there are engineers who can edit YAML, but cannot edit Python?

Terraform introduced a brand new language, Pulumi uses JavaScript.

strictyaml website has examples in Python, that's why Python.

mixmastamyk · on July 4, 2022

non-developer

adontz · on July 5, 2022

Who are these non developers who can edit YAML but not Python?

mixmastamyk · on July 7, 2022

Everyone. They might be able to handle foo = "bar", but some will not be able to handle the quotes, or understand the difference between 123 or "123".

If you use anything else from Python lang they drop like flies. def? loop? Out of the question. I've even seen so-called IT people not able to handle scripting.

.ini files are about the only thing you can count on, (almost) guaranteed that a non-developer can handle.
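For comparison, Python's standard `configparser` reads exactly that kind of file. Every value is stored as text and the reading code chooses the type; note that `getboolean` accepts `yes`/`no`/`on`/`off` as well as `true`/`false`, so the "Norway problem" still exists, but only for values you explicitly ask to read as booleans. The section and key names are made up:

```python
import configparser

INI = """
[server]
host = 0.0.0.0
port = 8080
enabled = yes
"""

config = configparser.ConfigParser()
config.read_string(INI)

# Everything is stored as text; the reading code picks the type explicitly.
host = config.get("server", "host")
port = config.getint("server", "port")
enabled = config.getboolean("server", "enabled")
print(host, port, enabled)  # 0.0.0.0 8080 True
```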

nathants · on July 3, 2022

this is a good idea, but difficult to nail in scope and multi-lang support. i’ve made similar attempts[1,2]. tbh if this had go support i’d probably try it today.

json, yaml et al are ways to declare literal data. this is good. they are fine.

the issues always come from what the data is used for. nailing your schema, making your data structures as simple as they can be and no simpler, this is where the engineering happens. this is the hard part. literally all that matters.

not validating arbitrary data inputs is obviously a bad idea. whether you validate them via a high level library or tediously by hand[3] isn’t very important.
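What validating "tediously by hand" can look like for a deliberately narrow schema, as a Python sketch (the `service` shape and field names are hypothetical): unknown keys are rejected outright, which keeps the number of knobs from growing silently.

```python
def validate_service(data: object) -> dict:
    """Hand-written validation of a small, deliberately narrow schema."""
    if not isinstance(data, dict):
        raise TypeError("service must be a mapping")
    # Reject anything outside the schema instead of silently ignoring it.
    unknown = set(data) - {"name", "port", "replicas"}
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    name, port = data.get("name"), data.get("port")
    if not isinstance(name, str) or not name:
        raise ValueError("name must be a non-empty string")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(port, int) or isinstance(port, bool) or not 1 <= port <= 65535:
        raise ValueError("port must be an integer in 1..65535")
    replicas = data.get("replicas", 1)
    if not isinstance(replicas, int) or isinstance(replicas, bool) or replicas < 1:
        raise ValueError("replicas must be a positive integer")
    return {"name": name, "port": port, "replicas": replicas}


print(validate_service({"name": "web", "port": 8080}))
```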

what is important is that the data structures are sane, simple, and stable. if they are easy to describe, they might be a good idea. if they approach the complexity of a general purpose pl, they probably aren’t.

most literal data schemas are too broadly scoped. too general. github actions, other ci, k8s, etc. they have too many knobs, too many permutations. this is not a feature, it is a failure of design.

good schema validation won’t fix broken design. it’s unrelated.

1. https://github.com/nathants/py-schema

2. https://github.com/nathants/clj-schema

3. https://github.com/nathants/libaws/blob/ae48040911bf2c0554da...

bschwindHN · on July 3, 2022

Making my obligatory comment that JSON5 is a sane alternative to YAML. Better than TOML too for more deeply nested structures.

rubatuga · on July 3, 2022

I’m hoping NestedText will take the place of YAML:

https://nestedtext.org/en/stable/

twarge · on July 3, 2022

“Ability to read in YAML, make changes and write it out again with comments preserved.”

How are the comments preserved in the Python representation? Is it some kind of ordered dict?

andreareina · on July 3, 2022

Probably the Map() class actually takes a series of nodes, each of which could be either a comment or a key/value pair (and validates that you don't have duplicate keys). tomlkit does something similar.
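A toy version of that idea for flat `key: value` lines, in Python: comment and blank lines are kept as nodes in an ordered list, so writing the list back out preserves them. This is only an illustration of the node-list approach; StrictYAML itself is built on ruamel.yaml, which does the real round-tripping, and this sketch does not separate trailing same-line comments from values.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Comment:
    text: str  # the full original line (comment or blank), written back verbatim


@dataclass
class Entry:
    key: str
    value: str


Node = Union[Comment, Entry]


def parse(text: str) -> List[Node]:
    """Parse flat 'key: value' lines, keeping comment and blank lines as nodes."""
    nodes: List[Node] = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            nodes.append(Comment(line))
        else:
            key, _, value = line.partition(":")
            nodes.append(Entry(key.strip(), value.strip()))
    return nodes


def set_value(nodes: List[Node], key: str, value: str) -> None:
    """Update an existing entry in place, or append a new one at the end."""
    for node in nodes:
        if isinstance(node, Entry) and node.key == key:
            node.value = value
            return
    nodes.append(Entry(key, value))


def dump(nodes: List[Node]) -> str:
    lines = [n.text if isinstance(n, Comment) else f"{n.key}: {n.value}" for n in nodes]
    return "\n".join(lines) + "\n"


doc = parse("# port the app listens on\nport: 8080\nhost: localhost\n")
set_value(doc, "port", "9000")
print(dump(doc), end="")
```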

ewuhic · on July 3, 2022

What was the name of that serialization language “for humans” featured on HN recently as someone's long-running personal project?

mixmastamyk · on July 3, 2022

Strict yaml works well and avoids many common problems. I recommend it for most uses of yaml and quite a few others.

bvrmn · on July 3, 2022

The schema requirement is a very daunting part of StrictYAML. Basic types besides string are useful.

BiteCode_dev · on July 3, 2022

XKCD 927, once again.

I would rather avoid YAML altogether.

Small human editable files: TOML.

Big machine editable files: JSON.

And I validate all that with pydantic.

I wish CUElang was ready for prime time, that would solve all this at once.

nrvn · on July 3, 2022

I believe YAML has deserved the motto:

"Everyone hates YAML. Everyone writes a lot of YAML."

d12bb · on July 3, 2022

My biggest gripe with YAML is significant whitespace and the inability to use tabs for indenting, so still a No for me.

danjc · on July 3, 2022

Tangential but before I get to validating YAML I’d really appreciate recommendations on a decent JavaScript YAML parser/serializer.