I don't like YAML and would like to move on, but I hope we don't move on to this.
I think it's crazy that when I add a string to an inline list, I may need to convert that inline list to a list because this string needs different handling. I think it's crazy that "convert an inline list to a list" is a coherent statement, but that is the nomenclature that they chose.
I don't like that a truncated document is a complete and valid document.
But what is most unappealing is their whitespace handling. I couldn't even figure out how to encode a string with CR line endings. So, I downloaded their python client to see how it did it. Turns out, they couldn't figure it out either:
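For a taste of why CR endings are hard for a line-oriented format, here's a hedged Python illustration of the general failure mode (not NestedText's actual code): any encoder that splits a multiline value into lines and rejoins them with "\n" silently rewrites CR endings.

```python
# Python's splitlines() treats a bare "\r" as a line break, so once a value
# is split into lines, the original line ending is unrecoverable.
text = "line one\rline two"
lines = text.splitlines()          # the CR is consumed as a separator
round_tripped = "\n".join(lines)   # ...and comes back as LF
assert round_tripped != text
```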
I wish people would stop trying to write programs for which there are no interpreters, compilers, or linters:
  name: Install dependencies
  run: |
    python -m pip install --upgrade pip
    pip install pytest
    if [ -f 'requirements.txt' ]; then pip install -r requirements.txt; fi
That is a program hiding in the bowels of a "nestedtext" document ... It is no better than a program hiding in the bowels of a JSON or YAML document.
We all have to deal with this, but it is beyond stupid.
I don't think it matters much whether this is inline or in a separate file. If you want to test your tests, "yq -r .run input.yaml | sh -e" works just as well.
In fact, if I really wanted to test my tests, I'd say that directly testing the corresponding clause is the more comprehensive approach. For example, what if someone accidentally changes the line to read:
run: /path/to/install-scriptq
? Then your test of "install-script" will not catch anything. But if your test runs "yq -r .run | sh -e", it will catch that error. And you can still forward to a script if you want to.
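The same check can be sketched without yq. This is a hedged stand-in: `extract_run` is a naive substitute for a real YAML parser, and the configs are made up; the point is only that running the actual clause under `sh -e` catches a typo'd path, whereas testing a separate script would not.

```python
import subprocess

def extract_run(config_text: str) -> str:
    # Naive stand-in for `yq -r .run`: grab whatever follows "run:".
    for line in config_text.splitlines():
        if line.strip().startswith("run:"):
            return line.split("run:", 1)[1].strip()
    raise KeyError("run")

config = "name: Install dependencies\nrun: echo ok\n"
cmd = extract_run(config)
result = subprocess.run(["sh", "-e", "-c", cmd], capture_output=True, text=True)
assert result.returncode == 0

# A typo'd path, as in the example above, fails the very same test:
bad = subprocess.run(["sh", "-e", "-c", "/no/such/install-scriptq"],
                     capture_output=True, text=True)
assert bad.returncode != 0
```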
So let's keep inline scripts, they are very reasonable methods for just a few commands.
Depending on the source control tool, you may lose syntax highlighting; you most likely lose linters; and even copying those multi-line commands into a shell becomes cumbersome. I consider the inlining example from the GP's comment awful.
It would be nice if YAML wasn't horrendously abused the way it is. You have CI pipelines that let you construct DAGs to represent your builds, but you need several thousand lines of YAML and a load of custom parsing to get programming constructs in the string types, for example. And then each provider has its own way of providing those.
I don't have to re-read manuals describing how to do if/else in Ruby or Java or Lisp, but as soon as yaml and some 'devops' tooling is involved, I have to constantly jump back and forth between the reference and my config.
The main point being that the problem isn't the file format but the products that continue to push it, presumably because hacking stuff on top of `YAML.parse` is less effort than designing something that fits the purpose.
Yeah. A lot of times I find myself thinking YAML is like a really awful programming language. You can sort of do conditional logic and loops, but usually I find it hard to follow what's going on.
For build systems, I always liked the idea of Gradle where the core functionality was simple and declarative, but with the option to use a real programming language for things that weren't simple. For example, integrating installers or form builders (pre-processing) into a build are things I would consider non-trivial if there aren't official plugins, but it was still relatively easy to do with Gradle.
The biggest problem I always had with Gradle was that I didn't like Groovy, and I always thought there was a missed opportunity to have a statically typed build system with a solid API/contract and all the fancy tooling, like auto-complete, that you get with statically typed languages.
I see JSON5 mentioned a lot in the comments. In terms of CI / build systems, I feel like something built with JSON5/TypeScript could be really good. I'd be really happy using TypeScript for configuring things like build systems where there shouldn't really be an argument for needing it to be usable by non-programmers.
Personally, I feel like I've spent way too much of my life debugging YAML syntax issues.
If you're happy to go lispy, there's Babashka [1], a Clojure without the JVM. It has built-in support for 'tasks' designed to make writing build scripts easy.
My experience with Kotlin gradle scripts is worse than Groovy. For example, given the following valid groovy/kotlin gradle program:
dependencies {
}
What would you expect to see between the curly braces? IntelliJ IDEA, which supposedly has full support for the Gradle DSL both for Groovy and Kotlin, offers only generic suggestions. Common function calls such as "implementation()" or "testImplementation()" are not suggested. If you do use those functions, no suggestion is made for their parameters. Because Gradle's DSL is built on top of a general-purpose language, it loses the benefits of a DSL (constraining the set of possible configurations and guiding the user towards valid configurations).
The key benefit of the Kotlin DSL is that in this precise example, IDEA does suggest valid stuff:
https://imgur.com/a/vFYNIU1
Kotlin DSL is miles ahead of Groovy in terms of discoverability and IDEA integration. With Groovy DSL, most of the build script is highlighted with various degrees of errors and warnings; with Kotlin DSL, if something is highlighted, it is a legitimate error, and vice versa - if no errors are detected by IDEA, then it is almost certain to work.
There were rough spots of IDEA integration a couple years ago, but now it is close to perfect, within Gradle's limits of course (due to the sheer dynamic nature of it, some things are just not possible to express in a static fashion, unfortunately). The biggest obstacle to Kotlin DSL use might be that some of the plugins use various Groovy-specific features which are hard to use from Kotlin, but thankfully most of the plugins either fix those or are rewritten in Java or Kotlin instead.
There's a huge gap in the Java build-tool space for a tool that is simple, easy to learn, and can cover 90% of projects' requirements. I have this feeling that we're in the "Subversion" days of Java build tools, and the day someone introduces "git", people will wonder why we suffered with Gradle and Maven for so long. If I had time I would be looking into building this.
Predating Gradle was a tool called Gant. It was simple, intuitive, and did 90% of what any project could want. Ironically, it was Groovy-based as well. But instead of Gradle's arcane, magic-based configuration, it was literal and direct, a simple extension of the Ant that came before it. I liked it much better, but someone decided they could make a business out of Gradle, Gant got deprecated, and here we are.
I found it fairly simple to build Gradle plugins with Kotlin. If anything, the problem was just having the patience to actually find the right documentation in the first place, and understand what was being described. The main problem I faced there was that I wanted a plugin to configure dependencies for the project it would run against and the docs around dealing with dependencies and detached configurations were a bit confusing.
I do find it curious that a lot of these tools get seen as basic task runners despite offering much more potential.
It's always the same trajectory with declarative programming. It starts with "it's just configuration, we need something simple". Then users come with use cases which are more complex. Then you have programming language on top of configuration language syntax.
Very much so. A good few years ago I got annoyed that I couldn't change mutt's configuration the way I wanted, because it has a built-in configuration language which doesn't allow complicated conditionals etc.
(There are workarounds, and off-hand I can't think of a great example, but bear with me.)
In the end I wrote a simple console-based mail-client, which used a Lua configuration file. That would generally default to using hashes, and key=value settings, but over the time I used it things got really quite configurable via user-defined callbacks, and functions to return various settings.
For example I wrote a hook called `on_reply_to`, and if you defined that function in your configuration file it would be invoked when you triggered the Reply function. This kind of flexibility was very self-consistent, and easy to add using an embedded real language.
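A hedged sketch of that hook mechanism, with Python standing in for Lua; the names (`on_reply`, `signature`) are illustrative, not the mail client's real API:

```python
# The user's config file is real code. If it defines an optional hook
# function, the application calls it at the right moment.
config_source = """
signature = "-- j"

def on_reply(subject):
    return "Re: " + subject
"""

config = {}
exec(config_source, config)  # load the user config as code (like Lua's dofile)

subject = "hello"
if "on_reply" in config:     # the hook is optional; only call it if defined
    subject = config["on_reply"](subject)
assert subject == "Re: hello"
```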
Later I added some hacks to a local fork of GNU Screen; there I just said:
* If the ~/.screenrc file is executable, then execute it, and parse the output.
That let me say "If hostname == foo; do this ; otherwise do this .." and get conditionals and some other things easily. Another example was unbinding all keys, and then only allowing some actions to be bound. (I later submitted "unbindall" upstream, to remove the need for that.)
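The executable-.screenrc trick can be sketched like this, with Python standing in for the C patch (the file contents and names are illustrative):

```python
import os
import subprocess
import tempfile

def load_config(path: str) -> str:
    """If the config file is executable, run it and parse its output;
    otherwise read it verbatim."""
    if os.access(path, os.X_OK):
        return subprocess.run([path], capture_output=True,
                              text=True, check=True).stdout
    with open(path) as f:
        return f.read()

# Demo: a throwaway "config" whose output depends on the hostname,
# giving you conditionals in a format that has none.
with tempfile.NamedTemporaryFile("w", suffix=".rc", delete=False) as f:
    f.write('#!/bin/sh\n'
            'if [ "$(hostname)" = foo ]; then echo "setting A"; '
            'else echo "setting B"; fi\n')
    path = f.name
os.chmod(path, 0o755)
print(load_config(path))  # whichever branch matches this host
```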
What's really sad is that XML had a much better ecosystem around this for ages. I'd very much rather deal with XQuery or even XSLT to construct build trees, than the current crop of ad-hoc YAML preprocessors. At least the XML stuff had a consistent type system underneath!
XSLT is an absolute horror and not something I would want to deal with again. It feels like some weird academic experiment in an XML declarative programming language that should never have made it to print.
If something needs the flexibility of a programming language, why not use a real one that's been well tested for writing other programs? These various config file programming systems always end up creating something notorious that everyone tries to avoid having to work on.
XQuery is, in many ways, XSLT with better syntax. It doesn't have the pattern-matching transforms that are the T in XSLT - but for configs, I don't think it makes a big difference.
Also, I don't think many realize that the stack has evolved since early 00s. XSLT 1.0 was a very limiting language, requiring extensions for many advanced scenarios. But there's XSLT v3.0 these days, and XPath & XQuery v3.1, with some major new features - e.g. maps and lambdas. Granted, this doesn't fix the most basic complaint about XSLT - its insanely verbose syntax - but even then, I'd still take XSLT over ad-hoc YAML-based loops and conditionals.
I will take the verbosity of XML any day over YAML wrestling (complex YAML configs, of course). There are simply too many "implicit rules" in YAML; it's why I prefer Python over Ruby and Perl. Generally, though, TOML has been good enough for me for lots of fairly large config files that are easy for humans and machines to parse.
XML died because too many configurations turned what should be a 'prop' into an inner tag -- and it doesn't help that XML doesn't really give guidance on when to use which. And, of course, when you deserialize XML, the innerText always ends up in a strange place, so it's never really clear what the right way to handle it is.
Honestly, I think using an embedded scripting language, like lua or even javascript, would be a much better fit for these use cases than trying to make yaml do something it wasn't designed for.
Ironically, having used cdk8s[1] for dealing with kubernetes infrastructure, that's the one thing where I've actually preferred yaml. That said, k8s resource definitions are pure config so there's no need to try and hack extra bits on top of a serialized data structure.
I really like the approach of Buildkite CI -- they use YAML, but the YAML can be produced by an executable script.
So you write YAML by hand for trivial cases, but once it gets complex, you can just drop back to shell/python/ruby/node/whatever, implement any complex logic, and serialize the results to plain YAML.
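A rough sketch of that pattern (the step layout is illustrative, not Buildkite's actual schema, and json.dumps stands in for a YAML dumper, since JSON is itself valid YAML):

```python
import json

# Build the pipeline with real programming constructs...
steps = []
for pyver in ("3.10", "3.11", "3.12"):
    steps.append({
        "label": f"test {pyver}",
        "command": f"tox -e py{pyver.replace('.', '')}",
    })

# ...then serialize the result and hand it to the CI system.
pipeline = json.dumps({"steps": steps}, indent=2)
print(pipeline)
```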
Author seems to use misfeatures of a particular implementation to tar all implementations with. The round-tripping issue is not a statement about YAML as a markup language, much in the way a rendering bug in Firefox is not a statement about the web.
Stepping back a bit, YAML is good enough, and this problem has been incrementally bikeshedded since at least the 1970s, it is time to move on. Human-convenient interfaces (like YAML, bash, perl) are fundamentally messy because we are messy. They're prone to opinion and style, as if replacing some part or other will make the high level problem (that's us) go away. Fretting over perfection in UI is an utterly pointless waste of time.
I don't know what NestedText is and find it very difficult to care; there are far more important problems in life to be concerned with than yet another incremental retake on serialization. I find it hard to consider contributions like this helpful, or to represent progress in any way.
If you can write a bad YAML document because of those mis-features/edge cases, I'd say you've already lost.
Humans are messy, but at the end of the day the data has to go to a program, so a concise and super simple interface has a lot of power to it for humans.
Working at a typical software company with average skill level engineers (including myself), no one likes writing YAML. But everyone is fine with JSON.
I think it's a case of conceptual purity vs what an average engineer would actually want to use. And JSON wins that. If YAML was really better than JSON, we'd all be using that right now.
So does it really matter if YAML is superior if >80% of engineers pick JSON instead?
I would argue that you can write something poor and/or confusing in any markup language that is sufficiently powerful.
Conversely, if a markup language is strict enough to prevent every inconsistency, then it's not powerful enough or too cumbersome to use to be generally useful.
I'd say that YAML is anything but conceptually pure, with all the arbitrariness, the multitude of formatting options, and parsing magic happening without warning.
If you want conceptual purity (and far fewer footguns), take Dhall.
> Stepping back a bit, YAML is good enough, and this problem has been incrementally bikeshedded since at least the 1970s, it is time to move on
Nah, in the 1970s we had Lisp S-expressions that completely solved the problem, and everything since then has been regressions on S-expressions due to parenthesis phobia.
After hearing that thing about the country code for Norway, I became convinced that YAML has to just die. Become an ex-markup language. Pine for the fjords. Be a syntax that wouldn't VOOM if you put 4 million volts through it. Join the choir invisible, etc.
S-expressions don't solve the problem at all, you just get to fractally bikeshed all over again about what semantics they have and what transformations are or aren't equivalent. Does whitespace roundtrip through S-expressions? Who knows. Are numbers in S-expressions rounded to double precision on read/write? Umm, maybe. How do I escape a ) in one of my values? Hoo boy, pick any escape character you like and there's an implementation that does it.
S-expressions don’t completely solve the problem: they don’t have a syntax for maps, and in practice there are at least two common incompatible conventions: alist or plist?
Obviously the application has to interpret the Lisp object resulting from reading the S-expression, just like it has to interpret any JSON, YAML, or anything else that it reads. So for maps you can, as you mention, use alists or plists. Regarding other stuff mentioned: none of the encodings are supposed to be bijective (the writer emits the exact input that the reader ingested). Otherwise, for example, they couldn't have comments, unless those ended up in the data somehow. There is ASN.1 DER if you want that, but ASN.1 is generally disastrous.
Stuff like escape chars were well specified in Lisps of the 1970s (at least the late 1970s), including in Scheme (1975). Floating point conversion is a different matter (it was even messier in the pre-IEEE 754 era than now) but I think the alternatives don't handle it well either. You probably have to use hexadecimal representation for binary floats. Maybe decimal floats will become more widely supported on future hardware.
A type-checked approach can be seen in XMonad, whose config files use Haskell's Read typeclass for the equivalent of typed S-expressions.
Solutions for this problem that I've used in my own S-expression config files:
1. Use only alists for maps because they prevent off-by-one errors.
2. Allow plists because they're less verbose than alists and use reader macros to distinguish them, and allow the reader macro definitions to be in the same file.
Most of the time I use option 1 because it's simpler.
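The two conventions, modeled as Python lists for concreteness (a sketch of the data shapes, not any particular Lisp's reader):

```python
# An alist pairs each key with its value; a plist alternates keys and
# values in one flat sequence.
alist = [("host", "example.com"), ("port", 8080)]
plist = ["host", "example.com", "port", 8080]

def alist_get(al, key):
    return next(v for k, v in al if k == key)

def plist_get(pl, key):
    # The off-by-one hazard mentioned above: lookup is index arithmetic
    # over a flat list, not iteration over pairs.
    i = pl.index(key)
    return pl[i + 1]
```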
I would argue that, in a data markup language, there shouldn't be a syntax for maps. Whether a given sequence should be treated as key-value pairs, and whether keys in that sequence are ordered or unordered, is something that is best defined by the schema, just like all other value types.
>Human-convenient interfaces (like YAML, bash, perl) are fundamentally messy because we are messy
I don't know what to make of this statement; it has so much handwaving built in. The most charitable interpretation I can find is that by 'Human-convenient' you simply meant the quick-and-dirty ideology expressed in Worse Is Better: does the job, makes users contemplate suicide only once per month, isn't too boat-rocking for current infrastructure and tooling.
Taken at face value (without special charitable parsing), this statement is trivially false. Python is often used as a paragon of 'Human-convenience'; I sometimes find this trope tiring, but whatever Python's merits and vices, it's _definitely_ NOT messy in design.
Perl is the C++ of scripting languages, a very [badly|un] designed language widely mocked by both language designers and users. Lua and tcl instead are languages literally created for the sole purpose of letting (non-) programmers express configuration inside a fixed kernel of code created by other programmers, and look at their design: the whole of tcl's syntax and semantics is a single human-readable sentence, while lua thought it would be funny if 70% of the language involved dictionaries for some reason. These are extremely elegant and minimal designs, and they are brutally efficient and successful in their niches: tcl is EDA's and network administration's darling, and lua is used by game artists utterly uninterested in programming to express level design.
'Humans are messy' isn't a satisfactory way to put it. 'Humans love simple rules that get the job done' is more like it. But because the world is very complex and exception-laden, though, simple rules don't hug its contours well. There are two responses to this:
- you can declare it a free-for-all and just have people make up simple rules on the fly as situations come up, that's the Worse Is Better approach. It doesn't work for long because very soon the sheer mountain of simple rules interact and create lovecraftian horrors more complex than anything the world would have thrown at you. Remember that the world itself is animated by extremely simple rules (Maxwell's equations, Evolution by Natural Selection, etc...), it's the multitude and interaction of those simple rules that give it its gargantuan complexity and variety.
- you stop and think about The One Simple Rule To Rule All Rules, a kernel of order that can be extended and added to gradually, consistently and beautifully.
The first approach can be called the 'raster ideology': it's a way of approximating reality by dividing it into a huge number of small, simple 'pixels' and describing each one separately by simple rules. I'm not sure it's 'easy' or 'convenient', maybe seductive. It promises you can always come up with more rules to describe new patterns and situations, and never ever throw away the old rules. This doesn't work if your problem is the sheer multitude and inconsistency of rules. The second approach is the 'vector ideology': it promises you that there is a small basis of simple rules that will describe your pattern in its entirety, and can always be tweaked or added to (consistently!) when new patterns arise; the only catch is that you have to think hard about it first.
>and lua is used by game artists utterly uninterested in programming to express level design
Rather short-sighted and dismissive of a successful programming language that's evolved over 20+ years. Lua is a great general-purpose programming language that specializes not in "game making for non-programmers" but in ease of embedding, extensibility, and data description (like a config language). There's a whole section in Programming in Lua[1] to that effect. The fact that it's frequently used in games is a credit to its speed, size, and great C API for embedding, not to any particular catering to game designers.
You misunderstood me. I love lua and I wasn't being dismissive of it; I was using the first example that came to my mind to counter the claim that a convenient language has to be messy. Just because that was the example used doesn't mean there is an implicit "and that's the only thing it's good for" clause I'm implying: if someone said "Python is used by scientists utterly uninterested in programming to express numerical algorithms", would you understand that to be a dismissive remark against Python?
Being used by non-programmers utterly uninterested in programming to solve problems is the highest honor any programming language can ever attain, because it means that the language is well-suited to the domain enough (or flexible enough to be made so) that describing problems in it is no different than writing thoughts or design documents in natural language. This is the single most flattering thing you can ever say about a language, not a dismissive remark.
It's really sad to see the pervasiveness of JSON. For one thing its usage as a config file is disturbing. Config files need to have comments. Second, even as a data transfer format the lack of schema is even more disturbing. I really wish JSON didn't happen and now these malpractices are so widespread that it's hurting everyone.
JSONC. JSON with comments. And even if your favorite parser does not support it natively it’s not so hard to add with a very simple pre-lexer step.
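A minimal sketch of such a pre-lexer, deliberately naive: it only strips whole-line // comments and would mangle a // inside a string (a real one should tokenize):

```python
import json
import re

def loads_jsonc(text: str):
    # Blank out whole-line // comments, then hand off to the stock parser.
    stripped = re.sub(r"^\s*//.*$", "", text, flags=re.MULTILINE)
    return json.loads(stripped)

doc = loads_jsonc('{\n  // the port to listen on\n  "port": 8080\n}')
assert doc == {"port": 8080}
```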
JSON schemas exist and they’re ok for relatively simple things. For more complex cases I find myself wishing I could just turn Typescript into some kind of schema validation for JSON.
> For more complex cases I find myself wishing I could just turn Typescript into some kind of schema validation for JSON.
Not sure if this is what you're looking for, and whether it's powerful and expressive enough for your use case, but you can use typescript-json-schema¹ for this, and validate with eg ajv.
I've struggled with this in Java recently. At first I used Jankson, which supports the complete JSON5 spec, but later we figured out we could configure the standard Jackson JSON package to accept the things we actually need and use.
Seems to me that YAML just needs type/schema support to be less of a hurdle.
As an alternative, an encode/decode round trip through protobuf seems reasonable to me: it catches the footgun of floating-point version numbers (they become a parse error), makes whitespace/multiline concatenation more obvious, and allows comments (unlike JSON):
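For illustration, a hypothetical protobuf text-format config; the message schema and field names are assumptions, not from the original comment:

```textproto
# Assumed schema: message Step { string name = 1; string version = 2; string run = 3; }
name: "Install dependencies"
version: "1.10"  # a string field, so 1.10 cannot silently decay to the float 1.1
run: "python -m pip install --upgrade pip\npip install pytest\n"
```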
> Seems to me that YAML just needs type/schema support to be less of a hurdle.
Unfortunately, YAML already got type support, which made round-tripping easier but also made it insecure: creating a type calls constructors with possibly insecure side effects, which was used to hack Movable Type, for example.
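The canonical illustration of this, hedged: it's a PyYAML-specific tag handled by PyYAML's unsafe loader, not part of YAML-the-spec, and merely *parsing* the document calls the named constructor:

```yaml
# Loaded with an unsafe loader (e.g. PyYAML's yaml.unsafe_load),
# this calls os.system during parsing:
exploit: !!python/object/apply:os.system ["echo pwned"]
```

Safe loaders reject unknown tags like this outright, which is why "use safe_load" became the standard advice.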
JSON Schema is an official thing that exists and has implementations in all major languages. Personally I’m very glad that it’s an opt-in addition rather than a requirement.
I agree, but I would recommend JSON5 as the solution. Not YAML or this abomination.
JSON5 has many advantages:
* Superset of JSON without being wildly different. I know YAML is a superset of JSON but it's completely different too. Insane.
* Unambiguous grammar. YAML has way too many big structure decisions that are made by unclear and minor formatting differences. My work's YAML data is full of single-element lists that shouldn't be lists for example.
* Comments, trailing commas
* It's a subset of Javascript so basically nothing new to learn.
* It has an unambiguous extension (.json5). I think JSONC would be a reasonable option but everyone uses the same extension as JSON (.json) so you can never be sure which you are using. E.g. `tsconfig.json` is JSONC but `package.json` is just JSON (to everyone's annoyance).
* Doesn't add too much of Javascript. I wouldn't recommend JSON6 because it's just making the format too complicated for little benefit.
Unfortunately it doesn't really because of the extension issue I mentioned. Certain file names (like `tsconfig.json`) are whitelisted to have JSONC support, but any random file `foo.json` will be treated as JSON and give you annoying lints if you put comments and trailing commas in.
Tools that use JSON as a configuration format could simply allow certain unused keys (e.g. all keys starting with #) and promise never to use them. Then the author can write their comments with something like:
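A minimal sketch of that convention (the "#comment-1" and "version" keys are illustrative):

```python
import json

# "#"-prefixed keys carry the comments; stock JSON tooling just sees data,
# and the consuming program promises to ignore them.
doc = json.loads('{"#comment-1": "bump on every release", "version": "1.2.3"}')
assert doc["version"] == "1.2.3"
```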
There's a lot of JSON tooling, and it's liable to interact badly with this. For example, a formatter might re-order the fields of a dict, moving "#comment-1" away from "version". Or the software that this JSON is for might error upon receiving unexpected keys (which is actually useful behavior, as that would catch a typo in an optional field).
Also, this doesn't let you put comments at the top of the file, or before a list item, or at the end of a line.
If you're going to change your JSON tooling to handle comments of some kind, you might as well go all the way to JSONC.
I've heard and read this multiple times. Why are you trying so hard to fit into a format that doesn't just support comments out of the box? What advantages is JSON offering you that you feel compelled to bend over backwards like this? It's exactly these kinds of workarounds that make it super difficult to stop such malpractices. It's just plain ugly. Please stop doing this.
You can't comment out a large section of config easily. For me, this is a relatively common use case for config files, so I take the position that JSON should be used for serialization only.
And I am just now writing a JSON de/serializer to move my config from the current system to JSON. I worked on it today and yesterday, and on several days some time ago.
So you prefer the "good old" XML days? I'll take comment-less JSON over XML any day
(and it doesn't have to be comment-less... JSON with comments is a thing and VSCode has syntax highlighting for it - just strip out the comments before parsing).
Disclaimer: this is not a defense for YAML, I'm just trying to remove the rose tinted glasses some people view XML configs through.
As someone who has used XML configs they have a few problems:
- technical: missing comments are mentioned multiple times here, so I will note that while XML does have comments, they cannot be nested.
- socially: for some reason (maybe because XML is structured enough that this doesn't immediately collapse?) XML tends to just grow and grow. People start programming in XML too, and not only using XSLT or other standard approaches but also in completely proprietary ways.
At one project, someone even wrote an authorization framework in Apache Tiles which allowed one to create roles using somewhere between 600 and 5000 lines of XML per role. The benefit was of course that you could update the roles without touching the Java code.
(In case it isn't immediately obvious: it would have been much simpler to do it in Java, and people who know enough Java to fix it are available at the right price; the XML system had to be learned on the job.)
Personally I just want it to be kept simple:
- a settings.local.ini and default settings in settings.ini or something to that effect
- if necessary, just use a code file: config.ts works just as well, or config.js if it needs to be adjustable at runtime without transpilation.
Not easy to read; it's the Java of config: pages of code that express very little, and by the time you find what you need, you've forgotten the context and what level of nesting you're on. It's also more wasteful as a transport.
It compresses pretty decently and doesn't have too much overhead; in the example, it's around 10% larger than JSON when compressed.
I'd argue that if one were to swap out JSON for XML within all the requests that an average webpage needs for some unholy reason, the overall increase in page size would be much less than that, because huge amounts of modern sites are images, as well as bits of JS that won't be executed but also won't be removed because our tree shaking isn't perfect.
Edit: as someone who writes a good deal of Java in their day job, I feel like my commenting on the verbosity of XML might be unwelcome. I'll only say that in some cases it can be useful to have elements that are structured and described in verbose ways, especially when you don't have the slightest idea what API or data you're looking at when seeing it for the first time (the same way WSDL files for SOAP could provide discoverability).
However, it all goes downhill due to everything looking like a nail once you have a hammer - most of the negative connotations with XML in my mind actually come from Java EE et al and how it tried doing dynamic code loading through XML configuration (e.g. web.xml, context.xml, server.xml and bean configuration), which was unpleasant.
On an unrelated note, XSD is the one truly redeeming factor of XML, the equivalent of which for JSON took a while to get there (JSON Schema). Similarly, WSDL was a good attempt, whereas for JSON there first was WADL which didn't gain popularity, though at least now OpenAPI seems to have a pretty stable place, even if the tooling will still take a while to get there (e.g. automatically generating method stubs for a web API with a language's HTTP client).
How WSDL and the code generation around it worked, was that you'd have a specification of the web API (much like OpenAPI attempts to do), which you could feed into any number of code generators, to get output code which has no coupling to the actual generator at runtime, whereas Pyotr is geared more towards validation and goes into the opposite direction: https://pyotr.readthedocs.io/en/latest/client/
The best analogy that I can think of is how you can also do schema-first application development: you do your SQL migrations (ideally in an automated way as well) and then just run a command locally to generate all of the data access classes and/or models for your database tables within your application. That way, you save your time for the 80% of boring and repetitive stuff while minimizing the risks of human error and inconsistency, with nothing preventing you from altering the generated code if you have specific needs (outside of needing to make it non-overridable, for example, via a child class of a generated class). Of course, there's no reason why this can't be applied to server code either - write the spec first and generate stubs for endpoints that you'll just fill out.
However, for some reason, model driven development never really took off, outside of niche frameworks, like JHipster: https://www.jhipster.tech/
Furthermore, for whatever reason formal specs for REST APIs also never really got popular and aren't regarded as the standard, which to me seems silly: every bit of client code that you write will need a specific version to work against, which should be formalized.
Same as why REST is no longer the hot thing: the idea that your API is just a dumb wrapper around a data model is poor API design.
API-driven development didn't really take off either; that is, write your spec in gRPC/OpenAPI and have the plumbing code generated on both ends. It's technically already there with various tools, but because of dogma like "code generation is bad", the quality of code generators, or whatever other reason, we're still writing "API code".
If this is your first time using Django, you’ll have to take care of some initial setup. Namely, you’ll need to auto-generate some code that establishes a Django project – a collection of settings for an instance of Django, including database configuration, Django-specific options and application-specific settings.
$ django-admin startproject mysite
Similarly, PyCharm doesn't seem to have an issue with offering to generate methods for classes (ALT + INSERT), such as override methods (__class__, __init__, __new__, __setattr__, __eq__, __ne__, __str__, __repr__, __hash__, __format__, __getattribute__, __delattr__, __sizeof__, __reduce__, __reduce_ex__, __dir__), implementing methods, generating tests and copyright information.
I don't see why CLI tools would be treated any differently or why code generation should be considered an anti-pattern since it's additive in nature and is entirely optional, hence asking to learn more.
First of all, just because a tool or project uses a pattern, it doesn't mean that it's a good idea. Second, code generation as part of IDE or one-time setup is something else.
I need to clarify: when I say that "code generation" is an anti-pattern, I'm talking about the traditional, two-step process where you generate some code in one process, and then execute it in another. But Python works really well with a different type of "code generation".
Someone once said that the only thing missing from Python is a macro language; but that is not true - Python has its own macro language, and it's called Python.
Python is dynamically evaluated and executed, so there is no reason why we need two separate steps when generating code dynamically; in Python, the right way is not to dynamically construct the textual representation of code, but rather to dynamically construct runtime entities (classes, functions etc), and then use them straight away, in the same process.
Unless you're dynamically building hundreds of such constructs (and if you do, you have a bigger problem), any performance impact is negligible.
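The point above can be sketched in a few lines: instead of emitting source text in one process and executing it in another, build the class object directly with `type()` and use it in the same process. The `make_record`/`User` names here are made up for illustration.

```python
def make_record(name, fields):
    """Build a simple class with the given attribute names at runtime."""
    def __init__(self, **kwargs):
        for field in fields:
            setattr(self, field, kwargs.get(field))
    # type(name, bases, namespace) constructs the class object directly,
    # no textual code generation or second process needed.
    return type(name, (object,), {"__init__": __init__, "fields": fields})

User = make_record("User", ["id", "name"])
u = User(id=1, name="alice")
print(u.name)  # alice
```

The same single-process idea underlies namedtuples, dataclasses, and most ORM model machinery.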
> Someone once said that the only thing missing from Python is a macro language
Ahh, then it feels like we're talking about different things here! The type of code generation that i was talking about was more along the lines of tools that allow you to automatically write some of the repetitive boilerplate code that's needed for one reason or another, such as objects that map to your DB structure and so on. Essentially things that a person would have to do manually otherwise, as opposed to introducing preprocessors and macros.
Wait, it's the opposite. XML is designed to indicate context, and JSON is designed to hide it: you have a bunch of braces in place of context there, and no matter where you are, it's braces all the way down, like Lisp.
Not really - what enables you to keep the context is shorter code. It's useless to have context reminders at the top and bottom of the thing but not in the middle, and it's too damn long.
For me XML and YAML are about the same. I think I'd also prefer comment-less JSON over both. However, XML wasn't that bad. With a decent editor and schema validation I would say there's a good chance I was more productive with XML than I am with YAML.
It's simple. For config files, choose the format that has the best tooling in your company and that supports comments. For data transfer, choose one that supports schemas, backwards compatibility, and good tooling (protobufs is just one example, the one I'm most familiar with).
Actually, yes, I do. XML syntax was far from stellar, and much of the ecosystem (e.g. XML Schema) was drastically overengineered... but even so, we had gems like RELAX NG to compensate. On the whole, it was better than the current mess.
My opinion only: I love JSON because it lacks so many foot guns of yaml. If you’re doing lots of clever stuff with yaml you probably want a scripting language instead. Django using Python for configs made me fall in love with this. Spending years with the unmitigated disaster that is ROS xml launchfiles and rosparams makes me love it even more.
Yaml and toml are fine if you keep it simple. JSON direly needs comment support (but of course it wasn't designed to be used as a human config file format, so that's kind of on us). And not just "JSONC that sometimes might work in some places."
Beyond that, I think we generally have all the things we need and I don’t personally think we need yet another yaml. =)
These aren't foot-guns per se, but I can think of another handful of grievances I have with JSON:
* JSON streaming is a bit of a mess. You can either do JSONL, or keep the entire document in memory at once. I usually end up going with JSONL.
* JSON itself doesn't permit trailing commas. I can measure the amount of time that I've wasted re-opening JSON files after accidentally adding a comma in days, not hours.
* JSON has weakly specified numbers. The specification itself defines the number type symbolically, as (essentially) `[0-9]+`. It's consequently possible (and common) for different parsers to behave differently on large numbers. YAML also, unfortunately, has this problem.
* Similarly: JSON doesn't clearly specify how parsers should behave in the presence of duplicate keys. More opportunity for confusion and bugs.
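For what it's worth, the streaming, duplicate-key, and number grievances are easy to demonstrate with Python's stdlib `json` module (other parsers behave differently, which is rather the point):

```python
import io
import json

# JSONL ("JSON Lines"): one JSON document per line, so a reader can
# process records incrementally instead of holding the whole array in memory.
stream = io.StringIO('{"id": 1}\n{"id": 2}\n{"id": 3}\n')
records = [json.loads(line) for line in stream]
print(records)  # [{'id': 1}, {'id': 2}, {'id': 3}]

# Duplicate keys: RFC 8259 only says names "SHOULD" be unique, so behavior
# is parser-defined. CPython's json silently keeps the last value.
print(json.loads('{"a": 1, "a": 2}'))  # {'a': 2}

# Large numbers: Python parses to an arbitrary-precision int, while a
# JavaScript parser would round this to the nearest IEEE-754 double.
print(json.loads('9007199254740993'))  # 9007199254740993
```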
Running prettier (https://prettier.io) on each save will fix trailing commas for you. If you accidentally have one, it will just sneakily remove it and turn your document into one that is valid.
It may have been a good or bad decision. But comments were intentionally left out of JSON to avoid obvious ways to sneak in parsing directives and thus incompatibilities between different JSON-parsers.
If I had a penny for every time someone tried to parse XML using a regex - if that even classifies as a parser. Those are 100% incompatible with everything else.
Easiest way to demonstrate how wrong that is, is to throw in a comment in the example document ;)
the funny thing is that JSON doesn't even need commas; they essentially act as whitespace, and any amount or none at all would make no difference to the meaning of the document.
And the flip side of that with YAML is you can stream it, but you don't know once you've gotten to the end if it was the whole document without some user defined checksum mechanism.
Ran into a great bug with the INI format which has the same issue. The application would read the config file on modification but if you just wrote over the file it would sometimes read the config before the file was fully written. Have to use a temp file and move it rather than just edit it.
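A minimal sketch of the temp-file-and-rename fix, assuming a POSIX-style filesystem where `os.replace` is atomic (the `write_config_atomically` helper name is hypothetical):

```python
import json
import os
import tempfile

def write_config_atomically(path, data):
    """Write to a temp file in the same directory, then rename over the
    target. os.replace is atomic on POSIX, so a concurrent reader sees
    either the old file or the new one, never a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # make sure bytes hit disk before the rename
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

The temp file must live on the same filesystem as the target, otherwise the rename degrades to a non-atomic copy.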
I believe that's only true if one were to load YAML via the "SAX"-style per-event stream, and not the "object materialization" that normal apps use (aka `yaml.load_all` or JAXB-style objects), since in those more data-object-centric views, where would one put the processing events for those markers?
I also originally expected `yaml.parse(...)` to eat them as it does for comments and extraneous whitespace, but no, it does in fact return dedicated stream events for them, so TIL
> Django using Python for configs made me fall in love with this.
I also started advocating in-language configuration files (Python for Python, but also Lua for Lua, etc) a number of years ago because it lets you do really useful things (like functionally generating values, importing shared subsets of data, storing executable references, and ensuring that two keys return the same values without manual copy/paste) all without needing to spec and use Yet Another Thing™ that does only a fraction of what the programming language you're already using already does.
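A tiny sketch of what that looks like in practice - a plain Python module as the config file. All names here (`BASE_DIR`, `databases`, `on_startup`) are illustrative, not from any real project:

```python
import os

BASE_DIR = os.path.abspath(".")

# Shared subset of data: no manual copy/paste, so two keys can't drift apart.
_common = {"timeout": 30, "retries": 3}
databases = {
    "primary": {**_common, "host": "db1.internal"},
    "replica": {**_common, "host": "db2.internal"},
}

# Functionally generated value.
LOG_DIR = os.path.join(BASE_DIR, "logs")

# An executable reference, stored directly instead of as a dotted string.
def on_startup():
    print("ready")

HOOKS = {"startup": on_startup}
```

None of this needs templating, anchors, or interpolation syntax, because the host language already has variables and functions.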
That also implies that you can't just test a foreign config file without first reading and understanding what it does, as just using one would imply arbitrary code execution.
This is a place where Tcl excels. You can easily create restricted sub-interpreters that can't do anything dangerous. If you need more power for trusted scripts you just reenable selected commands.
JSON5 is the way to go. It supports comments and trailing commas. Unfortunately it's going to be difficult to supplant legacy JSON, which is so pervasive.
Except parsing JSON5 in the browser is super slow. Native JSON.parse doesn't support it, non-native parsers are slow, and the only fast way to parse it is `eval()`.
The desire to use a single interchange format for all data is the problem. There are plenty of reasons to support comments and the minor syntax relaxations that JSON itself disallows in human-consumable, interactive JSON; software-to-software JSON could stay strict.
I’ve never liked YAML. For whatever reason, it always feels like working in a minefield. It comes from the same cargo cult of people who think the problem with human-readable machine formats is that they need to be “clean”.
Clean, of course, to them means some bizarre aesthetic notion of removing as much as possible, taken to an extreme. I wonder if the same people also think books would be better with all punctuation removed to make them look “clean”?
It’s unhealthy minimalism, and it causes more problems than it solves. As soon as I see a project using YAML I cringe and try to find an alternative, because god knows what other poor choices the developer has made. In that sense, YAML can be considered a red flag, and I’m usually right. The last project I used that adopted an overly complex and build-breaking YAML configuration syntax had other problems hiding under the covers, and in some cases couldn’t parse its own syntax due to YAML’s overly broad but at the same time opinionated syntax.
By its very name (and the fact that the MEANING of the name flip-flopped in mid-flight after launch) you can tell that the designers of YAML had no clue what they were doing, because originally they named it "YAML" for "Yet Another Markup Language", when it clearly was NOT a markup language.
Only AFTER YAML had been around and in use for a few years did those geniuses actually realize that they had made a mistake in naming it something that it's not, and retroactively changed the name "YAML" to mean "YAML Ain't Markup Language", which was a too-clever-by-half way of whitewashing the fact that they originally CLAIMED it was "Yet Another Markup Language", since they had no idea what a markup language actually was.
I prefer to use markup languages and data definition languages that were designed by people who are situationally aware enough to know what the difference between a markup language and a data definition language is, please.
Hard pass on YAML, whatever it stands for this week.
I've often heard this argument about YAML being "clean", but over time I have realized that people are conflating minimalism with cleanliness, when they are two different things. That realization is what it took for me to understand why I didn't like it: I did _not_ find it clean, I found it "messy" by virtue of the increased cognitive overhead. It is minimal, at least compared to other formats, but other formats appear cleaner to me.
I'll give my opinion as someone who had to choose among JSON, XML, TOML, and YAML about two years ago for a new project. Whatever I chose had to be easy for end-users who don't know the specification to understand later.
Here were my thoughts on the options.
JSON - No comments -> impossible
XML - Unreadable
YAML - 2nd place. Meaningful indentation also made me worry that someone wasn't going to understand why their file didn't work. The lack of quotes around strings was frustrating.
TOML - 1st place. Simpler than YAML to read & parse. It truly seems 'obvious' like the name says.
I haven't encountered any situations where I wish I had more than TOML offers.
I have nesting up to three levels deep. I use inline tables^ for the many innermost (or other few-element) tables. It's never seemed excessively verbose.
It isn't. YAML and JSON are much more proven than HCL. HCL is used for some relatively small products. Just making something more complicated doesn't make it better.
Proven in what sense? Several implementations are broken or incorrect. HCL is used in very large products as well. Just because it isn't the majority currently doesn't mean that it isn't a worthy choice. HCL isn't more complicated if used as an alternative to YAML or JSON; in fact, I would argue that it is simpler. It bridges the pros of YAML and JSON combined, and addresses the nested complexity of TOML. It really is IMO the best, but you of course are free to share a different opinion. However, I would encourage you to actually try it out and re-evaluate.
Why are unquoted keys so critical? I feel like one of the strengths of a DDL like JSON or XML is that it's easy to tell what the data (key-value pair or otherwise) is, while with YAML and others, understanding data-vs-structure can be challenging.
TOML can't decide if it's a super-INI file or a JSON cousin. You can represent the same information using two completely different representations, and you can mix both styles in the same document. Manually navigating and editing values is error-prone and hard to automate.
In that case, you might want to have a look at JSON5: https://json5.org/
It is pretty niche, but attempts to improve upon JSON in a multitude of ways, one of which is the support for comments: https://spec.json5.org/#comments
I think it's crazy that when I add a string to an inline list, I may need to convert that inline list to a list because this string needs different handling. I think it's crazy that "convert an inline list to a list" is a coherent statement, but that is the nomenclature that they chose.
I don't like that a truncated document is a complete and valid document.
But what is most unappealing is their whitespace handling. I couldn't even figure out how to encode a string with CR line endings. So, I downloaded their python client to see how it did it. Turns out, they couldn't figure it out either:
>>> nt.loads(nt.dumps("\r"), top="str")
'\n'