Hacker News
The cargo cult of versioning (akkartik.name)
198 points by chauhankiran on Nov 11, 2017 | 183 comments



Here’s the post’s core thesis:

> At this point you could even get rid of the version altogether and just use the commit hash on the rare occasions when we need a version identifier. We're all on the internet, and we're all constantly running npm install or equivalent. Just say "Leftpad-17", it's cleaner.

> And that's it. Package managers should provide no options for version pinning.

> A package manager that followed such a proposal would foster an eco-system with greater care for introducing incompatibility. Packages that wantonly broke their consumers would "gain a reputation" and get out-competed by packages that didn't, rather than gaining a "temporary" pinning that serves only to perpetuate them.

Pinning is absolutely crucial when building software from some sources out of your control, especially with larger teams or larger software. You cannot rely on the responsibility of others, be they 100s of upstream library authors, or 100s of other peer developers on your product. One among those people may cause a dependency to break your build, either by releasing a bad version, or foolishly adding a dependency without carefully checking its “reputation” for frequent breaking changes.

In either case, without pinning, your build is non-deterministic and sometimes starts failing its tests without any diff. You can’t bisect that. Your only remediation is manual debugging and analysis. Work stops for N engineers because everyone’s branches are broken.

I don’t think any level of community fostering is worth that kind of risk.
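A minimal Python sketch of that non-determinism, assuming a toy in-memory registry (the registry contents and the `resolve_floating`/`resolve_pinned` helpers are hypothetical, not any real package manager's API): an unpinned requirement resolves to whatever is newest at build time, while a lock entry resolves identically every day.

```python
# Hypothetical registry snapshots: package name -> published versions.
REGISTRY_MONDAY = {"leftpad": [(1, 0, 0), (1, 1, 0)]}
REGISTRY_TUESDAY = {"leftpad": [(1, 0, 0), (1, 1, 0), (1, 2, 0)]}


def resolve_floating(registry, name):
    """Unpinned: take the newest version published at build time."""
    return max(registry[name])


def resolve_pinned(registry, name, lock):
    """Pinned: take exactly the version recorded in the lock file."""
    version = lock[name]
    if version not in registry[name]:
        raise LookupError(f"{name} {version} is no longer published")
    return version


lock = {"leftpad": (1, 1, 0)}

# Same source tree, no diff, yet the unpinned build changes between days.
print(resolve_floating(REGISTRY_MONDAY, "leftpad"))   # (1, 1, 0)
print(resolve_floating(REGISTRY_TUESDAY, "leftpad"))  # (1, 2, 0)
# The pinned build resolves the same way on both days.
print(resolve_pinned(REGISTRY_TUESDAY, "leftpad", lock))  # (1, 1, 0)
```

The `LookupError` branch also hints at a point made further down the thread: a pin only helps while the pinned version stays available.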


> just use the commit hash

Does the author honestly fail to realize that commit hashes are incomprehensible to most humans, that they look like noise, and that comparing them takes extraordinary mental effort, while the difference between 1.2.3 and 1.4.5 is instantly apparent to anyone?

> Pinning is absolutely crucial when building software from some sources out of your control

Amen. I am surprised how the author does not recognize it.

In fact, the only idea that does not come across as outright false immediately after reading is "if you change behavior, rename". But after thinking about it for a bit, it is wrong too. First, naming is hard. Finding a good, recognizable name for a package is hard enough; if each BC break required a new one, we'd drown in names. Second, behavior breaks are not total. If RoR or ElasticSearch releases a new version with a BC break, it does not stop being essentially the same package, just with somewhat different behavior. Most of the knowledge you had about it is still relevant. Some pieces are broken, but not the whole concept. A new name essentially requires throwing out the whole concept and building a new one. That is not good for gradual, incremental change.


My reading of that was that "rename" in that context meant "rename your package from Foo-1 to Foo-2", not come up with a completely brand-new name.


Mine too, eventually. I am surprised that the writer was unable to make even that very simple point clear.


I thought the author did a decent job - he had two examples (Rails-5 and Leftpad-17) that made the point pretty clearly.


> it takes extraordinary mental effort to compare them

Worse is that you can't just sort on hashes alphabetically and recover the correct version order. That alone is reason enough to use some kind of sortable versioning scheme.
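To illustrate in Python (the `version_key` helper is our own, and it deliberately ignores pre-release tags like `-beta`): dotted versions carry an order that commit hashes lack, but only if they are compared numerically rather than as strings.

```python
import re

versions = ["1.10.0", "1.2.3", "1.4.5", "1.9.2"]

# Plain string sorting compares character by character, so "1.10.0" < "1.2.3".
print(sorted(versions))
# ['1.10.0', '1.2.3', '1.4.5', '1.9.2']


def version_key(v):
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints so it sorts numerically."""
    if not re.fullmatch(r"\d+\.\d+\.\d+", v):
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {v!r}")
    return tuple(int(part) for part in v.split("."))


print(sorted(versions, key=version_key))
# ['1.2.3', '1.4.5', '1.9.2', '1.10.0']
```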


> While difference between 1.2.3 and 1.4.5 is instantly apparent to any human?

The author's point is that the difference between 1.2.3 and 1.4.5 can be anything from a bugfix to the software turning from self-hosted photo software into something that drives a car. Assuming that users perceive major and minor versions the same way as the developer responsible for the versioning means assuming that all humans understand versioning identically.


You're overthinking this. The only significant difference between the versions 1.2.3 and 1.4.5 is that 1.4.5 is instantly recognized as the newer version. Can you say the same about two arbitrary commit hashes?


But what is the value in knowing that it is newer, if you know nothing else than that it is newer?


Preach it brother. I work primarily in the enterprise Java world and I'd never just blindly compile in a dependency of LATEST and keep my fingers crossed. Any version bumps in my POM dependencies block are intentional. The transitive dependencies are hard enough to deal with already without having to worry if my dependent binary footprint changed significantly from the last build.


Within a given team, new development always breeds bugs. Yet some presume that's somehow not true for upstream dependencies? I can't see making that assumption.

I would think the best approach is "trust but verify" for ANY update to a dependency. A dependency might save you some time, but it's not a free pass to be irresponsible. There's no such thing as FOSS.


> Pinning is absolutely crucial when building software from some sources out of your control, especially with larger teams or larger software. You cannot rely on the responsibility of others, be they 100s of upstream library authors, or 100s of other peer developers on your product.

In fact, that's the entire reason why "semantic versioning" doesn't really work:

* you can never be certain that the maintainer actually follows it

* one user's breaking change is one maintainer's bugfix

* while some packaging systems attempt to automate and enforce it (e.g. Elm's) even that only goes so far due to type system limitations (e.g. changing acceptable values from [0, 4] to [0, 4[ as a side-effect of some change deep in the bowels of the library's logic is pretty much guaranteed not to surface if you don't have a dependent type system, and may not do so even then)
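A minimal Python sketch of that last bullet (hypothetical function names; Python's type hints stand in for a type system that cannot express the accepted range): both releases have identical signatures, so a signature-level compatibility check passes, yet a caller that passes 4 breaks.

```python
import inspect


def set_level_v1(level: int) -> float:
    """Old release: accepts level in the closed interval [0, 4]."""
    if not 0 <= level <= 4:
        raise ValueError("level must be in [0, 4]")
    return level / 4


def set_level_v2(level: int) -> float:
    """New release: a refactoring deep inside now rejects 4, i.e. [0, 4[."""
    if not 0 <= level < 4:
        raise ValueError("level must be in [0, 4[")
    return level / 4


# A signature-based compatibility check sees no difference at all...
print(inspect.signature(set_level_v1) == inspect.signature(set_level_v2))  # True

# ...but a caller that relied on the upper bound breaks at runtime.
print(set_level_v1(4))  # 1.0
try:
    set_level_v2(4)
except ValueError as err:
    print("v2 broke the caller:", err)
```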


> you can never be certain that the maintainer actually follows it

I write a bunch of Rust. The Cargo package manager has several layers of "defense":

1. The "Cargo.lock" contains exact versions of all transitive dependencies. (And Cargo doesn't allow overwriting a version once published, and lock files even override normal package "yanks", so these version numbers are very stable identifiers.)

2. The "Cargo.toml" file contains manually-specified semver information. I can run "cargo update" to get the latest, semver-compatible versions of all my dependencies.

3. In the rare case that somebody messes up semver, I can just blacklist certain versions in "Cargo.toml", or add more specific constraints. I think I've done this maybe two or three times in thousands of dependency updates.

4. My own projects have unit tests, and of course, Rust does extensive compile-time checks. So I'm likely to catch any breakage anyway.

So the absolute worst-case scenario here is that somebody messes up (which is very rare). At that point, I can just file a bug upstream, lock a specific version down in "Cargo.toml", and wait for things to get sorted out.

I have zero need to "be certain". I'm quite happy with "It works nicely 99.9% of time, and it has an easy manual fallback when the inevitable problems occur."
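In `Cargo.toml`, a bare requirement such as `"1.2.3"` is a caret requirement (`^1.2.3`), which allows `>=1.2.3, <2.0.0`; for `0.x` versions the leftmost non-zero component is what must stay fixed. A toy Python model of layers 2 and 3 (the function names are our own, not Cargo internals, and real Cargo resolves a whole dependency graph rather than one package at a time):

```python
def caret_upper_bound(base):
    """Exclusive upper bound of a Cargo-style caret requirement ^base.

    The leftmost non-zero component must stay the same:
    ^1.2.3 -> <2.0.0, ^0.2.3 -> <0.3.0, ^0.0.3 -> <0.0.4.
    """
    major, minor, patch = base
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def matches(base, version, upper_cap=None):
    """True if version satisfies ^base and, optionally, an extra '< upper_cap'."""
    if not base <= version < caret_upper_bound(base):
        return False
    return upper_cap is None or version < upper_cap


def cargo_update(base, published, upper_cap=None):
    """Newest published version compatible with the requirement (a toy model)."""
    candidates = [v for v in published if matches(base, v, upper_cap)]
    return max(candidates) if candidates else None


published = [(1, 2, 3), (1, 3, 0), (1, 4, 0), (2, 0, 0)]
print(cargo_update((1, 2, 3), published))  # (1, 4, 0)
# If 1.4.0 broke semver, tighten the requirement, like ">=1.2.3, <1.4.0".
print(cargo_update((1, 2, 3), published, upper_cap=(1, 4, 0)))  # (1, 3, 0)
```

Note that 2.0.0 is never picked up automatically: crossing a major version always requires editing the requirement by hand.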


It only "doesn't really work" because this is ascribing too much responsibility to it. Semver is a social protocol designed to assist in solving--but by no means completely solve--the technical problem of how to resolve dependency versioning. It only gives us the barest protocol to allow us to begin describing the problem, and permits us to attempt to build technical solutions on top of it. Nobody's pretending it's perfect, but if anyone has a better solution, feel free to propose it (and neither the OP's, nor "romantic versioning", are any better at solving this problem than semver).


Package registries should implement some sort of required automated integration testing. If a package owner wants to update their package with what they claim is a non-breaking change according to semver, then the new version must pass all integration test cases of the previous. This could also include crowdsourced testing of that package, that allows users of that package to add tests over time to that particular version, so as to mitigate owners who don't test their packages well enough.


Then you’re just defining “breaking change” as “change that breaks the tests.” Anyone with any programming experience knows that “but the tests passed” is never a valid excuse for a bug.


Some verification would be better than no verification. You still need to verify and test package upgrades and pin versions; but it would make the semver contract at least specified to some degree.


I think this is an excellent idea, particularly for typed languages. For a typed language package manager, you can run the following verifications:

- for a patch version, all exported (name, type) pairs of A.B.C are identical to those of A.B.(C-1)

- for a minor version, enforce first rule, but allow new types for new names.

I think the Elm package manager does this, based on other comments here in this thread.

Either way, that sort of automatic “exports” verification could be a base layer in a semver verification package manager, with integration tests layered on top.
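A minimal Python sketch of those two rules, assuming each release's public API can be extracted as a name-to-type table (the tables and helper names below are hypothetical; this does not reproduce Elm's actual tooling):

```python
RANK = {"patch": 0, "minor": 1, "major": 2}


def required_bump(old_exports, new_exports):
    """Smallest semver bump the export tables allow.

    Each table maps an exported name to its type (as a string here). This only
    compares signatures; it cannot see behavior changes behind them.
    """
    changed_or_removed = [
        name for name, typ in old_exports.items() if new_exports.get(name) != typ
    ]
    if changed_or_removed:
        return "major"
    if set(new_exports) - set(old_exports):
        return "minor"
    return "patch"


def claim_is_valid(claimed, old_exports, new_exports):
    """Reject a release whose claimed bump is smaller than the required one."""
    return RANK[claimed] >= RANK[required_bump(old_exports, new_exports)]


v1 = {"pad": "String -> Int -> String"}
v2 = {"pad": "String -> Int -> String", "padRight": "String -> Int -> String"}
v3 = {"pad": "String -> Int -> Char -> String"}

print(required_bump(v1, v2))  # minor
print(required_bump(v1, v3))  # major
print(claim_is_valid("patch", v1, v3))  # False
```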


There's a lot more to it than that in practice. For example, in a language with function overloading and implicit conversions, adding an overload may silently change the meaning of existing code.

Sometimes, this can get very gnarly. For example, C# has overloading, and it also has type inference for lambdas; and it tries to make these two work seamlessly. So, consider something like this:

   int F(Func<int, int> g) => ...

   string F(Func<string, string> g) => ...

This is perfectly legal: just a couple of overloads. Now, suppose we call it like so:

   F(x => x.ToUpper());

There's no type for x here, so the compiler has to infer it; but it also has two possible overloads to deal with. The way it goes about it is by considering both and checking whether the lambda "makes sense", i.e. whether the body compiles and its result type (accounting for possible implicit conversions) matches the one specified in the function type being considered.

Now, when it tries Func<int, int>, it doesn't work, because if x is int, it doesn't have a method ToUpper, so the body of the lambda is ill-formed. On the other hand, given Func<string, string>, x is string, ToUpper() is well-formed, and returns a string, matching the function type. So that's the overload that gets chosen.

Now consider what happens if someone adds a method called ToUpper to int (in C#, that would have to be an extension method): the code that was compiling before, and that was calling ToUpper on string, will suddenly fail to compile because of ambiguity.

In more complicated scenarios (e.g. involving inheritance), it is even possible to have code that silently changes its meaning in such circumstances, e.g. when overloading methods across several classes in a hierarchy. It'd be rather convoluted code, but not out of the realm of possibility.


For the C ABI, this has existed for quite some time: https://abi-laboratory.pro/tracker/timeline/libevent/ (though there is no real way to enforce it).


It's a wetware protocol, not a formal, mathematically certain concept.

You also can never be certain that the car is not going to run through pedestrian crossing on red light when you step in - but it doesn't mean that traffic lights are useless.

Semver is like traffic lights. For reproducible builds/increased certainty harden it with some kind of pinning or local copy.


> * you can never be certain that the maintainer actually follows it

That's a bad argument. You can also never be certain any third-party code works at all, or did not change without changing versions, or does not have subtle bugs that surface only in your particular use case, etc. If you dismissed the whole concept on basis that somebody could fail to follow it you wouldn't be able to use any third-party dependencies at all. At which point the question of versioning is kinda moot.

> one user's breaking change is one maintainer's bugfix

This is a "grey zone" fallacy. The fact that there might be disagreements at the margins does not mean there isn't a huge majority of clear-cut cases where everybody agrees that a change is huge or tiny, and in that huge majority of cases versioning is useful. Even if no definition works in 100% of cases, it works just fine in 99% of them.


> That's a bad argument. You can also never be certain any third-party code works at all

Perhaps in the philosophical sense that you can never be certain of anything, this is true. But the only place that thinking leads is to staying under the covers all day.

You actually can achieve a high degree of certainty that third-party code works to some reasonable standard by testing it. Which is the whole point - once you have tested a version, and are content that it works, you wouldn't want to blindly switch to another version without again performing a similar set of tests.


If you are pulling in 200 dependencies you're going to be in the grey zone fairly frequently.


> * you can never be certain that the maintainer actually follows it

Elm enforces semantic versioning with its type system, and one can also print the diff of values/functions between separate package versions. In principle, this is doable.


API compatibility (showing that a package contains the same functions, which accept the same kinds of inputs and produce the same kinds of outputs) is only part of the problem-- it's harder to show that the functions actually do the same thing, but that's arguably more important than API compatibility. Typically, a compiler will catch a breaking API change; a behavior change will cause problems at runtime if a function suddenly starts doing something completely different.


> it's harder to show that the functions actually do the same thing

Effectively impossible, with a Turing complete programming environment.

(“Effectively,” because technically our physical computers have a bounded number of states and inputs, thus aren’t technically Turing complete, thus Rice’s theorem technically doesn’t apply.)


There are, however, tools like QuickCheck (https://hackage.haskell.org/package/QuickCheck) that bombard a function with large amounts of random (but legal) input data and check that it satisfies the stated properties.
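A minimal Python sketch of the same idea applied to an upgrade, using only the standard library's `random` rather than QuickCheck itself (`clamp_v1`, `clamp_v2` and `find_disagreement` are hypothetical): run the old and new versions on the same random inputs and report the first disagreement. Like QuickCheck, this can find counterexamples but never proves equivalence.

```python
import random


def clamp_v1(x: int) -> int:
    """Old release: clamp x into [0, 100]."""
    return max(0, min(100, x))


def clamp_v2(x: int) -> int:
    """Hypothetical new release with an off-by-one at the top of the range."""
    return max(0, min(99, x))


def find_disagreement(old, new, trials=10_000, seed=0):
    """Run both versions on the same random inputs; return the first input
    where their outputs differ, or None if none was found."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.randint(-1_000, 1_000)
        if old(x) != new(x):
            return x
    return None


counterexample = find_disagreement(clamp_v1, clamp_v2)
print("counterexample:", counterexample)
print(find_disagreement(clamp_v1, clamp_v1))  # None
```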


Yes, true. But still, having the compiler reject breaking changes when there should be none already goes a long way.


> you can never be certain that the maintainer actually follows it

The same happens for the proposed model ¯\_(ツ)_/¯


I think you and many other people misunderstood my criticism. It wasn't a criticism of semantic versioning per se, but rather an explanation of the issues with semver, which the proposed "scheme" indeed does not fix in any way, shape or form.

The proposed author claims that "You cannot rely on the responsibility of others" but their scheme fails in the exact same way.


Yep, I've always felt versioning is more or less a form of marketing for your software. It's handy that people use conventions to say things like "this is a HUGE change" vs. "oops, bugfix". But it's really just a convention of communication, which, in a nutshell, comes with zero guarantees.

Semantic versioning and automatic "transitive" dependencies sure are handy, especially in the "naive" phase of development. But once you have to start tracking performance and security, it's time to turn off any magic updates.

Or, in other words, when you've got people depending on you, you need to start putting your security hat on and assume that there can be a breach with every change, or your performance hat and assume the system will crash due to a performance problem. No way would I want any rebuild OF THE SAME SOURCE subject to change.


I have a crazy theory about versioning, except it's not the idea itself that's crazy, but what you have to do to get it.

Software versioning is a case of the XY Problem. We don't know what we want, but we know something that might help us get it (and we are wrong).

We don't care if you add behavior to your library (unless it fulfills a need we had or didn't know we had). What we really care about is if you take behavior away. And in some cases it's not even all the behavior. If I'm only using 20% of your library and you haven't changed it, I can keep upgrading whenever.

What semantic versioning is trying and failing to achieve is to make libraries adhere to the Liskov Substitution Principle (LSP). If you had a library that had sufficient tests, then any change you make that causes you to add tests but not change an existing test means that it should satisfy the LSP (tests that correct ambiguities in the spec notwithstanding). People can feel pretty safe upgrading if all of the tests they depend on are intact.

The difficult part of this 'solution' is that reusable code would have to have pretty exhaustive test coverage. I think we are already bifurcating into a world of library authors and library consumers, so I'm okay with this. We should feel comfortable demanding more than what we currently get from our library writers. In this world, what you version is your test harness, and not your code.
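A minimal Python sketch of that idea, assuming each library version ships a test suite keyed by test name and that consumers declare which tests cover the behavior they use (all names here are hypothetical): an upgrade counts as safe if those tests are all still present and unedited.

```python
def safe_to_upgrade(old_suite, new_suite, tests_i_rely_on):
    """True if every test the consumer relies on survives unchanged.

    Suites map test name -> test source text. New tests are fine (added
    behavior); a deleted or edited test means behavior may have been taken away.
    """
    return all(
        name in new_suite and new_suite[name] == old_suite[name]
        for name in tests_i_rely_on
    )


old = {
    "test_pad_left": "assert pad('a', 3) == '  a'",
    "test_pad_empty": "assert pad('', 2) == '  '",
}
new = {
    "test_pad_left": "assert pad('a', 3) == '  a'",
    "test_pad_empty": "assert pad('', 2) == ''",            # behavior changed
    "test_pad_right": "assert pad_right('a', 3) == 'a  '",  # behavior added
}

print(safe_to_upgrade(old, new, ["test_pad_left"]))   # True
print(safe_to_upgrade(old, new, ["test_pad_empty"]))  # False
```

A consumer who only uses left-padding can take this release; one who depends on the empty-string case cannot, even though both see the same version number.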


> Pinning is absolutely crucial

Precisely this.

The problem isn't with versioning, it's with non-deterministic builds. Once you have a large enough project that you can no longer keep manual track of version changes, you build tooling which checks for the latest version and opens a pull request (and subsequent CI build on the pre-merge commit) with the latest version. This keeps your project up to date and prevents the woes of non-deterministic building.
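A minimal Python sketch of the core of such an update bot (the data and `propose_bumps` are hypothetical; opening the pull requests and running CI are left out): builds stay deterministic because the lock only changes through a reviewed, tested commit.

```python
def propose_bumps(lock, latest):
    """List dependencies whose pinned version is behind the newest release.

    Each entry would become its own pull request, so CI tests every bump
    against an exact, reviewable change to the lock file.
    """
    return [
        (name, pinned, latest[name])
        for name, pinned in sorted(lock.items())
        if name in latest and latest[name] > pinned
    ]


lock = {"leftpad": (1, 1, 0), "rails": (5, 1, 4)}
latest = {"leftpad": (1, 2, 0), "rails": (5, 1, 4)}
print(propose_bumps(lock, latest))  # [('leftpad', (1, 1, 0), (1, 2, 0))]
```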


> Pinning is absolutely crucial when building software from some sources out of your control.

IMHO it's a bad solution to a real problem.

Too often, I've seen software with out-of-date libraries or frameworks, with versions set at the start of a project and never touched since. In some cases this even led to security flaws not being patched.

Another side effect is that hard pinning complete versions makes for somewhat hard packaging (if you try to follow some distro guidelines (like the Debian ones), package libraries separately and not bundle everything together).

In some extreme cases, avoiding the small and incremental breaks & fixes in API can lead to code bases so far from the modern APIs of their dependencies that throwing away the code and re-implementing from scratch is actually a better solution.

I feel that what is missing in most languages is a way to properly maintain APIs (like tools native to the language that help ensure no breaks in the API contract), and languages with native concepts of API versioning and API negotiation (like REST API or protocol version negotiation). If this were also combined with a backward compatibility policy of "maintain at least version N and version N-1" and clear deprecation notices, it would make for smoother updates and make most systems more reliable.

I would love to see a language with these items as core features.

But we are far from this situation. What I generally do instead is carefully choose dependencies from projects that have a good track record of not breaking their APIs too often while being properly maintained, and I also try to keep the number of dependencies in my projects to a minimum. I almost never hard-pin dependency versions.

It's no wonder that the MIL-STD-498 has a large focus on interfaces (Interface Requirements Specification (IRS) and Interface Design Description (IDD) specification documents). It's actually one of the harder parts to get right when designing complex systems (including complex pieces of software).
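A minimal Python sketch of the negotiation and "N and N-1" policy described above (both helpers are hypothetical, with API versions modeled as plain integers): each side advertises the versions it speaks, and they settle on the highest one in common.

```python
def served_versions(current):
    """API versions a release at version `current` serves under an 'N and N-1' policy."""
    return {current, current - 1} if current > 1 else {current}


def negotiate(client_versions, server_versions):
    """Pick the highest API version both sides speak, or None if there is none."""
    common = set(client_versions) & set(server_versions)
    return max(common) if common else None


server = served_versions(5)       # {4, 5}
print(negotiate({3, 4}, server))  # 4: a client one version behind still works
print(negotiate({2, 3}, server))  # None: too old, must upgrade first
```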


Pinning isn't enough. People push history changes, rename things, and just delete stuff all the time. I've had this happen to me more than once, and I would now never consider an application production-ready until all dependencies are either vendored or forked.


There are plenty of strategies for making your dependencies deterministic. At Airbnb, we use caching proxies in front of all third-party language package managers, so the first time someone in the org pulls “foo 1.2.3”, the source tarball is frozen forever. The package can’t be yanked upstream, force-pushed over, etc.
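A minimal in-memory Python sketch of that freezing behavior (the `FreezingProxy` class and its fetch callback are hypothetical; a real proxy would persist to durable storage and speak each package manager's protocol):

```python
class FreezingProxy:
    """Hypothetical caching proxy: the first fetch of (name, version) is frozen forever."""

    def __init__(self, upstream_fetch):
        self._upstream_fetch = upstream_fetch  # callable: (name, version) -> bytes
        self._store = {}                       # (name, version) -> bytes

    def fetch(self, name, version):
        key = (name, version)
        if key not in self._store:
            # Only the very first request ever reaches upstream.
            self._store[key] = self._upstream_fetch(name, version)
        return self._store[key]


upstream = {("foo", "1.2.3"): b"original tarball"}
proxy = FreezingProxy(lambda name, version: upstream[(name, version)])

first = proxy.fetch("foo", "1.2.3")
upstream[("foo", "1.2.3")] = b"force-pushed tarball"  # upstream rewritten
second = proxy.fetch("foo", "1.2.3")
del upstream[("foo", "1.2.3")]                         # then yanked entirely
third = proxy.fetch("foo", "1.2.3")

print(first == second == third == b"original tarball")  # True
```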


This is complex vendoring, nothing more.


Yeap


Why not just check in your dependencies, like Firefox, Chromium, or WebKit do? With your proxy, only people using your proxy get those benefits. With check-in, everyone gets those benefits.


Pinning, while more reliable, is not actually fully reliable. If you want reliability, you should be checking in copies of your dependencies, either directly or as submodules. Then, if you want to upgrade, you check in the latest. Pinning still allows whoever controls the third-party repo to mess you up. Your reproducibility is still at their mercy.


That depends on the repo. Some do not allow changes (short of the repo itself failing). Some allow only withdrawing versions, but no changes.

But yes, you should either commit deps, or have your own repo/caching-proxy which will neither change nor drop old versions.


I just use the date. Period.


In the same vein, I do not understand the popularity of semantic versioning. Why would I trust hundreds of people that their non-breaking changes are really non-breaking to me?

And suddenly, the auto-scaling fails in production because something has updated to include a new bug.


As with all things in life, semantic versioning is a heuristic for a largely stochastic process. You and only you are responsible for whether your code is broken or not. Semantic versioning is just this one standard most of us have agreed to follow as a community to make that responsibility a little easier to bear.


I think I was being unclear.

I didn't mean that semantic versioning is bad, I meant that specifying your dependencies in terms of semantic versions is bad.


It looks like this article is nonsense.

The author is confusing the (sometimes bad) behavior of package managers with version numbers.

Version numbers signal the intent of the release, but don’t guarantee anything. Any change in a dependency might break your project, so no update is safe, regardless of what the version number says. That’s why people pin versions (or at least, that should be why). It’s not a problem with version numbers at all.

That doesn’t make version numbers useless, though. Understanding the intent of an update is important info. It lets you estimate the cost/risk of taking an update.

So the author suggests that changing versioning will somehow improve things, when what we actually want is for package managers to be conservative by default.

He actually suggests that the latest version with the same major version be taken. That would work as long as... (1) none of your dependencies ever introduces bugs or accidental compatibility changes in a non-major update; and (2) all contributors update dependencies in lockstep (since any new code contribution could be dependent on a feature or behavior of a newer version of a dependency). In other words, that doesn’t work at all.

What we want is for package managers to lock the dependencies by default and for package repositories to require a version update when committing a dependency update. In fact, better to take the version number out of it: lock dependencies to the commit hash and leave version numbers for what they are good for, to communicate intent.


The point about renaming major versions is baffling. First of all, if your suggestion is that you use Rails 4 vs. Rails 5, that's a version number, one that may be handled a little differently by your package manager (or it may not, if you already pin to major versions).

But the premise that a major version conveys no more information than a full rename is simply wrong. Is libfoo n + 1 more similar to libfoo n, or libbaz m? In extremely rare cases, it will be more similar to libbaz m than libfoo n, but in 99% of cases, that's not true. Even if that is not perfectly reliable, that probability conveys information.

P.S. I'm not commenting on the other ideas raised by the article.


Yeah, the old name + a new version number does indeed convey information; there's going to be some common ancestry, anyway. It also means that the users of the library aren't forced to deal with a sea of cutesy names, we've already got too many of those.

"Well, we need to upgrade the Epic Flatuence server to Smelly Gangrene in order to stay compatible with whatever they're calling the next version of Wild Dingo --"

"Screaming Fist."

"The Russian version?"

"No, by a guy out of Vancouver."

"Right. Next, let's talk about Your Mother ..."


Right, he isn't really talking about changing the way we assign version numbers to software projects.

Rather he's talking (in a rather confusing way) about changing the way we convert names-and-version-numbers into identifiers used in package management systems.

The convention he suggests is basically what Debian has always done for C libraries (putting the soname in the package name), so it's certainly a plausible way to manage things.


If I stop using Debian some day, it will probably be because of that habit of pushing version information into part of a package's name.

It completely breaks upgrades. Newer versions aren't compatible with older versions, yet there's nothing telling apt that it should upgrade the other stuff depending on the older version first. Also, who is to say whether foo5.3 is newer or older than foo5, and since foo5 depends on bar3.2 while foo5.3 depends on bar3.2a, which package do you have to upgrade first?


libbar3 still has a package version of 3.2.1, and libfoo5 still has a package version of 5.3.0. Debian putting sonames into package names is actually quite handy, as it allows side-by-side installation of multiple versions of a library with ease.
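A simplified Python model of that convention, assuming the basic case where the package name is the library name plus the soname's major number (the helpers are hypothetical, and real Debian policy has further rules this does not model):

```python
import re


def package_name(soname):
    """Simplified Debian-style naming: 'libfoo.so.5' -> 'libfoo5'.

    Does not handle library names that already end in a digit (raises ValueError).
    """
    match = re.fullmatch(r"(lib[\w-]*[A-Za-z])\.so\.(\d+)", soname)
    if match is None:
        raise ValueError(f"unsupported soname: {soname!r}")
    return match.group(1) + match.group(2)


# Installed packages: package name -> package version.
installed = {}


def install(soname, package_version):
    """Install or upgrade; only a package with the same name is replaced."""
    installed[package_name(soname)] = package_version


install("libfoo.so.4", "4.2.1")
install("libfoo.so.5", "5.3.0")
install("libfoo.so.5", "5.3.1")  # upgrades libfoo5 only; libfoo4 is untouched

print(installed)  # {'libfoo4': '4.2.1', 'libfoo5': '5.3.1'}
```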


It's very useful for applications that don't guarantee API or data format compatibility between major versions. If you were running PostgreSQL 8, then under no circumstances would you want a silent upgrade to PostgreSQL 9, because 9 would refuse to read the data storage files from 8 and not start up: PostgreSQL required the administrator to handle migration between storage versions.


> a major version conveys no more information

I understand where you're coming from: a complete project name change is something very different. I also think this is something that's commonly misunderstood.

From the article:

> conveys no more actionable information

The key here is actionable. What will change in what you, as a consumer of this third-party code, will have to do if you adopt a major version bump versus a new library? Unless the new version is API compatible with the old one, you're likely going to have to do some major work to incorporate it; if the API is backwards compatible, it's not really a major version bump. The further a different library is from the one you're currently using, the more rework it will of course require, but that's part of what a major version bump signifies. I see what you're getting at with respect to some arbitrary new library, but if the new major version (with a new name) comes from the same maintainer, Rails2 vs. Rails v2 isn't so different in terms of what you're going to need to do in your code.

As for which name you change, I think that's where the rubber meets the road. What's the name the code is referred to in your application? That's the one that really matters, and where a name change is warranted. If the language affords module namespacing, the new library is just a new require/import. Major versions aren't drop-in replacements. If the libraries are namespaced, you can conceivably use them side-by-side in the same codebase as you migrate. This is potentially very powerful. As a practical example, in one Java library I've been working on that has a long history, it's able to use various versions of JUnit in the same testing suite exactly because the different versions are namespaced.

Another example, at a different level, is being able to install more than one version of, say, Python or SQLite on the same system. Some packages name the binaries with a suffix, e.g., python27 or python3, so you can have both of them on your path without conflict. When you install current versions of SQLite, the binary is explicitly named sqlite3 so as not to be confused with older, incompatible versions.


> Unless the new version is API compatible with the old one, you're likely going to have to do some major work in incorporating the new version.

A new major revision of a library means that there are some non-backward compatible changes, and in many cases this means a limited number of fixes have to be done. Also in general it is expected that one will want to upgrade, so probably those changes are documented. Also hopefully the two libraries will share the same main concepts so using the new version will not be terribly different from the old.

Of course that is not always the case; see Angular, where two unrelated frameworks share the same name (I really think they should have changed the name).


> "A new major revision of a library means that there are *some* non-backward compatible changes"

I agree. As I mentioned in a comment to your sibling[0], I think there's a bit of a disconnect as to what people are thinking of renaming. The major version number doesn't give you any additional information about what has changed. You need to refer, as you mentioned, to the documentation or to the results of your own testing. That's what I'm getting at when I point out the operative word, "actionable".

I completely agree that the two libraries, if understood to be the same project, should likely share the same core concepts: after all, that's likely what attracted people to the library in the first place, and it likely means only a limited number of fixes have to be done.

[0]: https://news.ycombinator.com/item?id=15678226


It’s relatively actionable as well. When someone says they want to migrate from Rails 4 to 5, the default response is “we should really consider doing that at some point and evaluate how much work it will take.” When they say we want to migrate from Rails 4 to Django X, the default response is “you’re going to have to justify a total rewrite”. That’s even true for some brutal version changes: going from Python 2 to Python 3 is still typically less work than going to Go or Java or some other language.

Angular might be the one exception I know of to this pattern.


I agree that there are degrees of difference that can be conveyed. I think there's a bit of a misconception of what it can mean to rename a library, however, that's often (and understandably easily) missed in this particular discussion.

There are at least three levels where the names matter: the project (e.g., Rails), the artifact (e.g., a particular gem), and the module(s) (e.g., what gets referenced by require). They're each important in their own way. I think renaming the Rails project instead of a major version bump is not what is typically the intent when talking about renaming.

Artifacts are a way of referencing a collection of files, and often there's some way of embedding metadata about the artifact in the name of it, though that's accidental, and naming these artifacts is independent (or at least not necessarily dependent) of the collection it represents: the same file can be found in any number of artifact builds. I think the practical implication is that you're renaming what is being required and referenced in the code itself.

So, you'd still have the same understanding that a migration from Rails2 to Rails5 would be bigger than one from Rails4 to Rails5, and that would be a whole different kettle of fish than moving to Django (or even Sinatra). I think we may be talking past each other as to the level where renaming (or really, new, additional naming) would take place.


There's basically another part of the version-tuple that we compress into the same position as the "major" version: the number of complete codebase rewrites that the code has experienced. Often V1 and V2 of an application, especially, will have effectively nothing in common.

Often they'll be developed by separate people who happened to inherit the name of the original, or "take up the reins" after the original became abandonware—but, either way, decided to just write a new one themselves rather than continuing on with the old. At that point, there is really no difference between calling it "libfoo (n + 1).0" and calling it "libbaz 1.0".

If you want to convey ancestry, you could add another preceding number, and that's often what people do—in the package name. For example, sqlite3 vs. sqlite4. If you wanted to make this "part of" the version tuple, you could: just have a super-major number before the major number.


I disagree. A great recent example is that DirectX 12 is more similar to Vulkan than it is to DirectX 11 or OpenGL 4. I would actually say it's quite common. I've seen numerous cases where a new version of a project is more similar to other projects than to its previous iteration. The reason is simple: the field has progressed. Certainly there are new "major" releases that are simply breaking changes with no fundamental changes, but I think it depends on your field: native libraries rarely change because they're well worn, while the web seems to change every few days. But my point is that if we distinguish between conceptual changes, breaking changes, and the iteration number (e.g. non-breaking changes and patches), and we want to simplify, then we should merge breaking changes and conceptual changes, as described by the article.


The whole thing made me think of "A plan for the improvement of spelling in the English language"

http://www.plainlanguage.gov/examples/humor/marktwain.cfm


To the best of my knowledge, that's not a joke, even though I assumed it was the first time I read it. Mark Twain believed in spelling reform (https://en.wikipedia.org/wiki/English-language_spelling_refo...).

But I find claims that he didn't write it: http://grammar.ccc.commnet.edu/grammar/twain.htm


Though of course 'th' should be replaced by 'y', not 'x', which has historic precedent.


Not actually true. It’s a common misconception that gave us fake pseudoarchaic old English like “ye olde whatever”.

That comes from confusion over the old thorn character, Þ or þ, which got mangled and misread.


Pronouncing it as a vowel when it was supposed to be 'th' is pseudoarchaic; spelling or typesetting 'th' as 'y' is a kludge with precedent, though.


In the sense that early printing presses didn’t have a thorn character, yes.

But it was an incorrect substitution then, and still would be now.


I've been studying paleography for about a year now, and I really don't find this to be true. This seems to be a common story, one that I've heard before and one which makes plenty of sense, but also one which seems to be false. Certainly, the use of <y> in place of <þ> for certain common abbreviations would've become most common in the early days of the printing press—not because presses couldn't have a thorn character, but because it didn't make sense to spend resources making one if it wasn't necessary. However, the substitution of <y> for <þ> in scribal abbreviations started happening in English manuscripts even before the introduction of the printing press in England.

In the most common scripts at the time (cursiva anglicana and later bastarda anglicana), <y> and <þ> were already very similar (the only difference being which of the two strokes had the descender—both letters have a closed bowl in these scripts), so similar that it was common in bookhands to write a small dot above <y> to aid the reader even when <þ> was still visually distinct. You can see this clearly in [1], written c. 1399. Use of either character to represent <th> was already becoming quite rare in the late Middle English period, and remained in use almost exclusively as a scribal shorthand; it's not uncommon to encounter manuscripts in which <þ> and <th> are mixed freely.

I can't say for certain why it is that <y> started to be substituted for <þ> in manuscripts. Obviously in the common hands of the time they're visually very similar, yet I'd imagine anyone who'd been educated as a scribe would be able to spot and write the difference. Perhaps it was a stylistic choice. But it did happen: you can see an example at [2]. This document was written c. 1445. Gutenberg's printing press is thought to have been invented at this time, but it wouldn't arrive in England until Caxton set one up c. 1476. It seems very unlikely to me that the printing press had anything to do with the initial substitution, though I'd imagine that the press was a major nail in thorn's coffin.

[1]: https://en.wikipedia.org/wiki/Confessio_Amantis#/media/File:...

[2]: http://medievalwriting.50megs.com/scripts/examples/chancery6...

(both of these documents are written in a bastarda anglicana)


I’ve heard similar objections but the general consensus from what I’ve read still seems to lean to printing being the primary cause.

That said, it’s rarely that black & white so it’s quite possible it’s a combination of everything.


I definitely agree that printing was the main cause of the usage of <y> becoming commonplace, and it's likely because printing substantially lowered the cost of written material—both producing and purchasing. It enabled widespread dissemination of written documents for the first time, so any conventions printers happened to adopt then coincidentally became widespread. The arrival of the press is similarly credited with the development of a standard written form of English, though that would take some time to complete.

I spent some time poring over facsimiles of early printed English books. Caxton is credited with printing the first book in English, actually a couple years before he returned to England, while he was still in Flanders. In that book, it looks to me like there's two glyphs, one which resembles <þ> and one which resembles <y>, but they're used completely interchangeably as if they're the same character. However, pages printed on an actual press usually wound up smudging a bit when the page was peeled off of the block, so the differences I perceived could very well just be artifacts of the printing process rather than intentionally distinct glyphs.

That said, early books that were printed in England do have a noticeably distinct <þ> character, but its use is limited to the common scribal abbreviations, which themselves are quite rare in this material. I couldn't pinpoint any precise date for <þ> -> <y>, but after around 1560, those old scribal abbreviations seem to disappear almost entirely (I only found one document after that which had any such abbreviations, from around 1680(!), and this document unmistakably uses <y>). It's always apparent whether <y> is intended in its modern sense or as a substitute for <þ> because the abbreviations are always indicated with a superscript.

I wonder if abbreviations in certain types of work were viewed as un-ideal. They certainly seem less common (but still present) in longer works and in works which seem to be a bit more formal. In short works, especially where space was at a premium (broadsides, chapbooks, newsletters, etc.), they are more common, which is understandable. This also seems to be a continuation of a trend that started in manuscripts, where fancier and more elaborately-illuminated texts started employing abbreviations less often. But at the moment I haven't the foggiest why they appeared at all in longer works—especially when print comes into play—since their usage is rare and doesn't follow any obvious pattern—perhaps the typesetter was just running out of <t>'s!

The picture is pretty muddy, and further complicated by the timing of the press's appearance because we can't really know if the development of <þ> -> <y> would've happened anyway, or if it would've simply remained a quirk of certain scribes.

Anyway, when I woke up this morning I had no idea that this was how I was going to spend my day, and it was fun to dig through all this stuff :)

Slightly off-topic: while I was busy skimming facsimiles for abbreviations, I came across this gem, which I find both interesting and... "irrationally amusing" (I don't know how else to describe it, but if I weren't so robotic I might've been giggling like a child): http://hdl.huntington.org/cdm/singleitem/collection/p15150co...


I get the feeling I would love hanging out with you and nerding out over this stuff. :)


I think no pinning and just taking the latest package every time is a really bad idea.

Bugs can creep into software even with the best of intentions, and it is useful for consumers of packages to have a choice whether to upgrade immediately or maybe wait until they have done some risk analysis. Also, the ability to roll back is invaluable.

I don't think forcing everyone to upgrade is going to force package maintainers to make zero mistakes.

Also the package-3, package-4 thing seems odd to me. I use Nuget and have used NPM / Gem / Haskell Stack on spare time projects. I would find it annoying to use this scheme because you have to hunt around searching in a different way to look for major upgrades compared to what you do for minor ones.

But most of all: I want my build to be reproducible, going back into history. Having the pinned version numbers committed to source control ensures this (as long as you keep an onsite cache of the packages, or trust the package repo to stay available and published versions to be immutable).


I agree that taking the latest package every time you're building for production deployment is a bad idea. And version pinning in and of itself doesn't really solve this problem either: if you're relying on an external source for reproducible builds, you've got a potential point of failure. To have a reproducible build you need to have access at the time of the build to the specific artifact you want to include in the build, which at the end of the day means having that library at hand locally as part of your build system. We don't have great stories yet about how to maintain these artifacts locally: setting up your own repos for many ecosystems isn't a walk in the park.

Specifying a version does allow you to know which artifact to include, but that's at the artifact level, which is often conflated with the particular library that artifact provides. A build number or other identifier included in the artifact gives you the same selection properties, yet is independent of the code the artifact actually provides. Currently code and artifacts are labelled similarly, but that's more a purposeful accident than a guarantee or deterministic property.

As for the package-3, package-4 thing you mention, I agree, it's kind of a mess. I think that's more a result of the lack of separation between the code, the artifact (if I'm understanding you correctly), and the module (what is referenced in the code itself, with its namespacing and aliasing). I don't think it's a solved problem.


> And version pinning in and of itself doesn't really solve this problem either: if you're relying on an external source for reproducible builds, you've got a potential point of failure

I grammerred badly, but this is what I meant when I said:

"As you keep an onsite cache of the packages"

In Nuget land this is quite easy but then you have some housekeeping to ensure the copy of the packages is backed up. This could be done with a separate source repo if you want to use that hammer.


You can vendor dependencies if being reliant on a package service is too unreliable.


Yes, that's definitely another option. Once you've done this, you've effectively added the code to your application, with the hope that it might be possible to have a drop-in replacement in the future.


> I would find it annoying to use this scheme because you have to hunt around searching in a different way to look for major upgrades compared to what you do for minor ones.

Either way, a major upgrade would mean a significant change in the way you use the API in your project.


Not always. A breaking change is just that, it doesn’t mean a wholly incompatible re-imagining. It’s possible to upgrade major versions without any actual code changes.


The proposal is to pin the major version and always use the latest minor version.

Assuming no accidental breaks of backwards compatibility in minor versions, this should work.


It's not as bad as the article makes it sound.

Package managers like npm5/yarn and Cargo have lock files that pin exact versions by default, for all dependencies, recursively. If you keep that file (commit it), your project will be immune to unexpected breaking updates.

Semver-based package managers also don't naively always upgrade to the very latest version like the article suggests. The default behavior is to limit upgrades to semver-compatible versions. There are tools like greenkeeper which use your project's unit tests to try out packages before upgrading, adding an extra level of assurance.

While none of this is a perfect guarantee, it's IMHO working well enough.
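To make "pin exact versions, recursively" concrete, here is a minimal sketch that flattens an already-resolved dependency tree into a lock mapping. The nested-dict tree and the `flatten_lock` name are hypothetical, chosen for illustration; this is not the actual format of package-lock.json, yarn.lock, or Cargo.lock.

```python
from typing import Dict, Tuple

# Hypothetical resolved tree: package name -> (exact version, its resolved deps).
# Real lockfiles (package-lock.json, yarn.lock, Cargo.lock) use their own formats.
Tree = Dict[str, Tuple[str, "Tree"]]


def flatten_lock(tree: Tree, prefix: str = "") -> Dict[str, str]:
    """Record the exact version of every dependency, recursively.

    Keys are paths such as "foo/bar", so the same package appearing at two
    depths can be pinned to two different versions.
    """
    lock: Dict[str, str] = {}
    for name, (version, deps) in sorted(tree.items()):
        path = prefix + name
        lock[path] = version
        lock.update(flatten_lock(deps, prefix=path + "/"))
    return lock


tree = {"foo": ("1.2.0", {"bar": ("0.3.1", {})}), "baz": ("2.0.0", {})}
print(flatten_lock(tree))
# {'baz': '2.0.0', 'foo': '1.2.0', 'foo/bar': '0.3.1'}
```

Committing that output means a later install can reproduce the same tree even after foo or bar publish new releases; upgrading becomes the explicit act of regenerating the lock.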


I wasn't aware of greenkeeper, thanks. Can you give some examples of "Semver-based package managers"?

I'm curious what you think of the two posts I was basing mine on:

"Spec-ulation" by Rich Hickey: https://www.youtube.com/watch?v=oyLBGkS5ICk

"Volatile Software" by Steve Losh: http://stevelosh.com/blog/2012/04/volatile-software

I think there may partly be a "universes colliding" effect here, and partly just the future being non-uniformly distributed.


Based on pornel's example, "Semver-based package managers" are most modern ones (npm, bundler, cargo) that have the concept of Semver embedded in the version constraints for dependencies.

E.g. the tilde (~) and caret (^) operators in npm[0] allow you to specify version constraints on dependencies so that they can be resolved/pinned to higher minor/patch versions of that package, but not to a higher major version, since by the Semver definition that would contain breaking changes, and those might impact your project.

[0]: https://docs.npmjs.com/misc/semver#tilde-ranges-123-12-1
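The tilde/caret behavior can be sketched in a few lines. This is a simplified model of npm's documented range rules, assuming plain x.y.z versions only; it ignores pre-release tags, partial versions like ~1.2, and compound ranges, all of which the real node-semver library handles. The function names are illustrative.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse(v: str) -> Version:
    """Parse a plain "x.y.z" string (no pre-release or build tags)."""
    major, minor, patch = (int(part) for part in v.split("."))
    return (major, minor, patch)


def caret_upper(base: Version) -> Version:
    """Exclusive upper bound for ^base: bump the left-most non-zero component."""
    major, minor, patch = base
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def tilde_upper(base: Version) -> Version:
    """Exclusive upper bound for ~x.y.z: only patch-level changes are allowed."""
    major, minor, _ = base
    return (major, minor + 1, 0)


def satisfies(version: str, spec: str) -> bool:
    """Check a version against a single "^x.y.z" or "~x.y.z" range."""
    op, base = spec[0], parse(spec[1:])
    if op == "^":
        upper = caret_upper(base)
    elif op == "~":
        upper = tilde_upper(base)
    else:
        raise ValueError(f"unsupported range operator in {spec!r}")
    # Tuples compare component by component, which is what version order needs.
    return base <= parse(version) < upper


print(satisfies("1.9.0", "^1.2.3"))  # True: same major version
print(satisfies("2.0.0", "^1.2.3"))  # False: new major version
print(satisfies("1.3.0", "~1.2.3"))  # False: tilde only allows patch bumps
```

The 0.x branch in caret_upper follows npm's description of caret as allowing changes that do not modify the left-most non-zero component, which is why ^0.2.3 stops below 0.3.0.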


If that's true then Rich Hickey has refuted this approach to my mind. See my link in OP. I also mentioned npm and bundler in OP. This is precisely the approach I'm against. It's good locally, but deteriorates the eco-system at large when everyone follows it.


> There are tools like greenkeeper which use your project's unit test to try out packages before upgrading, adding extra level of assurance.

Some communities have also started doing the reverse: as a library author, you can get all your users' tests to run to check that you haven't broken something they were using.


What communities/projects do this?


I know that that happens for the Rust compiler, possibly some libraries as well. I don't know that it's systematic but I do know that they regularly leverage Cargo to find out major dependencies/users and run those against pending changes.


Agreed, but we've noticed some weird issues at work lately.

When someone is on an older version of npm or on a different system (Linux / macOS), things get fucky with the lock file. The lock file is changed and they just blindly commit it. So kinks still need to be ironed out with npm, IMO.


Bugs exist. That isn’t a design problem. Neither is carelessness. I don’t think either is the thrust of the article.


“To begin with, it's weird that versions are strings.” - what else would they be? An audio tone at a particular frequency? A vibration pattern? It’s written communication. Perhaps he’s just confusing data types with how they are communicated. But then, you’re free to use any data type you wish. If I’m not mistaken, Ubuntu & JetBrains both use date fields in their versioning. Still communicated as a readable string.

He proposes tuples, but doesn’t care for semver. So what would the tuple mean?

As a human race, I’m sure we can do better with versioning semantics. Perhaps I’ll put some more spare personal time into that semantic schema kit I was designing. Ultimately it’s about data types, containment and meaning. And his proposal ignores most of that.


The ultimate suggestion of a name, version pair is easy enough to understand, but I also don't get what mentioning tuples is intended to show.

In practice, a string with a required format of "x.y.z" just is a 3-tuple of ints. Restricting the values to ints makes it easy enough to parse.

The example of npm shows some weird extensions like "1.2.3pre" which definitely complicate parsing, but that's not as much a difference between strings and tuples, as a difference between having a lot more options.
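To make the point concrete: once the format is restricted to x.y.z with integer components, parsing into a tuple is trivial, and it's the tuple, not the raw string, that orders correctly. A minimal Python sketch; the strict regex is an assumption, and npm's actual version grammar is richer.

```python
import re
from typing import Optional, Tuple

# Assumed strict format: exactly three dot-separated non-negative integers.
STRICT = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def as_tuple(version: str) -> Optional[Tuple[int, int, int]]:
    """Parse "x.y.z" into a 3-tuple of ints, or return None if it doesn't fit."""
    m = STRICT.fullmatch(version)
    if m is None:
        return None  # e.g. "1.2.3pre": suffixes need rules of their own
    return (int(m.group(1)), int(m.group(2)), int(m.group(3)))


# Raw strings compare character by character, so "1" < "9" decides it...
print("1.10.0" < "1.9.0")                      # True (wrong order)
# ...whereas tuples compare component by component: 10 > 9.
print(as_tuple("1.10.0") < as_tuple("1.9.0"))  # False (right order)
print(as_tuple("1.2.3pre"))                    # None
```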


Author here. I said tuples right after that sentence you quoted, and then I showed that tuples too could be dropped. So it's unclear to me why you feel the need for snark, and it's making it hard to understand you. Can you elaborate on what I am ignoring?


Commenter here. Firstly, my comments are designed to highlight confusion in your argument, or in your understanding of versioning as a whole. Version identifiers aren’t simply strings; they are complex data types made up of iterable data types. Even a custom enumeration that holds values such as “alpha”, “beta”, “rc” isn’t just a string, but an enumerated type, meant to convey MEANING. The fact that they are serialised into a format that happens to be readable and parsable is beside the point.

You write “Let's just make them a tuple. Instead of "3.0.2", we'll say "(3, 0, 2)".” Why bother? You’ll end up treating them the exact same way. Or does the tuple have some structure that is different from the containment found in semver? Semver is a hierarchy. And what would it mean? Absolutely nothing if you were hoping to understand the difference between two versions simply marked like “Product-1 21” and “Product-2 13”. If you think every open source developer’s life is going to get easier with strings that hold no meaning, you are sorely mistaken.

I’m truly baffled by your reasoning. Others have commented, and I agree: after 20 years in the industry, I know pinning is vital. The Go community isn’t wrong; the request to pin is born of experience. You are ignoring the fact that working with multiple dependencies produced by coders you have no control over isn’t all rainbows & unicorns.


Ok, thanks for trying but I think we're talking past each other. I feel pretty confident that I'm not just ignoring 20 years in the industry. (I've spent 20 years in the industry.) If I'm missing something it's subtle and I don't think you're anywhere near it. So I'll stop engaging here.

m.nn.pp-alpha-rc0 may convey meaning to the author, but it is meaningless to users because the 'alpha' and 'rc0' mean different things in different libraries, based on their release processes.

You have spent more sentences on my tuple idea than I spent introducing and then completely emptying it. That suggests that I have failed to convey my larger point to you. I'm going to agree to disagree and move on.


> m.nn.pp-alpha-rc0 may convey meaning to the author, but it is meaningless to users because the 'alpha' and 'rc0' mean different things in different libraries, based on their release processes

I don't think you are using the word "meaningless" correctly. rc0 may not mean exactly the same thing in different software packages, but it always means the first release candidate. Thus, it is not meaningless - it conveys meaning. And if the user wants, they can look up that software package, and find out exactly what rc0 means, gaining even more meaning from the description. Perhaps you are trying to say that rc0 doesn't convey as much information as quickly as the alternative you are proposing?


The problem with this article, and with semver, is that every change is either a new feature or a breaking change. Some new features are also breaking changes.

A bug is encoded in the behavior of the program. "Fixing" a bug is changing the behavior of the program - breaking it if someone downstream was implicitly or explicitly relying on the bug.

Most people want the bug fixed, and would tag the change as 1.0.1 rather than 2.0.0; but in someone's dependency tree is version 1.0.0 of your project, and upgrading to 1.0.1 will break them, even though semver tells us that x.y.1 is a minor change for most consumers.

Incidentally, this is also why evolving the web platform is so hard. Among the uncountable sites on the internet, even if the current behavior seems totally broken, someone is shipping code that relies on it.


> The problem with this article, and with semver

This is mistaken. As I've noted elsewhere in here, semver is a social protocol, not a technical one; in other words, it's wishy-washy meatspace stuff (and yet it's still the best we've got at the moment). As consequence of being wishy-washy meatspace stuff, it lets any library that uses semver define "breaking change" in any way it wants to. This is literally the first rule of the semver specification (where, for the entirety of the document, the "public API" is how breakage is defined):

"1. Software using Semantic Versioning MUST declare a public API. This API could be declared in the code itself or exist strictly in documentation. However it is done, it should be precise and comprehensive."

The only thing that matters is that the public API is documented. It would be totally semver-compliant to have a line in your readme saying "there is no public API #yolo" and then completely redesign your library between versions 1.0.0 and 1.0.1.

Obviously this is undesirable in many ways, but I have yet to see a solution that does away with the wishy-washy meatspace stuff (at best, I have seen some tools to help package authors determine whether the public API breaks inadvertently, for some definitions of "public API").


Assuming you have documentation, then any deviation in actual behaviour from what your documentation says is a bug.

If you're relying on undocumented behaviour in your programs, then you inevitably will find no/less value in semver.

And if your program isn't documented, then your code implicitly fills the role of documentation, which does mean that fixing any bugs can maybe be considered a breaking change. Or something like that.


If the behavior is the one your documentation said it would be, then it's a bugfix. If the behavior is different from what the documentation previously said it was, then it's a change.

If every program that used the previous behavior is still correct, then it's a non-breaking change. Despite what you say, this is a very common thing. Adding a new function, for example, usually fits here. But that is always verified from the point of view of documented behavior, never by looking at actual behavior.

If people are relying on undocumented behavior, they will have problems upgrading their libraries. They should expect that; if they don't, it's their problem, not the library maintainer's. Whether to depend on undocumented behavior is a decision library users make at their discretion by weighing their options; it shouldn't impact upstream.
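The three cases above map directly onto semver's three components. A minimal sketch, assuming a human has already classified the change against the documentation (the Change enum and next_version are hypothetical names; deciding which bucket a change belongs in is the hard part, and no code does it for you):

```python
from enum import Enum
from typing import Tuple


class Change(Enum):
    # Classified against the documented behavior only, per the rule above.
    BUGFIX = "behavior now matches what the docs already said"
    COMPATIBLE = "docs changed, but every program correct under the old docs still is"
    BREAKING = "docs changed, and some program correct under the old docs no longer is"


def next_version(current: Tuple[int, int, int], change: Change) -> Tuple[int, int, int]:
    """Return the semver version to publish after a change of the given kind."""
    major, minor, patch = current
    if change is Change.BREAKING:
        return (major + 1, 0, 0)
    if change is Change.COMPATIBLE:
        return (major, minor + 1, 0)
    return (major, minor, patch + 1)


print(next_version((1, 4, 2), Change.COMPATIBLE))  # (1, 5, 0)
```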


Software is too complex to document all intended behaviors. Any meaningful library will be too complicated for that. Software is hard and semver seems like it makes it a little more manageable, but practically it's not a lot better than single numbers because software is too complex to fulfill semver's guarantees.



Linux distributions have been including the major version in the name for a long time; when we started doing this at Red Hat I wrote down the rationale at http://ometer.com/parallel.html in order to convince upstream projects to do this upstream, as many now do.

For non-system package managers like npm, always using a lockfile is the best current option IMO, for those package managers that support it. https://blog.ometer.com/2017/01/10/dear-package-managers-dep...


Thanks for these links, they're super interesting. However, I should clarify that my post is about how we upgrade dependencies in dev, not how we deploy them in prod. I'm talking about version numbers in Gemfile, and I think you're talking about them in Gemfile.lock. Am I understanding you right?


"We're all on the internet, and we're all constantly running npm install or equivalent."

This is not at all accurate, at least in my work. I use version pinning to be resilient to changes in the greater internet: builds don't use the internet, and instead use internally pinned and vendored sources for third party libraries.

If you rely upon the internet to build and deploy your code, then someone else's outage (like GitHub's) can become your outage, or make yours worse. If you work on a product which has high uptime requirements, this is unacceptable.

Just as important, suppose any one of those dependencies breaks the social contract described in the posted article: they make a backwards-incompatible change, and your code no longer compiles or builds or (much worse!) subtly breaks. Now you're relying very strongly upon the good intentions and competency of all third party maintainers.

Worst of all, suppose any of your dependencies gets compromised and now has a security vulnerability. What do you do?

This is not a nitpick; I think it's central. Because I need network-free, reproducible builds of known-good code, I need to maintain a copy of third-party code somewhere. When I start doing that, I need to keep track of what version (which could be a commit, sure) I'm storing to know when I'm out of date. Then, when my cached copy is out of date, I need to know how safe it is to update to match the remote version: should I expect builds to break, or not?

Semantic versioning is a way for package maintainers to signal their expectations. It's not something you can rely completely on - in these systems I'm describing, you still need automated tests to give you more confidence that upgrading is safe. But semantic versioning improves your confidence which helps avoid wasting work on an upgrade that will certainly fail, and helps identify problems in advance.

It's also helpful to have minor and patch versions so applications can concisely describe which features of third-party dependencies are required to build and run your code. This makes it easier to integrate new applications - you can clearly see whether your cached copy of the third-party code is up-to-date enough.


Author here. One thing I'm realizing I should have been clearer about is that I'm talking about problems of upgrading, not deploying. The fact that we pin versions in places like Gemfile.lock is totally fine. However, the fact that we pin versions in places like Gemfile is a smell. (You may not, and I do not[1]. OP was my attempt at finding middle ground with mainstream best practice.)

This discussion gets at the distinction: https://stackoverflow.com/questions/4151495/should-gemfile-l...

The use of left-pad as an example in OP was intended as an acknowledgement of the deployment issues you point out. But I was probably being so subtle that I veered into cuteness :)

[1] See my example of "exemplary library use" at http://arclanguage.org/item?id=20221


In the second half, GP is talking about upgrading:

> When I start doing that, I need to keep track of what version (which could be a commit, sure) I'm storing to know when I'm out of date. Then, when my cached copy is out of date, I need to know how safe it is to update to match the remote version: should I expect builds to break, or not?

> Semantic versioning is a way for package maintainers to signal their expectations. It's not something you can rely completely on - in these systems I'm describing, you still need automated tests to give you more confidence that upgrading is safe. But semantic versioning improves your confidence which helps avoid wasting work on an upgrade that will certainly fail, and helps identify problems in advance.

Putting these things in Gemfile.lock does not accurately reflect the workflow where all upgrades happen with humans in the loop, who need to be aware of and initiate any change that happens in your dependencies' code.


I actually interpreted the second half to be agreeing with me, so I glossed over it. "I need to know how safe it is to update.." Exactly what OP is about.

I agree that the best approach today is to maintain a local cache of a specific version, and to maintain unit tests to warn you of any breakage. In which case: what's the point of a package manager, again?

As I briefly mentioned above, my current approach is to be extremely conservative in introducing dependencies, inline any dependencies I do introduce, and then treat them as my own code, polishing them alongside my own stuff, deleting code paths I don't need, submitting patches upstream when I discover issues from my hacking. I'm investigating better ways to write automated tests, so that when tests pass we can actually gain confidence that nothing has regressed: http://akkartik.name/about. If we could do this we wouldn't need compatibility at all! But that is blue-sky research; OP is my attempt at meeting the rest of the world half-way. Assume package managers are trying to do something useful. How can we build them to actually do what they advertise?

> Putting these things in Gemfile.lock does not accurately reflect the workflow where all upgrades happen with humans in the loop, who need to be aware of and initiate any change that happens in your dependencies' code.

I agree that upgrades should be pull not push, so I'm not sure what you're disagreeing with here. We need upgrades to be easy to perform so that we'll be more likely to perform them and so that we'll perform them more often, thus keeping our projects up-to-date on the latest vulnerabilities and bugs.


Agreed, plus version pinning isn't mutually exclusive with running latest, head, or whatever you call "now". You can still resolve "now" to some version. For example, Debian stable resolves to version 9. Stable points to the dependency tree called version 9, but you can also just reference it directly if you want. You can go back in time and reference dependency tree 8 if 9 was a doozy. In my experience, it's foolish to think every version is going to be better than the previous one. Sometimes you have to make the present the past until the future gets its act together.


> Next, move the major version to part of the name of a package. "Rails 5.1.4" becomes "Rails-5 (1, 4)".

And what if a random person decided to register the package named "Rails-6" ? This system is way too prone to malicious package name squatting.


>It's unclear to me if this is due to a failure of communication on the part of the original authors of Go, or if there's a deeper justification for go dep that I'm missing. If you know, set me straight in the comments below.

Sounds like an extremely trivial thing to solve technically.

Make all name-X belong to the same account -- that is, treat "name" as the unique identifier that binds a package name to an account.

So nobody but those that have registered name-1, name-2 etc can register name-66.

That's like 2 lines of code.
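A sketch of that rule, assuming a hypothetical in-memory registry that maps each base name to the account that owns it (owners, base_name, can_register, and register are illustrative names, not any real registry's API). It treats a trailing "-<digits>" as the major version and everything before it as the owned name.

```python
import re
from typing import Dict

# "name-<digits>" at the end of a package name is treated as a major version.
SUFFIX = re.compile(r"(?P<base>.+?)-(?P<major>\d+)")


def base_name(package: str) -> str:
    """Map "rails-5" and "rails-66" to "rails"; leave other names unchanged."""
    m = SUFFIX.fullmatch(package)
    return m.group("base") if m else package


def can_register(owners: Dict[str, str], package: str, account: str) -> bool:
    """Only the account owning the base name may publish any name-N under it."""
    owner = owners.get(base_name(package))
    return owner is None or owner == account


def register(owners: Dict[str, str], package: str, account: str) -> bool:
    """Claim the base name on first registration; refuse anyone else afterwards."""
    if not can_register(owners, package, account):
        return False
    owners.setdefault(base_name(package), account)
    return True


owners = {"rails": "alice"}  # hypothetical accounts
print(register(owners, "rails-6", "mallory"))  # False: "rails" belongs to alice
print(register(owners, "rails-6", "alice"))    # True
```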


Solution that’s like 0 lines of code: namespacing. This way there’s e.g. rails/rails-5 and rails/arel-3, or whatever. And if the project foo/name-41 gets abandoned, it can continue at bar/name-42.

Note that rails is (officially) hell bent on not using semver at all as significant breakage happens even at minor versions.


I guess you could just skip over Rails-6 and go from Rails-5 to Rails-7. Windows 8 went straight to Windows 10.

The idea is that backwards incompatibility should cause a name change, so really any untaken name would do, e.g. Rails-McTavish, Rails-Ocelot, Rails-馄, Rails-0x389, Rails-3.14159, ...


> Windows 8 went straight to Windows 10.

The alleged reason for skipping Windows 9: many software packages checked for older Windows versions (namely 95/98) by testing whether the Windows version string starts with "Windows 9".

https://www.reddit.com/r/technology/comments/2hwlrk/new_wind...


So what you're saying is that 'Windows 9' as a viable name was, in a sense, taken?


>https://www.reddit.com/r/technology/comments/2hwlrk/new_wind....

source: msft employee who isn't even close to the windows team.

that explanation ignores the fact that windows spoofs the windows version number for all applications by default (defaults to 8.1 i think), unless you have a special entry in your application manifest. so old applications will still work by default. not to mention that windows has compatibility shims to deal with this exact issue.


You're right: this type of thing could be an issue. This is one of the motivations for package naming in the spirit of Java: Packages are prefixed so such collisions don't happen. Not all module systems have this type of namespacing, or at least take advantage of it.


> Result: best practice of pinning major version using the twiddle-waka or pessimistic operator all over the place, and Steve Losh shaking head sadly.

I'm not sure why this is potentially seen as a bad thing.

Call me crazy, but I enjoy version locking every dependency because it lets me know exactly what I'm using, and it's something I commit to version control so other people know as well. It becomes a form of documentation.


Author here. I'd love to have you read the two links I refer to at the top of my post. They articulate why it's a bad thing.

Incidentally, version locking every dependency is identical to just inlining every dependency in the repo. And I actually like this approach a lot. See my example of "exemplary library use" at http://arclanguage.org/item?id=20221. OP was an attempt to reconcile my priorities with the mainstream.


Christ on a motorboat, this entire article is a poster boy for Chesterton's Fence.

> To begin with, it's weird that versions are strings. Parsing versions is non-trivial. Let's just make them a tuple. Instead of "3.0.2", we'll say "(3, 0, 2)".

3.0.2 alpha 6 says what?
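For reference, a bare tuple can't carry "alpha 6". SemVer 2.0 writes it as `3.0.2-alpha.6` and ranks it below `3.0.2`. Here is a minimal Python sort key under those precedence rules. It ignores build metadata and doesn't check for leading zeros.

```python
import re

# MAJOR.MINOR.PATCH, optional "-prerelease", optional "+build" (ignored for ordering).
_SEMVER = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+[0-9A-Za-z.-]+)?$"
)


def version_key(text: str) -> tuple:
    """Sort key following SemVer 2.0 precedence (leading-zero checks omitted)."""
    match = _SEMVER.match(text)
    if match is None:
        raise ValueError(f"not a semantic version: {text!r}")
    major, minor, patch = (int(g) for g in match.group(1, 2, 3))
    pre = match.group(4)
    if pre is None:
        # A release ranks above every prerelease of the same MAJOR.MINOR.PATCH.
        return (major, minor, patch, 1, ())
    # Numeric identifiers compare as numbers and rank below alphanumeric ones.
    idents = tuple(
        (0, int(part)) if part.isdigit() else (1, part) for part in pre.split(".")
    )
    return (major, minor, patch, 0, idents)
```

Sorting `["3.0.10", "3.0.2", "3.0.2-alpha.10", "3.0.2-alpha.6"]` with this key puts `3.0.2-alpha.6` first and `3.0.10` last. A plain string sort gets both of those wrong.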

> Next, move the major version to part of the name of a package. "Rails 5.1.4" becomes "Rails-5 (1, 4)". By following Rich Hickey's suggestion above, we also sidestep the question of what the default version should be. There's just no way to refer to a package without its major version.

And maintainers, being already loath to bump the major, start bumping only the minor on breaking changes.

> And that's it. Package managers should provide no options for version pinning.

And now if the maintainer breaks the package (either wilfully or without noticing) the user is shit out of luck and jolly well fucked.

> A package manager that followed such a proposal would foster an eco-system with greater care for introducing incompatibility.

I've got this very nice bridge you may be interested in.

> Packages that wantonly broke their consumers would "gain a reputation" and get out-competed by packages that didn't

There is literally no reason for that to happen any more than it currently does.

> The occasional unintentional breakage would necessitate people downstream cloning repositories and changing dependency URLs

So pinning would still exist except you'd have to fork the project you depend on, and would forget to update it ever after? Yeah that doesn't sound like a recipe for complete disaster.

> As a result, breaking changes wouldn't live so long that they gain new users.

Call me back about the bridge thing, you'd love it.

> if you change behavior, rename.

Unless you don't care, or don't notice, then don't, and the whole edifice falls down once again.


It's interesting to discuss this as an observation or as a way to break cargo culting (as the title suggests), but I feel like this completely ignores the people aspect. Engineers want to work on interesting problems, not to be slaves to mistakes they made years ago when designing something in their free time. Semver at least seems like a compromise instead of everybody using something different from a graveyard of unmaintained LeftPads.

Of course, personally, I want every dependency I pick to be consistent and stable forever. But I want to break things and work on new and interesting problems. I'm also using an open source toolchain and am not willing to pay for support or maintenance. Since I follow this stuff online, I'm also interested in new ways of solving old problems.


>Engineers want to work on interesting problems, not to be slaves to mistakes they made years ago

Maybe there should be a sense of responsibility. That seems to be lacking in this "engineering" climate.


> NPM for Node.js defaults to version tagged "latest". Again, if you go with the default version your project is liable to go boom at some future date.

I don't understand this criticism. The "default" in NPM is to use `npm install --save`, which adds the version of the package it installs to your `package.json` so future uses of `npm install` will automatically use a semver compatible version of that package.


Moreover, barring total ecosystem overhauls (Python 2v3, AngularJS vs Angular), people don't spend any effort wondering which version of a package they should use. In my experience, you install the latest. You may pin (or let yarn.lock do it for you) to avoid implicitly consuming a breaking change the next time you install from scratch, but that's to be able to explicitly test if the upgrades break your application. It's not likely because you presume the newer version is worse than the one you already have.


I've seen even point releases break something. For proper CI/CD you always lock to an exact version and run upgrades purposefully (say, on a schedule). The last thing anyone needs is to hunt for breakage when a dependency changes unexpectedly.

You wouldn't let your teammates commit code without review, so why trust a third party? (I'm not saying you need to read the diffs of third-party libraries.)


It's similar for Bundler (the de facto official Ruby package manager). After the initial install, you'll only get new package versions if you run the update command.

The complaint seems to be that the update command is not sufficiently safe by default (though the `--conservative`, `--minor` and `--strict` flags help there), which is fair enough, but why not just fix the default behavior?


> The complaint seems to be that the update command is not sufficiently safe by default

This is not the case in NPM. `npm update` will only update to the latest version that matches the selector in your `package.json`.

So if you ran `npm install --save` and it wrote 'foo@^1.2.3', `npm update` will not update to release 2.0.0 which includes breaking changes, but will update to 1.2.5 which includes fixes.

The ^ prefix is the default, which allows new features and fixes but not breaking changes. You can optionally use '~' on a per-dependency basis, or as the npm-wide default, to get fixes only. You can also pin exact versions if that's your fancy. But the default seems pretty sensible in my opinion.
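To make the ranges concrete, here is a simplified Python model of caret and tilde ranges for plain `X.Y.Z` versions. It is written from npm's documented semantics, not taken from npm's own semver package. Prereleases, partial versions and `||` ranges are left out, and all function names are hypothetical. For `0.x` versions, the caret only allows changes below the left-most non-zero component.

```python
from typing import Optional, Tuple

Version = Tuple[int, int, int]


def parse(text: str) -> Version:
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def caret_upper(v: Version) -> Version:
    """Exclusive upper bound of ^v: bump the left-most non-zero component."""
    major, minor, patch = v
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def tilde_upper(v: Version) -> Version:
    """Exclusive upper bound of ~v for a full X.Y.Z: patch-level changes only."""
    return (v[0], v[1] + 1, 0)


def satisfies(candidate: str, spec: str) -> bool:
    """Match an exact X.Y.Z against '^X.Y.Z', '~X.Y.Z', or an exact pin."""
    cand = parse(candidate)
    if spec.startswith("^"):
        low = parse(spec[1:])
        return low <= cand < caret_upper(low)
    if spec.startswith("~"):
        low = parse(spec[1:])
        return low <= cand < tilde_upper(low)
    return cand == parse(spec)


def max_satisfying(published: list, spec: str) -> Optional[str]:
    """Roughly what an update picks: the highest published version in range."""
    matching = [v for v in published if satisfies(v, spec)]
    return max(matching, key=parse, default=None)
```

Under this model `^0.50.0` still admits `0.50.1`. So a patch release can reach any CI build that resolves the range fresh.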


That's exactly right. (Author of OP here.)


Which version of NPM? Because you should be aware that

> The behaviour of package-lock.json was changed in npm 5.1.0 by means of pull request #16866. The behaviour that you observe is apparently intended by npm as of version 5.1.0.

> That means that package.json can trump package-lock.json whenever a newer version is found for a dependency in package.json. If you want to pin your dependencies effectively, you now must specify the versions without prefix, that means you need to write them as 1.2.0 instead of ~1.2.0 or ^1.2.0. Then the combination of package.json and package-lock.json will yield reproducible builds. To be clear: package-lock.json alone does no longer lock the root level dependencies! [0]

The release notes he references are an absolute farce. All of this bluster about how "npm@5's first semver-minor release!" is going to provide "a much more stable experience." And yet,

> It fixes [#16866], allowing the package.json to trump the package-lock.json.[1]

Fixes, ha! Even that link is broken.

Folks, that was a major breaking change. And they introduced it in a "minor" update. I agree with Rich Hickey that "semver" is an epic failure. He even uses package managers as an example in the cited talk, saying, What if you had to worry about what "version" of Maven central you were using? Well, that's exactly what npm did (only in the client).

How did I stumble onto this point? Because it broke the very first CI build I deployed to GitLab. Worked locally with Rollup 0.50.0, but Rollup 0.50.1 (which the CI used, because caret) introduced a regression that happened to break my package.

So yeah, npm's default is not appropriate for CI. It assumes that patch updates are non-breaking, and we all know that they're not.

/rant

[0] https://stackoverflow.com/questions/45022048/why-does-npm-in...

[1] https://github.com/npm/npm/releases/tag/v5.1.0


The trouble with version pinning is that, over time, technical debt builds up. Software is built using out of date versions. There are version clashes when A needs C version 1, and B needs C version 2, and A and B are needed by D. That's the key take-away item. When you pin a version, you begin to accrue technical debt.

This has long been a curse of ROS, the Robot Operating System, which is a collection of vaguely related packages that speak the same interprocess protocol. Installing ROS tends to create package clashes in Ubuntu, even when using the recommended versions of everything. This gives us a sense of what version pinning does to you after a decade.

Tools for computing the total technical debt from version pinning in a package would be useful. You should be able to get a list of packages being used which have later versions, with info about how far behind you are in time and number of updates. Then at least you can audit version pinning technical debt. You can assign someone to bringing the technical debt down, updating packages to the latest version and running regression tests.
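A rough sketch of such an audit. It assumes you already have each pinned version plus each package's release history with publication dates. The `Release`/`Lag` types and `audit` function are hypothetical, and fetching that data from a real registry is out of scope.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Release:
    version: tuple[int, int, int]
    published: date


@dataclass(frozen=True)
class Lag:
    package: str
    releases_behind: int
    days_behind: int


def audit(pins: dict[str, tuple[int, int, int]],
          history: dict[str, list[Release]]) -> list[Lag]:
    """For each pin, count newer releases and the days between the pin and the newest.

    Assumes every pinned version appears in its package's history.
    Up-to-date packages are omitted; the worst offenders come first.
    """
    report = []
    for package, pinned in pins.items():
        releases = history[package]
        newer = [r for r in releases if r.version > pinned]
        if not newer:
            continue
        pinned_date = next(r.published for r in releases if r.version == pinned)
        latest_date = max(r.published for r in newer)
        report.append(Lag(package, len(newer), (latest_date - pinned_date).days))
    return sorted(report, key=lambda lag: lag.days_behind, reverse=True)
```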


Am I the only one who doesn't feel like versioning has failed in some way? Sure, there are people who violate the constraints, or don't follow any discernible convention, but in my experience they are now the minority.

If you want to be absolutely anal about versioning, do your commit-based pinning all you want, just don't drag us all into it.


Names for libraries serve a dual purpose: they must be resolvable by computers (i.e. package managers, linkers, etc) but also meaningful to humans -- the programmers who supply the name as user input in the first place. Version numbers allow one to trace the provenance of later artifacts back to earlier ones somewhat linearly, instead of having to hunt for comparable artifacts unconnected by graph edges.

If libraries upon major version changes were renamed instead, the programmer would have to maintain a mental mapping between the predecessor library and the successor, and all code and configuration references to the other artifact would have to be updated 'by hand', without assistance to be gained from the package manager.

Including multiple versions of a library in your project is unsupported in many environments, so if the library author ever envisions a situation where the two 'versions' will be used concurrently, renaming makes sense. In the Java ecosystem, this approach was used for Apache Commons Lang, Apache HTTP Client, and Jackson.

Semantic versioning has good intentions, but adherence and accuracy of the signalling is variable. Absent external enforcement, that's just the way it is -- it's a self-asserted string that no one bothers to validate and everyone hopes they can rely on. If package repositories were more than just storage, perhaps this would be a different story.

It's strange to extoll the virtues of Go's dependency management, because by convention it binds the artifact identity to a resolvable external URL without any indirection. Meatspace maintenance changes like changing the hosting location of the artifact will cause Go to treat it as a different package, and inability to directly communicate externally will also break your build. Every other packaging solution has independently come to the conclusion that abstracting away artifact identity from artifact location is a good thing, but to each their own.


What this article arrives at is basically the perl/CPAN model. Don’t break backwards compatibility unless you absolutely have to; as in cases of security or when the original functionality never worked in the first place.


That's interesting. Could you point me at a good link for learning more? I spent a little time with CPAN back in 2001/2002 and ran screaming. Perhaps it's gotten better since.


Sure, https://perldoc.perl.org/perlpolicy.html#BACKWARD-COMPATIBIL... is the policy for the language itself, but as it mentions there, it’s generally considered a “community” virtue. I wasn’t programming back in ’02, so I’m sure it was extremely different (and the source I’ve read from modules from that era is pretty frightening). Any major framework or widely used module will try to maintain backwards compatibility, at least as far as documented behavior goes. I’ve never come across the kind of stuff I see all the time with Node and Go and Python: package maintainers changing parameters and such because they weren’t happy with the original API.


Well look at it this way: If we all switched to speaking German, would that rid the world of bad ideas?

[Granted there's some evidence that language does affect thought patterns both for good and for ill, but overall I'm going to say my answer is...] No. Someone expressing a bad idea in German is still an idiot, except he's more of a Dummkopf.

So I think the problem with this piece is that the title and part of the argument seem to assert that versioning doesn't guarantee anything about package quality or "breaking-ness" (which is true). But then it goes on to assert that this other versioning system would help change that. I doubt it. Naming and versioning are just descriptors of a thing, and may not accurately describe the thing. In other words, a turd by any other name still smells like a turd.

The package itself is the thing. The only way to know anything about it is to test it with your own stuff before you migrate to it. Therefore always pin the last version that works, and when you become aware of a newer version, test with it, read the changelog, make a decision. Treat it like your own software, in other words. Because you're making it part of your own software - the software you'll be held responsible for.


It's interesting that an almost identical copy of his proposals have already been implemented in Golang, where they are 1) widely reviled 2) seen to be a failure and 3) are in the process of being abandoned.

If the author is serious about the proposals, he should really do some work to figure out why the only people to have actually tried them hate them so much, and not just relegate it to an offhand comment in a footnote.


We should give up on reproducible builds because version numbers do not give enough information? What?

Version numbers are what you expose to the real world, something which semantic versioning tries to standardize. I shouldn't have to care if you use git, hg or copy-paste-versioning internally.


Reproducible builds are great! I was talking about the flow for upgrading at development time, not deployment. You have your versions fixed in production but want to periodically upgrade your dependencies to pick up security fixes and bugfixes. Today that process is far from smooth. OP was an attempt at thinking through how one might design an eco-system to improve things.

Pinning versions in Gemfile.lock is totally fine. Pinning versions in Gemfile is a bad idea.


No mention of Java, but I feel like this works reasonably well: https://github.com/nebula-plugins/gradle-dependency-lock-plu...

You specify what version constraint you want for each dependency (latest.release, major.+, major.minor.+, a specific version number...). The tool does the work of resolving those constraints to specific artifact references, which you commit to your repo in the form of a dependency.lock file that gives you reproducible builds. You can then resolve new artifact versions with `gradle generateLock`, or revert it to a previous commit if there is a breakage.
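A simplified model of that flow: resolve each constraint to the newest matching version and commit the result. The constraint forms are the ones listed above. The matching logic is a hypothetical sketch, not the Nebula plugin's implementation, and it ignores prereleases and non-numeric versions.

```python
def matches(constraint: str, version: str) -> bool:
    """Interpret 'latest.release', 'N.+', 'N.M.+' or an exact version (simplified)."""
    if constraint == "latest.release":
        return True
    if constraint.endswith(".+"):
        prefix = constraint[:-1]  # "2.+" -> "2.", "2.4.+" -> "2.4."
        return version.startswith(prefix)
    return version == constraint


def generate_lock(constraints: dict[str, str],
                  available: dict[str, list[str]]) -> dict[str, str]:
    """Resolve every constraint to one concrete version; the result is what gets committed."""
    lock = {}
    for name, constraint in sorted(constraints.items()):
        candidates = [v for v in available[name] if matches(constraint, v)]
        if not candidates:
            raise LookupError(f"no version of {name} satisfies {constraint!r}")
        lock[name] = max(candidates, key=lambda v: tuple(int(p) for p in v.split(".")))
    return lock
```

Running `generate_lock` again after new releases appear changes only the entries whose constraints admit them. An exact pin never moves.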

It's not clear to me what problem OP is trying to solve here.


Author here. Unfortunately, I seem to have failed utterly at communicating. Guess I'll have to try again at some point.

Perhaps my two links at the start will be better uses of your time. Particularly http://stevelosh.com/blog/2012/04/volatile-software.


I think I see your point, but this is where we disagree fundamentally:

> the fact that manually specifying version numbers to avoid running newer code is commonplace, expected, and a “best practice” horrifies me

I think version pinning is not only a "best practice", I think it is absolutely necessary. I demand reproducible builds.


I agree that reproducible builds are absolutely necessary. Nobody's proposing taking those away.

There's a distinction here between how you pick a version in production, and what happens when you do an `npm update` or `gem update`. I'm getting the sense since my last comment that in the Java world the second flow doesn't actually exist. What is your typical workflow for updating your libraries (fixed in the Gradle lockfile) to newer versions? In the Rails world you specify your dependencies in a file called Gemfile, and running `bundle install` fixes the versions chosen for them in a file called Gemfile.lock. In this context, Steve Losh is complaining about the former. Versions in Gemfile.lock are absolutely fine. It's auto-generated after all. Versions in the manually managed Gemfile are a smell.

Does this make sense? I think we have a misunderstanding rather than a fundamental disagreement. I'm actually kinda glad to hear that you had this confusion even after reading the OP (and thanks for doing so!). It makes me feel better about my own writing.


Ok this makes sense. So the way we do this with Nebula + the dependency locking plugin is that we specify a version constraint for each dependency. Typically that would be something like latest.release for internal dependencies, major.+ for important 3rd party dependencies like guava, and potentially a specific version number for less important 3rd party dependencies.

Re-generating the lock file would pull the latest versions that satisfy the constraints. That lock file is not supposed to be edited manually.


Thanks for that info! I'm very happy to be able to clarify things. I'm not a very clear writer yet :)


Thank you for following up! I'm glad we were able to get past the surface misunderstanding


The main reason for trying to jam everything into three numbers of "semver" is that otherwise you get total freeform chaos. Just look at Debian/Ubuntu; you get things like

30~pre9-5ubuntu2

1:7.6+12ubuntu2

1.1.1-1

5.1.1alpha+20110809-3

1.5.0-1~webupd8~precise

Ultimately the important things to the user are "is this version greater than this other version (implying better)", "will upgrading to this break anything" (extremely hard to answer), and "do I have to upgrade this to remain secure" (which may also break everything).
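For strings like these, dpkg's ordering rules answer the "is this version greater" question. Compare the epoch numerically, then the upstream version, then the Debian revision. Within each part, non-digit runs and numeric runs alternate. In non-digit runs `~` sorts before everything, even the end of the string, and letters sort before other punctuation. Below is a Python transcription of those rules as Debian Policy describes them. It is a sketch that skips input validation, not dpkg's code.

```python
def _is_digit(ch: str) -> bool:
    return "0" <= ch <= "9"


def _is_letter(ch: str) -> bool:
    return "a" <= ch <= "z" or "A" <= ch <= "Z"


def _order(ch: str) -> int:
    """Rank of one character in a non-digit run ("" means end of string)."""
    if ch == "" or _is_digit(ch):
        return 0
    if _is_letter(ch):
        return ord(ch)
    if ch == "~":
        return -1  # '~' sorts before everything, even the end of the string
    return ord(ch) + 256  # other punctuation sorts after all letters


def _verrevcmp(a: str, b: str) -> int:
    """Compare upstream versions or revisions by alternating non-digit and digit runs."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Non-digit run, compared character by character.
        while (i < len(a) and not _is_digit(a[i])) or (
            j < len(b) and not _is_digit(b[j])
        ):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Digit run, compared numerically (an empty run counts as 0).
        start_i, start_j = i, j
        while i < len(a) and _is_digit(a[i]):
            i += 1
        while j < len(b) and _is_digit(b[j]):
            j += 1
        diff = int(a[start_i:i] or "0") - int(b[start_j:j] or "0")
        if diff:
            return diff
    return 0


def _split(version: str):
    """[epoch:]upstream[-revision]; epoch defaults to 0, revision to ""."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_debian_versions(a: str, b: str) -> int:
    """Negative, zero or positive, like a C comparator."""
    epoch_a, up_a, rev_a = _split(a)
    epoch_b, up_b, rev_b = _split(b)
    if epoch_a != epoch_b:
        return epoch_a - epoch_b
    return _verrevcmp(up_a, up_b) or _verrevcmp(rev_a, rev_b)
```

This is why a PPA build like `1.5.0-1~webupd8~precise` ranks just below the archive's `1.5.0-1`.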


There is one key part of this article which I absolutely support wholeheartedly (the rest of the article is kind of moot if we understand that version management is non trivial)

When a major change is made to something, as in a breaking change, where you as the developer are making the conscious decision that you will cause other people's shit to break if they keep pulling "latest" I absolutely think that the major version number should just be part of the naming scheme, not the versioning scheme.

We've actually implemented a similar system where I'm currently working. For versioning APIs, if you want to bump a major version then we force developers to start working with a whole new git repository, a whole new pipeline etc. If we have to patch some defect fix back to the old version, we can just cherry pick across forks, but most of the time we want to manage the life cycle of two bits of development that no longer do the same thing as being exactly that: bits of developed code that do different things


> When a major change is made to something, as in a breaking change, where you as the developer are making the conscious decision that you will cause other people's shit to break

And what if that decision is not conscious (breaking changes due to minor changes happen all the time), or what if the breakage is conscious but necessary to fix something considered a bug? What then?


The article does a good job of highlighting some problems of versioning but I don't think the solution it proposes is any better.

The format of Product A Version B.c seems to be most actionable as long as it's consistent.

A could be a major shift in the product. So something like Windows 98, Windows 2000, iPhone 3 and 4, Android Gingerbread and ICS.

Version B would update whenever it's unstable and possibly breaking. So a version 2.0 or 3.0 would introduce lots of new things, but new things tend to be unstable, even after extensive testing.

Version .c is more stable as the number increases. So a version 4.88 is more stable than version 4.81

I don't see how more dots (e.g. version 4.81.2 instead of 4.81c) are actionable.

As the article brings up, the problem is when the defaults point to a new, shiny, unstable version instead of a stable older one. So the default should be the highest c number within the B one below the latest. If the latest build is version 7.2, the stable build might be 6.214.
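A small sketch of that default. Each release is represented as a hypothetical `(B, c)` pair of integers, so 6.214 reads as build 214 of line 6 rather than as a decimal. That way 4.88 still ranks above 4.81.

```python
def default_version(releases: list) -> tuple:
    """Pick the highest c from the B line just below the newest one.

    `releases` holds (B, c) integer pairs. With only one line, its newest build is used.
    """
    lines = sorted({b for b, _ in releases})
    if len(lines) < 2:
        return max(releases)
    stable_line = lines[-2]
    return max(release for release in releases if release[0] == stable_line)
```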


The article makes good points, but they feel impractical and/or risky to follow.

> At this point you could even get rid of the version altogether and just use the commit hash

Right. Until someone brazenly force pushes to a repository, and makes it hell for your package manager. You would now have to manually hunt your dependency's repository for the closest commit hash to the one that was removed. :|

> Package managers should provide no options for version pinning. A package manager that followed such a proposal would foster an eco-system with greater care for introducing incompatibility.

We all want to live in this ideal world. But this requires developers to trust the authors of upstream dependencies. Sometimes this is hard to do, especially for critical projects. I just want my shit to continue working, and not be subject to the whims or genuine human errors of upstream authors.


I fail to see any actual advantage in the approach proposed by this article; it would have the exact same problems, or very similar ones, since, versioning or not, everything ultimately comes down to trusting the package's authors. Nobody can guarantee that they won't introduce breaking changes, especially by accident. Versions offer something that, at least, resembles a deterministic way to know what code one's actually shipping.

On another note, I always understood that using the latest version by default was sensible. For me, it means that, in a new project, one should start with the latest version of whatever third-party packages one wants to use. Once development actually gets underway, then the latest versions at that time should be pinned.


What a terrible idea.

If my package is named "LeftPad-17", what's to stop someone else from creating a package named "LeftPad-18", which innocent folk may assume is the latest version of LeftPad.

A project name is semantically different from the version number of that code.


"Rich Hickey pointed out last year that the convention of bumping the major version of a library to indicate incompatibility conveys no more actionable information than just changing the name of the library."

Except that it's not. Sure, as far as halfway-guaranteed compatibility goes, it's true, but the name identifies the project:

* Roughly speaking, the purpose of the software doesn't change with major versions.

* Also, roughly speaking, the people working on the software don't change with major versions.

Both of which are, arguably, more important than raw compatibility.

"I recently encountered this post from 2012 by Steve Losh, pointing out that if version numbers were any good, we'd almost always be looking to use the latest version number."

No, we wouldn't. When I build a piece of software, I build it with a particular version of its dependencies. Then, I test it with those versions. After I've tested it, I absolutely do not need changes in the underlying software changing how things work. Compatibility is never perfect.

"In particular, Semantic Versioning is misguided, an attempt to fix something that is broken beyond repair. The correct way to practice semantic versioning is without any version strings at all, just Rich Hickey's directive: if you change behavior, rename."

Semantic versioning includes a major number, a minor number, and a micro number; they have different semantics. Mapping that scheme to Java, a major number change means no compatibility is guaranteed. But minor and micro changes do make partial compatibility guarantees. Specifically, if you add a method specification to an interface, everything that implements that interface has to change to match it, but methods that accept that interface as a parameter do not need to change.

If user software implements an interface, changing that interface requires a change to at least the minor number. If user software uses, but does not implement, an interface, changing that interface may require only a micro number change.
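The same asymmetry shows up in Python if an abstract base class stands in for a Java interface. This is an analogy only, since Java's rules differ in detail (e.g. default methods). Adding an abstract method breaks every existing implementer, while code that only calls the interface keeps working. `Shape`, `ShapeV2`, `total_area` and `make_square` are hypothetical names.

```python
from abc import ABC, abstractmethod


class Shape(ABC):  # "version 1" of the interface
    @abstractmethod
    def area(self) -> float: ...


class ShapeV2(ABC):  # the same interface after one method is added
    @abstractmethod
    def area(self) -> float: ...

    @abstractmethod
    def perimeter(self) -> float: ...


def total_area(shapes) -> float:
    # A consumer that only *uses* the interface: unaffected by the addition.
    return sum(s.area() for s in shapes)


def make_square(base: type, side: float):
    """Instantiate an implementer written against version 1 (area only) on top of `base`."""
    class Square(base):
        def area(self) -> float:
            return side * side
    return Square()
```

`make_square(Shape, 2.0)` works. `make_square(ShapeV2, 2.0)` raises `TypeError` because `perimeter` is still abstract.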


[naive wish mode]

I'm really fond of how Google does versioning in their internal codebase (from the public information about it that I know). They have a monorepo containing a single version of every library. When updating something, you're supposed to take into account all the places it's being used, and avoid breakages. Tests and tooling play an important role in making it possible.

I really want something like that for the open-source world, instead of having to maintain multiple versions and deal with the complexity of interactions between all the permutations of different versions.

[/naive wish mode]


At least in the Linux world, this is part of the value that Linux distributions can offer. Distributions like Fedora and openSUSE tend to lead the pack here by actively testing updates before they're merged into the distribution repositories, and developing tooling for supporting continuous verification of everything working together.

Cf. Koji+Bodhi for Fedora, Open Build Service for openSUSE, and the OpenQA instances for both distributions.


I once wrote a fairly extensive post on REST API versioning techniques with similar points (lost my backup so wayback machine):

http://web.archive.org/web/20160407080723/http://stucharlton...

In short, we really seem to have forgotten the power of backward and forward compatibility because we assume a very short lifespan for our services (months, not decades).


No version pinning means no repeatable builds. A build from a month ago when ran today should use exactly the same version of any libraries it references, not something that was published in the meantime.


The whole situation with language package managers, like those for Ruby and Node, makes them completely inappropriate for end users and deployment.

Most apps merrily pull in hundreds of dependencies. Some deps need a complete build environment, others may require specific versions that quickly lead to fragility, dependency hell and thousands of wasted hours.

These kinds of package managers only make sense, and see maximum use, in SaaS-type apps or dev-centric environments, which perhaps explains why some devs do not realise the issue. But this just doesn't make sense for deployment.


It's a time problem. Container packaging eliminates the "works on my machine" problem, which is a space problem (environment pinning). It can be used to eliminate the time problem (version pinning).

SemVer is like a language (e.g., English): the more people speak the common language, the easier it is to communicate. But that doesn't prevent others who might speak EmojiVer from communicating. It certainly won't stop people from using packages from both worlds.


> RubyGems defaults to newest version available. So if you don't specify a version for a dependency and they create a breaking v3.0

No it doesn't. RubyGems allows you to say "I want to depend on the latest 2.x", and if your dependencies use semantic versioning it works pretty well.

You still want to pin your dependencies versions for deploying the versions you tested, but you should be able to safely upgrade using that scheme unless your dependencies screw up.
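That "latest 2.x" requirement is written `~> 2.0` (the twiddle-waka or pessimistic operator). Below is a hypothetical Python model of the rule, assuming plain numeric versions without prerelease segments. Drop the last segment of the requirement, increment the new last one, and use that as the exclusive upper bound.

```python
def _segments(text: str) -> list:
    return [int(part) for part in text.split(".")]


def _key(segments: list) -> tuple:
    """Drop trailing zeros so that 2.1 and 2.1.0 compare as equal."""
    trimmed = list(segments)
    while len(trimmed) > 1 and trimmed[-1] == 0:
        trimmed.pop()
    return tuple(trimmed)


def pessimistic_range(requirement: str) -> tuple:
    """Bounds for '~> requirement' as comparable keys: [requirement, bumped)."""
    low = _segments(requirement)
    high = list(low)
    if len(high) > 1:
        high.pop()  # drop the last segment...
    high[-1] += 1  # ...then increment what is now the last one
    return _key(low), _key(high)


def satisfies_pessimistic(version: str, requirement: str) -> bool:
    low, high = pessimistic_range(requirement)
    return low <= _key(_segments(version)) < high
```

With this rule `~> 2.0` admits 2.9.1 but not 3.0. `~> 2.1.3` admits 2.1.9 but not 2.2.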


I've examined the same issue between reproducible builds vs semantic versioning while relating it to Nix. This post also shows how to use Git submodules to achieve something similar. https://matrix.ai/2016/04/04/content-addressed-dependencies-...


Note that the Dhall language has an interesting take on dependencies, where each module can be content-addressed code supplied via IPFS: https://github.com/dhall-lang/dhall-lang


It's true that using version ranges is incredibly dangerous. My take on it: https://www.lucidchart.com/techblog/2017/03/15/package-manag...

I don't love lockfiles, but you need some way to get deterministic dependency resolution.


I just use major+sha. A major change means you are breaking the API. The sha identifies the code that built it (not the git sha, but a hash tree of the actual source).
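Here is one way to get a content hash of the source rather than a VCS commit id. Per-file SHA-256 digests, keyed by relative path, are combined into one root digest. That makes a flat, two-level tree rather than a full Merkle tree. The `major+sha` label then uses the first 12 hex characters of the root. The function names and the 12-character truncation are illustrative choices.

```python
import hashlib
from pathlib import Path


def tree_hash(files: dict[str, bytes]) -> str:
    """Root digest over (relative path, content digest) pairs; order-independent."""
    leaves = sorted(
        f"{name}\0{hashlib.sha256(data).hexdigest()}\n" for name, data in files.items()
    )
    return hashlib.sha256("".join(leaves).encode()).hexdigest()


def source_tree_hash(root: Path) -> str:
    """Hash every regular file under `root`, independent of any VCS history."""
    files = {
        p.relative_to(root).as_posix(): p.read_bytes()
        for p in root.rglob("*")
        if p.is_file()
    }
    return tree_hash(files)


def build_id(major: int, digest: str) -> str:
    # Human-readable major version plus a short content identity.
    return f"{major}+{digest[:12]}"
```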

I disagree with the “change the name” business. Sure, if your math routines turn into string routines, you probably need a name change. But if you are sticking to generally the same problem domain, but clients can expect breakage, then keep the name. The name is important.


Versioning is a mess of a problem area where the majority of the difficult problems that exist today are dealing with previous solutions.

This is not to say there have been bad ideas. Nor that there have been nothing but bad solutions. Indeed, I'd wager most solutions were perfectly fine, in isolation. They typically don't play well with other solutions, though. :(


I have to agree that semantic versioning is definitely more of an art than a science. I wrote an article about it a while ago https://hackernoon.com/its-not-a-bug-it-s-a-feature-the-prob...


> All software comes with a version

Nitpicking, but this is not true. Lots of software is never formally released, and therefore doesn't have any version. Sometimes such software is useful enough to make its way into a software distribution, in which case the distro maintainer has to invent a version number for it. Ugh.


I'm not totally sold on the heavy BDSM type systems, but one of their clear advantages comes from trivially handling this problem. Elm's package manager simply won't let you publish an incompatible update without bumping the version, all handled by type check.
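You can approximate the rule Elm's package tooling enforces without a type checker by diffing exported signatures. Anything removed or changed needs a major bump, pure additions need a minor bump, and an unchanged API gets a patch. This hypothetical sketch compares signatures as plain strings, so it treats any signature change as breaking. Elm's real check works on the types themselves.

```python
def required_bump(old_api: dict[str, str], new_api: dict[str, str]) -> str:
    """Classify a release by comparing exported names and their signatures."""
    removed = old_api.keys() - new_api.keys()
    changed = {
        name for name in old_api.keys() & new_api.keys() if old_api[name] != new_api[name]
    }
    if removed or changed:
        return "major"
    if new_api.keys() - old_api.keys():
        return "minor"
    return "patch"


def next_version(current: tuple, bump: str) -> tuple:
    major, minor, patch = current
    if bump == "major":
        return (major + 1, 0, 0)
    if bump == "minor":
        return (major, minor + 1, 0)
    return (major, minor, patch + 1)
```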


People using semantic versioning when it's totally unnecessary is a peeve of mine. Maybe if you're releasing an API or something it makes sense (I dunno, I don't do that). I make games and just number my releases/versions 1, 2, 3, 4, 5, 6, etc.


You can also do this with semantic versioning; I have indeed seen people who, when forced to use semver, just do 1.0.0, 2.0.0, 3.0.0, 4.0.0... It's completely valid, because you, as the package author, are given ultimate freedom by semver in determining when major version bumps are appropriate.


Semver is marketing, enumerating builds is engineering.

Sure, semver should be associated with (pinned to?) build numbers. Invariant. Immutable. Never changing.

Semver is the public face. But internally, I only care what build number someone is talking about.
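One way to make that invariant concrete is a registry that refuses to re-point a published version at a different build. A minimal sketch; `ReleaseRegistry` and its methods are hypothetical names, not an existing tool.

```python
class ReleaseRegistry:
    """Hypothetical registry: each public version maps to exactly one build, forever."""

    def __init__(self) -> None:
        self._builds: dict[str, int] = {}

    def publish(self, version: str, build: int) -> None:
        existing = self._builds.get(version)
        if existing is not None and existing != build:
            # Re-pointing a public version at a different build would break the invariant.
            raise ValueError(f"{version} is already pinned to build {existing}")
        self._builds[version] = build  # republishing the same pair is a no-op

    def build_for(self, version: str) -> int:
        """Translate the public (marketing) version into the engineering build number."""
        return self._builds[version]


registry = ReleaseRegistry()
registry.publish("1.4.0", 812)
print(registry.build_for("1.4.0"))  # 812
```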


Why is parsing "1.23.4" more difficult than "(1, 23, 4)"? :)
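Parsing is easy either way; the real difference shows up in comparison. A minimal sketch, assuming plain numeric dotted versions with no pre-release tags, of why comparing version strings as text goes wrong:

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Parse a plain dotted version such as "1.23.4" into (1, 23, 4)."""
    return tuple(int(part) for part in text.split("."))


# As strings, "1.10.0" sorts before "1.9.0" because '1' < '9' character by character.
print("1.10.0" < "1.9.0")                                # True (wrong for versions)
print(parse_version("1.10.0") < parse_version("1.9.0"))  # False (numeric comparison)
```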


OP brought up Rails, NPM, and Go but not Rust.

The Rust ecosystem might be a demonstration of the inverse problem. Pinning versions is trivial in Rust because it's the default. You will never be pushed over a major version boundary without intentionally doing it (and "always get the latest" is just the "*" version).
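For reference, Cargo reads a bare requirement such as `1.2.3` as the caret requirement `^1.2.3`. The sketch below models only the full three-part form (no partial requirements like `^1.2`, no pre-releases), and shows why 0.x crates behave differently: below 1.0, the first nonzero component is treated as the breaking one. `caret_matches` is a hypothetical helper, not Cargo's API.

```python
def caret_upper_bound(req: tuple[int, int, int]) -> tuple[int, int, int]:
    """Exclusive upper bound of a full caret requirement ^major.minor.patch."""
    major, minor, patch = req
    if major > 0:
        return (major + 1, 0, 0)  # ^1.2.3 -> <2.0.0
    if minor > 0:
        return (0, minor + 1, 0)  # ^0.2.3 -> <0.3.0: each 0.x minor counts as breaking
    return (0, 0, patch + 1)      # ^0.0.3 -> <0.0.4


def caret_matches(req: tuple[int, int, int], candidate: tuple[int, int, int]) -> bool:
    """True if `candidate` satisfies ^req (tuples compare component by component)."""
    return req <= candidate < caret_upper_bound(req)


print(caret_matches((1, 2, 3), (1, 9, 0)))  # True
print(caret_matches((1, 2, 3), (2, 0, 0)))  # False
print(caret_matches((0, 2, 3), (0, 3, 0)))  # False
```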

The problem then becomes that crates never reach 1.0. Flask is probably the oldest piece of software I use that similarly never reached 1.0 - albeit it's been pretty close ever since the project was reorganized last year - but the entire Rust ecosystem is buried in Flasks. Software that is 99% of the way to what the creator wants 1.0 to be, but the last 1% is something nobody wants to do. When you are 99% of the way there, the software works in 99% of use cases, and that last 1% is... the last 1% anyone wanted to do. Which means they didn't want to do it.

Probably the most blatant example of this in Rust is regex, a blessed crate that is still only at 0.2. There are many others: just from my hobby Rocket project, base64, chrono (and time), cookie, dotenv, envy, error-chain, lazy_static, num-traits, rand, redis, and uuid are all still at 0.x. That being said, the impl period going on right now, and this entire year of direction from the core team, were explicitly aimed at bringing as many of these fundamentals as possible to 1.0, so we will see how successful the effort is.

Probably the greatest problem with semantic versioning in that context is feature creep. It is scary to go to 1.0 - you feel free in 0.x land, where you can make major breaking changes in a .x release, whereas later you need to increment that scary major version number to do it. And besides the fear, there is a lot of logistical headache in properly maintaining semantically versioned software - if you ever release a 2.0, you should expect someone to try to contribute 1.x bugfixes down the road, with the need to release 1.x.y bugfixes for all time. Because 2.0 is, like the article says, a new library - it's a different API, you changed the meaning. So you are now maintaining two libraries.

There is also one final fear with the way Rust set up its crates ecosystem - if you are bold and break things but don't end up delaying 1.0 for way too long, you might end up incrementing the major version a bit. And there is a cultural and subconscious aversion to anything on crates.io you see at version 3.0 or, heaven forbid, 4.0 or more. That software is unreliable, the developer is changing it all the time! But then you go and use a 0.15 crate that is having the same problem anyway, just without saying "this probably does its job" like a 1.0 can.

In the end, versioning truly is almost meaningless: even in an enforced semantic versioning system, the intent breaks down and meaning is lost just because different people release software differently. But that is a real deep almost - because it's still more information than not having it, and in Rust right now at least it gives more helpful information than not. I'd call it a success at that point - way more than, say, Linux, where the major version is incremented whenever Linus wants to run a Google+ poll...


As far as I understand, the situation with 0.x.y libs on crates.io is working as intended; a major version of 0 denotes libraries whose authors do not yet want to commit to a stable interface (and may never want to), which is crucial information for any potential users of that library. The fact that there are many such libraries can be attributed to the fact that having a stable interface was impossible before Rust stabilized in May 2015, and so most libraries are exceedingly young (even disregarding those packages which see one aspirational release and are then forgotten, which make up the bulk of every package repository for every language regardless of versioning scheme :P ). If the situation were to never improve that would indeed be concerning, but the Rust devs acknowledge the concern and have been striving towards 1.0 for all of their "blessed" libs (e.g. regex, which you mention, is having its 1.0 release imminently) in order to help encourage people who might be on the edge to take the plunge.

> And there is a cultural and subconscious aversion to anything on crates.io you see at version 3.0 or heavens forbid 4.0 or more. That software is unreliable, the developer is changing it all the time!

I've never seen anyone at all express this attitude, in fact I see the opposite: if you're at any version past 0, you get heaps of praise for being willing to do the work it takes to commit to having stable releases.


Author here. This is a really useful comment, thanks. You don't appreciate that there's no silver bullet until you've tried to cast a few.


> project is liable to go boom at some future date

I've seen so many people come and go at my last job and at other companies, I don't think anybody cares.


These proposals feel like shuffling the deck chairs around on the Titanic. I get the goal of attempting to increase accountability via tweaking expectations of package managers, but that ignores the deeper problems limiting software quality that cause this whole situation in the first place. Consider this quote:

Packages that wantonly broke their consumers would "gain a reputation" and get out-competed by packages that didn't [...]

This quote illustrates a host of fallacious assumptions in the article:

1. That there's a large enough competitive field for any given package that one that publishes a "bad" release can be "punished" by being overtaken by competitor(s). This is weird on so many levels. The cost of switching to a competing library can often be quite high, higher than just fixing the current one. Relatedly, there's often only one viable library choice in a given domain, which implies both zero competitors and a high cost of using an alternative (a whole-cloth rewrite).

2. QA is expensive: time to maintain automated tests, time to run any needed ad-hoc testing, time and experience to develop the library in a really robust way. But this proposal seeks to put in a competitive incentive that would fragment efforts to solve the same problem. So there's this assumption that the publisher is only ever just incompetent, not starved for resources to implement great software quality. A major success of open source packages is that they enable consolidation of effort (i.e. a community of contributors).

3. Sometimes a library is used in contexts that weren't foreseen by its authors, so a change that is "breaking" for some consumer is introduced by accident. That breakage may even predate the consumer's adoption of the package. This runs up against the article's implication that breakage is only ever introduced, rather than already existing. Packages are never in some pristine state of function and concept. Except for some kinds of simple packages, they're often ongoing, living exercises in understanding the real-world problem domain and applying software techniques to the solution.

Overall, I think saying that modern package managers "default" to unpinned is a straw man: no sane package manager is intended to be used this way except for a few minutes after setting up a new project. "bundle install" that first time and now you've got a Gemfile.lock. The next time, you get the same packages as the prior install. Likewise, "bundle update" is an operation that should be dirtying your source tree, requiring the usual passes of building, testing, and other software quality processes before it lands. AFAICT, nothing in this proposal would ever change these steps: initial package adoption, and software change evaluation. If there are problems with upstream package quality, those are rooted in the deeper issue of "why software quality is hard" rather than being the fault of current package managers. It's difficult to see how our package managers could influence this real-world social problem at all.


It would be interesting to rethink versioning, but this proposal is not the answer.


The author seems to not understand how people work; we all make mistakes.


> Rich Hickey pointed out last year that the convention of bumping the major version of a library to indicate incompatibility conveys no more actionable information than just changing the name of the library.

Unless, you know, you care about the purpose or authorship of the library.

I don't think pinning dependencies at the package level is a bad idea - Ubuntu does this for a lot of packages, but it relies on properly followed semantic versioning to work.


and that is exactly how Perl 5 versions work nowadays... "5.22.1" is "perl 5 version 22, patch release 1"


> Let's just make them a tuple. Instead of "3.0.2", we'll say "(3, 0, 2)".

And how do you encode development versions? Alphas/betas/RCs? Versions are structurally more complex than a simple tuple, and they're strings b/c that's a convenient and easy representation.
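The SemVer 2.0.0 spec illustrates the point: pre-release tags have their own precedence rules, so a version does not fit a plain numeric tuple. A minimal sort-key sketch of those rules, assuming well-formed input (it does not validate); `semver_key` is a hypothetical name.

```python
def semver_key(version: str) -> tuple:
    """Sort key following SemVer 2.0.0 precedence (assumes well-formed input)."""
    core, _, _build = version.partition("+")  # build metadata does not affect precedence
    release, has_pre, pre = core.partition("-")
    major, minor, patch = (int(x) for x in release.split("."))
    if not has_pre:
        # A normal release outranks every pre-release of the same version.
        return (major, minor, patch, 1, ())
    # Numeric identifiers compare numerically and sort before alphanumeric ones;
    # alphanumeric identifiers compare in ASCII order. With an equal prefix,
    # the longer list of identifiers wins (plain tuple comparison does this).
    ids = tuple((0, int(p), "") if p.isdigit() else (1, 0, p) for p in pre.split("."))
    return (major, minor, patch, 0, ids)


versions = ["1.0.0", "1.0.0-rc.1", "1.0.0-beta.11", "1.0.0-beta.2",
            "1.0.0-beta", "1.0.0-alpha.beta", "1.0.0-alpha.1", "1.0.0-alpha"]
# alpha < alpha.1 < alpha.beta < beta < beta.2 < beta.11 < rc.1 < 1.0.0
print(sorted(versions, key=semver_key))
```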

And you can't just tell people to not have alphas, etc. Sometimes I need to test stuff on branches, so it can't have a normal, mainline version number; I need to signal that this is a proposed, test version, etc. Ignoring the human consequences here defeats the point of having a version number.

> package managers uniformly fail to provide the sane default of "give me the latest compatible version, excluding breaking changes."

Cargo? NPM? (And yes, I realize the author lists this one!) As best as I can tell, the author's complaint seems to be that you have to actually notate enough information for the package manager to determine "latest compatible version, excluding breaking changes", which both Cargo and NPM are very capable of. (E.g., in Cargo, you'd need to say something like ^1.2.) But being able to pull down the absolute latest is useful in a new project; I'll often use "dep" to start, get comfortable, and then restrict that to dep = "^3.2" or whatever.

> Since we always want to provide the latest version by default, the distinction between minor versions and patch levels is moot.

The distinction is a human, social level communication construct.

> At this point you could even get rid of the version altogether and just use the commit hash

And is 9583af87a8b newer or older than ab8a61fe9ac?
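That question can only be answered by walking the commit graph, not by looking at the hashes. A minimal sketch over an invented history (`parents` maps each commit to its parents), in the spirit of `git merge-base --is-ancestor`:

```python
def is_ancestor(parents: dict[str, list[str]], older: str, newer: str) -> bool:
    """True if `older` is reachable from `newer` via parent links (a commit counts as its own ancestor)."""
    stack, seen = [newer], set()
    while stack:
        commit = stack.pop()
        if commit == older:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return False


# Invented history: ab8a61fe9ac <- 3c01d2e7f40 <- 9583af87a8b
parents = {"9583af87a8b": ["3c01d2e7f40"], "3c01d2e7f40": ["ab8a61fe9ac"], "ab8a61fe9ac": []}
print(is_ancestor(parents, "ab8a61fe9ac", "9583af87a8b"))  # True: ab8a... is the older one
print("9583af87a8b" < "ab8a61fe9ac")                       # True, yet tells you nothing
```

On two unrelated branches neither commit is an ancestor of the other, so there is no "newer" at all.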

> And that's it. Package managers should provide no options for version pinning.

I can't tell if this article is advocating that a package's name always refers to a single, unchanging version of the software (which effectively removes any notion of version handling from the package manager entirely) or if we're just putting the major in the name, and always running the latest version for that major. This latter stance seems to imply that the version does carry useful information. But not being able to pin is nuts: if some breakage is introduced, you should just screw everyone? How would reproducible builds work?

> In particular, Semantic Versioning is misguided

The article doesn't convince me that it understands the reasons behind semver: communication between humans as to what types of changes a potential upgrade contains. The tooling supports this so that I can pull in changes that shouldn't break the build, test them to see if that actually is correct, and then deploy them automatically, with minimal risk. Between security fixes and bug patches, I have a need to know what types of updates I'm looking at, and to ensure that the updates that aren't breaking changes get applied diligently — and that's why we have the notion of semver: so that we can all communicate this the same way, and so that our tools can understand it, and act accordingly.

Now, just because semver says that 1.3 should be compatible with 1.2 doesn't necessarily mean it is. Mistakes happen. That's what tests are for, and it doesn't mean you need to blindly upgrade; this is why Cargo has separate notions of the versions the project should be compatible with, such as ^1.2 recorded in a Cargo.toml, and the exact versions the project is actually using, which are recorded in the lock file. That way the project can be rebuilt exactly as a previous build was, but can also be trivially upgraded where and when possible.
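A minimal sketch of that split between the manifest requirement and the lock file, assuming requirements with a major version of at least 1 and ignoring everything else a real resolver handles (transitive dependencies, pre-releases, features). The function names are hypothetical, not Cargo's API.

```python
def compatible(req: tuple[int, int, int], candidate: tuple[int, int, int]) -> bool:
    # Caret rule for major >= 1 only: same major version, not older than req.
    return candidate[0] == req[0] and candidate >= req


def resolve(req: tuple[int, int, int], available: list) -> tuple[int, int, int]:
    """Pick the newest available version compatible with the manifest requirement."""
    matches = [v for v in available if compatible(req, v)]
    if not matches:
        raise LookupError(f"no available version is compatible with {req}")
    return max(matches)


def install(req: tuple[int, int, int], available: list, lock=None) -> tuple[int, int, int]:
    """Reuse the locked version while it still satisfies the manifest (reproducible builds)."""
    if lock is not None and compatible(req, lock):
        return lock
    return resolve(req, available)


def update(req: tuple[int, int, int], available: list) -> tuple[int, int, int]:
    """Deliberately re-resolve, ignoring the lock; the result should go through tests."""
    return resolve(req, available)


available = [(1, 2, 0), (1, 3, 1), (1, 4, 0), (2, 0, 0)]
locked = install((1, 2, 0), available)             # first build resolves (1, 4, 0)
available.append((1, 5, 0))                        # upstream publishes a new release
print(install((1, 2, 0), available, lock=locked))  # (1, 4, 0): the lock wins
print(update((1, 2, 0), available))                # (1, 5, 0): explicit upgrade
```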

The article doesn't acknowledge the very real problems that semver tries to solve, and doesn't tell us how its proposal (whatever that is) would solve these issues. Instead, we get "if you change behavior, rename"; the very real outcome of this proposal is that either people would never apply another security patch again, and that dependency would rot, or they'd start reinventing the wheel named semver.


None of this is a good idea.


Pointless article.

What we need is good, language-independent tooling to automatically select which versions to use, estimate how risky an update is going to be, and show what problems it fixes.

Tracking and correlating successful and failed updates centrally - call it "distributed CI".
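A minimal sketch of the central bookkeeping such a service might do, assuming clients report whether their tests passed after an upgrade. `UpgradeReports` is a hypothetical name, and the smoothed failure rate is just one simple choice of risk score.

```python
from collections import defaultdict


class UpgradeReports:
    """Hypothetical central store of upgrade outcomes reported by many projects' CI runs."""

    def __init__(self) -> None:
        # (package, from_version, to_version) -> [failures, total]
        self._counts: defaultdict = defaultdict(lambda: [0, 0])

    def record(self, package: str, old: str, new: str, passed: bool) -> None:
        counts = self._counts[(package, old, new)]
        counts[1] += 1
        if not passed:
            counts[0] += 1

    def risk(self, package: str, old: str, new: str) -> float:
        """Estimated failure probability, Laplace-smoothed so an unseen upgrade scores 0.5."""
        failures, total = self._counts.get((package, old, new), (0, 0))
        return (failures + 1) / (total + 2)


reports = UpgradeReports()
for _ in range(8):
    reports.record("leftpad", "1.0.0", "1.1.0", passed=True)
print(reports.risk("leftpad", "1.0.0", "1.1.0"))  # 0.1 after 8 clean reports
print(reports.risk("leftpad", "1.1.0", "2.0.0"))  # 0.5: no data yet
```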

Tentatively upgrading dependencies, running tests and rolling back on error.
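That loop can be sketched as follows, assuming a hypothetical `run_tests` callback that rebuilds the project against a candidate set of pinned versions and reports success. Each upgrade is tried on top of the ones already accepted, so a bad release is rolled back without blocking the others.

```python
from typing import Callable


def try_upgrades(lock: dict[str, str],
                 upgrades: list[tuple[str, str]],
                 run_tests: Callable[[dict[str, str]], bool]):
    """Try each upgrade on top of the current pins; keep it only if the tests pass."""
    current = dict(lock)
    rejected = []
    for package, version in upgrades:
        candidate = {**current, package: version}
        if run_tests(candidate):
            current = candidate                  # accept: tests still green
        else:
            rejected.append((package, version))  # roll back: `current` is unchanged
    return current, rejected


def fake_tests(pins: dict[str, str]) -> bool:
    # Stand-in for a real build-and-test run: pretend parser 3.0.0 breaks us.
    return pins.get("parser") != "3.0.0"


pins, rejected = try_upgrades({"http": "1.4.0", "parser": "2.1.0"},
                              [("http", "1.4.2"), ("parser", "3.0.0")], fake_tests)
print(pins)      # {'http': '1.4.2', 'parser': '2.1.0'}
print(rejected)  # [('parser', '3.0.0')]
```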

Don't forget that even minor, bugfix releases introduce bugs or break code by actually fixing bugs that people were inadvertently relying upon.

Lack of automated tools only encourages the "vendorize, ship and forget" model that is so popular.


The NixOS community is building towards this with hydra, ipfs, nixpkgs.



