I had an idea that I've been sitting on for a while and it looks like I'll probably never get around to it so I thought I'd offer it up for others to think about:
We've flirted with separating interface from implementation over and over again. We also talk sometimes about the tests being the actual requirements definition for the system.
If we versioned the tests and declared which version of the tests pass with a given version of the code, we might get closer to a workable system. Most of the time a bug fix should result in new tests, not changing old tests - unless there are backward incompatible changes. This may require us to up our test writing game, but that sounds like a good carrot to me.
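A minimal sketch of how that might look (all names hypothetical): the test suite is versioned separately, each code release records which test-suite versions it passes, and an upgrade counts as backward compatible only if it still passes everything the previous release passed.

```python
# Hypothetical sketch: version the test suite separately from the code,
# and record which test-suite versions each code release passes.
PASSES = {
    "lib-1.3.0": {"tests-1.0", "tests-1.1"},
    "lib-1.4.0": {"tests-1.0", "tests-1.1", "tests-1.2"},  # bug fix added tests
    "lib-2.0.0": {"tests-2.0"},  # breaking change: new suite, old ones dropped
}

def is_safe_upgrade(old: str, new: str) -> bool:
    """An upgrade is 'safe' if the new release still passes every
    test-suite version the old release passed."""
    return PASSES[old] <= PASSES[new]
```

Under this scheme, "backward compatible" stops being a judgment call and becomes a set-inclusion check over test-suite versions.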
Semver is mostly about 'increment a number on every change, increment another number when the changes are not backward compatible (covariant?)' for some definition of compatible. But if we have tried to define 'compatible' in a rigorous way, I haven't seen it. I suspect someone in Formal Methods has something up their sleeve. I'd be curious to know if they have anything we can cherry-pick.
Check out Elm. The package manager already forces you to bump your semver when you make some kinds of changes. It's not perfect, but it gets you most of the way there and helps narrow down some of the discussion about what is a breaking change and what isn't.
> But if we have tried to define 'compatible' in a rigorous way, I haven't seen it
My understanding (and the way I've versioned my projects) is that a major version denotes a stable API and behavior you can safely rely on. What functions are exported, what arguments they take, etc. That's a pretty easy thing to not break.
Things that _aren't_ covered are implementation details such as the location of files in the package, the reuse of objects internally, or other not explicitly-defined behavior.
A couple of examples:
1. In JS, you can import from sub-folders: `const c = require('somePkg/a/b/c')`, but that depends on internal organization. That's not guaranteed across versions.
2. You can also add properties to an object. If we export a function that takes a request object and returns a response, we don't guarantee those are identical objects. We _do_ ensure that the object returned from the function has properties x,y,z, but not that anything you defined before calling the function will still be there.
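A Python analogue of example 2 (the function and key names are made up for illustration): the contract guarantees certain keys on the returned object, but says nothing about extra properties the caller attached to the input.

```python
def handle(request: dict) -> dict:
    # Contract: the returned dict has keys "x", "y", "z".
    # NOT guaranteed: that it's the same object, or that extra keys
    # the caller attached to `request` are carried over.
    return {"x": request.get("x", 0), "y": 1, "z": 2}

req = {"x": 10, "my_private_note": "hello"}
resp = handle(req)
assert {"x", "y", "z"} <= resp.keys()  # guaranteed by the contract
assert "my_private_note" not in resp   # relying on this would be off-contract
```

Code that depends on `my_private_note` surviving the call works today and breaks silently on any internal refactor, which is exactly the kind of breakage the contract disclaims.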
We would have this back-and-forth on how to version fix releases. "Well, if someone was depending on this buggy behavior, this update will break their code". Ultimately, we said "this is what we guarantee will work. For anything else, run tests anytime you upgrade".
We're not perfect (all software has bugs), but this has worked pretty well.
The problem is that just about everything breaks the API. I have had situations where obscure bug fixes break my app because I was depending on the behaviour of that bug.
The answer IMO is to just do a full test of your stuff after doing updates. I update many packages at once to make the most of my time.
Then the problem is with you. You wrote code that had a dependency on a specific non-guaranteed feature (the bug) and then feel jilted b/c they changed the non-guaranteed feature. What you should have done was written defensive code and tests around the NGF so that when it changed, your tests would catch that and either not upgrade or allow you the chance to fix it.
I don’t feel any negative feelings towards the library developers. I have my versions locked so these unexpected surprises don’t happen. In this case it would have been extremely difficult for me to have noticed that the behaviour seen was not intended and supported by the developers.
My point is not that this is an unsolvable problem, but that it’s not a good idea to go “semver says this is non breaking, let’s just chuck it in production”
I once wrote a python module that applied diff patches as a part of the deployment process. This let us 'fix' libraries we depended on without having to maintain a full fork until the maintainer fixed the bug. If a new version came down, we could test the patch and update it if needed.
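The real version applied actual diff files via the `patch` tool; here is a heavily simplified sketch of the idea, where a "patch" is just a list of exact line replacements and applying it fails loudly when upstream has changed underneath it:

```python
def apply_patch(source: str, replacements: list[tuple[str, str]]) -> str:
    """Apply a toy 'patch' given as (old_line, new_line) pairs.
    Raises if an old_line is missing -- i.e. upstream changed and
    the patch needs to be re-tested and updated."""
    lines = source.splitlines()
    for old, new in replacements:
        try:
            lines[lines.index(old)] = new
        except ValueError:
            raise RuntimeError(f"patch no longer applies: {old!r} not found")
    return "\n".join(lines)
```

The loud failure is the point: when a new upstream version lands, either the patch still applies cleanly or deployment stops and forces you to re-evaluate it.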
People have toyed with this sort of thing to verify API compatibility in Rust, because you can quantify all the sorts of changes that are considered compatible and incompatible, e.g. adding a type is compatible, changing the return type of a method is incompatible. Though in some cases it’s surprisingly difficult to decide whether a change should be considered compatible or not, as in Rust it’s possible for adding a method to a trait to break downstream code by introducing resolution conflicts—so should you allow adding methods to a trait, or should you consider that backwards-incompatible? And if you have a prelude which will conventionally be used as `use some_crate::prelude::*;`, adding an item to that can cause conflicts, so is adding public items not backwards-compatible after all? (This is one of the arguments against glob imports.)
I hope that eventually it will be possible to confirm API compatibility (however it is defined) as part of making a release. Whether that can be done on the crates.io server or not is another matter, because it could vary by platform, building could require local dependencies, &c. This isn’t full tests, of course, just API, so there’d still be scope for error.
- And there has been quite a bit of effort into preventing semver requirements from fracturing the ecosystem. This revolves around the compiler working with multiple major versions of a single library: https://github.com/dtolnay/semver-trick
I also like the idea of extending the "standard" tests with your own "custom" tests if your app happens to rely on some undocumented behavior (Hyrum's Law). Then you get a nice error message when that observable behavior (not part of the contract) breaks, instead of the app falling over.
Tests aren't the actual requirements doc once your code is used by a nontrivial number of people. People will always depend on your undocumented implementation details. Always.
Here’s the thing though. If the test suite declares the API, I could always write my own test suite that is more precise, which does cover the things I use “off label”. I may get caught a couple times, but eventually I’ll get it sorted.
Being able to do that depends on whether I can assign such things after the fact, or if the author has to do it.
The version of the API and the version of the artifact that implements it are distinct, not the least because the artifact can implement several (major) versions of the API.
Using tests across versions is definitely a trick to consider, but your coverage may vary. Probably better to do it in addition to other methods.
Yes, that’s basically what I was thinking. New test suite runs against old code, new tags are created if the tests pass.
As for code coverage, that is always a problem, and not one to sneeze at.
So follow me here. I do a lot of tech selection for teams. Not just which libraries have the features we need, but which ones are going to result in 1 a.m. pair programming due to human error.
If I see that the poorly documented (accidental?) feature I care about has poor code coverage, should I use it? Should I use it and file a PR with pinning tests?
I think in this case the interface tests will need to be as complex as the code itself, so you are back to the same quality issues, just with the test framework instead.
The Haskell Stackage people have had some interesting discussions about this on their lists. (Caring about types too, obviously.)
TL;DR, it doesn't change much, which means it doesn't create a lot of problems, but it doesn't really solve any either. Versioning is a human concept, and you can't solve it with pure tech; you'll always need guidelines and procedures.
> This is also true for ecosystems like Rust or Go, that have SemVer baked into their packaging toolchain².
> 2. […] Bumping major in Go means creating a new import path. Which is so painful that even Google is weaseling around with their own projects.
That’s not Go baking in SemVer, that’s them almost denying the existence of version numbers. (I last tried Go before go pkg was a thing, do they have version numbers in at least some shape now? Certainly they used not to.) Creating a new import path to bump major? You can do this in any ecosystem. Python, for example, has urllib, urllib2, urllib3…
So I object strenuously to this classification. Say that Rust and npm bake in SemVer, perhaps, but Go doesn’t.
The SemVer in Rust/Cargo is pretty good with some caveats:
- Minimum Supported Rust Version doesn't play into it, and that's been a topic of debate.
- Build-time dependencies don't play into it (I think).
But generally the Rust ecosystem takes code stability fairly seriously and the SemVer oriented bits have done a lot of good.
It does get hairy if you try to explicitly use older versions (e.g. early minor versions of a specific major version), as Cargo tries to pick the most recent release available. I haven't seen much of this in practice.
Baking SemVer in and having an ecosystem-wide convention to use SemVer (which I hope will in the future be strengthened by machine-checking API compatibility and defaulting to rejecting incompatible changes) gives you sane defaults. You can still specify weird dependencies if you choose to: `foo = ">= 1.2.3, < 1.4.5"`.
I definitely wish MSRV was considered in Cargo’s resolution algorithm. This is definitely the biggest issue in Rust package versioning. But if it was, that’d leave the security arguments in play, as now you could be being silently fed an old, insecure version of a package.
Go only supports semantic versions, and has created quite a bit of friction if you want to use any other sort of versioning scheme (especially if it kinda looks like SemVer, such as vYEAR.WEEK.BUILD).
Pushing it even further, Go puts a gun in your mouth and makes it so that you cannot use any other version scheme but SemVer. So you either pull the trigger and deal with the pain of SemVer, or you do what growing number in the community are doing: taking the gun from their mouth and using v0 forever.
The impression I’m getting is that Go modules have version numbers, but that the version number is basically just a sort of string where the only available option is comparing version numbers, so that there’s no resolution algorithm (e.g. “>1.16, <2”), and consequently no potential for version maintenance (which would be why major versions have to have a new URL). Semantically, then, this is no different from version numbers being just integers.
This is not quite correct. Go understands `-u=patch` vs. `-u` to refer to patch vs. minor version. It also forces you to use a different path for each major version, by parsing the major version (and conversely also ignores version tags less than the path version so you can't get a "v2" path with "v1.1.3"). So it simultaneously understands semver in the "porcelain" tools, but version resolution per se is really only bounded on the low side because any upgrade preserving the path should be safe.
It's not "minor" though, `-u` just means "update it". If that happens to be v1.1.0 to v200.3.4, it'll just make that change.
The reason it's often claimed to be minor is: modules at v2 or beyond should have a different import path than v1, so they're actually a 100% different library with no implied relation to the v1 version (there may be one, but none is implied). So v2.x -> v3 will never auto-update - that'd be crazy, they're different libraries. You can even have them both installed at the same time, at different versions, node_modules-like, just like any other two different libraries.
I say "should" because it's not enforced - if you ignore this and just tag a v200.3.4 with no v2...v200 folders or other providing modules, it'll be marked as v200.3.4+incompatible, and -u will plow through every other major semver version blindly. Because it doesn't understand semver, it just has imports and can sort strings.
I mean the `+incompatible` gets added there for a reason; it's not doing it "blindly" because it "doesn't understand" it. You might not like what the Go tooling does with its semver awareness, but it is semver-aware.
It's not incompatible with semver though, it's just incompatible with Go's "semver is folders".
Go's "semver" is full of weird corners. E.g. if you tag v1.2.3 and then someone does a `go get ...@master`, they get "version" `v1.2.4-timestamp-gitsha`. `v1.2.4` doesn't exist anywhere, Go just makes it up: https://golang.org/ref/mod#pseudo-versions . It works for their system with minimal changes, and is thus a clever hack, but it's definitely weird.
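Roughly what that pseudo-version construction looks like when the latest tag is a release `vX.Y.Z` (a hedged sketch of the format described in the linked reference, not Go's actual implementation):

```python
from datetime import datetime, timezone

def pseudo_version(base_tag: str, commit_time: datetime, commit_sha: str) -> str:
    """Sketch of a Go-style pseudo-version: take the most recent release
    tag vX.Y.Z, invent the next patch number vX.Y.(Z+1), and append a
    UTC timestamp and a 12-character commit hash prefix."""
    major, minor, patch = base_tag.lstrip("v").split(".")
    stamp = commit_time.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"v{major}.{minor}.{int(patch) + 1}-0.{stamp}-{commit_sha[:12]}"
```

So with `v1.2.3` tagged, fetching `@master` yields something like `v1.2.4-0.20210302103000-abcdef123456`: a version that sorts after the tag but was never actually released.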
"But they won’t be happy about it and it may bring some negative vibes to your tropical vacation, financed by the millions you’ve made from maintaining a FOSS project."
I love this line.
There's a tragedy-of-the-commons problem with respect to version pinning. What is good for you isn't good for the community.
If your project is closed source depending on open source modules, go for it. However, if your project is open source, you're caught in the middle between the interests of your sanity and your downstream.
If you pin too tightly, it increases the likelihood that your downstream users won't be able to construct a valid set of versions. If you pin too loose, then you're rolling the dice that the next upstream update won't break something.
The only real solution is to constantly evaluate your upstream and constantly update your pins, which is quite a treadmill.
This really is a solved problem within Rust & Cargo. That's why Cargo is such a critical part of Rust & why it really gains traction. The entire build system is so sane that I'll gladly pay the penalty of having to read & write Rust (as a novice) to have a sane build system I can trust to keep many parts of my build supply chain safe.
There is one thing in Rust that's not so rosy but I think could be solved if it's a big enough problem in practice. If a shared dependency has a version that satisfies all preconditions across all downstream libraries, that one dependency will be used. Where that can fall down is if that version actually does break one of your dependencies because of some oversight on the community of suppliers. A nice addition would be a way for downstream to add additional constraints to override things (like blacklisting version dependency chains). That might become more of an issue if there's a package that gets popular enough & then goes stale (i.e. dealing with garbage in the broader Rust community).
> If A pins version <= X of dependency B, but C depends on B > X, component D can include both A & C as dependencies & it will still build & work correctly.
Actually, I believe that's only true if the two versions of B have semver-incompatible version numbers.
So, for instance, if your dependency A requests B=1.5 and C requests B=2.3, then cargo will do what you say and compile both versions separately.
But if A requests B=1.3 and C requests B=1.4, cargo will try to find the highest semver-compatible version that satisfies both constraints. If A pins B<=1.3 and C pins B>1.4, cargo will give you an error.
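A toy model of that unify-or-duplicate rule (greatly simplified: caret-style `(major, minor)` requirements only, and the exact-pin error case described above is omitted):

```python
def resolve(requests: list[tuple[int, int]]) -> dict[int, tuple[int, int]]:
    """Toy model of Cargo-style resolution. Each request means
    '>= requested version, same major' (caret-style). Requirements
    sharing a major are unified to the highest minor requested;
    different majors get separate compiled copies."""
    resolved: dict[int, tuple[int, int]] = {}
    for major, minor in requests:
        prev = resolved.get(major, (major, minor))
        resolved[major] = (major, max(prev[1], minor))
    return resolved
```

So requests for 1.3 and 1.4 unify into a single 1.4, while 1.5 and 2.3 coexist as two separate copies of the package.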
IIRC, npm already does the same thing. If A and C depend on two compatible versions of B, it will make them use the same package. If the two versions aren't compatible, it will install two packages, and it still works.
I think part of what makes these discussions difficult is that each system uses similar-but-different approaches. For example, in Rust, even if your library has checked in Cargo.lock, it won't be used for resolution of that library. This suggestion of yours is literally coded into Cargo directly. But someone coming from Node may not instantly get that, even if both ecosystems "use semver."
Notwithstanding this (that Cargo.lock doesn’t apply to libraries), you can pin in Cargo.toml by specifying exact dependencies: `foo = "= 1.2.3"` will accept only version 1.2.3.
If you’re writing a library or a plug-in that is going to work with different major versions of some containing application, you have to be somewhat flexible in what dependencies you require. You’re going to be coexisting with other plugins and libraries. If you want requests 2.23.0.1, another one wants 2.22, and yet another wants 2.24, all just to call requests.get, madness follows for the people trying to use the library.
This leads to either: forking, serial pip installs, or madness. Or all three.
Libraries shouldn’t have a preference for tzinfo, or certifi. They certainly shouldn’t have conflicting preferences for them, and the only way that can happen is if the libraries defer to the application for the version of widely used core dependencies.
There are exceptions, and they generally fall along the lines of libraries that have tightly coupled dependencies, or cases where specific advanced features are used. I’m thinking here of awscli and boto, but there have been epic disagreements on both awscli and awsebcli.
This is really the meat of the issue I think. The root of the problem AFAICT is that there are people involved in these software systems. People who can make mistakes, people with emotions, egos and scarce time.
I don’t see how any of the author’s very valid points about unintended consequences go away if, instead of pinning to 1.2.3, you pin to 123. Or 2021-03-02, or 57d4fca9.
Software is only becoming more modular and reusable. We need more discipline, not less by throwing out protocols while blaming them for not solving the problems we create with them ourselves. That’s not “taking responsibility” IMO.
I don’t think semantic versioning is an absolute rule, but it’s helpful. I expect that 3.14 won’t break all my existing code using 3.13. It doesn’t mean I rely on it absolutely, but it does help with focusing attention.
I appreciate when projects use semantic versioning because I can then look more closely at 4.0 rather than 3.14.
A few rebuttal points that came up for me while reading this.
1. SemVer is a promise by the library author to the user. It is a social construct, not a technical one. The maintainer might break that construct accidentally. If users report it and the maintainers agree, a patch is created fixing the issue (or revoking the old library & putting up one with a fixed semver). Alternatively the maintainer may say "no" to this social contract (ZeroVer) or may decide they no longer want to say they're using SemVer. They can of course be dicks and choose to continue claiming they are using SemVer when they aren't, but in those scenarios, hopefully & I think usually in practice, their communities abandon them for being untrustworthy.
2. Rust does allow for conflicting package versions within its dependency chain. That it's a problem for Python is clearly an issue within how it does packaging, not something inherent to all packaging or at all related to SemVer.
So I would say pin the dependencies of smaller packages or ones whose communities you don't trust to be large enough to enforce the social contract. For larger libraries with strong user communities, rely on the social construct.
This social construct, by the way, goes both ways. Users of a library should make sure they're set up so that any dependencies older than some time period (at least a year) are kept up to date, or replaced with alternatives if they're not worth the maintenance cost. Maintainers of smaller projects (or really any project) have no responsibility to support anything but the latest release unless they have a policy around that.
As a point of context, Linus has famously made guarantees around ABI stability with userspace. That is used not just to quickly settle arguments about kernel changes: when the kernel has been found to break userspace, they talk about it, quickly pull the release & develop a fix, or pull the change out for further development. This ultimately is a social contract. Sometimes you can enforce parts of it automatically, or at least validate that the contract is holding, like an ABI compliance checker [1] that can at least check that the syntactical parts of the ABI follow the versioning rules the project has committed to. At its core though it's still a social contract & should be understood within that context. The fact that promises & social contracts can be broken (intentionally or otherwise) doesn't undermine their utility if you can build trust in your stewardship within your community of users.
The author is just not familiar with mature software engineering process. I wish the open source world was more attentive to proper release compatibility, it would make everyone's life so much better. It does take a bit of discipline but it's really not difficult at all.
At Sun there was strict adherence to release taxonomy and interface contracts, along the lines with what has recently been getting called "semantic versioning".
An interface was anything something external could depend on. Any library or other code must document its public interfaces. That's not just API calls, but all public touch points something might depend on. The author gets to decide what is part of the public and what isn't. That documentation is the contract.
If anything in the set of public interfaces ever changes in an incompatible way, that forces a major release increment. Ideally you never do that. But if you must, it's a new major release number so everyone knows.
Having this documented contract makes everyone's life easier and more predictable. Consumers of the library know exactly on which interfaces and behaviors they can count on not ever changing unless the major release number changes. The author of the code knows exactly when they need to declare a new major release.
Personally on all my open source projects I faithfully maintain this discipline, so if you depend on 1.x of anything I write, it will never break compatibility on public interfaces until I call it a 2.0 and then you know.
You lead with dunking on the author (yes, me) for describing a reality that you later acknowledge yourself. The article only says that SemVer in practice cannot be relied upon and describes ways how to deal with that fact of life. Nothing else.
Between opening up with "Over the years, well-intentioned people experimented with adding meaning to those numbers..." and a reference to Hyrum's Law, your article definitely reads as a criticism of SemVer and pins it as a failed process. It isn't until 1/3 of the way through that you call out the actual failure in the process: the user. Personally, I think the 'failure' of SemVer is in large part the fault of the Node community and their lack of rigor in their releases. SemVer works fine if used correctly.
That's not part of the article, but I indeed think that a process that (almost) nobody applies correctly is not a good process. Telling people to just try harder is never the solution to anything.
I acknowledge the value of SemVer both here and in the article, but in the practical world it simply does not deliver the value that its proponents claim and users have to deal with it. The article is supposed to help with that.
I'm not familiar with the specific debate around the package in question, but what the author attacks is a very unusual, or very exaggerated, argument for SemVer.
The usual argument is not that it prevents all pain; it is widely accepted by its proponents that downstream users need to test minor and even patch releases before deploying them.
What SemVer does do is set expectations. A SemVer major change means that even if you rigorously rely on only the explicit guarantees of the public API, you need to carefully read the changelog before upgrading, even for testing, unless you are fond of wasted time. Also, you may want to consider whether any new features are useful.
SemVer minor means if you don't rely on things outside the guarantees of the public API, you should still do comprehensive regression testing, but the intent is that you should be safe. And you still might want to explore new features.
SemVer patch is essentially the same as minor for testing, and you don't need to consider new features.
For SemVer minor and patch, it's also a statement that breakages are considered bugs, which might involve a commitment, or at least greater likelihood, of a fix from the project if you run into any.
SemVer is about communication of intent and setting expectations, not eliminating need for testing dependency updates or avoiding all pain.
‘Many forms of <versioning> have been tried, and will be tried in this world of sin and woe. No one pretends that <semantic versioning> is perfect or all-wise. Indeed it has been said that <semantic versioning> is the worst form of <versioning> except for all those other forms that have been tried from time to time.…’
I have been toying with the idea that version numbers should really be release.risk, where release is an incrementing number and risk is the project's estimate of the risk of breakage from the previous release. The risk part of a release would not be static, so as time goes on it could be revised to better suit how risky that release turned out to be.
This would then allow people to normalize risk (after enough releases) so they can have their own risk preferences. Also since we would be no longer trying to indirectly measure and indicate risk, we as an industry can get better at managing it as we would be getting higher quality feedback on how well we estimated or accepted risk.
Tooling could be developed that examines your codebase and the dependencies deltas to better estimate the risk to you of taking the update.
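One hypothetical way tooling could consume such risk numbers (the names and the naive independence assumption are mine, not part of the parent's proposal): estimate the combined risk of jumping several releases at once.

```python
def upgrade_risk(releases: dict[int, float], current: int, target: int) -> float:
    """Rough combined risk of jumping from `current` to `target`,
    given per-release breakage-risk estimates in [0, 1]: the chance
    that at least one intervening release breaks you, naively
    assuming the risks are independent."""
    p_ok = 1.0
    for rel in range(current + 1, target + 1):
        p_ok *= 1.0 - releases[rel]
    return 1.0 - p_ok
```

A team could then set a threshold ("auto-merge anything under 5% combined risk") instead of guessing from major/minor/patch positions.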
Risk tells you nothing. We're trying to condense a changelog into a few small numbers. There's no sense from it.
Upgrading software almost always requires checking things. I've never seen the point of versioning schemes and instead find simple linear numbering way more reasonable. There is no simple way to compare two versions of software, generically, aside from a human reading them.
The same people who tout semver as being the end-all-be-all are also okay with lock files. It makes no sense to me.
Risk is an indicator for how you should allocate your limited time. Since most software projects these days directly depend on dozens of software packages, you can't really afford to carefully vet each and every change to all of your third party dependencies (this is for the typical non-critical CRUD app, other applications it may vary).
Almost nobody has time to vet every line of code that changed, and if the dependency is a binary blob, it's expensive (and likely prohibited by licensing) to tell. That is why I would like a risk score over reading the tea leaves of a project's loose adherence to SemVer.
> you can't really afford to carefully vet each and every change
It's not about vetting. It's about documentation. Putting a list of breaking changes on the Releases page, if any, and who each affects (e.g. "users of the .getFirstName() method will need to call .getName() now") is much more valuable than "2.4". You're trying to fit the former into the latter, which you... just can't. Information theory forbids it.
I don't understand why we always jump to this dichotomy of "either you have a simple version scheme or you have to vet each and every single diff in every dependency and sub-dependency". No you don't - just read the release docs...
I think this thread is an interesting contribution to this debate and could provide useful tools (or inspiration for useful tools) that could help with the issues mentioned in Hynek's post.
"FASTEN and dependency analysis at call graph level"
Perhaps the most important practical point in this article is relegated to footnote 4:
> 4. Funny enough, a change in the build system that doesn’t affect the public interface wouldn’t warrant a major bump in SemVer, but let’s leave that aside.
If the Python cryptography package had been using SemVer, the change that some people are complaining about that introduced a Rust build-time dependency would not have bumped major, because SemVer is about the API, which didn’t change. And indeed most users can take a binary wheel distribution and so don’t need to build it from source and so don’t need to worry about the Rust dependency.
Notwithstanding this, I disagree with at least the strength of this article’s conclusion. SemVer won’t solve all your troubles, but in an ecosystem that supports you along the way (e.g. Rust + Cargo, Node.js + npm), it’s extremely close, though you will still probably want to pin versions automatically and update deliberately (so SemVer isn’t a total panacea because of this caveat). In such a proper ecosystem managed correctly, relying on SemVer:
• Can’t prevent breakage, but can make breakage very rare, which is better than nothing. (And if you shift your weight a bit so that you’re not quite relying implicitly, it helps a lot.)
• Does not lead to version conflicts. (Note that it is possible to get version conflicts if you’re not careful about how you expose types and values from your library’s dependencies, but if you design things the correct way, this won’t happen, hence “managed correctly”. However, I would say that Rust gives you the tools to do it the correct way, and that in the future this will be able to be machine-checked, while Node.js doesn’t hold your hand in any way, though you can do it properly.)
• Does not lead to security problems. I’m leaning toward judging this point in the original article specious, because it’s only applicable if other people have made a terrible mess of things (writing very bad version dependencies that everyone would frown upon), and that’s nothing to do with SemVer or otherwise—rather, I’d say that SemVer has helped here, because it makes it so that you have to deliberately go out of your way to deny security updates.
• Should add no meaningful burden on maintainers, though tooling is generally not quite there yet—notably, it’s generally somewhere between difficult and impossible to sanely bump the versions of the language or standard library that you depend on outside a major version bump, but that doesn’t make everyone happy, so this is still definitely something SemVer and periphery are not solving in Rust or Node.js lands.
> There’s also plenty of high profile projects that look like SemVer but aren’t: Linux, Python, Django, glibc…it’s fine!
Another interesting thing to note about these examples: they all maintain multiple branches simultaneously for security and bug fixes. Which I feel might argue against an important part of the article.
I don’t think that I’d try to slip an additional build-time compiler in with a minor version update. While it’s not a runtime change, it is going to affect a large number of people, and they will all post comments on any issue that matches their particular “can’t install because we can’t find the header foo.h”.
(Speaking as someone who maintains a popular c/python package that is difficult to compile. )
If it were something that required building from source, I’d be more open to considering the build dependencies part of the API (though I’d still assess it on a case-by-case basis), but so long as you provide wheels, people don’t need to build the library, and so build dependencies definitely aren’t a part of the API. In the case of cryptography, I gather that the vast majority of complainants were on Gentoo (which builds things from source) or Alpine (for Docker containers, because many-musl wasn’t a thing, so no wheel could be provided) or an ancient architecture that should be allowed to rot anyway (for reasons that have been discussed in other recent articles). Yeah, it’s a fair number of people affected in the end, but probably a very small fraction of everyone; and once many-musl is defined in pip and cryptography uses it, it’ll affect close enough to no one as makes no matter.
Remember also that if you’re using SemVer, a major version bump forms a schism because you can’t expect everyone to upgrade immediately, so major version bumps should not be performed lightly, especially for any sort of security-critical thing if you’re not going to maintain the old. I agree with Alex that, if using SemVer, bumping major would have been wrong here.
What I'm saying is that despite providing tens of wheels per release, pip still sometimes gets the source version, and it's predominately the unskilled users that descend en-masse to complain that it won't install, there's this error message, and what does it mean. As a maintainer, this sucks. Even if it's a very small fraction, the internet is large, and my time isn't.
I disagree with Alex. I understand his point of view, but I disagree.
I also disagree with his project's decision to print a non-disableable warning on import in python 2.7, and then justifying the noisy warning based on "we didn't want people to disable it". One legacy project I'm working on is trying to get off 2.7. It's a long process, we know about it, and warnings like this all over the logs simply don't help.
But incrementing the major version wouldn't have helped those who weren't pinning to an earlier version, something I suspect applies to many of those unskilled users.
> You want to claim that version 3.2 is compatible with version 3.1 somehow, but how do you know that?
I don't, not really. All I want to claim is that, to the best of my knowledge, the public API I've documented will not change. [1] It's up to the user of the library to decide what to do with that. Usually, that should be some manner of trust, but verify: a mostly-blind upgrade, but with automated sanity checks run afterwards.
But the other way around is more important: if I bump the major version, I want them to check my release notes, because I'm expecting it to be likely for them to need to follow some instructions there.
Other than that I completely agree with this article. SemVer is a service to provide a quick tl;dr of your changelog, and the rest is up to consumers.