Hacker News
ABI – Now or Never [pdf] (open-std.org)
63 points by xilni 24 days ago | 36 comments

C++ ABI issues seem to be in a paradoxical situation where the language standard's power brokers simultaneously feel that

a. Having a real binary compatibility story is beneath C++, but

b. The accidental ABI compatibility that exists today is too widely adopted to break.

Basically every modern platform (i.e., one free of '90s mistakes) uses the Itanium ABI, which defines vtable layout, RTTI layout, and so on.

But platforms define the final memory layout and calling conventions, so those can't be part of any language spec - this is not unique to C++.

Windows has had its own ABI for a long time, so it can't be changed; on x86 Windows it will always be that ABI.

The ABI discussed here is a bit higher level than that. Even with a fixed ABI for vtables, calling conventions, etc. you still have to care about what happens when you change a "vocabulary type" in the standard library- some changes are source-compatible but binary-breaking.

For example, C++11 broke ABI at this level by changing the representation of std::string.

Oof I was unaware of the string ABI break (just had to look it up) - that’s kind of gross :-/

That said, in general ABI compatibility is expected of all changes unless there's a very good reason - security seems like a good one; performance, alas, does not. I assume std::string got the small string optimization because implementations were already going to break ABI over the C++11 threading issues.

Of course it doesn't help that C++ is still C-like, so implementation details of data layout get compiled directly into the client application :-/

Am I wrong in thinking that this mild form of schizophrenia (no official ABI, but don't break the ABI) is part of the reason why we now live in a world where all applications talk to each other through sockets, incurring huge overheads?

You can use shmem instead of sockets, which is pretty fast. Not as fast as a direct call, but pretty good; good enough for most purposes.

Have you actually measured? The overhead isn't the socket, but the marshaling and unmarshaling, and you need these things for sockets, shared memory, or any other IPC.

You don't need to (un)marshal necessarily; stuff like strings, or arrays of integers, can go straight across. And if you have to pass large amounts of data, they'll probably take a form that's something like that.

shmem = shared memory?



Applications talk via sockets to reduce coupling, thus fragility.

What is the "runtime overhead involved in passing unique_ptr by value"?

The reference is a 1h YouTube video.

Short version: it passes a pointer to the pointer, forcing a double indirection rather than a single one.

Simple attempts to fix don't really work. Not even sure an ABI break will be enough, but it would at least be a minimum requirement.

The standard would probably need to introduce destructive move semantics and trivially relocatable types (unique_ptr for example) to enable simplifying the ABI for such types. Then of course an ABI break is required to actually apply the changes.

Makes me wonder why anybody takes them by value, and not by rvalue-reference.

But unique_ptr in public interfaces feels like code smell. I have done it, but not proudly.

Quite a few times, I've made C++ library APIs like this:

    struct iface {
        virtual ~iface() { }
        virtual void doSomething() = 0;
        static std::unique_ptr<iface> create();
    };
I don't know good ways to replace unique_ptr here.

Requiring library users to #include the concrete implementation is not good: it inflates compilation time, pollutes namespaces, and pollutes the IDE's autocompletion database.

You can go C-style, i.e. pass a double-pointer argument to the factory, or return a raw pointer. But then the user needs to remember to destroy the object.

Returning them by value is fine. RVO avoids inefficiencies.

But it would often be better to make a move-only value type, and return that instead.

Seriously: if you have to pass its address regardless, why pay to construct and destroy another one that you only wanted to std::move out of anyway?

The video: https://www.youtube.com/watch?v=rHIkrotSwcc

I wrote up a reddit post for a possible workaround for removing the overhead. It's standard C++, no ABI break is required. It's not without caveats though: https://www.reddit.com/r/cpp/comments/do8l2p/working_around_...

It's the overhead vs. passing a raw pointer. The Itanium ABI says that std::unique_ptr has to be passed by address because of its non-trivial special member functions (the ABI doesn't know whether it stores a pointer to itself).

Compilers have an attribute to remove this overhead, but it's an ABI break to do it.

Thanks, I always assumed that a unique_ptr was identical to a regular pointer under the hood.

Discussion of whether the next (or any future) C++ release should break ABI compatibility.

Does C++ have an ABI?

Every implementation implicitly defines an ABI due to the size and layout of classes/structs defined in the STL.

Here's a simple example. Suppose we define std::string with the following layout (for simplicity I'm removing the template stuff, SSO, etc.):

  class string {
    // various methods here...

    size_t size() const { return len_; }

    char *data_;
    size_t len_;
    size_t capacity_;
  };
When a user calls .size() on a string, the compiler will emit some inlined instructions that access the len_ field at offset +8 bytes into the class (assuming a 64-bit system).

Now suppose we modify our implementation of std::string, and we want to change the order of the len_ and capacity_ fields, so the new order is: data_, capacity_, len_. If an executable or library links against the STL and isn't recompiled, it will have inlined instructions that are now reading the wrong field (capacity_).

This is what we mean by the C++ ABI. This is a simple example, but there are a lot of other changes that can break ABI this way.

That's not exactly the same. What you're referring to is the library ABI for the C++ standard library. Every library that can be linked to dynamically has its own ABI. The language ABI, on the other hand, describes how every library's ABI is defined, covering things like layout and name mangling.

So if, for instance, the language ABI were amended to say that class members are arranged in alphabetical order in memory, then capacity_ would always be at offset 0, data_ at offset 8, and len_ at offset 16; if you then changed the order of declarations, the library's ABI wouldn't change. But if the library were compiled with an old compiler targeting the old ABI, it would put data_ first, followed by len_, then capacity_. So if you compiled a new piece of code with a new compiler targeting the new ABI, but linked it against that library, there would be a mismatch.

It doesn't have a standard ABI.

Nevertheless, certain language changes can force a breaking change to any existing ABI (or even all of them), and the C++ committee does not work in a vacuum: they work with existing implementations and must agree with implementers before making changes to the standard.

For example, there was a change to the definition of std::string in C++11 that forced a break in all commonly used ABIs (MSVC and Itanium, at least). This was deemed necessary, but its cost to real-world programs has proven higher than anticipated, and it may be a regretted decision (it apparently still causes problems and requires special flags even today).

C++ doesn't, but Windows C++ does, to the extent that it's implicitly used for things like COM. Any breaking change would have to be managed carefully.

On the contrary, one of the reasons COM exists is to not depend on the C++ ABI, which is not stable at all under Windows. It has been de facto stable for the last three versions of MSVC, but was broken each release before that, and the recent stable streak is no guarantee this ABI compatibility will continue - it is actually well known that MS internally maintains an ABI-incompatible version of the STL that will very probably be used in the future, to fix some issues and optimize things.

On another platform, libstdc++ is mostly backward compatible, within reason.

The C++ standard is not "officially" concerned with stability, but in practice people on the committee care a lot (because some major implementations care a lot), so some modifications are rejected because they would break the ABIs currently used in practice.

In regards to COM, I was specifically thinking about how virtual functions and methods are laid out. This cannot change without breaking a lot of code. It already causes issues in non-C++ languages (e.g. the difference with thiscall in C++).

The vtable layout is much smaller and much more stable than C++ as a whole. COM has a ton of value.

And it got even better with the UWP changes.

You don't get a stable ABI by accident. MS chose to maintain stability on these recent releases. This is a change from past practice, in response to customer needs.

Customers who needed stability were staying on ancient compilers. MS probably would rather have them using new versions, and exercising new features.

No, but it has a large number of implementing ABIs subject to complicated requirements. Both the explicit requirements on those ABIs and individual implementers' decisions create legacy that conflicts with changes that could make the language better.

All platforms have a standard ABI. Windows' (more specifically, MSVC's, since mingw g++ does not follow it) is mostly undocumented, but substantial portions have been reverse-engineered. Most other platforms use some modification of the Itanium ABI, which describes the ABI in terms of C structs and functions. ARM uses Itanium, with a somewhat different mechanism for exception handling.

Not one described by the C++ Standard. Neither does C.

