
ABI – Now or Never [pdf] - xilni
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1863r1.pdf
======
gok
C++ ABI issues seem to be in this paradoxical situation where the language
standard powerbrokers simultaneously feel that

a. Having a real binary compatibility story is beneath C++, but

b. The accidental ABI compatibility that exists today is too widely adopted to
break.

~~~
olliej
Basically every modern platform (i.e. one free of 90s mistakes) uses the
Itanium ABI, which defines vtable layout, RTTI layout, and so on.

But platforms define the final memory layout and calling conventions, so those
can’t be part of any language spec. This is not unique to C++.

Windows has its own ABI, which it has had for a long time, so they can’t
change it; on x86 Windows it will always be that one.

~~~
Rusky
The ABI discussed here is a bit higher level than that. Even with a fixed ABI
for vtables, calling conventions, etc. you still have to care about what
happens when you change a "vocabulary type" in the standard library: some
changes are source-compatible but binary-breaking.

For example, C++11 broke ABI at this level by changing the representation of
std::string.

~~~
olliej
Oof I was unaware of the string ABI break (just had to look it up) - that’s
kind of gross :-/

That said, in general ABI compatibility is expected of all changes unless
there is a very good reason to break it. Security seems like a good one;
performance, alas, not. I assume std::string got the small string optimization
because they were already going to break ABI for the threading issues.

Of course it doesn’t help that C++ is still C-like, so implementation details
such as data layout get compiled directly into the client application :-/

------
ClumsyPilot
Am I wrong in thinking that this mild form of schizophrenia (no official ABI,
but don't break the ABI) is part of the reason why we now live in a world where
all applications talk to each other through sockets, incurring huge overheads?

~~~
earenndil
You can use shmem instead of sockets, which is pretty fast. Not as fast as a
direct call, but pretty good; good enough for most purposes.

~~~
quotemstr
Have you actually measured? The overhead isn't the socket, but the marshaling
and unmarshaling, and you need these things for sockets, shared memory, or any
other IPC.

~~~
earenndil
You don't need to (un)marshal necessarily; stuff like strings, or arrays of
integers, can go straight across. And if you have to pass large amounts of
data, they'll probably take a form that's _something_ like that.
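
A minimal sketch of what "going straight across" could look like for an array of integers. The `Header` struct and the `write_ints` / `read_ints` helpers are hypothetical names, and a plain byte buffer stands in for the mapped shared-memory region. It assumes both processes agree on integer width, endianness and struct layout, which is itself a small ABI contract.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical wire format: a fixed-size header followed by raw int32 values.
struct Header {
    std::uint32_t count;  // number of int32_t values that follow
};
static_assert(std::is_trivially_copyable<Header>::value,
              "Header must be safe to copy byte-for-byte");

// Writer side: copy the header and payload bytes directly into the region.
// Returns the number of bytes written, or 0 if the region is too small.
inline std::size_t write_ints(unsigned char* region, std::size_t region_size,
                              const std::vector<std::int32_t>& values) {
    assert(region != nullptr);
    const std::size_t payload = values.size() * sizeof(std::int32_t);
    const std::size_t needed = sizeof(Header) + payload;
    if (needed > region_size) return 0;
    Header h{static_cast<std::uint32_t>(values.size())};
    std::memcpy(region, &h, sizeof h);
    if (payload != 0) std::memcpy(region + sizeof h, values.data(), payload);
    return needed;
}

// Reader side: no parsing step, just copy the bytes back out.
inline std::vector<std::int32_t> read_ints(const unsigned char* region) {
    assert(region != nullptr);
    Header h;
    std::memcpy(&h, region, sizeof h);
    std::vector<std::int32_t> out(h.count);
    if (h.count != 0) {
        std::memcpy(out.data(), region + sizeof h,
                    h.count * sizeof(std::int32_t));
    }
    return out;
}
```

This only works for trivially copyable data; anything holding pointers (std::string, std::vector itself) would still need some form of marshaling.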

------
petters
What is the "runtime overhead involved in passing unique_ptr by value"?

The reference is a 1h YouTube video.

~~~
chandlerc1024
Short version: it passes a pointer to the pointer, forcing a double
indirection rather than a single one.

Simple attempts to fix don't really work. Not even sure an ABI break will be
enough, but it would at least be a minimum requirement.
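
A rough sketch of the two signatures being compared. The function names are made up for illustration, and the comments assume an Itanium-style C++ ABI, where a class with a non-trivial destructor is passed by the address of a caller-owned temporary rather than in a register.

```cpp
#include <cassert>
#include <memory>

// Raw pointer parameter: the pointer value itself is passed,
// so reading the int takes one load.
inline int read_raw(const int* p) {
    assert(p != nullptr);
    return *p;
}

// unique_ptr by value: because unique_ptr has a non-trivial destructor,
// the caller materializes a temporary in memory and passes its address.
// The callee first loads the stored pointer from that temporary, then
// loads the int: two dependent loads instead of one.
inline int read_owned(std::unique_ptr<int> p) {
    assert(p != nullptr);
    return *p;
}
```

Comparing the generated assembly for the two functions on a given compiler is the easiest way to see whether the extra load is actually there.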

~~~
ncmncm
Makes me wonder why anybody takes them by value, and not by rvalue-reference.

But unique_ptr in public interfaces feels like a code smell. I have done it,
but not proudly.

~~~
Const-me
Quite a few times, I've made C++ library APIs like this:

    
    
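        #include <memory>

        // Public interface; the concrete implementation stays inside the library.
        struct iface
        {
            virtual ~iface(){ }
            virtual void doSomething() = 0;
            // Factory: returns an owning pointer to a hidden implementation.
            static std::unique_ptr<iface> create();
        };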
    

I don't know good ways to replace unique_ptr here.

Requiring library users to #include the concrete implementation is not good:
it inflates compilation time, pollutes namespaces, and pollutes the IDE's
autocompletion DB.

You can go C-style, i.e. pass a double-pointer argument to the factory, or
return a raw pointer. But then the user needs to remember to destroy the
object.
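
For concreteness, a minimal sketch of how such a factory might be split between header and implementation. The `impl` class and the `use_library` caller are hypothetical; in a real library the second part would live in a .cpp file that users never include.

```cpp
#include <cassert>
#include <memory>

// --- What the library header exposes ---
struct iface {
    virtual ~iface() {}
    virtual void doSomething() = 0;
    static std::unique_ptr<iface> create();
};

// --- What stays inside the library's own source file ---
namespace {
// Concrete type, invisible to library users.
struct impl final : iface {
    void doSomething() override { /* real work would go here */ }
};
}  // namespace

std::unique_ptr<iface> iface::create() {
    return std::unique_ptr<iface>(new impl);
}

// --- Example caller: sees only iface, never impl ---
inline void use_library() {
    std::unique_ptr<iface> obj = iface::create();
    assert(obj != nullptr);
    obj->doSomething();
}  // obj is destroyed here through the virtual destructor
```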

~~~
ncmncm
Returning them by value is fine. (N)RVO avoids the inefficiencies.

But it would often be better to make a move-only value type, and return that
instead.
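
One way to read that suggestion is a pimpl-style handle, sketched below. `Widget`, `Impl` and `value()` are hypothetical names; the unique_ptr becomes a private detail instead of part of the public signature.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

class Widget {
public:
    static Widget create(int value);  // factory returns the handle by value

    Widget(Widget&&) noexcept;
    Widget& operator=(Widget&&) noexcept;
    Widget(const Widget&) = delete;
    Widget& operator=(const Widget&) = delete;
    ~Widget();

    int value() const;

private:
    struct Impl;  // defined only in the implementation file
    explicit Widget(std::unique_ptr<Impl> p);
    std::unique_ptr<Impl> impl_;
};

static_assert(!std::is_copy_constructible<Widget>::value,
              "Widget is move-only");

// --- Implementation file: Impl is complete from here on ---
struct Widget::Impl {
    int value;
};

Widget::Widget(std::unique_ptr<Impl> p) : impl_(std::move(p)) {}
Widget::Widget(Widget&&) noexcept = default;
Widget& Widget::operator=(Widget&&) noexcept = default;
Widget::~Widget() = default;

Widget Widget::create(int value) {
    return Widget(std::unique_ptr<Impl>(new Impl{value}));
}

int Widget::value() const {
    assert(impl_ && "use of a moved-from Widget");
    return impl_->value;
}
```

Note that this trades one ABI commitment for another: the handle's own size and layout are now what client code compiles against.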

------
rwbhn
Discussion of whether the next (or any future) C++ release should break ABI
compatibility.

~~~
jacobush
Does C++ have an ABI?

~~~
eklitzke
Every implementation implicitly defines an ABI due to the size and layout of
classes/structs defined in the STL.

Here's a simple example. Suppose we define std::string with the following
layout (for simplicity I'm removing the template stuff, SSO, etc.):

    
    
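      #include <cstddef>  // std::size_t

      class string {
       public:
        // various methods here...

        // Defined in the header, so callers inline a load from len_'s offset.
        std::size_t size() const { return len_; }

       private:
        char *data_;            // offset 0
        std::size_t len_;       // offset 8 on a typical 64-bit system
        std::size_t capacity_;  // offset 16
      };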
    

When a user calls .size() on a string, the compiler will emit some inlined
instructions that access the len_ field at offset +8 bytes into the class
(assuming 64-bit system).

Now suppose we modify our implementation of std::string, and we want to change
the order of the len_ and capacity_ fields, so the new order is: data_,
capacity_, len_. If an executable or library links against the STL and isn't
recompiled, it will have inlined instructions that are now reading the wrong
field (capacity_).

This is what we mean by the C++ ABI. This is a simple example, but there are a
lot of other changes that can break ABI this way.
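
A small sketch of the mismatch described above. `StringV1` and `StringV2` are hypothetical stand-ins for the old and new layouts, and `size_compiled_against_v1` mimics what a client that inlined the old size() effectively does: read a fixed offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Old layout: data_, len_, capacity_.
struct StringV1 {
    char* data_;
    std::size_t len_;
    std::size_t capacity_;
};

// New layout: len_ and capacity_ swapped.
struct StringV2 {
    char* data_;
    std::size_t capacity_;
    std::size_t len_;
};

// The slot that used to hold len_ now holds capacity_.
static_assert(offsetof(StringV1, len_) == offsetof(StringV2, capacity_),
              "second member occupies the same offset in both layouts");

// Stand-in for code compiled against V1: the len_ offset is baked in,
// no matter which layout the object it is handed really has.
inline std::size_t size_compiled_against_v1(const void* s) {
    assert(s != nullptr);
    std::size_t len;
    std::memcpy(&len,
                static_cast<const unsigned char*>(s) + offsetof(StringV1, len_),
                sizeof len);
    return len;
}
```

Handing a `StringV2{nullptr, 64, 5}` to `size_compiled_against_v1` yields 64 (the capacity) rather than 5, with no compile-time or link-time diagnostic.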

~~~
earenndil
That's not exactly the same. What you're referring to is the library ABI for
the C++ standard library. Every library which can be linked to dynamically has
its own ABI. The language ABI, on the other hand, describes how every
library's ABI is defined, describing things like layout and name mangling. So
if, for instance, the language ABI were amended to say that class members are
arranged in alphabetical order in memory, then capacity_ will always be at
offset 0, data_ at offset 8, and len_ at offset 16; if you change the order of
declarations then, the library's ABI won't change. But if the library were
compiled with an old compiler that targeted the old ABI, then it would put
data_ first, followed by len_, then capacity_. So if you then compiled a new
piece of code with a new compiler targeting the new ABI, but linked it against
the old library, there would be a mismatch.

