
C++ Language Interface Foundation (CLIF) - matt_d
https://github.com/google/clif
======
rayiner
C++ is an extremely challenging language to write a wrapper generator for. I
did a C++ to Common Lisp generator about a decade ago for my GSoC project,
based on GCC-XML (with the goal of being able to wrap Qt). You run into two
challenges. The first is that the GCC C++ ABI is extremely complicated and
requires a runtime on top of whatever low-level C FFI you're working with.

The second is that many semantics just don't map very easily. I imagine the
latter problem has only gotten worse with heavier use of templated types in
APIs. It looks like CLIF doesn't expose the template per se (at least in the
Python examples) but requires explicit instantiation of each specialization as
a distinct Python class. Templates mess with the very concept of an FFI
generator, which tends to assume a reasonably clear separation between compile
time and run time (so you don't need to run a C++ compiler as a pre-pass on
any code that uses a binding).
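To illustrate the explicit-instantiation point, here is a minimal C++ sketch. The `Box`, `BoxInt`, and `BoxDouble` names are hypothetical, and this is not CLIF's own syntax. The template alone produces nothing a binding could call; each explicitly instantiated specialization is a concrete class that a generator could wrap under its own name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A class template on its own emits no machine code, so a compiled
// library contains nothing an FFI could bind to until it is instantiated.
template <typename T>
class Box {
public:
    void add(T value) { items_.push_back(value); }
    std::size_t size() const { return items_.size(); }
    T sum() const {
        T total{};
        for (const T& v : items_) total += v;
        return total;
    }

private:
    std::vector<T> items_;
};

// Explicit instantiation definitions make the compiler emit every member
// of these two specializations, so they exist as concrete symbols.
template class Box<int>;
template class Box<double>;

// A binding would then expose each specialization as its own, separately
// named class; the template itself never crosses the language boundary.
using BoxInt = Box<int>;
using BoxDouble = Box<double>;
```

Any specialization that was not instantiated ahead of time (say `Box<long>`) simply does not exist in the library, which is why a wrapper has to list the ones it wants.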

All that is a roundabout way of saying: for god's sake write your system APIs
in C.

~~~
chubot
Yeah I've grown to appreciate C++ in many ways, but what I noticed at my last
job is that the baroque interfaces give it a viral effect.

Once you have a little C++ code, the easiest thing to do is to continue
writing more C++ code, even if it's not the best tool for that job. It's too
hard to use a C++ interface from any other language. Even wrapping it in C is
annoying, although as I understand it, LLVM does exactly that for some of its
APIs.
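A minimal sketch of the "wrap it in C" approach, assuming a hypothetical `Counter` class and hypothetical `counter_*` function names. This shows the general opaque-handle pattern, not LLVM's actual C API. The C++ object hides behind an opaque pointer, and a handful of `extern "C"` functions create, use, and destroy it.

```cpp
#include <cassert>
#include <new>

// Hypothetical C++ class that other languages should be able to use.
class Counter {
public:
    void add(long n) { value_ += n; }
    long value() const { return value_; }

private:
    long value_ = 0;
};

// In the C header this struct is only declared (struct CounterHandle;),
// so C callers see an opaque pointer; the definition stays on the C++ side.
struct CounterHandle {
    Counter impl;
};

// extern "C" turns off C++ name mangling, and only C types cross the
// boundary, so any language with a plain C FFI can call these functions.
extern "C" {

CounterHandle* counter_new(void) {
    // Allocation failure is reported as nullptr instead of an exception.
    return new (std::nothrow) CounterHandle();
}

void counter_add(CounterHandle* h, long n) {
    assert(h != nullptr);  // callers must pass a live handle
    h->impl.add(n);
}

long counter_value(const CounterHandle* h) {
    assert(h != nullptr);
    return h->impl.value();
}

// Ownership rule: whoever called counter_new must call counter_free once.
void counter_free(CounterHandle* h) { delete h; }

}  // extern "C"
```

A Python `ctypes` or `cffi` binding would only ever see these four functions. The cost is that every method needs a hand-written forwarding function, which is the annoyance described above.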

IMO there are a lot of systems where C++ is the best language for about 10% of
the code. C++ really is unique in terms of offering zero-cost abstraction. But
then the remaining 90% gets written in C++ too. It can be remarkably awkward
for many problems, and your build times scale nonlinearly too.

In some ways this is inherent ... the compiler's job is to erase all those
abstractions and generate straightforward machine code. In other ways it is
the fault/result of adhering to the C linking model, which ironically was for
interoperability.

C++ is sort of like a universal receiver and not a universal donor. It can
assimilate any C code, and thus it gets code in other languages transitively.
But other languages can't assimilate it, at least not without great effort.

That said, Clang has a better API than GCC-XML, so the problem might be
solved. Although, honestly, C++ just keeps growing more features with C++11,
14, and 17 that make it harder to interoperate with any other code. It's this
huge compile-time language completely separate from the C linking model.

~~~
atilaneves
> It's too hard to use a C++ interface from any other language.

D has the best support for this. It's not as simple as
`#include <cppheader.hpp>`, but it's better than any other language at it.
Name mangling matches, C++ exceptions are supported, C++ abstract classes can
be declared as D interfaces, and everything works.

> C++ really is unique in terms of offering zero-cost abstraction

Rust and D (at the very least) would disagree.

~~~
chubot
OK, good to know about D.

As far as zero cost, my understanding is that D is still trying to get rid of
GC from the standard library.

And this explains why I think Rust is having a hard time getting adoption. It
makes an effort to be compatible with C, but not with C++. But really C++ is
its "competitor", not C.

And C++ has the network / lock-in effect I described. All languages have a
network effect to some degree (libraries, documentation), but I think the
situation with C++ is especially acute.

Also, I would say that safety and zero-cost abstractions are the goals of Rust.
Those two goals conflict somewhat -- e.g. in the decision over bounds
checking. Although I guess you can say that without bounds checking you have
no abstraction; you just have a pile of buggy code at zero cost :)

------
gpshead
I wondered how long it would take before someone posted CLIF here since we
haven't put up a blog post about it yet... :)

Interesting stat: Around 80% of the Python C++ extension module wrappings
being added to our code base are now being done using CLIF instead of SWIG. We
are actively working on forbidding new Python SWIG wrappers from being added
and migrating important legacy wrappings off of it onto CLIF.

Who am I? I am the TL of the team that created CLIF (design and code reviewer,
_not_ a primary author; they can identify themselves as they see fit). -gps@

~~~
obstinate
When are y'all doing Go and Java, though? The fact that this exists just
compounds the pain of writing SWIG for the other two languages. The grass
really is greener.

~~~
mish33
Go and Java claimed they can do without C++, so let them suffer :)

It takes time and because we love Python we did it first. Others will come.

------
jzwinck
Having used both SWIG and Boost.Python professionally, I find manual work
required by the latter to be 100% worthwhile and usually necessary.

C++ and Python are not very similar, and the result of auto-binding them
typically gives up a combination of fluency and performance.

A simple example of the former is a C++ algorithm which writes to an output
iterator. In Python this might be expressed as a generator. But none of the
auto-binding tools can do this transformation.
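A small C++ sketch of that example, with hypothetical names (`primes_up_to`, `CallbackIterator`): an algorithm templated on its output iterator, plus an output iterator that forwards each value to a callback. A hand-written binding could instantiate the template with such an iterator so values stream out as they are produced instead of first being copied into a container. Turning that push-style stream into a lazy Python generator would still take additional hand work.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <utility>
#include <vector>

// A typical C++ algorithm: results go through an output iterator, so the
// caller decides where the values end up.
template <typename OutputIt>
OutputIt primes_up_to(int limit, OutputIt out) {
    for (int n = 2; n <= limit; ++n) {
        bool prime = true;
        for (int d = 2; d * d <= n; ++d) {
            if (n % d == 0) { prime = false; break; }
        }
        if (prime) *out++ = n;
    }
    return out;
}

// Hypothetical adapter: an output iterator that hands each assigned value
// to a callback (in a binding, a call into the scripting runtime).
class CallbackIterator {
public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    explicit CallbackIterator(std::function<void(int)> fn) : fn_(std::move(fn)) {}

    CallbackIterator& operator=(int value) { fn_(value); return *this; }
    CallbackIterator& operator*() { return *this; }
    CallbackIterator& operator++() { return *this; }
    CallbackIterator& operator++(int) { return *this; }

private:
    std::function<void(int)> fn_;
};
```

The same template still works with ordinary C++ iterators such as `std::back_inserter`, which is the flexibility a generic binding tool cannot see through.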

As for the latter--performance--I have seen 100x performance penalties in
practice when the blind lead the bind. A good example is that in Python we
have memoryview and the buffer protocol, but auto-binding tools take these no
further than std::vector (if that). A C++ API which produces a large stream of
numeric data just begs to be bound using the buffer protocol or perhaps even
NumPy directly. But if a C++ API produces small values really quickly, an
auto-bound one will produce the same small values really slowly.
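To make the performance point concrete, here is a sketch of two ways a hypothetical `SampleSource` could be exposed through a C ABI. All names are illustrative. The first makes one call across the language boundary per value, roughly what a naive auto-binding gives you; the second makes one call that fills a caller-owned contiguous buffer. The Python side, which would wrap that buffer via the buffer protocol or a NumPy array, is not shown.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical producer of a stream of small numeric values.
class SampleSource {
public:
    double next() { return static_cast<double>(count_++) * 0.5; }

private:
    long count_ = 0;
};

// Per-value API: each sample pays a full trip through the binding layer
// (argument conversion, creating a Python float, and so on), which can
// dwarf the few instructions of real work inside next().
extern "C" double source_next(SampleSource* s) {
    assert(s != nullptr);
    return s->next();
}

// Bulk API: one call fills n contiguous doubles in caller-owned memory.
// A binding can hand that memory to Python as a buffer (for example a
// memoryview or NumPy array) without boxing each element.
extern "C" std::size_t source_fill(SampleSource* s, double* out, std::size_t n) {
    assert(s != nullptr && (out != nullptr || n == 0));
    for (std::size_t i = 0; i < n; ++i) out[i] = s->next();
    return n;
}
```

Both functions produce the same values; the difference is only in how many times the interpreter has to cross into native code to get them.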

Some people see the writing of bindings as manual labor to be avoided. I see
it as an optimization opportunity. There are huge gains available.

~~~
stinos
> I find manual work required by the latter to be 100% worthwhile and usually
> necessary

Roughly the same experience. Automatic generation for anything more than
trivial examples always seems to lead to the need for more and more
configuration to keep everything in line, to the point where just writing
everything by hand becomes less work. This is likely due to the type of
software we use it for, but still, the time I lost on SWIG feels wasted
rather than leaving me with the feeling of having learnt at least something.
It was all not very pleasant.

The Python side of our software gets exposed to less tech-savvy users and is
meant to be a friendlier layer over C++ functions and classes. But the C++
side isn't really an API, in the sense that there is no hard distinction
between the 'internal' layer and the API layer. Some class which is only used
internally might the next day also be wrapped to be available in Python. So
there's no single directory or the like that one can point to and say
'everything in there is the API'. Also, given a C++ class, sometimes only a
couple of methods have to be exposed to the users, sometimes even under a
different name, or with some arguments defaulted, etc. Preferably without
having to write a wrapper just for the sake of exposing it. All this turned
out to be a nightmare in SWIG.

Just writing one or more lines to register each function manually (not
Boost.Python but similar) is usually a one-time-almost-never-look-back thing
which, for us, is way less work in the end, even with the hundreds of
functions we have already. And of course performance is the other advantage.

~~~
mountain_lion
CLIF is used inside Google for some very non-trivial projects.

~~~
stinos
Well I'm always open to new ways of doing things. Any estimate of how hard it
would be to generate MicroPython [1] wrappers for C++ code (in a rather
configurable way as described earlier)?

[1] https://github.com/micropython/micropython

~~~
gpshead
It seems entirely reasonable for someone to create a MicroPython wrapper
generator. It appears to have a C/C++ API so it should be similar to the
existing CPython Generator and Runtime.

------
gumby
Finally! I have been suffering under SWIG and have been hoping for some time
that someone would get the compiler to do this.+

Not to say SWIG isn't an amazing effort, but it's a whole C++ compiler
maintained by a very small team.

+ I already put my time in at the GCC salt mines, so no, I didn't try doing
this myself.

~~~
m-j-fox
I'm cautiously excited. The problem with C++ is there are just so many good
frameworks, but you can't use them without creating more C++.

SWIG should be called WIG. And anyway, I can't see it surviving modern C++.

As long as I'm ranting, what is modern C++ but a bunch of new languages that
are incompatible with C++, each other, and everything else? Forgive me if I'm
wrong but this seems like a terrible idea.

~~~
pjmlp
Modern C++ means picking up the ideas from Alexandrescu, avoiding unsafe
C-style programming unless the profiler says otherwise, and using the
higher-level features of C++ for writing nice, usable, safe libraries.

Many of the idioms were actually already possible back in the C++ ARM days,
before C++98 was a thing, but were spoiled by C refugees.

~~~
m-j-fox
Maybe I got my terminology wrong. I'm talking about the wave of new standards.
I feel like they're a bunch of new languages which are improvements over C++
but not backwards compatible with C++. Every problem you have interoperating
between two different languages, you have between these different C++es, but
worse, because there aren't tools like SWIG to help you.

~~~
pjmlp
You can say the same about any programming language that enjoys wide market
adoption, except maybe for C, which still thinks computers are like PDP-11s,
with C99 and C11 being very tiny evolutions with little regard for improving
overall productivity.

Even Fortran and Cobol(!) have evolved more than C.

------
malkia
There are several issues that I usually hit with bindings to C/C++ and other
languages:

- Calling a native function that itself takes a callback, where that callback might be your non-native code. (trampolines?)
- Dealing with memory allocation. Who owns what?
- Exceptions or longjmps
- Green threads, fibers, etc.
- Other?

~~~
dekhn
Callbacks in SWIG are straightforward. You have a C/C++ receiver function that
gets registered along with a pointer to the scripting-language target
function. I've used this design for over a decade.
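A minimal C++ sketch of that pattern, with a hypothetical library API (`lib_register`, `lib_fire`). The library accepts a function pointer plus an opaque `void*` user-data pointer, and the binding registers one fixed receiver function whose user data points at the script-side callable (modeled here with `std::function`).

```cpp
#include <cassert>
#include <functional>
#include <vector>

// --- Hypothetical C-style library API ---------------------------------
// The callback signature carries an opaque user_data pointer that the
// library stores and passes back untouched on every call.
using EventCallback = void (*)(int event, void* user_data);

struct Listener {
    EventCallback fn;
    void* user_data;
};

static std::vector<Listener> g_listeners;

extern "C" void lib_register(EventCallback fn, void* user_data) {
    g_listeners.push_back({fn, user_data});
}

extern "C" void lib_fire(int event) {
    for (const Listener& l : g_listeners) l.fn(event, l.user_data);
}

// --- Binding side -------------------------------------------------------
// Stand-in for a script-side callable (e.g. a pointer to a Python object).
using ScriptCallable = std::function<void(int)>;

// One fixed, statically compiled receiver serves every registration: the
// per-callback state travels in user_data, so no trampoline code has to
// be generated at run time.
static void receiver(int event, void* user_data) {
    assert(user_data != nullptr);
    (*static_cast<ScriptCallable*>(user_data))(event);
}
```

This relies on the C API providing a user-data slot. For callback signatures without one (the standard `qsort` comparator is a classic example), the FFI has nowhere to stash the target, which is where the trampolines malkia mentions come in. The binding also has to keep the script-side object alive for as long as the library may still call it.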

~~~
malkia
I was talking about FFIs in general, like Common Lisp's FFI, LuaJIT, and
others. For example, for callbacks you need a trampoline, where the "C/C++"
trampoline calls back into your language runtime's "engine". But this
introduces a gap in the "stack" and might not work with all VMs. Or you may
run out of trampoline slots and not be able to allocate new ones, since that
would require dynamic code generation, which is not allowed on iOS and
certain game consoles.

~~~
dekhn
There are no trampoline slots required. It's a function pointer that's
provided as an argument (user data). Can you point to an FFI that doesn't
support callbacks in this way? That would be a major problem in the FFI
implementation.

~~~
ihnorton
LuaJIT limits both the number of registered callbacks [1], and re-entering the
interpreter from a JIT'd callback [2] (it will try to detect and prevent JIT
for such functions, but if that fails you get a panic).

[1] http://stackoverflow.com/questions/31042530/why-does-luajit-produce-a-too-many-callbacks-error-from-this-simple-code

[2] http://stackoverflow.com/questions/25924755/c-and-lua-unprotected-error-bad-callback-how-is-this-possible

~~~
dekhn
That's a limitation of a specific FFI, which is apparently a design choice
motivated by limited-memory systems.

~~~
malkia
Yes, sorry, I'm simply speaking as a user of a given language runtime
connecting to "C" exported functions. I understand that with a different
implementation this would not have been needed.

------
waynecochran
Having trouble grokking this. Can someone give a simple use case / example? If
the Parser generates language-agnostic data, how can this data be passed to
the Matcher, which parses C++ headers (i.e., C++ headers are _not_ language
agnostic)?

A good README should give a "high level" description of what the thing is and
what it's used for.

------
krona
The repository mentions _"other languages"_, but I could only find Python
examples?

While I think wrapper generators are a noble idea, I doubt I would choose this
over the convenience of IDL-like metaprogramming approaches (e.g. pybind11 or
Boost.Python).

~~~
gpshead
Metaprogramming such as pybind11 does is neat. Thanks for the pointer to the
project. Based on what I see in
https://github.com/pybind/pybind11/blob/master/docs/basics.rst, it still looks
like manually written C extension modules to me. Just much shorter code, with
a lot of the hard-to-get-right details taken care of for you. Good! Better
than the status quo. But it doesn't abstract the problem away very much. (No
doubt some will consider that a feature.)

What if I also wanted my wrapper available on a non-CPython VM? The proper way
to use PyPy is not via its fake CPython API support (slow and memory hungry).
Imagine if a PyPy Generator were added to CLIF. It'd use the same interface
definition Parser but generate an entirely different set of code. In PyPy's
case that would probably be a C library wrapping the C++, for generated
cffi-based Python code to interact with.

Admittedly all hypothetical here until other Python VMs have CLIF Generator
implementations.

~~~
ihnorton
Semi-related, in case you haven't seen it:

http://doc.pypy.org/en/latest/cppyy.html

~~~
gpshead
Thanks! Eek! An XML interface definition. In 2017!

------
santaclaus
Can this wrap Qt (or barring Qt, wxWidgets)? There are great Qt bindings in
Python, but in other languages, not so much.

------
ddobrev
Fellows, I would suggest you check https://github.com/mono/CppSharp. It's
Clang-based as well and, despite the name, it's not bound to C# or .NET;
generators for any language can be added. It's feature-complete with the
exception of templates, which are being worked on as we speak. It's also fully
automated; manual intervention is only required if the user wants
binding-specific customisation.

------
nerdponx
How much C++ do I need to know in order to wrap someone else's C++ library?

~~~
mhh__
Quite a lot, or at least by proxy, given that you would have to talk to
libraries and compilers that are written in C++.

However: Some libraries/packages do expose a C API, which you could use from a
language, assuming it had a decent FFI.

A lot of major C++ projects use exceptions, which you will have to catch and
handle inside the C++ code, because even if C could catch exceptions, there is
no guarantee of ABI compatibility. Some languages (I think D can do it) can
catch an exception thrown in C++ code.
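A minimal sketch of catching C++ exceptions at the boundary, assuming a hypothetical throwing function `checked_divide` and a hypothetical C-callable wrapper `divide_c`. Every exception is caught inside C++ and turned into an error code plus a per-thread message that any C FFI can read.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical C++ library function that reports errors by throwing.
double checked_divide(double a, double b) {
    if (b == 0.0) throw std::invalid_argument("division by zero");
    return a / b;
}

// Error codes the C boundary exposes instead of exceptions.
enum { ERR_OK = 0, ERR_INVALID_ARGUMENT = 1, ERR_UNKNOWN = 2 };

// Last error message, kept per thread, for callers that want details.
static thread_local std::string g_last_error;

extern "C" const char* last_error_message(void) { return g_last_error.c_str(); }

// C-callable wrapper. No exception may escape: C has no notion of
// exceptions, and whether unwinding can cross foreign frames depends on
// the compiler and ABI, so it is not portable.
extern "C" int divide_c(double a, double b, double* result) {
    assert(result != nullptr);
    try {
        *result = checked_divide(a, b);
        return ERR_OK;
    } catch (const std::invalid_argument& e) {
        g_last_error = e.what();
        return ERR_INVALID_ARGUMENT;
    } catch (...) {
        g_last_error = "unknown C++ exception";
        return ERR_UNKNOWN;
    }
}
```

On the other side, a binding would check the return code and raise its own language's exception (for example a Python `ValueError`) with the text from `last_error_message()`.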

You might have better luck integrating a standard data interface between the
C++ and any other language; however, that could be a little slow.

------
cjhanks
I like this idea; I will try it soon. Is there any plan for somewhat
transparently supporting NumPy?

~~~
mish33
NumPy support is a borderline issue for CLIF.

The typical usage I have seen is the following: C++ has some class(es) that
are best represented as NumPy arrays. Because those classes are specific to
the project (not generic like std:: types), they don't fit into the CLIF
runtime. Instead, the project supplies a C++ library with custom conversion
functions, as described in ext.md, that use the NumPy C API and tell CLIF that
those classes are convertible to Python objects (which NumPy objects are).

That is all the NumPy integration CLIF needs, and it has to come from the user.

