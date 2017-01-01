The second is that many semantics just don't map very easily. I imagine the latter problem has only gotten worse with heavier use of templated types in APIs. It looks like CLIF doesn't expose the template per se (at least in the Python examples) but requires explicit instantiation of each specialization as a distinct Python class. Templates mess with the very concept of an FFI generator, which tends to assume a reasonably clear separation between compile time and run time (so you don't need to run a C++ compiler as a pre-pass on any code that uses a binding).
All that is a roundabout way of saying: for god's sake write your system APIs in C.
Once you have a little C++ code, the easiest thing to do is to continue writing more C++ code, even if it's not the best tool for that job. It's too hard to use a C++ interface from any other language. Even wrapping it in C is annoying, although as I understand LLVM does exactly that for some of its API.
IMO there are a lot of systems where C++ is the best language for about 10% of the code. C++ really is unique in terms of offering zero-cost abstraction. But then the remaining 90% gets written in C++ too. It can be remarkably awkward for many problems, and your build times scale nonlinearly too.
In some way this is inherent ... the compiler's job is to erase all those abstractions and generate straightforward machine code. In other ways it is the fault/result of adhering to the C linking model, which ironically was for interoperability.
C++ is sort of like a universal receiver and not a universal donor. It can assimilate any C code, and thus it gets code in other languages transitively. But other languages can't assimilate it, at least not without great effort.
That said, Clang has a better API than GCC-XML, so the problem might be solved. Although honestly C++ just keeps growing more features with C++11, 14, and 17 that make it harder to interoperate with any other code. It's this huge compile-time language completely separate from the C linking model.
D has the best support for this. It's not as simple as `#include <cppheader.hpp>` but it's better than any other language at it. Name mangling matches, C++ exception support and C++ abstract classes can be declared as D interfaces and everything works.
> C++ really is unique in terms of offering zero-cost abstraction
Rust and D (at the very least) would disagree.
As far as zero cost, my understanding is that D is still trying to get rid of GC from the standard library.
And I'm explaining why I think Rust is having a hard time getting adoption. It makes an effort to be compatible with C, but not with C++. But really C++ is its "competitor", not C.
And C++ has the network / lock-in effect I described. All languages have a network effect to some degree (libraries, documentation), but I think the situation with C++ is especially acute.
Also, I would say that safety and zero-cost abstractions are the goal of Rust. Those two goals conflict somewhat -- e.g. in the decision over bounds checking. Although I guess you can say that without bounds checking you have no abstraction; you just have a pile of buggy code at zero cost :)
http://erlang.org/doc/tutorial/nif.html
This is part of the language's design.
Bjarne's goal after being forced to use BCPL instead of Simula was never to use a bare bones language ever again, and C with Classes needed to fit into AT&T's C tooling.
Hence why we also didn't get modules back then, and got a funky name mangling to fit into those bare UNIX linkers.
As for being a universal donor, the story is a bit different in OSes where the ABI is not C based, e.g. BeOS, Symbian, Genode, Windows (COM, .NET, UWP), OS/400 (TIMI), z/OS (ILC).
But I'm not sure the goal was to make C++ hard to use from other languages. I think that just fell out of the focus on zero-cost abstractions.
COM seems like the right middle ground between baroque C++ interfaces and RPC/message passing. You write native code, but it can interoperate dynamically with components in other languages, in the same address space. But I think it is overly tied to OOP, and that doesn't play well with the style of Unix.
I wonder if it would be possible to do better, or if that ship has sailed. I'm not overly familiar with Windows... I know there were some problems with COM but it seemed basically sound. I used JScript once and it was pretty powerful.
OS/2 SOM was better in that it supported implementation inheritance and meta-classes.
Apple also had some nice ideas for Copland and how to further develop Taligent, but there is little documentation left of that effort.
Or at least expose a C-compatible ABI.
Much better and we get to use a modern OO API.
Hence why better Windows support is relevant for Rust, if you want it to be more seriously taken by Windows devs.
EDIT: Same applies to mainframes, which also don't follow a C ABI, rather their own native languages.
(edit) Clang's internals are also several orders of magnitude easier to work with than GCC's internals. In fact, clang is designed to be used as a library, and CLIF leverages that very, very highly.
See, for example, https://clang.llvm.org/docs/ClangTools.html . Clang tools are very powerful, and much of CLIF is based on a Clang tool, not a GCC internals hack.
I know that LLVM is notable for not maintaining API stability -- not sure about Clang.
That API doesn't expose absolutely everything though, which is one reason it can be so stable. Tools that need more are typically shipped and built alongside the clang source code. CLIF is the latter.
You can read about both of them here:
http://clang.llvm.org/docs/Tooling.html
And CLIF takes care to error out when the wrapper description doesn't match the C++. "When in doubt, refuse the temptation to guess."
Our internal clients really like it. Someone described it the other day as "magic".
It can mean something so beautifully advanced that you cannot distinguish it from advanced tech and you really don't care how it works, since it works so flawlessly you wouldn't bother looking behind the scenes. I.e. praise.
Or it can mean something that uses some arcane unearthly constructs and at the moment you need to look behind the scenes you find yourself utterly lost. I.e. a critique.
At least if you weren't in the room as well. I'm assuming you weren't?
I understand why SWIG needs a .i file, as it doesn't understand much about the code. But when you control the compiler you can (1), as in the example I gave, look at actual explicit instantiations instead of doing the synthetic thing SWIG does, as well as make other, smarter decisions, and (2) use #pragmas or attributes to direct translation at the point of use.
And the author even said himself that "code two files is a pain."
Thus my question remains: why the need for a .clif file?
Structured comments could work, but because the C++ compiler doesn't parse them, you essentially have a file-within-a-file, and you keep 80% of the problems you encounter with the dual-file system.
You also have the problem of the API's client trying to understand what the API looks like. We don't want to make them read C++ code in any way, and a pyclif file looks a lot like Python.
Finally, CLIF is substantially more terse than SWIG--on the order of 1/10th the number of lines. This makes it less of a big deal to have a separate file.
No. Use an IDL to define component interfaces and a generator for any particular language. That's how C++ components interoperate on Windows. (COM is just a formalization of vtables generated from an IDL file and compiled with midl compiler.)
Interesting stat: Around 80% of the Python C++ extension module wrappings being added to our code base are now being done using CLIF instead of SWIG. We are actively working on forbidding new Python SWIG wrappers from being added and migrating important legacy wrappings off of it onto CLIF.
Who am I? I am TL of the team that created CLIF (design and code reviewer, _not_ a primary author; they can identify themselves as they see fit). -gps@
It takes time and because we love Python we did it first. Others will come.
C++ and Python are not very similar, and the result of auto-binding them typically gives up a combination of fluency and performance.
A simple example of the former is a C++ algorithm which writes to an output iterator. In Python this might be expressed as a generator. But none of the auto-binding tools can do this transformation.
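To make the output-iterator point concrete, here is a rough sketch of what a hand-written layer could do. Everything in it is an assumption for illustration: `fill_squares` is a hypothetical stand-in for a C++ algorithm that pushes results through a callback, and `as_generator` is just one possible adapter (a worker thread plus a bounded queue), not something CLIF or SWIG provides.

```python
import queue
import threading


def fill_squares(n, emit):
    """Hypothetical stand-in for a C++ algorithm writing to an output iterator."""
    for i in range(n):
        emit(i * i)


_DONE = object()  # sentinel marking the end of the stream


def as_generator(producer, *args, maxsize=16):
    """Run a push-style producer in a worker thread and yield its values lazily."""
    q = queue.Queue(maxsize=maxsize)

    def run():
        try:
            producer(*args, q.put)  # each emitted value is pushed onto the queue
        finally:
            q.put(_DONE)  # always signal completion, even if the producer fails

    threading.Thread(target=run, daemon=True).start()
    while True:
        item = q.get()
        if item is _DONE:
            return
        yield item


squares = list(as_generator(fill_squares, 5))
```

This sketch does not propagate exceptions from the producer, and a consumer that abandons the generator early leaves the worker thread blocked on a full queue. A real binding would have to decide how to handle both.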
As for the latter--performance--I have seen 100x performance penalties in practice when the blind lead the bind. A good example is that in Python we have memoryview and the buffer protocol, but auto-binding tools take these no further than std::vector (if that). A C++ API which produces a large stream of numeric data just begs to be bound using the buffer protocol or perhaps even NumPy directly. But if a C++ API produces small values really quickly, an auto-bound one will produce the same small values really slowly.
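For anyone who hasn't used the buffer protocol, a small sketch of the difference using only the standard library. `produce_samples` is a hypothetical stand-in for a C++ API returning a contiguous block of doubles; a real binding would expose the C++ memory directly rather than build an `array`.

```python
from array import array


def produce_samples(n):
    """Hypothetical stand-in for a C++ API returning a contiguous buffer of doubles."""
    return array("d", (i * 0.5 for i in range(n)))


samples = produce_samples(1000)

# Zero-copy view over the same memory via the buffer protocol.
view = memoryview(samples)
window = view[100:200]   # slicing a memoryview does not copy the data
window[0] = -1.0         # writes through to the underlying array

# What a naive binding effectively does: box every element into a Python object.
boxed = list(samples)
```

The `memoryview` path hands over a pointer, a format and a length. The `list` path allocates one Python float per element, which is where the large slowdowns on bulk numeric data tend to come from.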
Some people see the writing of bindings as manual labor to be avoided. I see it as an optimization opportunity. There are huge gains available.
Roughly the same experience. Automatic generation for anything more than trivial examples always seems to lead to the need for more and more configuration to keep everything in line, to the point where just writing everything by hand becomes less work. This is likely due to the type of software we use it for, but still, the amount of time I lost on SWIG really seems wasted instead of leaving the feeling of having learnt at least something. It was all not very pleasant. The Python side of our software gets exposed to less tech-savvy users and is meant to be a more friendly layer over C++ functions and classes. But the C++ side isn't really an API, in the sense that there is no hard distinction between the 'internal' layer and the API layer. Some class which is only used internally might the next day also be wrapped to be available in Python. So there's no single directory or so one can point to and say 'everything in there is the API'. Also, given a C++ class, sometimes only a couple of methods have to get exposed to the users, sometimes even with a different name, or with some arguments defaulted, etc. Preferably without having to write a wrapper just for the sake of exposing it. All this turned out to be a nightmare in SWIG. Just manually writing one or more lines for registering the function (not Boost.Python but similar) is usually a one-time-almost-never-look-back thing which, for us, is way less work in the end. Even with the hundreds of functions we have already. And of course performance is the other advantage.
[1] https://github.com/micropython/micropython
But a "there be dragons" caveat applies as getting the CPython API correct is complicated. Reference counting and error checking bugs are common in hand written CPython C API code.
For most cross language bindings the primary goal is to "just work reliably". Optimization can happen later after you have profiles to figure out where it is worth doing and in what way.
$LANG (e.g. Python) <-> JSON <-> Pipe <-> JSON <-> C++
Python <-> ctypes <-> C-API <-> C++
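A minimal sketch of the ctypes end of that second chain. It calls `strlen` from the system C library as a stand-in for a C API wrapping C++; the assumption is a POSIX system where the C library can be loaded this way. A real project would load its own shared library instead.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. find_library may return None, in which case
# CDLL(None) falls back to the symbols already loaded into the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature explicitly so ctypes converts arguments and the
# return value correctly instead of assuming int everywhere.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"hello")
```

The `argtypes`/`restype` declarations are the part you end up maintaining by hand for every exported function, which is the cost of skipping a generator.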
Not to say SWIG isn't an amazing effort, but it's a whole C++ compiler maintained by a very small team.
+ I already put my time in at the gcc salt mines so no, I didn't try doing this myself.
[0] Actually, there is one tiny exception: CLIF occasionally separates a C++ qualified name on the scope-resolution operator "::".
SWIG should be called WIG. And anyway, I can't see it surviving modern C++.
As long as I'm ranting, what is modern C++ but a bunch of new languages that are incompatible with C++, each other, and everything else? Forgive me if I'm wrong but this seems like a terrible idea.
Many of the idioms were actually already possible back in the C++ARM days, before C++98 was a thing, but were spoiled by C refugees.
Even Fortran and Cobol(!) have evolved more than C.
- Calling a native function that itself takes a callback, and that callback might be your non-native code. (trampolines?)
- Dealing with memory allocation. Who owns what.
- Exceptions or longjmp
- Green threads, fibers, etc.
- Other?
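For the callback item in the list above, a sketch of how one FFI handles it: ctypes builds a C-callable trampoline around a Python function, which is then passed to `qsort` from the C library. This assumes a POSIX system and closely follows the well-known `qsort` example pattern; the names `py_cmp` and `cmp_cb` are made up for illustration.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system where the C library can be loaded this way.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# C signature of the comparator: int (*)(const int *, const int *)
CMPFUNC = ctypes.CFUNCTYPE(
    ctypes.c_int, ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_int)
)


def py_cmp(a, b):
    """Python comparator invoked from native code through the trampoline."""
    return (a[0] > b[0]) - (a[0] < b[0])


libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, CMPFUNC]
libc.qsort.restype = None

values = (ctypes.c_int * 5)(5, 1, 7, 33, 99)
cmp_cb = CMPFUNC(py_cmp)  # keep a reference so the trampoline is not collected
libc.qsort(values, len(values), ctypes.sizeof(ctypes.c_int), cmp_cb)
sorted_values = list(values)
```

The ownership item shows up here too: if `cmp_cb` were garbage collected while native code still held the pointer, the call would crash.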
- int64 -> uint64 conversion
- int64/uint64 -> double conversion
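Both of those conversions are easy to demonstrate from Python. This is only an illustration of the arithmetic, not of any particular FFI's behaviour.

```python
import ctypes

# int64 -> uint64: the bit pattern is reinterpreted, so -1 becomes 2**64 - 1.
as_unsigned = ctypes.c_uint64(-1).value

# int64/uint64 -> double: a double has a 53-bit significand, so integers
# above 2**53 cannot all be represented and get rounded.
big = 2**53 + 1
round_tripped = int(float(big))
lost_precision = round_tripped != big
```

Any runtime whose only number type is a double hits the second case as soon as a 64-bit handle or counter crosses the boundary.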
[1] http://stackoverflow.com/questions/31042530/why-does-luajit-...
[2] http://stackoverflow.com/questions/25924755/c-and-lua-unprot...
A good README should give a "high level" description of what the thing is and what it's used for.
While I think wrapper generators are a noble idea, I doubt I would choose this over the convenience of IDL-like metaprogramming approaches (e.g. pybind11, or Boost.Python)
What if I also wanted my wrapper available on a non CPython VM? The proper way to use PyPy is not via its fake CPython API support (slow and memory hungry). Imagine if a PyPy Generator were added to CLIF. It'd use the same interface definition Parser but generate an entirely different set of code. In PyPy's case that would probably be a C library wrapping the C++ for generated cffi based Python code to interact with.
Admittedly all hypothetical here until other Python VMs have CLIF Generator implementations.
http://doc.pypy.org/en/latest/cppyy.html
I think the author of Swig shares this sentiment: (http://code.activestate.com/lists/python-dev/109281/).
Typical usage I saw is the following:
C++ has some class(es) that are best represented as NumPy arrays.
Those classes are specific to the project (not generic like std::), so they don't fit into the CLIF runtime. Instead, the project supplies a C++ library with custom conversion functions, as described in ext.md, that use the NumPy C API and tell CLIF that those classes are convertible to Python objects (which is what NumPy objects are).
That's all the NumPy integration CLIF needs, and it has to come from the user.
However: Some libraries/packages do expose a C API, which you could use in a language assuming it had a decent FFI.
A lot of major C++ projects use exceptions, which you will have to catch and handle inside the C++, because even if C could catch exceptions, there is no guarantee of ABI compatibility. Some languages (D, I think) can catch an exception thrown in C++ code.
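The usual consequence is that the C layer reports failures as status codes and the host language turns them back into its own exceptions. Below is a sketch of the host-language half only, using `chdir` from the C library as a stand-in for a wrapped function that returns -1 on failure. It assumes a POSIX system, and the C++ half (catching at the `extern "C"` boundary) is not shown.

```python
import ctypes
import ctypes.util
import errno
import os

# Assumption: a POSIX system. use_errno=True lets ctypes capture errno per call.
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)


def raise_on_failure(result, func, arguments):
    """errcheck hook: translate a C-style -1 return into a Python exception."""
    if result == -1:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return result


libc.chdir.argtypes = [ctypes.c_char_p]
libc.chdir.restype = ctypes.c_int
libc.chdir.errcheck = raise_on_failure

try:
    libc.chdir(b"/no/such/directory/for/this/example")
    failed = False
    is_missing = False
except OSError as exc:
    failed = True
    is_missing = exc.errno == errno.ENOENT
```

Nothing crosses the boundary except an integer, so there is no dependence on how either side implements exceptions.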
You might have better luck putting a standard data interface between the C++ and any other language. However, that could be a little slow.
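One way to read "standard data interface" is newline-delimited JSON over a pipe. A sketch under that assumption follows. The child process here is a small Python script standing in for the C++ program, purely so the example is self-contained; the message fields (`id`, `values`, `sum`) are invented.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for the C++ side: reads one JSON request per line on
# stdin and writes one JSON reply per line on stdout.
CHILD_SOURCE = """
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    reply = {"id": request["id"], "sum": sum(request["values"])}
    sys.stdout.write(json.dumps(reply) + "\\n")
    sys.stdout.flush()
"""


def call_over_pipe(requests):
    """Send requests as JSON lines to the child and parse its JSON-line replies."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    payload = "".join(json.dumps(r) + "\n" for r in requests)
    out, _ = child.communicate(payload)  # closes stdin, waits for the child
    return [json.loads(line) for line in out.splitlines()]


replies = call_over_pipe([{"id": 1, "values": [1, 2, 3]}, {"id": 2, "values": []}])
```

Every call pays for serialization, a context switch and parsing on both sides, which is the slowness mentioned above. In exchange, there are no ABI concerns at all.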
