
Pybind11 — Seamless operability between C++11 and Python - tony
https://github.com/pybind/pybind11
======
aldanor
Some may find it useful: I wrote a Jupyter notebook extension for pybind11 a
while ago, so you can write C++ in the notebook and it gets automagically
compiled and imported into the kernel (plus you get C++ syntax highlighting
and a few other goodies):

[https://github.com/aldanor/ipybind](https://github.com/aldanor/ipybind)

(still not on pypi but the plan is to release it soon)

------
KKKKkkkk1
I've been using boost::python, and the experience is what I'd call cargo-cult
programming. You copy and paste stuff off the tutorial, and when you get a
compiler error, you just try other stuff at random, because the inner workings
of the library are inscrutable. I vastly prefer cython to this. Is Pybind11
any better?

~~~
autopoiesis
Pybind is derived from boost::python, so it's not much different. I wrote a
large Python binding in the past for a moving C++ library target, using
boost::python, and keeping the C++ binding code up to date was a nightmare
akin to maintaining a fork of any fast-moving project.

Try cppyy [1]. It's very nice, though quite fresh. It's used extensively (as
far as I can tell) at CERN, and derives from their cling C++ JIT interpreter.
Plus it does lots of nice automatic transformations to make a C++ API more
pythonic.

[1]
[http://cppyy.readthedocs.io/en/latest/index.html](http://cppyy.readthedocs.io/en/latest/index.html)

~~~
wjakob
Not quite -- while pybind11 was originally inspired by boost::python, it is an
entirely different project that is intended to address many of its problems.

~~~
autopoiesis
Indeed; I was using "derived" somewhat loosely, but I think there is certainly
a visible lineage there. That said, pybind11 is much nicer than boost::python
(and I did evaluate it for the future of the project I mentioned above).

However, the very nice feature of cppyy is that it does much (most?) of what
pybind11 does, but it can also be completely on-the-fly, in the sense that it
relies on the cling JIT interpreter. This means that there is absolutely no
need to maintain a compiled C++ part for your bindings, and so the problem of
keeping the interface up to date is greatly mitigated: the equivalent changes
to match the interface when using cppyy are _much_ smaller.

Often, one maintains a "C++ interface" layer in one's Python bindings (which
could be created by pybind11, boost::python, or SWIG, for example), with a
pure Python shim layer on top of that. cppyy allows you to do away with this
two-layer structure entirely; all you need is the shim, if you need anything
at all.

~~~
aldanor
The catch with cling and the derived libraries is that you have to download a
whole bunch [1] of CERN stuff and then build a customized LLVM as part of the
build process. That's a bit too heavy for the nice reflection- and REPL-like
features that you gain.

    
    
        [1] https://github.com/antocuni/cppyy-backend/blob/master/create_src_directory.py
    

Boost.Python is better in that regard since you "just" have to build boost; on
some platforms, you can just snatch that via a package manager; that being
said, you still need to build it. SWIG, aside from being ugly, requires an
extra build step.

> This means that there is absolutely no need to maintain a compiled C++ part
> for your bindings, and so the problem of keeping the interface up to date is
> greatly mitigated

I tend to disagree. I would never consider the raw (swig or cling) 1-to-1
bindings of C++ code satisfactory for end-user use in Python. Ideally (in my
subjective opinion and previous experience) Python-side bindings would closely
mirror the C++ API, to the point where downstream code in either language
looks very similar, but they don't reference any C++ stuff, be it vectors, or
maps, or template arguments, or anything else. This implies you would have to
maintain a set of higher level bindings on top of swig/cling ones anyway --
and _these_ are the ones that'll break as the code evolves and that you'll
have to maintain manually. As such, I'd rather maintain one set of bindings
than two.
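
That split can be sketched as follows (all names here are hypothetical:
`_RawMatcher` and `_VectorProxy` stand in for what a generated 1-to-1
SWIG/cling binding might expose, and `Matcher` is the hand-maintained
higher-level layer):

```python
# Hypothetical stand-ins for a 1-to-1 generated binding: raw C++-style
# proxies that leak containers like std::vector into Python.
class _VectorProxy:
    """Mimics a generated std::vector<std::string> wrapper."""
    def __init__(self, items):
        self._items = list(items)

    def size(self):
        return len(self._items)

    def at(self, i):
        return self._items[i]


class _RawMatcher:
    """Mimics the raw generated class for a C++ `Matcher`."""
    def find_all(self, text):
        # A real binding would call into C++ here.
        return _VectorProxy(w for w in text.split() if w.isupper())


# The higher-level, hand-maintained shim: hides every C++-ism, and is the
# layer that must track the C++ API as it evolves.
class Matcher:
    def __init__(self):
        self._impl = _RawMatcher()

    def find_all(self, text):
        v = self._impl.find_all(text)
        return [v.at(i) for i in range(v.size())]  # plain list, no proxy
```

Downstream Python code then sees only lists and strings, mirroring the shape
of the C++ API without referencing any of its container types.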

~~~
autopoiesis
You're right about the CERN stuff, though recent efforts seem to have been
made to split at least some of that out, and I hope that continues. I think,
aside from anything else, that cling is a really cool project, and if it could
be made more widely available, that would be great.

I was being quite literal when I wrote "no need to maintain a compiled C++
part": of course, you probably do want to maintain /some/ extra layer! And, in
that sense, I do think that cppyy lets you maintain just one set of bindings
(not two); and, as in pybind11, the ultimate aim is to transparently translate
any "vectors, or maps, or template arguments" into idiomatic Python: this is
why cppyy has a 'Pythonization' API.

Perhaps there are just two slightly different niches: cppyy is good when you
need a more interactive interface to C++, for prototyping or exploration (for
example), because of its JIT nature; and pybind11 is good for building
something more static in the longer term, where you don't mind the cost of
keeping the compiled part up to date with the relevant C++ API.

It's certainly an interesting space at the moment, and I do hope both projects
keep the momentum up and keep innovating!

~~~
wlav
I'm the author of cppyy and was just made aware of this thread.

The big dependency is LLVM, not CERN code any more (there's some left, but it
takes nowhere near the disk space or compilation time that the patched version
of LLVM does). The CERN code exists because LLVM's APIs are all lookup-based;
the bit of leftover code merely turns that into enumerable data structures.
Once pre-compiled modules can be deployed, everything can be lookup-based,
which will also greatly reduce the memory footprint by making everything lazy.

It is hard to trim down the source of LLVM, but trimming the binary is easier
to achieve, and that's what I'm working on right now. The end result should be
a binary wheel of less than 50MB that is usable across all Python interpreters
on your system and would be updated something like twice a year. Since that
gets it down to a level where even an average phone won't blink, pushing it
beyond that leads to vastly diminishing returns, and I'll leave it at that
unless a compelling use case comes along.

That said, there is an alternative: on the developer side, you can use cppyy
to generate code for CFFI
([http://cffi.readthedocs.io/en/latest/](http://cffi.readthedocs.io/en/latest/)).
The upshot is that LLVM only has to live on the developer machine and would
not be part of any deployed package. Of course, without LLVM, you have to do
without such goodies as automatic template instantiation.

Finally, note that cppyy was never designed with the same use case as e.g.
pybind11 in mind. Tools like that (and SWIG, etc.) are for developers who want
to provide python bindings to their C++ package. The original idea behind
cppyy (going back to 2001) was to allow python programmers who live in a C++
world access to those C++ packages without having to touch C++ directly (or
waiting for the C++ developers to come around and provide bindings). Hence the
emphasis on 100% automation (with the limitations that come with that). The
reflection technology already existed for I/O, and by piggy-backing on top of
it, at the very least a python programmer could access all experimental data
and the most important framework and analysis classes out-of-the-box, with
zero effort.

------
ivan_ah
Other projects in that space are cython (where you need to do the wrapping
manually), xdress, SWIG, SIP, and clif.

Does anyone have experience with using these and able to compare/contrast?

I'm afraid of C++, but if I can wrap it and learn how to use it from a Python
REPL, I think I can handle it... any recommendations/tutorials/howtos would be
much appreciated. (The library I want to wrap is
[https://github.com/openzim/libzim](https://github.com/openzim/libzim) )

~~~
ThePhysicist
I built bindings for various C++ libraries in SWIG (not only for Python but
also Lua and Ruby) and I was always very impressed with the result. What's
amazing about SWIG is its ability to easily port very complex and rich code. I
successfully exported very complex class hierarchies and code relying heavily
on pointers and STL to Python and/or Lua. The process consists of writing
several header-like configuration files that SWIG ingests and uses to generate
bindings, which is often pretty straightforward. Often you can simply import
the normal C/C++ header files in those configs and add some glue code / extra
hints to help SWIG in cases where it can't figure out what to do with a
particular type.

In my experience, the most difficult aspect of binding generation is not
writing the glue code, but doing so in a scalable way. Manually writing
bindings for any real-world C++ codebase would therefore be extremely tedious
(IMHO), so having an automated system that does this work for you is a huge
time saver.

Btw, some larger libraries/frameworks have their own wrapping generators:
PyQt (which provides extremely good bindings to Qt), for example, uses SIP,
which is worth looking at (it's open source).

~~~
brational
Have you used pybind? I've used SWIG extensively so I'm familiar with that -
you talked about SWIG but not how it compares to this new one.

Reading through some docs, it looks like pybind is better suited to a Python
codebase that occasionally needs C++ pieces added, since you have to set up
the C++ code to handle the Python side.

With SWIG, I was always impressed by exactly the point you make. You could
sometimes just make one big header file in a facade pattern to expose some
"start" buttons needed to run a huge C++ application, and nothing else needed
to be set up. No going back and forth.

An example usage here was a single heavy-duty image processing application,
with Python as the distributor and work manager. The C++ code was a "worker",
and Python and ZeroMQ did everything else. So the only "connection" we needed
was a start button and some metadata passed to the C++ side. Maybe one header
file with 100-200 lines of code to make the bridge for SWIG.

(To anyone else reading:) does doing the same sound trivially simple with
pybind?

~~~
ThePhysicist
I haven't used pybind yet, so unfortunately I can't compare the two. From how
I understand the docs, it seems you would have to write the interface
definitions in C++ and the compiler would generate the bindings for you. That
sounds like a nice approach, though it could be a bit more cumbersome, as
(again, in my understanding) it forces you to write and adapt all bindings by
hand.

~~~
brational
>could be a bit more cumbersome though as (again, in my understanding) forces
you to write and adapt all bindings by hand.

That was my interpretation as well.

------
nmalaguti
At what point should I start thinking about rewriting computationally
expensive parts of my Python application in C++ and making bindings?

Does anyone have experience taking one expensive method and replacing it with
C/C++? Were the trade offs worth it?

~~~
sametmax
First, try to optimize your Python. It's surprising what you can do with it.
E.g.: slice assignment is crazy fast, removing calls and lookups goes a long
way, using built-ins + generators + @lru_cache + slicing wins a lot, etc.
Also, Python 3.6 is faster, so upgrading is nice.
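
A minimal sketch of a few of those tips (standard library only):

```python
from functools import lru_cache

# @lru_cache memoizes a pure function: repeat calls become cache lookups
# instead of recomputing the whole recursion tree.
@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

# Slice assignment mutates the list in one C-level operation,
# no Python-level loop needed.
data = list(range(10))
data[::2] = [0] * 5  # zero out every other element

# Built-ins consuming generators keep the hot loop in C.
total = sum(x * x for x in range(1000))
```

Without the cache, `fib(30)` makes over a million calls; with it, thirty-one.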

Then, you try pypy. It may very well run your code 10 times faster with no
work on your part.

If you can't or the result is not as good as expected, you can start using
EXISTING compiled extensions. numpy, uvloop, ujson, etc.

After that, and only then, should you think about a rewrite. Numba for
number-related code, cython for classic hot paths, or Nuitka for the entire
app: they can turn Python code into compiled code, and will prevent a rewrite
in a low-level tech.

If all of that failed, congratulations, you are part of the 0.00001%.

So rewriting bottlenecks in a compiled language is a good option. C or C++
will do. But remember you can do it in Rust too!

~~~
yig
I love Python, but this is one of the pain points: working through a sequence
of domain-specific languages when you could have just written it in a fast one
to begin with (e.g. Julia or C++).

~~~
sametmax
You usually don't, that's the point.

You may, late in the project, rewrite parts of your code in a DSL.

With C++, by contrast, you start from the beginning with a handicap for the
whole project.

~~~
sqeaky
The reason so few projects are rewritten in C/C++ is that many people know up
front that their project will require that performance and just start there.

If you are building a high end 3d video game with anything like current fancy
graphics no amount of python or ruby is going to make it work. You must start
with C or C++ to make effective use of modern hardware (even using the C#
unity provides leaves a lot of performance on the table).

If you are building a system designed to be faster than some other
well-defined system, then starting with C or C++ is a good idea. If your Java
or C# system could handle 1 million transactions a second, you might be able
to complete 1.5 million/s with C++.

Some projects never need that level of performance, and building those
projects in C++ can cost you some time. Most websites are in that vein: how
many hits a day does a typical website get? Only a few of the biggest
retailers and search engines need that level of performance.

That time cost is also shrinking, though not as fast as I would like. C++11,
14, and 17 each took chunks off development time by polishing some of the
sharp corners of the language. Memory leaks are harder to make. Threads and
time are easier to work with. Error messages are better than ever.

There is still progress to make. Every C++ project still needs some time
dedicated to configuring the build system. There needs to be some plan for
checking for memory issues, there needs to be... I think C++ will continue to
get more Rust-like and Rust will continue to grow in popularity and
performance. Eventually, I think Rust or something like it will be the
preferred high performance language.

------
teajunky
Has anyone looked at
[https://pypi.python.org/pypi/cppyy](https://pypi.python.org/pypi/cppyy) ?
This seems to be even easier than pybind11

------
fulafel
I like how the first basic example invokes undefined behaviour without
mentioning it:

    
    
      int add(int i, int j) {
          return i + j;
      }

~~~
devrandomguy
Looks fine to me. Are you supposed to check for integer overflow before doing
the addition, or something?

~~~
pjmlp
Yep,

    
    
        add(INT_MAX, 1)
    

Is undefined.

If using Ada, similar code would produce "raised CONSTRAINT_ERROR :
program.adb:7 overflow check failed".

To be fair, Ada's behavior is easy to reproduce with a checked integer class
like
[https://accu.org/index.php/journals/324](https://accu.org/index.php/journals/324)
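
The contrast can be sketched from Python (an illustration, not the exact C++
semantics: the C++ standard leaves signed overflow undefined, and
two's-complement wraparound is merely what most hardware produces in
practice; the checked version mimics Ada's constraint check):

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def wrapping_add_i32(i, j):
    """Two's-complement wraparound: what add(INT_MAX, 1) typically yields,
    even though the C++ standard calls it undefined."""
    r = (i + j) & 0xFFFFFFFF           # reduce modulo 2**32
    return r - 2**32 if r > INT32_MAX else r

def checked_add_i32(i, j):
    """Ada-style checked add: fail loudly instead of overflowing silently."""
    r = i + j                          # Python ints are arbitrary precision
    if not INT32_MIN <= r <= INT32_MAX:
        raise OverflowError("overflow check failed")
    return r
```

The checked-integer class from the ACCU article linked above does the same
thing on the C++ side, at the cost of an explicit wrapper type.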

~~~
coldtea
So?

There are millions of lines of code, from the best C/C++ programmers, even in
the kernel, that add two ints in all kinds of programs.

Why is this suddenly a valid concern, especially for a code example, and not
NASA's missile code or Tesla's self-driving libs?

~~~
pjmlp
It is not a sudden concern; it has been ignored for as long as C and C++ have
existed, and it only became worse with code exposed to the world via the
Internet.

"Many years later we asked our customers whether they wished us to provide an
option to switch off these checks in the interests of efficiency on production
runs. Unanimously, they urged us not to--they already knew how frequently
subscript errors occur on production runs where failure to detect them could
be disastrous. I note with fear and horror that even in 1980, language
designers and users have not learned this lesson. In any respectable branch of
engineering, failure to observe such elementary precautions would have long
been against the law."

\-- Tony Hoare, "The 1980 ACM Turing Award Lecture"

Millions of people also drive without a seat belt or helmet; apparently those
are a useless extra.

