
Cling: Running C++ in an interpreter - coldgrnd
http://blog.coldflake.com/posts/2012-08-09-On-the-fly-C%2B%2B.html
======
sedachv
To provide more precedents and a little history:

The first C "interpreters" I know of were for Lisp machines: Symbolics' C
compiler
([http://www.bitsavers.org/pdf/symbolics/software/genera_8/Use...](http://www.bitsavers.org/pdf/symbolics/software/genera_8/User_s_Guide_to_Symbolics_C.pdf))
and Scott Burson's (HN user ScottBurson) ZetaC for TI Explorers/LMIs and
Symbolics 3600s (now in the public domain:
<http://www.bitsavers.org/bits/TI/Explorer/zeta-c/>). Neither of them is an
interpreter; they are just "interactive" compilers, like Lisp ones.

I am writing a C to Common Lisp translator right now
(<https://github.com/vsedach/Vacietis>). This is surprisingly easy because C
is largely a small subset of Common Lisp. Pointers are trivial to implement
with closures (Oleg explains how: <http://okmij.org/ftp/Scheme/pointer-as-closure.txt>,
but I discovered the technique independently around 2004). The
only problem is how to deal with casting arrays of integers (or whatever) to
arrays of bytes. But that's a problem for portable C software anyway. I think
I'll also need a little source fudging magic for setjmp/longjmp. Otherwise the
project is now at the point where you can compile-file/load a C file just like
you do a Lisp file, by setting the readtable. There are a few things I need to
finish with #includes, enums, stdlib and the variable-length struct hack, but
that should be done in the next few weeks.
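To make the closure idea concrete, here is a minimal sketch in C++ (used for all the examples added to this thread), assuming a pointer only has to read and write a single location. It follows the general idea in Oleg's note, not Vacietis's actual Lisp implementation; `ClosurePtr`, `address_of` and `swap_through` are hypothetical names.

```cpp
#include <cassert>
#include <functional>

// Sketch of "pointer as closure": a pointer is not an address but a pair of
// closures that read and write the location it refers to.
template <typename T>
struct ClosurePtr {
    std::function<T()> get;      // plays the role of reading *p
    std::function<void(T)> set;  // plays the role of *p = v
};

// The analogue of C's &var: close over the variable itself.
// As with a real C pointer, it dangles if var goes out of scope.
template <typename T>
ClosurePtr<T> address_of(T& var) {
    return ClosurePtr<T>{[&var] { return var; }, [&var](T v) { var = v; }};
}

// C's classic swap(int *a, int *b), written against closure pointers.
template <typename T>
void swap_through(ClosurePtr<T> a, ClosurePtr<T> b) {
    T tmp = a.get();
    a.set(b.get());
    b.set(tmp);
}
```

This covers `&x`, `*p` and `*p = v`, but not `p + 1`: arithmetic needs to know which array a pointer lives in, which is the question raised further down the thread.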

This should also extend to "compiling" C to other languages like JavaScript,
without having to go through the whole "emulate LLVM or MIPS" garbage that
other projects like that do. I think I figured out how to do gotos in
JavaScript by using a trampoline with local CPS-rewriting, which is IMO the
largest challenge for an interoperable C->JS translator.
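A rough sketch of the trampoline idea, written in C++ rather than JavaScript to keep one language across these examples. It assumes the translator splits a function at its labels into blocks that share the function's locals; each `goto` becomes "return the next block", and a driver loop keeps calling whatever block was returned. The C function being translated and the names `Frame`, `Next` and `label_*` are hypothetical, and this is not sedachv's actual translation scheme.

```cpp
#include <cassert>

// Hypothetical C source being translated:
//
//   int sum_to(int n) {
//       int i = 0, s = 0;
//   loop:
//       if (i > n) goto done;
//       s += i; i++;
//       goto loop;
//   done:
//       return s;
//   }

struct Frame { int n, i, s; };   // the C function's locals, shared by all blocks
struct Next;                     // "where to jump next"
using Block = Next (*)(Frame&);  // one labelled basic block
struct Next { Block block; };    // block == nullptr means "the function returned"

Next label_loop(Frame& f);
Next label_done(Frame& f);

Next label_entry(Frame& f) { f.i = 0; f.s = 0; return {label_loop}; }

Next label_loop(Frame& f) {
    if (f.i > f.n) return {label_done};  // goto done;
    f.s += f.i;
    f.i++;
    return {label_loop};                 // goto loop;
}

// return s; (the driver reads f.s once the trampoline stops)
Next label_done(Frame&) { return {nullptr}; }

int sum_to(int n) {
    Frame f{n, 0, 0};
    // The trampoline: each block returns its successor instead of calling it,
    // so the native stack does not grow however many gotos are executed.
    for (Next k{label_entry}; k.block != nullptr; k = k.block(f)) {
    }
    return f.s;
}
```

Returning the continuation instead of calling it is what keeps a long-running `goto` loop from overflowing the stack in languages without guaranteed tail calls, such as JavaScript.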

As to how to do this for C++, don't ask me. According to the CERN people, CINT
has "slightly less than 400,000 lines of code."
(<http://root.cern.ch/drupal/content/cint>). What a joke.

~~~
mahmud
That Oleg link has me nerd-snipped.

What I can't wrap my head around is how one would implement pointer arithmetic
with these closures. C pointers are not just references to cells; those
cells are also guaranteed to be contiguous (up to a certain limit, be it a VM
allocation unit, say a "page", or all available system memory on VM-less systems).

That is to say, C pointers are not like ML references. Along with SET and REF
they also allow addition, subtraction, scaling, etc.

For closures to model C pointers, wouldn't they need to _order_ the allocation
of cells in some manner, say, in a big array? And if so, this could get
expensive very quickly (the worst case being "modeling" of the entire memory,
i.e. emulation) without a certifying compiler or at least some exhaustive
pointer analysis.

Hope I'm wrong on this.

~~~
cwzwarich
The C standard basically only guarantees that pointer arithmetic works when
the pointers involved all point to the same array object (it also allows for a
pointer just off the end of an array object). Other pointer arithmetic or
comparison is undefined behavior and an implementation can do whatever it
pleases.
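One way to reconcile closures with pointer arithmetic under exactly these rules is a "fat pointer" that remembers which array object it belongs to plus an offset. The C++ sketch below illustrates that assumption (the class `ArrayPtr` and the helper `sum` are hypothetical); because arithmetic only moves the index, nothing like a global memory image is needed. It deliberately does not handle reinterpreting an `int` array as bytes, the case sedachv calls the only real problem.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// A pointer modelled as (array object, index). The asserts encode what the
// C standard guarantees: arithmetic within one array, plus one past its end.
template <typename T>
class ArrayPtr {
public:
    explicit ArrayPtr(std::shared_ptr<std::vector<T>> arr, std::ptrdiff_t idx = 0)
        : arr_(std::move(arr)), idx_(idx) {}

    // Dereference: only elements strictly inside the array.
    T& operator*() const {
        assert(idx_ >= 0 && idx_ < size());
        return (*arr_)[static_cast<std::size_t>(idx_)];
    }

    // p + k: may land anywhere from the first element to one past the end.
    ArrayPtr operator+(std::ptrdiff_t k) const {
        std::ptrdiff_t j = idx_ + k;
        assert(j >= 0 && j <= size());
        return ArrayPtr(arr_, j);
    }
    ArrayPtr operator-(std::ptrdiff_t k) const { return *this + (-k); }

    // q - p: only defined when both point into the same array object.
    std::ptrdiff_t operator-(const ArrayPtr& other) const {
        assert(arr_ == other.arr_);
        return idx_ - other.idx_;
    }

    bool operator==(const ArrayPtr& other) const {
        return arr_ == other.arr_ && idx_ == other.idx_;
    }
    bool operator!=(const ArrayPtr& other) const { return !(*this == other); }

private:
    std::ptrdiff_t size() const { return static_cast<std::ptrdiff_t>(arr_->size()); }

    std::shared_ptr<std::vector<T>> arr_;  // which array object
    std::ptrdiff_t idx_;                   // offset within it
};

// The classic C loop: for (p = begin; p != end; p++) s += *p;
template <typename T>
T sum(ArrayPtr<T> begin, ArrayPtr<T> end) {
    T s{};
    for (ArrayPtr<T> p = begin; p != end; p = p + 1) s += *p;
    return s;
}
```

Comparing or subtracting pointers into different arrays trips an assert here, which is one of the things "an implementation can do" with undefined behavior.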

~~~
mahmud
So I implemented this stricter definition of C pointers and it's neither
interesting, nor representative of Real World uses that I know of. Need to
investigate a bit more.

~~~
dkersten
_stricter definition of C pointers_ \- this made me laugh, as the less-strict
definition of C pointers is called _undefined behavior_.

This leads me to this statement based on what you said: _interesting Real
World C programs make use of undefined behavior_

~~~
Arelius
> interesting Real World C programs make use of undefined behavior

I assume you've worked with a significant amount of real world C programs?
Because they surprisingly often do. Much of the difficulty in porting many
programs to 64-bit, for instance, comes from relying on implementation-defined
behavior.
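A common example of the 64-bit issue is code that stored pointers in `int` or `long`. The sketch below assumes a C++11 compiler; `pointer_to_bits`, `bits_to_pointer` and the two `*_holds_pointer` helpers are hypothetical names. Converting a pointer to an integer is implementation-defined, but a round trip through `std::uintptr_t` (an optional type that mainstream platforms provide) gives back the original pointer.

```cpp
#include <cassert>
#include <cstdint>

// Code written for 32-bit targets often stored pointers in int or long.
// On LP64 systems (64-bit Linux, macOS) int stays 32 bits while pointers
// grow to 64; on LLP64 (64-bit Windows) even long stays 32 bits.
// In C++, reinterpret_cast<int>(ptr) is rejected when int is too narrow;
// C compilers typically just warn and truncate.

static_assert(sizeof(std::uintptr_t) >= sizeof(void*), "uintptr_t holds a pointer");

inline std::uintptr_t pointer_to_bits(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

inline const void* bits_to_pointer(std::uintptr_t u) {
    return reinterpret_cast<const void*>(u);
}

// Handy when auditing old code: can this target's int/long hold a pointer?
constexpr bool int_holds_pointer() { return sizeof(int) >= sizeof(void*); }
constexpr bool long_holds_pointer() { return sizeof(long) >= sizeof(void*); }
```

The integer value itself is still implementation-defined; only the round trip back to the same pointer is guaranteed.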

~~~
dkersten
_I assume you've worked with a significant amount of real world C programs?_

Only embedded systems (AVR & PIC24). I have much more experience in C++, which
I've used for both Desktop apps and telco server components.

The beauty of C (over C++) is that the standard is actually readable. C++
especially is a quagmire of undefined behavior. The scary thing about C/C++ is
that it's easy to hit undefined (or, at least, as you state, implementation-
defined) behavior without even realizing it. Often the code looks valid and
does what it looks like it does, yet is actually undefined or implementation-
defined and will break elsewhere.

With that said, while I don't expect everyone to have memorized the standard,
I do hope most would have at least enough familiarity to avoid _most_ cases of
undefined behavior.
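Signed integer overflow is a typical case of code that looks valid and usually "works" while being undefined. A minimal C++ sketch; `can_increment` and `add_would_overflow` are hypothetical helper names showing the well-defined way to write such checks.

```cpp
#include <cassert>
#include <climits>

// Looks fine, but is undefined when x == INT_MAX, because signed overflow is UB:
//     bool can_increment_naive(int x) { return x + 1 > x; }
// An optimizer may assume the overflow never happens and fold it to `true`.

// Well-defined: compare against the limit before doing any arithmetic.
bool can_increment(int x) { return x < INT_MAX; }

// The same idea for a + b: neither INT_MAX - b (b > 0) nor INT_MIN - b (b < 0)
// can overflow, so the check itself stays defined.
bool add_would_overflow(int a, int b) {
    return (b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b);
}
```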

~~~
coldgrnd
Most of what I learned about C++ I got from books, blogs and some great C++
guys (mostly from the boost community). I have to admit I did not read the
standard at all.

Could you give an example of a case where you hit undefined behavior? I can
hardly recall a case where it bit me in the past. (I'm mostly working on
embedded systems (PPC & ARM).)

The cases that come to my mind for C++ all involve initialization...

~~~
dkersten
Before I give you an example, I will define what the term _undefined behavior_
means by quoting the C++03 standard, Section §1.3.12 (the C standard, which
unlike the horribly ginormous C++ standard is readable, says nearly the same
thing in C99 §3.4.3):

    
    
        behaviour, such as might arise upon use of an erroneous program construct or erroneous data, for which this International Standard imposes no requirements.
    
        Undefined behaviour may also be expected when this International Standard omits the description of any explicit definition of behavior.
    

The reason undefined behavior is dangerous is that the standard does not
guarantee any particular behavior and the implementation is free to do
_whatever it wants_ \- ignore it, give an error message, delete everything on
your hard drive.. whatever.

The most commonly cited piece of undefined behavior is modifying a variable
twice between two sequence points. The standard says:

    
    
        Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression.
    

This code snippet invokes undefined behavior:

    
    
        a = b++ * ++b;
    

because _b_ is modified twice between two sequence points. More information here:
<http://stackoverflow.com/questions/4176328/undefined-behavior-and-sequence-points>
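Since the expression has no defined meaning, there is no single correct fix; the programmer has to decide what was meant and write it as separate statements. This C++ sketch assumes the intended reading was "old value of b times b after a second increment"; the function name `post_times_pre` is hypothetical.

```cpp
#include <cassert>

// One chosen reading of `a = b++ * ++b;`, spelled out so that b is modified
// only once per full-expression (each statement ends at a sequence point).
int post_times_pre(int& b) {
    int left = b++;   // left is the old b; b is now old b + 1
    int right = ++b;  // b is now old b + 2; right is that new value
    return left * right;
}
```

With `b == 3` this returns `3 * 5 == 15` and leaves `b == 5`, a result that is now guaranteed rather than compiler-dependent.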

\-----

For more real world examples of where undefined behavior may bite you in the
ass in C++, take a look at Washu's _simple_ C++ quiz. It's only four
questions: <http://www.scapecode.com/2011/05/a-simple-c-quiz/>

Take a moment to answer the questions before looking at the answers.

Once you've done that, here are three more quizzes by the same guy - these
ones are about OOP in C++, so they may be much more relevant to your question:
<http://www.scapecode.com/2011/05/c-quiz-2/> and
<http://www.scapecode.com/2011/05/c-quiz-3/> and
<http://www.scapecode.com/2011/05/c-quiz-4/>

------
joebo
I've used libtcc (from TCC) to do something similar in a prototype that
listened on a socket and ran queries over a 2 GB memory-mapped file. I'd pass
over the query as a string of C that libtcc would dynamically compile and
execute. It worked really well but ultimately didn't go anywhere beyond
research. Here's an example from the distribution (first Google result for the
file):
[http://www.koders.com/c/fidC76C8B834DFF05F1D0BD61220AC19E246...](http://www.koders.com/c/fidC76C8B834DFF05F1D0BD61220AC19E24685698A9.aspx).
TCC can be found here: <http://bellard.org/tcc/>

~~~
sedachv
I am surprised more projects don't use TCC for that. It's an awesome little
compiler, and using it that way saves far more time than writing DSLs. Even
projects that have to compile C all the time (like Lisp->C compilers such as
ECL and Gambit Scheme) use GCC, which is stupid slow.

------
codedivine
Well, for anyone interested in Cling, check out this Google tech talk:
<http://www.youtube.com/watch?v=f9Xfh8pv3Fs>

This is a CERN project and it uses Clang from the LLVM project. The idea is
simple: use Clang to generate LLVM IR and then run it with LLVM's just-in-time
compiler.

~~~
malkia
Before that CINT was used (another C++ interpreter) - here is the page where
they announce the future transition CINT -> CLING -
<http://root.cern.ch/drupal/content/cling>

------
jlarocco
It's a cool idea, but his reason for creating it is kinda dumb.

Creating an entire "project" just to check a code snippet is just silly.

Just create a "testing" directory, throw your one-off test files into it,
and compile/run them there.

I start mine with a comment explaining what I'm testing, why I'm testing it,
and what special compilation flags are required, if any. I even have an Emacs
macro that fills in the boilerplate includes and main function. The overhead
involved is probably less than 15 seconds.

It has the advantages that I can test multiple compilers and I keep a history
of the things I've tried.

~~~
geofft
The ROOT project has been using CINT for ages. Talk to your favorite physicist
friend -- they'll be confused why the CS crowd _doesn't_ have this technology.

This is just an update of what must be an awful hack, rebuilt on LLVM
infrastructure.

~~~
Create
Because actually using ROOT is painful, and the CS crowd had lisp and now
python/pypy.

<http://www.insectnation.org/howto/living-without-root>

<https://linuxfr.org/nodes/18919/comments/632920>

<http://comments.gmane.org/gmane.comp.lang.c%2B%2B.root/5924>

------
Mon_Ouie
Although getting an interactive environment definitely is a nice thing, I'd
argue you don't have to create a whole project directory, etc. to play with an
idea; I usually write it in a single file.

------
deckiedan
Could you not just use a project directory template? All you need really is a
.c(xx|pp|whatever) file, a Makefile, and then the workflow for a new idea is:


    cd ~/src
    cp -R c-idea-template foobar
    cd foobar
    $EDITOR test.c*

and in your editor (say vim) just run :make

or write a .sh file with all of the above in it so it's just one step. No
complex install procedure, you get all your normal tools and stuff.

Alternatively, create a 'test projects' git(hub) project with the files you
want in it, and create a new branch for each idea. That way you get backups as
well.

~~~
coldgrnd
@fferen, @jlarocco, @Mon_Ouie, @deckiedan: I know you can probably set up all
you need in the blink of an eye (using emacs or vim or a template setup).
Actually, I'm so hooked on this idea that I even wrote my own rake-based
C/C++ build system (<https://github.com/marcmo/cxxproject>). But sometimes I
still prefer not having to set up anything, and not having to clean up
anything after I end my experiments. ... Idea - drop into REPL and try out -
exit - done.

------
mgurlitz
To be clear, those empty #include's are typos, not Cling inferring desired
header files. Both are:

    
    
        #include <iostream>

~~~
coldgrnd
Uuhhh...damn! I messed that up. Corrected now, thanks!

------
fferen
I just keep a fixed .cpp file with a bunch of common includes and directives,
and alias the g++ command to link to common libraries, so my workflow goes:

> vim temp.cpp

> g++ temp.cpp

> ./a.out

Very similar to my process with Python, actually. Every time I use the REPL
for some experimenting, the code ends up outgrowing it and I have to stuff it
in a file anyway, so I may as well cut out the middleman to begin with. YMMV.

~~~
sanxiyn
I too put the code in a file even in Python, but the REPL is still valuable. I
use "python -i", which runs your code (defining functions, classes, etc.), then
stops and puts you in that environment. This is very useful. It seems Cling
could be used the same way via its .x command.

~~~
voltagex_
I must be reading the wrong Python tutorials. Thank you for this

------
freepipi
I think it is very useful if you want to learn more about C++, especially
features you're not sure about. The interactive interpreter makes it very
intuitive, and it saves you time because you don't need to compile the code.
One limitation, though, is that it currently doesn't support templates; you
still have to write a source file if you want to use them.

------
tree_of_item
The LLVM infrastructure gets more amazing every day; Emscripten and Cling are
very exciting projects. C++ gets a lot of flak but it's still got a huge
amount of life in it.

------
solenskiner
Could this be used to make a C/C++ plugin for Lightroom, once it is out and
stable?

------
shadyabhi
So, what are its limitations, if any?

~~~
coldgrnd
I'd say error recovery could be better. You often get kicked out of the
session and then your environment is lost. Also: they claim the 'auto' keyword
is implicit, but that does not seem to work for me. But it's pretty cool to use!

------
zenogaisis
Why would I do that :S

~~~
jeremiep
Why not? It's great to quickly test C++, I have yet to find a developer who
doesn't love a REPL for one.

C++ could also be used as an embedded extension language using something like
this.

~~~
pmr_
I share your enthusiasm but there are limits. C++ compilation speed (and thus
interpretation speed) will make it quite impossible to use it as an extension
language. And while I like and use C++ a lot I don't think it is the right
language for this purpose. It can do quite a lot of things, but working in a
homogeneous language in your extensions as well as your core product doesn't
have enough benefits compared to using a systems programming language and an
extension language.

~~~
Arelius
> C++ compilation speed (and thus interpretation speed) will make it quite
> impossible to use it as an extension language.

I think you vastly underestimate the speed of the clang C++ compiler (or any
modern C++ compiler at that). I can't imagine that the compilation time for
anything that could be classified as an "extension" would be significant in
any meaningful way.

~~~
pmr_
clang doesn't outperform gcc on most projects I work on, or the speed increase
is not significant. I run quite a lot of test-suites - maybe I should measure
them and provide some real numbers for reference instead of hand-waving my way
through this argument.

If I were to provide extension points to my C++ project they would certainly
contain templates, and that would also mean that they would require significant
compilation time (compared to what you would expect). Even if I didn't, a
single header that pulls in a huge preprocessor library could ruin speed. I'm
inclined to believe you if we are talking about a Qt-style C++, but that is
only a subset of possible code. I would be happy to be proven wrong, too.

~~~
Arelius
> I'm inclined to believe you if we are talking about a Qt-style C++, but that
> is only a subset of possible code. I would be happy to be proven wrong, too.

If the alternative is to use a separate language altogether to implement
extensions, then you are already limiting yourself to a subset of all possible
(in C++) code, so what's wrong with doing that in C++ instead?

That being said, most of your problems can be solved with header optimization,
we had gotten a full-rebuild of our multimillion line code-base down to about
2 minutes. So if an extension is only a file or two I'm sure you could manage
very reasonable compile times, esp with optimization turned off.

> Even if I didn't, a single header that pulls in a huge preprocessor library
> could ruin speed.

In the case that you really need that header (If you are using as an extension
language, it should be designed so you don't!) precompilation of certain units
can greatly improve speed. clang in particular has very impressive
precompilation support.

------
mlvljr
Has anyone mentioned CH already (<http://www.softintegration.com/>)?

Kind of an interesting (hi-level) tool, too.

------
clobber
For small programs, something like CodeRunner is nice too:
<http://krillapps.com/coderunner/>

