The first C "interpreters" I know of were for Lisp machines: Symbolics' C compiler (http://www.bitsavers.org/pdf/symbolics/software/genera_8/Use...) and Scott Burson's (hn user ScottBurson) ZetaC for TI Explorers/LMIs and Symbolics 3600s (now in the public domain: http://www.bitsavers.org/bits/TI/Explorer/zeta-c/). Neither of them is actually an interpreter; both are "interactive" compilers, the way Lisp compilers are.
I am writing a C to Common Lisp translator right now (https://github.com/vsedach/Vacietis). This is surprisingly easy because C is largely a small subset of Common Lisp. Pointers are trivial to implement with closures (Oleg explains how: http://okmij.org/ftp/Scheme/pointer-as-closure.txt but I discovered the technique independently around 2004). The only real problem is how to deal with casting arrays of integers (or whatever) to arrays of bytes, but that's a problem for portable C software anyway. I think I'll also need a little source-fudging magic for setjmp/longjmp. Otherwise the project is now at the point where you can compile-file/load a C file just like you do a Lisp file, by setting the readtable. There are a few things I still need to finish with #includes, enums, stdlib, and the variable-length struct hack, but that should be done in the next few weeks.
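To make the casting issue above concrete, here is the kind of C that forces it (a made-up snippet, not from any particular project): the same storage is viewed both as ints and as raw bytes, which is trivial on a byte-addressed machine but awkward when each C object becomes its own Lisp array.

#include <stdio.h>
#include <string.h>

int main(void)
{
    int words[4] = { 1, 2, 3, 4 };

    /* View the same storage as raw bytes.  In C this is just a pointer
       cast; in a Lisp translation the ints and the bytes would live in
       separate arrays unless the memory model is byte-addressed. */
    unsigned char *bytes = (unsigned char *) words;

    printf("first byte of words[0] = %d\n", bytes[0]); /* value is endian-dependent */
    printf("sizeof words = %zu bytes\n", sizeof words);

    /* Byte-wise writes have to show up when the storage is read as ints again. */
    memset(bytes, 0, sizeof words);
    printf("words[2] after memset = %d\n", words[2]);
    return 0;
}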
As to how to do this for C++, don't ask me. According to the CERN people, CINT has "slightly less than 400,000 lines of code." (http://root.cern.ch/drupal/content/cint). What a joke.
What I can't wrap my head around is how one would implement pointer arithmetic with these closures. C pointers are not just references to cells; those cells are guaranteed to be contiguous (up to a certain limit, be it a VM allocation unit, say a "page", or all available memory on systems without a VM).
That is to say, C pointers are not like ML references. Along with SET and REF they also allow addition, subtraction, scaling, etc.
For closures to model C pointers, wouldn't they need to order the allocation of cells in some manner, say, in a big array? And if so, this could get expensive very quickly (the worst case being modeling the entire memory, i.e. emulation) without a certifying compiler or at least some exhaustive pointer analysis.
Hope I'm wrong on this.
The answer is that closures are only needed for pointers to individual places (like &x on a local variable); a C memory block is just a Lisp array plus an index, and pointer arithmetic moves the index:

(defstruct memptr    ;; pointer into an allocated memory block: array + offset
  mem (ptr 0))
(defstruct place-ptr ;; pointer to an arbitrary settable place, via closure
  closure)

(defun allocate-memory (size) ;; shared by malloc and static allocation
  (make-memptr :mem (make-array size :adjustable t :initial-element 0)))

(defmacro vacietis.c:mkptr& (place) ;; need to deal w/function pointers
  (let ((new-value (gensym))
        (suppliedp (gensym)))
    `(make-place-ptr :closure (lambda (&optional (,new-value nil ,suppliedp))
                                (if ,suppliedp
                                    (setf ,place ,new-value)
                                    ,place)))))

(defun vacietis.c:deref* (ptr)
  (etypecase ptr
    (memptr    (aref (memptr-mem ptr) (memptr-ptr ptr)))
    (place-ptr (funcall (place-ptr-closure ptr)))))

(defun (setf vacietis.c:deref*) (new-value ptr)
  (etypecase ptr
    (memptr    (setf (aref (memptr-mem ptr) (memptr-ptr ptr)) new-value))
    (place-ptr (funcall (place-ptr-closure ptr) new-value))))

(defmethod vacietis.c:+ ((x number) (y number))
  (+ x y))

(defmethod vacietis.c:+ ((ptr memptr) (x integer))
  (make-memptr :mem (memptr-mem ptr) :ptr (+ x (memptr-ptr ptr))))

(defmethod vacietis.c:- ((ptr1 memptr) (ptr2 memptr))
  (assert (eq (memptr-mem ptr1) (memptr-mem ptr2)) ()
          "Trying to subtract pointers from two different memory segments")
  ;; pointer difference is an integer offset, not another pointer
  (- (memptr-ptr ptr1) (memptr-ptr ptr2)))
Based on what you said, this leads me to the statement that interesting real-world C programs make use of undefined behavior.
I assume you've worked with a significant number of real-world C programs? Because they surprisingly often do. The difficulty of porting many programs to 64-bit, for instance, comes from reliance on implementation-defined behavior.
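The classic shape of that 64-bit porting trouble looks roughly like this (a made-up snippet): code that quietly assumes long is wide enough to hold a pointer, which happens to be true on 32-bit systems and LP64 Unix, but not on LLP64 Windows.

#include <stdio.h>

int main(void)
{
    int x = 42;

    /* Implementation-defined: whether a pointer survives a round trip
       through long depends on the data model.  This happens to work on
       32-bit systems and LP64 Unix, and breaks on LLP64 Windows, where
       long is only 32 bits wide. */
    long stash = (long) &x;
    int *p = (int *) stash;
    printf("%d\n", *p);

    /* Another classic: code that serializes a long assuming it is 4 bytes. */
    printf("sizeof(long) = %zu, sizeof(void *) = %zu\n",
           sizeof(long), sizeof(void *));
    return 0;
}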
Only embedded systems (AVR & PIC24). I have much more experience in C++, which I've used for both Desktop apps and telco server components.
The beauty of C (over C++) is that the standard is actually readable. C++ especially is a quagmire of undefined behavior. The scary thing about C/C++ is that it's easy to hit undefined (or, at least, as you state, implementation-defined) behavior and not even realize it. Often the code looks valid and does what it looks like it does, yet is actually undefined or implementation-defined and will break elsewhere.
With that said, while I don't expect everyone to have memorized the standard, I do hope most would have at least enough familiarity to avoid most cases of undefined behavior.
Could you give an example of a case where you hit undefined behavior? I can hardly recall a case where that bit me in the past. (I'm mostly working on embedded systems (PPC & ARM).)
The cases that come to my mind for C++ all involve initialization...
> behavior, such as might arise upon use of an erroneous program construct or erroneous data, for which this International Standard imposes no requirements.

> Undefined behavior may also be expected when this International Standard omits the description of any explicit definition of behavior.
The most commonly cited piece of undefined behavior is modifying a variable twice between two sequence points. The standard says:
> Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression.
a = b++ * ++b;
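Wrapped into a complete program, so you can feed it to a couple of compilers (results will vary by compiler and optimization level, which is rather the point):

#include <stdio.h>

int main(void)
{
    int a, b = 1;

    /* Undefined behavior: b is modified twice with no intervening
       sequence point, so the standard imposes no requirements at all.
       Different compilers (and optimization levels) can legitimately
       print different numbers here. */
    a = b++ * ++b;

    printf("a = %d, b = %d\n", a, b);
    return 0;
}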
For more real world examples of where undefined behavior may bite you in the ass in C++, take a look at Washu's simple C++ quiz. It's only four questions: http://www.scapecode.com/2011/05/a-simple-c-quiz/
Take a moment to answer the questions before looking at the answers.
Once you've done that, here are three more quizzes by the same guy - these ones are about OOP in C++, so may be much more relevant to your question: http://www.scapecode.com/2011/05/c-quiz-2/ and http://www.scapecode.com/2011/05/c-quiz-3/ and http://www.scapecode.com/2011/05/c-quiz-4/
Byte-wise access to objects is legal and completely portable between conforming implementations. What isn't portable are arbitrary type conversions through pointer casts, as these violate the effective typing rules. Such casts may break in practice due to misalignment or because of aliasing behind the optimizer's back.
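A small sketch of the distinction (made-up snippet): reading an object's bytes through an unsigned char pointer is sanctioned; reinterpreting the storage as an unrelated type is not.

#include <stdio.h>

int main(void)
{
    unsigned int u = 0x01020304;

    /* Legal and portable: any object's representation may be inspected
       byte by byte through an unsigned char pointer.  What you see
       (byte order, padding) is what varies between implementations. */
    unsigned char *p = (unsigned char *) &u;
    for (size_t i = 0; i < sizeof u; i++)
        printf("%02x ", p[i]);   /* e.g. "04 03 02 01" on a little-endian machine */
    printf("\n");

    /* Not legal: accessing the same storage as an unrelated type violates
       the effective typing (strict aliasing) rules, and casts like this can
       also trap on alignment-picky hardware:
       float *f = (float *) &u;  *f = 1.0f;   -- undefined behavior */

    return 0;
}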
In a way, C is a strongly typed language - the type system is just really unsound.
That makes absolutely no sense. Just think about endianness, for example. Type conversions in C are extremely tricky, and in many cases are not guaranteed to be portable across different compilers even on the same architecture. There is a good explanation of what you can and cannot count on in chapter 6 of Harbison and Steele's C: A Reference Manual.
While the values of the bytes are not specified, the ability to get at them is, and a conforming implementation needs to provide this ability.
This is a CERN project, and it uses Clang from the LLVM project. The idea is simple: use Clang to generate LLVM IR, then run it through LLVM's just-in-time compiler.
Creating an entire "project" just to check a code snippet is just silly.
Just create a "testing" directory, throw your one-off test files into it, and compile/run them there.
I start mine with a comment explaining what I'm testing, why I'm testing it, and what special compilation flags are required, if any. I even have an Emacs macro that fills in the boilerplate includes and main function. The overhead involved is probably less than 15 seconds.
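Roughly what one of those boilerplate files looks like (a made-up example, not the actual output of my Emacs macro):

/* test_struct_packing.c  (made-up example)
 *
 * What:  check how the compiler lays out a struct with mixed field sizes.
 * Why:   a wire-format reader I'm writing assumes no padding between fields.
 * Build: cc -Wall -Wextra -O2 test_struct_packing.c && ./a.out
 */
#include <stdio.h>

struct sample {
    char tag;
    int  value;
};

int main(void)
{
    printf("sizeof(struct sample) = %zu\n", sizeof(struct sample));
    return 0;
}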
It has the advantage that I can test multiple compilers, and I keep a history of the things I've tried.
This is just an update of what must be an awful hack, rebuilt on LLVM infrastructure.
cp -R c-idea-template foobar;
and in your editor (say vim) just run :make
or write a .sh file with all of the above in it, so it's just one step. No complex install procedure, and you get all your normal tools and stuff.
Alternatively, create a 'test projects' git(hub) project with the files you want in it, and create a new branch for each idea. That way you get backups as well.
> vim temp.cpp
> g++ temp.cpp
Very similar to my process with Python, actually. Every time I use the REPL for some experimenting, the code ends up outgrowing it and I have to stuff it in a file anyway, so I may as well cut out the middleman to begin with. YMMV.
C++ could also be used as an embedded extension language using something like this.
I think you vastly underestimate the speed of the clang C++ compiler (or any modern C++ compiler at that). I can't imagine that the compilation time for anything that could be classified as an "extension" would be significant in any meaningful way.
If I were to provide extension points in my C++ project, they would certainly contain templates, and that would also mean they would require significant compilation time (compared to what you would expect). Even if they didn't, a single header that pulls in a huge preprocessor library could ruin compile speed. I'm inclined to believe you if we are talking about Qt-style C++, but that is only a subset of possible code. I would be happy to be proven wrong, too.
If the alternative is to use a separate language altogether to implement extensions, then you are already limiting yourself to a subset of what's possible in C++ anyway, so what's wrong with doing that in C++ instead?
That being said, most of your problems can be solved with header optimization; we got a full rebuild of our multimillion-line code base down to about 2 minutes. So if an extension is only a file or two, I'm sure you could manage very reasonable compile times, especially with optimization turned off.
> Even if I didn't, a single header that pulls in a huge preprocessor library could ruin speed.
In the case that you really need that header (if you are using C++ as an extension language, it should be designed so you don't!), precompilation of certain units can greatly improve speed. clang in particular has very impressive precompilation support.
I would very much like to use D as an extension language.
Kind of an interesting (high-level) tool, too.