
PicoC: A very small C interpreter - adamnemecek
https://github.com/zsaleeba/picoc
======
zik
Hi, author here. When I started writing PicoC it was mostly because I was
thinking about how small AppleSoft BASIC was back in the day and I was curious
how small you could make a C implementation. I also had in mind to use it for
robotics/drone scripting on STM32 processors which have about 64KB of RAM.

PicoC runs ok in 64KB although it is a bit cramped. I like that you can write
scripts in C on the actual device without needing a host computer of any kind.
It's also been fairly popular for embedding as a scripting language in desktop
applications, mostly because it's small and easy to integrate. It's really
designed for scripting, so don't expect it to be fast.

~~~
endergen
Hey zik, has anyone tried getting it running through emscripten? It would be
very cool to have interactive C purely client side in browser. Could make some
cool C-based, jsbin-like hosting websites from that.

~~~
Tloewald
Given it's written in C, it seems like a direct port to Javascript might be a
much better option.

~~~
vardump
Any particular reason for that?

~~~
Tloewald
A lot of C will translate very simply to JavaScript and any parsing or string
manipulation will be insanely easier.

~~~
vardump
How can parsing and string manipulation be easier than just compiling C to
Javascript using Emscripten? The interpreter already works in C. Wouldn't
translating it manually to Javascript be extra effort?

~~~
Tloewald
Parsing and string manipulation are (hugely) easier in Javascript than C.
Compiling the interpreter to Javascript via Emscripten is certainly easier
than translating to Javascript -- but porting C to Javascript should still be
very easy.

------
andrewchambers
Shameless self promotion of my work in progress C compiler
[https://github.com/andrewchambers/cc](https://github.com/andrewchambers/cc)
which is attempting to create a modern cross platform C compiler.

One of my goals is to make the entire toolchain rapid to port and hack on. I
am pretty sick of gcc and llvm taking 20 mins just to build from source.

I would love to find serious collaborators.

~~~
beagle3
I am all in favor of hacking for studying and fun. But for practical reasons -
are you familiar with Bellard's tcc?

~~~
rswier
The last time I checked, tcc lacked even a simple AST. This led to some pretty
weird emitted code (such as swapping parameters on the stack.) Implementing an
AST is not hard, just push and pop nodes on a stack. It also makes a nice
front-end/back-end interface.

~~~
andrewchambers
Eliminating memory allocations made tcc extremely fast. They use a value stack
rather than an AST. I just think an AST is a bit easier to follow because it
means the parser has less code generation logic embedded in it.

~~~
rswier
No malloc/free necessary. Just pile up nodes on a stack, then "deallocate" to
any saved position.
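
A minimal sketch of that idea, assuming a fixed-size pool is acceptable. The `Node` layout and the `node_new`/`node_mark`/`node_release` names are hypothetical and not taken from tcc or any other compiler:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical AST node; the field names are illustrative only. */
typedef struct Node {
    int kind;               /* e.g. '+' for addition, 'n' for a number literal */
    int value;              /* literal value when kind == 'n' */
    struct Node *lhs, *rhs; /* children, NULL for leaves */
} Node;

enum { POOL_SIZE = 1024 };

static Node pool[POOL_SIZE]; /* fixed backing store, so no malloc/free */
static size_t top = 0;       /* index of the next free slot */

/* Push a new node onto the stack; returns NULL when the pool is full. */
Node *node_new(int kind, int value, Node *lhs, Node *rhs) {
    if (top == POOL_SIZE) return NULL;
    Node *n = &pool[top++];
    n->kind = kind;
    n->value = value;
    n->lhs = lhs;
    n->rhs = rhs;
    return n;
}

/* Remember the current stack position. */
size_t node_mark(void) { return top; }

/* "Deallocate" every node created since the mark in a single step. */
void node_release(size_t mark) {
    assert(mark <= top);
    top = mark;
}

/* Number of nodes currently live. */
size_t nodes_in_use(void) { return top; }
```

A parser could take a mark before each function body or statement, build the tree, hand it to the back end, and then release to the mark. The trade-off is that nodes can only be freed in bulk, in reverse order of allocation.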

------
gbl08ma
I have used this to bring a form of scripting to the graphic calculator I used
and hacked on during my high school years. I think this really shines on
embedded systems: it's quite easy to map existing syscalls and functions to
PicoC functions with minimal overhead, more complex things like struct passing
are supported, and memory addresses can be accessed directly just like with
"real" compiled C (something that has upsides and downsides, but I like this
kind of freedom a lot, and as far as I know, other scripting languages like Lua
don't support it).

Unfortunately, function pointers aren't supported, and it's 10x slower than
equivalent compiled C, if not more, and perhaps slower than Lua (and we aren't
even talking about LuaJIT). There also appear to be some issues yet to be
dealt with, as can be seen on the previous project page at Google Code:
[https://code.google.com/p/picoc/issues/list](https://code.google.com/p/picoc/issues/list)

~~~
vardump
> more complex things like struct passing are supported, and memory addresses
> can be accessed directly just like with "real" compiled C (something that
> has upsides and downsides, but I like this kind of freedom a lot, and as far
> as I know, other scripting languages like Lua don't support it).

Not Lua itself, but at least Luajit's FFI does:
[http://luajit.org/ext_ffi.html](http://luajit.org/ext_ffi.html)

------
to3m
Obligatory nitpick: C libraries should be careful to prefix all exposed
identifiers, to minimize the risk of collision. But when you include picoc.h,
which is careful to do this, you also get interpreter.h, which... is not.

~~~
to3m
While I'm on the subject the same also goes for header include guards - when
you get a conflict, it's actually quite annoying to track down. (Not least
because it's such a rare occurrence that you probably won't expect it and will
likely end up on a wild goose chase at some inconvenient moment.)

I stopped using the file name at all in my include guards a few years ago, and
use a GUID instead. For example:

    
    
    #ifndef HEADER_6AFF21D71B5B43DEB079AA612E4118B4
    #define HEADER_6AFF21D71B5B43DEB079AA612E4118B4

    /* declarations go here */

    #endif /* HEADER_6AFF21D71B5B43DEB079AA612E4118B4 */
    

Also consider the use of #pragma once - though as far as I can tell, this
(still) isn't ISO, so I've decided to avoid it.

~~~
JoshTriplett
> Also consider the use of #pragma once - though as far as I can tell, this
> (still) isn't ISO, so I've decided to avoid it.

Depends on your target platform. GCC, clang/LLVM, Visual C++, and many
proprietary compilers all support it. What platform are you targeting that
doesn't support it?

Because if you're writing any non-trivial useful code, it's highly likely that
your code is not pure ISO C; you're using some library with support for
various platforms, or a system call interface, or some other interface to a
real system. Once you do that, universal portability no longer applies, so you
might as well think about which specific target platforms you care about.

~~~
to3m
Good question - probably the main reason (and you don't have to agree that
it's a good one) is that I'd never have to think about it again ;)

------
PhantomGremlin
When I read "very small C" I'm reminded of the BD Software C compiler I used
about 35 years ago for the 8080/Z80 and CP/M.

It was relatively complete K&R C except for no floating point support. It
comfortably ran in under 64K bytes of memory. That's 64 kilobytes, or as they
now say kibibytes!

It was quite fast to compile. It was also quite fast to run, since it
generated true object code, no run-time interpreter needed.

It's now open source and public domain.
[http://www.bdsoft.com/resources/bdsc.html](http://www.bdsoft.com/resources/bdsc.html)

~~~
wallaceowen21
As I recall BDS stood for "Brain Damaged Software", a joke made by Leor
Zolman, the author. Leor had not taken a compiler construction class, so it's
not a recursive descent parser and can be confused by overly complex
expressions. It also wrote the generated code into the memory that held the
source, expecting that the generated code took fewer bytes than the source it
was replacing.

------
tptacek
c4 remains the master class in minimal interpreted C implementations:

[https://github.com/rswier/c4](https://github.com/rswier/c4)

It's (of course) less complete than PicoC, but it might be the single most
useful introduction to compiler construction I've read.

~~~
aidos
Rather enjoyed flicking through that. I wish there was more of an overview for
those of us who haven't thought about this stuff for a few years.

~~~
tptacek
I recommend: just read and reread, commenting it as you go, until it all makes
sense.

It's written in a very limited dialect of C --- most notably, it doesn't use
structs --- because it compiles itself. The expression parser in particular
would be clearer with structs rather than array offsets. But once you realize
that's why it's so gnarly in places, it's straightforward to mentally
translate.

It's a very simple design: a simple lexer with a "pull" API (next()) feeds a
precedence-climbing expression parser that spits out bytecode for a simple
stack machine.

If you grok expr(), you grok the whole thing.
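
For anyone who wants the shape of that design before diving in, here is a much-reduced sketch. It is not c4's code: the opcode names, the four-operator grammar, and the `eval`/`run` helpers are assumptions made for illustration, and there is no error handling, so it expects well-formed input.

```c
#include <assert.h>
#include <ctype.h>

/* Bytecode for a tiny stack machine (illustrative names, not c4's). */
enum { OP_PUSH, OP_ADD, OP_SUB, OP_MUL, OP_DIV, OP_HALT };

static const char *src;      /* lexer input cursor */
static int tok, tokval;      /* tok: 'n' = number, 0 = end, else the character */
static int code[256], ncode; /* emitted bytecode */

/* Pull-style lexer: each call advances to the next token. */
static void next(void) {
    while (*src == ' ') src++;
    if (isdigit((unsigned char)*src)) {
        tokval = 0;
        while (isdigit((unsigned char)*src))
            tokval = tokval * 10 + (*src++ - '0');
        tok = 'n';
    } else {
        tok = *src ? *src++ : 0;
    }
}

/* Binding strength of a binary operator; 0 means "not an operator". */
static int prec(int t) {
    if (t == '+' || t == '-') return 1;
    if (t == '*' || t == '/') return 2;
    return 0;
}

static void emit(int x) { assert(ncode < 256); code[ncode++] = x; }

/* Precedence climbing: consume operators that bind at least as tightly
   as min (callers pass min >= 1). */
static void expr(int min) {
    if (tok == 'n') { emit(OP_PUSH); emit(tokval); next(); }
    else if (tok == '(') { next(); expr(1); if (tok == ')') next(); }
    while (prec(tok) >= min) {
        int op = tok;
        next();
        expr(prec(op) + 1); /* +1 makes the operators left-associative */
        emit(op == '+' ? OP_ADD : op == '-' ? OP_SUB :
             op == '*' ? OP_MUL : OP_DIV);
    }
}

/* Execute the bytecode on an operand stack and return the final value. */
int run(const int *pc) {
    int stack[64], sp = 0, b;
    for (;;) {
        switch (*pc++) {
        case OP_PUSH: stack[sp++] = *pc++; break;
        case OP_ADD: b = stack[--sp]; stack[sp - 1] += b; break;
        case OP_SUB: b = stack[--sp]; stack[sp - 1] -= b; break;
        case OP_MUL: b = stack[--sp]; stack[sp - 1] *= b; break;
        case OP_DIV: b = stack[--sp]; stack[sp - 1] /= b; break;
        default: return stack[sp - 1]; /* OP_HALT */
        }
    }
}

/* Compile an expression to bytecode, then run it. */
int eval(const char *text) {
    src = text;
    ncode = 0;
    next();
    expr(1);
    emit(OP_HALT);
    return run(code);
}
```

Calling `eval("1 + 2 * 3")` compiles to push 1, push 2, push 3, multiply, add, and then executes that sequence. The parser never builds a tree: each operator is emitted as soon as its right operand has been parsed.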

~~~
aidos
Alright then, I will - though I may have to come back to you if I get stuck :)
(I haven't written anything in C for about 15 years)

------
sairahul82
Some time back I used PicoC to write a C programming visualizer. Such a tool is
possible only with PicoC.

[http://dev.pointers.io/#filename=test4.c](http://dev.pointers.io/#filename=test4.c)

~~~
swah
This is kinda great; I've thought about having something like it for "running"
(simulating) embedded code on a PC, where the compiler just ignores (or asks
the user for a value) when hitting anything that isn't available on the host
pc (registers, etc).

------
tdoggette
Do people who use this call it "peacock" or "pico see"?

~~~
kjak
When I first saw it I thought "pico see".

The "peacock" interpretation is funny and went unnoticed by me. Cheers!

------
kazinator
More than 20 years ago, I used a C interpreter called EiC.

Hmm, someone tried to resurrect it on SourceForge a couple of years ago, it
seems, but it's dead again.

[http://sourceforge.net/projects/eic/](http://sourceforge.net/projects/eic/)

Someone threw it on GitHub:

[https://github.com/kungfooman/EiC-C-Interpreter](https://github.com/kungfooman/EiC-C-Interpreter)

PDF of doc [2009]:

[http://www.mirrorservice.org/sites/downloads.sourceforge.net...](http://www.mirrorservice.org/sites/downloads.sourceforge.net/e/ei/eic/eic/doc/EiC.pdf)

------
fapjacks
Ah! This is very cool! Strangely enough, I was looking for something exactly
like this and staring down having to write my own. Perfect!

------
erobbins
Nice!

No pointers to functions, I see.. not nitpicking, just playing :) I like what
you've done here. I'm going to have to go through the code and hopefully learn
something new.

------
ternaryoperator
Can it interpret itself?

~~~
zik
Unfortunately not - it uses a few C tricks which its implementation doesn't
support. You could probably make it self-hosting with a little effort but I
can't even imagine how slow the interpreter-within-an-interpreter would be!

