
Ask HN: What languages used to write computer languages? - gvb
In the discussion on the Anic language (http://news.ycombinator.com/item?id=1042122), drhowarddrfine asks "Is every language claiming to be faster than C now?"

The trivial answer is "Yes", but that answer raises two non-trivial and more interesting questions:

1) If the language is faster than C, why cross-compile the language to C rather than compiling it natively? (The acid test of a C compiler is whether it can compile itself.)

2) What computer languages _are_ used to write computer languages?

Please expand and correct the list...

* C/C++: C

* Python: C, but also Java and C#

* Ruby: C, but also Java

* Perl: C

* Ada: C (gnat); it seems like there are Ada-based Ada compilers, but I am not sure

* Forth: essential words in assembly (sometimes C) and the bulk of it in Forth

* APL: C (?)

* Smalltalk: ?

* Fortran: Originally assembly? Fortran? C for GCC-based GNU Fortran
======
fadmmatt
For implementing languages, I highly recommend Scheme, Haskell and Scala.

These languages are hell-on-wheels for tearing apart and transforming syntax
trees.

I teach a compilers class, and I encourage my students to use a mixture of
Scala and Scheme. If you're thinking about implementing a language, you might
want to look at some of the blog posts I wrote for my students:

* A Scheme interpreter in Scala: [http://matt.might.net/articles/denotational-interpreter-for-...](http://matt.might.net/articles/denotational-interpreter-for-lisp-and-scheme-like-lambda-calculus-based-language-lambdo/)

* A meta-circular Scheme interpreter: [http://matt.might.net/articles/metacircular-evaluation-and-f...](http://matt.might.net/articles/metacircular-evaluation-and-first-class-run-time-macros/)

* Compiling Scheme to C: <http://matt.might.net/articles/compiling-scheme-to-c/>

* Compiling Scheme to Java: <http://matt.might.net/articles/compiling-to-java/>

* Architectures for interpreters: [http://matt.might.net/articles/writing-an-interpreter-substi...](http://matt.might.net/articles/writing-an-interpreter-substitution-denotational-big-step-small-step/)
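The "tearing apart and transforming syntax trees" that makes these languages pleasant can be illustrated even in Python, with the AST as plain nested tuples. A minimal sketch (constant folding over a toy expression tree; all names here are invented for illustration):

```python
def fold(expr):
    """Recursively fold constant sub-expressions: ('+', 1, 2) -> 3."""
    if not isinstance(expr, tuple):
        return expr  # a literal number or a variable name
    op, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        if op == '+':
            return left + right
        if op == '*':
            return left * right
    return (op, left, right)

# The multiplication folds away; the free variable 'x' blocks the addition:
print(fold(('+', ('*', 2, 3), 'x')))  # ('+', 6, 'x')
```

In Scheme or Scala the same transformation is a few lines of pattern matching, which is exactly why they get recommended for this kind of work.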

~~~
eru
* Write Yourself a Scheme in 48 Hours (using Haskell): [http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_H...](http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours)

------
tsally
Can't believe OMeta hasn't been mentioned yet: <http://tinlizzie.org/ometa/>.
It's definitely the wrong choice for performance, but if you are experimenting
with language implementation I don't think you can do much better.

 _OMeta's general-purpose pattern matching facilities provide a natural and
convenient way for programmers to implement tokenizers, parsers, visitors, and
tree transformers, all of which can be extended in interesting ways using
familiar object-oriented mechanisms._

Here's a document detailing an OMeta program that translates textual
representations of abstract syntax trees into assembly for the Intel 386:
<http://www.vpri.org/pdf/m2009011_chns_mng.pdf>. In plain language, what you've
got is a mini Lisp-like language that you can extend in an object-oriented way.

------
jcdreads
Here is a great description of how Alan Kay and friends wrote the Squeak
(Smalltalk) VM using another (Apple) Smalltalk:

<http://ftp.squeak.org/docs/OOPSLA.Squeak.html>

Also consider that a bunch of languages (most?) consist of a kernel (or other
VM) written in something native (like C) surrounded by a bunch of libraries
written in the language itself. Off the top of my head: python, java, clojure,
most schemes, etc. are like this.

For ridiculous exercises in bootstrapping minimalism, check out Ian Piumarta's
work on Cola:

<http://piumarta.com/software/cola/>

<http://piumarta.com/software/cola/objmodel2.pdf>

<http://piumarta.com/software/cola/colas-whitepaper.pdf>

------
silentbicycle
Lua and Awk are written in C. Scheme is often written in another Lisp or C,
and many Prologs are written in C or Lisp (or other Prologs). Erlang's VM is
written in C, though it was originally prototyped in Prolog. It's not unusual
to write scaffolding for a language (i.e., an interpreter or a VM) in C, and
then to write most of the rest of the language in itself. It can be a reality
check on the expressiveness of the language.

If you want to learn how to implement a language, I'd suggest starting with a
subset of Scheme or Forth. If you leave the advanced features (e.g.
continuations) for later, they're really not that complicated. You're better
off using a language with garbage collection and good string support, such as
Python (though if you're already comfortable with them, OCaml and SML are
particularly appropriate.) Mainly, you don't want to have to figure out
garbage collection, complex parsing, etc. on top of everything else. Also,
don't worry too much about efficiency when you're still feeling out a
language's overall design.
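To make the suggestion concrete, here is a rough sketch of such a Scheme subset in Python: just numbers, variables, `lambda`, `+`, and application, with Python supplying the garbage collection. (The representation and names are mine, not from any particular tutorial.)

```python
def evaluate(expr, env):
    """Evaluate a tiny Scheme subset written as nested Python lists."""
    if isinstance(expr, str):            # variable reference
        return env[expr]
    if not isinstance(expr, list):       # number literal
        return expr
    if expr[0] == 'lambda':              # ['lambda', [params], body]
        _, params, body = expr
        return lambda *args: evaluate(body, {**env, **dict(zip(params, args))})
    if expr[0] == '+':
        return evaluate(expr[1], env) + evaluate(expr[2], env)
    func = evaluate(expr[0], env)        # application: (f arg ...)
    return func(*[evaluate(arg, env) for arg in expr[1:]])

# ((lambda (x) (+ x 1)) 41) => 42
print(evaluate([['lambda', ['x'], ['+', 'x', 1]], 41], {}))
```

Everything hard (continuations, tail calls, macros) is deliberately left out; the point is that the core fits on one screen.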

There's a great explanation of a Forth here (<http://www.annexia.org/forth>),
and pedagogical Scheme texts aren't hard to find. Christian Queinnec's _Lisp
in Small Pieces_ is particularly good.

Also, claiming to be "faster than C" is one thing, but claiming to be "a
faster way to get things done than C" is another. The difference is
significant, particularly when you can prototype your ideas in something
flexible, get _hard data_ from a profiler, and then just move the hotspots out
to C. C has its strengths, but exploratory programming is not one of them.

------
stan_rogers
It would have been difficult to have written APL in C originally, what with
the lack of time machines and all (APL was created in '57 as an abstract
notational concept and first implemented in '64 and released in '65; C was
created in '72). The interpreter was probably written in S/360 assembler --
just a guess based on the fact that similar but simpler list/vector operations
implemented 15 years later in the Lotus Notes macro (formula) language
required write-only C wizardry on a truly heroic scale. Somehow I don't see
anyone doing it in FORTRAN or COBOL without periodic psychiatric intervention.

------
jrockway
You are forgetting the self-hosting language implementations, all of which are
quite speedy:

* SBCL: SBCL

* GHC: GHC

* PyPy: Python

There are a lot of languages written in C because when those languages were
written, that's all there was.

Language implementations can be faster than the language they are written in
because that language is not used for the codegen. LLVM's C with JIT is
"faster than C", for example.

~~~
ramanujan
Is PyPy speedy nowadays?

------
joe_the_user
A lot of languages weren't written in pure C but involved a combination of C
(or Java) and a parser generator like Yacc or Bison.

The process of implementing a language isn't really about breaking out an
editor and a compiler and coding. Rather, implementing a language is about
specifying a virtual machine, a grammar, and various other meta-programming
constructs. Thus "written in C" can be a deceptive description.

Even a recursive-descent parser involves a process of transforming a syntax
specification into a series of recursive function calls.
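As an illustration of that transformation, here is a hand-written recursive-descent parser in Python where each grammar rule becomes one function (the grammar and names are invented for the example): `expr -> term ('+' term)*` and `term -> NUMBER | '(' expr ')'`.

```python
import re

def parse(src):
    """Parse additions and parentheses into a nested-tuple syntax tree."""
    tokens = re.findall(r'\d+|[()+]', src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expr():                      # expr -> term ('+' term)*
        nonlocal pos
        node = term()
        while peek() == '+':
            pos += 1
            node = ('+', node, term())
        return node

    def term():                      # term -> NUMBER | '(' expr ')'
        nonlocal pos
        if peek() == '(':
            pos += 1
            node = expr()
            pos += 1                 # consume the closing ')'
            return node
        tok = tokens[pos]
        pos += 1
        return int(tok)

    return expr()

print(parse('1+(2+3)'))  # ('+', 1, ('+', 2, 3))
```

The one-function-per-rule shape is exactly the "series of recursive function calls" the grammar specifies.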

------
jacquesm
There are 'bootstrap' languages that are mostly written 'in themselves', Forth
(in your list already) and Lisp come to mind but there are plenty of others.

For performance reasons usually some of the more frequently used constructs
will be re-written in a lower level language but there is no strict
requirement to do so.

You could write any language in any other language, provided the first one is
'Turing complete'.

The only reasons people will pick a certain language to do their (language
writing) work in is because of constraints.

edit:

Another common technique is the 'DSL', or domain specific language. Some
programming languages (again, such as forth and lisp) lend themselves better
to this technique than others. Basically the idea is to extend the language
within the accepted syntax with new words that add the required functionality.

After that these additions can be used like any other language primitive, even
if they sometimes will run a little slower because of internal overhead.
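A rough sketch of the idea, with a Forth flavour but written in Python for illustration (the word set and `define`/`run` helpers are invented): new words go into the dictionary and are then used exactly like primitives.

```python
# The built-in "primitives" of a tiny stack language.
words = {
    '+': lambda stack: stack.append(stack.pop() + stack.pop()),
    'dup': lambda stack: stack.append(stack[-1]),
}

def define(name, body):
    """Extend the language: a new word defined in terms of existing ones."""
    words[name] = lambda stack: run(body, stack)

def run(program, stack):
    for word in program.split():
        if word in words:
            words[word](stack)
        else:
            stack.append(int(word))  # anything else is a number literal
    return stack

define('double', 'dup +')        # a new word, usable like any primitive
print(run('21 double', []))      # [42]
```

The interpreter never distinguishes between built-in and user-defined words, which is the DSL property being described: the extension is indistinguishable from the core.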

edit: Being programmed 'in itself' is a sort of coming-of-age ritual for
languages; it means you can now face the world entirely on your own terms
instead of standing on top of the scaffolding provided by another language.

But not all languages are very well suited to this, the more specific to
solving a particular problem a language is the less likely it is to be
bootstrappable. And plenty of times it just isn't worth it, if you get
adequate portability and performance from being written in C then that's fine
too.

~~~
gvb
I suspect the availability of yacc/flex/bison and the free (as in freedom) Gnu
Compiler Collection are also a pretty big influence. They give a (monetary)
low cost of entry and a substantial amount of utility - I suspect a language
inventor would prefer to invent the language and have the lexing, parsing, and
code generation already taken care of by someone else.

~~~
eru
The parser combinators like Parsec (for Haskell) are also quite pleasant to
work with.
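Parsec itself is Haskell, but the combinator style translates to almost anything. A minimal sketch in Python (names invented for illustration): small parsers are functions from input to `(result, remaining_input)` or `None`, and combinators glue them into bigger parsers.

```python
def char(c):
    """Parser matching one literal character."""
    def parse(s):
        if s and s[0] == c:
            return c, s[1:]
        return None
    return parse

def seq(p1, p2):
    """Run p1 then p2; succeed only if both do."""
    def parse(s):
        r1 = p1(s)
        if r1 is None:
            return None
        v1, rest = r1
        r2 = p2(rest)
        if r2 is None:
            return None
        v2, rest = r2
        return (v1, v2), rest
    return parse

def alt(p1, p2):
    """Try p1; if it fails, try p2 on the same input."""
    def parse(s):
        r1 = p1(s)
        return r1 if r1 is not None else p2(s)
    return parse

# Grammar: 'a' followed by ('b' or 'c')
ab_or_ac = seq(char('a'), alt(char('b'), char('c')))
print(ab_or_ac('ac!'))  # (('a', 'c'), '!')
```

Real combinator libraries add error reporting, repetition, and backtracking control, but the compositional shape is the same.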

------
JadeNB
A related question was explored on LtU recently:
<http://lambda-the-ultimate.org/node/3754>.

------
dkersten
A natively compiled language can be written in whatever language you feel is
easiest to write the compiler in, so either the language you are most
comfortable in or the language with the best parser/code gen libraries.

Once you can compile your language's source code, you could reimplement the
compiler in your language, so that it is self-hosted. Of course, you can't do
this until you can compile your code, hence doing it in another language
first.

If your language is interpreted or JIT-compiled, running on some kind of VM,
then obviously you won't be self-hosting, so you need to use a language which
is suitable for executing your language. Most people seem to choose C, I guess
because it's low-level and they want to reduce overhead.

I myself find Python to be nice to write parsers and such in (and it plays
nice with ANTLR), though my most successful code generator toy project was
written in Clojure.

~~~
eru
And it would actually be a sensible thing to write a C compiler in Python. C
is not a very nice language to write complicated programs in --- and compilers
don't need C's speed. They just need to produce (correct and) fast code, but
they do not need to run blindingly fast.

~~~
smallblacksun
Um, what? You want your compilers to be very fast, otherwise you end up
wasting a lot of developer time (and thus money) waiting for compilation.

~~~
j_baker
Especially for C. It just takes too long to compile.

~~~
silentbicycle
C++ is far, far worse.

------
aidenn0
I'm used to compilers that are self-hosting. More recently there has been a
swath written in C or C++, but this is mainly because, to work on *nix, you
will need to call out to a C library at some point, and writing the compiler
in C makes implementing the FFI fairly trivial.

------
gnosis
OCaml. See:

One-Day Compilers or How I learned to stop worrying and love metaprogramming

<http://www.venge.net/graydon/talks/mkc/html/mgp00001.html>

------
daeken
A good deal of higher-level languages are completely bootstrapped. C#,
Nemerle, a good number of Scheme implementations, etc. all compile themselves.
You of course have to write your initial implementation in something else, but
once that's done many compilers bootstrap themselves completely. In fact, when
I design a language I write the smallest implementation possible before
bootstrapping up (as I'm doing with Dynemerle, a reboot of the Nemerle
project).

------
hga
PreScheme has been used in some Scheme implementations:
<http://en.wikipedia.org/wiki/PreScheme>

I've been looking into Lisp language implementation recently and it seems to
have been somewhat influential; I've seen several references to "this is like
PreScheme", most recently in Cola, which jcdreads brought to our attention.

------
baguasquirrel
I remember at my last full-time job how we'd look at the python source
whenever we couldn't figure out what the expected behavior ought to be (yay
for python documentation... =P). Seeing this list, it makes one wonder about
all those "Java schools" that don't teach C anymore.

------
scott_s
I'm confused by question 1. I assume by saying "cross compile the language in
C" you mean "the compiler for language X generates C code." If so, the answer
is easy: to leverage the heavily used, portable and well optimized machine-
code generators that exist for C.

~~~
gvb
Sorry, no, I meant that the compiler for language X is written in C.

Some languages have generated another language as the intermediate; RatFor to
Fortran <http://en.wikipedia.org/wiki/Ratfor> and C++ to C for the first
generation of C++ are the two I remember.

In my experience, this is a horrible thing to do because it is extremely
difficult to find and fix the root cause for errors that turn up when
compiling the intermediate language (e.g. Fortran or C).

~~~
silentbicycle
It's easier to compile to a (restricted subset of) C and let other compilers
worry about code generation than to try to do the whole stack yourself. C, in
particular, works passably as a "portable assembler". If you already compile
to bytecode, for example, you could include the VM as a library and
transliterate the bytecode to C.
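That transliteration can be almost mechanical. A toy sketch in Python, assuming a hypothetical stack-machine VM exposed as a C library (the `vm_*` names are invented for the example, not a real API): each bytecode instruction becomes one line of C.

```python
# Template of C code for each opcode of the assumed VM library.
OPS = {
    'PUSH':  'vm_push(vm, {});',
    'ADD':   'vm_add(vm);',
    'PRINT': 'vm_print(vm);',
}

def to_c(bytecode):
    """Transliterate a list of (opcode, args...) tuples into a C program."""
    lines = ['#include "vm.h"', 'int main(void) {', '  vm_t *vm = vm_new();']
    for instr in bytecode:
        op, *args = instr
        lines.append('  ' + OPS[op].format(*args))
    lines += ['  vm_free(vm);', '  return 0;', '}']
    return '\n'.join(lines)

print(to_c([('PUSH', 2), ('PUSH', 40), ('ADD',), ('PRINT',)]))
```

The C compiler then handles register allocation and all the platform-specific code generation, which is the "portable assembler" payoff.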

------
petercooper
Java's mentioned a few times in this list, but what's Java written in..?
Well.. [http://stackoverflow.com/questions/410320/what-is-java-
writt...](http://stackoverflow.com/questions/410320/what-is-java-written-in)

~~~
hga
Off the top of my head:

Hotspot, the Sun production JVM, is written in C++ with a _lot_ of inline
assembly (hard to port).

Jikes RVM, perhaps the major research JVM, is metacircular: Java with C
bootstrapping.

Jnode, an OS, uses Jikes replacing the C with assembly (x86).

Sun's Maxine research JVM is metacircular with C bootstrapping.

A large fraction of the other JVMs are written in C, sometimes C++, sometimes
with Java as well.

------
mhansen
Javascript: C++ (TraceMonkey, Spidermonkey, V8), Java (Rhino) [EDIT: Sorry,
C++, not C]

~~~
grayrest
Tracemonkey and V8 are C++

------
Locke1689
Scheme is often used in prototype interpreters. I just wrote a very simple
language grammar in Haskell, so I guess really "anything" can be an answer.

If you're asking about compilers, then it tends to be C (although LLVM is
coming along nicely).

------
xtho
For describing a language, I'd use English if you want to reach an
international audience.

I don't think a _language_ can claim to be faster than C. But a specific
interpreter/compiler can claim to be faster than a specific C compiler.

------
berlinbrown
Factor: Factor and a little C

~~~
dkersten
The C has been, or is being, replaced by C++.

------
cabalamat
I'm implementing a language of my own design in Python right now. I chose
Python because I'm familiar with it and I can piggyback on Python's strings,
hash tables, and garbage collection.

------
arebop
GHC, the Glasgow Haskell Compiler, is implemented in Haskell.

~~~
Locke1689
Well, just to be clear, GHC compiles to an intermediate stylized C form (hc
files), which are then compiled with a C compiler.

------
berlinbrown
* Python: C, but also Java and C#

* Ruby: C, but also Java

Do you mean there are different implementations in other languages? E.g.:

* Jython: Java

* CPython: C

~~~
gvb
Yes.

Python

* CPython is the "original", written in C <http://en.wikipedia.org/wiki/CPython>.

* Jython is written in Java <http://en.wikipedia.org/wiki/Jython>.

* IronPython is written in C# <http://en.wikipedia.org/wiki/Ironpython>.

* PyPy is written in Python (I had forgotten this) <http://en.wikipedia.org/wiki/PyPy>.

Ruby

* The reference Ruby 1.8 (MRI) is written in C <http://en.wikipedia.org/wiki/Ruby_MRI>.

* Ruby 1.9 is still written in C, but compiles to a bytecode VM (YARV) <http://en.wikipedia.org/wiki/YARV>.

* JRuby is written in Java <http://en.wikipedia.org/wiki/JRuby>.

* Rubinius is largely written in Ruby per Wikipedia <http://en.wikipedia.org/wiki/Rubinius>.

* IronRuby runs on the .net framework (I assume it is written in C#) <http://www.ironruby.net/About> and <http://en.wikipedia.org/wiki/IronRuby>.

------
swolchok
Arc: MzScheme core, then Arc.

------
cmelbye
Ruby has also been done in Objective-C, hasn't it?

~~~
mr_dbr
<http://www.macruby.org/> not sure if it's rewritten in ObjC or just
integrated-with

~~~
cpr
My impression is that it's mostly C and some Ruby, plus C++ for the parts that
utilize LLVM.

When you're implementing something at the same level as Objective-C, you can't
generally use Obj-C itself.

(Witness the Obj-C runtime, which is mostly C and a tiny bit of assembler.)

------
ThinkWriteMute
Ruby is in C, C++, Ruby, and Objective-C.

------
zitterbewegung
Clojure: Java

~~~
Zak
Clojure is largely written in Clojure, with some Java at the core. There's
also a .NET port, which I assume replaces the Java with C#.

------
rick_2047
I think Smalltalk was self compiling?

------
anonjon
Lisp is written in Lisp (and sometimes C).

