
ChocoPy: A Programming Language for Compilers Courses - matt_d
https://chocopy.org/
======
userbinator
_Concise: A full compiler for ChocoPy can be implemented in about 12 weeks by
undergraduate students of computer science. This can be a hugely rewarding
exercise for students._

Written in itself, and thus self-compiling? IMHO that's one of the biggest
advantages of using C or a C-subset compiler: it's relatively easy to make it
self-compiling. And I think writing a self-compiling compiler is especially
important in a course, because it really drives home the point that a compiler
is just another piece of software.

~~~
pjmlp
Agreed; however, there are plenty of languages one can use for bootstrapping. I
don't see it as an advantage of C/C++, rather as a sign of how lazy many
lecturers are in how they build their curriculum.

This one here is a great alternative approach, especially to dispel the idea that C
or C++ is required to be anywhere in a compiler stack.

~~~
userbinator
_especially to dispel the idea that C or C++ is required to be anywhere in a
compiler stack._

According to the sibling comment, the compiler is written in Java. The Java
compiler is written in Java and runs in a JVM which is itself written in
C/C++, and we all know how C compilers were created...

~~~
pjmlp
Almost there, then a wrong turn.

Java has multiple implementations, some of which happen to be written entirely in
Java, like JikesRVM, Maxine VM and Graal.

Naturally, you might say that you only care about OpenJDK, which incidentally
now has Project Metropolis, with a long-term roadmap to use Graal and
SubstrateVM to replace those C/C++ parts.

As for C, many probably aren't aware that BCPL was originally designed to
bootstrap CPL, a memory-safe systems programming language based on Algol,
which never took off due to mismanagement, while BCPL, bare-bones as it was,
took on a life of its own.

Also, well into the mid-90s there were quite a few OSes where C was still far
from making its mark, so using it to bootstrap a compiler wasn't even
considered.

~~~
coldtea
> _Java has multiple implementations, some of which happen to be written entirely in
> Java, like JikesRVM, Maxine VM and Graal._

Yes, but most of them, including Graal at the moment, are insignificant
curiosities as far as the number of developers working with them is concerned.
If they were a different language, they wouldn't even be in the TIOBE top 100,
whereas Java on its C/C++ implementation is in the top 5.

So pointing them out mainly scores "well, actually" pedantry points rather than
illuminating any real importance they have.

~~~
pjmlp
Java the programming language is in the TIOBE top 5, not OpenJDK.

Just like we don't have gcc, clang, msvc, whatever-cc on the TIOBE top 5.

Apparently mixing languages with implementations keeps being a thing.

Maybe we should also start arguing that C requires C++ to be implemented,
given that all major mainstream C compilers are now written in C++.

~~~
coldtea
> _Java the programming language is in the TIOBE top 5, not OpenJDK._

Yes, I'm obviously familiar with the distinction between language and
implementation.

My point is that regarding Java, it doesn't matter, as there's a single
dominant implementation (well, 2 if you count Oracle's and OpenJDK as
different) and an insignificant assortment of others.

The concrete implementation of "Java the programming language" actually used
by enough people to put it in the TIOBE index is the Oracle/OpenJDK one.

If we took all the other Java implementations mentioned, called them by a
different name (e.g. Jakarta) and added their users together, the new
"language" wouldn't even have made it into the top 100 by user base.

(Google's Java-copied Android APIs aside of course -- those do have tons of
users and matter)

> _Apparently mixing languages with implementations keeps being a thing._

Apparently the distinction between language and implementation is pedantic in
most cases where the alternative implementations are niche products and hobby
projects, and as far as most people are concerned, Java is 99.9% one specific
implementation.

~~~
pjmlp
It is a little more than two.

OpenJDK, IBM J9, PTC, Aicas, Ricoh, Xerox, Gemalto, SAPVM, GraalVM (delivering
your tweets)

Niche products that have been in business for as long as Java has been around,
which is a couple of years now.

So they do matter, in an age where many developers don't want to pay for
software while feeling entitled to be paid for writing it.

Yes, it is pedantic, which is why Linux happens to be written in GCC C, not ISO C.
Good luck compiling it with another 100% ISO C standards-compliant compiler.

Regarding TIOBE, neither Ctrl+F Oracle nor Ctrl+F OpenJDK turn out to find
anything, just plain old Java, zero references to implementations.

[https://www.tiobe.com/tiobe-index/programming-languages-defi...](https://www.tiobe.com/tiobe-index/programming-languages-definition/)

~~~
coldtea
I usually don't pay attention to names on HN (I usually answer blind, or don't
remember whether I've talked to someone before), with a few exceptions who have
a distinctive style or comment very frequently. I can almost always tell when
I'm reading something from you based on two things:

1) It will mention some experimental/research languages from the 60s-90s (if
Oberon is mentioned, it's probably pjmlp, for example).

2) It will mention some otherwise niche commercial compilers/IDEs.

I enjoy that you dug into all this history, and remind people of the legacy
and innovations of all those research/minor/forgotten languages.

And that you also remind people that paying for software can help sustain some
high-quality products we don't have as FOSS at all (though it's a bitter pill
to swallow when you're a student, for example, or a programmer somewhere else in
the world making much less than US/EU devs, rather than a consultant whose
customer pays for the software).

But, personally, I find that you tend not to account for the niche factor and
all it implies, talking about some products/languages/implementations as if
they were as valid, and even as common, a choice as the stuff people actually
use.

E.g. "Who said Java doesn't do X? There's the commercial JVM/compiler "Foobar"
that does X!", yeah, only that has like 1000 users and costs $10K, and 99.999%
of the people when they want Java to do X, they want the Java implementation
they actually use to do it.

So my comment above was in that spirit...

> _Regarding TIOBE, neither Ctrl+F Oracle nor Ctrl+F OpenJDK turn out to find
> anything, just plain old Java, zero references to implementations._

You missed my point: the "plain old Java" in TIOBE depends on programmers using
particular implementations, not on the abstract idea of Java or the Java specs.
And those implementations are Oracle/OpenJDK by a far far far margin.

If you take the companies and programmers that use Oracle/OpenJDK out of the
picture, and remove their Google searches, Stack Overflow questions, blog posts,
conventions, job postings, etc., there wouldn't be any mention of Java on the
TIOBE index, even if there were still 20+ additional implementations.

Same way that if the Python community was reduced to just the PyPy users,
Python would be nowhere in TIOBE -- even though TIOBE says "Python" and not
"CPython".

~~~
pjmlp
If the Python community were reduced to PyPy users, the TIOBE index would be
exactly the same as it is now, because they would still be asking questions
about the programming language named Python.

As for the rest, I guess I can only say thanks for the remarks and feedback.

~~~
coldtea
> _If the Python community were reduced to PyPy users, the TIOBE index would be
> exactly the same as it is now, because they would still be asking questions
> about the programming language named Python._

You're proposing an alternate universe where CPython didn't exist and PyPy
built up more or less the same Python community.

Which is plausible, but not relevant to the mechanics of my point.

My argument wasn't "what if we didn't have CPython, but PyPy from the start,
then Python wouldn't be successful" (which is obviously wrong).

It was rather: "a feature that's only in PyPy is niche and not really that
relevant as far as Python is concerned, even if PyPy is a Python, since PyPy
has an insignificant number of users -- so few that, if we removed CPython
users' impact and kept only PyPy's (as it stands, not going back to 1990 and
swapping one for the other), Python wouldn't be in the TIOBE top 100".

------
xurias
A bit disappointing that the course isn't openly available in any way. It
sounds really interesting.

~~~
kark
The course website from last semester (when I took this course!) has just
about everything except for lecture recordings and starter code for the
project, and it’s still up: [http://www-inst.eecs.berkeley.edu/~cs164/sp19/](http://www-inst.eecs.berkeley.edu/~cs164/sp19/)

------
mixmastamyk
Interesting, are there any other free, modern compiler courses?

~~~
bhrgunatha
University Of Washington CSEP501: Compiler Construction
[http://courses.cs.washington.edu/courses/csep501/](http://courses.cs.washington.edu/courses/csep501/)

Keith Schwarz's Stanford compiler course, CS143
[http://www.keithschwarz.com/cs143/](http://www.keithschwarz.com/cs143/)

Stanford's Engineering Compiler course on Lagunita
[https://lagunita.stanford.edu/courses/Engineering/Compilers/...](https://lagunita.stanford.edu/courses/Engineering/Compilers/Fall2014/about)

Matt Might's courses are blueprints for self-education
[http://matt.might.net/teaching/compilers/](http://matt.might.net/teaching/compilers/)

Book: Introduction to Compilers and Language Design
[https://www3.nd.edu/~dthain/compilerbook/](https://www3.nd.edu/~dthain/compilerbook/)

I'm not a fan of uncurated link dumps, however: the Awesome Compilers link
aggregation on GitHub [https://github.com/aalhour/awesome-compilers](https://github.com/aalhour/awesome-compilers)

~~~
vymague
I really wish we had a compiler analog of "Operating Systems: Three Easy
Pieces".

~~~
pjmlp
Something like "Compiler Construction in Oberon", "Let's Build a Compiler",
"Turbo Pascal Internals"?

[https://inf.ethz.ch/personal/wirth/CompilerConstruction/inde...](https://inf.ethz.ch/personal/wirth/CompilerConstruction/index.html)

[https://compilers.iecc.com/crenshaw/](https://compilers.iecc.com/crenshaw/)

[http://turbopascal.org/](http://turbopascal.org/)

------
tom_mellior
This looks cool, and if you click the "Compile to RISC-V" button you get to a
page where a lot of the assembly code has meaningful comments explaining what
each instruction does. I wish we had that in every compiler...

As for GC, if I read the code correctly, it doesn't do any and simply aborts
when it runs out of memory. Fair for a first compiler course.
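
To make "no GC, abort on out of memory" concrete, here is a minimal Python sketch of a bump allocator, assuming a fixed-size heap measured in words. `BumpHeap`, its heap size, and exit code 1 are hypothetical illustrations, not taken from the ChocoPy runtime:

```python
import sys


class BumpHeap:
    """Hypothetical no-GC heap: allocation bumps a pointer, nothing is ever freed."""

    def __init__(self, size_words: int) -> None:
        self.size = size_words  # total capacity in words (assumed unit)
        self.top = 0            # offset of the next free word

    def alloc(self, words: int) -> int:
        """Return the start offset of a new block, or abort when the heap is full."""
        if self.top + words > self.size:
            # Without a collector there is no garbage to reclaim, so this is fatal.
            print("Out of memory", file=sys.stderr)
            sys.exit(1)  # hypothetical exit code
        start = self.top
        self.top += words
        return start


heap = BumpHeap(size_words=8)
a = heap.alloc(3)  # → 0
b = heap.alloc(4)  # → 3
```

Because nothing is ever freed, each allocation is just a bounds check and a pointer increment; the price is that a long-running program eventually hits the abort path.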

~~~
johnisgood
> where a lot of the assembly code has meaningful comments explaining what
> each instruction does

Cutter (which is based on Radare2, basically a GUI for r2) can do that under
the "Disassembly" tab! It works with executables and source code. You have to
configure it to show the additional information though. It is such a neat
tool!

~~~
tom_mellior
Sorry, I was probably unclear. I imagine Cutter just tells you what each
_opcode_ does, no? gcc.godbolt.org does that too, with links to the ISA docs.

But here I mean that in the ChocoPy code you get explanations of what
instructions do in the context of the program's semantics, i.e., how they
relate to a higher-level view of what's going on. An example:

    
    
        .globl $print
        $print:
        # Function print
          lw a0, 0(sp)                             # Load arg
          beq a0, zero, print_6                    # None is an illegal argument
          lw t0, 0(a0)                             # Get type tag of arg
          li t1, 1                                 # Load type tag of `int`
          beq t0, t1, print_7                      # Go to print(int)
          li t1, 3                                 # Load type tag of `str`
          beq t0, t1, print_8                      # Go to print(str)
          li t1, 2                                 # Load type tag of `bool`
          beq t0, t1, print_9                      # Go to print(bool)
        print_6:                                   # Invalid argument
          li a0, 1                                 # Exit code for: Invalid argument
          la a1, const_4                           # Load error message as str
          addi a1, a1, @.__str__                   # Load address of attribute __str__
          j abort                                  # Abort
    

Note that the comments on the different occurrences of lw and li explain the
meanings of the magic constants in the code. This would be pretty hard to do
from disassembly alone.
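
The control flow of that `$print` routine can be modeled in a few lines of Python. This is a sketch, assuming a hypothetical object model where each heap object is a `(type_tag, payload)` tuple and ChocoPy's None is Python's `None`; only the tag values (int = 1, bool = 2, str = 3), the None check, the dispatch order, and exit code 1 for an invalid argument come from the listing above:

```python
import sys
from typing import NoReturn, Optional, Tuple

# Type tags as loaded by the `li` instructions in the listing.
INT_TAG, BOOL_TAG, STR_TAG = 1, 2, 3

# Hypothetical object model: (type_tag, payload); ChocoPy None is Python None.
Obj = Tuple[int, object]


def abort(code: int, message: str) -> NoReturn:
    """Report an error and exit, like jumping to `abort` in the listing."""
    print(message, file=sys.stderr)
    sys.exit(code)


def chocopy_print(arg: Optional[Obj]) -> str:
    """Mirror $print: reject None, read the type tag, dispatch on it."""
    if arg is None:                      # beq a0, zero, print_6
        abort(1, "Invalid argument")
    tag, payload = arg                   # lw t0, 0(a0): get type tag of arg
    if tag == INT_TAG:                   # go to print(int)
        text = str(payload)
    elif tag == STR_TAG:                 # go to print(str)
        text = str(payload)
    elif tag == BOOL_TAG:                # go to print(bool)
        text = "True" if payload else "False"
    else:                                # falls through to print_6
        abort(1, "Invalid argument")
    print(text)
    return text


chocopy_print((INT_TAG, 42))      # prints 42
chocopy_print((BOOL_TAG, False))  # prints False
```

The point of the comments in the generated assembly is exactly what this model makes explicit: `1`, `2` and `3` are not arbitrary numbers but the runtime's type tags.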

~~~
johnisgood
Ah sorry, at first I thought you meant something like:
[https://i.imgur.com/RZeFKZQ.png](https://i.imgur.com/RZeFKZQ.png). I had not
tried ChocoPy out myself, so I had no idea what output you were referring to,
which is my fault. In any case, I hope I introduced Cutter/Radare2
to someone at least. :D

------
JaDogg
Please make the course content available to the public.

------
eterps
A ChocoNim could also be an interesting choice: it would be closer to the metal
and open up potential for self-compilation.

------
xhgdvjky
yeah but is it COOL

~~~
xhgdvjky
since it's getting downvoted...
[https://en.wikipedia.org/wiki/Cool_(programming_language)](https://en.wikipedia.org/wiki/Cool_\(programming_language\))

~~~
jperry
Not knowing anything about this, I thought this would be about C# :)

[https://en.wikipedia.org/wiki/C_Sharp_(programming_language)...](https://en.wikipedia.org/wiki/C_Sharp_\(programming_language\)#History)

------
shantanu77
What! The best Python name ever has been taken for a compiler course. Both my
sons love chochopy, and they will be devastated to know that chochopy isn't a
sprite-based programming language for kids.

~~~
popnroll
What? I don't know what "chochopy" is, but this project is called Chocopy.

~~~
shard
Choco-pie is a delicious Korean snack made with chocolate and marshmallows:
[http://www.trifood.com/chocopie.asp](http://www.trifood.com/chocopie.asp)

You can find it at your local Korean market, and at Costco if you are in the
SF Bay Area.

