Written in itself, and thus self-compiling? IMHO one of the biggest advantages of C or C-subset compilers is that it's relatively easy to make them self-compiling, and I think writing a self-compiling compiler is especially important in a course because it really drives home the point that a compiler is just another piece of software.
This fact is already conveyed by writing the compiler in any language. The important thing is the transformation from structure to structure and the recursive nature of grammar rules.
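To make the "recursive grammar rules" point concrete, here is a minimal sketch in Python (an illustrative choice, nothing in the thread prescribes it). Each rule of a small arithmetic grammar becomes one recursive function. The parser turns text into a tuple-based AST, and a second pass turns that tree into a flat list of stack-machine instructions. The grammar, the tuple AST shape and the instruction names (`PUSH`, `ADD`, ...) are assumptions made up for the example.

```python
import re

# Hypothetical grammar, one function per rule:
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'

def tokenize(src):
    """Extract numbers and operators; characters matching neither (e.g. spaces) are skipped."""
    return re.findall(r"\d+|[-+*/()]", src)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())  # left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):
        if self.peek() == "(":
            self.take("(")
            node = self.expr()  # the recursion mirrors the grammar rule
            self.take(")")
            return node
        tok = self.take()
        if not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return ("num", int(tok))

OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def compile_node(node, out):
    """Structure to structure: AST in, post-order stack-machine code out."""
    if node[0] == "num":
        out.append(("PUSH", node[1]))
    else:
        op, left, right = node
        compile_node(left, out)
        compile_node(right, out)
        out.append((OPCODES[op],))
    return out

def compile_expr(src):
    parser = Parser(tokenize(src))
    ast = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing token {parser.peek()!r}")
    return compile_node(ast, [])
```

`compile_expr("2*(3+4)")` yields `[("PUSH", 2), ("PUSH", 3), ("PUSH", 4), ("ADD",), ("MUL",)]`. Both passes (text to tree, tree to instruction list) are structure-to-structure transformations driven by recursion.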
Having written an assembler, a linker and, for another project, a sort of compiler with Flex & Bison (FsLexYacc actually), I think the emphasis should be on how close writing a parser and writing a compiler are (because one task includes the other). That way, the main ideas of compilers can be reused for real tasks such as data parsers, where they can solve problems more elegantly than ad hoc code.
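Reusing compiler skills for data parsing might look like this minimal sketch: a hand-written recursive-descent parser for a made-up format of nested integer lists such as `[1, [2, 3], []]`. The format, the grammar in the comments and the `parse_data` name are assumptions for illustration only.

```python
def parse_data(text):
    """Parse nested integer lists like "[1, [2, 3], []]" into Python lists."""
    pos = 0

    def skip_ws():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def expect(ch):
        nonlocal pos
        skip_ws()
        if pos >= len(text) or text[pos] != ch:
            raise ValueError(f"expected {ch!r} at position {pos}")
        pos += 1

    def value():
        # value := list | integer
        nonlocal pos
        skip_ws()
        if pos < len(text) and text[pos] == "[":
            return list_()
        start = pos
        if pos < len(text) and text[pos] == "-":
            pos += 1
        while pos < len(text) and text[pos].isdigit():
            pos += 1
        if start == pos or text[start:pos] == "-":
            raise ValueError(f"expected integer or list at position {start}")
        return int(text[start:pos])

    def list_():
        # list := '[' (value (',' value)*)? ']'
        nonlocal pos
        expect("[")
        items = []
        skip_ws()
        if pos < len(text) and text[pos] == "]":
            pos += 1
            return items
        items.append(value())
        skip_ws()
        while pos < len(text) and text[pos] == ",":
            pos += 1
            items.append(value())
            skip_ws()
        expect("]")
        return items

    result = value()
    skip_ws()
    if pos != len(text):
        raise ValueError(f"trailing characters at position {pos}")
    return result
```

The structure is the same as a compiler front end's: one function per grammar rule, with errors reported at a position instead of being discovered later by ad hoc string splitting.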
This one here is a great alternative approach, especially to dispel the idea that C or C++ is required anywhere in a compiler stack.
According to the sibling comment, the compiler is written in Java. The Java compiler is written in Java and runs in a JVM which is itself written in C/C++, and we all know how C compilers were created...
Java has multiple implementations, and some of them happen to be written entirely in Java, like JikesRVM, Maxine VM and Graal.
Naturally you might say that you only care about OpenJDK, which incidentally now has Project Metropolis, with a long-term roadmap to use Graal and SubstrateVM to replace those C/C++ parts.
As for C, many probably aren't aware that BCPL was originally designed to bootstrap CPL, a memory-safe systems programming language based on Algol, which due to mismanagement never took off, while BCPL, bare-bones as it was, took on a life of its own.
Also, well into the mid-90s there were quite a few OSes where C had yet to make its mark, so using it to bootstrap a compiler wasn't even considered.
Yes, but most of them, including Graal at the moment, are insignificant curiosities as far as developers working with them are concerned. If they were a different language, they wouldn't even be in the TIOBE top 100, whereas Java-in-C/C++ is in the top 5.
So pointing them out mainly scores "well, actually" pedantry points rather than illuminating any real importance they have.
Just like we don't have gcc, clang, msvc or whatever-cc in the TIOBE top 5.
Apparently mixing languages with implementations keeps being a thing.
Maybe we should also start arguing that C requires C++ to be implemented, given that all major mainstream C compilers are now written in C++.
Yes, I'm obviously familiar with the distinction between language and implementation.
My point is that regarding Java, it doesn't matter, as there's a single dominant implementation (well, two if you count Oracle's and OpenJDK as different) plus an insignificant assortment of others.
The concrete implementation of "Java the programming language" actually used by enough people to put it in the TIOBE index is the Oracle/OpenJDK one.
If we took all the other Java implementations mentioned, gave them a different name (e.g. Jakarta) and added their users together, the new "language" wouldn't even make it into the Top 100 by user base.
(Google's Java-copied Android APIs aside, of course -- those do have tons of users and matter)
>Apparently mixing languages with implementations keeps being a thing.
Apparently the distinction between language and implementation is pedantic in most cases where the alternative implementations are niche products and hobby projects, and as far as anyone is concerned, Java is 99.9% one specific implementation.
OpenJDK, IBM J9, PTC, Aicas, Ricoh, Xerox, Gemalto, SAPVM, GraalVM (delivering your tweets)
Niche products that have been in business for as long as Java has been around, which is a couple of years now.
So they do matter, in an age where many developers don't want to pay for software while feeling entitled to be paid for writing it.
Yes, it is pedantic, which is why Linux happens to be written in GCC C, not ISO C. Good luck compiling it with another 100% standards-compliant ISO C compiler.
Regarding TIOBE, neither Ctrl+F Oracle nor Ctrl+F OpenJDK turn out to find anything, just plain old Java, zero references to implementations.
1) It will mention some experimental/research languages from the 60s-90s (if Oberon is mentioned, there's pjmlp for example).
2) It will mention some otherwise niche commercial compilers/IDEs.
I enjoy that you dig into all this history and remind people of the legacy and innovations of all those research/minor/forgotten languages.
And that you also remind people that paying for software can help sustain some high-quality products we don't have as FOSS at all (though it's a bitter pill to swallow when you're, for example, a student, or a programmer somewhere else in the world making much less than US/EU devs, rather than a consultant whose customer pays for the software).
But personally, I find that you tend not to account for the niche factor and all it implies, talking about some products/languages/implementations as if they were an equally valid, even common, choice alongside the stuff people actually use.
E.g. "Who said Java doesn't do X? There's the commercial JVM/compiler 'Foobar' that does X!" Yeah, only it has like 1,000 users and costs $10K, and when 99.999% of people want Java to do X, they want the Java implementation they actually use to do it.
So my comment above was in that spirit...
>Regarding TIOBE, neither Ctrl+F Oracle nor Ctrl+F OpenJDK turn out to find anything, just plain old Java, zero references to implementations.
You missed my point: the "plain old Java" in TIOBE depends on programmers using particular implementations, not on the abstract idea of Java or the Java specs. And those implementations are Oracle/OpenJDK by a very, very wide margin.
If you take the companies and programmers that use Oracle/OpenJDK out of the picture and remove their Google searches, Stack Overflow questions, blog posts, conventions, job postings, etc., there wouldn't be any mention of Java in the TIOBE index, even if there were still 20+ other implementations.
In the same way, if the Python community were reduced to just the PyPy users, Python would be nowhere in TIOBE -- even though TIOBE says "Python" and not "CPython".
As for the rest, I guess I can only say thanks for the remarks and feedback.
You're proposing an alternate universe where CPython didn't exist and PyPy built up more or less the same Python community.
Which is plausible, but not relevant to the mechanics of my point.
My argument wasn't "what if we didn't have CPython, but PyPy from the start, then Python wouldn't be successful" (which is obviously wrong).
It was rather "a feature that's only in PyPy is niche and not really that relevant as far as Python is concerned, even if PyPy is a Python, since PyPy has an insignificant number of users -- so few that if we removed CPython users' impact and only kept PyPy's impact (as things stand, not going back to 1990 and replacing one with the other), Python wouldn't be in the TIOBE Top 100".
The thing with actually self-compiling is that you need language features such as file management, which are not trivial and would take time to cover that we don't have in a compiler course. There is already so much to say about compilation, and the course generally must also cover at least assembly language and hardware architecture.
You should be able to "just" use the C library if your compiler uses a C-compatible calling convention (which is a good idea anyway) and allows a C-compatible representation of strings (for file names) and characters (for fgetc). File management is then mostly just passing opaque FILE * pointers around. IIRC the compiler course I took at university did this.
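To illustrate the calling-convention point, here is a minimal sketch that uses Python's `ctypes` as a stand-in for a compiler's generated code. It calls libc's `fopen`, `fgetc` and `fclose` directly, and the `FILE *` is never inspected, only passed back to libc as an opaque pointer. It assumes a POSIX system (Linux or macOS) where libc can be loaded this way; `read_file_via_libc` is a hypothetical name.

```python
import ctypes
import ctypes.util

# Assumption: POSIX system. If find_library returns None, CDLL(None) falls back
# to the symbols already loaded into the process, which include libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signatures. FILE * is an opaque pointer (c_void_p):
# the caller never looks inside it, it only hands it back to libc.
libc.fopen.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
libc.fopen.restype = ctypes.c_void_p
libc.fgetc.argtypes = [ctypes.c_void_p]
libc.fgetc.restype = ctypes.c_int
libc.fclose.argtypes = [ctypes.c_void_p]
libc.fclose.restype = ctypes.c_int

EOF = -1  # EOF's value on mainstream libcs (glibc, musl, macOS libc)

def read_file_via_libc(path):
    """Read a file byte by byte using only fopen/fgetc/fclose from libc."""
    fp = libc.fopen(path.encode(), b"rb")  # C strings: NUL-terminated bytes
    if not fp:  # NULL comes back as None
        raise OSError(f"fopen failed for {path}")
    chars = []
    try:
        while (c := libc.fgetc(fp)) != EOF:
            chars.append(c)  # fgetc returns the byte as an int in 0..255
    finally:
        libc.fclose(fp)
    return bytes(chars)
```

A compiler that emits C-compatible calls and NUL-terminated strings gets file I/O this way without implementing any of it in the language itself.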
That said, your parent's point is bogus: The act of writing a compiler in not-C is sufficient to drive home the point that a compiler is just another piece of software. On the contrary, writing a compiler in C because of self-hosting reinforces the misconception that compilers and C are somehow special.
Although I follow bootstrapping projects, I think the focus on self-hosting itself is a bad idea. PreScheme, a systems language, was easier to parse than C, more productive than C, and compiled to C. The latter was done to leverage C's optimizing compilers, but it showed you didn't have to give up Scheme's benefits to do it all in C.
Likewise, many of these compilers would iterate faster, and with more correctness, if written in a higher-level language instead of being self-hosted just because. The language's libraries will already exercise it in a variety of situations, so the "self-hosting tests the language" excuse doesn't fly for me if it means trading away the productivity and/or maintainability of the better language for the compiler/interpreter.
Keith Schwartz Stanford Compiler course CS143
Stanford's Engineering Compiler course on Lagunita
Matt Might's courses are self educational blueprints
Introduction to Compilers and Language Design
I'm not a fan of uncurated link dumps, however:
Awesome compilers link aggregation on Github
(I designed the original version, though it's improved a lot in the past few years)
As for GC, if I read the code correctly, it doesn't do any and simply aborts when it runs out of memory. Fair enough for a first compiler course.
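For readers wondering what "no GC, abort on out of memory" amounts to, here is a minimal sketch of a bump-pointer allocator over a fixed heap. It models the policy described above, not the actual ChocoPy runtime code. The `BumpHeap` and `OutOfMemory` names are hypothetical, and the 4-byte word size is an assumption matching a 32-bit target.

```python
class OutOfMemory(Exception):
    """Raised in place of the runtime's abort when the heap is exhausted."""

class BumpHeap:
    """Allocate by bumping a pointer; memory is never reclaimed (no GC)."""

    def __init__(self, size_bytes, word_size=4):  # 4 bytes: assumed 32-bit words
        self.memory = bytearray(size_bytes)
        self.word_size = word_size
        self.next_free = 0  # the "bump pointer"

    def alloc(self, nbytes):
        # Round up to whole words so every object stays word-aligned.
        size = -(-nbytes // self.word_size) * self.word_size
        if self.next_free + size > len(self.memory):
            raise OutOfMemory(f"cannot allocate {nbytes} bytes")
        addr = self.next_free
        self.next_free += size
        return addr
```

Allocation is just an addition and a bounds check, which is why this is a common first step: a collector can be added later behind the same `alloc` interface.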
Cutter (which is based on Radare2, basically a GUI for r2) can do that under the "Disassembly" tab! It works with executables and source code. You have to configure it to show the additional information though. It is such a neat tool!
But here I mean that in the ChocoPy code you get explanations of what instructions do in the context of the program's semantics, i.e., how they relate to a higher-level view of what's going on. An example:
```asm
# Function print
  lw a0, 0(sp) # Load arg
  beq a0, zero, print_6 # None is an illegal argument
  lw t0, 0(a0) # Get type tag of arg
  li t1, 1 # Load type tag of `int`
  beq t0, t1, print_7 # Go to print(int)
  li t1, 3 # Load type tag of `str`
  beq t0, t1, print_8 # Go to print(str)
  li t1, 2 # Load type tag of `bool`
  beq t0, t1, print_9 # Go to print(bool)
print_6: # Invalid argument
  li a0, 1 # Exit code for: Invalid argument
  la a1, const_4 # Load error message as str
  addi a1, a1, @.__str__ # Load address of attribute __str__
  j abort # Abort
```
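The assembly in that excerpt dispatches on a type tag stored in the first word of each object. As a rough Python model of the same control flow: the tag values (1 = `int`, 2 = `bool`, 3 = `str`), the check order and exit code 1 come from the excerpt. The `Obj`/`Abort` classes, the `print_builtin` name, the "Invalid argument" message text and Python-style output (trailing newline, `True`/`False`) are assumptions.

```python
INT_TAG, BOOL_TAG, STR_TAG = 1, 2, 3  # tag values loaded in the assembly excerpt

class Obj:
    """Hypothetical heap object: a type tag plus a payload."""
    def __init__(self, tag, payload):
        self.tag = tag
        self.payload = payload

class Abort(Exception):
    """Models the runtime's `abort`, which reports an error and exits."""
    def __init__(self, code, message):
        super().__init__(message)
        self.code = code

def print_builtin(arg):
    """Return the text print(arg) would output, dispatching on the type tag."""
    if arg is None:                  # beq a0, zero, print_6
        raise Abort(1, "Invalid argument")
    if arg.tag == INT_TAG:           # beq t0, t1, print_7
        return f"{arg.payload}\n"
    if arg.tag == STR_TAG:           # beq t0, t1, print_8
        return f"{arg.payload}\n"
    if arg.tag == BOOL_TAG:          # beq t0, t1, print_9
        return "True\n" if arg.payload else "False\n"
    # Any other tag falls through to print_6, as in the assembly.
    raise Abort(1, "Invalid argument")
```

Each `if` corresponds to one `li`/`beq` pair. The final `raise` mirrors how an unmatched tag falls straight into the `print_6` error path.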
You can find it at your local Korean market, and in Costco if you are in the SF Bay Area