Compiled and interpreted languages: Two ways of saying tomato (tratt.net)
67 points by jasim on Jan 11, 2023 | 37 comments



One of my favourite things about the compilers-vs-interpreters question is the Futamura projections [0], a way of using partial evaluation to construct both. It would be interesting to see that used more explicitly; one could argue something like GraalVM is getting there. [1]

[0]: http://blog.sigfpe.com/2009/05/three-projections-of-doctor-f...

[1]: https://news.ycombinator.com/item?id=18995651 (and the parent: https://news.ycombinator.com/item?id=18995498)


You can probably draw a sharp line in two ways:

1. You're interpreting if syntactic tokens directly drive evaluation. So only the first two are interpreted.

2. An interpreter takes syntactic input and evaluates it, producing the result of running the program; a compiler takes syntactic input and outputs a program that must then itself be run to compute the result.

That said, I agree that "interpreted language" or "compiled language" is not a formally correct term; it's a connotative definition implying what's typical, or the purposes for which a language may be suited.


my sharp line is "what do you see when profiling?"

if you see the program's inner loops, it's been compiled

if you see the language's inner loops, it's being interpreted
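To illustrate the distinction with a toy (this is a made-up stack machine, not any real language's VM): profile a program run through the interpreter below and nearly all the time lands in the single dispatch loop, not in the "program" being executed.

```python
# A toy bytecode interpreter. Profiling any program run through it would
# show time concentrated in this dispatch loop (the language's inner loop),
# rather than in the program's own logic.
def run(program):
    stack = []
    for op, arg in program:          # the dispatch loop
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack[-1]

# (2 + 3) * 4
prog = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
print(run(prog))  # 20
```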


It really depends which profiler you are using. If you profile with a profiler designed for the low-level language, you will see lower-level things.


profiling at the executable/machine-code level


What about qsort?


same thing, only what you see will tell you whether the library in which qsort was defined has been compiled or is being interpreted.

i.e., in the latter case, you'll see the language's dispatch loop in the "hot" range, as it interprets the definition of qsort at multiple recursion levels... (and this signature will be independent of whether the calls at each level are made on their own or combined horizontally to gain nested parallelism)


I think you're missing my point, which is that the language is providing qsort as part of its standard library, so the indirect function call within qsort counts as the "language's inner loops", no?


We're talking past each other; I'm making a distinction between a language and its libraries which you are not making.

Consider a simpler example: let's profile not just qsort, but a couple of other relatively long-running library calls, and let's strip the executable beforehand, so we aren't even working with symbolic information, just a bunch of code addresses.

If we get multiple different hotspots, we're likely looking at a compiled library.

If we get the same hotspot in each case, we're likely looking at an interpreted library (and seeing the interpreter's dispatch loop).


The fact that you have to use "likely" means it's just a guess, no longer the sharp "line in the sand" I originally described. Languages that operate on large data structures (like Julia and R) may lean on higher-order/callback functions like qsort everywhere, so those hotspots will always be in the language/standard library even when programs are compiled. I just don't think it's a reliable enough metric for this distinction.


Makes sense. Come to think of it, I would also caveat threaded (as in forth subroutine-threading not as in thread-local) code as showing up fuzzily for my approach.


Of course even if you don't use a bytecoded language, your object code is interpreted by a program, the micromachine, in your CPU (unless you are running something VERY simple like a Z80 or small AVR, which implements the machine code in hardware).

One of the nice perspective insights from 3Lisp, which of course was very high level!


Aren't Intel/AMD x86 CPUs basically the only ones that use microcode? ARM doesn't (it has micro-ops but they are different) and no RISC-V CPU I've seen does.


While most RISC-V implementations don't use microcode, it is an option if your priority is not performance. Here are just two examples:

https://github.com/gsmecher/minimax

https://github.com/brouhaha/glacial


Crystal lang has both compiler and interpreter.

https://github.com/crystal-lang/crystal


It's a long (but good) article, so don't miss the conclusion:

> Another thing I hope that I've indirectly shown, or at least hinted at, is that any language can be implemented in any style. Even if you want to think of CPython as an "interpreter" (a term that can be dangerous, as it occludes the separate internal compiler), there's PyPy, which dynamically compiles Python programs into machine code, and various systems which statically compile Python code into machine code (see e.g. mypyc).
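That "separate internal compiler" is easy to see from Python itself: the built-in compile() produces a code object (bytecode) which the evaluation loop then interprets. A minimal sketch:

```python
# CPython first compiles source text to a code object (bytecode),
# then its evaluation loop interprets that object.
code = compile("x = 2 + 3", "<example>", "exec")

ns = {}
exec(code, ns)      # hand the compiled code object to the interpreter
print(ns["x"])      # 5
```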

(Emphasis mine:)

> There are interpreters for C and C++ (e.g. Cint or Cling).


Yes, this is a well-written and fun piece. I suspect when most people say X is a compiled|interpreted language, they really mean "the de facto implementation of X is...", but nonetheless, this piece is one of those moments that revisits and unambiguously resolves a discussion many probably hadn't bothered to worry about for years. In other words, I think I decidedly have a new take for the next time this type of question comes up over lunch (:


What was your old take from before reading this article?


I probably would have entertained one of the two stances "X is compiled" or "X is interpreted", not questioning the implied "de facto implementation of X...", and engaged in a pedantic back and forth. Now I'd probably link this article, argue "it depends on the implementation", and ask "what is a compiler anyway".


> There are interpreters for C and C++

Usually those are for (large) subsets of the language though.


Sulong is another. But yeah maybe not 100% compatible, I don't know.

https://github.com/graalvm/sulong/blob/master/docs/ARCHITECT...


Aren't C++ compilers also interpreters for large subsets of the language, all the template / constexpr / consteval stuff ;)


Really great article, thank you author for writing it. I think the first two implementations are what I think of as interpreted. The next two, I'm not sure. For the last, there is no doubt it's compiled. But that is just my understanding.


What would you classify TCC's "C scripting" as?

"just add '#!/usr/local/bin/tcc -run' at the first line of your C source, and execute it directly from the command line."


Compilation and runtime are nominal stages. JIT too.

It's all about transformation of code to other code... and we can be more flexible in our classification. If modern languages even need one.


I never found a book that talked about the relationship between these fully.

Lisp in Small Pieces does try to convey the idea that interpretation and compilation are siblings (by partially deriving a transformer from an interpreter, and ultimately a bytecode VM), but it would be worth a full book.


You could just use Java as the example: Compiled to bytecode, which is executed by a runtime interpreter. So: It's compiled & interpreted.


Compiled to bytecode which is executed by a runtime interpreter which traces the values passing through sections of it and uses those values to perform additional runtime optimizations on the newly annotated code and compile the new representation into machine code (or faster interpreted code). So you're right twice!


If it were that simple, I would simply say that Java is compiled, while JVM byte code is interpreted.

However, JVM bytecode is actually both interpreted and JIT compiled depending on runtime performance and other parameters. The JVM can even switch between compilation and interpreting for the same piece of code multiple times (most commonly, when attaching a debugger and hitting a breakpoint).


There are Java compilers that emit machine code.


I am going to hard disagree.

This analysis focuses on the language specification and ignores the wider picture.

The first question we have to ask about a language is what caused it to be created. Making a language is a lot of effort (though it can be a lot of fun), and there has to be a reason for it (even if it just curiosity).

Now to simplify stuff, let us choose to ignore purely exploratory languages.

The first question to ask is what problem the author was trying to solve with the language.

Take a look at C and AWK, in both of which Brian Kernighan was heavily involved. The design goals for C were very different from the design goals of AWK. This led to C generally being compiled, and AWK being interpreted.

In addition, in the case of C, there were deliberate design decisions made to make it easier to compile and optimize (for example, all the undefined behavior).

Now let's take a look at Java. The solution space that this was aiming for from nearly the beginning was a high-performance language compiled to a virtual machine using garbage collection. A lot of design decisions were made for the language with that goal in mind.

Or look at C++. The design goals basically necessitated a compiler (even if the output of the compiler was C as CFront did) vs an interpreter.

In addition, the ecosystem of a language is much more than the language, and can even be more important than the language itself. And the ecosystem, for the most part, picks a side in the interpreted-vs-compiled debate.

You could have a Java without the JVM, but you would lose access to a lot of techniques, libraries, debugging tools, and development tools that you have now.

C++ does have interpreters, but these are all "use at your own risk" and many libraries will behave in weird ways (with regard to static initialization, etc) since they are being used in an unintended manner. I don't think anybody uses an interpreted C++ in a production system, and rather it is used more for exploratory development and analysis.

Similarly, compiled Python loses compatibility with certain libraries, and issues pop up that you would not have with the more widely used interpreted version.

So yes, theoretically languages can be compiled or interpreted, but practically from the design going forward, usually one or the other is explicitly or implicitly aimed for, and using a language against that grain can lead to a lot of unnecessary pain.


It's fitting that the title mentions tomatoes. Using Python as a compiled language is like eating a tomato as a fruit.


Python is a great example of a compiled language for things like this. It's "known" that it's an interpreted language. Because it allows monkey patching and the like it's also challenging to statically optimise in a compiler. Yet cpython leaves compiled bytecode files scattered around on disk so most python programmers will also have wondered "what are these .pyc files" at some point.

It then gives you tooling to poke around the bytecode which is clear enough that you can read it. I once gave a talk showing x64 and bytecode side by side for some arithmetic function and if you pick the example carefully they line up well.
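For anyone who wants to poke around themselves, the standard library's dis module prints the bytecode CPython compiled for a function (exact opcode names vary between CPython versions):

```python
import dis

def add(a, b):
    return a + b

# CPython compiled `add` to bytecode when the def statement ran;
# dis.dis prints the instructions the evaluation loop will dispatch on.
dis.dis(add)
```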

Then there's cython and pypy to point to for different points in the design space than cpython, or jython if you want to talk about the jvm. There's probably one on .net and iirc truffle has an implementation as well. The abandoned unladen swallow. So there's loads of python implementations all approaching the interpreted vs compiled tradeoffs differently.


Yes, you can still optimize around Python that way. The big difference in end result vs C comes from how Python prioritized runtime flexibility, while C aimed for speed by ensuring the compiler knows more about the program execution beforehand (mainly the size of every variable).

The difference is most obvious looking at arrays of fixed-size integers, where a C int* far outperforms a Python list of ints no matter the implementation. Python list is actually a different thing because it has variable-sized integers and can take other objects too, so you can't optimize around that but could do so with a more specific fixed-size int array interface... which is what NumPy did.
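A minimal sketch of that difference (assuming NumPy is installed): the values are identical, but the fixed dtype gives NumPy one contiguous buffer of machine integers, where the list holds a pointer to a boxed object per element.

```python
import numpy as np

xs = list(range(1000))                   # each element: a boxed, arbitrary-precision int object
arr = np.arange(1000, dtype=np.int64)    # one contiguous buffer of fixed-size machine ints

# Same values, different layout: the known element size is what lets
# NumPy (like a C int*) loop over the data without per-element type checks.
print(int(arr.sum()) == sum(xs))  # True
```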


To me, compiled means fast with static types only, interpreted means slower and with possibly dynamic types. Most people will probably agree that C is compiled and Python is interpreted. I know there are some counterpoints that break that, but I don't care, cause the designation is still useful.

The author has identified that "compiled" and "interpreted" don't mean much about how the code technically runs. What really makes the difference is the intended use case. Same reason tomatoes aren't fruits in layman's terms.


Consider javac, hotspot jvm and a java ahead of time compiler. All three are compilers.


Java is one of those counterexamples that falls in between. And fittingly, it's vaguely rated in between C and Python both in terms of performance and flexibility. Even though you could write a slow interpreter for C if you really wanted to.



