A graph of programming languages connected through compilers (akr.am)
241 points by andyonthewings on July 6, 2018 | hide | past | favorite | 104 comments



Looks like everyone wants to add some missing info. The GitHub source code link can be found at the top right corner. For easy reference, here it is: https://github.com/mohd-akram/languages


More specifically, a PR to update https://github.com/mohd-akram/compilers/blob/master/src/comp.... One could argue that source and target should allow arrays. But in general, someone needs to maintain an up-to-date "awesome-compilers" list.


So, you can apparently compile C++ to LLVM IR, then to JavaScript, then to Python, and then back to C++ (you can do a similar feat with C). I wonder if there is a fixed point, or if it diverges.


I looked at js2py (the most surprising-to-me link in that chain); the code it generates already kind of explodes in lines of code. I think it's very unlikely there is a fixed point rather than an exponential explosion of your program as you run it through the chain.


Like when you auto-translate English -> Japanese -> English. It's very unlikely you will recover the original.

Good fun though.


By the pigeonhole principle, any such iteration must either enter a cycle, or the translations must be unbounded in length. If you experiment, you will find that Google Translate usually lands in the first category; transpilers usually land in the second.
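The cycle case is easy to demonstrate with a toy model. In this sketch, `round_trip` is a hypothetical stand-in for a real translate-and-back call (a real version would hit a service like Google Translate); because the stand-in is idempotent, iteration hits a fixed point, i.e. a cycle of length 1:

```python
# Toy model: iterate a round-trip "translation" and detect whether it
# enters a cycle (pigeonhole case) or keeps changing.
def round_trip(s):
    # Hypothetical stand-in for translate(translate(s, "en->ja"), "ja->en").
    # This toy normalizer is idempotent, so it reaches a fixed point fast.
    return s.lower().strip()

def iterate(s, max_steps=100):
    seen = {}
    for step in range(max_steps):
        if s in seen:
            return ("cycle", step - seen[s])  # second element: cycle length
        seen[s] = step
        s = round_trip(s)
    return ("unbounded?", None)  # never repeated within the step budget

print(iterate("Hello "))  # a fixed point shows up as a cycle of length 1
```

A transpiler chain is the second category: each pass tends to grow the program, so no state ever repeats.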


I am interested in an example that cycles on Google Translate - can you share one?


> I am interested in an example that cycles on Google Translate - can you share one?

Just translating 'hello' back and forth works in any pair of languages I tried.


There's a web site dedicated to this http://translationparty.com


Thanks!


Almost all compilers producing machine code can instead also produce assembly. In fact, many compilers internally produce assembly and then run an assembler to produce machine code. So the Assembly node is missing many in arrows.


goddamn Haxe and Python messing everything up :p https://imgur.com/a/PgNWLbG


Converted graph into JSON format compatible with D3, graph and others: https://pastebin.com/aA1sPaai


I can finally fulfill my dream of writing cross-platform C++ by transpiling for the JVM. It's just too bad I have to convert from LLVM to JS to Python first. And then I could convert back to C++ and start over again...

Neat visualization! I wonder how large it could get if people could add their own nodes and edges.


> writing cross-platform C++

I think you are looking for WASM and it is doable today. WASM can target the JVM.


A few missing items:

- CIL to C++ using the CoreRT AOT compiler [1]

- JVM to CIL using ikvmc [2]

- Quite a few languages to WebAssembly using LLVM [3]

- J# if you are willing to include dead but somewhat significant languages

[1] https://blogs.msdn.microsoft.com/alphageek/2016/10/13/native...

[2] http://www.ikvm.net/userguide/ikvmc.html

[3] https://stackoverflow.com/questions/43540878/what-languages-...


- Quite a few languages to WebAssembly using LLVM

Languages are compiled to LLVM IR and then to WebAssembly. These are shown, right?


The graph doesn't show LLVM IR compiles to WASM. It does show LLVM IR to JS via Emscripten though.


Isn't Python to C using CPython missing?

Edit: Or I guess technically python byte code. [1]

[1] http://effbot.org/zone/python-compile.htm


> Isn't Python to C using CPython missing?

Python doesn't compile Python to C, non-technically or otherwise. Where did you get that impression?

I presume they don't include internal IRs like Python bytecode, as it's not any kind of shared or standardised format. Otherwise you'd be including tons of different compiler IRs.
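That bytecode stage is easy to see from CPython itself; here is a minimal sketch using the built-in `compile` and the `dis` module (opcode names vary across Python versions, so only a couple of stable ones are shown):

```python
import dis

# CPython compiles source to a code object holding bytecode (not C);
# `compile` produces it and `dis` lets you inspect the instructions.
code = compile("x + 1", "<demo>", "eval")
print(type(code).__name__)  # the compiled artifact is a 'code' object
print([i.opname for i in dis.get_instructions(code)])
```

Running this shows opcodes like `LOAD_NAME` and `LOAD_CONST`, which the interpreter loop then executes; at no point is C source generated.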


I thought it maps your code to the base PyObject, like how writing an extension works? I've only done a little bit of that and need to dive into the process more. It's on my todo list, so if you have some links with a good explanation of this, that would be great. Unfortunately, I was unable to take compilers in school, so I missed out on the conceptual aspect of the compilation process.

edit: found the docs on python.org for the compiler [1]

[1] https://devguide.python.org/compiler/


Yeah it doesn’t compile it into C though.

You may be thinking of Cython, which compiles a language similar to Python, but not quite Python, into C using the Python C extension interface.


(A subset of) Clojure can be compiled to C++ using the Ferret compiler: https://ferret-lang.org/


Also OCaml to OCaml bytecode via ocamlc to Javascript via js_of_ocaml.


Also, rust to webassembly


I think the correct representation of this is Rust to LLVM IR (which is there), then LLVM to WebAssembly (which should be there but isn't).

Actually, WebAssembly as a compile target is just generally not there on the graph, which is weird. The Wasm compiler scene is pretty rapidly fluctuating though.


I like the graphing library used in this one. A quick look through the code leads me to this: https://github.com/cytoscape/cytoscape.js-cose-bilkent


I can vouch for this library. I recently had to do some graphing and tried sigmajs first, but I ended up rewriting it to use Cytoscape. Cytoscape is incredibly powerful, although it would help if they split the documentation into multiple pages so that Google search works on it.


Kudos to the first person to get one of these working right for anything beyond Hello World:

JavaScript -> Js2Py -> PyPy -> Machine Code

JavaScript -> Js2Py -> CIL -> LLVM IR (-> Machine Code)

JavaScript -> Js2Py -> Pythran -> C++ -> LLVM IR/Machine Code

That's got to look like spaghetti by the end, no?


I guess that PyPy can directly compile only RPython (a subset of Python), so it could be a bit of a hassle to get working code :)


Did Javascript become the babelfish of all languages?


It makes a lot more sense when you see Javascript as "machine code" for browsers.



I remember YavaScript!


44 languages

42 languages compile to Machine Code

Wait, what? There are 2 languages that don't compile to Machine Code? Which ones are those, and how is it even possible?


> how is it even possible?

Instead of compiling to machine code, you compile to another language. C++ was originally compiled to C, for example.

Why would you think it wasn't possible?


Compilation is the act of generating code for a back end. Most languages compile to machine code, but you could also compile to some other language, like C, Haskell, or JavaScript, which you know has a very well-optimized compiler. Say, if you generate C code and do it well, you know that your language will be almost as fast as C.


Haxe basically compiles to any other language. So why would it need to go straight to Machine Code?

It is super-multiplatform.


Java compiles to a bytecode, which is not machine code. Once the bytecode is executed on the target platform's runtime, it is then compiled down to machine code.

But there's more to it than that. The bytecode is actually interpreted at first by the JVM runtime. The code is also continuously dynamically profiled. There are two compilers C1 and C2.

Whatever functions use the most CPU time get compiled using C1. C1 rapidly compiles to poorly optimized code, but this is a big speedup over the bytecode interpreter. The function is also scheduled to be compiled again in the near future using the C2 compiler. The C2 compiler spends a lot of time compiling, optimizing, and aggressively inlining.

But there's more. C2 can optimize its compile for the exact target instruction set, plus extensions, of the actual hardware it is running on at the moment. An ahead-of-time C compiler cannot do that; it needs to generate x86-64 code that runs on a large variety of processors.

But there's more. The C2 compiler can optimize based on the entire global program. Suppose a function call from one author's library to another author's library can be optimized in some way by writing a different version of that function. C2 can take advantage of this where a C compiler cannot, because the C compiler doesn't know anything about the insides of the other library it is calling -- which might be rewritten tomorrow, or might not be written yet. Once the Java program is started, the C2 compiler can see all parts of the running program and optimize as needed.

But there's more. Suppose YOUR function X calls MY function Y. If your function X is using much CPU, it gets compiled to machine code by C1, and then in a short time gets recompiled again by C2. The C2 compiler might inline my Y function into your X function. Now suppose the class containing my Y function gets dynamically reloaded. Your X function now has a stale inlined version of my Y function. So the JVM runtime changes your X function back to being bytecode-interpreted once again. If your X function is using a lot of CPU, then it gets compiled again by C1, and then in a while, by C2.

All this happens in a garbage collected runtime platform.

It is why Java programs seem to start up quickly, but take a few minutes to "warm up" before they start running fast. Many Java workloads are long-running servers, so startup is infrequent.

Now you know why Java can run fast at only six times the memory of a C program.
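The tiering described above can be sketched as a toy state machine. Everything here is illustrative, not real HotSpot internals: the thresholds, the promotion rule, and the deoptimization trigger are made-up stand-ins for invocation-counter profiling:

```python
# Toy model of tiered JIT compilation: a per-method invocation counter
# promotes a hot method from interpreter -> C1 -> C2, and deoptimization
# (e.g. a class reload invalidating an inlined callee) resets it.
C1_THRESHOLD = 10    # illustrative, not a real HotSpot value
C2_THRESHOLD = 100   # illustrative, not a real HotSpot value

class Method:
    def __init__(self, name):
        self.name = name
        self.calls = 0
        self.tier = "interpreted"

    def invoke(self):
        self.calls += 1
        if self.tier == "interpreted" and self.calls >= C1_THRESHOLD:
            self.tier = "C1"   # quick, lightly optimized compile
        elif self.tier == "C1" and self.calls >= C2_THRESHOLD:
            self.tier = "C2"   # slow, aggressively optimized compile
        return self.tier

    def deoptimize(self):
        # A class this method inlined was reloaded: fall back to the
        # interpreter and let the counters promote it all over again.
        self.tier = "interpreted"
        self.calls = 0

m = Method("hot_loop")
tiers = [m.invoke() for _ in range(150)]
print(tiers[0], tiers[50], tiers[-1])  # interpreted C1 C2
```

The real runtime does this concurrently with background compiler threads and on-stack replacement, but the promote/deoptimize cycle is the core idea.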


This describes how the HotSpot JVM executes a Java application.

There are other ways to execute Java applications.


That is true.


There's ActionScript, which can only be interpreted.


Hm. I thought Flash has bytecode rather than ActionScript being plain interpreted.


You are correct. ActionScript compiles to ABC (ActionScript ByteCode) which is run by the AVM (ActionScript Virtual Machine).


Does it count interpreters?


Interpreted only


My main takeaway: people want to avoid writing JavaScript about as much as they want to avoid writing machine code or Java bytecode.


Alternative takeaway: Javascript is so widely deployed that most languages have found a way to compile to it


Alternative (joke) takeaway: JavaScript is as important to computers as machine code.


Alternative non-joke: JavaScript is important to the web, just like machine code is important to computers. Actually, many believed JavaScript was the assembly of the web (just attempt to read the minimized scripts running in your browser), until WebAssembly...

* https://www.hanselman.com/blog/JavaScriptIsAssemblyLanguageF...


I think the quality of JavaScript as a language is completely irrelevant. It is because it's the only* language that runs in web browsers. So if you want your stuff to run there, you have to target JavaScript.

If WATFIV was the only language that ran in browsers, everything would target it, too.


Nah. It's just that different people have different tastes. Don't compare JavaScript alternatives to people wanting an alternative to Java bytecode; compare JavaScript alternatives to people wanting an alternative to Java.

It's just that, at least for the first couple decades of its existence, the platform in question had no bytecode, so transpiling was the only way to escape.


Dude, where's my lisp?


Right, sbcl generates some damn good x86_64, even interactively.


Yes, there are some Lisps (Scheme and CL) that run on Java JVM.


BEAM (the Erlang VM) + Erlang, Elixir, hipe-llvm seems like a good addition to the graph.


It seems like there is a lot missing from this graph. There's a ton of info for conversion to javascript here:

https://github.com/jashkenas/coffeescript/wiki/list-of-langu...


I tried in vain to find WebAssembly. Why would you include JavaScript but not WebAssembly in a graph that has LLVM in it?


It does have WebAssembly, but it's incorrectly labeled as targeting Java Bytecode.


> incorrectly labeled as targeting Java Bytecode

https://github.com/cretz/asmble

The name of that tool is even labelled in the graph!


That line is labelled with https://github.com/cretz/asmble


What stands out to me is the absence of anything targeting WebAssembly. This must be a work in progress.


It says V8 compiles JavaScript to machine code, but is that really correct, given that the machine code is an intermediate product and the result of applying a JIT to a specialized piece of code (where some of the variables are already known to the compiler)?


I guess what you're saying is that machine code is an "intermediate product" in the sense that the final product must include the compiler, and the machine code produced by a JIT compiler is useless by itself. And I agree with your overall point, JIT compilers and VMs shouldn't be in there. You might as well include all interpreters. If you look at a JIT compiler as a black box, it acts like an interpreter, and I'm pretty sure this is meant as a practical graph ("what can I do with this language") rather than one that describes the internals of compilers and interpreters.


> given that the machine code is an intermediate product

The machine code isn't the intermediate product - it's the final product. Some internal IR is the intermediate product.

But yes it's normal convention in the industry to talk about JIT compilation to machine code as 'compiling to machine code'.


But let's say, they added "decompilers" to the graph, and there is (say) a decompiler from machine code to C. Then I'd assume, because there is a path in the graph from JavaScript to C (going through machine code), that I can translate from JavaScript to C. Except this isn't true because V8 is a JIT, and doesn't output its internal representation.


It's not a graph of compilers to a file on disk. It's a graph of compilers. Just because the output exists only in memory doesn't mean it doesn't exist. Has the JavaScript been compiled to machine code? Yes, it has.

You mean that the JavaScript compilation is only valid for a given application at a given point in time? Yes that's true.


What I meant is: the graph is more useful if you can apply the relations transitively (i.e. find one or more paths between any two languages, and use the compilers to walk along the path). The demo even suggests that by allowing you to find a path, and leave the "directly" box unchecked.
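That transitive use is just path-finding. Here is a minimal BFS sketch over a hand-picked subset of the graph's edges (the edge list and the tool names in the comments are illustrative examples from this thread, not the site's full data):

```python
from collections import deque

# A few compiler edges mentioned in this thread, as adjacency lists.
edges = {
    "C++": ["LLVM IR"],
    "LLVM IR": ["JavaScript", "Machine Code"],  # e.g. Emscripten, llc
    "JavaScript": ["Python"],                   # e.g. js2py
    "Python": ["C++"],                          # e.g. Pythran
}

def find_path(src, dst):
    """Shortest compiler chain from src to dst, or None if unreachable."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        for nxt in edges.get(path[-1], []):
            if nxt == dst:
                return path + [nxt]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(find_path("C++", "Python"))
# ['C++', 'LLVM IR', 'JavaScript', 'Python']
```

A JIT edge breaks exactly this property: machine code is reachable in the graph, but you can't actually feed V8's output into the next compiler in the chain.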


The interactive graph is nice, but what's with so few zoom levels? It's either too small or too large. Or is my mouse scroll wheel too sensitive?


I suspect it was developed on a trackpad, which has pixel-level scroll precision. Zooming is a bit fast but generally fine on my MacBook Pro. Probably anything like this should have a zoom slider (like Google Maps does) for people without precise scrolling or pinch zoom.


This is really neat. I noticed that although it does have Phalanger for PHP, it's missing PeachPie: https://www.peachpie.io/.


Just nitpicking, but why wouldn't Ruby connect to C in this graph?


I thought the same thing... C via MRI, also C++ via Rubinius.


Chalk it up to being a WIP. The author apparently pulls the data from this repo: https://github.com/mohd-akram/compilers


I’d really love to be able to view cycles more easily in this graph


Where is PHP -> C# through PeachPie? Pretty significant project.


For Haskell -> JVM, there are the Frege and Eta compilers. You could argue those are dialects of Haskell, but they are pretty close. The differences will be around FFI because of the JVM.


Well, with all three of Haskell, CIL, and C# (through D) compiling to machine code, LLVM IR, and JavaScript, it looks like the graph is no longer planar.


Where is Smalltalk?


And Tcl (which can be compiled to machine code using TclQuadCode)


Probably hiding the same place as the BEAM languages. No Erlang, Elixir, etc.


and Forth and Erlang


Nice visualization, though it can be improved.

There are Java compilers that go directly to native code; the Oracle JVM isn't the only path.


> There are Java compilers directly to native code though

Dude, if there was some kind of wiki documenting HN lore, there would have to be a page on you drilling this fact into our skulls for years on end! :)

https://hn.algolia.com/?query=java%20native%20pjmlp&sort=byP...


Well, if some devs would bother to learn how to use their tooling properly instead of keeping urban myths alive, that wouldn't be needed. :)


According to this graph one could actually get from Java to Objective-C to LLVM-IR to Machine Code...

Or with LLVM-IR to C in the last step you could get from Java to C...


One problem with this graph is that not all links actually support the full language. J2ObjC actually places constraints on your code, so in some sense it's really compiling some (large) subset of Java.


Sure, the point was about direct generation, without those intermediate steps.


It seems like it allows the nodes to be dragged to rearrange them. But yeah, it definitely needs a better graph.


Every road leads to GCC, or in this case, GCJ. ;)



GCJ is dead since 2009, but there are other compilers, most of them commercial though.


There's also Excelsior Jet


Wait.. wasm compiles to Java Bytecode?


Quite directly [0] :-) There's even an example using Rust's regex on the JVM to get speedups in some cases [1].

[0] https://github.com/cretz/asmble

[1] https://github.com/cretz/asmble/tree/master/examples/rust-re...


Is hphpc still banging around somewhere inside HHVM? That would connect PHP -> C++.


Very cool. Looks like it is missing Cython, but it's a great graphic nonetheless.


Is this a challenge to see if we can make this a fully connected graph?


Oooh, Python to Rust! It would be so thoroughly incomprehensible!


Clearly we need to make a machine code (de)compiler to JS.


Where is Pascal?


It's there, albeit only via the Free Pascal compiler.


it's there, bottom left


g++ can turn C++ into assembly.



