While I believe this is a good arena for algorithms to provide solutions, I'm sceptical that ML in its current incarnations is the right tool. The problem is the variation in the problem space: generic programs, the need for idiomatic code, renewed design, and the lack of human insight and ownership. Sure, for an easy and specialized solution this will work and may be helpful as a one-off. That is, if one just wants a straight port and to test a lot.
Programming languages, on the other hand, are domains that are entirely man-made. We created the rules, they are known in their entirety (in contrast to, say, knowing every object over there in the shade below the tree), and they are optimized to be so precise that they can be followed mechanically by a comparatively simple machine. Sure, there are a few "undefined" corners, but that's not because we don't understand them; they are deliberately left open. Even they are precisely defined. So, with two precise definitions of semantics for two programming languages, it should be possible to build a translation based on those rules. Inherently, this is not an "empirical" endeavour; it's a logical, mathematically well-defined one.
Using neural nets for this is using the wrong hammer for the job. It's like using them to derive the implementation of a sorting algorithm or to find the first 100k digits of pi. It just doesn't make sense.
And I'm not talking about "idiomatic" translations. That's a less well-defined domain, and maybe ML would be more suitable there, but the post explicitly excludes this kind of thing. In principle, without providing examples of idiomatic style to the learning machinery, this can't ever work.
Not sure what language you are referring to, because this statement is very vague, but it's worth pointing out that undefined behavior in the C/C++ sense is absolutely not precisely defined. If a program executes undefined behavior in either of these languages, the entire program execution is meaningless, including the time "before" the undefined behavior occurred. The standard leaves room for the implementation to do absolutely anything before or after that point. In practice, what actually happens is the product of crazily complex interactions within a slightly broken abstraction, depending on memory layout, optimizations, libraries, execution history, the phase of the moon, etc.
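To make that concrete, here's a minimal C++ sketch (assuming a typical optimizing compiler):

    #include <cstdio>

    // Signed overflow is undefined, so the compiler may assume x + 1 > x
    // always holds and fold this check to "always false".
    bool will_overflow(int x) {
        return x + 1 < x;   // UB when x == INT_MAX
    }

    int main() {
        // With optimizations on, many compilers print "no overflow" even
        // here, because the UB licensed the assumption that x + 1 > x.
        std::printf(will_overflow(2147483647) ? "overflow\n" : "no overflow\n");
    }

Whether you see "overflow" or "no overflow" depends on the compiler and optimization level, which is exactly the point.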
All that said, this isn't hard because of undefined behavior. This is hard because languages are crazily complex, and simply doing the semantically equivalent thing amounts to an emulation of one language in another, including all the implicit conversions, move constructors, copy constructors, overflow behavior, multiple inheritance, virtual dispatch, the whole mess. If you do a full-fidelity source-to-source translation that amounts to emulation, you end up with an unreadable, gross mess.
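As a hypothetical illustration of what "full fidelity" costs (using Boost.Multiprecision as a stand-in for Python's integer semantics):

    #include <boost/multiprecision/cpp_int.hpp>
    using boost::multiprecision::cpp_int;

    // Python source: def add(a, b): return a + b
    // Naive translation -- wrong for large values, since Python ints never overflow:
    int add_naive(int a, int b) { return a + b; }

    // Faithful translation -- emulates Python's arbitrary-precision integers:
    cpp_int add_faithful(const cpp_int& a, const cpp_int& b) { return a + b; }

Multiply that by every implicit conversion and dispatch rule in the source language and the output stops looking like idiomatic C++ at all.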
I think OP means "it is precisely defined by the spec which actions result in undefined behaviour"
If you go through the paper, check the evaluation section to see how they measured their success: they used some programs from GeeksforGeeks to evaluate their approach. Problems on GeeksforGeeks do not represent the vast majority of programming tasks encountered in daily life, which is very much in contrast to the overarching claims presented in the introduction of the paper.
Second issue with the evaluation: they use BLEU scores to judge how good their translations are. BLEU makes sense for natural language translation (and even that is widely debated in the NLP community these days). For a program there is no concept of an almost-correct program based on how similar things look; it is either correct or not. E.g., if I am asked to write a program to add two numbers and I write `x - y`, I am not almost correct, I am completely wrong. And in some ways that is what their model does: it optimizes for BLEU scores.
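A trivial sketch of why surface similarity is the wrong yardstick:

    // These differ by a single token, so a similarity metric like BLEU
    // scores them as nearly identical -- yet one is completely wrong.
    int add(int x, int y) { return x + y; }
    int add_wrong(int x, int y) { return x - y; }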
Third, the correctness of the programs is tested on 10 random inputs. Are 10 random inputs enough to cover the entire input space a program can accept?
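Almost certainly not. Consider this hypothetical example (the classic midpoint bug):

    // Fine on most random inputs; it only breaks when lo + hi overflows,
    // so ten random samples will almost never expose the bug.
    int midpoint_buggy(int lo, int hi) {
        return (lo + hi) / 2;       // overflows for large lo and hi
    }

    int midpoint_fixed(int lo, int hi) {
        return lo + (hi - lo) / 2;  // safe whenever lo <= hi
    }

Two translations can agree on 10 random inputs and still diverge on exactly the inputs that matter.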
It is indeed a great advance in the application of ML technology, but it is nowhere close to the broader claims. One can even debate the ROI of time spent gathering and curating data and then checking the correctness of translations from such a system vs. the ROI of writing rules for a rule-based system, since all programming languages are easily expressible that way.
If they kept working on it, I think they would run into an asymptote. Maybe they could get closer and closer: 90% accuracy on the task with real hardware, 92% by boiling the oceans, 93% with a Dyson sphere, and 93.5% if you can harness a quasar. At that point it probably passes a whiteboard interview, and the people who have to fix the bugs can console themselves that the last programmer had neither a brain nor a soul.
That system has an approximate, not an exact, model of the domain it works on, and that is why it has an asymptote. Turning a graph-structured program into a vector is like mapping the curved surface of the Earth onto a flat map -- except instead of a 3-dimensional space it is more like a 1000-dimensional one. Information is destroyed in that process and forever lost, so there will always be important characteristics of the problem that it will never "get".
If the message recipient is a person, they will meet you halfway and might even accept bullshit if it is presented with complete confidence and a lack of shame. The computer will interpret exactly what you said and reveal that you're a dog (i.e., a mute animal).
Not sure, though; I'm not an expert in LLVM IR, so I don't know how feasible that is...
The IR targets machine code, so I'm not sure it's the right format for embedding high-level language concepts.
And that conversion loses important details like comments. Perhaps the ideal solution takes both the original code and the IR as input when generating output.
Most of the time, porting between two similar languages can be done by cargo-cult pattern matching, but there are some very tough corner cases where the languages' semantics mismatch: thread safety (std::vector vs. java.util.Vector), iterator invalidation differences, hashmap iteration order differences, etc., etc.
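A hypothetical example of the iterator-invalidation flavor:

    #include <vector>

    // Looks portable, but the semantics differ per language: in C++,
    // push_back may reallocate and invalidate "it" (undefined behavior);
    // Java's ArrayList would throw ConcurrentModificationException;
    // Python's list would silently allow the mutation (and loop forever here).
    void duplicate_evens(std::vector<int>& v) {
        for (auto it = v.begin(); it != v.end(); ++it) {
            if (*it % 2 == 0) v.push_back(*it);   // UB in C++ after reallocation
        }
    }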
Most of the time, you can ignore these minor differences between languages, but sometimes you need whole-program, or at least whole-module, analysis to know whether the semantic differences matter. Heck, you could probably mostly get away with porting from a lexically scoped language to a dynamically scoped one without putting any effort into fixing up scoping issues.
I hope our tools eventually get good enough to perform these sorts of transformations, but at this point it's far from safe to trust automated tools with this sort of thing.
C# has an open-source compiler (Roslyn) with an API that exposes a lot of information, so for advanced needs there is that road too.
But casting any problem with a discrete correct answer into floating point calculations that mimic probability is a bad idea.
- Learning to Compose Neural Networks for Question Answering
- Inferring and Executing Programs for Visual Reasoning
Maybe take a look at the translations in the paper and come back to give a second opinion?
It's a fascinating idea. But now you need some way to check the result. Last time around, someone noted that some of the translations were plausible but wrong.
There's clearly some guessing. The translation of one function from Python to C++ turned untyped bounds into ints, which is OK. But it also turned a data array into ints, although the Python operations were generic over any type with arithmetic operators; the Python code might have been used with floats. It's not doing type inference by looking at the callers, it's just guessing.
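Something like this hypothetical case:

    // Python original (generic over any numeric type):
    //     def total(xs):
    //         s = 0
    //         for x in xs: s = s + x
    //         return s
    //
    // A guessed translation hard-codes int, silently dropping float callers:
    #include <vector>
    int total(const std::vector<int>& xs) {
        int s = 0;
        for (int x : xs) s = s + x;
        return s;
    }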
Still, it's promising, even if it doesn't do the whole job. There's a C++-to-Rust translator, but it turns array indexing into pointer arithmetic on special C++-compatible types; it's compiling into lower-level operations. Deep learning has the potential to recognize and use idioms of the target language. Maybe.
But this needs a checker. Perhaps something that runs both language versions in lockstep on test data and checks for disagreement at key points.
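A minimal sketch of that kind of differential checker (the names are made up; in practice the two versions would live behind an FFI or process boundary, one per language):

    #include <cstdio>

    int original_f(int x)   { return x * x; }   // stand-in for the source version
    int translated_f(int x) { return x * x; }   // stand-in for the translation

    int main() {
        // Run both versions on the same inputs and flag any disagreement.
        for (int x = -1000; x <= 1000; ++x) {
            if (original_f(x) != translated_f(x)) {
                std::printf("disagreement at x = %d\n", x);
                return 1;
            }
        }
        std::printf("no disagreement on tested range\n");
    }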
- 2008: PHP is too slow, compile it to C++
- 2011: C++ is too slow, write a VM and compile to assembly
- 2012: PHP's type system sucks, write a better typer in OCaml (Hack)
This stuff is just an outgrowth of their deep deep belief in writing code to handle the crazy amount of code they already have.
Rewrites at that scale are really expensive and take forever, and by the time one finished they'd already be behind the curve again.
I like to think that Mark Zuckerberg read Joel's post and took it to heart.
But the pattern is super noticeable for FB, once you start paying attention.
Also, as a programmer, your mental model is completely out of sync: how do you work on such an application now that it's suddenly all in Python (coming from C++)? Or is this shipped to a new team that receives 100,000 lines of C++ but has no clue how it all works?
How would FB use this at all?
There are entire companies already built around this.
It may be more cost effective to keep around some developers who know the old code base and the language it's written in.
Also, I don't fully understand the why, which is also not really addressed. What problem does automatically translating a mess of a code base (I'm assuming it's a mess; otherwise, why would you want to port it?) to another programming language, leaving the mess in place, actually solve? Isn't the idea of porting to another language usually to improve the codebase and/or to integrate it into some other system? Is the idea that the automatic translator does the "heavy" lifting, which experts in the new language then go in and clean up? And how do you manage the failures, since it seems you'd still need an expert in the original language?
It reads as if they used a test suite to confirm whether the translation represents the same function.
You would need to write down the most general abstract syntax tree (AST) that encompasses all supported languages' features, as well as a way to read each language into this AST and write the AST back out into each language.
The cool thing about pandoc is the ability to transform the AST between the read and the write. In the context of a transpiler, that might mean transforming the AST to be safer (e.g., adding type checks in Python) before writing out.
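A minimal sketch of such a pass, with a made-up universal Node type (nothing here is pandoc's actual API):

    #include <memory>
    #include <string>
    #include <vector>

    // Hypothetical universal AST node.
    struct Node {
        std::string kind;   // e.g. "param", "call", "block"
        std::string text;
        std::vector<std::shared_ptr<Node>> children;
    };

    // Transform between read and write: attach a runtime type check to
    // every function parameter before the writer emits, say, Python.
    void add_type_checks(const std::shared_ptr<Node>& n) {
        if (n->kind == "param") {
            auto check = std::make_shared<Node>();
            check->kind = "stmt";
            check->text = "assert isinstance(" + n->text + ", int)";
            n->children.push_back(check);
        }
        for (const auto& c : n->children) add_type_checks(c);
    }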
That would be a very tedious tool to write, given that every language has its strange behaviors. But it might be a fun project if you limit the number of languages you support.
The only way this could work is if you write the whole stdlib in your AST system.
Of course, this still won't mean your lazy Haskell program works as an eager C program. Language semantics are too different; even between similar languages, you'll ALWAYS hit corner cases on anything more than a toy program, even translating C# to Java. Well, maybe C to C++ will work decently, as few programs use the diverging features.
This stems from earlier work on natural-language translation (by the same team) and opens quite a number of doors.
This ML crap is inexplicable, and nearly guaranteed to introduce subtle bugs and performance regressions.
But yeah, I won't start any new projects in Python.
But languages that have side effects, such as imperative languages, would be very hard to generate. On the other hand, this is the first thing I’ve seen that approaches it.
I am very worried about the uncanny valley of producing stuff based on remixing other stuff with no understanding of the logic behind it.
Recognizing and classifying images and finding correlations is altogether different from programming. I can see how physics and science can be automated. But programming?
I wonder what deep learning would do with concurrent code and differences in the memory models of the languages. That part is rather hard even for experts in both languages and in concurrency, and there is very little similar code to learn from at all.
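A hypothetical illustration of the memory-model trap:

    #include <atomic>

    // Java's "volatile boolean ready" gives ordered, visible writes. A naive
    // C++ translation to a plain bool is a data race (undefined behavior);
    // a faithful one needs std::atomic with the right memory ordering.
    std::atomic<bool> ready{false};
    int payload = 0;

    void writer() {
        payload = 42;
        ready.store(true, std::memory_order_release);
    }

    void reader() {
        while (!ready.load(std::memory_order_acquire)) {}
        // payload is guaranteed to be 42 here; with a plain bool it is not.
    }

Nothing in the surface syntax of the source program tells a learned model which of these it has to produce.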
Take this rather simple example:
PointList interpolate(PointList sample_points, OrderedRealNumberList points_to_interpolate);
Where a documentation would actually be helpful are cases like this:
SUBROUTINE PCHFE (N, X, F, D, INCFD, SKIP, NE, XE, FE, IERR)
PCHFE is "Piecewise Cubic Hermite Function Evaluation", of course, and the parameters aren't exactly self-explanatory either...
Each of those variables will be defined later on in the code, at least with a type (not required, but it is not 1960 anymore). That declaration is where some comments would be.
REAL*8 N ! Radius of the body in radians
(now if "implicit none" is not a requirement, then this all you get)
Love seeing some hard core numeric code. Precise and compact. No pointers, nothing sophisticated. Do loops, if statements, subroutine calls.
Admittedly, I haven't read a lot of Fortran code, but I have yet to see anybody who includes such comments. It wouldn't be so bad except also:
a) the only code I see in Fortran is numerics code, therefore written by mathematicians or other people who seem to believe that using more than one letter to describe a term is an admission of weakness
b) people write function names as if it costs $1000 per extra character
c) there often don't seem to be any introductory resources for the concepts being implemented that might let me discover what the cryptic one-letter variable names actually refer to
Here's the declaration of the parameters:
INTEGER N, INCFD, NE, IERR
REAL X(*), F(INCFD,*), D(INCFD,*), XE(*), FE(*)
> NE, XE, and FE: "E" means "error"? Maybe.
Wrong guess - "E" means "Evaluation" as in "to be evaluated" in this case. It's also not clear which are inputs and outputs and primitive data types give no indication at all about constraints and use-case.
> IERR is an error flag. Idiomatic fortran there.
Now that's true, but those are idiomatic cases that are well documented and don't benefit from AI documentation; simple pattern matching will do.
The point I was trying to make wasn't really about any specific programming language either. The point is rather that documentation requires translating implementation to intent and purpose.
If you have a system that's capable of translating a program into purpose, constraints, and usage examples expressed in plain natural language, you have created a system to end all programming languages, because the inverse transform would be possible as well...
Nice to see someone else using Fortran 77. Or at least reading it.
Something like DeepL, but for code: you could select one of several possible translations for a section of the code, rewrite part of the translation, and have the algorithm take those corrections into account for the rest of the code, along with some preferred translations you indicate.
You cannot replace a human translator, but you can certainly make them much faster and automate the trivial bits.
What problem does it solve, exactly? I'm not seeing unsolved problems here... just a really heavyweight solution that is frail and prone to error.
At compile time only, people have also been using trained models to derive cost functions for sequences of instructions (as opposed to analytical models, which have become very difficult to derive these days given the complexity of modern CPU architectures).
It's hard to even figure out if the scheduling algorithms that work well on the Pentium and PentiumPro are worthwhile for the x86-64.
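A toy sketch of what such a learned cost function might look like (all names and weights are made up):

    #include <array>
    #include <cstdio>

    // Score an instruction sequence from simple features (counts per opcode
    // class). The weights would be fitted against measured timings on the
    // target CPU rather than derived from an analytical pipeline model.
    constexpr int kNumFeatures = 3;   // e.g. loads, stores, ALU ops
    using Features = std::array<double, kNumFeatures>;

    double predicted_cycles(const Features& counts, const Features& weights) {
        double cost = 0.0;
        for (int i = 0; i < kNumFeatures; ++i) cost += counts[i] * weights[i];
        return cost;
    }

    int main() {
        Features weights = {4.0, 1.0, 0.5};     // hypothetical fitted weights
        Features block   = {2.0, 1.0, 6.0};     // 2 loads, 1 store, 6 ALU ops
        std::printf("estimated cycles: %.1f\n", predicted_cycles(block, weights));
    }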