Deep learning to translate between programming languages (facebook.com)
164 points by homarp 3 months ago | 76 comments



This sounds more impressive than it is. I kept an auto-translation from C# to Go going for quite a while just by creating some bash scripts. The languages are so similar that, while the result isn't idiomatic, you can be up and running with just a few rules and some simplifications to the original code.

While I believe this is a good arena for algorithms to provide solutions, I'm sceptical that ML in its current incarnation is the right tool. The problem is the variation in the problem space: generic programs, the need for idiomatic code, renewed design, and the lack of human insight and ownership. Sure, for an easy and specialized case this will work and may be helpful as a one-off. That is, if one just wants a straight port and is willing to test a lot.


And it's also quite a nonsensical use of ML technology, to be honest. Neural nets are good for problem domains with fuzzy definitions of right and wrong, as you find them in the physical world out there. Is this a tiger or a rock? Is this food or poison? Should I walk around this pond or swim through it? That's what the human brain is good at handling, and neural nets try to approximate that.

Programming languages, on the other hand, are domains that are entirely man-made. We have created the rules, they are known in their entirety (in contrast to knowing all the objects over there in the shade below the tree), and they are optimized to be so precise that they can be followed mechanically by a comparatively simple machine. Sure, there are a few "undefined" corners, but that's not because we don't understand them but because they are deliberately left open. Even they are precisely defined. So, with two precise definitions of semantics for two programming languages, it should be possible to build a translation based on those rules. Inherently, this is not an "empirical" endeavour; it's a well-defined logical/mathematical one.

Using neural nets for this is using the wrong hammer for the job. It's like using them for deriving the implementation of a sorting algorithm or to find the first 100k digits of pi. It just does not make sense.

And I'm not talking about "idiomatic" translations. That's a less well-defined domain, and maybe it would be more suitable, but the post explicitly excludes this kind of thing. In principle, without providing examples of idiomatic style to the learning machinery, this can't ever work.


> "undefined" corners but that's not because we don't understand them but because they are deliberately left open. Even they are precisely defined.

Not sure what language you are referring to, because this statement is very vague, but it's worth pointing out that undefined behavior in the C/C++ sense is absolutely not precisely defined. If a program executes undefined behavior in either of these languages, the entire program execution is meaningless, including the time "before" the undefined behavior occurred. The standard leaves room for the implementation to do absolutely anything before or after that point. In practice, what actually happens is the crazily complex interaction of a slightly broken abstraction that depends on memory layout, optimizations, libraries, execution history, the phase of the moon, etc.

All that said, this isn't hard because of undefined behavior. This is hard because languages are crazily complex, and simply doing the semantically equivalent thing amounts to an emulation of one language in another, including all the implicit conversions, move constructors, copy constructors, overflow behavior, multiple inheritance, virtual dispatch, the whole mess. If you just do a full-fidelity source-to-source translation that amounts to emulation, you end up with an unreadable, gross mess.


> it's worth pointing out that undefined behavior in the C/C++ sense is absolutely not precisely defined.

I think OP means "it is precisely defined by the spec which actions result in undefined behaviour"


Yes. Thank you.


Reminds me of that time I exported my MATLAB code as C.


So why does it work better than the commercially available tools at the moment as claimed in the article?


This article is a classic example of Facebook PR doing a wonderful job of selling the research, with the linked paper [1] claiming too much in its introduction. Please, please talk to actual researchers before you buy such claims.

If you go through the paper, check the evaluation section to see how they measured their success. They used some programs from GeeksforGeeks to evaluate their approach. Problems on GeeksforGeeks do not represent the vast majority of programming tasks encountered in daily life, which is very much in contrast to the overarching claims presented in the introduction of the paper.

The second issue with the evaluation: they use BLEU scores to judge how good their translations are. BLEU makes sense for natural language translation (even that is widely debated in the NLP community these days). For a program there is no concept of being "almost correct" based on surface similarity; it is either correct or not. E.g. if I am asked to write a program to add two numbers and I write `x - y`, I am not almost correct, I am completely wrong. And in some ways that is what their model does: it optimizes for BLEU scores.
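
To make that concrete, here's a minimal sketch (using token overlap as a crude stand-in for BLEU; the point is the same) of two programs that look nearly identical but behave in opposite ways:

  # Surface similarity is a poor proxy for correctness.
  reference = "def add(x, y): return x + y"
  candidate = "def add(x, y): return x - y"

  ref_toks, cand_toks = set(reference.split()), set(candidate.split())
  print(len(ref_toks.intersection(cand_toks)) / len(ref_toks))  # ~0.86: very "similar"

  ns_ref, ns_cand = {}, {}
  exec(reference, ns_ref)
  exec(candidate, ns_cand)
  print(ns_ref["add"](2, 3) == ns_cand["add"](2, 3))  # False: functionally just wrong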

Third, the correctness of the programs is tested on 10 random inputs. Are 10 random inputs enough to cover the entire input space accepted by a program?

It is indeed a great advance in the application of ML technology, but it is nowhere close to the broader claims. One can even debate the ROI of gathering and curating data and then checking the correctness of translations from such a system vs. the ROI of writing rules for a rule-based system, since all programming languages are easily expressible that way.

[1]: https://arxiv.org/pdf/2006.03511.pdf


Because the commercially available tools aren't that good and because they kept working on it until it was better.

If they kept working on it I think they would run into an asymptote. Maybe they could get closer and closer to 90% accuracy on a task with real hardware, 92% boiling the oceans, and 93% with a Dyson sphere, 93.5% if you can harness a quasar. At that point it probably passes a whiteboard interview and the people who have to fix the bugs can console themselves that the last programmer had neither a brain nor a soul.

That system has an approximate, not an exact model of the domain it works on and that is why it has an asymptote. Turning a graph-structured program into a vector is like mapping the curved surface of the Earth onto a flat map -- except instead of it being a 3-dimensional space it is more like a 1000-dimensional space. Information is destroyed in that process and forever lost so there will always be important characteristics of the problem that it will never "get".

If the message recipient is a person they will meet you halfway and might even accept bullshit if it is presented with complete confidence and lack of shame. The computer will interpret exactly what you said and reveal that you're a dog. (e.g. mute animal)


Yeah, I was thinking it might be better to use something like LLVM IR as the “data structure” and then train deep nets to go $(LANG) => IR and then IR => $(LANG), trained on a lot of idiomatic, well-written code for each language; then the nets might be able to glean the intent rather than the implementation details.

Not sure though, I’m not an expert in LLVM IR, so I’m not sure how feasible that is...


Good call on LLVM. Was thinking about that too, but had to look up IR. This one looked interesting:

https://idea.popcount.org/2013-07-24-ir-is-better-than-assem...

IR looks to be targeting machine code, so I'm not sure it'll be the right format to embed high-level language concepts.


The issue is that LANG -> IR is a lossy transform: some constructs are lost, some language specific optimization passes could be applied, etc. Going from LANG1 -> LANG2 starts with more information, so it could in theory work better.


It's lossy; the inverse is not well defined. But you could try to train a NN on reconstructing plausible (idiomatic) inputs leading to the current IR...


It would be interesting to see what neural translation systems (in general, not just for code) use as an implicit internal IR.


$(LANG) => IR can be done by LLVM itself...

And that conversion misses important details like comments. Perhaps the ideal solution takes both the original code and the IR as input to generate the output.


From my limited machine learning experience, it seems machine learning is great at pattern-matching/cargo-culting itself to 95% solutions.

Most of the time, porting between two similar languages can be done by cargo-cult pattern matching, but there are some very tough corner cases for semantic mismatches between the languages: thread safety (std::vector vs. java.util.Vector), iterator invalidation differences, hashmap iteration order differences, etc., etc.

Most of the time, you can ignore these minor differences between languages, but sometimes you need whole-program, or at least whole-module, analysis to know whether the semantic differences matter. Heck, you could probably mostly get away with porting from a lexically scoped language to a dynamically scoped language without putting any effort into fixing up scoping issues.
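
As a small, made-up illustration of the iteration-order case: the Python below relies on dicts preserving insertion order (guaranteed since 3.7), so a line-by-line port to a language whose hash maps iterate in arbitrary order silently changes the output.

  # The CSV column order comes straight from dict insertion order.
  def render_csv_header(columns):
      return ",".join(columns.keys())

  cols = {"id": "int", "name": "str", "created_at": "str"}
  print(render_csv_header(cols))  # "id,name,created_at"; an unordered map scrambles this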

I hope our tools eventually get good enough to perform these sort of transformations, but at this point, it's far from being safe to trust automated tools for this sort of thing.


How do you handle generics? Go doesn’t have them, right?

C# has an open source compiler (Roslyn) with an API that exposes a lot of information, so for advanced needs there is that road too.


It was just a bunch of sed. It wouldn't handle generics in a general manner; it was just a script to fill a specific need for a limited period. What was "amazing" is how much of the C# didn't need any translation into Go code, for core business logic. It was up and running within say 20-30 minutes, though it needed a tweak now and then (sort of like modern-day "AI" ;).
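
For flavour, the rules were roughly of this shape (sketched here in Python rather than sed, with invented patterns that only cover trivial cases):

  import re

  # Each rule is a (pattern, replacement) pair applied line by line.
  RULES = [
      (re.compile(r"\bConsole\.WriteLine\((.*)\);"), r"fmt.Println(\1)"),
      (re.compile(r"\bvar\s+(\w+)\s*=\s*(.*);"), r"\1 := \2"),
      (re.compile(r";\s*$"), r""),
  ]

  def translate_line(line):
      for pattern, replacement in RULES:
          line = pattern.sub(replacement, line)
      return line

  print(translate_line('Console.WriteLine("hello");'))  # fmt.Println("hello")
  print(translate_line('var total = a + b;'))           # total := a + b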


I have seen some ML methods that learn explicit structural mappings [1][2]. I think ML is great when used as such. Even RL can work, because it operates over discrete spaces.

But casting any problem with a discrete correct answer into floating point calculations that mimic probability is a bad idea.

[1] : Learning to Compose Neural Networks for Question Answering

[2] : Inferring and Executing Programs for Visual Reasoning


An ML method is able to outperform hard-coded translators by large margins, supporting a wide mix of languages like Python → C++ and translation between varied native APIs, but it's not impressive because you can do the same thing with `sed`?

Maybe take a look at the translations in the paper and come back to give a second opinion?


Wasn't this on HN a few weeks ago? Or was that a similar project?

It's a fascinating idea. But now you need some way to check the result. Last time around, someone noted that some of the translations were plausible but wrong.

There's clearly some guessing. The translation of one function from Python to C++ resulted in untyped bounds becoming ints, which is OK. But it also resulted in a data array becoming ints, although the Python operations were generic over any type with arithmetic operators. The Python code might have been used for floats. It's not doing type inference by looking at the callers; it's just guessing.
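
As a made-up example of the kind of function where that guess matters: the Python below works for ints, floats, Decimals, anything with +, but a translation that pins the element type to int compiles fine and silently truncates float data.

  def running_total(values):
      total = 0
      for v in values:
          total += v
      return total

  print(running_total([1, 2, 3]))      # 6
  print(running_total([0.5, 0.25]))    # 0.75, lost if the port assumes int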

Still, it's promising, even if it doesn't do the whole job. There's a C++ to Rust translator, but it turns array indexing into pointer arithmetic on special C++ compatible types. It's compiling into lower level operations. Deep learning has the potential to recognize and use idioms of the target language. Maybe.

But this needs a checker. Perhaps something that runs both language versions in lockstep on test data and checks for disagreement at key points.
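
A rough sketch of such a checker, assuming both versions can be wrapped as Python callables (in practice the translated side would sit behind ctypes or a subprocess):

  import random

  def check_equivalence(original, translated, gen_input, trials=1000):
      # Feed both versions identical random inputs and flag any disagreement.
      for _ in range(trials):
          args = gen_input()
          expected, actual = original(*args), translated(*args)
          if expected != actual:
              return f"disagreement on {args}: {expected} vs {actual}"
      return "no disagreement found (not a proof of equivalence)"

  print(check_equivalence(
      lambda x, y: x + y,
      lambda x, y: y + x,
      lambda: (random.randint(-10**6, 10**6), random.randint(-10**6, 10**6)),
  ))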


I can't imagine this working out for producing idiomatic code, especially between very divergent languages like Haskell and C++. Their example is almost straight syntactic replacement, which isn't very impressive. And their real-world use case is migrating COBOL projects to Python??


Facebook as a company does not believe in ever rewriting large code bases. You can view a lot of their open source/technical achievements as outgrowths of this belief:

- 2004: write PHP because it's fastest for now

- 2008: PHP is too slow, compile it to C++

- 2011: C++ is too slow, write a VM and compile to assembly

- 2012: PHP's type system sucks, write a better typer in OCaml (Hack)

- They have done much the same for JavaScript and Python (look at all the typing tools provided by Instagram).

This stuff is just an outgrowth of their deep deep belief in writing code to handle the crazy amount of code they already have.


I think they chose the right strategy, compared to Twitter, which loves rewriting in another language.


How well does that really work out? It seems like that would accumulate a lot of cruft, or are there ways around that?


I think the theory is a small subset of engineers can do work that speeds up the entire codebase for the rest of the engineers instead of spending time rewriting that entire code base and also retraining all of those engineers.


I think this is just it; at some point, code becomes a numbers game, an investment game. Rewriting their PHP codebase to, say, C++ would cost 10,000 work-years, whereas creating HHVM cost 100 (random numbers) at a 1000% improvement in performance.

Rewrites at that scale are really expensive and take forever, and by the time one is done they'd already be lagging behind reality.


Yeah, exactly.

I like to think that Mark Zuckerberg read Joel's post and took it to heart.

But the pattern is super noticeable for FB, once you start paying attention.


To be fair, there's a great use case for migrating COBOL to python. But I don't think banks would trust an automatic conversion done using ML...


Cool magic trick, but I fail to see anything useful coming out of it. I would not blindly rewrite code; I'd rather see how it can be improved, maybe dig a bit into the existing functionality, clean it out, use the new language to my advantage...

Also, as a programmer your mental model is completely out of sync: how do you work on such an application now that it's suddenly all in Python (coming from C++)? Or is this shipped to a new team that receives 100,000 lines translated from C++ but has no clue how it all works?

How would FB use this at all?


You would, but FB and some other companies are dealing with 20 years and millions of lines of code written by a small army of developers. If they decide to switch language and rebuild their services, doing it manually will in all likelihood take longer and require more people if they don't apply tricks like automatic conversion, because rebuilding takes more effort than building fresh, for multiple reasons.


But the options aren't just "manual" and "AI". There is a third option: using classical static analysis and compiler techniques that have been around for decades to do transpilation and large-scale refactors.

There are entire companies already built around this.


There's also a fourth option: do nothing. Leave the existing services in the languages that they're written in rather than translating them into other languages and laboriously debugging the resulting code. If these services communicate through well-defined APIs, any new services, which can be built in new languages, can communicate with the legacy services.

It may be more cost effective to keep around some developers who know the old code base and the language it's written in.


As far as I can tell, this article gives no definition of what they mean by "successfully translate" in their testing. I would think it should be equivalent to the typical meaning of porting, which usually means you move the codebase from one language to another, reproduce all existing functionality, introduce no new bugs, and maintain relative performance. I am not seeing anything in this article other than that their translator translates functions to functions. It also doesn't mention what failure looks like. Does it mean bugs are introduced? Does it mean the translator doesn't return? How are the percentages arrived at?

Also, I don't fully understand the why, which is also not really addressed. What problem does automatically translating a mess of a code base (I am assuming it's a mess; otherwise, why would you want to port it?) to another programming language, leaving the mess in place, actually solve? Isn't the idea of porting to another language usually to help improve the codebase and/or to help integrate it into some other system? Is the idea of automatic translators that they do the "heavy" lifting, which is then cleaned up by experts in the new language? How do you manage the failures, since that would seem to still require an expert in the original language?


> To better measure the performance of TransCoder and other code translation techniques, we’ve created a new metric called computational accuracy, which evaluates whether the hypothesis function generates the same outputs as the reference when given the same inputs. We are also releasing our test set and the scripts and unit tests we used to compute this metric.

It reads as if they used a test suite to confirm whether the translation represents the same function.


Could be helpful for integrating existing libraries with languages/runtimes lacking a performant FFI.


I've been thinking of a pandoc-like tool to translate between languages, i.e. a transpiler.

You would need to write down the most general abstract syntax tree (AST) that encompasses all supported languages' features, as well as a way to read languages into this AST, and write the AST into languages.

The cool thing about pandoc is the ability to transform the AST between the read and write. In the context of a transpiler, that might mean that you transform the AST to be safer (e.g. adding type-checks in Python) before writing out.
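
A toy sketch of that read-transform-write idea, using Python's own ast module (Python 3.9+ for ast.unparse) to insert a runtime isinstance check for every annotated argument:

  import ast

  class AddTypeChecks(ast.NodeTransformer):
      def visit_FunctionDef(self, node):
          # Build an assert for each argument with a simple name annotation.
          checks = []
          for arg in node.args.args:
              if isinstance(arg.annotation, ast.Name):
                  stmt = ast.parse(f"assert isinstance({arg.arg}, {arg.annotation.id})").body[0]
                  checks.append(stmt)
          node.body = checks + node.body
          return node

  source = "def scale(x: float, factor: int):\n    return x * factor\n"
  tree = ast.fix_missing_locations(AddTypeChecks().visit(ast.parse(source)))
  print(ast.unparse(tree))  # the asserts now sit at the top of the function body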

That would be a very tedious tool to write, given that every language has its strange behaviors. But it might be a fun project if you limit the number of languages you support.


For the read part, I think this is something GraalVM already does: https://www.graalvm.org/docs/reference-manual/polyglot/


You would need to create a common library for all these languages, then write all your programs using only your own libraries.


I don't get why? The AST can be parsed by any single language. Haskell is particularly well-suited for writing parsers.


I believe the point is that you can have perfect syntactic compatibility, but that means nothing when your program calls boost::asio or a Common Lisp defmethod and you want to interpret it as Haskell.

The only way this could work is if you write the whole stdlib in your AST system.

Of course, this still won't mean your lazy Haskell program works as an eager C program. Language semantics are too different; even between similar languages, you'll ALWAYS hit corner cases on anything more than a toy program. Even if you translate C# to Java. Well, maybe C to C++ will work decently, as few programs use the diverging features.


I see, thank you.


Something like LLVM and decompiling


I think that while the current technology could do such a thing, the current data absolutely cannot. Think of the parallel with natural language translation. We have tons of books that are all painstakingly translated from one language to another by talented translators who even try to preserve the idioms of each language, plus television and other media as well. I don't think there even exist faithful translations in the programming space the way there are in the natural language space.


Note that this algorithm does not require translations as input. It takes a large corpus of each language of interest, models them, and then finds parallels between the models, which are used as a crude basis for a translation model. They use that to train a better model and iterate.

This stems from earlier work on natural language translation (by the same team) and opens quite a number of doors.


I see the potential to speed up code a lot. Say you could translate Python to C++ or Nim; that would speed up the program a lot. You could profile and translate only the hot, often-used code paths. This has the potential to save a lot of energy by running more efficient programs.


Cython already translates Python to C. And if the translation introduces a bug, the Cython developers can find a solution.

This ML crap is inexplicable, and nearly guaranteed to introduce subtle bugs and performance regressions.


That seems like a lot of effort when Julia is both high-level and efficient already.


Ok but I have 3 million lines of Python code, so that doesn't help me. I'm not gonna spend a year or two writing everything in Julia.


Good thing you don't have to: https://github.com/JuliaPy/PyCall.jl


I hear that you can use Python from Julia and vice versa. You could technically get started immediately and strangle the python away.


Let's rewrite everything in Julia.


There's no need to rewrite anything: https://github.com/JuliaPy/PyCall.jl

But yeah, I won't start any new projects in Python.


I see deep learning doing really well for one-shot translation of declarative languages like HTML, as well as simple grammars like English. You could maybe generate HTML and CSS from English.

But languages that have side effects, such as imperative languages, would be very hard to generate. On the other hand, this is the first thing I’ve seen that approaches it.

I am very worried about the uncanny valley of producing stuff based on remixing other stuff with no understanding of the logic behind it.

Recognizing and classifying images and finding correlations is altogether different than programming. I can see how physics and science can be automated. But programming?


Even if it works, which I doubt, the maintenance question is more important than just porting. Porting is rarely done just because there is a new, modern language. Writing a compiler/transcoder is likely easier and safer, and guaranteed to work.

I wonder what the deep learning would do with concurrent code and differences in the memory models of the languages. That part is hard even for experts in both languages and in concurrency, and there is very little similar code to learn from.


It seems like you could maybe go through machine code as an intermediary?


But wouldn't that just be decompiling with extra steps?


That would only help if you are willing to translate all of the libraries the code uses. Otherwise, transforming a call to C's `qsort` into Java's `Arrays.sort` is unlikely to have the desired effect.


Cool idea... I would be more interested in an auto-documentation tool based on this.


That'd be truly remarkable, as in actual magic.

Take this rather simple example:

  PointList interpolate(PointList sample_points, OrderedRealNumberList points_to_interpolate);
This function can be auto-documented from its name and the names and types of its arguments alone. No ML required - simple pattern matching and LUTs will do. But then again, do you really need detailed documentation in such a case?

Where documentation would actually be helpful is in cases like this:

  SUBROUTINE PCHFE (N, X, F, D, INCFD, SKIP, NE, XE, FE, IERR)
A format commonly found in the FORTRAN code of numeric libraries. If the author(s) didn't document this, an AI wouldn't stand a chance of knowing what it does.

PCHFE is Piecewise Cubic-Hermite Function Evaluation of course [1] and the parameters aren't exactly self-explanatory either...

[1] http://www.netlib.org/slatec/pchip/pchfe.f


Guessing!! N, X, and F are probably related to NE, XE, and FE. "E" means "error"? Maybe. IERR is an error flag. Idiomatic Fortran there.

Each of those variables will be defined later on in the code, at least with a type (not required, but it is not 1960 anymore). In that declaration is where some comments would be.

          REAL*8 N ! Radius of the body in radians
Point being that Fortran is not hopelessly opaque. A subroutine declaration is backed up with some more information.

(now if "implicit none" is not a requirement, then this is all you get)

Love seeing some hard core numeric code. Precise and compact. No pointers, nothing sophisticated. Do loops, if statements, subroutine calls.


> In that declaration is where some comments would be.

Admittedly, I haven't read a lot of Fortran code, but I have yet to see anybody who includes such comments. It wouldn't be so bad except also:

a) the only code I see in Fortran is numerics code, therefore written by mathematicians or other people who seem to believe that using more than one letter to describe a term is an admission of weakness

b) people write function names as if it costs $1000 per extra character

c) there often don't seem to be any introductory resources for the concepts being implemented that might let me discover what the cryptic one-letter variable names actually refer to


> In that declaration is where some comments would be.

Here's the declaration of the parameters:

  INTEGER  N, INCFD, NE, IERR
  REAL  X(*), F(INCFD,*), D(INCFD,*), XE(*), FE(*)
  LOGICAL  SKIP
Sorry, but no comments to be found :)

> NE,XE, and FE "E" means "error"? Maybe.

Wrong guess - "E" means "Evaluation" as in "to be evaluated" in this case. It's also not clear which are inputs and which are outputs, and primitive data types give no indication at all about constraints and use cases.

> IERR is an error flag. Idiomatic fortran there.

Now that's true, but those are idiomatic cases that are well documented and don't benefit from AI documentation. Simple pattern matching will do.

The point I was trying to make wasn't really about any specific programming language either. The point is rather that documentation requires translating implementation to intent and purpose.

If you have a system that's capable of translating a program into purpose, constraints, and usage example(s) expressed in plain natural language, you have created a system to end all programming languages, because the inverse transform would be possible as well...


I agree with you!

Nice to see someone else using Fortran 77. Or at least reading it.


I see this as a great way to speed up manual code translation from one language to another.

Something like DeepL [1] but for code, where you can select one of several possible translations for a section of the code, rewrite part of the translation, indicate some preferred translations, and have the algorithm take them into account for the rest of the code.

You cannot replace a human translator but you can certainly make him much faster and automate the trivial bits.

[1]: https://www.deepl.com/translator


Code is now public, with pretrained models: https://github.com/facebookresearch/TransCoder/


The linked paper was previously discussed at https://news.ycombinator.com/item?id=23463812


Without ML logic: trace the control flow and generate the language-specific string.


Golden sledgehammer.

What problem does it solve exactly? I'm not seeing unsolved problems here... just a really heavyweight solution that is frail and prone to error.


I’d be impressed if it could translate, say, React to Qt. Translating what amounts to leetcode problems is much less interesting.


To imagine that future compilers may one day just be AI-powered is mind-boggling to me.


I've heard that LLVM uses neural networks to drive the register allocator. I don't know how well it works, but it's a pretty cool idea.


I listened to a talk about this at PLDI, with respect to the auto-vectorizer. Given a piece of sequential code, there are many ways to auto-vectorize it, and finding the fastest one is computationally complex. The current auto-vectorizer uses a faster algorithm that won't always generate the fastest possible vectorization. When they threw a neural network at it, they found it sometimes generated faster code than the slow 'optimal' algorithm, because the neural net was able to take into account factors the humans hadn't thought of in their model.


In the PGO category, there's also this recent proposal: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140763.h...

At compile-time only, people have also been using trained models to derive cost functions for sequences of instructions (as opposed to analytical models which become very difficult to derive these days given the complexity of modern CPU architectures)


This is interesting, as I have essentially given up on attempting to figure out which sequences are faster. Instead, I go with the approximations that fewer instructions are faster and that registers are faster.

It's hard to even figure out if the scheduling algorithms that work well on the Pentium and PentiumPro are worthwhile for the x86-64.


Sounds pretty risky to me...



