
Aroma: Using machine learning for code recommendation - moneil971
https://ai.facebook.com/blog/aroma-ml-for-code-recommendation/
======
markphip
If people are interested in this area and not already aware, there has been an
Eclipse project that operates in this general area for a while.
[https://www.eclipse.org/recommenders/](https://www.eclipse.org/recommenders/)

~~~
steve76
What would be nice is if it could write recommendations automatically, and then
you pick and choose which one you want.

You could scaffold a project, and it runs, and then you check back to see what
it recommends.

------
spott
I want something like this for math. Write an equation, or a definition and
see a bunch of different 'versions' of that snippet and where they are being
used.

This would help so much with understanding concepts and merging fields. It is
way too common for different fields to independently "discover" some concept
and be completely ignorant of all the work that has been done on that concept
by some other field.

~~~
ihaveajob
It reminds me of Haskell's type based function search:
[https://hoogle.haskell.org/](https://hoogle.haskell.org/)
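
For anyone unfamiliar with the idea of searching by type, here is a minimal Python sketch of it. It only does exact matching on annotations over a made-up three-function library (`repeat`, `length`, `add` and `LIBRARY` are hypothetical names for illustration). It is not how Hoogle itself works, and it ignores polymorphism and fuzzy matching entirely.

```python
from typing import Callable, List, get_type_hints


def signature_of(fn: Callable) -> tuple:
    """Return (argument types..., return type) read from fn's annotations."""
    hints = get_type_hints(fn)
    ret = hints.pop("return", None)
    return (*hints.values(), ret)


# Hypothetical toy "library" to search over.
def repeat(s: str, n: int) -> str:
    return s * n


def length(s: str) -> int:
    return len(s)


def add(a: int, b: int) -> int:
    return a + b


LIBRARY = [repeat, length, add]


def search_by_type(*types) -> List[str]:
    """Names of library functions whose signature matches the query exactly."""
    return [fn.__name__ for fn in LIBRARY if signature_of(fn) == types]


# Query "str -> int -> str": the last type is the return type.
print(search_by_type(str, int, str))
```

The query is written as the argument types followed by the return type, so `search_by_type(str, int)` asks for functions from `str` to `int`.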

------
userbinator
Interesting. In typical Facebook style, they do not attempt to fix the root
problem (there's too much code, and much of it has been copy-pasted) but
instead expend even more resources just to allow (or even encourage) it to
proliferate. The effort would be far better expended on a tool to refactor out
all that duplication, because they've created something that can clearly
identify duplication.

It reminds me of how they hit a limit in the Android VM because their code had
so many classes, and decided to work around it instead of reflecting (no pun
intended) on how they ended up with so much code in the first place:
[https://news.ycombinator.com/item?id=5321634](https://news.ycombinator.com/item?id=5321634)

------
ccozan
Maybe on topic: I would like a voice-controlled system that works with me. For
example: I say, "I need a loop over a list" and promptly get the loop served in
my text editor. Or "I need to open a file and read contents". Or "Create
an object ThisAndThat, with three properties" ... etc. Ideally it would
even ask for more details, like what kind of list it is, or how the
file should be read.

How hard would that be?

~~~
melling
Well, I’ve been discussing it on HN for years now. There really hasn’t been
much interest.

[https://news.ycombinator.com/item?id=8515987](https://news.ycombinator.com/item?id=8515987)

[https://news.ycombinator.com/item?id=11418285](https://news.ycombinator.com/item?id=11418285)

Mention coding by voice and someone will explain how they can’t imagine not
using a keyboard, or they bring up the open office problem.

Considering the limited scope, it’s probably just a matter of the proper
editor integration.

~~~
ccozan
I think my idea is about rapid prototyping, where I build the skeleton of
a program faster, regardless of the boilerplate code, and then work out the
details.

Intellisense or shortcuts do this up to a point, but the current big IDEs are
limited. Maybe an editor built on Vim's concept of separate command and
edit modes would be a better fit for working like that.

------
jboggan
The most interesting (and I think difficult) approach here is properly
representing the ASTs as vectors. There is a lot more possible when you get
this right.

~~~
yazr
This. ML is so vector-y. And code is so graph-y. Can you point out some SOTA
work on bridging the two?

~~~
dunefox
I think you'll find a good amount of material if you search for 'deep graph
learning'.

------
mlthoughts2018
The primary use case I experience for searching for idiomatic usage patterns
is to know how to do a higher level refactoring, meaning I don’t want results
that have syntax tree similarity to what I’ve got or even the small bit I
start from to create the query. I want the _intention_ of my search query but
expressed in a better design.

Separately, for very micro-level idiomatic things, like use of a certain data
type operation or efficient constructor patterns, I need to search by natural
language descriptions of the subtle differences between options. This is what
makes Stack Overflow so helpful, the accompanying natural language description
of intentionality or special cases, even if the code that is found isn’t
precisely what’s needed, it demonstrates _directionally_ what to do.

This tool seems like yet another example of trying to force machine learning
solutions to problems nobody actually has.

Considering the idea that I’d need to integrate this into my coding
environment, I’ll say No Thanks!

~~~
xpaulbettsx
> This is what makes Stack Overflow so helpful, the accompanying natural
> language description of intentionality or special cases, even if the code
> that is found isn’t precisely what’s needed, it demonstrates directionally
> what to do.

You're entirely right, but if you're in an incredibly huge monorepo like
Facebook, this information literally doesn't exist; that's part of the problem
that Aroma is trying to solve - "how can we show people the Facebook App Way
To Do That Thing, even if That Thing doesn't have current documentation"

(Disclaimer: I worked on the coding environment UX for Aroma)

~~~
mlthoughts2018
Wouldn’t it make more sense to spend the effort annotating these things? Or
building models to provide the annotation? I mean, I work professionally in
embedding models for computer vision and NLP, and my reaction to the article
is that this seems like totally the wrong approach. You’re putting all this
effort into creating the embedding model out of the part that is both most
superficial and least human-interpretable (the AST).

~~~
franklsf
Building models for natural language _and_ code for either NL/intent-based
code search or automatically annotating code is indeed another hot research
area!

I'd argue Aroma solves a different problem in that it surfaces more idiomatic
patterns based on the code you already have. This can also be important,
especially in a production environment, where you need to do things "the right
way".

------
azhenley
If anyone wants to get a PhD in this topic, let me know :)

~~~
jdc
Your website is the first I've heard of "Information Foraging" as a field of
study. Absolutely fascinating. Any recommendations on where I might dive into
the topic?

~~~
azhenley
A good start that is easy to read would be:

An Information Foraging Theory Perspective on Tools for Debugging,
Refactoring, and Reuse Tasks
[https://dl.acm.org/citation.cfm?id=2430551](https://dl.acm.org/citation.cfm?id=2430551)

The paper applies IFT to software engineering, but IFT has also been applied
to navigating websites or even physical offices. Use Scholar.Google.com to
find a PDF of the paper if you don't have ACM access.

------
cyc115
This would be very helpful for figuring out how to do things in a large
codebase with little documentation.

------
bshipp
Was anyone able to find a link to Aroma in that document? I found the colours
made it very difficult to differentiate the links from the text and I couldn't
find it.

A quick search through Facebook's profile on Github turned up nothing.

~~~
franklsf
There's a paper describing the approach in detail
<[https://arxiv.org/abs/1812.01158>](https://arxiv.org/abs/1812.01158>), but
Aroma itself is not open source yet.

~~~
franklsf
Sorry, HN screwed up my URL:
[https://arxiv.org/abs/1812.01158](https://arxiv.org/abs/1812.01158)

~~~
bshipp
Great, thanks very much for finding that paper. Hopefully Facebook follows
this up in the not too distant future with at least a rudimentary example.

I can learn from a paper, but I learn much faster from an example!

------
konamicode
See also: IntelliCode for Visual Studio.

~~~
browsercoin2019
I also found it for VS Code:
[https://visualstudio.microsoft.com/services/intellicode/](https://visualstudio.microsoft.com/services/intellicode/)

however when I try to install it on Mac OSX, I get

Couldn't find a compatible version of Visual Studio Intellicode - Preview with
this version of Code

------
occamschainsaw
Not directly about the article but I am annoyed that the FB AI blog is very
much a part of FB the social network. While reading the blog I got three
useless notifications (boxes in the bottom left corner). The whole page has no
other indication that I am logged into FB nor any option to log out.

------
z3t4
Is this project open source? I would like to experiment with something like
this to generate boilerplate code. Often when programming I copy something
that already does roughly what I want, then modify it until it does exactly
what I want.

------
pgib
Perhaps not as fancy, but I have been loving TabNine with Vim. (works with
most editors) Its suggestions are scary good some of the time.

[https://tabnine.com/](https://tabnine.com/)

------
orliesaurus
I wonder if anyone from Kite.com is reading this and if they have any
comments?

------
usefulcat
Seems like this could be a useful feature for Github, BitBucket, etc.

------
muratsu
From what I read, it's doing search & clustering on AST based feature vectors.
I'm a bit lost on the learning part, how does the system improve over time?

~~~
seanmcdirmid
I’m guessing by learning the vectors over lots of ASTs?
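
To make "search on AST-based feature vectors" concrete, here is a minimal Python sketch. The featurization (counts of node types and parent>child type pairs) and the cosine ranking are assumptions chosen for brevity, not Aroma's actual features or pipeline, which the paper describes separately. Note that this toy has no learned component at all: the vectors are plain counts.

```python
import ast
import math
from collections import Counter
from typing import List


def featurize(source: str) -> Counter:
    """Sparse vector: counts of AST node types and parent>child type pairs."""
    features = Counter()
    for node in ast.walk(ast.parse(source)):
        parent = type(node).__name__
        features[parent] += 1
        for child in ast.iter_child_nodes(node):
            features[f"{parent}>{type(child).__name__}"] += 1
    return features


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query: str, corpus: List[str]) -> List[str]:
    """Rank corpus snippets by structural similarity to the query, best first."""
    q = featurize(query)
    return sorted(corpus, key=lambda s: cosine(q, featurize(s)), reverse=True)


# Hypothetical three-snippet corpus; identifiers differ from the query on purpose.
CORPUS = [
    "with open(path) as f:\n    data = f.read()",
    "for x in items:\n    total += x",
    "class Point:\n    pass",
]
ranked = search("with open(name) as fh:\n    text = fh.read()", CORPUS)
print(ranked[0])
```

Because variable names are not part of the features, the file-reading snippet ranks first even though none of its identifiers match the query.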

------
bitL
That's a pretty cool approach! Thanks for sharing! ;-)

------
sabujp
I expect to see this in IntelliJ and Cider
[https://code.google.com/archive/p/cider-ide/](https://code.google.com/archive/p/cider-ide/) soon

------
Dragna
What is the IDE they are using?

~~~
SethKinast
That's Atom with Nuclide.

------
saagarjha
One danger I can see coming up is that if someone writes incorrect code it
could end up propagating throughout other codebases. I guess this is still an
issue without automatic tools, but I feel like this might make it easier…

~~~
franklsf
Aroma would only surface what it thinks are "idiomatic" coding patterns. So if
you have many instances of incorrect code, you might already be in trouble :)

~~~
taneq
Given how many "everybody does X wrong" articles we see here, I think we're
already in trouble. :P

(Thinking specifically of the "binary search in the Java API was broken for X
decades" one with the integer overflow.)
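
For context, the overflow in question comes from computing a binary search midpoint as `(low + high) / 2` with 32-bit integers. Python integers do not overflow, so the sketch below simulates Java-style `int` wraparound with a helper (`to_int32` is a hypothetical name for illustration) to show how the naive midpoint goes negative while `low + (high - low) / 2` does not.

```python
def to_int32(n: int) -> int:
    """Wrap a Python int to a signed 32-bit value, as Java int arithmetic would."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


def mid_overflowing(low: int, high: int) -> int:
    """Naive midpoint: the sum is wrapped to 32 bits before dividing."""
    s = to_int32(low + high)
    # Java integer division truncates toward zero; Python's // floors.
    return -((-s) // 2) if s < 0 else s // 2


def mid_safe(low: int, high: int) -> int:
    """Overflow-free midpoint: high - low always fits when 0 <= low <= high."""
    return low + (high - low) // 2


low, high = 2**30, 2**30 + 2
print(mid_overflowing(low, high))  # negative: would be an invalid array index
print(mid_safe(low, high))         # 2**30 + 1
```

The failure only appears once `low + high` exceeds 2**31 - 1, which is why a pattern like this can look correct on small inputs and still be copied around with the bug intact.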

