
Autocomplete from Stack Overflow - monort
https://emilschutte.com/stackoverflow-autocomplete/
======
hbt
It still astounds me that we haven't solved reusable modules in 2016.

Sure, we have libraries, apis, package managers etc. but every time I read a
code base, there is always a utility function reinventing the wheel.

Someone wrote it because it is still difficult to discover modular code and
reuse it easily.

It's pretty nuts when you think about it. Imagine mechanical engineers having
to recreate the same CAD file because they can't find a component with the
same functionality (which does happen, but for other reasons, e.g. intellectual
property).

But we don't have the same intellectual property challenges and yet the best
tools we have to discover and reuse code are 90s search engines that treat
code as raw text with zero contextual understanding and often outdated code
snippets.

PS: I love your project btw.

~~~
zingermc
Haskell has Hoogle[1], which allows you to search for functions using a type
signature. This is surprisingly effective.

Let's say you want that `contains` function from the original post. You'd
search for `(Eq a) => a -> [a] -> Bool`, which describes a function that takes
two parameters and returns a boolean. The first result[2] is the `elem`
function, which is exactly what we wanted!

This is a bit of a contrived example, but I have honestly been surprised by
how effective searching by type signature is in Haskell. I wonder if it is
possible for a language with a weaker type system, like JavaScript.

[1] [https://www.haskell.org/hoogle/](https://www.haskell.org/hoogle/)

[2]
[https://www.haskell.org/hoogle/?hoogle=%28Eq+a%29+%3D%3E+a+-...](https://www.haskell.org/hoogle/?hoogle=%28Eq+a%29+%3D%3E+a+-%3E+%5Ba%5D+-%3E+Bool)

~~~
Fargren
Does the order of the parameters matter? It's been a while since I've touched
Haskell, so maybe it's obvious that this isn't [a] -> a -> Bool for some
idiomatic reason. I guess that would be `isContained` instead of `contains`,
so it probably wouldn't be the first thing I search for, but there's at least
some potential for ambiguity.

~~~
cormacrelf
It's almost a convention in Haskell. The idea is, since currying functions is
easy and common, you stick the argument that you'd want the least in a curried
version last. So if you wanted to find out if x was in ten lists, you'd just:

    map (elem x) [list1, list2, ..., list10]

or, for folds, the function is the one least likely to change, so you put it
first, and the list is most likely to change, so you put it last:

    foldl (+) 0 [1,2,3]
    map (foldl (-) 100) [[1,2],[2,3],[4,19]]

Of course, it's always debatable if you can find some situation where you
wanted to curry in a different order, but generally you pick in order to
reduce forced named lambda parameters. As far as I can tell, that's how it's
done.

Edit: if you saw the sneaky edit, you'll know that this particular convention
isn't easy to follow!
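
Since the demo under discussion targets JavaScript, here is a minimal sketch of the same argument-order idea in that language. The helpers `elem` and `elemFlipped` are hypothetical names made up for illustration; they are not part of any library.

```javascript
// Hypothetical helper: a curried membership test with the "needle" first,
// mirroring the argument order of Haskell's `elem`.
const elem = (needle) => (haystack) => haystack.includes(needle);

const lists = [[1, 2, 3], [4, 5], [3, 9]];

// Partially applying the needle yields a reusable predicate, so no extra
// lambda is needed to check several lists.
const containsThree = lists.map(elem(3)); // true, false, true

// With the haystack first, the same check needs an explicit lambda.
const elemFlipped = (haystack) => (needle) => haystack.includes(needle);
const sameResult = lists.map((list) => elemFlipped(list)(3));
```

Both versions compute the same result; the needle-first order just avoids naming a throwaway parameter in the common case.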

~~~
eru
There are a bunch of rules of thumb that help you decide faster, like "needle
before haystack".

------
sharmi
Stackoverflow is an invaluable resource when you are a newbie or learning a
new stack.

That said, the real value of stackoverflow is not only in the code but the
surrounding explanation and the following comments which discuss the merits
and demerits of each solution. So you get to read multiple solutions to the
same problem and realize why the top answers stand out from the others. In a
way, you learn to smell good code and bad :)

So searching on StackOverflow can be a learning experience!

But these kinds of developments might be the next step, and who are we to say?

One problem I foresee is that the highly rated solution could be for a specific
version of the stack/software (the latest stable or an obsolete,
unmaintained one), in which case you might end up with an error that amplifies
the mess.

------
ajmurmann
I had a conversation with a coworker a few days ago where I jokingly suggested
building an AI system that you give a failing test suite and it uses
stackoverflow answers to make your tests pass. Sounds like we are one step
closer to making that happen.

~~~
Matetricks
Have you heard of StackSort?

[https://gkoberger.github.io/stacksort/](https://gkoberger.github.io/stacksort/)

~~~
thedufer
Original credit goes to the hovertext of
[https://xkcd.com/1185/](https://xkcd.com/1185/), I believe.

~~~
gkoberger
Yup, I stole the idea from there :)

------
supergreg
Rather than indexing StackOverflow, why not do this by indexing all the open
source code out there? Sure, StackOverflow answers are good, but they usually
skip error checking to focus on the question at hand. Code used in
production applications surely is more sound.

~~~
jonnycowboy
A good start would be for Github to host a package that contains the latest
versions of each repo. However even they may not have that right (depends on
TOS).

~~~
ThinkCritically
What they really need is a way to rank github code by quality - so that a tool
like this pulls in the good code (as opposed to code of lesser quality)

~~~
drumdance
Perhaps instead of using StackOverflow "correct" answers, someone could create
a site that has examples of anti-patterns and code smells, which you could then
use to analyze the code on Github and link to possible alternatives.

~~~
matt4077
That's basically what linters do.

------
creed
Great idea, nice thinking!

I guess we as programmers should seriously think about the future of our
craft.

If, for example, we feel we are basically using the same building blocks over
and over again, we should seriously think about organizing libraries and code
snippets and questions asking for such snippets in an organized way and
provide ways to transpile one solution into different languages, etc.

We should not accept the current state of our craft as final and rather think
about how to improve in general.

If for instance something like StackOverflow has become the Wikipedia of Code
then let's think hard about how to make it into a full-blown tool, with all the
features and semantics we need. It was a nice project the way it started and
grew but it doesn't have to stay like that forever!

------
ryannevius
This isn't working at all for me... Either I don't get what it's supposed to
do, or the common functions I'm typing aren't common enough. On a related
note, 50 StackOverflow points seems like a high number for something like
this, and may reduce the results significantly enough that the example doesn't
work for most common problems.

~~~
swiley
I only have 50 stack overflow points /total/ (although I only very rarely
post) so I would absolutely agree that it's way too high.

~~~
sleepychu
Accepted answers are also included.

~~~
pc86
My understanding is that it's only using code from accepted answers with 50+
points tagged JS.

So an accepted JS answer with 49 points would not be included, nor would a JS
answer with 500 points that was not marked accepted.
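
As a rough sketch of the rule as described in this comment (an assumption about the demo's filter, not its actual code), the three conditions combine like this. The record shape, the field names, and the `javascript` tag string are all hypothetical.

```javascript
// Hypothetical answer records; field names are assumptions for illustration.
const answers = [
  { score: 49, accepted: true, tags: ["javascript"] },
  { score: 500, accepted: false, tags: ["javascript"] },
  { score: 50, accepted: true, tags: ["javascript", "arrays"] },
];

const MIN_SCORE = 50;

// All three conditions must hold for an answer to be indexed.
const isIndexed = (answer) =>
  answer.accepted &&
  answer.score >= MIN_SCORE &&
  answer.tags.includes("javascript");

const indexed = answers.filter(isIndexed);
console.log(indexed.length); // 1
```

Only the third record passes: the first falls one point short, and the second was never marked accepted.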

~~~
sleepychu
Oh, you're right. That's crazy!

------
jnardiello
If it's a joke "Ahah", if not it's just depressing. SO is a valuable resource
to confront and understand concepts. Let's not encourage this copy/paste
culture

~~~
bikamonki
The OS/SDK/Browser/Protocol/Firmware which you used to type this comment is a
compiled copy/paste of a zillion lines of code, of which you probably typed
none.

If we encourage the DRY paradigm in the whole development cycle, copy/pasting
a function is just a further reach of such paradigm. Now, if a programmer
decides whether or not study and understand the pasted code has nothing to do
with the quality of the final product.

Furthermore, IDEs by design encourage copy/pasting to speed development, most
offer functionality to store code snippets. In that sense SO is like a web
extension of IDEs.

~~~
jnardiello
> Now, if a programmer decides whether or not study and understand the pasted
> code has nothing to do with the quality of the final product.

I deeply disagree with this and for such obvious reasons.

~~~
bikamonki
Why? A widely accepted answer (many points) is most likely correct/best
practice/bug-free.

~~~
pavel_lishin
It may be bug-free, but it may not be the correct solution for the specific
problem the programmer was trying to solve. It may also not do what is
expected under different circumstances.

It may be a hammer when you're looking for a screwdriver.

------
MrPatan
Boss never wants a function, boss wants a solution to a problem, and your job
is to know how to express that in functions. If you have the signature, 99% of
your job is done.

I know you know, I know they know, and I know you know they know. I just
wanted to say it, ok?

------
nateabele
Haha, I love this. We're getting closer and closer to the ultimate conclusion
of our industry: where we'll just type in a few keywords and generate full
applications from StackOverflow code examples!

~~~
pavs

      Siri, make me a facebook clone.

~~~
Uehreka
...12 hours of involuntary surgery later:

    There, now you're a facebook clone.

~~~
mikeash
This is close to a real pun it would do, and nearly just as dire. It's fixed
in recent versions, but it used to not understand emergencies. If you said
"Siri, call me an ambulance," it would come back with "OK, I'll call you 'an
ambulance' from now on."

~~~
AnkhMorporkian
That's both tragic and hilarious.

------
yitchelle
If we put this in front of 1 million monkeys, would it generate the next
killer app?

Anyway, this gave me a little laughter today.

~~~
eschutte2
Glad to hear it. That was my hope :)

------
braythwayt
I have a good feeling about this:

[https://vimeo.com/76141334](https://vimeo.com/76141334)

This is exactly the direction we should be going. We build the world's most
sophisticated engines for predicting human social behaviour, why are we stuck
in the 1990s when it comes to autocomplete for a tightly scoped domain like
writing software?

Bravo!

------
bprosnitz
There was some research from Stanford a few years ago with a similar idea
[http://hci.stanford.edu/publications/2010/blueprint/brandt_c...](http://hci.stanford.edu/publications/2010/blueprint/brandt_chi10_blueprint.pdf)

------
byteface
I did something similar one night a few months back, using Python as a Sublime
plugin, but it scrapes the live site. It eventually gets blocked, though, as it
doesn't use the official API.

[https://github.com/byteface/chode](https://github.com/byteface/chode)

I considered hooking into a variety of other resources but haven't really
bothered with it as have other things going on.

------
robinduckett
Seems to work very well for that one example, but I can't get it to work for
anything else.

~~~
ianstormtaylor
It's not obvious, but because of the way it's programmed, you have to remove
the example function, since it parses the entire code block for similarities.

------
WithTeeth
Cool idea, but kind of ironic that the example you used produces unnecessarily
complicated stack overflow solutions. A JavaScript "contains" function is as
simple as:

    var contains = function (needle, haystack) {
      return haystack.indexOf(needle) !== -1;
    };

~~~
ne0phyte
Older IEs (8 and below) don't have Array.prototype.indexOf(), so you either
have to extend the Array prototype or, you know, just implement it in a more
compatible way.
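
A minimal sketch of the "more compatible" route, assuming a plain loop is acceptable: it touches neither `Array.prototype` nor `indexOf`, and keeps the `(needle, haystack)` argument order from the parent comment.

```javascript
// Loop-based membership test that avoids Array.prototype.indexOf,
// which is missing in Internet Explorer 8 and earlier.
function contains(needle, haystack) {
  for (var i = 0; i < haystack.length; i++) {
    // Strict equality, matching the comparison indexOf uses.
    if (haystack[i] === needle) {
      return true;
    }
  }
  return false;
}

console.log(contains(2, [1, 2, 3])); // true
console.log(contains(4, [1, 2, 3])); // false
```

Like `indexOf`, this compares with `===`, so it will not find `NaN` in an array.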

~~~
CiPHPerCoder
Or just not support older IEs.

[http://browserupdate.org/](http://browserupdate.org/)

------
sabujp
Someone please make this for eclipse, vim, intellij, emacs, sublime, etc

------
falcolas
A great way to increase your liability in an automated fashion!

Sure, the code on Stack Overflow is licensed as MIT... but what assurance is
there that the code which was posted is original property that the poster owns
copyright to? What assurance is there that the posters won't claim patents on
the methods being used?

The risk is low, but it's certainly not 0. Big companies caution their
software developers to not even read code in SO answers to avoid lawsuits...
how much risk do you bring to your company by copying and pasting (not to
mention autocompleting) from SO?

I am not a lawyer, etc. etc.

~~~
matt4077
It's fortunately not _that_ easy to get a software patent. It'd be quite
difficult to patent something that's the length of an average stackoverflow
answer. There's also a lower limit on the length (and originality) for
copyright which I doubt many answers reach.

~~~
falcolas
I can fit the concept of mp3 decoding and a sample decoder in a few dozen
lines of text and code. This is a patented technology (for another year or
two, at least).

~~~
odbol_
I severely doubt that. Would love to see it though!

~~~
eru
falcolas might not be able to fit a full mp3 encoder and decoder in the couple
of lines, but falcolas can probably come up with enough in these lines to
violate the mp3 patents.

------
nottednelson
Similarly: 2014 Springer volume
([http://www.springer.com/us/book/9783642451348](http://www.springer.com/us/book/9783642451348))
and conference series that spawned it
([https://sites.google.com/site/rsseresearch](https://sites.google.com/site/rsseresearch)).

------
Finbarr
This should exist for Wikipedia. Would be great to autocomplete sentences as
well as code.

------
pka
Interesting. I was working on something similar [0] last year. Are you
planning on publishing the code?

[0]
[https://news.ycombinator.com/item?id=9954059](https://news.ycombinator.com/item?id=9954059)

------
rcarmo
Pretty neat, although the "training set" probably doesn't comprise what most
people would need.

I'd like to see a version of this based, say, on React examples. Might save me
some time :)

------
andgio
Can't really see this being useful if I'm honest. Here you are simply showing
the code suggestions based on what the user is writing. You are completely
ignoring both the context of the code being written by the user and of the
code being suggested.

These are key functionalities in order to actually be usable. With this you
simply get hundreds/thousands of suggestions that are not related to what you
are coding. And if by chance the suggested code is exactly what you want, then
the variable names wouldn't even match. This is assuming that the suggested
code is complete and functional.

~~~
pavs
I don't think the point was for it to be useful; it was something fun to
do.

------
magic_man
We can feed the SO data to an ML algorithm and pretty soon we won't even need
software engineers. Go today, self-coding computers tomorrow.

~~~
hzhou321
Of course we still need software engineers to troll SO :).

------
boksiora
hahah :) someday we will say "Computer make me a program" and it will make it
from SO posts

------
jrbapna
Is there an atom extension for searching stack overflow snippets? if not,
let's build one!

~~~
jrbapna
it exists! [https://atom.io/packages/ask-stack](https://atom.io/packages/ask-stack)

------
mrfusion
I don't understand how it works? It's just matching the function name?

~~~
eschutte2
No, but I realize the demo doesn't make it very clear. It inspects the
structure of the code up to the cursor position, based on the syntax tree,
along with nearby variable and function names, and matches to similar
constructs from SO. It could be greatly improved, but I haven't had a lot of
time lately.
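
To make the description above more concrete, here is a toy sketch of structure-plus-names matching. It is not the demo's code: Jaccard similarity is swapped in as a stand-in scoring method, and the feature lists (syntax node types plus nearby identifiers) are hypothetical and assumed to be already extracted by some parser.

```javascript
// Jaccard similarity between two feature lists: |A ∩ B| / |A ∪ B|.
function jaccard(a, b) {
  const setA = new Set(a);
  const setB = new Set(b);
  let shared = 0;
  for (const item of setA) {
    if (setB.has(item)) shared += 1;
  }
  const union = setA.size + setB.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Hypothetical pre-extracted features for two indexed snippets.
const snippets = [
  { id: "contains", features: ["FunctionDeclaration", "ForStatement", "IfStatement", "needle", "haystack"] },
  { id: "shuffle", features: ["FunctionDeclaration", "ForStatement", "array", "random"] },
];

// Features of the code typed so far, up to the cursor (also hypothetical).
const typed = ["FunctionDeclaration", "needle", "haystack"];

// Score every snippet against the typed code and sort best-first.
const ranked = snippets
  .map((s) => ({ id: s.id, score: jaccard(typed, s.features) }))
  .sort((x, y) => y.score - x.score);

console.log(ranked[0].id); // contains
```

Here the `contains` snippet scores 3/5 and `shuffle` scores 1/6, so the former is suggested first. A real matcher would presumably weight tree position and name proximity rather than treating features as a flat set.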

------
mananvaghasiya
Can somebody please make an Atom plugin for this? Oh wait...

------
daveheq
It works great until it does something you don't want.

------
bawana
yes, but will this help a million monkeys smashing keyboards write the next
great AI

------
mgalka
Awesome project! Nice work.

------
blacktulip
[http://i.imgur.com/7SiQSD1.jpg](http://i.imgur.com/7SiQSD1.jpg)

~~~
logicrook
Where can you buy this book? I want it!

~~~
schlowmo
No need to buy it, it's free:

[https://www.gitbook.com/book/tra38/essential-copying-and-pas...](https://www.gitbook.com/book/tra38/essential-copying-and-pasting-from-stack-overflow/details)

Found yesterday in this HN Story:
[https://news.ycombinator.com/item?id=11333448](https://news.ycombinator.com/item?id=11333448)

~~~
logicrook
Oh, wow, thank you very much.

It's funny because I thought "you can't just copy and paste SO, there's a
number of things to think of to be able to do it correctly". Guess I'm not the
only one who thought that.

Anyway, most intellectual work is just applying known recipes (copy-paste,
renaming a few variables), so it's just a matter of granularity and sources.
The hate for SO copy-pasters owes a lot to the few beginners who don't
understand that yet and take copy-paste literally.

