

The most popular links posted by developers to Stack Overflow - bcleary
http://linkedlists.net/

======
Encosia
Nice work.

This would take some work (more on the server's part than yours), but I think
it would be more accurate if you tested the links for 301 redirects and
consolidated the results accordingly. For example, in the ASP.NET tag one of
my pages has 112 links to an older URL ([http://encosia.com/2008/05/29/using-
jquery-to-directly-call-...](http://encosia.com/2008/05/29/using-jquery-to-
directly-call-aspnet-ajax-page-methods/)) that 301 redirects to a newer URL
([http://encosia.com/using-jquery-to-directly-call-aspnet-
ajax...](http://encosia.com/using-jquery-to-directly-call-aspnet-ajax-page-
methods/)) which has 129 links itself. It would be interesting to see only the
latter URL show up in the list with 241 links.
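
A rough sketch of that consolidation, assuming the crawler has already recorded which URLs 301 to which (the `redirects` mapping and both function names are made up for illustration; this is not how the site actually works):

```python
from collections import Counter


def resolve(url, redirects, max_hops=10):
    """Follow known 301 redirects until a URL with no further redirect is reached."""
    seen = {url}
    for _ in range(max_hops):
        target = redirects.get(url)
        # Stop at the final URL, or bail out if the redirects form a loop.
        if target is None or target in seen:
            break
        seen.add(target)
        url = target
    return url


def consolidate(link_counts, redirects):
    """Add each URL's citation count to the URL it ultimately redirects to."""
    totals = Counter()
    for url, count in link_counts.items():
        totals[resolve(url, redirects)] += count
    return totals


OLD = "http://encosia.com/2008/05/29/using-jquery-to-directly-call-aspnet-ajax-page-methods/"
NEW = "http://encosia.com/using-jquery-to-directly-call-aspnet-ajax-page-methods/"

totals = consolidate({OLD: 112, NEW: 129}, {OLD: NEW})
print(totals[NEW])  # 241
```

Building the redirect map is the expensive part, since it means one request per distinct URL.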

~~~
bcleary
Thanks. Re the redirects, yeah, I have a lot of work to do on my crawler; my
current solution is hand-rolled and only has limited capabilities. But if
anybody knows of a good open source project, let me know. I have been looking
at using the Common Crawl, but that's not a complete solution either.

------
aaronbrethorst
I'd be interested in seeing what happens when you bucket citations by domain
instead of URL. Google Analytics counts over a thousand unique referring SO
pages to Cocoa Controls (<http://www.cocoacontrols.com>) this year alone. But,
of course, most of the links back to my site are long tail.
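
For what it's worth, bucketing by domain could look roughly like this. It's a sketch with made-up counts and a hypothetical `exclude` option for dropping documentation hosts; stripping a leading `www.` is a simplification, not a proper registered-domain lookup:

```python
from collections import Counter
from urllib.parse import urlsplit


def domain_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def bucket_by_domain(link_counts, exclude=()):
    """Sum per-URL citation counts into per-domain totals, skipping excluded domains."""
    totals = Counter()
    for url, count in link_counts.items():
        domain = domain_of(url)
        if domain in exclude:
            continue
        totals[domain] += count
    return totals


# Illustrative counts only; the second path is invented for the example.
sample = {
    "http://www.cocoacontrols.com": 3,
    "http://www.cocoacontrols.com/controls/example": 2,
    "http://developer.android.com/reference/android/os/AsyncTask.html": 40,
}

by_domain = bucket_by_domain(sample, exclude={"developer.android.com"})
print(by_domain.most_common())
```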

~~~
bcleary
Great idea, will try and look at that next.

~~~
yarianluis
Another good reason to do this is, for example, the android tag. Most of the top
results are Android documentation for basic classes like AsyncTask and
Activity. If I could filter out Google domains I'd be left with interesting
links related to Android.

------
mtowle
The top python link, which should direct to
<http://www.crummy.com/software/BeautifulSoup/>, instead directs to
<http://www.crummy.com/software/beautifulsoup/>, which causes crummy.com to
throw a 404 error at you.

~~~
Evgeny
You learn something every day. I was sure URLs are not case sensitive until
now.

 _While domain names are not case-sensitive, the rest of the URL might be. In
our example, this would be everything that follows “.com” as in
wisegeek.com/are-urls-case-sensitive.htm._
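
In crawler terms, that means only the scheme and host are safe to lowercase when normalizing. A minimal sketch (the function name is made up, and it assumes the URL carries no user:password part, which would be case-sensitive too):

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Lowercase the scheme and host only; path and query keep their case."""
    parts = urlsplit(url)
    return urlunsplit(
        parts._replace(scheme=parts.scheme.lower(), netloc=parts.netloc.lower())
    )


good = normalize_url("HTTP://WWW.Crummy.com/software/BeautifulSoup/")
print(good)  # http://www.crummy.com/software/BeautifulSoup/
```

Lowercasing the whole string instead would produce the `/software/beautifulsoup/` form, which is a different URL as far as the server is concerned.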

~~~
Encosia
Somewhat related, it took me an embarrassingly long time to realize that
browsers treat assets with different URL casing as different assets for
purposes of caching. Three image elements that reference Foo.jpg, foo.jpg, and
foo.JPG all require a separate request and space in the cache, even if the web
server you're using is case-insensitive and all three URLs resolve to the same
image.

~~~
ars
I always add:

    
    
        CheckSpelling off
    

to my dev (but not production) server so that I catch things like that. This
won't help with a case-insensitive web server though.

------
captaincrowbar
Looking at the results for C++, it looks like the text sanitizing you're doing
on link titles is losing the "++". All the titles seem to have C instead of
C++, e.g. "Boost C Libraries" should be "Boost C++ Libraries". You're also
losing the # in C# and the dot in .NET, and probably others.
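
My guess (and it is only a guess) is that the cleaner strips everything that isn't a word character. A sketch of the difference, with both function names invented for the example:

```python
import re

# Characters to drop: anything that is not a word character, whitespace,
# or one of the punctuation marks that carry meaning in technology names.
_DISALLOWED = re.compile(r"[^\w\s+#.\-]")


def naive_clean(title):
    """Strip all punctuation, which reproduces the symptom described above."""
    return re.sub(r"[^\w\s]", "", title)


def clean_title(title):
    """Strip punctuation but keep '+', '#', '.' and '-', then tidy whitespace."""
    cleaned = _DISALLOWED.sub("", title)
    return re.sub(r"\s+", " ", cleaned).strip()


print(naive_clean("Boost C++ Libraries"))  # Boost C Libraries
print(clean_title("Boost C++ Libraries"))  # Boost C++ Libraries
print(clean_title("C# and .NET: tips!"))   # C# and .NET tips
```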

~~~
bcleary
Thanks for the heads up, will modify the parser. Again, if anybody knows of a
good open source crawler, let me know. I rolled my own very quickly but would
love it if I could find a 3rd party solution. Another option I have explored
is pulling in the titles and descriptions provided by search engines, but
currently only DuckDuckGo offers anything useful, and even then its coverage of
some of these low-ranking programming pages isn't great. Bing offers
pay-per-use access to its index, but the costing structure really doesn't fit
with my use case.

------
bcleary
Thanks for the comments and votes. By the way if anybody is interested this
was presented as part of the mining challenge at MSR2013
<http://2013.msrconf.org/challenge.php> and here is the paper
[http://thechiselgroup.org/2013/03/27/a-study-of-
innovation-d...](http://thechiselgroup.org/2013/03/27/a-study-of-innovation-
diffusion-through-link-sharing-on-stack-overflow/)

~~~
denzil_correa
I was just about to ask this question! I read that paper and found it quite
interesting.

------
ChrisClark
The most popular link for Android answers is AsyncTask. It makes sense: one of
the biggest complaints about Android is that it isn't always perfectly smooth
and people notice the jerkiness in the UI. I would say a large majority of the
time it is because an Android app developer is running slow code on the UI
thread instead of doing it correctly.

~~~
bcleary
Interesting, in C# the second most popular link is for the BackgroundWorker
class. Not entirely the same use case, but I guess similar motivations.
[http://msdn.microsoft.com/en-
us/library/system.componentmode...](http://msdn.microsoft.com/en-
us/library/system.componentmodel.backgroundworker.aspx)

------
ben336
The javascript entry doubles as a Table of Contents for the jQuery API.
Totally unsurprising.

~~~
rjzzleep
looks like the android entry then.

------
ronj
Nice! Nitpick: converting entities (such as '&mdash;'es) to their actual
representation ('—') in the titles would be nice, currently they show up as
'mdash'
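
If the titles go through a punctuation stripper before entities are decoded, that would explain the bare 'mdash'. A small sketch of the ordering issue, where `strip_punctuation` is only a stand-in for whatever the site's cleaner does:

```python
import html
import re


def strip_punctuation(text):
    """Stand-in for an aggressive cleaner (an assumption, not the site's code)."""
    return re.sub(r"[^\w\s]", "", text)


def decode_then_clean(raw_title):
    """Decode HTML entities first, then only tidy up whitespace."""
    decoded = html.unescape(raw_title)
    return re.sub(r"\s+", " ", decoded).strip()


raw = "jQuery API &mdash; Selectors"
print(strip_punctuation(raw))   # the entity name leaks through as plain text
print(decode_then_clean(raw))   # the entity becomes the real dash character
```

`html.unescape` is in the Python standard library and covers named entities as well as numeric ones.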

~~~
bcleary
Thanks, yea we will have to do some work on cleaning up our title and
description parser. Will add to the bug list.

~~~
ronj
Cool. Also, props for the pun, that's a fine name you found here :)

------
teshima
How different are the results from the linked_lists tool compared with Google
results for a specific topic? Are they closer to what a developer needs?

~~~
bcleary
Ah, you guessed my next paper :) I have done a little informal analysis on the
top 10 results from linked_lists vs the top 10 for that tag used as a Google
query. They are quite different, almost totally different actually. But this
is not really surprising if you think of the developers curating the links
posted on Stack Overflow. I know there was an attempt a few years back to
build a search engine based on the SO data set, don't know what happened to
that.

------
taternuts
Very cool, I like it! I like that it adds whatever tag you clicked on to the
top, but maybe you should save that between sessions. Also, a lot of the more
popular languages will reflect what everyone here has already seen and worked
with - which is cool because it does what you advertise, but the usefulness is
somewhat limited for us folk. It would be cool to have a year/month/week
filter to see what has been linked to the most lately (I just saw that was
suggested earlier). Things like node would benefit from that since it's
growing so fast, but everyone knows about express. A simple thing to enable a
bit more usefulness would be to link to the actual Stack Overflow posts so we
can look at the comments. Cool stuff though!

~~~
bcleary
Thanks for the support. Yes to everything above :) We are currently mining the
post history data to be able to do those kinds of time range queries. Can't
wait to get that out there, it should be very cool indeed. We also want to
allow users to search by their SO id and to filter their links by tag. (As an
aside, when we do mine the history we will be able to get more accuracy on
which users actually posted which links, rather than just the post owner.)

------
brokentone
This is nicely done, but so far isn't returning anything interesting for me.
All the results are very basic (Django docs, PHP docs, etc), which makes
sense, the most often cited will be the most general.

What about a change in algorithm to try and add some discovery here? What if
you look at votes for answers vs cites, so that things with a high average
vote:cite ratio rank higher?
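
Something like this is what I have in mind. It's a sketch only: the URLs and numbers are invented, and `min_cites` is a hypothetical knob to stop a single lucky answer from topping the list:

```python
def rank_by_vote_cite(links, min_cites=1):
    """Order URLs by total answer votes divided by number of citations.

    `links` maps url -> (total_votes, cite_count).
    """
    scored = []
    for url, (votes, cites) in links.items():
        # Skip rarely cited links; cites > 0 also guards the division.
        if cites >= min_cites and cites > 0:
            scored.append((votes / cites, url))
    scored.sort(reverse=True)
    return [url for _, url in scored]


# Invented numbers: a heavily cited docs page vs. a less cited deep dive.
sample = {
    "http://example.com/docs": (500, 250),      # 2.0 votes per cite
    "http://example.com/deep-dive": (90, 10),   # 9.0 votes per cite
    "http://example.com/one-off": (30, 1),      # 30.0, but only one cite
}

ranked = rank_by_vote_cite(sample, min_cites=5)
print(ranked)
```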

~~~
bcleary
Thanks. Yeah, I agree. Our initial use case was to provide an interface to the
dataset and to allow us to explore what kinds of things developers were
sharing on SO. For the next version we are working on new ranking metrics that
will improve the discovery aspect; vote:cite and view:cite are 2 we are
looking at.

~~~
brokentone
Looking forward to v2!

------
raimonds
Here's the same but for Hacker News <http://www.hnstore.co/42.html>

~~~
bcleary
Wow, that's really cool, have to admit I didn't see that before. Let me know
if you're interested in sharing data, we are doing a lot of research in this
area and the more data the better!

------
eliasmacpherson
This is very cool! This is only a minor quibble, but when you've selected e.g. C
and then C++, the 'x' to remove one or other tag from the search results is
nearly invisible. I only found it because I've used a similar 'x' box before
on a different site to remove results.

~~~
bcleary
Thanks, will update.

------
rc4algorithm
It would be interesting to weight the links by the score of the corresponding
comment.

~~~
bcleary
I actually had that in an earlier version and took it out just to simplify the
design, but I am looking at this again to produce a better sorting
experience. The number of views a post receives may also be a good metric.

------
__sb__
This is an awesome idea! Is there any way to create a randomized list weighted
with popularity (so for a given tag you can refresh to find new links, but
still ones likely to be interesting)?

~~~
bcleary
Thanks for the support, really encouraged by the feedback here on Hacker News.
You guys are great.

To your question, yes, I am currently testing some ideas for a magic ranking
system. One option is, as you say, a kind of random selection among popular or
trending links. Another is a smart weighted sort when filtering by multiple
tags. One problem is that right now if you add jQuery to your filter, the
javascript results are going to just dominate everything else.
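
For the random option, one way to sketch it is weighted sampling without replacement using the Efraimidis-Spirakis trick: give each link a random key `u ** (1 / weight)` and keep the `k` largest keys. This is only an illustration with invented counts, not what the site runs:

```python
import random


def weighted_sample(link_counts, k, rng=None):
    """Pick up to k distinct URLs, favouring those with higher citation counts."""
    rng = rng or random.Random()
    keyed = [
        (rng.random() ** (1.0 / count), url)
        for url, count in link_counts.items()
        if count > 0
    ]
    # Larger weights push the key towards 1, so popular links win more often.
    keyed.sort(reverse=True)
    return [url for _, url in keyed[:k]]


# Invented counts for illustration.
sample = {
    "http://example.com/very-popular": 500,
    "http://example.com/popular": 120,
    "http://example.com/niche": 8,
    "http://example.com/rare": 1,
}

picks = weighted_sample(sample, k=2, rng=random.Random(42))
print(picks)
```

Each refresh with a fresh random state gives a different pair, and the long-tail links still get an occasional turn.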

------
coin
Doesn't work on an iPad. I type in my search term, press enter, and nothing
happens. Furthermore, when I enter my search term, the "Filter by tag (e.g.
javascript)" text doesn't disappear.

~~~
bcleary
Wow, ok, looking into that now. What version of iOS?

~~~
coin
The latest, iOS 6.1.3

------
tharshan09
Great work. I am sure this would be useful if there was a filter on official
documentation sites like jquery or php.net etc.

In the spirit of HN, how does it work and what powers it? :)

~~~
bcleary
Thanks. Yes the domain filter is a great idea, will go to the top of the
feature request list.

The site is C#, ASP.NET MVC, with a JavaScript front end, backed by SQL
Server 2012, and running on AWS.

------
quanganhdo
Top Objective-C link points to ASIHTTPRequest, an obsolete networking library
that is no longer supported (and even its developer recommends against using
it).

~~~
SG-
Right but there's a lot of people still maintaining code that has it
implemented and also a lot of tutorials that people are using that likely use
that framework.

------
mratzloff
Ha, the top result for C is the spec. Every other language is a popular
library.

I wonder if there are more RTFM-type responses for C than other languages.

~~~
TheCoelacanth
I suspect it has more to do with the large amount of
undefined/unspecified/implementation-defined behavior in C. A lot of questions
on Stack Overflow can only be answered correctly by referring to the
standard.

------
SG-
Are these links from only 2013? It would be interesting to see what the top
links are by year/month or something too.

~~~
bcleary
So these are taken from the March 2013 data dump, which includes questions
going right back to the start of SO. So some of these links have been
collecting citations for a few years. We only mined the actual post content as
of the date the dump was created; we did not mine the post history. But we are
working on that right now, it's a lot of data to process :)

------
nivstein
Interesting. This data may potentially be useful in evaluating
framework/library/OSS trends and prominence.

~~~
bcleary
Thanks, yes we are actually working on a paper to that effect at the moment.

------
novaleaf
I dunno... 11 of the top 14 links under the "Javascript" tag are about jQuery
(which has its own tag). I suppose it's not a flaw in your algorithm, just
not really very interesting results.

~~~
bcleary
I know. I think javascript and jQuery suffer from their popularity and utility
in this analysis. There is also the issue of the jQuery and javascript tags
being used by people as synonyms. jQuery in particular is a 500 pound gorilla
over the whole dataset, it shows up everywhere. As I mentioned above, there was
a joke on SO at one point that the answer to nearly every question was jQuery
with a link to the site. Apparently it wasn't a joke.

------
nhebb
Feature idea: add a time filter, e.g. last six months, last year, etc.

~~~
bcleary
Thanks, yea second most requested feature after the domain filter. Will
hopefully add soon.

------
astrodust
Does this scrape links from the comments posted to questions as well?

~~~
bcleary
No, just the post bodies (questions and answers) at the moment, but we are
working on parsing the comments and the post history; those datasets are about
4 times the size of the posts! So there are probably a lot more URLs in there,
although we will have to decide if we treat all URLs the same or if we
differentiate between URLs in post bodies contained in the dump, URLs in the
post history (that may have been removed from the post) and URLs in the
comments. Maybe not a concern for the website, but more so for research.

~~~
astrodust
You usually see notes or clarification posted to questions in the form of
comments first, where the types of links are more introductory, general
purpose, than specific as you might find in answers.

Can't wait to see the updated stats.

If you could make "more" load more than just a few more records, though,
that'd make it a lot easier to dig deeper.

------
gwillen
you're _

~~~
ceautery
Totally. Fix the bad grammar, and flesh out what cc/sa means for those
unfamiliar with it.

~~~
bcleary
Thanks, missed that one, will fix.

------
nthitz
PHP's top result: jQuery...

~~~
bcleary
Yea, I know. I think it probably goes back to that SO joke about "Q - I have
this programming problem" "A - jQuery"

