
A Chrome extension to avoid the Stack Overflow ripoffs - jlangenauer
https://chrome.google.com/webstore/detail/gledhololmniapejefjfocffkhoamlll#
======
geekfactor
Sites like these are, in my opinion, the scourge of the internet. There is a
lot of talk nowadays about curated search engines displacing machine-generated
search engines but I tend to think this goes too far. A search engine that
could reliably determine the authoritative source of duplicated content and
only include that source would be killer. Seems within the realm of
possible... Anyone working on that?

~~~
Swizec
Look up information cascades.

We're doing something in a similar vein for LazyReadr. The idea is to merge
news about the same story together; an important step from there will be
deciding which is the authoritative source to display. Going from that to
effective search isn't a large leap.

~~~
geekfactor
Thanks for the tip. My search turned up this resource, which looks awesome:

<http://news.ycombinator.com/item?id=1986198>

------
spicyj
How can we prevent these sites from ranking well in the first place?

~~~
imperialWicket
Use <http://dukgo.com> instead of its less quality-motivated big brothers. If
google, et al. see duckduckgo usage spikes, they will likely implement similar
filtration measures. Then they will post blog updates about how awesome they
are for adding said measures to their service. And I will read about it after
searching with duckduckgo.

~~~
eru
Sounds a bit like Opera vs the major browsers.

I use duckduckgo as the default search engine with Chrome, and it works really
nicely. The key to making the switch easy: preface your query with a bang, like
`!g my-query' or `!bing my-query', to search on Google or Bing. Useful, because
for me Google is still better at searching stuff in German.

Just using `! my-query' gives you I'm-feeling-lucky semantics.

Have a look at <http://duckduckgo.com/bang.html> for an exhaustive list. !hn
searches Hacker News via searchyc.com.
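A minimal sketch of the bang syntax described above, assuming searches go through the usual `https://duckduckgo.com/?q=...` query URL. The helper name `ddg_url` is made up for illustration, and the empty-bang behaviour is taken from the comment, not verified here.

```python
from urllib.parse import urlencode


def ddg_url(query, bang=None):
    """Build a DuckDuckGo search URL, optionally prefixed with a bang.

    bang="g"  -> "!g query"  (hand the query to Google)
    bang=""   -> "! query"   (I'm-feeling-lucky semantics, per the comment)
    bang=None -> plain DuckDuckGo search
    """
    full_query = query if bang is None else f"!{bang} {query}"
    # urlencode percent-escapes "!" and turns spaces into "+"
    return "https://duckduckgo.com/?" + urlencode({"q": full_query})


if __name__ == "__main__":
    print(ddg_url("my-query", bang="g"))
    print(ddg_url("my-query", bang="bing"))
    print(ddg_url("my-query", bang=""))
```

The same string can be typed straight into the address bar once DuckDuckGo is the default engine; the helper only shows how the bang is part of the query itself.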

------
evandavid
Often times, efreedom ranks higher than StackOverflow, and in some cases SO
isn't even on the first page (for some reason the official source 'misses'
with my search terms). In those cases, I open up the efreedom link and click
through to SO. This seems to be happening more and more.

EDIT: I see that the extension actually redirects to SO. So in a way the
presence of those sites when I wouldn't normally see SO results is a good
thing. Nice.

~~~
ekanes
>> In those cases, I open up the efreedom link and click through to SO. This
seems to be happening more and more.

I completely understand why you're doing that, but you should know that's seen
by the Goog as a big +1 for efreedom. All they know is you clicked on that
result and didn't come back because it didn't answer your question.

------
AndrewO
It's sites like these that have made me wish I could downvote or mark-as-spam
from the search results. Why can't I tell Google that I never want to see
results from certain URLs ever again?

~~~
kaylarose
I would love to see this feature. Preferably as a way for search results to be
"voted-down" by multiple users, with an aggregated score (this would probably
be heavily gamed/spammed by black hat SEO people, and become useless anyway).

At the very least if I am logged into my gmail account, I should be able to
hide certain sites from being returned in my personal results.
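The per-user hiding asked for here could be sketched roughly as a domain blocklist applied to a list of result URLs. This is only an illustration: the names `BLOCKED_DOMAINS`, `is_blocked`, and `filter_results` are hypothetical, the example URLs are invented, and nothing here reflects how Google or any extension actually does it.

```python
from urllib.parse import urlsplit

# Hypothetical personal blocklist; the entry is just an example.
BLOCKED_DOMAINS = {"efreedom.com"}


def is_blocked(url, blocked=BLOCKED_DOMAINS):
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked)


def filter_results(urls, blocked=BLOCKED_DOMAINS):
    """Drop every result whose host is on the blocklist, keeping order."""
    return [u for u in urls if not is_blocked(u, blocked)]


if __name__ == "__main__":
    results = [
        "http://stackoverflow.com/questions/1",  # invented example URLs
        "http://www.efreedom.com/Question/1",
    ]
    print(filter_results(results))
```

Matching on the host suffix with a leading dot means subdomains are caught while unrelated domains that merely end in the same letters are left alone.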

------
mkane91301
Wait, let me see if I understand this right. Stack Overflow doesn't use
AdSense. Other sites scrape Stack Overflow and surround the ripped-off content
with ads from AdSense. And you're wondering why Google ranks its customers'
sites higher than its non-customers'?

~~~
InclinedPlane
A correction: these sites don't scrape Stack Overflow's content; they download
and use it directly and legitimately. Stack Overflow content is cc-wiki
licensed and released in full data dumps every month. As long as the sites
link to stackoverflow.com and otherwise comply with the cc-wiki requirements,
it's legit.

------
w1ntermute
Google Blacklist can be used to remove any arbitrary site:
[https://chrome.google.com/extensions/detail/hbodbmhopadphblo...](https://chrome.google.com/extensions/detail/hbodbmhopadphbloiimamkjmihekaejd)

------
duck
For a little more info on these sites:
[http://meta.stackoverflow.com/questions/58369/did-anyone-not...](http://meta.stackoverflow.com/questions/58369/did-anyone-notice-that-tech-efreedom-com-seem-to-be-scraping-republishing-sos-po)

------
gte910h
THANK GOD. I was getting really pissed off while trying to meet a pretty hard
deadline yet trying to look up iOS esoterica.

------
ams6110
DuckDuckGo does filter some of these sorts of sites. Not sure about these in
particular, but I did try searching for "NSFetchedResultsController" there and
none of these sites were in the results.

~~~
jonpaul
Actually, I never thought I'd find one, but DuckDuckGo is a search engine that
every programmer should use. In fact, I've found that DuckDuckGo can be better
than Google in a lot of cases.

~~~
cdr
Google is not currently optimized for technical people - if anything, it's
anti-optimized. It shouldn't be hard to beat it for technical queries.

------
Semiapies
Judging by the permissions, it redirects for a whopping three sites. Why not
just _not click on the links for those sites?_

~~~
jlangenauer
Because, for reasons known to the engineers toiling at Google, and not me,
these sites will show in the search results, and the actual StackOverflow
answer _will not be shown at all_.

~~~
codinghorror
yes, it's very very bizarre and it drives me crazy. We've tried 3 or 4
different things to fix it and nothing seems to take. Note that our
attribution terms do require a link back primarily for this reason, and _even
the sites who attribute back to us totally legally_ still have this problem.
It truly does feel like a Google bug, honestly. See related discussion at
[http://webmasters.stackexchange.com/questions/5385/page-appe...](http://webmasters.stackexchange.com/questions/5385/page-appears-indexed-in-google-but-not-findable-for-any-search-terms). I'm all ears if
anyone knows of a way to fix this.

~~~
brc
Why not just yank the creative commons licence and replace with one that
explicitly does not allow scraping?

That would be the fastest way in my book. I've never worked out why SO allows
it in the first place. Is it just to appear open and web 2.0-y, or is there a
business reason? It's a proper business now, users are loyal, cancel the
licence. I can't think of a single person who would say 'oh, but I much
preferred to read those spam sites'.

~~~
Encosia
Yes, absolutely.

As someone who has contributed a fair amount of content to SO, I would
wholeheartedly support modifying the CC license on the content I've
contributed. I _much_ prefer the idea of that to the idea of allowing my
answers to help build dens o' spam like eFreedom.

Now that I think of it, my answers going straight-to-spam is a nontrivial
detriment to my contributing content to SO.

~~~
brc
Glad to see I'm not the only one.

But I really posted this comment in reply to say thanks for the blog posts on
jquery/asp.net. They really got me going in the right direction - fantastic
stuff.

~~~
Encosia
Thank you for the kind words. It's always great to hear that someone's been
able to extrapolate a useful approach/direction out of my rambling.

------
dy
@codinghorror - any update from your conversation with Matt_Cutts? I'm curious
on this issue from a justice standpoint, as continuously seeing efreedom
results in the Google rankings just doesn't sit well with me.

I looked around eFreedom's site a bit and they are providing some additional
add-on value (translations etc.) so that may also be the case. In any case,
best of luck in getting SO pages ranked better.

~~~
codinghorror
the auto-translations are specifically against Google's TOS, just FYI. Beyond
that Matt is looking into some specific oddities we found and stuff was
forwarded on to the Google search quality team. Not sure what will come of it,
but Matt Cutts is awesome!

~~~
Rabbidwongbat
It looks like the robots.txt on those translation sub domains is disallowing
all crawlers.
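For reference, a robots.txt that disallows all crawlers is usually just a wildcard user agent plus `Disallow: /`. The sketch below checks such a file with Python's standard `urllib.robotparser`. The `ROBOTS_TXT` content and the example.com URL are assumptions for illustration, not the actual file served by those subdomains.

```python
from urllib.robotparser import RobotFileParser

# Assumed content: the conventional "block everyone" robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def allows_crawler(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL; any path is covered by "Disallow: /".
    print(allows_crawler(ROBOTS_TXT, "Googlebot", "http://example.com/page"))
```

Note that robots.txt only asks well-behaved crawlers to stay away; it says nothing about whether pages that were already indexed stay in the results.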

------
thegyppo
I really can't get over how a blatantly spammy website (Adsense plastered over
pages) just reorganizing SO content can have such a huge traffic velocity -
check this alexa chart: <http://www.alexa.com/siteinfo/efreedom.com>

------
nhangen
Wouldn't it be great if there was a way to do this for any site you didn't
want to see in the search results? Is it already possible with Google
personalized?

