Sites like these are, in my opinion, the scourge of the internet. There is a lot of talk nowadays about curated search engines displacing machine-generated ones, but I tend to think that goes too far. A search engine that could reliably determine the authoritative source of duplicated content and show only that source would be killer. Seems within the realm of the possible... anyone working on that?
We're doing something in a similar vein for LazyReadr. The idea is to merge news coverage of the same story together; an important step from there is deciding which is the authoritative source to display. Going from that to effective search isn't a large leap.
Use http://dukgo.com instead of its less quality-motivated big brothers. If Google et al. see DuckDuckGo usage spike, they will likely implement similar filtering measures. Then they will post blog updates about how awesome they are for adding said measures to their service. And I will read about it after searching with DuckDuckGo.
I use DuckDuckGo as the default search engine in Chrome, and it works really nicely. The key to making switching easy: prefix your query with a bang, like `!g my-query` or `!bing my-query`, to search on Google or Bing instead. Useful, because for me Google is still better at searching for content in German.
Just using `! my-query` gives you I'm-feeling-lucky semantics.
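The bang is just a prefix that DuckDuckGo interprets before routing the query. A minimal sketch of building such a search URL in Python (the `bang_url` helper is my own illustration, not part of any DuckDuckGo API):

```python
from urllib.parse import quote_plus

def bang_url(query, bang=""):
    """Build a DuckDuckGo search URL; a leading !bang reroutes the query."""
    prefix = "!%s " % bang if bang else ""
    return "https://duckduckgo.com/?q=" + quote_plus(prefix + query)

print(bang_url("Mehrwertsteuer", "g"))  # query routed through to Google
print(bang_url("my-query"))             # plain DuckDuckGo search
```

Opening the first URL in a browser lands you on Google's results page, which is what makes switching defaults painless.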
Oftentimes, efreedom ranks higher than Stack Overflow, and in some cases SO isn't even on the first page (for some reason the official source 'misses' with my search terms). In those cases, I open the efreedom link and click through to SO. This seems to be happening more and more.
EDIT: I see that the extension actually redirects to SO. So in a way the presence of those sites when I wouldn't normally see SO results is a good thing. Nice.
>> In those cases, I open up the efreedom link and click through to SO. This seems to be happening more and more.
I completely understand why you're doing that, but you should know that's seen by the Goog as a big +1 for efreedom. All they know is you clicked on that result and didn't come back because it didn't answer your question.
I would love to see this feature, preferably as a way for search results to be "voted down" by multiple users, with an aggregated score (though this would probably be heavily gamed by black-hat SEO people and become useless anyway).
At the very least, if I am logged into my Gmail account, I should be able to hide certain sites from my personal results.
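A per-user blocklist like that would be straightforward to apply at result time. A hypothetical sketch, assuming the blocked domains are stored with the account (the domain list and `filter_results` function are my own, not any real Google feature):

```python
from urllib.parse import urlparse

# Hypothetical per-user preference, stored with the account.
BLOCKED_DOMAINS = {"efreedom.com"}

def filter_results(results, blocked=BLOCKED_DOMAINS):
    """Drop any result whose host (or a parent domain) is on the blocklist."""
    kept = []
    for url in results:
        host = urlparse(url).hostname or ""
        if not any(host == d or host.endswith("." + d) for d in blocked):
            kept.append(url)
    return kept

results = ["http://stackoverflow.com/questions/123",
           "http://www.efreedom.com/Question/1-123"]
print(filter_results(results))  # only the stackoverflow.com link survives
```

Matching on parent domains is what keeps `www.efreedom.com` from slipping past a blocklist entry for `efreedom.com`.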
Wait, let me see if I understand this right. Stack Overflow doesn't use AdSense. Other sites scrape Stack Overflow and surround the ripped-off content with ads from AdSense. And you're wondering why Google ranks its customers' sites higher than its non-customers'?
A correction: these sites don't scrape Stack Overflow's content; they download and use it directly and legitimately. Stack Overflow content is cc-wiki licensed and released in full data dumps every month. As long as the sites link back to stackoverflow.com and otherwise comply with the cc-wiki requirements, it's legit.
SEO is non-deterministic enough that I hesitate to even speculate, but the clean markup on sites like eFreedom is probably a lot of their advantage. Stack Overflow buries its content in quite a bit of presentational markup. Everything else being equal, it makes sense that the tighter, more focused page should rank better than the larger, more convoluted page containing the same content.
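To make that speculation concrete, one rough proxy for "tighter, more focused" is the fraction of a page's bytes that are visible text rather than tags. This sketch is purely illustrative and reflects nothing about Google's actual ranking code:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Count the characters of visible text in an HTML document."""
    def __init__(self):
        super().__init__()
        self.text_chars = 0

    def handle_data(self, data):
        self.text_chars += len(data.strip())

def content_ratio(html):
    """Fraction of the page that is visible text rather than markup."""
    parser = TextExtractor()
    parser.feed(html)
    return parser.text_chars / max(len(html), 1)

lean = "<html><body><p>How do I sort a dict by value?</p></body></html>"
heavy = ("<html><body><div class='post'><div class='inner'><span>"
         "How do I sort a dict by value?</span></div></div></body></html>")
print(content_ratio(lean) > content_ratio(heavy))  # True: same text, less markup
```

Same answer text, different wrappers: the leaner page scores a higher text-to-markup ratio, which is the kind of signal a scraper with minimal templates could plausibly be benefiting from.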
Yes, it's very, very bizarre and it drives me crazy. We've tried 3 or 4 different things to fix it and nothing seems to take. Note that our attribution terms require a link back primarily for this reason, and even the sites that attribute back to us totally legally still have this problem. It honestly feels like a Google bug. See the related discussion at http://webmasters.stackexchange.com/questions/5385/page-appe... . I'm all ears if anyone knows of a way to fix this.
Why not just yank the Creative Commons licence and replace it with one that explicitly does not allow copying?
That would be the fastest way in my book. I've never worked out why SO allows it in the first place. Is it just to appear open and web-2.0-y, or is there a business reason? It's a proper business now and its users are loyal; cancel the licence. I can't think of a single person who would say 'oh, but I much preferred reading those spam sites'.
As someone who has contributed a fair amount of content to SO, I would wholeheartedly support modifying the CC license on the content I've contributed. I much prefer the idea of that to the idea of allowing my answers to help build dens 'o spam like eFreedom.
Now that I think of it, my answers going straight to spam sites is a nontrivial deterrent to my contributing content to SO.
Good idea, but I have no idea who our account manager is at Google. Do we even have an 'account manager'? What would they manage, exactly? We don't do any AdSense. I'll try to find out, but I'm kinda skeptical that this approach (serving Googlebot one set of content and everyone else another) is something Google would actually want to encourage at scale on any site.
Cloaking (the term for faking out Googlebot) is NOT cool with Google. There are ways to accomplish the same thing without cloaking, for example making sure the content loads first and keeping markup and presentation out of the main HTML, etc.
I'm looking into this a bit now. It's not a clear-cut situation, in that SO has a license that allows copying. Jeff, drop me an email and let's talk about it more (I followed you on Twitter so you can DM me).
@codinghorror - any update from your conversation with Matt_Cutts? I'm curious about this issue from a justice standpoint, as continuously seeing efreedom results in the Google rankings just doesn't sit well with me.
I looked around eFreedom's site a bit, and they are providing some additional add-on value (translations, etc.), so that may also be a factor. In any case, best of luck getting SO pages ranked better.
The auto-translations are specifically against Google's TOS, just FYI. Beyond that, Matt is looking into some specific oddities we found, and things were forwarded on to the Google search quality team. Not sure what will come of it, but Matt Cutts is awesome!
I really can't get over how a blatantly spammy website (AdSense plastered all over its pages) that just reorganizes SO content can have such a huge traffic velocity - check this Alexa chart: http://www.alexa.com/siteinfo/efreedom.com