Personal domain blacklist.
There's a lot of spammy bullshit on the web and Google seems to have given up on keeping it away from me. Fine. But for my specific searches, there's usually a handful of offenders; if I never, ever saw them again, my search experience would improve by an order of magnitude.
So let me personalize search by blacklisting these clowns. Why can't I filter my search results so that when I search for a programming issue, I never see these assholes from "Efreedom" who scrape and republish Stack Overflow?
I don't, personally, need an algorithmic solution to spam. Just let me define spam for my personal searches and, for me, the problem is mostly solved.
(Also blacklisted: Yahoo Answers, Experts Exchange.)
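As a sketch of what this kind of personal filter would look like, here's a minimal domain blacklist applied client-side to a list of result URLs. The domains and URLs are illustrative examples of the sites mentioned above, not any real Google API:

```python
from urllib.parse import urlparse

# Hypothetical personal blacklist: domains the user never wants to see.
BLACKLIST = {"efreedom.com", "answers.yahoo.com", "experts-exchange.com"}

def filter_results(results, blacklist=BLACKLIST):
    """Drop any result whose host is a blacklisted domain or subdomain."""
    kept = []
    for url in results:
        host = urlparse(url).netloc.lower()
        # Match the host itself or any subdomain, e.g.
        # "www.efreedom.com" matches "efreedom.com".
        if any(host == d or host.endswith("." + d) for d in blacklist):
            continue
        kept.append(url)
    return kept

results = [
    "http://stackoverflow.com/questions/231767",
    "http://www.efreedom.com/Question/1-231767",
    "http://answers.yahoo.com/question/index?qid=123",
]
print(filter_results(results))  # only the Stack Overflow link survives
```

No ranking math, no spam classifier: just a set membership test per result, which is the whole point of "let me define spam for my personal searches."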
I am actually surprised there is no Labs application for this, unless there is a business case against it.
A cross-browser effort that implements a few key features from OptimizeGoogle, would be a very good idea. I'd be up for that.
Maybe IE9 does, too, but that's not important. :)
Typically it's okay to err on the side of caution, but when someone offers to do a bunch of work if you just indicate that you'd like it done, the safe bet is to assume they are in fact qualified; after all, their reputation is on the line in public.
The first thought that came to mind was what happens when I disagree with a couple of items on one of these 3rd party blacklists?
Then I thought, FORK IT and make the changes you want. You could even merge in lists from other people. Github for blacklists?
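A "GitHub for blacklists" workflow reduces to plain set operations: fork someone's list, record your disagreements, and merge in other people's lists with your removals taking precedence. A tiny sketch (all list names and entries are made up):

```python
# Hypothetical fork-and-merge workflow for shared blacklists.
upstream = {"efreedom.com", "experts-exchange.com", "ehow.com"}
friends = {"questionhub.com", "answerspice.com", "ehow.com"}

removed = {"ehow.com"}  # entries I disagree with

# "Fork" upstream, "merge" in a friend's list; my removals win.
mine = (upstream | friends) - removed

print(sorted(mine))
```

Because merges are just unions minus a removals set, conflicts can't really occur; the worst case is re-reviewing an entry someone else added back.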
Now, this doesn't mean that filtering them wouldn't be useful to you, since at first glance it appears they're solely a duplicate. Just pointing out that they're not actually doing anything wrong, and they're (probably) not scraping.
SO has specifically said that this is okay.
It doesn't look like Jeff is that okay with it, especially when it comes at the cost of Stack Overflow's own ranking:
> Sorry, this is absolutely necessary, otherwise we get demolished by scrapers using our own content in Google ranking.

This is from a question about Stack Overflow's SEO strategy.
Just because I can call someone's mother bad names without it being illegal doesn't mean I should do it; I can follow the letter of the law 100% and still be an asshole. Overall, my policy is that it's best not to be an asshole, and it annoys me when others can't share that basic ethos.
That said, you could make an argument that the value they're adding is SEO and promotion, it's pretty impressive to be able to out-rank SO...
New media will make it work anyway. IP is not needed for content producers to survive and even thrive.
(On a side note, capitalism is defined by the legal enforcement of property rights. Abolishing intellectual property is probably the exact opposite of capitalism. The word you're looking for is "market".)
You're still right though: market would be a better fit. Markets and capitalism are pretty much interchangeable in my mind, which is why I made the slip.
It's fine that they have an expressed policy that says it's okay, but I'd keep it at that and not refer to terminology like CC licenses.
There's nothing dubious about this legally at all.
I prefer how YouTube handles it:
“You shall be solely responsible for your own Content and the consequences of submitting and publishing your Content on the Service. You affirm, represent, and warrant that you own or have the necessary licenses, rights, consents, and permissions to publish Content you submit; and you license to YouTube all patent, trademark, trade secret, copyright or other proprietary rights in and to such Content for publication on the Service pursuant to these Terms of Service.
For clarity, you retain all of your ownership rights in your Content. However, by submitting Content to YouTube, you hereby grant YouTube a worldwide, non-exclusive, royalty-free, sublicenseable and transferable license to use, reproduce, distribute, prepare derivative works of, display, and perform the Content in connection with the Service and YouTube's (and its successors' and affiliates') business, including without limitation for promoting and redistributing part or all of the Service (and derivative works thereof) in any media formats and through any media channels. You also hereby grant each user of the Service a non-exclusive license to access your Content through the Service, and to use, reproduce, distribute, display and perform such Content as permitted through the functionality of the Service and under these Terms of Service. The above licenses granted by you in video Content you submit to the Service terminate within a commercially reasonable time after you remove or delete your videos from the Service. You understand and agree, however, that YouTube may retain, but not display, distribute, or perform, server copies of your videos that have been removed or deleted. The above licenses granted by you in user comments you submit are perpetual and irrevocable.”
(Can someone tell me the <pre> syntax or something appropriate for a blockquote?)
prefix with four spaces
Regardless, thanks for the tip.
The question is whether StackOverflow is closer to YouTube or Wikipedia. I think it's closer to Wikipedia because it's a curated reference source, not just a medium for self-expression.
The articles are in a constant flux of change, and I don't know if anyone deserves more attribution than others for contributing to an article.
Knol might be a more relevant example, but I haven't really checked it out in a while. (Who has, really?)
I recognize I may be being overly simplistic.
> Always redirect to stackoverflow from pages that just copy content, like efreedom, questionhub, answerspice.
Very handy. I put ehow.com on mine and never see results from them.
In the interests of results diversity, you don't want the same content repeated ten times on the first page, although this has the side effect of pushing the original source onto the second page if you guess wrong.
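A redirect rule like the one quoted above could be expressed as a small pattern table mapping scraper URLs back to the canonical Stack Overflow question. The URL patterns below are guesses at each scraper's scheme, not verified against the live sites:

```python
import re

# Hypothetical redirect table: scraper URL pattern -> canonical SO URL.
SCRAPER_PATTERNS = [
    (re.compile(r"efreedom\.com/Question/\d+-(\d+)"),
     "http://stackoverflow.com/questions/{}"),
    (re.compile(r"questionhub\.com/StackOverflow/(\d+)"),
     "http://stackoverflow.com/questions/{}"),
]

def canonical_url(url):
    """Return the Stack Overflow URL for a known scraper page, else None."""
    for pattern, template in SCRAPER_PATTERNS:
        m = pattern.search(url)
        if m:
            return template.format(m.group(1))
    return None

print(canonical_url("http://www.efreedom.com/Question/1-231767"))
```

The useful property is that scrapers keep the question ID in their URLs (so they can link back, per the license), which is exactly what makes this kind of rewrite possible in a browser extension.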
There must be some way Google's search engine could learn by looking at the blacklists people use.
Now, it's possible that SearchWiki just needed a few more iterations, and with a few details changed, could be a big success. There have been a few other recent launches that were tried years ago, didn't work then, but had a few more iterations and now are big successes. I could at least raise the issue. But unless I can tell a convincing story about why people would use this when they didn't use SearchWiki, it may be an uphill battle to get resources devoted to this.
Compare that to the GMail labs. It's pitiful.
They already have the exact opposite curation feature: the star system. And it's crazy.
When I search, and click one of the 10 results, and the result turns out to be satisfying, the last thing I want to do is click the back button and star it.
When the result turns out to be spam, I necessarily have to hit the back button and try again. Staring me in the face is the now-purple spam link; let me X it.
Personal blacklists are the least Google could do, because my SERP is never going to be perfect. Feeding those blacklists back into the general SERP population is an interesting research project.
> I don't, personally, need an algorithmic solution to spam.
Not now you don't. But if everyone started using your approach then the spammers would adjust their behaviour and use many domains instead of one.
For whatever it's worth, blekko has this. It's one of the main reasons I switched to blekko over Duck Duck Go for the majority of my searching.
Also, I'm happy to take requests to ban these stupid sites for everyone.
It's not even really a complaint. I am glad that when I show up on DDG (which I still do several times per day) I get the same high-quality results without regard to who I am.
> I'm also happy to take requests to ban these stupid sites for everyone.
The problem is that I have, for example, en.wikipedia.org marked as spam, simply so that their juice doesn't overwhelm my search results. It makes sense for me, but I suspect it's not even close to what your average user wants or expects.
In any case, thanks for the recent addition of non-Google options for searches when DDG runs out of results. Small as it may seem, I consider that a major step in the right direction.
Haven't you used Gmail? If I tag a site as spam, DDG shouldn't show it to me. If a thousand users mark it as spam, then it starts to be clear that it is a shady site, and DDG should eliminate it from its system.
Uhm, but now we have a vector for script kiddies to ban sites from a... who knows, maybe in some years... major web search engine.
Perhaps definitively removing a site should be done by a human operator.
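The flow described in these last few comments, personal flags take effect immediately, but global removal waits for a human, might look like the following sketch. The threshold, the data structures, and the assumption of one flag call per distinct user are all mine, not DDG's:

```python
from collections import Counter

REVIEW_THRESHOLD = 1000  # assumed; the "thousand users" figure above

flags = Counter()    # domain -> flag count (assuming one call per user)
review_queue = set() # domains awaiting a human operator's decision

def flag_domain(user_blacklist, domain):
    user_blacklist.add(domain)    # hidden for this user right away
    flags[domain] += 1
    if flags[domain] >= REVIEW_THRESHOLD:
        review_queue.add(domain)  # never auto-banned: a human decides

my_blacklist = set()
flag_domain(my_blacklist, "efreedom.com")
print("efreedom.com" in my_blacklist, "efreedom.com" in review_queue)
```

Keeping the final removal behind a human review queue is exactly the mitigation against the script-kiddie vector: mass flagging can at worst waste an operator's time, not delist a site.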