Hacker News
New Chrome extension: block sites from Google’s web search results (googleblog.blogspot.com)
508 points by dannyr on Feb 14, 2011 | 214 comments



I just wanted to say thanks to all the people on Hacker News who asked for this option. We'll look at offering a "block site" option directly in the search results over time, but it takes longer to write, test, and launch that code.

In the meantime, use this extension to clean up your own search results and tell us which sites you don't want to see in Google.


Matt, I don’t understand why Google needs to offer blocking as a user option. Any web netizen with some experience can tell at a glance the spammy and low-quality results on a SERP, but Google, with its vast knowledge and experience and its ability to monitor every aspect of user clicking behaviour, cannot?


I alluded to that in http://news.ycombinator.com/item?id=2218627 . People feel comfortable with Google removing blatant spam: hidden text, cloaking, sneaky JavaScript redirects, etc. People tend to feel less comfortable if they feel like Google is making an editorial decision.

If we get a good signal from this extension, or from offering block links in Google's search results, then it's much more similar to Gmail's spam algorithm, where an email is labelled as spam partially because a lot of users say it is, rather than because of some editorial decision on our part.


Matt, before we get to the point where potentially controversial editorial decisions will have to be made, I would imagine there are things that could be done automatically and uncontroversially.

For example, sometimes we see copies rank higher than originals. Why does that happen? Google know where they first saw a particular piece of content, don’t they? Why don’t they use that as a heavy ranking factor?

Or am I too far off?


Consider this: I make a blog post on my relatively new blog examplesite.com. Techcrunch picks up on the article and immediately reposts it on their site. Which do you reckon would be the first to get indexed?

Now which is the original from Google's point of view? Relatively smaller blogs can take significantly longer to index than sites that have massive amounts of content moving about daily.


I have thought about that and I cannot believe it is really a problem.

My small negligible personal website notifies Google, Bing and Yahoo immediately and automatically as soon as I publish something new. It also publishes a feed. Even if the content is picked up and republished right away by a site that is indexed every minute, it should be possible to determine correctly the original publisher.

In some cases, I can think of more ways to determine the original publisher. And certainly Google can think of even more.


> I can think of more ways to determine the original publisher.

Please share them. Thanks.


Here is one scenario that comes easily to mind:

I publish an article at only-original-content.com. The article has some images that are served from only-original-content.com/images. Now only-copied-content.com takes my original article and republishes it. Since only-copied-content simply copied the HTML, the images are still served from only-original-content.com/images.

In that case it should be simple to determine who is the original publisher. Of course, only-original-content.com could simply be a CDN that only-copied-content.com uses for its static resources, but, again, it should be easy to determine whether that is the case.
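For anyone curious, that heuristic could look roughly like this (purely hypothetical code sketching the idea from the comment above; `likelyOriginalHost` is a made-up name, and nothing here reflects how Google actually does it):

```javascript
// Sketch of the image-origin heuristic: given a page's URL and HTML,
// count which hosts serve the page's <img> tags. If the images mostly
// live on a single *other* domain, that domain is a candidate for being
// the original publisher of copied content.
function likelyOriginalHost(pageUrl, html) {
  const pageHost = new URL(pageUrl).hostname;
  const counts = {};
  // Naive regex extraction; a real crawler would use its HTML parser.
  for (const m of html.matchAll(/<img[^>]+src=["'](https?:\/\/[^"']+)["']/gi)) {
    const host = new URL(m[1]).hostname;
    counts[host] = (counts[host] || 0) + 1;
  }
  let best = pageHost, bestCount = 0;
  for (const [host, n] of Object.entries(counts)) {
    if (n > bestCount) { best = host; bestCount = n; }
  }
  // Same-host images (or no images at all) tell us nothing.
  return best === pageHost ? null : best;
}
```

As the comment notes, a CDN would trip this up, so at best it would be one signal among several, not a decision on its own.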


demetris, that's exactly what we improved with a recent algorithm change: http://www.mattcutts.com/blog/algorithm-change-launched/ . As you point out, that was a more straightforward change, and that's why we were able to launch that one first.

That said, if you wanted to share some examples where you're seeing copies rank higher than originals I'm happy to pass that on to the right folks. In fact, some of the right folks are already on this thread. :)


Posts to Google Groups show up on devcomments.com, osdir.com, mail-archive.com, etc., ranked higher than the original post in Google Groups.


Hmm. That could be if Google Groups has changed the url structure, which could make crawling it harder. Or because USENET/mailing lists don't always have a centralized/canonical location on the web, which makes dupe content more of a potential issue. It's not the usual "Website X copied my website" scenario.


Getting on a tangent here, but Google has a hard time crawling Google Groups? Have you tried emailing support@google.com? Just kidding, but in all seriousness: how is it that a Google property has bad SEO?


Still, the sites mentioned above are arguably of lower quality than Google Groups; there's no reason for them to rank higher...


You might want to take a look at this:

http://news.ycombinator.com/item?id=2152286


You already have the SafeSearch filter that can be toggled on and off to show you different search results. Why not an Editorial filter as well (perhaps disabled by default)?


I understand the fine line you have to walk, and I'm glad to see that you guys take it as seriously as you do.

Personally, I'm glad that you're putting this out as a user-controlled thing. I like the fact that I'm able to get rid of results that aren't necessarily spam or SEO'd garbage, but where I still know that I never want to see results from that site again.


I'm really glad this extension exists.

However, if it is built into the general results, could you also add a metric to Webmaster Tools showing the frequency with which your domain is reported? It would also be good if the blocking had a timeout period, so that sites can be given a chance to improve their behavior rather than being deleted forever.


+1 to a report in WMT - to detect problems in our sites and also potential abuse.


A "web netizen" is intelligent. Google - as you're using the term - is a sophisticated piece of software that leverages much of what we know about AI, but it is not actually intelligent. The determination you're making that something is spam and not worth your time may be simple for you to make, but it's hard to get software to consistently make the same determination.


It's one thing for an individual user to ask not to be bothered by results from an entire domain. It's quite another for Google to make that call for EVERYONE. I think this is a great first step. If the evidence is overwhelmingly against a domain, I would hope that Google would use that as a strong negative signal against the domain to adjust the ranking of the site.


Another thing to keep in mind is how many unique search queries are entered into Google on a daily basis. I don't know what the figure is at Google, but an engineer at Bing said that 25% of their searches are queries they've never seen before. You can't get a human to look at each of these searches and remove the content farms; you have to do it algorithmically or it will never work.

Getting users to do the work will help with the most egregious abuses. One thing that content farms are good at is coming up with answers for questions that don't have an answer available on-line. For example, the query "what's the personal cell phone number of <insert celebrity>?" will just take you to a spam farm because that's the only kind of website that claims to know the answer.


So, does this mean expertsexchange will eventually get shoved down the search results?

(AKA great stuff!)


Experts Exchange answers aren't nearly as bad as things like Yahoo Answers, etc. (PS: you know the answers are available at the very bottom of the page, right? I thought this was common knowledge, but I just wanted to make sure the content was the reason behind the animosity, and not that the information used to be inaccessible...)

Perfect example of this extension in action: search for "how to remove ear pads from HD555", and I get a link to this page as the third result: http://www.fixya.com/support/p516634-sennheiser_hd_555_consu...

Totally useless!


> Experts exchange answers aren't nearly as bad as things like yahoo answers

While Yahoo Answers etc. might provide bad answers, Experts Exchange provides no answers at all. All you see is an open question with a nasty subscribe button that only works if you haven't already used it in the last 30 days.

So Experts Exchange is worse than useless: it's just plain advertisement without any added value. It's the kind of stuff you expect in the ads section of a search engine, but definitely don't want to show up in your search results.


Well, EE is nasty and I hate it, but it's their right if they choose to do so. Anyway, they didn't steal the content from anywhere. That being said, I'll probably add EE to my block list :)


Just scroll down past the stupid paywall when you come from Google or Bing. Or become an expert for free: http://bit.ly/EEfree I have been an expert there (not an employee) since 1998, and it is a great site. For a while now there have been free articles and blogs too. I agree with the haters that the paywall should go away, but that is the decision of the owners of EE, and those in the know can get full ad-free access by answering approx. 3 questions a month. It took me more effort to build a reasonable flair at SO...


> but they have the right if they choose to do so

I don't mind their business model, but I can't understand why they appear near the top of so many search results.

Either they are very good at (mis)using SEO techniques, or there really are many websites linking to them. However, I personally find it hard to believe that any author of a blog article or forum post links to an EE answer voluntarily.


To third this, scroll down. :)


I installed the extension and blocked experts exchange first thing ... not sure if I'll even use it for any other sites ...


I immediately dropped velocityreviews.com, bytes.com, and about.com too.


ehow.com, ezinearticles.com. I'm sure there are many others, but those piss me off daily.


Yeah, those and WiseGeek for me. I keep on getting WiseGeek in searches (especially if I search for a question, which I occasionally do) and the content is complete and utter crap.


eggheadcafe (example: http://bit.ly/fxIaKE )


Snap. I wonder how many others did the same.. going to guess a reasonable number.


Definitely misread that as:

ExpertSexChange


Yup, that’s why they added hyphens to their domain a few years ago


Touche! I always get frustrated upon knowing that the answer is blurred and I go "Ack expertsexchange"


OMG that was the first domain I blocked.


What role did Hacker News play in bringing about this feature?


That's a very important question, as without an answer it seems like the original comment is just excellent public relations. *Did not mean to imply that it is PR :)


I asked a member of the webspam team to prototype the extension after reading this Hacker News comment on January 1st: http://news.ycombinator.com/item?id=2058408

We've absolutely done similar things in the past, but that comment was the spark behind this most recent Chrome extension. I hope that's specific enough. :)


I think the fact that Matt frequently comes in here to discuss Google's search results shows that it is not just PR.

Unless you want to categorize everything Matt says publicly (whether here, on his blog or on twitter) as PR...


Market validation. :)


Thanks for listening to our requests!


Thanks Matt.

Will you be offering a Firefox extension?


This is great, thanks Matt.

After using this for less than 12 hours, I think it's fantastic. I also think it could be vastly improved upon. What if you added reddit-style "up" and "down" votes to my results?

Maybe it sounds silly at first pass, but if all my votes get passed back to Google, your algorithms could learn from an incredible crowd-source treasure trove of knowledge.

A blacklist wouldn't just help you find the bad ones; up-boats could help you suss out the good ones.

There's the concern of Digg-style voting blocs, but perhaps even that could be detected by sufficiently sophistimatacted algorithms (voting parity that falls outside statistical norms could be flagged and downgraded in quality).

I mean, sure, I'm already gaming the system in my head ... automated creation of accounts, automated up-boating (which I suspect happens on Reddit as well, despite being bad "reddiquette")

OK. Actually, maybe it's a terrible idea. Like DIGG with only down-votes. Seems like a potential treasure trove though.

Right?


In addition to removing the blocked site from the search results, would it be possible to remove the site's contribution to PageRank? I feel this would probably be more effective in the long run...


Matt, do you think this will be abused at all? I.e., organized blocking of sites en masse to push them down the rankings?

Is your block list something that will be synced so that it is available everywhere you use Chrome and eventually just synced with your Google account?


On top of counting the block votes, Google could check whether the site is crippled by AdSense links. This could filter out some shoot-down-my-competitor efforts.


Do you plan to release the list of blocked sites to the public?


1. http://bit.ly/gTADhE

2. Click Install, close page

3. Open each of the links below in a new tab, click block on the first result

4. Win.

http://www.google.com/search?q=Mahalo

http://www.google.com/search?q=ehow

http://www.google.com/search?q=experts-exchange

http://www.google.com/search?q=livestrong.com

http://www.google.com/search?q=answerbag

http://www.google.com/search?q=bills.com

http://www.google.com/search?q=chacha.com

http://www.google.com/search?q=associated+content

http://www.google.com/search?q=efreedom

http://www.google.com/search?q=questionhub

http://www.google.com/search?q=squidoo.com

http://www.google.com/search?q=about.com

http://www.google.com/search?q=yellowpages.com

----

Edits: fixed formatting, added suggestions

This method is fine. The actual data sent to Google when you block a domain does not contain the search query (or the referrer).

This is what gets sent when you block a domain:

  http://www.google.com/gen_204?atyp=i&oi=site_blocker&ct=addToBlocklist&ei=[CSRF-cookie]&cad=mattcutts.com
and unblock:

  http://www.google.com/gen_204?atyp=i&oi=site_blocker&ct=deleteFromBlocklist&ei=undefined&cad=mattcutts.com
(Interestingly the CSRF token is broken when unblocking.)
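For the curious, here is roughly what a client constructing those beacons might look like (a sketch based only on the two URLs quoted above; `blockBeaconUrl` is a made-up name, and the parameter meanings are inferred, not documented):

```javascript
// Sketch of the block/unblock beacon format quoted above. Parameter
// names come straight from the sniffed URLs; csrfToken is whatever the
// ei= value holds (reportedly literally "undefined" on unblock).
function blockBeaconUrl(domain, csrfToken, action) {
  const ct = action === "block" ? "addToBlocklist" : "deleteFromBlocklist";
  const params = new URLSearchParams({
    atyp: "i",
    oi: "site_blocker",
    ct: ct,
    ei: csrfToken,
    cad: domain, // only the blocked domain; no query or referrer is sent
  });
  return "http://www.google.com/gen_204?" + params.toString();
}
```

Which matches the observation above: the only site-specific data in the request is the blocked domain itself.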


A good list :)

A couple more:

http://www.google.com/search?q=wisegeek.com

http://www.google.com/search?q=bizrate.com

----

My reasons for suggesting that these are blocked:

"What are Some Animals Commonly Mistaken for Dinosaurs?" (http://www.wisegeek.com/what-are-some-animals-commonly-mista...)

"How Do I Care for a Rose Breasted Cockatoo?" (http://www.wisegeek.com/how-do-i-care-for-a-rose-breasted-co...)

"Why Does Alaska Have Really Long Days During Some Times of the Year and Really Short Days During Other Times?" (http://www.wisegeek.com/why-does-alaska-have-really-long-day...)

- should be fairly self-explanatory. Plus it only took me ~45 seconds to find these examples of journalistic <sarcasm>brilliance</sarcasm> (there are MANY more such examples). Plus they are - IMO - breaking AdSense guidelines, since the Google link unit ads are EXACTLY the same (in color/style and :hover behaviour) as the in-content navigation links, which is a direct breach of AdSense policy. You'd think the AdSense team would be aware of such a big violation..

--

And as for BizRate, it's a 99% autogenerated site with little (if any) unique content. Plus most pages have at least one keyword stuffing area (such as http://www.bizrate.com/mens-shoes/dress-shoes/ - see the 'related searches' bit on the left, for a quick example).


Just a remark: Experts-Exchange is user-generated content. If something is copied from elsewhere and not attributed, it will be attributed or deleted, and the poster told off.


I used to write for eHow and the like, and I fully endorse blocking them. Thanks for the list!


On the subject of eHow, here's another reason to consider blocking them: Google for "no text to spin" (include the quotes).

That's an error message from bots that are spamming random wikis. I wrote more about it here:

http://news.ycombinator.com/item?id=2215993


To be fair, there isn't a scintilla of evidence that they're involved with that. Some black hat is just using them as feedstock for content spinning. Happens to Wiki and my blog all the time.


Very possible. I gave more mention of that possibility in the writeup.

There are some weird things, too. I see what might be links between mystery numbers in the spam pages and parts of their URLs, and I find it odd that they're the only site listed in those error messages, and that there aren't a lot of those errors, which may indicate that they're not deliberately blocking the spambots. But I will concede that it's weak evidence so far at best.

On another note, who is using your blog as feedstock? Can you give a URL to a spun copy of your stuff?


Doesn't this tell Google "Mahalo is a spam result if I search for 'Mahalo'"?

Seems like you should do this when you get one of these and you wanted Stackoverflow.


I don't think it records which search first resulted in the block, only that a block exists.


about.com actually has some interesting information from time to time.

Secondly, the first link that appears for me when I search for experts-exchange is Wikipedia.


About's atheism site is pretty good. I won't block about.com; they have good information.


The hiding functionality doesn't interest me; this feature has been available for ages and ages in GreaseMonkey scripts and the like. But sending the information to Google, with the hope that they will eventually use it for something, interests me greatly.

Unfortunately, I don't and can't use Chrome. If all this does is send a URL request when a site is blocked, it should be extremely trivial (for someone who is smarter and has more free time than me, probably) to re-implement this in a Firefox-compatible form, right?


You forgot bytes.com and bigresource.com, holy fuck I hate them.


I'd put 'osdir' on that list...


[I run osdir.com]

Just to be clear we're not scraping content nor SEOing to death. We've been running our mailing list archive for almost 8 years.

To quote Matt_Cutts somewhere else in this thread: 'because USENET/mailing lists don't always have a centralized/canonical location on the web, which makes dupe content more of a potential issue. It's not the usual "Website X copied my website" scenario.'

If you dislike us that's cool, but just know we're not showing in your SERPs because of scraping & SEO trickery. A lot of people, maybe not you, really are finding their answer there. There's a lot of solid answers in inboxes.


Very first site I blocked. (Would have been eFreedom, but I haven't seen them in the results for awhile. Thanks Google!)


Ask.com and Answers.com are bigger culprits than all of your list put together.

google this: site:answers.com "can you answer this question"

32,200,000 of their 103,000,000 answers pages DO NOT HAVE AN ANSWER! They are spamming Google's users to get them to build answers.com's database of answers (and surrounding the non-content with heavy ads!).

For the pages that do have an answer, answers.com has gone the way of answers.yahoo.com. (I guess you forgot answers.yahoo.com too)

And, BTW, ask.com is just regurgitating answers.com!!

google this: site:ask.com answers.com

or just surf around ask.com to see that they just regurgitate ALL of the content farm results, which makes ask.com a "content farm" farm

Your list was a good start, but you missed the biggest (by far) offenders on the internet.


If these sites are as bad as you claim, hacker news users will easily be able to spot them and nuke them from their results... I have a sneaking suspicion that those of you passing around lists of sites have ulterior motives.


My ulterior motive is I'm sick of seeing crap sites like bigresource.com and bytes.com. All they are is scraped/stolen content wrapped around AdSense, and I don't even know how many times I've clicked on them without thinking.

bigresource is the most annoying; one time those assholes put a fricking CAPTCHA in place so you had to prove you were human to get to the source link they STOLE the content from.



What's wrong with about.com?

I generally find about.com has high-quality content giving an overview of a topic, much more in-depth than mahalo or eHow.


I can see where he's coming from: it has information on a lot of different topics and it's full of ads, therefore it's an evil content farm.

Just like how someone that weighs the same as a duck must be made of wood, and therefore is a witch.


They may not do it now, but they used to be as bad as Associated Content.


For the extension developers: How about a way to import lists of sites like this?


1. Does this remove results from each page, or from the resultset? In other words, if 7 of the first-page results are blocked, will I see only 3 results on that page?

2. Any plans for a Firefox extension? I'm willing to install Chrome just for running Google searches, but would rather add it to my main browser.

e: After a month or so, I would absolutely love to see the top 10 or so blocked domains. It's OK if you can't do this, but it would be interesting/amusing.


Right now, you'll see only three results. We'll look at refreshing, but in answer to #2, we're also looking at putting this more directly into Google's search results.

I think having block options directly on Google's search results is the right long-term answer. But this extension lets people clean up their personal results while sending block data to Google that we might be able to use as a signal to improve overall search quality.

Thanks to everyone at HN who poked us by asking for this, by the way.


> We'll look at refreshing, but in answer to #2, we're also looking at putting this more directly into Google's search results.

Don't worry about refreshing -- I'd much rather see three good results than 7 bad ones. The idea of being able to block content farms entirely is literally making me giggle at my desk. Focus on that!

Thank you, thank you, thank you x1000 for this.


I've been using it for a few weeks while we tested it, and it really does feel nice to block a site that you never want to see again. :)


After integrating this feature into Google search, a nice enhancement would be to allow sharing of people's personal exclusion list. Let the community help clean search!


That's the first step in establishing a community curated subscription service. That's how AdBlock works and it works great.


#2 If you don't want to wait for Google: http://www.customizegoogle.com/


OptimizeGoogle seems to be more up to date:

https://addons.mozilla.org/en-US/firefox/addon/optimizegoogl...


The website used as an example in the first screenshot (http://thecontentfarm.tumblr.com/) just made my day.


I made the screenshots this weekend, so I threw that in there. http://thecontentfarm.tumblr.com/ is pretty funny.


Why do I need to use Chrome and then an extension if this is being offered by Google?

Make this a Google Labs feature in the personalization options for Google search itself.

(also please make it available via a URL option, not just cookies or javascript)


I'll take the version that walks today over the version that runs in 6 months.


This is step 1. You're asking about step 5. They'll get there, but they have to go through step 1 first.


Well played, Google.

Not only did you just make your search engine 50 times more valuable to me, but you've just ensured I'll be spending almost all of my browsing time inside Chrome.


Why make this a Chrome extension rather than a google labs type feature?


The Chrome extension was faster to roll out and lets us iterate faster as well because it's outside of front-end pushes. We're looking at offering block links in the search results too, but that code takes longer to write, test, and launch.


It seems we've come full circle -- features easier to push out in client software than through modification of server code.


Any chance we can get the blocklist transferred via sync in the next iteration of this plugin?


I second that suggestion. It would make the extension that much more awesome to not have to block sites on each device. Might also help to lower the noise on Google's end, so they're not seeing multiple blocks on the same domain from the same users, though I'm sure they account for this already.


It's a bit difficult to implement sync of arbitrary data in a Chrome extension right now. The extension developer has to set up their own storage and implement their own sync code. Either that, or some developers just store their data in the bookmark system, which has its own problems (please don't do this, Matt!).

We (the extension posse) are working on making this really easy to do [1], but nothing exists yet.

[1] http://crbug.com/47327


Aaron, thanks for stopping by with an update on Chrome sync--much appreciated. And thanks for making Chrome extensions fast/easy/powerful to write.


I think that's a good longer-term option. We wanted to get the Chrome extension launched to start getting data though.


So HN, what sites are we all blocking?


My first round includes eHow, Mahalo, Associated Content, Experts Exchange, Squidoo and Examiner.

Update: I added answers.yahoo.com as well, which has consistently terrible content and yet is often highly ranked.


Experts Exchange used to be a scam, showing fake results. But now, they just roll them to the bottom of the page. I don't endorse them or like them, but they've had decent quality answers several times.


They play too many dirty tricks for my tastes (for example, it's only at the bottom of the page when you come from Google)--I won't give them any of my page views. I have no idea why Google doesn't bust them on what is straightforward cloaking.


Oh, you're right. I agree, any site that displays significantly different content when coming from a Google search should be heavily penalized.


I really often get exactly the answer I need from Yahoo Answers, just to throw that in there...


I have only ever seen incorrect answers on Yahoo Answers, so our experiences definitely vary. That's the beauty of allowing custom blocklists though.


I've found that there are some more taboo (but valid) questions that are only ever answered on Yahoo! Answers, presumably because people aren't worried about their identity there.

For an extreme example: http://answers.yahoo.com/question/index?qid=20071113122412AA... (Warning: Taboo. And the answer probably isn't relevant to 95% of the posters here. :)


I found Wikipedia's answer to be of much higher quality (and properly cited).



I added +-hubpages to my keyword search in Firefox. I should probably look into an extension instead, but it works well enough. I like the idea of Google getting lots of searches for <random query> -hubpages


Experts Exchange.


As mentioned above - bytes.com and bigresource.com


Definitely Bigresource if you're doing anything with Flash/Flex.


ehow, experts-exchange, questionhub/efreedom/etc


experts-exchange.com


eHow. They appear to be connected with a spambot that's attacking random wikis. They might not be part of it, but even if not, I won't feel too bad about blocking them.


markmail, osdir. these sites basically mirror yahoo/google groups. Big source of frustration.


as replied up the thread...

[I run osdir.com] Just to be clear we're not scraping content nor SEOing to death. We've been running our mailing list archive for almost 8 years.

To quote Matt_Cutts somewhere else in this thread: 'because USENET/mailing lists don't always have a centralized/canonical location on the web, which makes dupe content more of a potential issue. It's not the usual "Website X copied my website" scenario.'

If you dislike us that's cool, but just know we're not showing in your SERPs because of scraping & SEO trickery. A lot of people, maybe not you, really are finding their answer there. There's a lot of solid answers in inboxes.


I see. I bear you no ill will. I just happened to one day start seeing LOTS of osdir and markmail results in my searches. And it's not even that I mind what osdir does (I did find answers there), but that its UI is worse than the original site's, and I feel that the originals should be given higher weight.


Hey cellis, thanks. What I see in cases where we come up higher in SERPs is that the crawler just hasn't found the best content in google groups yet, if ever. I scratch my head over that one too, but ours is not to reason why. I don't see the index pointing to the same content in the landing pages typically either with my queries.

UI: LOL, yeah, it's a toss up here whether to make it prettier or keep it a bit old school.


markmail provides the best archives of Apache project mailing lists that I've seen. Many of the project sites link to the markmail archive. And it's ad free.

Compare, for example, the Apache archives of CouchDB (http://mail-archives.apache.org/mod_mbox/couchdb-dev/) to the Markmail archives (http://couchdb.markmail.org/search/?q). The latter is searchable; the former isn't.


devcomments.com, osdir.com, and mail-archive.com


efreedom, experts exchange, ehow.

Would be interesting to have collective access to what people are blocking.

I like the approach: being able to click and exclude based on the search results that pop up.


Assuming it's a normal extension and has to abide by the same rules that non-Google-authored extensions do, the extension manifest indicates the extension doesn't have access to do any cross-domain posts, so all the filtering is done client side. Digging a little deeper, it looks like the blocked sites are stored in Chrome's LocalStorage, which, if memory serves me correctly, is isolated per extension.

It should be relatively easy to listen in on the background page while the extension is running and write a script to extract the list of blocked sites or update it with a master list so you don't have to block dozens or hundreds of sites manually.

Not that I think everyone should blindly block everything everyone else does on HN; I personally loathe Experts Exchange, but I do find an answer I needed from them now and then.

I was more curious than anything.

Update: As "dsl" posted above, it does look like the extension makes a call out to a Google endpoint to record the block as well, but I don't believe that call actually filters the data for you; that's still done client side. So it's probably best not to call the endpoint directly or update the blocked-sites list directly, but to actually use the extension as intended.
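In other words, the client-side filtering step itself is presumably something as simple as this (a hypothetical sketch of the described behavior; `filterResults` is my own name, not anything from the extension's actual code):

```javascript
// Sketch of client-side result filtering: keep only results whose host
// is not on the locally stored blocklist.
function filterResults(results, blockedHosts) {
  // results: [{url, title}, ...]; blockedHosts: ["example.com", ...]
  return results.filter(r => !blockedHosts.includes(new URL(r.url).hostname));
}
```

Since the list lives in LocalStorage, a bulk import would just mean writing a master list into that same store, as suggested above.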


Yes, they block client side (several posts from Matt confirm it in this thread).

But you mention something important: for me, I don't want to 100% block most of those websites, but to give them lower priority (never in my top 10, for example). Like you, as much as I hate Experts Exchange, sometimes they do have the answer when no one else does. This extension seems to solve that by giving a "show" link to display hidden results (when there are any).


As nolok mentioned, the blocking in Chrome takes place on the client side. We're looking at adding block links to Google's search results along with server-side code (which would let your blocks work on any computer you logged in on, for example), but that code takes longer to write and test than the Chrome extension.


Normally I don't post this sort of thing (try to focus on valuable content) but dear god THANK YOU!

I'm cackling maniacally while I block expertsexchange, Mahalo, and several other sites. I'm so happy right now.


Free karma points to whoever creates a corresponding Firefox extension...


There is a Greasemonkey script[1] that does blacklisting. It works by sending "-site:example.com" with every search query. The script does a lot more cleanup, like removing link tracking and ads, so if you don't feel like using all the functions, look in the code and check out the RemoveBadLinks() function. You may want to comment out all the other public functions in init().

[1] http://userscripts.org/scripts/show/79742
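The core of that approach is simple enough to sketch as a pure function: append a "-site:" operator per blocked domain, stopping before the query outgrows the search box. The function name and exact limit handling here are my own illustration, not the script's actual code:

```javascript
// Illustrative only: rewrite a query so Google itself filters blocked
// domains server-side via "-site:" operators. maxLen guards against the
// search input's character limit.
function addSiteExclusions(query, blockedHosts, maxLen) {
  maxLen = maxLen || 2048;
  var result = query;
  for (var i = 0; i < blockedHosts.length; i++) {
    var next = result + ' -site:' + blockedHosts[i];
    if (next.length > maxLen) break; // no room left for more exclusions
    result = next;
  }
  return result;
}
```

So a search for "foo" with example.com blocked would go out as "foo -site:example.com".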


Wouldn't that reach a limit pretty soon? I've commented on this before, but there's a max character length for the search box.


The maxlength of the input field is 2048, so you can put lots of domains in it. Of course, it's not a complete solution, but it works to some extent, especially because it filters stuff out on the server side rather than with a CSS display:none.


There are a few Greasemonkey scripts that do the same thing. I wrote my own that does this and also fetches more results to fill in the gaps left by the "This site has been blocked" entries, but sadly I forgot to back it up before I upgraded hard drives and wiped the old one.


"There are a few grease monkey scripts that the same thing."

Not quite. This one sends the info to Google, whereas the GM scripts just display:none the links with CSS. In theory this is different, since the optimist in me hopes that Google will do something positive with the "downvotes".
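In essence those scripts need only a hostname match plus a style change. The matching half can be sketched like this (the blocklist format and the DOM selector in the comment are assumptions, since each script differs):

```javascript
// Decide whether a result link's host is on the blocklist, matching the
// domain itself or any subdomain of it.
function isBlocked(link, blockedHosts) {
  var host = new URL(link).hostname;
  return blockedHosts.some(function (b) {
    return host === b || host.slice(-(b.length + 1)) === '.' + b;
  });
}

// A userscript would then hide matching entries, roughly:
//   if (isBlocked(anchor.href, blocked)) resultNode.style.display = 'none';
```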


there's this:

https://addons.mozilla.org/en-US/firefox/addon/custom-google...

but for cross-browser, it's best to roll your own:

http://radleymarx.com/blog/better-search-results/

Personally, I'd prefer Google change the Google Custom Search pages to be more like traditional search. If I could have image and map links, I'd probably never revert back to regular search.

BTW - my personal filter has 172 sites so far...


Would be cool if there were a subscription option where you could subscribe to a master list that gets updated by people you trust.

Or just a simple bulk insert.


My concern with an "AdBlock EasyList"-style centralized block list is that it removes, or at least blurs, the consensus feedback to Google about what is a content farm, what is a low-quality non-farm site, and what is useful.

A user above listed about.com as something they wanted to block; while I find it not very in-depth, I wouldn't classify it as useless, and certainly not a content farm like livestrong.com.


Or at least an import/export option. But we'll need to see how popular/useful the extension is first.


Doesn't WOT (Web of Trust) attempt to do this?


I was just griping about this to my wife yesterday. The noise is drowning out the signal in my recent searches...


Please install the extension and let us know what sites are annoying you. You're exactly the sort of person we want that feedback from. In the process, you'll keep them from showing up for you again.


Oops, maybe I'm not referring to the same problem. Rather than identifying "content farms", I'm hoping to block sites that show up in the results even though the returned page doesn't contain the search terms, and therefore, in my opinion, shouldn't be there.

Is this Google's intention for usage?


Will do. It's already on three of our machines, soon to be seven.


Isn't anyone else kind of bewildered by this?

I mean this is kind of like if a kid pissed all over the floor in wal-mart, and when you notified an employee about it they gave you a mop to clean it up yourself.


It is more like being in a Walmart full of zombies and having all the Walmart employees killing zombies as fast as they can but then when you ask what is up with all the zombies they hand you a shotgun so you can help them by killing the zombies in your immediate vicinity.

On the one hand, in an ideal world there wouldn't be a large and ever growing number of zombies at Walmart, but on the other hand, thanks a million for the shotgun because I'd rather get to killing some zombies myself than whine about the fact that the Walmart employees aren't killing them fast enough while they are eating my brains.


Really? This is kind of like some sort of weird time warp, and the Yahoo mentality of "curating" the internet is back in fashion. Which is weird, because Google came to power because that strategy doesn't work.


I just know you’re going to get downvoted for being unappreciative, but I love the analogy (for its color). And you’re right, it’s a striking “solution.”


Having a big company behave like a local one -- listening to its customers' opinions -- is really nice.

However, this is a feature that Google actually had. Why did you remove it? I accept that SearchWiki was not particularly a success[1], but the remove option was very nice.

Alas, thanks for listening. I'll be waiting for the server-side option.

1: http://googleblog.blogspot.com/2008/11/searchwiki-make-searc...


I think users are more savvy about recognizing spam/low-quality/content farm sites than a few years ago. At least some of the previous features also had unfortunate UI issues, e.g. the default was to block a single url for that single query.


Positive move by Google, which should help the best content to rank in search.

However when Google start using this feedback data directly in their algorithm there are two dangers, which both can be overcome:

1. Abuse by SEOs trying to get their competitors blocked

2. Bad use of block data on a site-wide basis in the search algorithm. E.g., just because certain pages on a site lead users to block it doesn't mean other pages on that site don't contain unique, useful content. High-volume publishing or community sites will have a large degree of noise, but because their membership is a cross-section of society, there will also be valid, useful content in the seams.

One approach is to let individual users block sites forever (as the extension does now) but use the block data on a page, not domain, basis in the search algorithm. Individual pages can also increase in quality over time (the Quora business model), so changes in page content need to be considered too.

The Internet is a better place with non-editorial user-generated content (Hacker News is user-generated); after all, not all of humanity's knowledge is published online. Google just needs to figure out which pages really add value for a specific search, no matter which domain they sit on.


You mean I never ever again need to see a search result pointing to experts-exchange?! This is the best gift ever! And it's not even my birthday.


A few thoughts and requests:

1) I'd love to use this tool, but because of my job I still need to look at the "official" search results from time to time. Is it possible to let &pws=0 override the blocked sites, so that &pws=0 disables both session-based and browser-based personalisation?

2) There's a lot of talk among startups about ignoring what users want because they don't know best. I worry that this move from Google might actually be detrimental to the user experience. If you end up blocking a bunch of thin sites, the chances are high that you're not going to get better results for your search query, you'll just get fewer results (since usually Google only returns these thin sites when there's little else to offer). Also, as users get trigger-happy and block a bunch of sites, they may well be harming their own search experience. I'd love to hear your thoughts on how Google can mitigate this, especially when it rolls out to the average user rather than just the tech-savvy HN crowd.

Tom


If you don't allow the extension to work in incognito mode (it's opt-in, off by default), then you can just open an incognito / private browsing window instead (which Chrome makes pretty low-cost). This might have the benefit of removing any influence on your search results from Google cookies that would otherwise be sent.


I don't think the plugin works via cookies, though, does it? Might be wrong...


By default, Chrome does not allow extensions to run at all in incognito mode. You have to explicitly enable the setting highlighted in this screenshot (available from the URL "chrome://extensions"):

http://img6.imagebanana.com/img/0om09caa/20110215_135721_Sel...


Good work, squeaky wheels.

So: will this eventually be a search settings option once it is less beta or permanently an extension thing?


We'll look at offering this as a more direct option over time, depending on how popular the extension is and whether the data from it looks good.


Hey, I think I "called" this a few days back... yep: http://news.ycombinator.com/item?id=2199498

This little bit of successful nerd prognostication cheers me up more than perhaps it should, but oh well.


Although this is Chrome-only, it's a great extension that I believe a few people on this site wanted. I remember other people made a mashup, but this looks like a slightly better solution. I wonder why they don't want to do this server-side, though?


I imagine releasing server-side changes to Google's core search functionality is quite a bit harder than releasing an extension. If nothing else, there are tons of tests that need to pass, and there are probably constraints on speed regressions and such. This is a much quicker way to get the same data so they can determine whether it's actually useful as a ranking signal.


I feel like I just got a new upgraded Internet.


And we make fun of our parents for thinking that Google = internet. ;)


We do? I thought it was the blue IE6 'internet' icon?


E is for enternet.


>...explore using it as a potential ranking signal for our search results

Democratic censorship.


That's like calling the current situation "technical censorship", since websites are ranked by how they perform at the SEO level. They won't ban sites from Google; they'll re-arrange who is shown higher (or lower) based on what users like.

Google has always been in the business of linking search terms with the information users want to see; this is no different.


What's the difference in Google censoring spam and users censoring spam?


I'll pass along an anecdote from Gmail, which also uses explicit user feedback to label spam emails. Some people come to Gmail and say "Why did you block the [mass low-quality] email that I sent?" It's a pretty good answer if the reply is "Enough people marked your emails as spam that the emails were considered spam."

The hope is that users are savvy enough that we can get a good signal out of the block signal.


This would be a fair comparison if and only if you allow users the option to view the blocked search results.


One is censorship; the other is free will.


If this ends up being used as a "voting" system that factors into determining what sites show in public SERPs, folks could crowdsource competitor blocking using say Amazon's MTurk to knock competition off of Google, no?


Finally Google acknowledges the elephant in the room. Just a few days ago, a friend of mine being interviewed mentioned that search was ineffective because of content-farm sites, and the Google employee was like "No, search is fine. Nothing wrong there." Finally eHow, about.com, and expertSEXchange.com can RIP, at least for me. I also hope that over time, Google will use this data from Chrome users to incorporate crowdsourced results into search, not just for one user but for all of them.


I thought I was the only one who doesn't want to see "experts exchange" in Google search results. No, I'm not alone... :) :) Looks like "experts exchange" is annoyingly famous...


Nice. My first target - every local restaurant result I get that appears before either (1) the official restaurant website or (2) Yelp. Local search results are always gunked up with yellowpages.com and local newspaper spam. Also, I find it highly suspicious that urbanspoon has consistently better placement than Yelp, despite having consistently weaker content.


I created a site on app engine that essentially does the same thing via Google Custom search engine. Granted mine isn't as integrated or user friendly but still. The site is: http://blacklist-search.appspot.com/

Why can't Google offer something like this rather than only allowing it via a Chrome extension?


I think we can, but that sort of code takes longer to write, test, and launch. You'd want it to be really robust before you showed those sorts of options to 1B+ people a day. So the Chrome extension is a good interim step to see how much traction it gets and what sort of data emerges as a result.


I know this is a legit extension from Google.

But why is this extension not marked as having a verified author? https://chrome.google.com/webstore/detail/nolijncfnkgaikbjbdaogikpmpbdcdef

The "nolijncfnkgaikbjbdaogikpmpbdcdef" makes it look suspicious as well.


Hmm. I'll ask if we can fix that.


"If installed, the extension also sends blocked site information to Google, and we will study the resulting feedback and explore using it as a potential ranking signal for our search results."

If that happens, what's to stop this being used by companies to influence the results to get rid of competitors?


Did you even read the snippet that you quoted? They will "study the resulting feedback and explore using it as a potential ranking signal for our search results."

They are well aware of the ways in which such a signal could be gamed. That's why they will "study" and "explore" ways to use the new data as a "potential" signal.


Was the twatty attitude really necessary?

I asked a simple question expecting someone with industry-appropriate know-how to expand further on Google's theoretical/possible intentions.

We already have an entire industry dedicated to gaming search engines' ranking signals; we have some very talented black hats who undermine existing attempts to keep SERPs clean. I don't think it was that stupid a question.


I imagine that's where the "we will study" bit comes in.


I was just bitching about experts-exchange last night and wanted this feature. Thanks for sharing.


Thank god. Goodbye to: devcomments.com. osdir.com, and mail-archive.com!


osdir isn't actually too bad -- mailing lists are very poorly indexed by google and osdir exposes some of the data to be indexed.


Looks promising, but I'm unsure about the security angle. Google has just added a way for anyone to "DDoS" competing websites into oblivion. I hope there are measures in place to prevent that.


They aren't yet using the data for ranking.


They are, or probably will be, doing that. See this comment by Matt Cutts: http://news.ycombinator.com/item?id=2218531


And you really think no one on the Google search team has thought of how it could be abused? Come on, let's be serious here: they are not going to use these results verbatim, or block websites for everyone just because some users wanted to.


Even so, just look at all the problems Reddit etc. have had with bury brigades working to game the system. I imagine that with the reputation-defender-style companies out there right now, this would be a natural service.


When I first read the headline, I assumed it meant blocking all of Google's sites from the SERPs (Youtube, blogger, etc). Perhaps this would be a nice way to rule out any potential nepotism.


You can block google.com in the extension if you want (we didn't put in any code to prevent that), but your search results might behave in weird/unexpected ways if you do.


I note with mock irony that this works fine on blekko.com in any browser. You push the "this is spam" link in the SERP results and poof, it's dead to you.

(Disclaimer: I work at Blekko)


I know this question has been asked in different forms a couple of times in the comments, but here it is again: will there ever be a Firefox extension that does the same?


Awesome extension!

An option to hide the icon from the toolbar would be nice.


To hide the icon, you can right-click on the icon and select "Hide button." To bring it back, click Wrench->Tools->Extensions and click the "Show button" link.

I love Chrome. :)


I don't see that option on Linux v.8.0.552.224. But maybe I need a Chrome upgrade. Thanks for the tip, I'll try and get it working!


I'm using the dev version of Chrome on Linux, currently "10.0.648.45 dev". The developer channel has always been surprisingly stable for me.


I think this tool will be very misused by a lot of people just to squash their closest competitors. I can see some companies hiring "paid blockers" to squash competitors' websites, and in the search-engine world you could even watch Microsoft and Google block and report each other's sites in a never-ending search-engine war. What a sad day for free speech. George Orwell, 1984 :(


Enlightened people seldom or never possess a sense of responsibility. George Orwell


Google should make one of these for Bing also - with the option to send google my Bing blocklist.


Why should I have to be signed in to my Google account to be able to use this functionality?


To attempt to prevent people gaming/spamming the system.


Is it also integrated with WOT's (Web of Trust) way of blocking and reporting bad sites? If so, how?


I'm going to try it first thing. I hope all of my Chrome instances sync the block entries.


Finally I can block ycombinator.com. Thank God!


Sounds like an excellent feature. Just seems kinda weird that Google would start using user-click data when they were complaining so much about Bing using it.


This is not comparable, at all.


Is there something similar for Opera 11?


answers.yahoo.com, here I come.


Just wondering: what happens if you block google.com? Is that a meta-block?


Is it possible to block Bing from copying search results using this? ;)


One does not establish a dictatorship in order to safeguard a revolution; one makes a revolution in order to establish a dictatorship. George Orwell


What can you do against the lunatic who is more intelligent than yourself, who gives your arguments a fair hearing and then simply persists in his lunacy? George Orwell


Progress is not an illusion, it happens, but it is slow and invariably disappointing. George Orwell


If you want a vision of the future, imagine a boot stamping on a human face - forever. George Orwell


There are some ideas so wrong that only a very intelligent person could believe in them. George Orwell


Freedom is the right to tell people what they do not want to hear. George Orwell



