Hacker News
Dear Google: please let me ban sites from results
438 points by nervechannel on Jan 6, 2011 | 208 comments
Given the current high-ranking thread about spammy sites in Google results, it strikes me that a very simple solution would be to let logged-in users blacklist sites.

Bam, no more wareseeker or efreedom.

This would solve a lot of people's complaints in one fell swoop.

There are greasemonkey etc. scripts to do this, but they're tied to a single browser on a single machine. A global filter (like in gmail) would be so much more useful.

Would this be particularly hard to do?




I see a lot of people asking what happens when a group of people downvote a site just to ruin its ranking. Sure that's a problem, but there's an easy solution on Google's end: your blacklist only affects you. Yes, that means all of us have to hide efreedom ourselves. Doesn't seem like a problem to me...
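A per-user blacklist like this is almost trivial to express. A minimal sketch in Python (site names taken from the thread; the function and data are hypothetical, just to show the idea that the filter affects only your own results):

```python
# Minimal sketch of a per-user blacklist: each user's filter affects
# only their own results. Domains and helper names are hypothetical.
from urllib.parse import urlparse

def filter_results(results, blacklist):
    """Drop any result whose domain is on this user's blacklist."""
    return [r for r in results
            if urlparse(r).netloc not in blacklist]

results = [
    "http://stackoverflow.com/questions/123",
    "http://efreedom.com/Question/1-123",
    "http://wareseeker.com/free-foo",
]
my_blacklist = {"efreedom.com", "wareseeker.com"}
print(filter_results(results, my_blacklist))
# ['http://stackoverflow.com/questions/123']
```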

Plus, we are talking about a company whose core business demands that it can identify groups of bad-faith voters. Given time, they may find a way to incorporate this data safely into the ranking data (if anyone could, it would be Google).

And I know there are extensions to do this (mine mysteriously stopped working recently), but doing this on the client-side in a way that's bound to a single browser install just seems wrong to me, especially for Google.


I think a personal black-list would be ideal initially as those most motivated would be most helped, i.e. the majority of people who might not care about the status quo are then not impacted at all.

As mentioned above, introducing shared ranking via the social graph would then be the next logical step. It could be opt-in to ease adoption.

Then, ideally (and this is my personal 'white whale' problem), the user could whitelist through no action of their own, rather than having to do any work to block: i.e. use clicks on results as personal ranking upvotes.

There are some interesting engineering issues around per-user indexing, though. But hey, you wanted to work at Google, right?


The blacklist does not only have to affect me, just throw in the blacklists of my social graph too. These are the people I trust.
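The "throw in the blacklists of my social graph" step is just a set union. A sketch, with all the data made up for illustration:

```python
# Sketch: extend a personal blacklist with the blacklists of trusted
# people in your social graph (a plain set union). All data invented.
def effective_blacklist(own, trusted_friends):
    """Combine your blacklist with those of friends you trust."""
    combined = set(own)
    for friend_list in trusted_friends:
        combined |= set(friend_list)
    return combined

mine = {"efreedom.com"}
friends = [{"wareseeker.com"}, {"efreedom.com", "bigresource.com"}]
print(sorted(effective_blacklist(mine, friends)))
# ['bigresource.com', 'efreedom.com', 'wareseeker.com']
```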


Which brings up another point: I think that if anything is going to threaten Google in the coming years, it'll be the quality of their social graph. Gmail gives them a lot of data, but if my inbox is any indication a lot of it is somewhat ambiguous. Aside from that their social apps haven't done too well (in most places).

Facebook, on the other hand, has developed a system where nearly every user activity creates a new easily processed and meaningful connection between users or out to the web itself. And those connections are probably closer to representations of some kind of trust than "I email that person a lot".

Anyway, I'm not saying the sky is falling for Google, just that search appears to be changing for the first time in awhile.


I don't know about you, but my social graph is pretty diverse. I wouldn't trust all those very different people to make an important decision like "what sites should be visible to me in search engines."


You're right. Others should not be able to hide sites from my search results, but their decision on blacklisting sites can affect the ranking of sites in my search results.


Nobody said you had to trust everyone in your social graph. I definitely wouldn't, but that doesn't mean there aren't a few select people that I would trust.


"Yes, that means all of us have to hide efreedom ourselves. Doesn't seem like a problem to me..."

efreedom is monetised by Google ads. Might seem like a problem to Google.

Let's say it starts with personal blacklists. Then trusted lists that you can subscribe to (AdBlock-style). Then word spreads and enough people are using it such that AdSense revenue drops 20-30% or more?

(IME, CTR on ads is much higher on these content-light sites than it is on more reputable sites.)


To be honest, I think this is the reason Google doesn't have this feature. The sites everyone wants to blacklist are the spammers that game Google search and show Google ads. If they don't get traffic, they don't show ads. If they don't show ads, Google doesn't get that money either.

It's to Google's benefit that people end up on these pages, see a ton of ads, and then click on one out of confusion or desperation.


That's strange, though, because it should be clear to them that the long term effect of this is pretty dangerous for them - if their search results were always the best, then they would be largely unassailable in the search market, but as it is, they're decreasing satisfaction with their product. It's hard to improve everyone's perception of your product once that's happened, and it opens them up to usurpers.


It does, but Microsoft's thrown loads of money at the problem and not made huge inroads so maybe they figure it's worth the risk?


Yes, but that would be pretty evil of them, don't you think? ;)


This is exactly what I meant. Lots of people have added the idea about crowd-sourced re-ranking based on blacklisting, then said "it'll never work..."


It seems to be a particular affliction among our group (by that I mean anyone who spends a lot of time here) that once we learn to apply one revolutionary/disruptive idea, we can't stop even when we probably should. I'd say it's because we're used to trying to think of how to scale-up every idea, but I've been accused of armchair-psychology before... :)


We can also look to the AdBlock extension. They have prefilled lists that you can customize, so it's a "shared ecosystem" that you subscribe to, then customize yourself.


I think this could be modeled after Gmail's social spam filter.


It should be pretty easy to set up public block lists. The ones that are honest about their methodology (of which sites are spammy) would win.


No, it's not particularly hard, but it will make the problem worse.

Why?

99% of users are non-tech oriented.

Those users will not really be aware of the specific problems with the search results, they won't understand the concept of a good vs bad result and they certainly won't bother to tweak/ban/filter their results.

The 1% that do care and are currently being vocal about it will start filtering their results and they will perceive that the problem is solved. They will stop making a fuss.

So now, the complaints have gone away, but 99% of users are still using the broken system, so the good sites that create good original content are still ranking below the scrapers and spam results for 99% of the users.

The problem must be solved for all (or at least the majority) of users.

(And you can't take the 1%s filtering and apply it to all users in some kind of social search because the spammers will just join the 1% and game the system)


The problem must be solved for all (or at least the majority) of users.

Perfection is the enemy of good enough, and a common, valued, and time-honored mechanism for delaying product shipment.

And Google might well be able to utilize information from that 1% of users who have sorted that out - 1% of a Really Big Number of searches, after filtering out the folks looking to game the search results (downward, in this case) - as feedback into their search rankings.


This goes back to the traditional engineering parable: is it better to create a million-dollar car that gets 900 mpg (gasoline), or a $5 widget that adds another 5 mpg to every car?


>99% of users are non-tech oriented.

I disagree. Let's call it 95%.

>Those users will not really be aware of the specific problems with the search results, they won't understand the concept of a good vs bad result and they certainly won't bother to tweak/ban/filter their results.

So have only people that have enabled the advanced features of Google search ban sites. All of a sudden only people that "get it" are the ones that can ban.

>So now, the complaints have gone away, but 99% of users are still using the broken system, so the good sites that create good original content are still ranking below the scrapers and spam results for 99% of the users.

So we need to use the votes to stop the spammers.

>(And you can't take the 1%s filtering and apply it to all users in some kind of social search because the spammers will just join the 1% and game the system)

Sure you can. If you couldn't, then Reddit would be a wasteland of ads, but it isn't. They have only 4 or 5 engineers and can still write code that stops vote rings; Google certainly could too.

Stopping vote rings is actually a pretty simple exercise, unless the anti-vote-ring code is open-sourced, and even then it should be possible.


Do you think 99% of users are too stupid to click 'report spam' when they get a spam email?


Not "too stupid" no.

I think 99% of email users have not been adequately trained in why or how they should report spam, and even if they were I think most of them would still not care enough to actually do it with any regularity.

When pushed many may acknowledge that they know it exists, they will probably even be able to find the button when asked if given a chance. But they won't remember to do it when they see spam, they'll just ignore it and move on to the messages from people they know.


Do you think this could be improved by alternate wording? Instead of reporting 'spam', ask the user 'was this useful?' or 'did you want this?'.

After all, the real goal is giving people a better, more relevant experience, detecting and removing spam is just one facet of that. Whether it's email or search.


In general yes, I think improved UI can guide user behaviour to more desirable outcomes.

However, I don't think it's as simple as changing the button text. Even a process-driven UI, like a wizard-style interface where you have to click next through each step, might work, but users very quickly become immune to dialogs. They don't read them; they just evolve the actions that get them to their goal the fastest, and the user's goal does not include reporting spam.


Maybe something more automatic: a user makes a search and clicks on a link; if it is spam, he'll probably hit the back button within a couple of seconds. The average time a user spends on a result before going back to Google could be used as a metric of quality.
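That dwell-time idea can be sketched very simply: average the seconds a user spends on each domain before bouncing back to the search page. Everything here (field names, data, the idea that this is how it would be computed) is hypothetical:

```python
# Sketch of the dwell-time signal: average seconds spent on a result
# before returning to the search page, per domain, as a crude quality
# metric. Log format and numbers are invented for illustration.
from collections import defaultdict

def avg_dwell_time(click_log):
    """click_log: list of (domain, seconds_before_return) pairs."""
    totals = defaultdict(lambda: [0.0, 0])
    for domain, seconds in click_log:
        totals[domain][0] += seconds
        totals[domain][1] += 1
    return {d: s / n for d, (s, n) in totals.items()}

log = [("efreedom.com", 3), ("efreedom.com", 2),
       ("stackoverflow.com", 95), ("stackoverflow.com", 120)]
scores = avg_dwell_time(log)
print(scores["efreedom.com"], scores["stackoverflow.com"])
# 2.5 107.5
```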


I'd be very surprised if Google does not already implement this in some form.


I don't think changing the wording will solve the problem entirely. On the rare occasion that my wife receives spam via gmail she'll simply archive it. The action of hiding an email you never did or no longer do care about is so common, and so easily accessed by a well-learned keyboard shortcut, that it doesn't occur to her to do otherwise until after the fact; she is well aware that marking spam improves spam filtering, of course.


I just told my wife: think of that button as being "Report Abuse".

She's been applying it ever since quite efficiently.


Oh, for sure. My wife knows what the button _does_ down into the very statistical application of it. I think in her case the issue is a UX problem: it is much easier to archive than it is to report spam. Gmail also makes it far more common to archive, as well.

Shift+e for "report spam" would be perfect. Same key, just a bit chorded.


People would always be trying to game such a system. If you worked at company A, and search for keyword shared with company B, lots of people would be tempted to down vote company B in results


Not all spam is so obvious, both in email and search results. These are all assumptions, but I think it's safe to say the average person identifies the email 'Grow your Pe Nis like a Woman' (true story) as spam. However, the less obvious 'Enter to win a trip to Hawaii' still fools many people. Think about how effective the 'I am a Nigerian Prince that needs help out of my country and then I will give you 1.2 million dollars' scam has been over the years.

With search results, the spam is more often than not Made for AdSense sites that the average user doesn't realize are pure garbage. Then there are the mass-produced content sites like eHow that most technical people realize are worthless, but the average user loves. It isn't often you see Viagra sites popping up in searches for woodworking. It does happen occasionally though.

So no, I am pretty confident a majority of users would not use a feature like that effectively.


Frankly, I'm not fussed whether the majority of users would use it right.

I posted this as an entirely self-serving request for a way to make my results cleaner.

Also: If the majority of users do like eHow, then that's a sign it's not spam. But a lot of 'power users' consider it spam. This is an argument FOR personalised blacklists, not against them.


Exactly. The point is that users who consider eHow useful, don't filter it, and users who do, do. Let the user decide instead of doing it for them.


That's the point. It might make your results clearer, but it doesn't solve the problem for good original content producers which is the real issue here, not the individual.


You report spam when you know it's spam.

And when it's about search results, people are browsing and clicking through AdSense-filled "landing page sites". Most of them think it's their own fault that they couldn't find the thing they were searching for.


I think you are ignoring a very obvious fact: those 1% who are no longer 'complaining' are really still doing so, just in an automated fashion via their blacklists. Further, I think the feedback will be even better, because people are more inclined to fix a problem if it is easy to do so (adding a site to their blacklist) than to send an email saying "please remove this from my search results" (and hope it gets removed in the future), or to use an ad-hoc in-browser solution that gives no feedback to Google.

What I think really needs to be exploited is a ring-of-trust type aspect. I'd like to have a Hacker News ring where all of us on here work together to remove the spam from our results and let Google see what we're taking out; maybe that would help them improve their algorithms.


Google already has reading level data per domain (append &tbs=rl:1 to any site search)

Why not apply that reading-level algorithm to users' Gmail data and public social network profiles, estimate each user's IQ, and then give those at the top of the pile "result burying" moderator privileges?

Confirmed user accounts (cell phone verification), combined with other algorithms such as profile age and activity, could make spamming sufficiently complex to disincentivize all but the most illicit spammers.

Users at the bottom of the IQ pile (non-logged-in users, based on past search data and geo-located socio-economic status) don't even get the option to bury results - a group which, by the way, I think is more like 20% of US internet users than 99%.


That's not true. Google already has a Spam button in Gmail. The same idea can be applied here.


Yes that would be good. They could then look at the number of people blocking certain domains and de-weight them in the global results.

Traditionally Google seems against human-powered editing (as this would be), but with the black-hat SEOs running rings around them, it's badly needed.


An extremely easy way for a bunch of people to get together and destroy someone's ranking? That doesn't sound like such a good idea.


4chan would go bananas.


If this wasn't at all preventable, the idea of AdWords would never work; people would just run up their competitors' costs.


I don't know how many people get away with it, but that sort of thing definitely happens. Google claims they filter all the bogus clicks out and you aren't charged for them, but you kinda have to take their word for it since there's virtually no way to verify.


Given that Google gets hundreds of millions of searches and visitors, you'd need hundreds of thousands of down-votes to get a site blacklisted. No black hat can really do that (create 100K accounts/IPs to stay under Google's radar and down-vote the website).


Hundreds of millions of searches and visitors in aggregate: yes.

Hundreds of millions of searches and visitors in any one keyword niche: not so often.

Many websites live off a handful of visitors a day coming from a few core keywords and associated long-tail traffic. For a keyword that only gets 100 searches a day, it wouldn't take many down-votes to affect the rankings of the relevant sites.


For a keyword that only gets 100 searches a day, it wouldn't take many down-votes to affect the rankings of the relevant sites.

Why do you assume a flawed implementation?

Naturally there would be thresholds. There is no reason to devalue a site that's only displayed in 100 search-results/day at all.

The sites we want to hit are orders of magnitude worse at polluting the results. We're talking about the Mahalos and expert sexchanges of the world.
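The threshold idea is easy to make concrete: ignore block votes for a domain until it both appears in enough results and collects enough votes, then demote in proportion to the share of viewers who blocked it. A sketch with entirely invented numbers and names:

```python
# Sketch of a thresholded down-vote signal: votes for a domain have no
# effect until minimum impression/vote thresholds are met. All numbers
# are invented for illustration.
def demotion_factor(impressions, block_votes,
                    min_impressions=10_000, min_votes=500):
    """Return a multiplier in (0, 1] applied to the domain's rank score."""
    if impressions < min_impressions or block_votes < min_votes:
        return 1.0  # too little data: leave the ranking untouched
    # Demote in proportion to the share of viewers who blocked it.
    return max(0.1, 1.0 - block_votes / impressions)

print(demotion_factor(100, 90))             # 1.0 (niche keyword: no effect)
print(demotion_factor(1_000_000, 300_000))  # 0.7
```

This is exactly why a low-volume keyword (100 searches a day) would never clear the thresholds, so vote brigades can't touch small sites.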


Is experts-exchange really a spam site? I always thought they had real users who ask and answer questions, and that they were just a crap site that hides this behind a paywall (I'm aware the actual content is at the bottom of the page).


I don't know their current methodology, but they got a bad rep because they used usenet postings to answer questions. You could find their answers by searching usenet groups. So they were setting themselves up as a Q&A service, when really they had searched groups for questions and used the answers as though the answers were coming from themselves.

Basically they were leveraging off other people's work and charging for it.


That sounds like trying to excuse junkmail and spam because someone out there finds it useful and orders the products being offered.


I'm not trying to do that. I'm just arguing that classifying it as spam is not fair. I remember (before SO) finding some useful info there from time to time. Whereas junk mail trying to sell me a bride from Russia for 50€ is definitely a hoax :)


Google seems to like building systems that automatically scale across all keywords and niches.

For example, for a low competition keyword, 1 link can make a big difference to a site's rankings. For a high-competition keyword it takes many many more links to change anything. It's the same algorithm affecting the rankings in both cases, just in the latter case there's a lot more data being used as input.

That's the way they seem to do things, so I'm guessing there's a fair chance they'd take a similar approach to the influence of down-votes on rankings, if they went down that road.

There'd have to have some sorts of thresholds, but there'll always be points above which the algorithm can be gamed, just like pretty much every other aspect of their ranking algorithm can be gamed.

Right now, in a low-volume-keyword niche, a sleazy operator can kill the rankings of competing sites pretty easily by buying them a few dirty links and letting Google's algorithm do the rest. If a site hasn't got lots of quality links pointing to it, and let's face it, the vast majority of sites don't, it's pretty easy for someone to kill its rankings.

A lot of people say that it's impossible to affect the rankings of someone else's site, but that's simply not true. You can't easily affect the rankings of a big established site with lots of good links, but a little small-business site? The reality is it's pretty easy. I don't think Google have any interest in quashing that particular myth, because the reality is actually kind of scary for small business operating on the web.

(Apologies if I'm going off on a tangent there. You are absolutely right in pointing out that I was assuming a flawed implementation.)


    Google seems to like building systems that automatically
    scale across all keywords and niches
They haven't been doing a good job lately.

    For a high-competition keyword it takes many many more 
    links to change anything.
Not that many, and do you think that's a problem?

Search "viagra": http://www.google.com/search?q=viagra

This is the third result (after wikipedia + viagra.com which have a hardcoded boost): http://www.genericviagrarx.net/

Here's a sample of links to it: http://www.google.com/search?q=link:genericviagrarx.net

I.e. there are thousands of forums/blogs where you can post your link (even in a proper context; the more popular the subject the better). Hire a couple of offshore workers for $5/hour, and in a week you'll have thousands of links pointing to you.


Google's link operator is not reliable - it intentionally hides links to confuse SEOs. Take a site you own, for example, and compare the number of links that webmaster tools shows for it with the number of links that Google's link operator shows for it.

That site you point to has many more links than Google shows. You can check yahoo to get a more realistic count:

http://siteexplorer.search.yahoo.com/search?p=www.genericvia...

But even that is probably only indexing a small fraction of the links that are likely to be pointing to that site. Generally when a site has a few thousand spammy links, they've usually got tens of thousands more that haven't been indexed.

Now I'm not for a second saying that that site should be ranking where it is, just because it has a lot of links. I really wish Google did a better job of discounting spammy links.


Most botnets would easily have more than that number of computers.

So yeah it can be done.


It would also be a significant undertaking if the down-votes are tied to google accounts.

Not only do you have to bypass the captcha, but also the heuristics that google could (does?) employ to detect fake accounts.

If the votes were weighted by account activity then the spammers would not only have to sign up and vote; they would also have to keep all these accounts busy with fake queries or other means.


Exactly. Google accounts have so many uses (gmail, youtube, google code, google docs, picasa...), it would be easy to detect how "active" / "real" an account is.

Of course, anything can be gamed... But it's better than nothing.

If I were to create a web-app with user-generated content, I would try to create a "trust" system based on a "reputation" / "real person" metric. It may come from his StackOverflow / reddit / HN reputation, his Google Account activity, Facebook activity...

Before buying a product based on a review I read on a forum, I always look at the author's post count / history / reputation. That's why I like HN / Reddit and their karma systems. We need to automate that.
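The "trust" / "real person" weighting suggested above can be sketched as a simple multiplier on each vote, built from signals like account age and breadth of product use. The formula and field names here are entirely hypothetical:

```python
# Sketch of reputation-weighted spam votes: a vote counts more when
# the account looks like a long-lived, active, real person. Formula
# and signals are invented for illustration.
def vote_weight(account_age_days, products_used, activity_score):
    """Scale a vote by crude 'realness' signals, each capped at 1.0."""
    age = min(account_age_days / 365.0, 1.0)   # up to one year counts
    breadth = min(products_used / 5.0, 1.0)    # gmail, docs, youtube...
    activity = min(activity_score, 1.0)
    return age * breadth * activity

fresh_bot = vote_weight(account_age_days=3, products_used=1, activity_score=0.1)
old_timer = vote_weight(account_age_days=1500, products_used=5, activity_score=0.9)
print(round(fresh_bot, 4), round(old_timer, 2))
# 0.0002 0.9
```

A freshly farmed account's vote is worth almost nothing, so buying accounts in bulk buys you very little influence.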


Gmail accounts and captcha-solving services are cheap compared to the profits you could make from getting your site ranked #1 for a profitable search keyword.


Not necessarily, when you factor in that you may need tens of thousands of accounts to suppress one popular site - and in a profitable niche you're likely competing with multiple sites.

Keeping all those accounts active, cycling out dead accounts etc. could take quite an ongoing infrastructure investment.

Moreover google could give significantly more weight to accounts that look very real (i.e. actively using multiple google products for a long time) and thereby further devalue the botnet impact.

It's yet another arms race of course. But one with fairly good odds I would think.


Not really, you just have to steal the account of the person who owns the computer.

If you have a botnet, you shouldn't have much trouble doing that.


Sounds like a perfect challenge for 4chan


Perhaps, but I'm sure any legitimate site could contact Google and resolve the issue...


Google doesn't like to be contacted. Unless you're a (high) paying customer.


Sounds like a support headache. I'd love the feature but I don't blame Google for not just rushing out and adding this.


True, but as it stands, your ranking can be destroyed by not playing the black-hat SEO game when your competitors are, allowing their crappy spam-filled sites to outrank you. Swings and roundabouts.


Anyone have any ideas why the clone sites like efreedom are ranking above Stack Overflow when SO's inbound links and reputation values are likely far better than efreedom's in Google's algorithm? I'm surprised that search engine optimization could do THAT much to a site's ranking. Also it's not like SO isn't doing the same kind of SEO themselves.

What I'm trying to get at is, with all things equal, let's say Stack Overflow and efreedom's SEO is on par with each other, shouldn't SO's reputation/inbound link ranks automatically trump things?


My understanding is that the clones take the material and modify it to have exact matches for phrases that people are likely to search for. The exact match causes it to rank higher for those searches.

SO is not editing the material for SEO, they just have whatever content the users generated.


Efreedom is deliberately manufacturing links, whereas SO may be just hoping for the best.


Efreedom has been the bane of my recent programming related searches, polluting my results to no end.


Do what I've taken to doing nowadays: for programming questions, go straight to stackoverflow.com before Google. Or use Google with site:stackoverflow.com.


Or just don't use google anymore. I've switched to Duckduckgo and Blekko. So far, so good.


Or use stacksearch.org (to include serverfault et al.), or the custom search built into stackexchange (which uses Google behind the scenes as well).

stackoverflow has been my first port of call for programming queries for quite some time. If Google wasn't so filled with scraper junk that probably wouldn't be the case.


Just use this chrome extension, it redirects them all to the relevant SO page.

https://chrome.google.com/extensions/detail/gledhololmniapej...


Shouldn't the sites hosting deliberately manufactured links also fall foul of the algorithm's reputation/inbound-link values? We all know they don't, and that seems like one of the fundamental issues here. As much as I'd like to have a site blacklist, it doesn't really help the major issue at hand.


This would only work if 1) it was sufficiently painful to put in a block for your searches and 2) this had no effect on global results.

1) - It doesn't have to be extremely painful, just painful enough, such that true loathing is needed as motivation. This way, we filter out frivolous decisions. A few seconds pause would be enough.

2) - We need to let the reduced ad revenue do the job for us through the market. Anything else will be gamed much to everyone's detriment. Just empower people to remove the annoyance, and let the money do its thing.


Re 2, yes. A lot of people have reacted as if I suggested letting people affect everybody else's results. I'm not convinced that's possible to do safely, I'd be happy just to see it for my own results.

Re 1, painful? WTF? The whole point is to make it quick and usable. I can already blacklist sites the painful way, by adding them to a Google Custom Search page. The whole point is I'd like a quick add-to-killfile button, like email clients have had for decades.


Yes, #1 - very slightly painful. Not terribly painful, but just a pinch. Maybe something like a 2 second wait. It has to cost more than an instant and doing that also makes it a bit more of a pain to game with scripts.


Google does provide this service: it's called Google Custom Search. You can prioritize or blacklist sites and it's pretty easy to add it to your browser searchbar. I don't always use it, but I'll switch to it when I encounter a spammy topic, usually dev-related searches.

http://radleymarx.com/blog/better-search-results/


Not so fast.

CSE wants you to list the sites you want to search. Of course, you can't default to '*' or '*.*'. They even state that '*.com' and '*.org' etc. won't return any results. That's unacceptable. Secondly, even if you could configure it meaningfully, it seems pretty hard to configure your browser's search bar to use this CSE instead.

And that's what I think most people use for searching. At least I do.

Facebook got it right this time: with each post, there's an option to hide that post, that person, that application, or that site which posted the post. One click that means "don't show stuff from them anymore": that's what Google needs, too.


I made my custom search secondary.

By this I mean I added it to my browsers, but I still use regular Google search daily. If the results are laden with bogus sites, then I switch over and start again, weeding if necessary.

Initially I thought I'd use GCS all the time, but it lacks the Google menu (Images, Maps, etc) which comes in handy more often than I expected. I use GCS most for code/development related searches.


If you leave the included sites empty, and only supply excluded sites, then it searches everything and excludes those particular ones.


It did require me to fill in the included sites :-(


You have to add one to get started, then you can delete it.

Business rule bug...


I can't believe this isn't the most popular comment. It kinda makes HN look like a bit of a knowledge vacuum, with all these recent discussions of how to ban results when a Google service that's existed for nearly 5 years can do the job. The format of the results isn't quite as nice as normal Google search, though.


Err, I mentioned this well before the comment above:

http://news.ycombinator.com/item?id=2075437

It's not so much that it's a knowledge vacuum, just that someone didn't read the whole thread before replying.


I'm on the "wrong" side of the Flash/HTML5 debate, so my average post value is really low. It wasn't a big deal until the HN algorithm got tweaked recently.


Agree with the sibling poster - this is the best comment in the thread. Do you have any interest in sharing your hard work with us?

If you go to the CSE website and select 'Advanced' and then download annotations, you can export the list of sites you've excluded.

Further you can make the exclusion list ("annotation list") into a feed - so it is entirely possible to implement the kind of user-generated blacklist of sites which has been discussed here.


GCS is rather personalized, which makes sense. For example, I don't want experts-exchange showing up, but some people have paid for their service and want it. I'm also a Flash developer, so my list probably won't be useful for most HN readers.

GCS is really easy to set up - takes only a few minutes. I spent the most time hunting down rogue sites - which was actually kinda fun and cathartic.

Big tip: keep an easy-to-get-to link to the GCS Sites Control panel, so it's easy to add new sites. I've added ~40 more in the past two months.


It's definitely easy to set up. I'm thinking about making a greasemonkey script to put a button next to my search results and let me block the site if I hate it, rather than go do it manually as you suggested (which is easy but takes more time than hitting a button).

Alternatively I will stick to my plan to change search providers. One way or another.


Gmail already does it, and the global system uses an algorithm that looks at reported spam in order to automatically move future emails from that party to the spam folder, not just for the person who reported it, but for everyone.

If they're not looking into integrating that nicely into the existing search results page (not a separate form that the average user will never find or use), especially after all the internet chatter about it recently, then they definitely should make that a top priority in 2011. I definitely don't want them to do a rush job on it though. I don't want competitors to start reporting each other as spam in search results to try and game the system even further. I'm assuming they have anti-gaming measures in place for Gmail, so they won't be completely starting that from scratch...


I don't see how you could anti-game this, the SEOs would just use mechanical turk to hire 100s of people (with valid Google Accounts) to do the reporting.

At best G could use the information as a list of potential spammers and filter domains manually, but I really can't see this being automated without giving the SEOs another weapon.


I don't think anyone wants to filter sites that could be gamed with "100s of votes". We want to filter sites that will require tens of thousands of votes to get rid of.


And even then it should consider the votes themselves to be suspect and watch for blocks of users who only vote in unison (qualitatively and temporally).


Google were experimenting with voting on results: http://techcrunch.com/2007/11/28/straight-out-of-left-field-...

Also there is this form for reporting spam sites: https://www.google.com/webmasters/tools/spamreport

Integrating the above into standard search results would be difficult unless it was restricted to users with good "karma". That might be possible in our increasingly socially networked world.


You could also restrict it to people who buy and sell on Google Checkout. Putting real money/goods on the table tends to weed out the fake accounts.


The thing is, the SO scrapers like efreedom aren't spam, strictly speaking. It's just that they clone existing content without adding value, and as such are just noise in the results.

Perhaps we need to frame the discussion differently, considering what the searcher wants, rather than "spam-free hits".


That was my point really. I don't want to see eFreedom hits, I consider them spammy, so I'd like to be able to click-ban them from my results.

If Google use that information to gradually adjust their ranking overall, then fair enough -- won't affect me, I can't see them anyway.

EDIT: Even if they don't let that affect everyone else's results (because of gaming), then I still don't care, I still don't see the crap in my results ever again.


Also, I would like '[any widget] review' to take me to an actual review, not pages upon pages of spam. I usually end up looking at comments on a few trusted sites (e.g. Amazon). This seems broken...


Yes, most of the results for this query wind up pointing to pages saying "be the first to review [any widget]".

As a workaround, try searching for "[any widget] sucks" and "[any widget] good".

EDIT: tying this to other discussions on the topic, it's a symptom of Patio11's observation that natural language search doesn't work very well. If you want to find something, you need to paint a picture of what it looks like, rather than asking a question about it.


I've noticed for years that "[product] hosed" brings up good results on how to work around bugs in various software products.


automated version of this: http://www.sucks-rocks.com/


I think the worst culprits are the ones that skim StackOverflow questions and rehash them into their own supposed original "question and answer" site.


Jeff Atwood discusses this in his most recent Coding Horror blog post (http://www.codinghorror.com/blog/2011/01/trouble-in-the-hous...), in which he states that he doesn't really mind people copying StackOverflow questions (and answers), but that he does mind that the copies get a higher Google ranking than the original.


How does that happen though? efreedom must be doing something special with SEO to get higher than SO, given that SO must have a huge PageRank score.


PageRank is based on the number and quality of links.

Nobody links to an SO question, except perhaps to mention it in a blog post. Scraping sites create lots of links to the question, even if they are low-quality ones.


Yeah but does anyone link to efreedom question pages? And SO as a whole must have way more inbounds than efreedom, surely?


What do you think most StackOverflow answers are? It's a karma-paid labor pool where you can post questions and a lot of under-employed people will rush out and do the necessary Google searches, collating and slightly rewriting the results to yield the most votes.

Everyone is ripping off someone's content.

And just to be accurate here, SO content is creative commons (created by the community). Are those just cheap words?


What do you think most StackOverflow answers are? It's a karma-paid labor pool ... said the comment on Hacker News, earning the poster 3 points so far.


2 points. I started at one. I desperately hope for more, though, as this is going to give me my big break. I'm trying to earn six more trophies, a ribbon, and link this profile on my resume.


Except that there are some genuinely useful compilations on SO which you don't get elsewhere, e.g.: http://stackoverflow.com/questions/72394/what-should-a-devel...

These add significant value to the original content IMO


Of course there are. It is a community and discussions can be interesting.

However, most of the time that I and most people come across SO, it's as the top search hit for a technical question. The answers are seldom unique; they are usually pulled almost word for word from the relevant online manual.

That is overwhelmingly the value of SO -- it benefits by collecting aggregated content from all over the web. It is, hilariously, by design exactly what many in here are complaining about. In that case instead of an automated process it's a mechanical turk.

Hey I don't care because it gets me my answer, but I can see the hilarious paradox in some of the complaints.

I was once searching on a problem -- one that I had faced years earlier but had then forgotten -- and the top link was a SO QA. The answer sounded oddly familiar, when I remembered that it was actually from a blog entry I had posted two years prior.


stackoverflow consistently provides high quality answers though. If someone was to come along, take the creative commons material and present it in a way that actually improved my experience of finding the answer I was after I'd be all for it.

In my experience though the sites that are taking the content are ad ridden messes which remove value rather than add anything.


Totally agree with that. There is an opportunity for someone to take the content and actually improve on it somehow (e.g. an expert engine, or better machine-driven ontological navigation, etc.); so far, though, it's just been SEO layered on top of it.


" It's a karma-paid labor pool where you can post questions and a lot of under-employed people will rush out and do the necessary Google searches"

Maybe it's like that in your field, but in Mac dev questions you're fairly likely to get answers from established OS X developers, and even Apple employees.


It's not the answers I'm bothered about, it's just how they spam my search feed. I whack in some queries to google and the first 10 results will just be the same SO question but spread over multiple different sites.


As programmers, our typical complaints are about sites that bog us down in common searches (expert's exchange, stackoverflow scrapers, etc.).

What I found interesting: I was doing a search on something I normally have no interest in (a sewing machine manual for my wife) and I was amazed by the level of spam I was encountering.

We have no idea how bad the problem is for others whose topics we do not usually see. The web is far more full of spam than we even realize.


Proof that true AI is a long way off?

If the best and brightest (arguably) on the planet can't figure out how to filter search spam with algorithms, what makes us think we can mimic true human intelligence any time soon? (I think it will happen, just not as soon as some claim.)


Or maybe it just means that the day we can algorithmically emulate human intelligence is the day spam becomes useful (or the day we get impossibly good at fooling ourselves).


We are already able to (badly) mimic true human intelligence. The problem is when our mimics compete with humans in general contexts. There, being smarter means you can cream the competition.

Basically, it's the difference between PvP and PvE.


That will never happen, if they ever did that it would be an admission that there is something inherently wrong with their algorithm. They won't do it.


Nah. They let people 'report spam' in gmail -- they're happy to admit that algorithm doesn't always guess right. Also there's already a "give us feedback" mechanism for reporting bad search results, it's just too slow and manual.


People are used to spam buttons, and might read the absence of a "report spam" option as a sign that Google thinks its filtering needs no correction - which could mean it's marking too many things as spam without users being able to tell.

People are also a lot less tolerant of spam in their inbox than they are of irrelevant search results.


On the contrary. As suggested by another commenter, it could be used to improve their algorithm. Greater number of exclusions would reduce their relevance.


I'd definitely make use of this feature. Some ancillary features might include:

a) Google could warn you if it thinks the sites you have blacklisted seemed to have regained credibility.

b) Google could suggest additional sites you may wish to blacklist, based on other user blacklists.

c) Google could allow outside parties to curate blacklists.

d) Google could list the most commonly black-listed sites publicly. For the webmasters that find themselves listed who want to run an actual honest business, this is a good sign they should change their tactics. For the folks that aim to spam and profit... well screw those guys.


Maybe this could be implemented in the way of sticky search operators?

So for example, I could define -site:efreedom.com as an operator to be applied silently for every search I make.
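A minimal sketch of what a "sticky operator" could look like if applied client-side (the blacklisted domains here are just the examples from this thread, and the function name is hypothetical):

```python
# Sketch: silently append a -site: exclusion for each blacklisted domain
# to every query before it is sent off. Domains are illustrative.
BLACKLIST = ["efreedom.com", "wareseeker.com"]

def apply_sticky_operators(query, blacklist=BLACKLIST):
    """Return the query with a -site: operator per blacklisted domain."""
    operators = " ".join("-site:" + domain for domain in blacklist)
    return (query + " " + operators).strip()

print(apply_sticky_operators("python list comprehension"))
# -> python list comprehension -site:efreedom.com -site:wareseeker.com
```

The point of making it "sticky" is that the list is stored once (in the account) and every query gets it for free, instead of typing the operators by hand.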


How many gmail accounts do we need to band together to lower the rank of stack overflow against our super-duper question-and-answer site QandAdsWithMe.annoying.com?


An excellent point. Blacklisting works both ways, and there's nothing stopping spammers from creating hundreds, maybe thousands, or even more, of throwaway Google accounts just to blacklist the original site. Sure, the logged in Google users wouldn't see the spam site (if they blacklisted it), but it would still appear, and outrank the original site, in standard search results for those that aren't logged in.


So don't use the blacklisting stats for re-ranking everyone else's then. I didn't consider that when I put the post up. An instant way to filter sites just for yourself would solve 90% of people's complaints straight away.


Looks like a lot of people are assuming a solution would be some sort of voting system like stackexchange, etc.

Why not allow individual users to hide sites from their own search results and save the info in their google account? For example, provide a "hide this site from my results" link next to each result. Each person decides which site they don't want to see and SEO and global results remain unaffected.
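The per-user mechanics described above are simple enough to sketch. This is purely illustrative (the data structures and names are assumptions, not anything Google actually does): each account keeps its own hidden-site set, and filtering happens only for that user, so global ranking is untouched.

```python
# Sketch of per-account hiding: a user's blacklist affects only their
# own results. All names and data here are hypothetical.
user_blacklists = {
    "alice": {"efreedom.com", "wareseeker.com"},
}

def hide_site(user, domain):
    """The 'hide this site from my results' link would call this."""
    user_blacklists.setdefault(user, set()).add(domain)

def filter_results(user, results):
    """Drop results whose domain this user has hidden; others see everything."""
    hidden = user_blacklists.get(user, set())
    return [r for r in results if r["domain"] not in hidden]

results = [{"domain": "stackoverflow.com"}, {"domain": "efreedom.com"}]
print(filter_results("alice", results))  # efreedom hidden for alice
print(filter_results("bob", results))    # bob hid nothing, sees both
```

Since nothing here feeds back into anyone else's ranking, there is nothing for SEOs to game.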


I feel like I'm taking crazy pills. Am I the only one that remembers this EXACT feature on Google about a year ago? You had to be logged in to iGoogle, and each search result had a small [X] to the right of it that would appear on hover.

If you clicked it, that result wouldn't appear for you again. I used it all the time.

Then, lately it's gone. Maybe I was part of a small, randomly-selected test group?


That was part of Google's SearchWiki experiment. http://googleblog.blogspot.com/2008/11/searchwiki-make-searc...

That experiment was replaced by Google "Stars" in March 2010 because, according to Google:

> In our testing, we learned that people really liked the idea of marking a website for future reference, but they didn't like changing the order of Google's organic search results.

http://googleblog.blogspot.com/2010/03/stars-make-search-mor...

I personally think there is much more going on here than Google admits.


Wasn't this a problem Google Search Wiki tried to solve?

http://googleblog.blogspot.com/2008/11/searchwiki-make-searc...


Yeah. I miss searchwiki. :-(


I'm not sure how this would be implemented. Where would the blacklist be held, and how would it influence the search results? I know that they already do a lot of search customization, but most of it is just aggregate statistical computation: it's not that they return results specifically tailored to you, but more like results tailored to a very fuzzy average version of you.

A blacklist seems way too specific to each user to be susceptible to meaningful aggregate statistical operations. Google's spam filtering is so good precisely because each user contributes something and everyone benefits; I don't see that happening with blacklists. I think to make it worthwhile they would need to figure out how to feed the information from blacklists into providing more meaningful results for everyone.


That's the point, what I want is exactly a user-specific blacklist.

I can even do that already with Google's Custom Search, all that's missing is a little 'block this site' button. Instead I have to go and configure Custom Search manually for each URL mask.


You could write a little GreaseMonkey script or extension to do that, shouldn't be too hard.


> how would it influence the search results?

Presumably it would do exactly the same thing as '-site:foobar.com'.


How about decentralizing the search page? Hear me out for a bit.

My theory is that these complaints are coming from specific interest groups, not the general public. For example, spammy-content is created and targeted at a developer/programmer audience, and that is the source of some of these complaints.

So my suggestion is Google should platformize their search; and give out dedicated search instances to specific communities. The community should have enough levers to govern/influence what is spam or not. In addition, the community can promote certain high-value resources, which are otherwise unfairly listed in search results. Invite some high-profile communities for a test-run, and let the communities make their own choices.

The public Google can still handle the general public. This can also bring in some transparency in the way spam is determined.


Here is a conspiracy theory for you guys.

1. How does Google make money? Search Ads.

2. How do people click on search ads? Bad real search results.


In the interim, you can do your searches by adding -wareseeker -efreedom to the search string.


I've discovered you can also set up a Custom Search Engine, with no included sites (default to everything), and specifically exclude the sites you don't want. Then do all your searches through this.

http://www.google.com/cse/

Usability-wise, though, it's not nearly as much use as a 'ban' button next to each result would be. But it shows Google already have the infrastructure and code that would allow this -- they just need to make it instant to use.

EDIT: The other downside of this is you lose a load of bells & whistles, e.g. previews, "pages from the UK" (without typing), icons for images/news/etc. Time will tell if I miss those.


Yes, it works and it's completely free. Here's an example Custom Search Engine I set up:

http://www.antimoon.com/ce/

It includes only sites which are known to use (mostly) good English. It's designed for English learners/teachers who want to find correct usage examples without risking exposure to Yahoo-Answers-style English.


But it's not free, right?


From the signup page:

"Standard edition - ads are required on search pages"

There's an ad-free premium version as well, but you definitely can get it for free.


It's free if you show ads on the pages. It costs money if you don't.


Basic ad-supported version is free.


It is free.


Didn't Google have downvotes for results - shouldn't they be sufficient to achieve the result you want? Presumably Google would learn that you consistently downvote wareseeker and exclude it from results in the future.

I haven't used it because I don't want Google to remember my search history. But if you are willing to stay logged into Google (which would be required for your proposal), it would not be an issue.


It seems that it might be more helpful to whitelist sites. The web grows too quickly, and the mass of spam sites overwhelmingly so. If I had some way to blacklist sites, I'd end up spending a lot of time doing so. In fact, it could quickly take up most of my search time.

If, though, we could whitelist sites, it seems that results would get cleaner faster. I don't care about how many bad sites are out there, as long as helpful sites make it to the top. Plus, I typically use just a few sites to access reliable information anyway (the number's about 7, right?), so if I can whitelist results from those sites, I'll probably find my desired content more quickly.

What about the case when there are 30 spam sites listed before 1 good site? That hasn't happened too often for me. Instead, the results I'm looking for are usually just 4 or 5 spots down the front page, and very occasionally on the second page.

White listing seems like it would still be faster and easier for now.


For those wanting Google to put a penalty on the sites who are banned/removed from the user's view, what's to stop someone from gaming that system via Mech. Turk (or some other way)? Just pay people $0.12 to open gmail accounts and ban a competitor or whatever.

That's the only negative I can think of - other than that, I say bring it!


I'd ban eHow.


And mahalo.


And ExpertsExchange and About.com


I don't get the hating on About.com. There are sub-sites there that I actually read intentionally, e.g., http://heavymetal.about.com/

This sure looks to me like real, original content.

EDIT: how about the courtesy of an explanation for the downvote?


About.com's content is decent sometimes. The problem is finding it amongst all the ads.


I notice that nowadays I don't really care where the answers come from, so long as they answer my question. I tend to look for and click on StackOverflow links and skip efreedom links (just because I want to be loyal to them), but I still use eHow, About.com and ExpertsExchange results, since the answers there usually help me with what I'm seeking -- it doesn't really matter to me whether it's the original source of the info or not. I think these sites are better than the truly spammy sites that have no content answering your question, but somehow manage to get up there in the search results.


ExpertsExchange is at least original content (as far as I know)


ExpertsExchange used to be nearly 100% scraped usenet. Then eventually a few poor souls stumbled in thinking it was an actual site and started posting questions there directly. Eventually people even started answering them.

Had they not had such a terrible site design and revenue scheme, they might have actually made the transition from scraping bottom feeder to respectable site.


I never knew that. I always assumed that someone out there was paying for all these questions.

Good to know.


bigresource has annoyed me most recently. I don't think they could make the thing more unusable if they tried: they merge together only slightly related posts from different sites into a list, without providing any of the responses to the posts.


I want a search results page similar to the "Priority Inbox" we got recently in gmail. Set sane defaults and let me override them with "Important/Notimportant" buttons (or thumbs up/down or whatever) next to results.

Let it learn what I think is a good result for my needs.

If you make it a little bit social, make sure you weight other people's opinions by how much they agree with my own in other areas (making it harder for sockpuppets to muddy the waters).
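The agreement-weighting idea is easy to sketch. A minimal, purely illustrative version (all names and vote data are hypothetical): another user's opinion of a site counts in proportion to how often they have agreed with me in the past, so a sockpuppet with no history of agreement contributes nothing.

```python
# Sketch: weight each other user's vote on a site by their past
# agreement with my own votes. Data below is made up.
def agreement(my_votes, their_votes):
    """Fraction of commonly rated sites on which we voted the same way."""
    common = set(my_votes) & set(their_votes)
    if not common:
        return 0.0
    return sum(my_votes[s] == their_votes[s] for s in common) / len(common)

def weighted_score(site, my_votes, others):
    """Sum of others' votes on `site`, each scaled by agreement with me."""
    return sum(votes[site] * agreement(my_votes, votes)
               for votes in others if site in votes)

me = {"stackoverflow.com": +1, "efreedom.com": -1}
friend = {"stackoverflow.com": +1, "efreedom.com": -1, "ehow.com": -1}
sock = {"stackoverflow.com": -1}  # disagrees with me -> weight 0
print(weighted_score("ehow.com", me, [friend, sock]))  # -> -1.0
```

A real system would need far more robustness (sockpuppets could mimic agreement first), but it shows why unison-voting blocks are easier to discount when votes are personalized.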


Does anyone remember when google had this feature?

Well, sort of: you could block individual results from coming up under a specific search term.

There was a little x by each result if you were signed into google and it said "never show this result again"

Not enough people used the feature for it to stick around...

I would love this ability but google please, good UI and consumer education. I love your features but don't love when they get taken away because users don't know they exist.


It doesn't seem like it would be hard, but if the rankings aren't driven by money, then there will be attempts to game the system. The problem I feel is Money. As long as everyone has to compete for it (meaning money doesn't work for the people, people work for money - in a system owned by the few), we'll have shady marketers, shady products, spammers etc... so, I think that it will remain a cat and mouse game.


It seems like an obvious answer, but why not just use "-site:annoyingpage.com" in your search? In fact, "-TotallyUnRelated" has helped me narrow down searches effectively too. You are asking for a feature that only a small subset of users will benefit from and use; it makes more sense for Google to find a way to rank sites better than to build an additional filter on top of the current system.


Because I'd want to add a list of excluded sites for pretty much every single query I do.

Would you want to type out a string of 20 or 30 excluded sites every time you search Google?

Ranking clearly isn't going to be good enough, because algorithms can be worked around and gamed.


I use a Chrome extension called Google Search Filter which solves this exact problem - https://chrome.google.com/extensions/detail/eidhkmnbiahhgbgp...

It lets me sync my config across multiple machines.

Has nice hacker-ish config. Basically a text file you can share with others. This is my current config:

# Make these domains stand out in results

+en.wikipedia.org

+stackoverflow.com

+github.com

+api.rubyonrails.org

+apple.com

+ruby-doc.org

+codex.wordpress.org

+imdb.com

+alternativeto.net

# SPAM - never show these results

experts-exchange.com

ezinearticles
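A sketch of how a config in that format could be applied, assuming the semantics described above ('+' lines promote a domain, '#' lines are comments, bare domains are blocked); this is an illustration, not the extension's actual code:

```python
# Sketch: parse the promote/block config format shown above and apply
# it to a list of result URLs. Substring domain matching is a
# simplification for illustration.
def parse_config(text):
    promote, block = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        (promote if line.startswith("+") else block).append(line.lstrip("+"))
    return promote, block

def rerank(urls, config):
    promote, block = parse_config(config)
    kept = [u for u in urls if not any(d in u for d in block)]
    # Stable sort: promoted domains float to the top, others keep order.
    return sorted(kept, key=lambda u: not any(d in u for d in promote))

config = "+stackoverflow.com\n# spam\nexperts-exchange.com"
urls = ["http://experts-exchange.com/q", "http://example.com/a",
        "http://stackoverflow.com/questions/1"]
print(rerank(urls, config))
# stackoverflow first, experts-exchange removed entirely
```

The appeal of the plain-text format is exactly this simplicity: it's trivial to parse, diff, and share with others.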


Sounds to me like web search is not yet a solved problem. As hardware (storage and memory) gets cheaper and cheaper, and with emerging enabling technologies such as cloud computing, building your own search engine may no longer sound impossible. I wonder how feasible it is to apply anti-spam algorithms that work well on email to web pages.


Google Domain Blocker: (userscript/greasemonkey), for those interested.

http://userscripts.org/scripts/show/33156

You can also sync them for Firefox across multiple machines using Dropbox, as the preferences are stored in your profile (IIRC, in a javascript file).


Wouldn't implementing this feature be a tacit admission that there's a problem with search results?


"This would solve a lot of people's complaints in one fell swoop."

And doing this would spawn a lot of people's complaints in one fell swoop.

If you owned a site, and created enemies, they could band together and flag your site as spam.


I don't think you understood his suggestion. "Banning" a site would be local to your signed-in google account, not a global ban from results (which would indeed suffer the backfire you mentioned.)


doh!

You are totally correct. I completely missed that. Maybe I should drink a bit of coffee and wake up ;)

Although... I have a suspicion that at some point it would affect non-logged-in users. Many logged-in users banning a site is a signal that may affect the global results. Maybe in the same way as marking spam in Gmail...


Now they can bot-click your adsense ads to cause google to ban you. If you are ad sponsored this effectively puts you out of business.


Startup idea: Create a service around google custom search. Select the "Search the entire web but emphasize the selected sites" Then create a gui to allow people to prioritize or ban their search results.


In the old days we had killfile. Why can't we PLONK content sources like authors or sites by handles like nicks or domain names? There should be some standard protocol for that. Httplonk.


So what we are talking about is censorship. You are suggesting a non-traditional type, where a government does not do the censoring, but a few people do. How many votes would it take to put a website on a blacklist? 50, 100?

Who decides if a site is spam?

So is free speech dead under your proposal? What if I built a site that criticizes the Governor of your state? Or a federal agency? What would prevent my site from being blacklisted under your proposal? Even if I had great content (your argument is about poor-quality content), my site could be voted into a black hole in a few hours. Let's think about this carefully. Is that the price we are willing to pay to get rid of EE?


Did I say anything in the OP about my blacklist affecting other users? Please read before ranting.


My question is why stackoverflow hasn't banned efreedom yet?


SO provides their database content for free under a Creative Commons license. efreedom is not doing anything illegal.


Although it's sad because it speaks volumes that we're fed up with all the garbage in many of our search queries.

I do hope those working on the algorithm are taking note.


you can do it with seeks... http://www.seeks-project.info/ http://www.seeks.fr/

on your local machine and/or remote server... and it's free software.

blekko ? try this query, http://blekko.com/ws/?q=debian duh ?


Just use Google SearchWiki.

Oh, yeah – they pulled it.


The problem I have with this is that some black-hat people could do this to any site they feel they are competing with. What would prevent someone from blacklisting a legitimate blog or website just because they didn't like the content?


So what? All that would mean is that the site wouldn't show up in their results.

I never said anything about it affecting other people's results...


I like this idea. Many of the initial results are spam, and the results that actually help me appear two or three pages later. I would really appreciate being able to ban the results that are off the mark or that I consider spam. Thanks!


[deleted]


I believe blekko does


YES. Like the useless chromeextensions.org

This would be an awesome feature.


great idea. Let this be the first question asked at any Google event.

In fact, let there be a sea of hands all gesticulating wildly to present it.


Can't this be done with a browser plugin?


Like I said in the OP: "There are greasemonkey etc. scripts to do this, but they're tied to a single browser on a single machine. A global filter (like in gmail) would be so much more useful."


simply add -site:foo.com to your search request.

And no, this doesn't solve the problem.


blekko . com is doing this and much more


And no more bloody experts-exchange...


I honestly don't mind that site - about half the time I search for an issue, I find it's been resolved by someone over at Experts Exchange.

Do people realise that if Google is your referrer, you can scroll all the way down and see the solutions to the question?


It seems like every few months, EE tries to cloak its site in a new way ... then Google catches them and they revert to merely misleading people (by putting the answers several page-scrolls down) into thinking that they need to pay to see the answers.

Actually, I sort of sympathize with the predicament EE faces. They want to show up high on Google search results, because that's how they get new customers ... but they don't want to give away their content for free.

Here's an opportunity for a search engine start-up: Allow users to search (by default) for only free content -- but also allow them to search, if they so choose, for content that's behind a paywall. Pay-for-access sites would love something like this.


Actually, I sort of sympathize with the predicament EE faces. They want to show up high on Google search results, because that's how they get new customers ... but they don't want to give away their content for free.

I have no sympathy at all for them. All the answers are community-generated for free, aren't they? So they're trying to charge for other people's generosity. And plenty of sites on the web give away better content than theirs for free, with no attempt to trick people into paying.

Their business model is broken and I'm surprised they've lasted this long.


I thought that the experts that answered the questions got paid? Guess I was wrong...


I have always known that, but I don't think a lot of people do. A while back I was working with a high dollar consultant who really thought he knew it all, but he seemed really surprised that I didn't need to use his Experts Exchange account to see the answer.


It didn't used to be like that -- you used to have to 'view source' to see it <facepalm>

Then I think they went through a stage of not revealing it at all.

But in any case, it's far less useful than SO/SF etc.


Depends on what you are doing. I sold out to greysuitland and ee is better for all my retarded vb, vba, c# questions.


If you use the cached text version of the EE page you find, you can get through the pay wall.


https://chrome.google.com/extensions/detail/ddgjlkmkllmpdheg...

Is there something I'm missing here?

It's not in Google's financial interest to provide this feature, but it already exists rather trivially.


Still tied to a specific browser though. Which isn't available on all platforms.


Which platforms are that? I'm using Chrome on Windows, Mac and Linux. It can be run on FreeBSD if you're willing to deal with the bridge troll running the show there.

http://i.imgur.com/1Amu3.png

Do you really think this feature doesn't exist for Firefox?

Further, it'll even sync with your google account making it global if you give it access.


Well iOS for one.

Also much as I like Chrome, I can't run it on my Linux box because the font rendering is terrible and it lets sites' font choices supersede the user's. That means I can't overrule their painful font choices with ones that look good, like I can in Firefox.

But that's another rant...


>iOS for one

whistles and pops finger

> it lets sites' font choices supersede the user's

Googled it, found it in 5 seconds. Sure there are at least 3 other ways to do it, one of them while rubbing your tummy and patting your head.

http://www.google.com/support/forum/p/Chrome/thread?tid=2121...


I'd like to see an option for searching only ad-free sites, or perhaps just sites that don't use AdSense, as well. Surely Google would have no problem with that.


Use duckduckgo.com. It's pretty good at excluding spam. And with a new service there is an indicator of how spammy a site is.



