Personal domain blacklist.
There's a lot of spammy bullshit on the web and Google seems to have given up on keeping it away from me. Fine. But for my specific searches, there's usually a handful of offenders; if I never, ever saw them again, it would improve my search experience by an order of magnitude.
So let me personalize search by blacklisting these clowns. Why can't I filter my search results so that when I search for a programming issue, I never see these assholes from "Efreedom" who scrape and republish Stack Overflow?
I don't, personally, need an algorithmic solution to spam. Just let me define spam for my personal searches and, for me, the problem is mostly solved.
(Also blacklisted: Yahoo Answers, Experts Exchange.)
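The mechanics of such a blacklist are trivial. As a minimal client-side sketch (the result structure and domain names are assumptions, taken from the sites named in this thread):

```python
from urllib.parse import urlparse

# Hypothetical personal blacklist of domains to suppress.
BLACKLIST = {"efreedom.com", "answers.yahoo.com", "experts-exchange.com"}

def domain_of(url):
    """Extract the host from a result URL, dropping a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def filter_results(results):
    """Drop any result whose domain is on the personal blacklist."""
    return [r for r in results if domain_of(r["url"]) not in BLACKLIST]

results = [
    {"title": "Real answer", "url": "https://stackoverflow.com/q/123"},
    {"title": "Scraped copy", "url": "http://www.efreedom.com/q/123"},
]
print(filter_results(results))  # only the stackoverflow.com result survives
```

The hard part isn't the filter; it's that filtering has to happen before ranking fills the page, or you end up with half-empty result pages.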
I am actually surprised there is no Labs application for this, unless there is a business case against it.
A cross-browser effort that implements a few key features from OptimizeGoogle would be a very good idea. I'd be up for that.
Maybe IE9 does, too, but that's not important. :)
Typically it's OK to err on the side of caution, but when someone offers to do a bunch of work if you just indicate that you'd like it done, the safe bet is to assume that they are in fact qualified; after all, their reputation is on the line in public.
The first thought that came to mind was what happens when I disagree with a couple of items on one of these 3rd party blacklists?
Then I thought, FORK IT and make the changes you want. You could even merge in lists from other people. Github for blacklists?
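The fork-and-merge idea maps directly onto plain-text domain lists. A hedged sketch (the file contents are invented) of pulling in an upstream list while overriding the entries you disagree with:

```python
# Hypothetical blacklist files: one domain per line, "#" starts a comment.
def load_list(text):
    return {line.strip() for line in text.splitlines()
            if line.strip() and not line.startswith("#")}

upstream = load_list("efreedom.com\nexperts-exchange.com\nanswers.yahoo.com")
my_additions = load_list("ehow.com")
# Entries I disagree with upstream about; my "fork" removes them.
my_whitelist = load_list("answers.yahoo.com")

merged = (upstream | my_additions) - my_whitelist
print(sorted(merged))  # ['efreedom.com', 'ehow.com', 'experts-exchange.com']
```

Since the lists are just text, hosting and forking them on GitHub would work exactly as suggested.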
Now, this doesn't mean that filtering them wouldn't be useful to you, since at first glance it appears they're solely a duplicate. Just pointing out that they're not actually doing anything wrong, and they're (probably) not scraping.
SO has specifically said that this is okay.
It doesn't look like Jeff is that okay with it, especially when it comes at the cost of Stack Overflow's own ranking:
"Sorry, this is absolutely necessary, otherwise we get demolished by scrapers using our own content in Google ranking." – This is from a question about Stack Overflow's SEO strategy.
Just because it isn't illegal to call someone's mother bad names doesn't mean I should do it; I can follow the letter of the law 100% and still be an asshole. Overall, my policy is that it's best not to be an asshole, and it annoys me when others don't share that basic ethos.
That said, you could make an argument that the value they're adding is SEO and promotion, it's pretty impressive to be able to out-rank SO...
New media will make it work anyway. IP is not needed for content producers to survive and even thrive.
(On a side note, capitalism is defined by the legal enforcement of property rights. Abolishing intellectual property is probably the exact opposite of capitalism. The word you're looking for is "market".)
You're still right though: market would be a better fit. Markets and capitalism are pretty much interchangeable in my mind, which is why I made the slip.
It's fine that they have an expressed policy that says it's okay, but I'd keep it at that and not refer to terminology like CC licenses.
There's nothing dubious about this legally at all.
I prefer how YouTube handles it:
“You shall be solely responsible for your own Content and the consequences of submitting and publishing your Content on the Service. You affirm, represent, and warrant that you own or have the necessary licenses, rights, consents, and permissions to publish Content you submit; and you license to YouTube all patent, trademark, trade secret, copyright or other proprietary rights in and to such Content for publication on the Service pursuant to these Terms of Service.
For clarity, you retain all of your ownership rights in your Content. However, by submitting Content to YouTube, you hereby grant YouTube a worldwide, non-exclusive, royalty-free, sublicenseable and transferable license to use, reproduce, distribute, prepare derivative works of, display, and perform the Content in connection with the Service and YouTube's (and its successors' and affiliates') business, including without limitation for promoting and redistributing part or all of the Service (and derivative works thereof) in any media formats and through any media channels. You also hereby grant each user of the Service a non-exclusive license to access your Content through the Service, and to use, reproduce, distribute, display and perform such Content as permitted through the functionality of the Service and under these Terms of Service. The above licenses granted by you in video Content you submit to the Service terminate within a commercially reasonable time after you remove or delete your videos from the Service. You understand and agree, however, that YouTube may retain, but not display, distribute, or perform, server copies of your videos that have been removed or deleted. The above licenses granted by you in user comments you submit are perpetual and irrevocable.”
(Can someone tell me the <pre> syntax or something appropriate for a blockquote?)
prefix with four spaces
Regardless, thanks for the tip.
The question is whether StackOverflow is closer to YouTube or Wikipedia. I think it's closer to Wikipedia because it's a curated reference source, not just a medium for self-expression.
The articles are in a constant flux of change, and I don't know if anyone deserves more attribution than others for contributing to an article.
Knol might be a more relevant example, but I haven't really checked it out in a while. (Who has, really?)
I recognize I may be being overly simplistic.
> Always redirect to stackoverflow from pages that just copy content, like efreedom, questionhub, answerspice.
Very handy. I put ehow.com on mine and never see results from them.
In the interests of results diversity, you don't want the same content repeated ten times on the first page, although this has the side effect of pushing the original source onto the second page if you guess wrong.
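One way to sketch that trade-off (assuming results already carry a duplicate-content fingerprint, and that "guessing the original" is just a preferred-domain list; all names here are invented):

```python
# Our guess at original-source domains; guessing wrong here is exactly
# the failure mode described above.
PREFERRED = {"stackoverflow.com"}

def dedupe(results):
    """Keep one result per content fingerprint, preferring a PREFERRED domain."""
    best = {}
    for r in results:
        kept = best.get(r["fingerprint"])
        if kept is None or (r["domain"] in PREFERRED
                            and kept["domain"] not in PREFERRED):
            best[r["fingerprint"]] = r
    return list(best.values())

page = [
    {"domain": "efreedom.com", "fingerprint": "q123"},
    {"domain": "stackoverflow.com", "fingerprint": "q123"},
]
print(dedupe(page))  # the stackoverflow.com copy wins
```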
There must be some way Google's search engine could learn by looking at the blacklists people use.
Now, it's possible that SearchWiki just needed a few more iterations, and with a few details changed, could be a big success. There have been a few other recent launches that were tried years ago, didn't work then, but had a few more iterations and now are big successes. I could at least raise the issue. But unless I can tell a convincing story about why people would use this when they didn't use SearchWiki, it may be an uphill battle to get resources devoted to this.
Compare that to the GMail labs. It's pitiful.
They already have the exact opposite curation feature: the star system. And it's crazy.
When I search, and click one of the 10 results, and the result turns out to be satisfying, the last thing I want to do is click the back button and star it.
When the result turns out to be spam I necessarily have to hit the back button and try again. Staring me in the face is the now-purple spam link - let me X it.
Personal blacklists are the least Google could do, because my SERP is never going to be perfect. Feeding those blacklists back into the general SERP population is an interesting research project.
> I don't, personally, need an algorithmic solution to spam.
Not now you don't. But if everyone started using your approach then the spammers would adjust their behaviour and use many domains instead of one.
For whatever it's worth, blekko has this. It's one of the main reasons I switched to blekko over Duck Duck Go for the majority of my searching.
Also, I'm happy to take requests to ban these stupid sites for everyone.
It's not even really a complaint. I am glad that when I show up on DDG (which I still do several times per day) I get the same high-quality results without regard to who I am.
I'm also happy to take requests to ban these stupid sites for everyone.
The problem is that I have, for example, en.wikipedia.org marked as spam, simply so that their juice doesn't overwhelm my search results. It makes sense for me, but I suspect it's not even close to what your average user wants or expects.
In any case, thanks for the recent addition of non-Google options for searches when DDG runs out of results. Small as it may seem, I consider that a major step in the right direction.
Haven't you used Gmail? If I tag a site as spam, DDG shouldn't show it to me. If a thousand users mark it as spam... then it starts to be clear that it's a shady site, and DDG should eliminate it from its system.
Uhm, but now we have a vector for script kiddies to ban sites from a... who knows, maybe in some years... major web search.
Perhaps definitively removing a site should be done by a human operator.
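A rough sketch of that two-tier design (the names and the threshold are invented): per-user marks take effect immediately, while global removal is only ever queued for a human operator, which blunts the botnet vector.

```python
from collections import Counter

SPAM_REVIEW_THRESHOLD = 1000   # arbitrary; a real system would tune this

spam_votes = Counter()         # domain -> number of distinct users marking it
user_blacklists = {}           # user -> set of domains they've marked as spam

def mark_spam(user, domain):
    """Hide the domain for this user immediately; tally the vote globally."""
    marked = user_blacklists.setdefault(user, set())
    if domain not in marked:   # count each user at most once
        marked.add(domain)
        spam_votes[domain] += 1

def needs_human_review(domain):
    """Global removal is only queued for an operator, never automatic."""
    return spam_votes[domain] >= SPAM_REVIEW_THRESHOLD
```

Even this sketch can be gamed by sybil accounts, of course; the human-review gate is what keeps a gamed tally from becoming a gamed index.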
The issue of social search has a lot of mindshare. Some think it is the future of search. I disagree.
One of the things that made search successful and useful early on was scale. Instead of having to go to the library or ask your friends, you can effectively canvass the connected world.
I find the notion that friends' recommendations will replace that as nothing short of bizarre. It's like a huge step backwards. The argument is that you can filter out the garbage as your social graph will provide a level of curation.
Let me give you a concrete example. If I wanted to buy a camera I'd still need to go to dpreview and other sites. It's highly likely that my friends don't really know a lot about this (but some will have an opinion anyway).
This same idea of human curation is behind sites such as Mahalo, and the garbage sites themselves to a degree. Of course at some point computers will be powerful enough to generate this garbage content.
Blekko's idea of slash tags is interesting (to a degree), but if it's successful it's easily reproducible. Google is still in the box seat here, but of course that's no barrier to a link-baiting TC title.
Personally I'm an optimist. I believe that, much like email spam, the garbage from AC, DM and others is a transitional problem (email spam is basically a solved problem now if you use a half-decent email provider). If they succeed we won't be able to find anything; I don't believe that'll happen, so these services are doomed.
So betting on Demand Media is (to quote Tyler) like betting on the Mayans (meaning betting they're right about the world ending in 2012: it doesn't really matter if you're right).
So my money is on Google being the better Google.
I very much agree with you that social is limited. It's a filter which avoids spammers, but it also filters out experts and users. It also solves some of the problems introduced by the generic nature of search algorithms.
I'm betting my time on the idea that the solution which wins will combine an understanding of the product space, the value of new features, a current understanding of price, and which can be customized transparently to the needs of the user.
I'd divide the possible filtering processes into three approaches.
* Throw all your questions to all your friends and all the review sites you "trust". This means any third-party site you trust gets a rather excessive ability to spam you; even sites with good user reviews try to push pop-up windows on me. Just because I've gotten good info from X once doesn't mean I want any more from it.
* Throw friend recommendations and trusted sites to a third party "meta-filterer" who organizes things for you. You'd have to really trust that site and essentially there's no reason they'd be better than Google.
* Do the filtering yourself. Most people essentially do that now. I'm working on a project to create tools to automate and improve this process: create your own relevance and topic-weighting system that adaptively filters all the other filters. I believe this kind of approach is eventually going to be needed, not so much because each person can or should do all their topic-relevance weighting, but because this approach would keep the other systems honest.
Eventually, people are going to realize that both their social graph and their content-relevancy weightings/algorithm are far too personal to farm out unquestioningly to a third party. The present social networking system is like AOL email in 1992, except with the added provision that the provider can look at and alter your emails as part of the service. I'd envision this stabilizing like the present email system, where webmail exists but has to work more or less the same as email on your personal computer.
Your best bet ends up being to simply include everybody, which improves your chances of finding an expert opinion, regardless of the obscurity of the query.
Logically speaking, no it doesn't. Why couldn't a search engine be aware of topic/product areas? Many already are.
Not for generic terms. But for _buying stuff_. If I am buying a camera, I'd be more interested in what my friends own and recommend.
I believe the author is mistaken on this point. A quick proof is to do a search for [matt cutts] and you'll see the root page of my blog. Click "More search tools" on the left and click the "Past week" link. Now you'll only see pages created in the last week, even though lots of pages on my site were indexed in the last week.
Most people have never used Google's Subscribed Links feature because it is not enabled by default.
User feedback can be used to determine which third party code to use in which contexts. Spam/unhelpful features would be detected quickly. There would be intense competition among third party developers for highly desired features.
Something like this could give you Wolfram Alpha like features among other things such as custom UIs for various searches (e.g., travel).
The Google App Engine could be used for computation. You could pay third party developers by how often their code is used in search results.
It would be an ideal place to experiment with and profit from novel search ideas. For example, third party developers may experiment with query-induced flash mobs where people who just performed a similar query could collaborate in real-time to find the information they need.
Finding ways to prevent such an ecosystem from descending into absolute chaos would be a fascinating challenge.
So I happen to know somebody who is taking a small section of the home appliance market and creating content around it -- reviews, news, advice, a place for other consumers to talk to each other.
Of course to do this you need to have income, so they are going to use some sort of ad-supported model.
My question is very simple: is their project a spam site or not? To some, I guess it would qualify. To others, not.
You see, there are two questions when it comes to search results: 1) Am I being presented results that match the query I entered? and 2) Am I being presented results that match what I want to know?
These are two entirely different things. A third-grader looking for information on a movie star might find a games page with all sorts of information on that star -- all sponsored by some kind of adsensey stuff. And he's very happy. A researcher typing in the same question gets the same page? He's pissed.
There is no universal answer for any one question. It's all dependent on the culture, education, and intent of the user -- all of which are not easily communicated to a search engine.
Look -- this is a real problem. I hate it. It sucks to go to pages you don't like. All I'm saying is that it's more complicated than "we need a new Google." Finding what you want exactly when you want it is a difficult and non-trivial problem. We just got lucky in that Google found a simple algorithm that can be helpful in some situations. It may be that we're seeing the natural end of the usefulness of that algorithm.
To me that depends a LOT on how they present the advertisements and what they do on pages where they don't have information for a product. My biggest complaint with so-called review sites is that they present more advertisements than content. They also tend to have automatically generated pages for every model number you can imagine, including incorrect ones that they receive searches for. On those pages there tend to be links to shopping sites and prices for unrelated appliances and products. I absolutely consider that to be spam because they are content free.
The problem with Google searches lately, and especially for things like appliances, is that the spammers and content mills are clearly winning. In my current search for a washer and dryer, the manufacturer's page was on page four or five of the results. The first few pages were flooded with bogus content pages, sale pages and unrelated pages. Trying to filter out shopping sites and explicitly target specific keywords and filter others doesn't help. There are a growing number of search topics for which Google is simply broken.
The problem is, as you point out, that most people, most of the time, are beginning to see results they don't need or like when they type a search. This is a big problem for both searchers and the companies that provide search. If you create an algorithm for directing people's behavior (a search engine) folks are going to game it. You and I might not like it, but "gaming the way people do things" is called marketing in any other context and has been around for hundreds of years.
This leads me to suspect that no simple (or even complex) system of finding things for people is ever going to work for an extended period of time. It's a radar vs. radar detector problem. It's a natural competitive situation.
But it doesn't have to be all bad. From competition and fitness criteria comes evolution. Spammers and search engines will probably be a key part of how AI evolves. It'll be neat to see if we move beyond Bayes -- and if so, how would that work?
The one thing you bring up that's interesting is what to do with bad searches. How do you deal with a mis-typed part number? Should a system know which part number you have? If so, how would that be done?
I think the spammers covering all the misspellings are doing a service -- as long as the site isn't obnoxious and provides the user with the information they are looking for. We think of it as a failure of Google, but in fact it looks like a win: thousands of little spammers trying to find all the mistakes I make and providing content for them -- as long as they have my best interests in mind (and are not trying to trick me). I'll happily look at an advertisement for a Ford Explorer in return for valuable information on my 1978 dishwasher that I couldn't read the entire part number for. And I hate ads. I like that scenario a lot more than looking for a favorite mp3 for a cell phone ringer and spending the next 3 hours in spammer hell.
There's lots of ways to handle this but doing a fuzzy search for similar or possibly related part numbers should be easy enough. The user can then be presented with those search results. If the model/part number isn't found you could even provide them the option of adding content for that part if the site takes user generated content. I don't think I've seen any site take this approach but instead go with the spammy show a ton of ads approach instead.
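For the simple case, Python's standard library already covers the fuzzy lookup; a sketch with an invented part catalog (a real one would come from the manufacturers' feeds described below):

```python
import difflib

# Invented catalog of valid part numbers.
CATALOG = ["WD-1978-A", "WD-1978-B", "WD-1987-A", "DR-2040-X"]

def fuzzy_part_lookup(query, n=3, cutoff=0.6):
    """Return the closest valid part numbers to a possibly mistyped query."""
    return difflib.get_close_matches(query.upper(), CATALOG, n=n, cutoff=cutoff)

print(fuzzy_part_lookup("wd-1978"))  # the WD-19xx variants, not DR-2040-X
```

`get_close_matches` ranks by a simple sequence-similarity ratio, which handles truncated and lightly mistyped part numbers; transposed digits in long codes might need a more specialized edit-distance cutoff.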
Appliance models and part numbers are actually a pretty interesting case. A couple of people I know built and maintain a simple desktop application for smaller appliance retailers and parts/service companies. The application contains a database of all valid part numbers issued by every major appliance vendor for the last twenty years or so. This information is updated about once per week by the manufacturers and they supply this information freely to anyone that wants it or to members of specific programs. Some of this information is provided via faxes or emails, which sucks, but data entry can be farmed out to temps. This company aggregates the data and provides it as a service to their customers. The application can do full or partial matches for part numbers and can filter based on appliance type. If a small two-man team that doesn't even work on the project full time can successfully manage that, I don't see why the big web-based sites are so full of bogus content and spam.
Fuzzy logic searches would be awesome.
The problem here, of course, is that the site doesn't own the search program. They can only influence it in certain predefined ways. So if you're building a site for dishwashers from the 1970s and you know that folks consistently misspell some brand name? You either provide a page for that misspelling that Google can crawl or those folks don't get content. Assuming you're doing a quality site, folks who can't spell need content as much as those who can. Yet if you provide a page based on a misspelling folks will yell "spammer!". It puts you in a bind. There's no answer everybody is going to be happy with.
I think people can tell whether or not site owners are trying to help them out or just trying to trick them using Google. At least I hope so. I know as much as I hate ads, I'm happy if I never saw one again for the rest of my life. I have to be careful not to take that personal opinion and apply it to all site creators, however. There's nothing wrong with noticing that folks are looking for something, can't find it, and providing content in that area.
In a lot of ways Google is a victim of their own success. The net was so new, the algorithm so cool, that it looked a lot like magic. People got used to the magic and forgot that it's just a computer program somewhere. I think we may expect too much.
This is not the same problem as different people perceiving results differently.
With the push to a mobile-first world, the Android model is especially sensitive to spam. On a full-size browser you have a lot more context and results for a given search: 5 results may be spam, but you can work around them. If the average phone screen shows 3-5 results and all of them are spam, you will quickly find alternate tools.
Google ignoring spam is like Microsoft ignoring the cloud.
A better search engine is not what will do Google in, because they would understand the danger it posed to them. What will do Google in is a business which they don't understand that can kill search engines as a place to do business.
This is like writing stuff for the Voyager's Golden Record.
The impression they'd come away with would be something like: (Extremely large number) ignoring canned meat is like (random company name) ignoring the high-altitude water vapor.
What is the appropriate user response? Go to Stack Overflow? Find a branded knowledge base like O'Reilly's Safari? I'm genuinely curious to know what we can do.
What most (power) users are opposed to are scraper sites that reuse other sites' content to rank higher than the original content source, and tactics like eHow uses where they have 10 different articles about how to tie your shoe, but each one has a title that matches a different long-tail version of the search query.
Again though, the issue is that Google isn't reacting to and clearing out content spam. Most likely because the sites add to Google's bottom line, and a spam engineer modifying the algo to remove the powerhouse content mill sites could actually negatively impact Google's revenue.
Also, when we complain about search quality, we're a vocal micro-minority. Most people, like you said, find these sites useful, and haven't even thought about the implications of these content mill sites.
This is definitely not the reason. See my comment at http://news.ycombinator.com/item?id=2059661 for more context.
"I was referring to how Google should respond to content farms. Historically, Google has been willing to take manual action on webspam. With the rest of search quality and ranking, Google tries to use algorithms as much as we can. So the distinction of whether something is spam vs. low-quality is an important one within Google." - Matt_Cutts
Here's a good example, coming from eHow:
I know you are probably asked to respond about specific spam cases constantly, so don't take this as me demanding an answer for this specific instance. However, eHow is clearly leveraging their domain authority here to scrounge up the traffic for each different long tail variation of the term "How to Tie Your Shoelaces".
The reason they're targeting each of these phrases with a different page of content is the data Google gives them (and all of us) about who is searching for what and how many times per month, coupled with the fact that they have a mega-powerful domain: when a new page of content using an exact keyword in its title is added to it, that page will rank top 5 in Google almost every time.
Therefore, the data that they're using to come up with these keywords to feed their gaggle of writers is the related keyphrases data provided by your keyword tool. Algorithmically, this should be easily detectable, as you guys have the list of related keyword data that they're using in the first place.
Why then, are they in the top 5 for each of these keywords? Are 3+ different guides on how to tie shoe laces really necessary? Shouldn't 1 page be ranking for all 3+ of these tight variations? Shouldn't dozens of related pages of content targeting minute keyword variations be something relatively easy to detect?
Seeing multiple Adsense units on these obviously SEO-fueled pages I've linked to leads me to believe there's at least a little bit of truth to what you quoted me saying.
Speaking as someone who has worked at Google for ~11 years and worked on spam at Google for ~10 years, I can tell you that running AdSense doesn't get you any kind of special consideration in Google's rankings. You don't have to believe me, but it's true. :)
By the way, I talked a bit about content farms and Google's take on them in November at a search conference. Here's a link that blogged about it a bit: http://blog.search-mojo.com/2010/11/10/live-from-pubcon-vega... . That person wrote up the discussion as "Question: What is Google doing to detect content farms?
Matt: Google historically has tried to do most everything algorithmically. blekko does allow you to identify content farms, but blekko is more human based response. Google is having an active debate about this. If you can’t algorithmically identify a content farm, is it still ok to take action and remove a site?"
The other relevant write-up was at http://www.seroundtable.com/archives/023229.html and they transcribed the discussion as
Barry Schwartz: Q: Brian asked, what is google doing in terms of content farms?
Barry Schwartz: A: Matt fed this Q to Brian earlier ... hehhehe
Barry Schwartz: Tricky, Matt's team is in charge of web spam. If web spam doesn't last long in the index, what do they do? So a content farm is the bare min someone can do to get in to the index, but its borderline
Barry Schwartz: Some people in Google dont consider content farms as web spam
Barry Schwartz: They have been a little worried about people passing judgement on sites if it is a content farm a useful site.
Barry Schwartz: Think of Mahalo, Wikia, Blekko
Barry Schwartz: Those sites provide a curated experience
Barry Schwartz: It is a really interesting tension here, they don't want to bring Humans into the mix... They will let computers do it
Barry Schwartz: This is an active debate
Barry Schwartz: May Day, at least partially, was a first pass at this.
Barry Schwartz: If you can't algorithmically detect content farms, then do you take manual action?
Barry Schwartz: This is the problem they are thinking
Barry Schwartz: So if they do anything on this, they will update their guidelines
Barry Schwartz: This is an active debate in Google and we will see where we go
Barry Schwartz: Someone asked, Matt, what side are you on?
Brian Ussery (@beussery):
Matt says users are angry with content farms
Barry Schwartz: Matt said, users are not happy with content farms so he wants them out of the index."
I've never argued that running Adsense helps a site rank higher, I know that it doesn't. But I do believe that sites like eHow, who presumably make Google millions of dollars a year, are given a free pass to pursue content spam like I posted in my previous comment without any sort of repercussions that we can see. They're leveraging their domain authority and producing very low quality articles to target obscure long tail variations of keywords to keep getting that traffic.
What concerns me is "If you can’t algorithmically identify a content farm, is it still ok to take action and remove a site"...is the issue that the algorithms aren't sophisticated enough to catch these content mills from spitting out article after article of low quality, long-tail targeted traffic, or that you guys have thrown in the towel and believe that if the algos aren't throwing flags, then the sites are fine?
I posted up an example of a content mill type situation in my last response. To most people, a manual review should throw up a warning flag if the goal was to identify people targeting keywords rather than trying to help people. The top 5 rankings for each of those pages shows that neither algorithmic nor manual measures are in place to deal with such a situation.
I have 10+ content sites targeting random niches. I know how the SEO game works. I know dozens of internet marketers who have dozens of their own sites each who know how to game the algo to rank high with low quality content sites like these. It's obvious people are taking advantage of the algorithm, but it doesn't appear to be drastically improving anytime soon.
I do appreciate the time and effort you have put into your responses. If you'd like to talk privately, I would love to. I'll try and watch the video you suggested tonight.
The challenge (in my mind, at least) is how to improve the algorithms more and when it's appropriate to say "This is low enough quality that it's actually spam, and thus we're willing to look at manual action." On the bright side, we've actually got a potential algorithm idea that we're exploring now.
It's keyword variation content spam using hand written content and curated by very specific keyword data. So that seems to be a different algo trigger than a quality trigger.
If the search giants had any balls they'd cut the "Internet Marketing" community off at the knees. Because the money making methods pushed by that community either don't work or are unsustainable, so they're entirely reliant on a steady stream of new recruits. If they want to promote gaming your system don't let them reap any benefits from it.
Domains frequently being excluded by power searchers could be good signal.
(Googling to find one that used to pester my search results, kods.net, it seems it has finally got banned. Hooray!)
I have a relative working at Home Depot and we just discussed this topic at length given my own recent purchase. Most of their appliance customers come in with some idea of how much money they want to spend and what basic feature set they would like. Then they are looking for a knowledgeable sales associate to explain to them the differences and benefits of the various models. They may have "heard from a friend" that a particular model was good or a particular feature was good but even that information is usually incomplete or incorrect. I would fully expect "social" results for these types of queries to result in even more misinformation.
Did this page help you find what you were looking for?
Was this page useful?
Honestly, such a system would be ridiculously easy to game with a botnet, so there needs to be significant work in the area beforehand.
Sure, it could be exploited, and I'm guessing that's why they haven't implemented it, but there's got to be a solution that would make it work.
I think something like this would be a cool thing for Google to test for a year (a voting system might take a while to set up properly)
Google briefly had a feature like this in their results, but they've removed it. All that's left is the Star system, which isn't quite the same.
Also, just got a Firefox addon working that provides automatic related links to most web pages. Unfortunately, it relies on search now.
But I'm sure that any sufficiently popular content-discovery tool will have a lot of spammers trying to game it. It's not easy to fight that.
Could you please elaborate?
People who actively like to be contacted by random persons surfing the Internet make their contact information readily available (and answer questions sent through those publicly visible contact channels). But to many other persons, not being readily visible on the Internet is a feature rather than a bug. (Disclaimer: my contact information is readily visible on the Internet, so readily visible that it has been used by point-of-view pushers on Wikipedia to give me harassing telephone calls.)
For instance, trying to find out the company a CEO worked at before their current one. The problem is that content copiers will produce so many copies of the PR announcement for their current job, it's impossible to find the announcement for their previous job. I've tried doing this exact search and it's very frustrating.
Similarly, contact and personal information for CEOs of major corporations does not exist online either and any search will turn up spam.
One more example, to add to the many. If you get a genuine wrong-number call from somebody who made a simple mistake and type their caller ID into the internet, you'll just get a bunch of reverse-phone-lookup spam, while if you search for the phone number of a known telemarketer or bill collector, you'll likely get a full dossier on that company.
Where do we go from here? Well, I don't think the answer is just a radically new way of indexing/ranking websites. That might work in the short term, but the spammers will soon catch up. The answer probably lies in a combination of better language interpretation, context sensitivity using browsing history and location, and user profiling based on the social graph and search history. All of which Google seems to be working on.
All I'm getting is either the manufacturers slant (PR) or spam sites all harvesting the same reviews.
To solve this I now look for vertical based search sites. In this case http://www.printershowcase.com/small-officecolorlaser.aspx is the best I've found... but it's hardly to printers what dpreview is to cameras.
I stick with Google because it largely works well, but when I know what I want to see and that it must exist but cannot find it... then I find myself looking elsewhere all the time. DDG and Blekko I use in these cases, but even they're not solving these kinds of needs.
Too many products without reviews. No quality control over the reviews.
Example... the HP CP4025 appears to be good (I have an HP Z800 workstation so thought it was worth checking HP for the hell of it)... yet I haven't found a balanced and comparable review for it. I'd like to see it set against other small office printers and to see the average cost per print.
It's on Amazon twice:
But no reviews.
I'd like Google to take my search terms and help me find stuff... you know, that "sort the world's data" thing, to make sense of my terms and show me the results I'm looking for.
Google isn't working for me. Amazon isn't working for me. It takes hours and hours of searching in circles to research even the most basic purchasing decision.
They seriously need to hire a capable UX person. The logged-in interface is full of problems:
* Twitter-like status update. I believe this has nothing to do with search.
* Form with 10+ fields for creating a slashtag. You can't possibly expect me to enter all the domain names I can think of into that tiny <textarea>.
* I finally created /python, but I have no idea how to improve or update the slashtag. I cannot update it from the search results page.
Overall, very frustrating experience.
Oh, of course, it's not in Google's interest to do this, because they make money from the spam sites. So I don't expect Google to really "solve" this problem; their trick is to stay useful enough that users don't abandon them, but allow enough spam into the search results to provide revenue. A tricky balance...
It may not affect ranking (and I do believe that), but I would certainly be willing to bet it affects them (and similar sites that aren't AdSense-based) not being de-ranked.
If me saying so isn't enough evidence for you, consider that it makes sense. Google knows that losing the lead in search would be much more damaging than shutting down all of AdSense.
Thought it worth mentioning.
Nothing is released yet, unfortunately. The site is officially a hobby for me, but I hope to have the new stuff up in the next week or two. I may just hide the realtime stuff and get the blekko feeds up sooner rather than later.
Now that I am focusing on building the site to fit my needs, getting up-to-date info about products and technology (the bulk of my personal searches) is the top priority. I have to admit the blekko API has helped.
In the meantime, I would suggest the slashtags /reviews and /blogs with /date on blekko would be very helpful if you are doing product searches. With unscatter I am really only providing shortcuts for these, with some additional UI tweaks.
Disclaimer: I am in no way associated with blekko other than having been given permission to use their api for a personal project.
I was interested that blekko seems to have done a lot with a modest amount of funding.
Also, I wonder if they are getting some monetization from the association with Facebook.
A simple solution to this: Consumer Reports. A subscription is well worth it! The likelihood that it will pay for itself in the next year is very high.
They are also politically left-leaning, if that matters to you.
Yeah, don't use them to rate tech products. But as far as conventional appliances go, they are a very good resource.
Fair enough. There is the Open Directory Project (which is pretty old) and of course there is Facebook, Twitter, and other, human-curated services. Starting a whole new company to do search and compete with Google (and Bing)? Seems like a waste of time as Google can just copy what you are doing and incorporate it into its already massive site (complete with traffic, audience, and lots of other goodies). Instead, why not get Google to add more social recommendation and feedback features?
If you tied that with the ability to follow other people and their search edits, the number of spammy results could be reduced.
- less spam
- programmer oriented results, when relevant
- more legible search results
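The core of the idea above, a personal blacklist merged with the blacklists of people you follow, is straightforward to sketch. Everything here (the domain lists, the followed users, the helper names) is made up for illustration; no real search API is involved.

```python
from urllib.parse import urlparse

# Hypothetical data: my own blocked domains plus those of people I follow.
my_blacklist = {"efreedom.com", "experts-exchange.com"}
followed_blacklists = {
    "alice": {"answers.yahoo.com"},
    "bob": {"efreedom.com", "bigresource.com"},
}

def effective_blacklist(own, followed):
    """Union my blacklist with every followed user's blacklist."""
    merged = set(own)
    for domains in followed.values():
        merged |= domains
    return merged

def filter_results(results, blacklist):
    """Drop any result whose host matches a blacklisted domain
    (exact match or subdomain)."""
    kept = []
    for url in results:
        host = urlparse(url).netloc
        if not any(host == d or host.endswith("." + d) for d in blacklist):
            kept.append(url)
    return kept

results = [
    "https://stackoverflow.com/questions/123",
    "https://www.efreedom.com/Question/1-123",
]
print(filter_results(results, effective_blacklist(my_blacklist, followed_blacklists)))
# → ['https://stackoverflow.com/questions/123']
```

Forking someone's list, as suggested earlier in the thread, would just mean copying their set and editing it before merging.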
Also, "crowd-sourced curated lists of websites" sound like the old Yahoo directory of yore. They will either become obsolete very quickly or spammers will find a way to penetrate and dominate them.
The truth is, if some company comes up with a better search engine, whatever ideas behind it are not going to sound like an obvious win up front—if they did then Google would already be doing that. Instead they'll have to create a search engine that is better, but somehow antithetical to Google's business model so that they can't just copy it, because there's no way for a startup to come up with enough resources to stay materially ahead of Google in pure search. And of course that's only half the battle; then you have to be better enough that users can be bothered to switch (or a browser deal coup).
Personally I haven't found the spam problem to be nearly as bad as the echo chamber makes out. I think silicon valley types just have a good imagination about how good it could be.
The way del.icio.us does curation is smart on many levels. Spammers need to create many accounts to affect the tags they care about.
Essentially, you're talking about something like Chrome's history search (it indexes the content of every page you visit and lets you do full-text searches of it), but with the ability to expand the search to other users' indexes. You'd want to make it entirely painless for users to add pages (a keyboard shortcut at most) and integrate into the browser as much as possible. It's also probably a good idea to add some mechanism for identifying users with reliable indexes and letting others use them; a karma system could work for this (add karma if a result is useful, remove it if it's not, similar to how HN works with comments). Users with really low karma (indicating that they're pushing lots of spam sites into the service) would have their results biased against in full-index searches, or outright removed without them being aware (similar to how users on HN can be killed, so that they see all their posts normally but no one else does).
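The karma-weighted ranking described above could be sketched roughly as follows. The user names, karma values, and index contents are all invented for the example, and the "shadow-ban" is just a karma cutoff; a real system would need much more than this.

```python
from collections import defaultdict

# Hypothetical shared-index data: each user's karma and the pages their
# personal index returned for some query.
karma = {"alice": 120, "bob": 4, "spammer": -30}
index_hits = {
    "alice": ["https://docs.python.org/3/"],
    "bob": ["https://docs.python.org/3/", "https://example-blog.net/post"],
    "spammer": ["https://spam-farm.biz/page"],
}

def ranked_results(hits, karma, min_karma=0):
    """Score each page by the summed karma of the users who indexed it;
    users below min_karma are silently dropped (the 'killed' case)."""
    scores = defaultdict(int)
    for user, pages in hits.items():
        k = karma.get(user, 0)
        if k < min_karma:
            continue  # shadow-banned: their results never reach others
        for page in pages:
            scores[page] += k
    return sorted(scores, key=scores.get, reverse=True)

print(ranked_results(index_hits, karma))
# → ['https://docs.python.org/3/', 'https://example-blog.net/post']
```

Pages vouched for by multiple high-karma users float to the top, while a low-karma spammer's contributions vanish without the spammer ever seeing an error.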
Disclaimer: I'm not quite awake, so take this advice with a huge grain of salt. It's just my thoughts on the idea.
Yup, "social graph" wasn't the right choice of words on my part. I was thinking about something like Twitter's "follow". For example, you could go around and follow a bunch of well-known programmers.
I don't like the karma idea, at least not at a global level. I think it has to be about trust, at an individual level. These are the people I trust, search their bookmarks.
I do think that some human curation, if for nothing else than to mark sites as copy spam, might be workable. Pattern matching is still humanity's territory.