and each autogenerated page consists of literally hundreds of affiliate links stuffed with keywords for unrelated products.
When Google's webspam team takes action on websites in our websearch index, we can pass that information on to the ads group so they can check for violations. But it's a one-way street: we can send the ads team signals or information about spammers or other violations of our quality guidelines, but the ads team doesn't send information over to the quality/webspam team.
The non-dollar-amount queries seem like a bug -- the site falls back to listing items from $0.00, unrelated to the query. If this were an intentional manipulative tactic, wouldn't the outlinks be related to the query?
Still, tons of sites have infinite paths when fed unexpected queries. At least in a cursory look over the filleritem site, there's no obvious self-linking to abusive keyword-based queries -- like your example <?q=teen+sex>.
If he just fixes it so that non-numeric queries get an error page, doesn't his homepage still deserve to be found by people searching for 'filleritem'?
It's not obviously worse than the current top hit for that (and similar) queries, which behaves almost exactly the same, but might not trigger your same heuristics with its AJAX-results loads. (In such cases I also have to wonder if the initial penalty may have been triggered by complaints, or maliciously-created links, from similar competitive sites that thought themselves immune from the same enforcement actions...)
You know who else generates infinite autogenerated pages with affiliate codes attached? GOOGLE. http://www.google.com/search?q=teen+sex&tbm=shop&hl=...
It's scary that Google uses "logic" like yours to make these decisions. No: It's terrifying.
In this case we have something of an edge case; the site is useful, but the auto-generated pages cause potential problems for search results.
If you were talking about how Google communicates these issues, or how it goes about resolving these edge cases, then I'd be agreeing. Much to be improved.
But getting rid of the potential issue.. thumbs up.
My experimental sites (whose content I deliberately made crappy in human eyes), filled with cheap content ($3 per 100 words) and no editorial control, were marked as quality sites and perform well in searches. Meanwhile, the sites we spent thousands of dollars on, with strict editorial control, were punished for some reason.
At one point a scraper site that picked up one of our sites' partial feeds, a 150-word excerpt, outranked us in the search results. What kind of quality guideline is that? A 750-word post is bad quality while a 150-word excerpt of it is good?
Google's motive may be good, but having observed it for the past seven months, I've found their approach is wrong.
Like you, I use Bing/DuckDuckGo these days. Google has enormous market share (around 85% by my observation); if that drops to 30%, then we won't need to worry about pleasing Google and can focus on pleasing our customers/visitors like a normal business does.
- looking for messages in our webmaster console at google.com/webmasters
- asking in our webmaster forum, also linked to from google.com/webmasters
- doing a reconsideration request (also helps with confirmation of manual action)
- talking to search engine reps at various search conferences (e.g., I'll be at PubCon next month)
- calling AdWords support (only for AdWords customers, and this won't give you SEO advice)
We've also been experimenting with 1:1 support over email, by way of a link in our webmaster console. The tension there is finding a solution that scales. We do try to keep an eye on tweets, blog posts, Google+, Hacker News, and similar places around the web, but that's also hard to scale.
Figure out how much it costs to respond to someone and offer that as a flat fee, with a clear indication that paying only gets you a consult on what's wrong with your site, not any promise of reinstatement. Basically what you did in your comments above.
I would personally make it expensive to make sure people don't try continually working the system by paying for the consult to get inside information about how to rank higher on Google.
I work in a support center, so I definitely recognize that it's expensive and difficult to do well at scale, so I appreciate that you're looking into how to do it!
Larry Page was especially critical of pay-for-inclusion because it skews your incentives: if you don't crawl the web well, then people pay you to fix your own shortcomings, which in turn encourages you to have more shortcomings.
I think Google also comes from the perspective of self-service AdWords being successful, so the idea of self-service (free) diagnostics really appeals to us. That's why we've put a lot of effort into our free webmaster tools.
I wouldn't be philosophically opposed to a pay-for-support system if it were done well, but it would be a tricky thing to get right. Normally when we consider it, we end up saying things like "Why don't we just try to make it so that people don't need that option?"
It would be fine provided you explicitly didn't turn it into a profit center, and just made it pay for the people's time.
Torching people's websites and shrouding the reasons why in mystery skews webmasters' incentives - away from creating high quality content and towards figuring out how to circumvent the latest change to the search algo.
>I wouldn't be philosophically opposed to a pay-for-support system if it were done well, but it would be a tricky thing to get right. Normally when we consider it, we end up saying things like "Why don't we just try to make it so that people don't need that option?"
I'm sure you could do that tomorrow if you wanted, but explaining in perfect detail exactly why somebody's (legitimate) website got torched would open the details of your algorithm right up, which would not only open it up to gaming, but would open it up to being copied.
You really could use a team of humans who can explain in human terms (as opposed to algorithmic) exactly why webmasters' sites got torched for violating the spirit of your "high quality content rule". Those same humans could equally feed back data to the search team where in their opinion an algorithm accidentally torched something it probably shouldn't have.
This is going on the wall here at the office. Thank you.
We don't know each other but I think we know of each other. I'm rather immersed in webspam detection and found this incredibly interesting.
You imply that the challenge is finding a solution that scales. Yet it sounds to me from your response that this site was flagged via manual review. Did I misunderstand?
If I heard you correctly, is manual review a significant part of the equation in the webspam-detection methodology? You guys are boiling the ocean, so I find that rather hard to swallow.
The more likely conclusion I can draw is that he had a significant number of (auto-generated) pages on his site flagged as spam and that in turn raised some eyebrows.
BTW, you and your team are doing some amazing work. I wish the paid side was up to the standards you set.
The basic philosophy is to do as much as we can algorithmically, but there will always be a residual of hard cases that computers might not do as well at (e.g. spotting hacked sites and identifying the parts of a site that have been hacked). That's where the manual webspam team really adds a lot of value.
In addition to things like removing sites, the data from the manual webspam team is also used to train the next generations of our algorithms. For example, the hacked site data that our manual team produced not only helped webmasters directly, we also used that data to produce an automatic hacked site detector.
If you're interested, I made a video about the interaction between algorithmic and manual spamfighting here: http://www.youtube.com/watch?v=ES01L4xjSXE
What would really help is clear reasoning given for the ban and a set of steps a site can take to get back in. From what I see, webmasters are often more than willing to make changes; they just don't know which to make. Often, too, the bans are said to be final, denying someone the chance to make things better.
I guess it is tempting to provide minimal support for organic search, since people aren't paying and there are generally plenty of other sites to cover one's absence. One thing that would be good is, say, a yearly fee which guarantees one-on-one support if something happens.
From your reply, I understand that your primary issue is indeed with scaling any manner of support. As I understand it, the current thinking at Google is to prefer automated solutions over manual ones, to reduce costs and handle issue volume. With that in mind, is there a system of triggers that could flag issues for manual review or intervention instead of being handled completely automatically?
I think the majority of issues that have people up in arms aren't really that big, but there are the exceptions. Instead of monitoring G+/Twitter/etc., there could instead be a series of internal checks that look for particular criteria:
1. Age of campaign/ad - was it super old and not really applicable or not even running much budget? It may be worth a lighter hand than a complete ban because it isn't indicative of a trend or pattern on the part of the advertiser.
2. Amount of impressions/clicks/spend over time - what is the overall severity based on actual impact? Are users actively clicking the ad, returning to the site, and continuing to use it? Since each click, and therefore each organic link, is logged, it would be good to cross-reference them and say, "well, this site raises flags but has high return-user rates and decent sentiment in organic results."
3. Traffic vs. Content ratio - does the site have fairly thin content that still gets dramatic numbers of users? Tying in with #2, it basically can help tell if a site is offering something thin on content but highly unique and/or valuable. If people are using it, they may be on to something that the QS algorithm misses.
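The checks sketched above could be combined into a simple triage rule. A minimal sketch, assuming hypothetical field names and thresholds (nothing here reflects Google's actual signals):

```python
# Hypothetical triage sketch: decide whether a flagged ad/site should go to
# a human reviewer instead of an automatic ban. All names and thresholds
# are made up for illustration.

def needs_manual_review(campaign_age_days, daily_spend,
                        clicks, return_user_rate, words_per_page):
    score = 0
    # 1. Old, low-budget campaigns suggest an accident, not a pattern.
    if campaign_age_days > 365 and daily_spend < 5.00:
        score += 1
    # 2. A high return-user rate implies real users find the site valuable.
    if clicks > 1000 and return_user_rate > 0.30:
        score += 1
    # 3. Thin content but heavy genuine usage: maybe the quality-score
    #    algorithm is missing something unique about the site.
    if words_per_page < 200 and clicks > 1000:
        score += 1
    # Route to a human if at least two signals suggest a lighter hand.
    return score >= 2

print(needs_manual_review(400, 2.50, 5000, 0.45, 150))  # True
```

The point is not the specific thresholds but the shape: cheap automated signals gating expensive human attention.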
The one thing I'd love to see is just a little more verbose notification or warning for the people who get hit. While many have only themselves to blame, many want to make sure they stay on the right side, and any chance to do so is of great benefit. Then again, it's also important not to give too much information to the black-hatters who would just use it to get even better at gaming the system.
These are just thoughts, but it has long been on my mind considering the metric ass-ton I've run through AdWords for myself and clients over the years. Hope this helps spark a few ideas :)
I can't speak for the ads folks at all, unfortunately, but on the websearch side we certainly strive to do as much as we can algorithmically, but also to use our manual cycles effectively. The site in question was flagged algorithmically, but also sent to a member of the manual webspam team who concluded that it violated our quality guidelines. When a reconsideration request came in, it went back to a manual person for review again.
To the extent we can figure out ways to do it without compromising our systems, I think both websearch and ads would like to be as transparent as we can.
Put yourself in the shoes of someone who is not a scammer and is trying to do a good job. Perhaps someone who is learning the ropes. Getting hit with a unilateral, violent and --for all intents and purposes-- permanent action such as seems common in these cases can be devastating.
I'd like to relate a case that I witnessed that was truly perplexing. It involved about 200 domains that were registered with GoDaddy and placed in their "cash parking" program.
They were there for months with no issues whatsoever. One day, the domain owner realized that this cash-parking service was Google AdSense, with GoDaddy taking a bite of the minimal action. So... he moved all the domains to a product called "AdSense for Sites" (I don't remember the exact name). This service was marketed by Google as a place to park your domains and earn some money through advertising that Google would automatically place on them. The same service that had been running on these domains for ages through GoDaddy.
The transfer went well and all domains were accepted. No problems. The domains go "live", if you will, and ads start showing as predicted.
Two days later the account is cancelled and the domains are taken off the program citing "suspicious activity". The irony is that the sites --and their content-- were handled entirely by Google through this "AdSense for Domains" product. No self-clicking activity took place whatsoever. No nefarious activity of any kind. This person was far too busy with real business to go around clicking on ads across 200 domains to make $0.25 at the end of the day.
This was simply a transfer to Google for the same service that Google had been providing through GoDaddy for many months, years in some cases.
There was no recourse. No way to speak to anyone. No way to even try to understand what this "suspicious activity" was all about. The account was banned, closed, done...forever.
Now, here's a person who had plans for legitimate and valuable real sites to be launched on some of these domains later on. The whole experience scared him to a point of simply rejecting the idea of doing anything with Google if he could avoid it.
We had no way to provide any kind of an argument to the contrary because of the violent and totalitarian nature of the cutoff. Who would want to do business under those conditions?
The fact of the matter is that AdWords/AdSense generated revenue could evaporate overnight and with no recourse whatsoever. That's a tough pill to swallow for anyone who is a legitimate entrepreneur looking to build value and make some money or earn a living through their efforts.
Scammers are a different matter. However, you seem to treat both groups with the same hammer which, in my humble opinion, is not right.
I now have to advise anyone we work with that any income that relies on Google for either lead generation or direct income (AdSense) has to be treated as though it could evaporate at any time and for any reason without any real opportunity given to restore it in a timely fashion. Without that caveat on the table I couldn't personally advise anyone to use your products.
In a normal business, one would engage with one's vendors to resolve issues in mutually beneficial ways. In this case it is a one-way street, with violent and severe consequences for your customers and partners. That's what you have to fix.
I can't speak for the AdSense (for Domains) team other than to say that when they shut down an account, they think that they have good reason for it. And unfortunately, that's typically a situation where they can't give many details--if the team sees abuse, providing information about how the abuse was flagged would help spammers quite a bit.
I know that Google can seem abrupt sometimes, and I dislike that, but part of the issue is also scale. See https://plus.google.com/117377434815709898403/posts/1hRWj489... which notes that if each Google user had a single 10-minute issue every three years, handling that load would take 20,000+ support people. Or consider that there are 200M domain names, and all those webmasters want to talk to Google and ask questions.
Even this link is discouraging: http://www.theatlantic.com/magazine/archive/2011/11/hacked/8... It mentions that several thousand people get their Gmail accounts hijacked every day. Trying to support, in a scalable way, all the people who want to interact with Google is a really hard problem.
I think it would be fair if there were some personal support for people who either spend a lot of money on AdWords (I think there already is?) or make a lot of money on AdSense. These people probably have the highest levels of stress due to the "insta-evaporation" risk, and no doubt the number of support people needed would be orders of magnitude smaller if you used a threshold like that.
With regard to the argument about the resources required to give customer support to even a small percentage of your audience, I can only say this: it's your chosen business model. I am paraphrasing one of my favorite answers to people who complain about their jobs ("It's your chosen profession").
The point is that Google's business is about doing what it does for a huge number of people. If supporting them is overwhelming either get out of that business or figure out how to do it correctly. I can't really accept the "it's too many people" argument as a valid reason for not doing it well or for applying the "criminal algorithm" to everyone.
There's a side thread here that says "providing information about how the abuse was flagged would help spammers quite a bit". While true, I again find myself not agreeing with the idea of punishing legitimate customers for this reason. I would like to think that the vast majority of Google customers fall into the "legitimate" category. If spammers get better because you provide detailed flagging information, you will simply have to get better at detecting and blocking spammers. This would trigger an evolutionary phase which, at some point, should make it very difficult for a spammer to game the system, even with "full source", if you will. Much like security algorithms become more secure when the source is released and tested with full knowledge of the internals, yours should do the same.
Conversely, honest and legitimate customers would gain the huge benefit of now understanding how to behave or how to do things and why a certain approach might not do well in Google's ecosystem.
One of the most frustrating things I have seen is someone full of drive to launch an internet business only to be shot down by a Google shutdown. And when no reason or actionable information for the punishment is given, this entrepreneur simply had to throw up their hands and give up on that tack. Needless to say, their next attempt ignored Google products completely, and they are doing OK. My guess is that they could have done very well, and much grief could have been prevented, had Google said "Your site has the following problems ..." and then, rather than cutting them off, scheduled them for a review in, say, thirty days (or a variable period based on the seriousness of the issue). That would have been far more civilized and far more conducive to helping your community grow and evolve in the right direction.
New and inexperienced internet entrepreneurs (and some experienced ones) need a way to learn how to behave. What works and what does not. What is acceptable and what isn't. It is only reasonable to assume that they will make many mistakes in their zeal to get an idea off the ground. Penalizing them with a permanent hammer blow to the head is not conducive to growing better netizens. Guiding them with actionable feedback is.
The current process can only be characterized as violent. From the perspective of an honest business person it is tantamount to getting hit with a bullet while walking your dog. The reasons could have been many. Maybe the shooter objected to your walking your dog in front of their home. Had the shooter at least attempted to communicate with the dog-walker it is far more likely that violence could have been averted.
Above all, if your "Do no evil" is sincere, then you have to change the way this works right away. The way this hits honest entrepreneurs is nothing less than pure evil. Again, you take a bullet and you don't know why.
I do appreciate your visibility here in HN. In the past I have simply given up trying to raise these and other points with anyone at Google that might remotely have the ability to at least elevate the conversation internally. I hope you might be that person. I mean all of the above in the vein of constructive criticism. We all want to see the ecosystem becoming more conducive to the exploration of new ideas. Google, at this time, has taken a rather totalitarian position of being the "moral authority", if you will. With that, and so long as you want to be a benevolent dictator, I think, you inherit the responsibility to not cause harm through your actions.
Having said that, until things change I have no choice but to treat your offerings as something that one simply cannot rely on to build a business. The "Google love" can disappear from your site overnight and you'll have no practical way to fix it. That's not a business, that's going to Vegas.
Thanks for responding.
The point of the site is to find items of a particular price that qualify for free shipping on Amazon.com. If you want, I will give you access to my Google Analytics to show you that this is a site people want and use.
I guess I should not let people link to the search results page?
http://www.filleritem.com/index.html?q=hacker+news and http://www.filleritem.com/index.html?q=anything+that+is+not+... return the same thing as a search for $0 does. I will fix the bug so they return an error instead. These are pages that no one has ever linked to, as far as I know.
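The fix described here amounts to validating the query parameter before searching, rather than silently falling back to a $0.00 listing. A minimal sketch of that idea, with a hypothetical function name and illustrative regex (not the site's actual code):

```python
# Sketch of the described bug fix: reject non-price queries instead of
# defaulting to a $0.00 search. Names and the regex are illustrative.
import re

PRICE_RE = re.compile(r"^\$?\d+(\.\d{1,2})?$")

def parse_price_query(q):
    """Return the price in dollars, or None if the query is not a price."""
    q = q.strip()
    if not PRICE_RE.match(q):
        return None  # caller should render an error page, not a $0.00 listing
    return float(q.lstrip("$"))

print(parse_price_query("$4.37"))        # 4.37
print(parse_price_query("hacker news"))  # None
```

Returning an error page for the `None` case removes the infinite space of autogenerated keyword pages while leaving legitimate price searches intact.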
Users would be happier if they landed on the root page of your site or the root page of http://www.superfillers.com/ or http://www.filleritemfinder.com/ than if they landed on a deep page full of links and unrelated products.
Also: are any specific notices or warnings given in Google Webmaster Tools, to inform the webmaster and give some clues about the findings and decisions of your team?
Thanks Matt for taking time to explain things here.
Thanks for submitting the reconsideration request. I just added Disallow: /iframe.html?q= as suggested.
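For reference, the resulting robots.txt would look something like this (assuming the conventional syntax, where Disallow rules are matched as URL prefixes):

```
User-agent: *
Disallow: /iframe.html?q=
```

This blocks crawlers from the query-generated iframe pages while leaving the rest of the site crawlable.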
A search for hacker news results in a page full of affiliate links, just as the example you gave above. Only difference is that they didn't re-write the URL.
filleritemfinder.com has no robots.txt that I was able to pull up.
So, filleritem.com, a google customer, was blocked, but filleritemfinder.com, doing the same thing, is the number one result.
Further, shouldn't this kind of advice be given to people who appeal being excluded from the index? Or should we all post to Hacker News when it happens to us so that you can come explain directly?
I think 50% of the problem is the arbitrary picking of sites to block (and it's not working, btw), and 50% of it is that Google seems uninterested in explaining or advising people when it happens to them.
I've been buying gear for a project lately, and so doing a lot of Google searches in the form of "product-model-number review" or "product-name review". I'm overwhelmed with spam sites, and with mindless human-generated spam sites like dpreview.com, etc.
I mentioned filleritemfinder.com as a random example (there are many of these services), but filleritemfinder.com appears to use AJAX to keep results on the same page rather than making a new url.
"filleritem.com, a google customer, was blocked, but filleritemfinder.com, doing the same thing, is the number one result."
The filleritemfinder.com site is not doing the same thing, because it's not generating fresh urls for every possible search. But you're not really suggesting that we should treat advertising customers somehow differently in our search results, are you? The webspam team takes action regardless of whether someone is an advertising customer or not.
"shouldn't this kind of advice be given to people who appeal being excluded from the index?"
This advice is available to everyone in our quality guidelines. It sounds like the site owner reached out to the AdWords team, which gave him clear guidance that the site violated the ads policy on bridge pages. It sounds like the site owner also filed a reconsideration request, and we replied to let the site owner know that the reconsideration request was rejected because it was still in violation of our policies. It doesn't look like the site owner stopped by our webmaster support forum, at least that I could see in a quick look. At that point, the site owner did a blog post and submitted the post to Hacker News, where I was happy to reply.
Also, is there a preferred monetization model? E.g., does Google think advertisements are more or less harmful to the user experience than affiliate links, sponsored posts, etc.?
Obviously, across different models you can't just track the space taken up, so is there some kind of metric that tracks the rate of content dilution via monetization?
Our models currently suggest that the presence of contextual advertising is a significant predictive factor of webspam.
We use 10-fold bagging and classification trees, so it's not all that easy to generalize. But I pulled one model out at random for fun.
The top predictive factor in this particular model is the probability outcome of the bigrams (word pairs) extracted from the visible text on the page. Here are a few significant bigrams:
Next, this model looks for tokens extracted from the URL and particular meta tags from the page. Similar to above, but I believe unigrams only. A few examples follow. Please keep in mind that none of these phrases are used individually... they are each weighted and combined with all other known factors on the page:
The model then looks at the outdegree of the page (number of unique domains pointed to).
From there, it breaks down into TLD (.biz, .ru, .gov, etc)
The file gets pretty hard to decipher at this point (it's a huge XML file) but contextual advertising is used as a predictive variable throughout.
Just from eyeballing it, it appears to be more or less as significant as the precision and recall rate of high value commercial terms, average word length (western languages only), and visible text length.
Based on what I'm looking at right now, my answer would be that sponsored posts are going to be far more harmful to the user experience than advertising.
Can't answer the rest of your question which I assume relates to the number of ad blocks or amount of space taken up by ads... we don't measure it.
Edit: Just realized that Google will probably delist this page within 24 hours. Should've used a gif for those bigrams. Oh well ;-)
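The pipeline described in this comment (bigram features from page text, fed into bagged classification trees) can be sketched with scikit-learn. The training pages and labels below are made-up stand-ins, and 10 estimators merely echo the "10-fold bagging" mentioned above:

```python
# Toy sketch of the described approach: bigram (word-pair) features from
# visible page text feeding a bagged ensemble of decision trees.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import BaggingClassifier

pages = [
    "buy cheap pills online best prices free shipping",    # spam
    "our team published research on distributed systems",  # not spam
    "free shipping cheap pills discount pharmacy online",  # spam
    "notes from the conference talk about compilers",      # not spam
]
labels = [1, 0, 1, 0]

# Bigrams extracted from the visible text, as described above.
vectorizer = CountVectorizer(ngram_range=(2, 2))
X = vectorizer.fit_transform(pages)

# Bagging over classification trees (scikit-learn's default base
# estimator is a decision tree).
model = BaggingClassifier(n_estimators=10, random_state=0).fit(X, labels)

new_page = vectorizer.transform(["cheap pills free shipping every day"])
print(model.predict(new_page))
```

A real model would of course combine these text features with the other signals mentioned (URL tokens, outdegree, TLD, ad presence), each weighted together rather than used individually.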
"True" - because my current understanding (which Matt_Cutts can elucidate on if he chooses to) is that Google has looked into - but does not currently incorporate - the presence of advertising as a spam signal.
"Sadly" - because my independent research has shown that advertising - most notably the presence of Google AdSense - is a reliable predictive variable of a page being spam.
All things being equal, a page with AdSense blocks on it is far more likely to be spam. Yet as of a few months ago, that does not appear to weigh very heavily into the equation.
That way they're focusing less on removing spammers and more on user quality, and thus removing spam.
I agree, but don't you think that, from an algorithmic point of view, Google would be better off looking at what the user wants and what monetization models they prefer[...]
No. In fact, I am a rather loudmouthed opponent of Google's somewhat clumsy attempts to measure this, a la "Quality Score".
In addition to webspam detection and machine learning, I have spent way too much time in marketing (I have a master's degree in marketing, in fact.)
A neat thing I learned along the way was the value of market research.
There are so many nuances in every line of business. Segments, preferences, pricing, even down to minutia (now well studied) such as fonts, gutter widths, copy styles, and so on.
You can learn a lot by combining large amounts of data and well chosen machine learning algos. But even with a few thousand businesses in most categories in a particular country (far less outside of the US), that doesn't give an outsider enough data to truly distinguish what can be a winning formula from a spammy one. This knowledge is hard won through carefully executed experiments and research.
A few years ago I was researching the topic of landing page formulas by category. One example that stuck out most in my mind was mortgages. There were a few tried and true "formulas" that significantly outperformed the rest. Two stuck out:
1) Man, woman, and sometimes child standing on a green lawn in front of home. Arrow pointing down from top left of landing page to mid/lower right positioned form. Form limited to three fields.
2) Picture of home/s docked to bottom of lead gen page. No people. Light/white background. Arrow pointing down from top left of landing page to centrally located form.
These sites were incredibly successful. More than a few of them had to contend with quality score issues over the years. Can an algorithm capture nuances such as the ones I mentioned? In theory... they could. But today, they don't. All of Google's QS algorithms to date have been failed attempts and have caused an incredible amount of harm and distrust.
You finished that sentence with:
to see versus the averages in terms of monetization models on spam sites.
I'm not at all sure what this means. Could you explain? Is it even possible to directly model the monetization model of a site without having direct access to their metrics?
And assuming they could (despite your arguments against it) determine a site's method of monetization, they could then simply compare the models they see on spam sites against another metric tracking users' reactions to certain types of monetization models.
I can assure you that that they cannot do this accurately. You would be amazed at the scummy business models that openly advertise on Google and are not caught. It would be incredibly difficult to do so as some of them are downright ingenious (one example: free software that updates your drivers.)
It sounds to me that you are positing some type of magical technology that doesn't exist predicated on Google's seeming omniscience. Of course, I am eager to stand corrected...?
This. Assume you work on the webspam team, and you have a 92% spam-detection rate but a 99.9% accuracy rate on what you do detect.
There are around 40,000,000 active domains listed on Google in a given month. At a 0.1% error rate, that means roughly 40,000 sites on average are being penalized without reason.
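The arithmetic behind that estimate, spelled out (the 0.1% figure is simply one minus the assumed 99.9% accuracy, applied across all listed domains):

```python
# Back-of-the-envelope check of the false-positive estimate above.
active_domains = 40_000_000
accuracy = 0.999            # assumed accuracy of actions taken
error_rate = 1 - accuracy   # 0.1% of actions are mistaken

wrongly_penalized = active_domains * error_rate
print(int(round(wrongly_penalized)))  # 40000
```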
A search for the words "filler item" is not at all related and not at all helpful.
Autogenerated in the bad sense is link farms like fakesite.com/buy_drugwiththisname_now.html, with drugwiththisname replaced with 500,000 different possibilities, and all generating pages linked to each other, which also have incoming links from ten billion forum comments across the web where a bot has signed up en masse and posted spam.
Now, perhaps this guy is posting spam to his site in places which makes it valid to declare this a link farm and kill it.
Looking at the site now, being able to search amazon by exact price is a pretty neat function and is totally different from a link farm.
I think killing his site for having a search function is pretty unreasonable.
However, you are a private for-profit company, so you can do as you please obviously.
Actually, there is no difference between search-result sites like the OP's and what you describe. Having a URL that ends in .html does not mean there is a static HTML file on the server. For example, nextag and pricegrabber have URLs like blah.com/digital/Canon-EOS-7D-SLR-Digital-Body/m739295014.html. Scroll around those sites and you'll see they're anything but static.
Each page is simply a result of a query. Whether the database is local (like nextag) or remote (like Amazon API query like OP) is inconsequential. Personally, I would be VERY happy if Google hid/ignored these kind of search-query sites. I have not once found any of these sites useful. Problem is the large gray area in which these sites operate and what I find useless, someone else might find useful.
Uh, yes there is. http://www.somesite.com/index.html?q=foo is obviously a search (right down to the choice of q for query), but http://www.somesite.com/buy_foo_now.html does not look like one.
You might say that technically they could have the same back-end, but the web is flexible: just about any URL scheme could resolve to the same back-end. The difference is that the first is honest and the second goes out of its way to hide what it does.
Obviously? Drupal's default non-rewritten URLs are index.php?q=foo, where foo is the page path. `q` can and does stand for more than one thing.
These conflicting messages from Google are very confusing!
In that case, Google Webmaster tools is not actually reporting an error. That's a report to show you what URLs Google tried to crawl but couldn't (due to being blocked) so you can review it and ensure that you are not accidentally blocking URLs that you want to have indexed.
I agree that it's confusing in that the report is in the "crawl errors" section.
(I built Google webmaster tools so this confusion is entirely my fault; but I don't work at Google anymore so sadly I can't fix this.)
So a couple of questions:
- What is of value for a user?
- Who is determining the value for the users in these cases?
As always, it's not quite clear what treatment you should use for search pages!
In any case, I agree with you: a search engine indexing search results would be bad, but the line is not always that clear!
Some vertical search engine result pages are a great and relevant result from a user perspective on the question they are trying to solve.
The same goes if you have some kind of recent-searches list and those entries link to result pages.
My sister always got a kick out of the fact there were random pirate stickers (or some other random cheap item) in her birthday gift. Free shipping!
Cleaning up spam sites is great, but it feels like Google is editorializing on this. A fine line.
I think that is the issue here -- probably the vast majority of such sites are undesirable, but you can't discount them all.
However, you seem to imply there's something wrong with the site.
Let me ask you a question, who decides whether a site is "right" or not? Google or the people that use the web? What was the original purpose of PageRank?
You can make excuses all you want, but the fact of the matter is that Google's search result quality has been plummeting over the past couple of months, and it all stems from Google pretending to know better what people want than the people themselves.
Good luck with your website.
Since Google has acquired some big websites like Zagat, theoretically, if one day they just start to screw Zagat's competitors, what can actually be done about it?
I mean, how can one prove that Google is just attacking you because they don't like you as a competitor?
It's all relative: you can always be told that your site is not original enough, or that you have bad links, etc.
Ultimately though if you don't trust Google, we don't lock your data in, so you can use a different search engine or service just by entering it in the address bar. That's the ultimate check on Google: if we start to act too abusive or "evil" we know that people can desert us. So it's in our enlightened self-interest to try to act in our users' long-term interests.
I hear this line thrown about quite a bit. And while it's true with regular users, it's certainly not true for webmasters or advertisers. Google controls around 67% of all US search share. If an advertiser doesn't play by your rules, they forfeit a significant amount of natural search traffic.
Maybe this has been addressed, but there are other areas. Affiliate sites are also penalized quite heavily by Google. It's one thing to take a stand due to the supposed quality of many of these sites (which frankly has little correlation with the presence of affiliate links... most sites suck). It's quite another when Google has a large affiliate advertising practice in house and a significant investment in an affiliate link tracking/cloaking company.
I mention this with all due respect and I hope you take it as constructive criticism. I think the organic side of the house does a great job overall. The paid side is another story IMHO. Part of this is organizational stupidity... I struggle with this every day and I have a much smaller organization.
Like Google, webmasters like us also try to improve the quality of our web pages, but sometimes we fail to understand what "quality" actually means to Google. Maybe the angle you see it from is not easy for us to catch. It would be better if Google could publish a list of dos and don'ts for Panda.
For a keyword like "dating", only top companies are being given space on the first page. If you look at dating.com, they have nothing like dating on their site and the articles are nothing but crap, but they managed to be on top and survived Panda, while small fish like us are nowhere. Having a far better site, useful content, and more than 200,000 active users has no use.
You do not expect a dating website to hire 10 content writers and keep posting fresh articles, because those are the kind of "contents" Google considers as content. Not every niche can be put under one "quality" guideline.
My apologies if I said anything wrong.
1. When I search for a restaurant, 99% of the time Google places shows up before Yelp. So you are telling me that Google places always has better content than Yelp? Look at these reviews for Gary Danko - http://maps.google.com/maps/place?num=100&hl=en&biw=...
"love it" and "our favorite restaurant" Does Google Places have much better quality content 99% of the time?
2. Look at http://www.seobook.com/images/google-google-google-google-go.... A. Look at all of Google's sites in the results. B. What are those Youtube videos doing in these results? Every other result in this set has the words hollywood and cauldron right together (except the leaky-cauldron domain result), but since Youtube is a Google property it shows up. It's funny. Do this search on your own and you will see that the top 40 results have "Hollywood Cauldron" in their titles, but the ones on Youtube do not.
So your results are all about quality unless it comes down to your own content. Then, it doesn't matter.
By the way, I can't wait for Google Plus to start hogging space in your results too. We all know how important "social" is to you guys. Even more important than "local." So we shall assume the results will be even more crowded with Google content. If we are lucky, maybe Robert Scoble will mention the words hollywood and cauldron in one of his wordy posts and we will see it at the top of the results! And by the way, no one buys that "don't be evil" thing any more... long gone.
"Websites that feature links to other websites while providing minimal or no added functionality or unique content for the user
Added functionality includes, but isn't limited to, searching, sorting, comparing, ranking, and filtering"
The page in question provides the added functionality listed (searching/filtering at least)
I checked after this all happened, and there were several warnings from adwords, but nothing from webmaster tools. I simply ignored the adwords warnings as I was not actively using the account and figured they were just disabling ads. If I had known these were the consequences, I would have just deleted my adwords account.
It is very important to remember: when dealing with Google AdWords, the definition of a violation changes with time. If you violated the current policies in the past, you may still be banned. You take a great risk running there and any and all warnings you get should be dealt with immediately.
That said, seeing an exclusion from organic based on an AdWords suspension is extremely alarming. That may be exactly what the DoJ is looking for in terms of violations by Google.
According to Matt Cutts, it was the exact opposite: he got the organic suspension then the AdWords suspension.
filleritemfinder.com is reachable from Google and, as far as I can tell, has the same functionality with better design. Maybe some Google guru could comment on whether a redesign would save you.
My hunch is one of your competitors got the site taken down. I've had a few clients whose competitors filed complaints with Google, or informed them they were using black hat SEO to get on the first page of the SERPs. In one case they were successful, but I had the site back up in less than 24 hours, so it wasn't a huge deal.
It sounds like it might be an ongoing issue and they are not going to put your site back in the SERPs for a while. I feel for you, brother.
Edit: In a comment above Matt Cutts from Google gives an explanation.
Nettkatalogen.no in Google.no ranks for about every possible term.
(They use the "powered by Google logo". They use very aggressive phone sales tactics and scam people. They buy links from newspapers.)
Is www.nettkatalogen.no violating Google Quality Guidelines? Or is this okay?
It may be a bit of a hassle, but may be worth adding a kind of 'interesting/unique filler items' blog to the site to increase the content to links/js ratio.
The guy needs to code a better site, and I sure as hell would never use that one just so he can get affiliate money.
Sorry, but you're fortunate this site has survived for so long.
My site got hit by Panda (no manual penalty). After relaunching my site (which I had luckily been working on for over a year), my bounce rate is 30% and my direct link rate is 50%. Still, Google has me by the throat. My traffic is almost dead. Panda doesn't care if you're solving a problem for users and giving them the eye candy they want. It apparently just wants your site to be a "unique, well-written" news site. People come to my site to browse vintage items and find stuff that they didn't know existed in the niche. They usually visit during work hours and late in the evening. Reading ease for these items is on a college level (which Panda doesn't like). They don't want to read a 1000 word article; they usually just want to hear it, see it, maybe ask a question or two about the item, and see who may be selling one. I offer that.
Panda's obvious Bayesian nature doesn't understand nuance. It only knows napalm.
Niche markets are the ones getting the crap end of the stick. And it's funny, niche markets used to be what made the web so great! Not anymore. Now it's 100% Googlized Walmart-ization.
And yeah, I'll be using www.filleritem.com when shopping Amazon. I didn't know it existed until I read this post. I don't mind, transparently, giving someone a few pennies for a good service. You do it all the time.
Searchers have quality guidelines too, though they are not explicitly published by them on the web. If a search engine delivers low-quality results, the searchers move on to the next search engine. A search engine without searchers is hardly profitable.
Most of Google's quality guidelines are just common sense; following them will in many cases increase the usability and findability of a site.
If this has happened to you many times, you are likely doing something wrong. I have never had one of my sites banned from Google.
Right. So. YOU'VE never personally had a site banned, therefore others who have must be doing it wrong.
You know what, I've never been hit by a car while crossing the street, therefore others who have must be crossing the street wrong.
I've never been in a plane crash, therefore others who have must be doing it wrong.
As somebody once said, "You can please some of the people some of the time, but you can't please all of the people all of the time!"
It is obvious that Google cannot communicate exact reasons why a site was penalized as that would help spammers. However, there is nothing that prevents them from adding a step to warn the offending website and give them a heads up before the ban/penalty takes place, along with an explanation of the policy that is/was being violated.
Most of these heads-ups would go ignored, some would not, and yes, it would incur a support cost. However, the number of websites which are significantly penalized isn't onerous... I believe fewer than 1,000 each year?
When a company has become the defacto gateway to the internet, I believe they have a responsibility to webmasters. Google has lost a lot of goodwill over the years because of these seemingly arbitrary penalties... Instituting such a practice would be a worthwhile investment.
There are 200+ million websites out there. 1,000 spam sites would be a spam rate of 5.0 × 10^-6. If you remember the days of Altavista before Google, the actual rate of spam on the web is much higher. Here's one stat: I once heard a search engine rep (not from Google) say that they had to crawl 20 billion pages to find 1 billion non-spam pages.
So yes, we do tackle more than 1,000 websites a year. There's a ton of spam on the web, and Google has to operate on the scale of the web (e.g. in 40 different languages) to tackle all that spam.
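The two rates quoted above can be checked quickly; all figures are the comment's, not verified statistics:

```python
# Quick check of the rates in the comment above (commenter's figures).
websites = 200_000_000
penalized_per_year = 1_000
print(penalized_per_year / websites)  # 5e-06, the 5.0 x 10^-6 figure

# "Crawl 20 billion pages to find 1 billion non-spam pages"
crawled, non_spam = 20_000_000_000, 1_000_000_000
print(1 - non_spam / crawled)  # 0.95, i.e. 95% of crawled pages were spam
```

The gap between those two numbers (5 in a million penalized vs. 95% of crawled pages being spam) is the crux of the disagreement here.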
You are of course correct. The fault is mine for miscommunicating... I find myself becoming less self-editorial these days when I write on the web and tend to think everyone is on the same page as I am.
I was actually referring to an informal study I did earlier this year. I measured sites which were receiving an average of 50,000 or more visitors from Google US search (organic) per month over a six month period. Then I compared those with a similar set from a subsequent six month period to see which had significantly dropped off in traffic and rankings. The purpose of this was to estimate the number of significant sites which were penalized over that period of time. The final estimate came to about 700 sites/year which were penalized. There are lots of uncontrolled variables here of course... but I was looking for an "order of magnitude" answer simply for curiosity's sake.
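The comparison described above can be sketched roughly as follows; the site names, traffic numbers, and 50% drop threshold are hypothetical stand-ins, since the original informal study's method wasn't published in detail:

```python
# Minimal sketch of the informal study: sites above a traffic threshold
# in period A whose traffic drops sharply in period B are counted as
# likely penalized. All data below is made up for illustration.
THRESHOLD = 50_000  # assumed avg monthly organic visitors to qualify
DROP = 0.5          # assumed fractional drop treated as "significant"

period_a = {"example-a.com": 120_000, "example-b.com": 60_000,
            "example-c.com": 55_000}
period_b = {"example-a.com": 115_000, "example-b.com": 12_000,
            "example-c.com": 54_000}

qualifying = sorted(s for s, v in period_a.items() if v >= THRESHOLD)
dropped = [s for s in qualifying
           if period_b.get(s, 0) < period_a[s] * (1 - DROP)]
print(dropped)  # ['example-b.com']
```

As the comment says, this kind of estimate leaves many variables uncontrolled (seasonality, algorithm updates vs. manual penalties, sites that simply declined), so it only supports an order-of-magnitude answer.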
The 1 million spam pages created per day were of course excluded from consideration as they never received much traffic from Google in the first place.
So just to clarify my earlier response, I am advocating for a policy that would apply to websites exceeding a certain threshold of organic traffic for a significant period of time.
This isn't about desired ranking. If your highly useful, popular site gets blacklisted from appearing on Google at all, there better be a damned good reason.
Since Google has such an enormous position of power, they are really going to have to do better with customer service (and by customer service, I mean providing a way to talk to a human on their end).
I understand it's expensive, but if it doesn't get any better, eventually some government somewhere (that has jurisdiction) is going to regulate them.
Also, how did his site get canned when others like sportslinkup.com, an ebay affiliate spam site cloaked as a link directory, have over 7 million indexed pages and 8 million indexed images (all hosted by ebay, not the sports site) and get over half a million Google visitors per month (according to compete.com)?!? I think the sports site is even scraping Google for keywords. It's full of examples of what not to do, but it's been sailing along for years with Google's approval, or at least without automated detection.
The fine line between sites getting canned and sites getting MASSIVE traffic for essentially the same thing is very confusing.