To take the example of Pandora jewelry: Pandora is a company that controls its marketing channels with an iron fist. They're very careful to partner with better-than-average jewelers, and each retailer has an exclusive territory. So far as I know, there's no legitimate channel for new Pandora products online (everybody who claims to sell them on eBay seems to have fewer than 10 feedbacks).
Thus, other than Pandora's official website, there's no legitimate e-commerce presence for Pandora online, so there's nothing to compete with the junk. Somebody might randomly write about them, but there's nobody (legitimate) who's got a feedback loop going where revenue supports content creation and marketing efforts -- which will inevitably come out on top against amateur competition.
Demand Media, ExpertsExchange and quite a few junk sites similarly thrive on the lack of good content. I was having trouble changing the ribbon on an old typewriter a few weeks ago, and web searches asking about this particular model turned up junk pages with advice like:
(1) Buy a new typewriter ribbon,
(2) Take the old typewriter ribbon out,
(3) Put the new typewriter ribbon in
Now, these pages were keyword stuffed with the name of the typewriter, but they didn't even bother to have an affiliate (or other) link to a place where I could buy the goddamn typewriter ribbon, which according to them is 33% of the work!
Once more, the feedback loop doesn't exist to nourish a good answer here, so of course the blight is going to move in.
And, in those cases where the content mills do have a few morsels of useful information, they've usually just pushed other less professionally/cynically optimized sources for the same info down off the first page of results.
This complaint seems to come from the recent attitude that absolutely everything is on the web. But tightly controlled products inherently don't have much web presence, since their producers like control, and all the search terms in the article seemed to point to products like these.
"NFL Jersey" - That sounds like "spam bait" already. Lots of (rather unsophisticated) people want these and the NFL itself controls its authorized products.
If you tried "cheap viagra", your results wouldn't be encouraging.
Sorry if the Interwebs are not meeting your expectations...
Perhaps Google should rate the quality of its own searches... That idea is serious.
The scary thing is that it's not automated. There are real people pasting in content and checking to see that it's correct. Fortunately for me, it's all going straight into my bayesian filter's spam corpus and making it easier to detect, but even for my one site it must be costing somebody a lot of money to post it all.
If Google had an API to report this stuff, I'd be happy to forward it along to them on the fly. Seems that there are plenty of User-generated-content sites like mine with a ton of valuable spam data if anybody figured out a way to use it.
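For anyone unfamiliar with the idea, the kind of Bayesian filter mentioned above can be sketched roughly like this. It's a toy naive Bayes classifier, not our production filter; the class name, the tokenizer and the training strings are all made up for illustration, and a real corpus would be far larger.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into simple word tokens (an illustrative choice).
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Toy naive Bayes filter; hypothetical, for illustration only."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Add one labeled post to the corpus for that label.
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed class prior.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Add-one smoothing so unseen words don't zero out the score.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Normalize the two log scores into a probability of spam.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


spam_filter = NaiveBayesSpamFilter()
spam_filter.train("cheap nfl jerseys wholesale free shipping", "spam")
spam_filter.train("buy cheap pandora jewelry discount outlet", "spam")
spam_filter.train("how do I change the ribbon on my old typewriter", "ham")
spam_filter.train("thanks, the accepted answer fixed my bug", "ham")
print(spam_filter.spam_probability("cheap jerseys discount outlet"))
```

Every hand-pasted spam post that gets caught goes back in through `train(..., "spam")`, which is why the manual spam ends up making the filter better over time.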
If anybody's interested, here's what we're doing to keep the site spam free:
Wouldn't detecting who's trying to generate link spam be a fairly effective way of removing them from search engine results?
I wanted to learn more about the bianchi infinito and what I found was a number of web stores selling bicycles at impossible prices. I mean oscommerce or zencart or whatever instances of legitimate looking web stores. Then when you look closer they're mostly in Indonesia and in order to complete the purchase you have to bank wire the money.
I think it's spilling over from alibaba and similar sites where 99% of the vendors are scams. Now those vendors are creating whole ecommerce web presences to make their scam sales.
The stores I saw didn't usually make the first page of results. Usually third or fourth. Sometimes second. Anyway I was a bit shocked at how many fake stores there were and how they ranked as highly as many legitimate bike shops.
Also per exhibit one of the article: The first hit for "nfl jerseys" I get, even with pws=0 is to nflshop.com. The website that nfl.com links you to when you click on the "shop" link.
More bandwagon jumping about google spam being out of control? I like to think so.
The morals are also different. Some people might not like eFreedom, but the fact is that StackOverflow is CC-BY-SA. Anybody who wants to repackage StackOverflow content in a different way is free to do that. I do think that StackOverflow should generally outrank eFreedom, but a site like eFreedom can potentially add a lot of value.
On the other hand, other spam sites are generating original crap content with their own crap content generation system... And if they aren't, they can switch to some other content generation method to get around duplicate content filtering.
(And speaking of which, duplicate content filtering of some kind is absolutely essential for a workable web search engine... It's not even a matter of spam. Building a search engine for one of the largest units of a large Uni, we found that there were many documents that were duplicated all over the place for all sorts of reasons, and that since the on-page factors are the same, these tend to form 'plugs' of search results that displace other results.)
Genuine question here...are you talking specifically about eFreedom, and if so exactly what value does it add? When I've inadvertently stumbled in there, the questions and answers are an exact ripoff of SO, and I (and I suspect everyone else) just immediately click on the "from StackOverflow" link so all the responses in the original can be read.
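To make the 'plugs' point concrete, here's a minimal sketch of one common way to collapse near-duplicates in a ranked result list: word shingles plus Jaccard similarity. This is not the code from the Uni project; the function names, the 3-word shingle size and the 0.8 threshold are all assumptions picked for illustration.

```python
import re


def shingles(text, k=3):
    # Break the text into overlapping k-word sequences ("shingles").
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    # Overlap between two shingle sets: 1.0 means identical, 0.0 disjoint.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def collapse_near_duplicates(results, threshold=0.8, k=3):
    """Keep the first (highest-ranked) copy of each near-duplicate document.

    `results` is a ranked list of (doc_id, text) pairs.
    """
    kept_ids = []
    kept_shingles = []
    for doc_id, text in results:
        current = shingles(text, k)
        if any(jaccard(current, seen) >= threshold for seen in kept_shingles):
            continue  # too similar to something already shown
        kept_ids.append(doc_id)
        kept_shingles.append(current)
    return kept_ids


ranked = [
    ("dept-a/policy", "The campus parking policy applies to all students and staff members"),
    ("dept-b/policy", "The campus parking policy applies to all students and staff members."),
    ("library/hours", "Library opening hours change during the summer term"),
]
print(collapse_near_duplicates(ranked))
```

Comparing every result against every kept result is fine for a page of hits; at index scale you'd want something like MinHash or SimHash instead of pairwise comparison.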
"Simplification of the user interface. We show only the accepted answer (or highest voted answer if no answer has been accepted yet). We removed the sidebar, comments and vote counts in order to minimize distraction. This gets you to your answer and on to your project quickly."
There are a few more points listed there such as translation and displaying snippets of related questions that seem to show a genuine effort to help answer questions.
I think eFreedom nicely illustrates the problem Matt Cutts and the folks at Google face. The site appears to be playing by Stackoverflow's and Google's rules, possibly doing SEO better than Stackoverflow. If Google also ranks based on page load times, eFreedom might be helped even more, since the site lacks the majority of Stackoverflow's features and might load faster. So a programmer who has never heard of or cared about Stackoverflow, interested only in finding a specific answer that includes their Google query, might see nothing wrong with the eFreedom response. Suppose the majority doing Google searches preferred eFreedom based on measured clicks. Should the fact that Stackoverflow was the originator of the content guarantee them a higher page rank? What if Stackoverflow was slow? And stepping back from these two specific sites, how do you deal with that across all sites and their clones?
I avoid eFreedom links because I enjoy participating in Stackoverflow and use the other features, and despite the assurances of the folks at eFreedom the site still seems shady. It seems perfectly reasonable to me that Google would take steps to ensure Stackoverflow ranks higher. But that's just one case out of many. It seems like a tough problem for Google to solve across the Internet.
To each his own, I suppose, but I'm not impressed by what appears to be their "value" - SEO - and agree the site seems a bit shady.
Community sites, at least in their early phases, need to focus on getting people to put content in more than they need to focus on making it easy for people to get it out. Delicious is the classic example: it's a roach motel which makes it very easy to put your bookmarks in, but doesn't provide a useful browsing interface for your and other people's bookmarks (other than having a list of recently hot for various tags.)
Particularly in the semantic age I think there's a lot of room for remixing CC content to improve browsing and discoverability.
Not sure if I'm taking you too literally, but Experts Exchange does answer your question without having to pay (scroll down).
For example, I think within 10-20 years at the most we'll have systems that can decompose text into facts and then reassemble it into 'original' text.
Two ways to do this: you have good content either left intact (no value-add) or rearranged or otherwise structurally corrupted in order to appear to be a different/better answer (value-minus), or you have advertisers being led to believe their ads are showing on relevant content, when it's really just a jumble of random words loosely oriented around a concept. "The dog was dog walking. Dog food always is in the grocery store. RALEY's. It dogged him for years..." so on and so forth.
On one hand users are being defrauded and on the other, the advertisers/affiliates. There is no defense for eFreedom, nabble, mail-archive, and their ilk. They are bad people, bad for business and bad for the internet. I sincerely believe this.
I still don't get the value in that. Where is the "content in" (which I take to mean content generation rather than duplication) that you mention?
I agree with you about the other sites.
They have "nfl-jerseys" in the URL which is about the only redeeming thing. The page title is unrelated, which is what would show in a SERP. I clicked on the top result, a women's Ben Roethlisberger jersey and the page has nearly zero information and the images 404 (!).
Compare that to one of the results that comes up in Google and you can see why. Great titles, URLs, the filters don't require forms.
I believe Google's results are less spammy than duckduckgo for this particular query.
Won't this have a lot of adverse effects? And if keywords in anchor text become less valuable, can't spammers compensate by ramping up their existing efforts?
"I would not be surprised to see Google shift even more ranking signal power from anchor-text heavy links to relevant social media “chatter”."
Why would this be harder to game than links?
Spam happens because search is hard. There are probably solutions, but they're not as easy to come by as the ones suggested in tfa. Still, it's good to see this sort of community feedback on search results, especially given how responsive the search team is to this sort of thing. Keep up the good work, guys.
Some really dramatic changes to how we use links are on the way. (Sorry I can't say anything more specific. This is a really sensitive area.)
Something has been bugging me for a while, and it took a few hours after I read this article to figure out what it was.
I love the coining of a new term: "organized spam", and I love calling out things that are wrong, but I wonder if we're not taking this crime metaphor a bit too far.
Look guys, it's a search engine. You type in a search term, it gives you results. There's nothing magic or special about it -- anybody with a smidgen of database training can make one (although nowhere near as good as Google's, granted).
Although some of these examples involve people ripping other people off, I get the feeling that somehow Google has become such a part of our lives that we feel as if somehow these folks trading links and trying to get attention are acting criminally. That anything that gets in the way of my getting instant information is a crime against humanity. That really bugs me.
It's not. Get over yourself. Sure, large parts of this may be well-funded, but there's nothing necessarily criminal going on. For instance lots of poor people in lots of third-world countries are making money dropping by my blog each day and telling me how awesome I am. It's not expected, but I'm happy they're making a few dollars. I can live with the inconvenience or try to fix it on my end. I don't need to blame them.
I don't like the state of Google search right now either, although I'm still a loyal customer. But what I see in the marketplace is humans reacting logically to their best interests. If you're going to monetize Google search so that billions of dollars flow through it, there are going to be some ancillary effects that nobody predicted. Instead of blaming the people, understand that the people are just regular, intelligent folks doing the best they can. Hell, my wife is in a social group with a lady who made several thousand dollars adding advertiser text to her blogs -- until Google delisted her. She saw nothing wrong with it, and still is pretty pissed at Google. From her standpoint Google crapped all over her party.
And yes, Google has every right to delist sites and such. More power to them. I hope they continue to delist and evolve their search engine. I hope they get a handle on this. But I think we should all separate our well-wishes for Google's success from our opinions of our fellow man. I've heard linkspammers and spammers called "subhuman" and all sorts of nasty things. While there are criminals who are trying to rip you off, there's no evidence that there are more criminals on the web than anywhere else. Most of these people are trying to make a living. The fact they might inconvenience you on your way to get an answer to a technical question or find the latest mp3 you have to have is really not that high on their list of priorities -- nor should it be.
Google needs to do a better job. Period. There seems to be this "conversation machine" right now where people post articles showing how bad search is, then folks come out and rant, then Google makes an announcement. Repeat and rinse. It's as if we went down to the local newsstand and asked the grocer for a magazine on trucks. He gives us a bunch of magazines on boats, so -- we blame the magazine publishers! It's simply not logical. A little perspective, please. Google is the provider here and those of us who like them should try to help out. But we shouldn't cross the line into thinking that anybody that annoys Google or a searcher is somehow evil or criminal. That's crazy. Much better to understand people as rational actors than to demonize anybody who tricks some random American internet company.
>It's as if we went down to the local newsstand and asked the grocer for a magazine on trucks. He gives us a bunch of magazines on boats, so -- we blame the magazine publishers!
This analogy would be more true to the situation at hand if you say that the magazine publishers are using methods that they know will increase their chances of getting boat magazines in front of your eyes when you're seeking truck magazines. Do you think my assertion is off-base?
The magazine publishers are free to configure their magazines and the world around them in any way they wish. The newsstand operator is responsible for what goes on inside his stand. If he's serving up junk, do we go blaming the rest of the world for the quality of his service?
Somehow we've taken Google out of the picture as an independent agent. It's as if whatever program they are running is somehow golden, and when outsiders change the inputs that Google uses so that it doesn't work correctly, somehow the outsiders are at fault. But outsiders don't set the inputs -- Google does. Outsiders don't write the ranking algorithm -- Google does. Outsiders don't make money from having ads alongside search results and tracking individuals' search behavior -- Google does. Outsiders are free to do whatever they want -- that's the entire reason for picking one search provider over another, the fact that one engine can take the world as it is and do a better job of organizing it than another one can.
If we don't expect Google to be responsible for how they process data -- if we somehow take Google's poor results and put the blame on the world at large, then exactly what of value is Google providing here in our relationship?
Like I said, I'm a fan. I want them to do well. I'm happy to help if I can. But hell if I'm going to let Google off the hook for providing good search results simply because the nature of the internet has changed. Things change. That's what they're supposed to do.
This is like writing a web app that is open to SQL injection attacks and then getting pissed at everybody else when they crash your system. Except there's one big difference: with an SQL injection attack there is an outsider directly interacting with your system, perhaps malevolently. With Google, outsiders don't even enter data in; Google goes and gets it. We've got the shoe on the wrong foot, as my mother used to say.
> The magazine publishers are free to configure their magazines and the world around them in any way they wish. The newsstand operator is responsible for what goes on inside his stand. If he's serving up junk, do we go blaming the rest of the world for the quality of his service?
There's a difference between selling junk and selling something that's obviously criminal. If you walked into a store where every magazine had "VIAGRA - 50% OFF. MAIL US YOUR MONEY" on the cover, would you honestly tell me that the magazines in question were perfectly ok and it was the magazine vendor who did something wrong?
It's good that you realize that it's humans that are committing crimes, and not subhuman beings. But that doesn't excuse them nor should you.
Yes, Google has some level of responsibility here and they should be held accountable. But they're not the ones actually committing the crime.
what do you think?
if you search for 'nfl jerseys' you're probably looking to buy a jersey, and at least a few of those (eg football fanatics) do in fact look like legitimate stores.
Must be a localisation thing I guess?
Visit nfl.com, click "shop", then choose the "jerseys" tab, and this is the page you are on. Seems perfectly relevant. The domain does not contain "jerseys" in it, and while the title does, it's the Jerseys category page for the NFL's shopping website, so that makes sense. Hardly spam.
Visit www.clc.com, the collegiate licensing company, click retailers->collegiate retail outlets, Football Fanatics is one of 13 licensed collegiate retailers. Most major college universities sell their football merchandise through them. It's been around (run whois) since 1997, 14 years! Perhaps it's not ideal for NFL (non-college), but it's definitely Not Spam.
Unfortunately, below this some of the results do start getting ugly - there aren't too many online retailers that can legally sell NFL merchandise. Even Amazon is just a storefront for the NFL Shop (see http://www.amazon.com/NFL-Football-Fans/b?node=374273011). That might make it a good result, but it's essentially duplicated content given the NFL Shop result.
#1/#2) Pandora.net, totally not spam, this is the type in [amazon], get amazon.com kind of result.
#2.5) Below the second result I see a shopping results box which has only pandora jewelry from authorized retailers.
#3) http://www.pandoramoa.com/ - the Pandora Mall of America stores. Authorized pandora retailer. The domain has been around since 2007 (4 years)
Below this, the rest is getting ugly. Similar to [nfl jerseys], there aren't many online retailers legally able to sell Pandora jewelry, so once Google has listed the only 3 good results available, what do you want them to do? Try [jewelry] or [necklaces] - queries where there are lots of legit destinations and the top 10 results are all non-spammy.
#1/#2) ThomasSabo.com, just like [pandora jewelry], this is exactly what 99% of the people with this query want.
Same story for the nonexistent good results below.
These 3 queries are a very specific type of query where there are only one or two relevant results, but there are lots of sites that "match" the query. I'm not saying the rankings after the first few relevant results are good, but what would you propose to show after those relevant results as an alternative?
Writing an article about a specific class of queries is fine, although the author doesn't really propose a better set of results. The implication is that this issue applies to a broad set of queries, which it doesn't seem to. Ironically, the author's signature line is a link to http://www.tomsgutscheine.de/, whose title translated to English appears to be: "Coupons, Coupon Codes & Coupons (January 2011) - Tom's Coupons".