
Anatomy of a bad search result - blasdel
http://cdixon.org/2009/12/19/anatomy-of-a-bad-search-result/
======
kmod
It's a systematic problem: often one wants to reward some good behavior X, but
X is very hard to measure, so one finds some measure Y that is highly
correlated with X and measures Y instead. The problem is that once people know
that Y is what's being rewarded, they seek to do Y as cheaply as possible. And
if "doing Y as cheaply as possible" does not involve doing X, then the
system breaks.

So in this example, X is producing high-quality content, and Y is getting a
lot of links to you. Other examples include X,Y = (knowing the material,
getting correct answers on the test) and (needing money, spending money).

This has a well-known analog in economics, the "Lucas Critique":
<http://en.wikipedia.org/wiki/Lucas_critique>

~~~
sili
Consequently, measuring Y is a good proxy for measuring X only when high
levels of X mean high levels of Y and, conversely, high levels of Y mean high
levels of X. The next great search engine should take that into account.
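The two directions can be checked on toy data. This is a minimal sketch with invented numbers: `Page`, `proxy_report` and the page counts are hypothetical, chosen only to illustrate the argument. X is "high-quality content" and Y is "many inbound links", as in kmod's example. Adding low-quality pages that acquire links cheaply leaves the rate of "high X implies high Y" untouched but drags down "high Y implies high X", and the second direction is the one a link-based ranker actually relies on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    high_quality: bool  # X: what we actually care about, hard to measure
    many_links: bool    # Y: the cheap, observable proxy


def conditional_rate(pages, given, outcome):
    """Fraction of pages satisfying `given` that also satisfy `outcome`."""
    matching = [p for p in pages if given(p)]
    if not matching:
        return float("nan")
    return sum(1 for p in matching if outcome(p)) / len(matching)


def proxy_report(pages):
    """Return (rate of high X -> high Y, rate of high Y -> high X)."""
    x_implies_y = conditional_rate(pages, lambda p: p.high_quality, lambda p: p.many_links)
    y_implies_x = conditional_rate(pages, lambda p: p.many_links, lambda p: p.high_quality)
    return x_implies_y, y_implies_x


# Hypothetical population before anyone games the proxy.
honest = [Page(True, True)] * 8 + [Page(True, False)] * 2 + [Page(False, False)] * 10
# Same population plus spam pages: low quality, but with many (farmed) links.
gamed = honest + [Page(False, True)] * 12

print(proxy_report(honest))  # (0.8, 1.0)
print(proxy_report(gamed))   # (0.8, 0.4)
```

In the gamed population, a page with many links is high quality only 40% of the time, even though good pages still attract links as often as before.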

------
JeffJenkins
The fake blog is probably from a Markov generator. I've always liked that as a
blackhat SEO strategy. However, it seems like it isn't that much more
expensive to generate real content -- like the recent post about eHow and
bingo cards -- so there isn't a point in taking that risk.

~~~
kwamenum86
I thought Markov generators synthesize text? The text from this blog appears
to be stolen from other sources.

~~~
JeffJenkins
This is the key phrase from the article:

"It looks like the “blog posts” are fragments from places like Wikipedia run
through some obfuscator"

A Markov generator will create text based on the text it's trained on, and
because of the way it works you'll end up with little snippets from the text
sources. Here's an online one you can play with:

<http://www.beetleinabox.com/mkv_input.html>
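A minimal word-level sketch of the idea, assuming an order-2 chain (each next word is sampled based on the previous two). The function names are hypothetical, and nothing here reflects how the linked tool or the spam blog was actually implemented. Because every three-word run in the output also occurs somewhere in the training text, the result is stitched together from short verbatim snippets of the source, which is why generated text can look "stolen".

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain


def generate(chain, length=30, seed=None):
    """Pick a random starting run, then repeatedly sample a successor word."""
    rng = random.Random(seed)
    state = rng.choice(list(chain))
    output = list(state)
    order = len(state)
    while len(output) < length:
        successors = chain.get(state)  # .get() avoids inserting empty keys
        if not successors:  # dead end: this run only appeared at the end of the text
            break
        output.append(rng.choice(successors))
        state = tuple(output[-order:])
    return " ".join(output)


# Training text: the comment above, used as a tiny example corpus.
corpus = (
    "The fake blog is probably from a Markov generator. I've always liked that "
    "as a blackhat SEO strategy. However, it seems like it isn't that much more "
    "expensive to generate real content -- like the recent post about eHow and "
    "bingo cards -- so there isn't a point in taking that risk"
)
chain = build_chain(corpus)
print(generate(chain, length=40, seed=1))
```

A real generator would train on far more text, so that each word run has many possible successors and the output drifts further from any single source.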

~~~
almost
That may be unlikely to be frame-ups hoping to be NYTimesCo's subsidiary. It
deserves more expensive to a blackhat SEO strategy. If having spammers link
farms pointing at them. This is why Google would be careful when making these
situations

~~~
pbhjpbhj
The fake blog is probably from a Markov generator. I've always liked that much
more expensive to generate real content like it isn't that as a blackhat SEO
strategy. However, it seems like the recent post about eHow and bingo cards
like it seems like it seems like the recent post about eHow and bingo cards so
there isn't that risk The fake blog is probably from a Markov generator. I've
always liked that risk The fake blog is probably from a blackhat SEO strategy.
However, it isn't that as a Markov generator.

------
gojomo
NYTimes pollutes web with garbage text to boost search rankings of its
ConsumerSearch subsidiary?

That may be the real story here. If in fact the craptastic linkspam copy was
laundered through some affiliate program, giving NYTimesCo plausible
deniability, they still bear responsibility.

~~~
ryanwaggoner
You have to be careful when making these kinds of leaps, because anyone can
link to a website. If having spammers link to you is an indication of guilt,
it would be trivial to destroy your competitors by setting up link farms
pointing at them. This is why Google would be unlikely to penalize
ConsumerSearch in this scenario, even if they detected the spam. They'd likely
just not take that link into account.
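The argument can be made concrete with a toy scoring rule. This is a hypothetical sketch, not Google's algorithm: `Link`, `score`, the `flagged_as_spam` field (standing in for some spam detector's output) and the two policies are invented for illustration, and "ranking" here is a plain inbound-link count. It shows why penalizing a site for spam links pointing at it hands competitors a weapon, while simply ignoring those links does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    flagged_as_spam: bool  # hypothetical output of some spam detector


def score(links, target, policy):
    """Count inbound links to `target` under a hypothetical policy for flagged links.

    "penalize": each flagged link subtracts a point (guilt by association).
    "ignore":   flagged links simply contribute nothing.
    """
    if policy not in ("penalize", "ignore"):
        raise ValueError(f"unknown policy: {policy}")
    total = 0
    for link in links:
        if link.target != target:
            continue
        if not link.flagged_as_spam:
            total += 1
        elif policy == "penalize":
            total -= 1
    return total


# A site with 5 genuine inbound links...
genuine = [Link(f"site{i}.example", "victim.example", False) for i in range(5)]
# ...and a competitor's link farm pointing at it.
farm = [Link(f"farm{i}.example", "victim.example", True) for i in range(10)]
links = genuine + farm

print(score(links, "victim.example", "penalize"))  # -5
print(score(links, "victim.example", "ignore"))    # 5
```

Under "penalize", whoever controls the farm can push the target's score as low as they like just by adding more farm pages; under "ignore", the farm has no effect on the target at all.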

Not saying that this happened here, just that it's not a story yet. More
information is needed.

~~~
gojomo
Absolutely agreed: these situations could also be frame-ups hoping to trigger
Google sanctions. But then the suspects are those sites ranked below
ConsumerSearch for these terms.

Someone's up to no good, and it _might_ be NYTimesCo's subsidiary. It deserves
more research.

