
"Sufficiently advanced spam is indistinguishable from content" - moultano
http://lesswrong.com/lw/28r/is_google_paperclipping_the_web_the_perils_of/?
======
adriand
It's interesting that PageRank's measure of quality is entirely dependent on
there being a community that recognizes the quality of the content first,
before the search engine. Without a community, you're not going to get
incoming links.

In other words, work produced by lonely geniuses is quite likely to go
unnoticed.

For all we know, the content that is being produced by companies like Demand
Media has already been produced by thoughtful people, writing at length about
subjects they love on obscure websites that no one ever links to. What a shame
that would be!
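
A minimal sketch of the point about incoming links, assuming the textbook power-iteration form of PageRank rather than anything Google actually runs. The toy graph and the page names (`hub`, `popular`, `other`, `lonely_genius`) are made up for illustration. A page that nobody links to ends up with only the baseline score, however good it is.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank.

    `links` maps each page to the list of pages it links to. Every link
    target must also appear as a key. This is an illustrative sketch, not
    a description of any production ranking system.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page passes its rank on to the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: nobody links to "lonely_genius".
web = {
    "hub": ["popular", "other"],
    "popular": ["hub"],
    "other": ["popular"],
    "lonely_genius": ["popular"],
}
ranks = pagerank(web)
lowest = min(ranks, key=ranks.get)  # the page with no incoming links
```

In this toy graph the unlinked page never rises above the baseline `(1 - damping) / n`, because rank only flows along links that other pages choose to make.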

~~~
moultano
I've actually seen that happen to one of the lead engineers in search quality
at Google. He'd written a great guide to ultralight backpacking that, until I
linked to it, wasn't indexed by any major search engines.

<http://eric-and-april.com/Ultralight/index.html>

~~~
ovi256
Is this a meta-post? Did you just link to it for the first time?

------
moultano
I work on search-quality at Google. This is my life.

~~~
alexandros
Hi, I am the author of that. Would you say the depiction in the article is
more-or-less accurate? I am asking as I wrote this purely from an
outside/theoretical perspective.

~~~
moultano
It's very true for some things, and not for others. There's information
asymmetry in both directions: things site owners know that Google doesn't, and
things that Google knows that site owners don't.

The summary of the history seems pretty accurate to my perception of it, but I
don't think it's hopeless from here. :)

------
RyanMcGreal
Fascinating essay, but I'm not quite sure whether it's a problem that
sufficiently advanced spam is indistinguishable from content.

After all, Demand Media does produce real, editorially vetted content from
real human writers. The payment system encourages what I'll call extreme
efficiency of research and writing, but that simply optimizes it for the
_handy-reference_ domain of search results (e.g. How to fillet a smallmouth
bass), which may not be "high quality" as such but does provide direct,
clearly written and reasonably valid responses to the search queries that
elicit them.

~~~
moultano
I've seen a lot of pages where I couldn't tell whether they were written by a
Markov model or a human. Many of the people who get paid for $1 content don't
speak English natively.

~~~
klenwell
On that topic, I present Emily Chimpinson, a little project I've been playing
with:

<http://twitter.com/chimpinson>

She's a Markov-based script inspired by the public domain works of a certain
poet. All of her incantations are checked to make sure they don't repeat her
model verbatim.

Every once in a while she comes up with something inspired.
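
A rough sketch of how a script along these lines might work, assuming a simple word-level Markov chain and a substring check against the source text. This is not the actual Emily Chimpinson code. The function names and the tiny `CORPUS` are hypothetical stand-ins, not the poet's work.

```python
import random


def build_model(words, order=2):
    """Map each run of `order` words to the words seen following it."""
    model = {}
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        model.setdefault(key, []).append(words[i + order])
    return model


def generate(model, order, length, rng):
    """Walk the chain from a random starting key for up to `length` words."""
    out = list(rng.choice(sorted(model)))
    while len(out) < length:
        followers = model.get(tuple(out[-order:]))
        if not followers:
            break  # dead end: no word ever followed this run in the corpus
        out.append(rng.choice(followers))
    return " ".join(out)


def is_verbatim(line, words):
    """True if `line` appears word-for-word somewhere in the source text."""
    return f" {line} " in f" {' '.join(words)} "


def novel_line(corpus_text, order=1, length=8, seed=0, max_tries=100):
    """Return a generated line that is not a verbatim quote, or None."""
    rng = random.Random(seed)
    words = corpus_text.split()
    model = build_model(words, order)
    for _ in range(max_tries):
        line = generate(model, order, length, rng)
        if not is_verbatim(line, words):
            return line
    return None


# Hypothetical placeholder corpus, used only to exercise the code.
CORPUS = (
    "the cat sat on the mat and the dog sat on the rug "
    "and the cat saw the dog on the mat"
)
line = novel_line(CORPUS)
```

With a corpus this small most walks either quote the source or go in circles, which is why the retry loop exists. A larger corpus and a higher `order` trade novelty against coherence.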

------
pook
Hah, I ranted a bit about this evolutionary arms race years ago, from a
different angle.

[http://hamstermotor.motime.com/post/683104/the-future-of-
spa...](http://hamstermotor.motime.com/post/683104/the-future-of-spam-an-
information-theoretic-argument-%252Arestored%252A)

~~~
alexandros
That was very interesting. I added a pointer to your article to the OP.

------
randfish
Moultano - I have a strange request, but one I hope you'll take seriously.

I think this issue is very important - to Google, to web searchers, to
businesses seeking to be found by Google and even to less scrupulous web
operators. I'd love the opportunity to engage in a 20-30 minute written chat
with you and publish it (anywhere on the web you'd like).

As background, I've worked for years as an SEO consultant, founded a community
and company in the space (SEOmoz.org), and have been spending the last few
years developing and launching search marketing software.

I certainly respect your background and beliefs, but I think there's some
flawed logic in your assumptions and arguments that I'd love to dig into, talk
about and maybe even have some of my own perceptions changed. I would not ask
you to disclose anything that's confidential - I'm much more interested in the
theory and logic behind web spam, SEO and search relevancy.

You can reach me via email - rand@seomoz.org. Would love to hear from you!

~~~
moultano
Sorry, I'm not equipped for that sort of public discussion. Talk to Matt. ;)

~~~
randfish
It could be anonymous on your end?...

I like Matt a lot, too, but his opinions (at least, those he publicly offers)
are well known and well publicized. It would be great to hear other voices.

If not, how about if/when you ever leave the team? Happy to be patient :-)

------
Tichy
Haven't read it all, but I am just wondering: by now data dumps of people's
connections are probably making the rounds in the dark channels? I think
sending spam that appears to be from your friends could be a big
"improvement", and should be child's play with the data that is already freely
available.

Maybe that could become one of the first privacy disasters, when people
realize they made their email unusable by publishing their connections.

------
Gormo
If we presume that any algorithmic, procedural, or structural system built by
one party can be reverse-engineered and understood by another party, the
concept of Optimization by Proxy, and the more general Goodhart's law, form a
pretty compelling argument against designing optimized systems as solutions to
problems in general.

Maybe in some cases keeping a system convoluted and inconsistent can actually
help ensure stability and durability?

------
samg
Just ask Calacanis!

------
BoppreH
_Internal Server Error_

And sufficiently advanced errors are indistinguishable from pages made for
pure irony.

~~~
moultano
Google cache:
[http://webcache.googleusercontent.com/search?sourceid=chrome...](http://webcache.googleusercontent.com/search?sourceid=chrome&ie=UTF-8&q=cache:http://lesswrong.com/lw/28r/is_google_paperclipping_the_web_the_perils_of/)

~~~
BoppreH
Thanks, but it's back again.

------
diN0bot
Absolutely. Sometimes I mark as "spam" conversations that I'm personally not
interested in, even if the author is "legitimately" spamming me (e.g. a
misguided friend's mass email, or more likely the dozens of misguided
reply-alls).

~~~
alextp
I think this is a valid use case of spam filters. I have trained more than one
to detect my father's powerpoint emails and bad chain-mail jokes and separate
them from his personal messages that I actually want to read.
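
A toy sketch of what training a filter this way could look like, assuming a bag-of-words naive Bayes classifier with add-one smoothing. Real mail clients may use quite different methods. The class name, labels, and sample messages are invented for illustration.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Tiny two-label naive Bayes text classifier (illustrative only)."""

    LABELS = ("unwanted", "wanted")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.message_counts = Counter()

    def train(self, text, label):
        """Record one example message under the given label."""
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def score(self, text, label):
        """Log-probability of `text` under `label`, with add-one smoothing.

        Assumes at least one message has been trained for each label.
        """
        counts = self.word_counts[label]
        vocab = set()
        for other in self.LABELS:
            vocab.update(self.word_counts[other])
        total_words = sum(counts.values())
        total_messages = sum(self.message_counts.values())
        log_prob = math.log(self.message_counts[label] / total_messages)
        for word in text.lower().split():
            log_prob += math.log((counts[word] + 1) / (total_words + len(vocab)))
        return log_prob

    def classify(self, text):
        """Return whichever label gives the message the higher score."""
        return max(self.LABELS, key=lambda label: self.score(text, label))


# Hypothetical training data: forwards and slideshows versus personal notes.
mail_filter = NaiveBayesFilter()
mail_filter.train("fwd fwd funny joke pass it on", "unwanted")
mail_filter.train("see attached powerpoint slideshow fwd", "unwanted")
mail_filter.train("are you coming home for dinner on sunday", "wanted")
mail_filter.train("call me when you land safely", "wanted")

forwarded = mail_filter.classify("fwd funny powerpoint joke")
personal = mail_filter.classify("are you home on sunday")
```

Nothing here knows what spam is. The labels just encode one reader's preferences, so the same sender can land on either side depending on the message.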

