

How to prevent scraping? - metaprinter

I've gone and built an extensive website for nursing students (it took forever to populate with data), but I'm wary of launching it until I learn how to prevent or minimize automated scraping of the content.

I thought about showing a teaser and requiring login to see everything, but then I lose out on Google juice, no?

It's a LAMP environment. Any thoughts?
======
georgemcbay
Any time you spend thinking about this is wasted. You can't stop scraping on
the web, period, and any half-baked attempt to try will kill your SEO, as you
already suspect.

------
jnbiche
Your site will likely not be scraped unless/until it takes off. And once that
happens, you'll have your foothold and no me-too site is going to surpass you
unless they add more/better content. I wouldn't worry about it at this stage.

~~~
metaprinter
Thanks for your insight.

------
stray
You can't prevent scraping, but you can poison it. I can think of two
approaches:

1. Replace bits of text on output with Unicode look-alikes. Humans will still
read what you want them to read, but non-humans get crap.

2. The Mountweazel approach: insert fake entries that humans would never
find, then periodically Google those fake entries. Any site other than your
own containing a Mountweazel is the result of scraping your site.
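The first approach can be sketched in a few lines of Python. This is a minimal illustration, not production code: the mapping and the `every_nth` sampling rate are arbitrary choices, and the handful of Latin-to-Cyrillic homoglyphs shown is far from exhaustive.

```python
# A few Latin -> Cyrillic look-alikes (assumed mapping; extend as needed).
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small a
    "e": "\u0435",  # Cyrillic small ie
    "o": "\u043e",  # Cyrillic small o
    "c": "\u0441",  # Cyrillic small es
    "p": "\u0440",  # Cyrillic small er
}

def poison(text: str, every_nth: int = 7) -> str:
    """Swap every nth eligible character for its look-alike.

    The output renders identically to human eyes but breaks naive
    text matching and copy-paste reuse of scraped content.
    """
    out, seen = [], 0
    for ch in text:
        if ch in HOMOGLYPHS:
            seen += 1
            if seen % every_nth == 0:
                out.append(HOMOGLYPHS[ch])
                continue
        out.append(ch)
    return "".join(out)
```

Note the trade-offs: poisoned text also breaks on-site search, screen readers, and copy-paste for legitimate users, and a determined scraper can normalize the characters back with a reverse mapping.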

But honestly, most of our efforts to protect "our" work are just misguided
busy-work...

~~~
metaprinter
Thanks for your insights.

