

Do Not Crawl in the DUST: Different URLs with Similar Text - quilby
http://www2007.org/papers/paper194.pdf

======
quilby
I just found this paper in my school's library. It's not new, and it looks like
Google, Yahoo, and MSN may have given up on trying to find 'DUST' themselves,
because they now let site owners specify the canonical URL for a page

(
[http://googlewebmastercentral.blogspot.com/2009/02/specify-y...](http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html),
[http://ysearchblog.com/2009/02/12/fighting-duplication-addin...](http://ysearchblog.com/2009/02/12/fighting-duplication-adding-more-arrows-to-your-quiver/),
[http://blogs.msdn.com/webmaster/archive/2009/02/12/partnerin...](http://blogs.msdn.com/webmaster/archive/2009/02/12/partnering-to-help-solve-duplicate-content-issues.aspx) )
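For illustration, the mechanism announced in those posts is a `<link rel="canonical" href="...">` tag in the page head. Here is a rough sketch of how a crawler might read it, using only Python's standard-library `html.parser`. The function and class names (`canonical_url`, `CanonicalFinder`) are made up for this example, and it ignores real-world details such as HTTP `Link` headers or multiple conflicting tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens; attribute values can be None
        rels = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rels and attr_map.get("href"):
            self.canonical = attr_map["href"]


def canonical_url(page_url, html):
    """Return the page's declared canonical URL, or page_url if none is declared."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return page_url
    # Resolve relative hrefs against the URL the page was fetched from
    return urljoin(page_url, finder.canonical)


html = '<html><head><link rel="canonical" href="/product?id=42"></head></html>'
print(canonical_url("http://example.com/product?id=42&sessionid=abc", html))
# prints http://example.com/product?id=42
```

With this hint, a crawler that finds several URLs pointing to the same canonical URL can index one copy without having to work out the duplication itself.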

It's still interesting that

1\. Many sites have a lot of 'DUST'

2\. It is not very hard to find the 'DUST', and skipping the duplicate URLs
obviously reduces crawling time.
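As a rough illustration of how much DUST a crawl contains, the sketch below groups already-fetched URLs whose text is identical after whitespace and case normalization. This is not the paper's method. It is a simpler, swapped-in baseline, and the names (`find_dust`, `redundant_fetches`) are hypothetical. Because it needs every page fetched first, it only measures redundancy after the fact rather than saving crawl time. It also catches only exact matches, not the "similar text" the title refers to.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text):
    """SHA-1 of the page text after collapsing whitespace and lowercasing."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def find_dust(pages):
    """Group URLs whose normalized text is identical.

    pages: dict mapping URL -> fetched page text (hypothetical crawl output).
    Returns a list of sorted URL groups that share the same fingerprint.
    """
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


def redundant_fetches(dust_groups):
    """Fetches a crawler could have skipped by keeping one URL per DUST group."""
    return sum(len(group) - 1 for group in dust_groups)


pages = {
    "http://example.com/story?id=7": "Hello   World",
    "http://example.com/story?id=7&print=1": "hello world",
    "http://example.com/about": "About us",
}
dust = find_dust(pages)
print(dust)                     # one group: the two story URLs
print(redundant_fetches(dust))  # prints 1
```

Catching near-duplicates would require a similarity measure, such as shingling, in place of an exact hash. Avoiding the fetches entirely would require predicting DUST from the URLs alone.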

