
As an analogy, imagine that a gardener builds a beautiful flower garden, bisected by a cute stone path, which she invites the public to view freely, save for a single restriction: a sign reading "keep off the flower beds."

There is a well-understood social contract here. I should not drive my car along the path, even if I don't crush the flowers. I shouldn't walk on the flower beds, even if that sign isn't legally enforceable. And if a runaway lawnmower, RC car, or some other machine of mine does end up in the garden, I am responsible, because it was my machine.

With websites, there is even a TOS specifically for scrapers - robots.txt. The fact that it is easy to bypass or ignore is no excuse for actually bypassing or ignoring it.
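For anyone who hasn't seen it in practice, here is a minimal sketch of a scraper checking robots.txt before fetching a page, using Python's standard `urllib.robotparser`. The rules, the `example-bot` user agent, and the `example.com` URLs are made up for illustration; a real scraper would download the file from the site it is about to crawl.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, standing in for the gardener's sign.
ROBOTS_TXT = """\
User-agent: *
Disallow: /flowerbeds/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The path is open to visitors; the flower beds are not.
print(may_fetch(ROBOTS_TXT, "example-bot", "https://example.com/path/"))
print(may_fetch(ROBOTS_TXT, "example-bot", "https://example.com/flowerbeds/roses"))
```

The check is advisory: nothing stops a scraper from skipping it, which is exactly the point being argued here.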

The anonymity of the Internet functions as a ring of Gyges: because people don't face consequences (even social ones), they feel entitled to do as they will. However, just because you can do something does not mean you have a right to do it.




I think this analogy would be improved if the sign said "Please don't take any pictures." This is far more restrictive than a sign saying "Please don't take any seeds or cuttings." The latter is more understandable because such activity damages the flower garden (particularly if everyone starts taking seeds and cuttings).

Now let's say a photographer visits the flower garden, takes images, and sells them online as postcards. As long as the photographer is not hindering other people (flooding the site with repeat requests, in the analogy), it doesn't seem to be a problem.

On the other hand, let's say we don't have a flower garden, we have an art gallery or a street artist's display - or the pages of a recently published book. Now the issue is distributing copyrighted material without paying the creator... but what if there's a broad social consensus that copyright is out of control and should have been radically shortened decades ago?

The vast majority of data being scraped is not copyrightable creative work, however, so as long as you're not obnoxiously hammering a site, scraping seems perfectly ethical.


Robots.txt is definitely not any kind of ToS - some people (Google) said they will respect it. There is no reason to expect people to even know about the concept - practically nobody knows about it, not even most developers.

And again - there are countries where any ToS without an explicit signature or some other kind of legal agreement doesn't apply at all.

Just like writing "by using the toilet you agree to transfer your soul for infinity" on a piece of toilet paper taped somewhere in the vicinity of a toilet gives you nothing - even if it was a more reasonable contract, nobody agreed to anything.

As for your other point, I think this is more like standing next to a highway with a sign that reads "don't drive cars here" and expecting people to stop and turn around. They didn't even see your sign at their speed and it's kinda unreasonable to expect they would be checking for that kind of a sign on a highway. At least do it properly - big, red, reflective (e.g. a Connection Reset, or at least 403 Forbidden).
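As a rough sketch of what the "big, red, reflective" version might look like, here is a tiny WSGI app that answers unwanted clients with 403 Forbidden instead of relying on them reading robots.txt. The blocked token `example-scraper` and the response text are hypothetical, and matching on the User-Agent header is only one assumption about how a site might identify scrapers.

```python
# Hypothetical substrings identifying user agents the site wants to turn away.
BLOCKED_AGENT_TOKENS = ("example-scraper",)


def app(environ, start_response):
    """Minimal WSGI app: refuse blocked user agents with an explicit 403."""
    agent = environ.get("HTTP_USER_AGENT", "").lower()
    if any(token in agent for token in BLOCKED_AGENT_TOKENS):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Automated access is not permitted.\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Welcome to the garden.\n"]
```

It can be served locally with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`. A client can of course lie about its User-Agent, so this is a louder sign rather than a wall.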


Yes, there is no legal enforcement mechanism behind robots.txt. Nor do I particularly want there to be. However, most people agree that reasonable requests made regarding the use of someone's property should be followed. The capability to do something without consequences is not the same as the right to do something.

Our gardener should not need to build a brick wall around their public garden to keep your lawnmower out.


[flagged]


Is it? Just ask around. I have web app devs around me, and they don't know it. Only those who actually specialize in websites (for presentation) do.


I couldn't set up a web server to save my life and I know what robots.txt is.


Because you frequent this site where it's a very common topic. The devs around me often don't even speak English and don't care about Google that much.



