Hacker News

Why do you put it on the open internet if you don't want machines to find and read it?

A ToS is nice, but you can't expect it to apply: the user (of the machine doing the scraping) might be a child, for example, which would automatically void any potential contract. There are also people in jurisdictions where such terms have no force, or that don't recognize your rights to the data.

And the whole idea of putting data out publicly and then expecting machines to see the pile and go "oh, so where do I sign the ToS?" is weird...

Just put it behind a rate limited API key...
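For what "a rate limited API key" could look like in practice, here is a minimal Python sketch of a per-key token bucket. The class names, the default capacity of 10 requests and the refill rate of 1 request per second are illustrative assumptions, not anything the commenter specified; a real service would also need to persist this state and share it across server processes.

```python
import time
from typing import Dict, Iterable, Optional


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity             # start with a full bucket
        self.last: Optional[float] = None  # time of the previous check

    def allow(self, now: Optional[float] = None) -> bool:
        if now is None:
            now = time.monotonic()
        if self.last is not None:
            # Refill in proportion to the elapsed time, never above capacity.
            elapsed = max(0.0, now - self.last)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # spend one token on this request
            return True
        return False


class ApiKeyLimiter:
    """One bucket per issued key; requests with unknown keys are always refused."""

    def __init__(self, keys: Iterable[str], capacity: float = 10, rate: float = 1.0) -> None:
        self.buckets: Dict[str, TokenBucket] = {k: TokenBucket(capacity, rate) for k in keys}

    def check(self, api_key: str, now: Optional[float] = None) -> bool:
        bucket = self.buckets.get(api_key)
        return bucket is not None and bucket.allow(now)
```

For example, `ApiKeyLimiter({"key-abc"}, capacity=2, rate=1.0)` admits two back-to-back requests for `key-abc`, refuses a third in the same instant, and admits one more a second later; `"key-abc"` is a made-up key.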




As an analogy, imagine that a gardener builds a beautiful flower garden, bisected by a cute stone path, which she invites the public to view freely, save for a single restriction: a sign reading "keep off the flower beds."

There is a well-understood social contract here. I should not drive my car along the path, even if I don't crush the flowers. I shouldn't walk on the flower beds, even if that sign isn't legally enforceable. And if a runaway lawnmower, RC car, or some other machine of mine does end up in the garden, I am responsible, because it was my machine.

With websites, there is even a TOS specifically for scrapers - robots.txt. The fact that it is easy to bypass or ignore is no excuse for actually bypassing or ignoring it.
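For reference, Python's standard library ships a robots.txt parser, `urllib.robotparser.RobotFileParser`, which a well-behaved crawler can consult before fetching. A minimal sketch; the robots.txt content, the `ExampleBot/1.0` user agent and the URLs are hypothetical, and in practice the file would be fetched from the site with `set_url()` and `read()`:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; a real crawler would fetch https://<host>/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text already in memory."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = make_parser(ROBOTS_TXT)
agent = "ExampleBot/1.0"  # hypothetical crawler user agent

for url in ("https://example.com/garden.html",
            "https://example.com/private/beds.html"):
    # can_fetch() only reports what the site asked for; it enforces nothing.
    print(url, "allowed" if parser.can_fetch(agent, url) else "disallowed")

# Crawl-delay is a non-standard extension; crawl_delay() returns it as an int here.
print("crawl delay (s):", parser.crawl_delay(agent))
```

Nothing here is enforced by the server: the check only works if the crawler chooses to run it, which is the honor-system point being argued in this thread.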

The anonymity of the Internet functions as a ring of Gyges: since people don't face consequences (even social ones), they feel entitled to do as they please. However, being able to do something does not mean you have a right to do it.


I think this analogy would be improved if the sign said "Please don't take any pictures." This is far more restrictive than a sign saying "Please don't take any seeds or cuttings." The latter is more understandable because such activity damages the flower garden (particularly if everyone starts taking seeds and cuttings).

Now suppose a photographer visits the flower garden, takes pictures, and sells them online as postcards. As long as the photographer is not hindering other people (flooding the site with repeat requests, in the analogy), it doesn't seem to be a problem.

On the other hand, let's say we don't have a flower garden, we have an art gallery or a street artist's display - or the pages of a recently published book. Now the issue is distributing copyrighted material without paying the creator... but what if there's a broad social consensus that copyright is out of control and should have been radically shortened decades ago?

The vast majority of data being scraped is not copyrightable creative work, however, so as long as you're not obnoxiously hammering a site, scraping seems perfectly ethical.
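What "not obnoxiously hammering a site" means is a judgment call; one common approach is a minimum delay between requests to the same host. A Python sketch under that assumption; the class name and the 2-second default are arbitrary, and the injectable `clock`/`sleep` parameters exist only so the behavior can be tested without real waiting:

```python
import time
from typing import Callable, Dict
from urllib.parse import urlsplit


class PoliteThrottle:
    """Enforces a minimum interval between requests to the same host."""

    def __init__(self, min_interval: float = 2.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.last_request: Dict[str, float] = {}  # host -> time of last request

    def wait(self, url: str) -> float:
        """Block until the URL's host may be contacted again; return seconds waited."""
        host = urlsplit(url).netloc
        now = self.clock()
        last = self.last_request.get(host)
        waited = 0.0
        if last is not None:
            waited = max(0.0, last + self.min_interval - now)
            if waited:
                self.sleep(waited)
        # Record when this request actually goes out.
        self.last_request[host] = now + waited
        return waited
```

A crawler would call `throttle.wait(url)` right before each fetch; requests to different hosts do not delay each other.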


Robots.txt is definitely not any kind of ToS; some parties (Google, for one) have said they will respect it. There's no reason to expect people to even know the concept exists: practically nobody knows about it, not even most developers.

And again, there are countries where a ToS without an explicit signature or some other kind of legal agreement doesn't apply at all.

It's like taping a piece of toilet paper near a toilet that reads "by using the toilet you agree to transfer your soul for all eternity": it gives you nothing. Even if it were a more reasonable contract, nobody agreed to anything.

As for your other point, I think this is more like standing next to a highway with a sign that reads "don't drive cars here" and expecting people to stop and turn around. At their speed they won't even see your sign, and it's kinda unreasonable to expect them to be watching for that kind of sign on a highway. At least do it properly: big, red, reflective (e.g. a Connection Reset, or at least a 403 Forbidden).
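The "403 Forbidden" option can be as small as a WSGI application that refuses requests lacking a known key. A Python sketch; the `X-Api-Key` header name, the `ALLOWED_KEYS` set and the response bodies are hypothetical choices, not a standard:

```python
from typing import Callable, Iterable, List

# Hypothetical set of issued keys; a real deployment would store these securely.
ALLOWED_KEYS = {"key-abc"}


def _respond(start_response: Callable, status: str, body: bytes) -> List[bytes]:
    start_response(status, [("Content-Type", "text/plain; charset=utf-8"),
                            ("Content-Length", str(len(body)))])
    return [body]


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Serve the page only when the request carries a known X-Api-Key header."""
    # WSGI exposes the "X-Api-Key" request header as HTTP_X_API_KEY.
    if environ.get("HTTP_X_API_KEY") not in ALLOWED_KEYS:
        return _respond(start_response, "403 Forbidden",
                        b"Forbidden: an API key is required.\n")
    return _respond(start_response, "200 OK", b"Here is the garden.\n")
```

For local testing it could be served with `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()`; a production setup might instead do this check in a reverse proxy.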


Yes, there is no legal enforcement mechanism behind robots.txt. Nor do I particularly want there to be. However, most people agree that reasonable requests made regarding the use of someone's property should be followed. The capability to do something without consequences is not the same as the right to do something.

Our gardener should not need to build a brick wall around their public garden to keep your lawnmower out.


[flagged]


Is it? Just ask around. The web app devs around me don't know about it. Only those who actually specialize in (presentational) websites do.


I couldn't set up a web server to save my life and I know what robots.txt is.


Because you frequent this site where it's a very common topic. The devs around me often don't even speak English and don't care about Google that much.


What makes you think putting data on the Internet all of a sudden means I unilaterally surrender the rights to my intellectual property?

If I choose to make my data available to some businesses to make it easier to discover, and decline to let others unilaterally copy it to build a different business, that's my right. It is unethical and unreasonable for anyone else to assume they are entitled to the rights I granted someone else.

If I own some data, I get to be the arbiter of the who/what/when/where of its use. Period.


Sure, you can do whatever you like. Cut the connection if you don't like it. But I can do whatever I like too, for example read the data your machine sent me. If your machine sends my machine data, it's IMHO reasonable to assume you don't mind me having it unless we agreed otherwise. But in many countries a ToS is not considered a legal contract at all; just having it somewhere on your site is not enough. Sometimes not even having users tick a ToS checkbox would form a valid contract.

There are many kinds of data that can't be owned at all. Actually, it's the other way around: only a very small subset of data can be owned. You can try to cover it with some kind of non-disclosure clause in a contract, but again, a contract would have to exist.


[flagged]


What I'm saying is - your machine is fully capable of providing just the right amount of data to fulfill your purposes. If you don't like people taking it all, don't build a machine that gives it to them at 1 Gb/s. Stuff about some ToS or rights or IP ownership is just noise.


> What makes you think putting data on the Internet all the sudden means I unilaterally surrender the rights to my intellectual property?

Because intellectual property doesn't exist.


Scraping doesn’t imply IP violation.



