This is the type of thing we risk losing as the internet matures and internet companies with vested interests gain more power. Setting this type of precedent will absolutely curtail innovation and freedom in the future. Think about it: would Google have been created in an environment that is overwhelmingly siloed and filled with red tape?
I see parallels to the net neutrality discussion in this.
Careful, this could legitimize things like accidental denial of service. Depending on circumstances, even basic scraping could cause problems.
(I need to be vague to avoid violating an NDA.) A major internet site had a URL that went something like somedomain/group?id=xxxxx. It turns out that a simple scraper that requested id=1, id=2, id=3, etc., etc., caused a major problem! This was because rendering these pages required significant resources, so the most active pages were kept in RAM. Of course, the scraper tried to read everything.
Of course, no one thought the scraper was malicious in any way!
This is a failure on the part of the developers at that "major internet site". Using a GUID instead of consecutive IDs, a rate limiter, hell, even just a cache... or all of the above. There are lots of solutions here.
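To illustrate how cheap one of those solutions is, here's a minimal token-bucket rate limiter. This is a sketch; the class name and numbers are illustrative, not taken from the site in question:

```python
import time

class TokenBucket:
    """Minimal per-client token-bucket rate limiter (illustrative sketch)."""

    def __init__(self, rate, capacity):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        """Return True if a request may proceed, consuming one token."""
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# e.g. allow bursts of 10, sustained ~5 requests/sec per client
bucket = TokenBucket(rate=5, capacity=10)
```

A site would keep one bucket per client IP (or API key) and return 429 when `allow()` is False; even this crude version would have stopped the id=1, id=2, id=3 scraper from reading everything into RAM.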
You have to take robot scraping and indexing into consideration, and assume people will ignore robots.txt. (Certain bots, e.g. msnbot/bingbot, are quite aggressive!)
You are right, but few organizations are sophisticated, or wealthy, enough to employ all of that. I mean, a couple of years ago there was an incident where Google Docs could be enumerated.
And that's Google; they can obviously afford to get competent people working on that, yet they made a mistake (and who doesn't?).
Who owns LinkedIn again?
It’s not easy to craft a law that will punish bad behaviour without blocking innovation.
Of course, some sense is more than welcome, but if my scraper making one request every 2 seconds knocks down your server, it's your fault, not mine.
Otherwise you get situations like Uber paying out an enormous "bug bounty" totally-not-in-exchange for having their stolen data destroyed. If that person had simply pointed out that they had credentials published in a public repository, how much would they have been paid? Probably somewhere within an order of magnitude of the program's stated maximum payout.
You're still culpable if your actions break your neighbor's window, even if it was accidentally while you were opening it.
Yes, if my crappy software costs you money by knocking your site offline by accident, I should make you whole.
I think it has to be something substantially more impactful, clearly intentionally malicious, or in some other way much worse than aggressive request rates before we start thinking about criminal penalties.
After doing something that pisses a lot of people off, they start getting 1,000x the calls per day on the same number, almost all complaints.
This causes actual damages (no "normal" customers can get through) and is also clearly outside the scope of "normal" usage.
Do you think the same rules apply?
I must've responded while he was editing it, and I didn't catch the change.
I am not saying that the trouble with law enforcement on the internet is either a good or a bad thing. Actually, it depends, and in the 'real' world I am pretty happy that law enforcement works quite well where I live. I think the thin line is somewhere around where I start to fear my own government more than the bad guys (while not having any evil intentions or plans at all).
No. It's only been ahead of most laws for a while, as all frontiers are while they remain frontiers. But all frontiers eventually close, and laws catch up with them as they do so. That is what we're seeing now, and have been for a decade or more.
Are you saying that the writers of a bot that causes accidental issues with a site due to poor development standards on that site should spend years in prison with a federal felony conviction?
- the "delete" button in the admin area was implemented as an <a href=...> that simply did a GET request (violating the idempotent nature of GET requests).
Looking at the logs, it was pretty clear who the "hacker" was: Google. They'd come, follow all of the links, make their way into the admin site, and follow all of the delete content links.
I consider the work that the original developers did to be grossly negligent, and I certainly don't fault Google for anything.
cf. for example the law on trade secrets. If you take "reasonable steps" to safeguard the secret, and impose NDAs on the people you do grant access, then courts will punish competitors who steal them, even if your security happens to suck.
I have to "deal" with that problem every day. Misconfigured scrapers are dealt with by apache as are idiots who try to DoS the site (an intelligent attack still needs manual intervention, though).
LinkedIn is not trying to prevent access. They want to prevent information from being scraped and then used to their detriment.
This is an article about the LinkedIn v hiQ case at AdWeek.
curl --user-agent INSERT_ANYTHING_HERE http://www.adweek.com/digital/rami-essaid-distil-networks-guest-post-linkedin-hiq-labs/
How do they do it?
Pattern match against the User-Agent string.
Effective shibboleth.^W engineering.
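For the curious, the naive version of that pattern match is essentially a one-liner. The denylist below is a made-up sample; real bot-detection vendors maintain far larger databases of user-agent strings:

```python
import re

# Hypothetical denylist of tell-tale substrings; purely illustrative.
BAD_AGENT = re.compile(r"(?i)(curl|wget|python-requests|scrapy)")

def looks_like_bot(user_agent: str) -> bool:
    """Crude User-Agent-based bot detection, as described above."""
    return bool(BAD_AGENT.search(user_agent or ""))
```

Which is exactly why the `curl --user-agent INSERT_ANYTHING_HERE` trick works: the check keys entirely on a self-reported string.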
Clarification: If a user, not a "bot", makes the "wrong" choice of user-agent string (e.g. in the browser settings), then they will be labeled a "bad bot", even if their behavior is no different than other users who are not labeled "bad bots". For example, they make one HTTP GET request just like any other user. There are databases of "acceptable" user-agent strings available to anyone. If still unsure about the point I am making, see this post from several days ago: https://www.sigbus.info/software-compatibility-and-our-own-u...
And if this means that bots are altered to become indistinguishable from users, and therefore have a minimal impact on a site's loading? Well, mission accomplished.
ETA:  Recent behavior (as opposed to all historical behavior) is used so that someone inheriting a "bad" IP isn't completely screwed over.
It's surprising that malicious bots aren't exploiting those things already.
Sure, it has obviously limited utility, just like "the right to be forgotten" has the flaw that you can diff the USA internet against the EU internet to find exactly what people want forgotten, but shenanigans interest me.
You mean, bots that obey robots.txt?
https://www.linkedin.com/robots.txt very specifically prohibits scraping by any bot besides a small whitelist.
robots.txt compliance is not difficult to build. I'm fine with robots.txt violations being considered hacking.
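It really is easy to build; Python ships a compliant parser in the standard library. Here's a sketch using an inline toy robots.txt (the rules and user-agent names are examples, not LinkedIn's actual file, which whitelists a specific set of crawlers):

```python
from urllib import robotparser

# Parse an inline robots.txt instead of fetching one over the network.
# Toy example: allow Googlebot everything, deny everyone else everything.
rules = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "https://example.com/in/someone"))  # True
print(rp.can_fetch("MyScraper", "https://example.com/in/someone"))  # False
```

In a real crawler you'd use `rp.set_url(".../robots.txt")` and `rp.read()` to fetch the live file, then gate every request on `can_fetch()`.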
Hm, I disagree. Either information is public, no matter for whom, or the information is private, and you should have an ACL for accessing it. I don't think it's fair to say that information is public if you're a human but private if you're a machine, or vice versa.
It's not about whether it's difficult to build, but rather the principle of whether you can allow only humans to read something.
And whether bots lead to revenue depends on why the bot is navigating your page, no? If it's some indexer that links back to your website and it's a popular index, then you'll maybe end up with more revenue thanks to that bot than from a normal user.
Your arguments about revenue are website-dependent, and it's the website owner who is in the best position to decide whether robots are good for them or not (and plenty of sites don't ban bots in their robots.txt). In this case, the company that ran the bots is directly competing with LinkedIn's products that sell aggregated data to employers and such, and LinkedIn clearly decided it's not going to lead to more revenue for them.
Does an ad-blocking browser count as a bot or as a human? And what about something that concatenates all of your infinite scrolling into a paginated view? Something that changes the structure of your page? Something that concatenates different pages before displaying them?
I would definitely want some intent provisions in, but saying something is accessible therefore free game seems too wide.
The problem with analogies is that there are many equally valid analogies to be made, each with a different point. I would argue that the real-life equivalent is "Have this free book, but you may not read Chapter 4."
If I put on my website terms of service "please don't try to go everywhere" , and then you do... seems like you did _something_.
I don't really get what sort of stuff is enforceable, though.
the ACL is the robots.txt. A door with or without a lock doesn't determine whether the place is public or not.
One can't post a sign in public that tells people not to look at other publicly visible signs and expect the government to arrest or fine them for ignoring it.
What if I use curl to pipe web content to my mail so that I can read it in a quirky way? What if I write a Chrome extension to crawl a site? Where does w3m stand?
This is not a question of the tool (UA) but of the intent (mass crawling, indexing, mass-replicating stuff). robots.txt is meant as a hint for crawlers and the like, not as an optimistic ACL for whether something is public or not.
The robotstxt.org site states that a robot "should" obey the rules. "should" is not a legal term that implies compliance. "must" would have been more appropriate to indicate enforcement.
Archive.org also dislikes how robots.txt is being used: it is aimed mainly at search engines and works against their mission in particular. Are they now hackers for not throwing away information just because someone was overzealous with robots.txt, or because someone retired a website and uses robots.txt as SEO to let another site take its place in Google search results?
If some big corp wants to cry and bring legal matters into software, they should first be accountable themselves for not securing themselves and the data of their clients (see the LinkedIn hack people mentioned elsewhere here, and in general the high-profile hacks like Equifax, Sony, etc.). Or should software shape up to be like many other areas are today: multi-million-dollar corporations free to play fast and loose and endanger people, while the small guys get fried over meaningless bullshit and vaguely defined "crimes"?
 - https://en.wikipedia.org/wiki/Robots_exclusion_standard#Nons...
 - https://intoli.com/blog/analyzing-one-million-robots-txt-fil...
 - https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...
I am pretty sure none of the standard libraries/tools that respect robots.txt would continue after being fed that file.
>throwing away information
This is entirely irrelevant. If they receive data from someone they have no obligation to discard it because of the current status of robots.txt. The question would be if they should continue to actively scrape that website.
It seems like they've done that for gov sites, but nobody particularly cares about enforcing gov robots.txt. It would've been interesting if the government sued them, although if they cared they probably would've just told them to stop.
And all this to protect some corp's business model of not letting others automatically collect the public information they provide, while they themselves are free to use outdated or buggy software, store passwords in plaintext, etc., and get away with leaking data of millions of customers that should never have been public.
And it'd fail to stop anyone except benign, private, and low-funded actors. Services offering "scraping by a human, thus not a bot ignoring robots.txt" would instantly pop up in India (or other low-wage countries), just as captcha-solving services that employ humans already exist. And malicious bots wouldn't care anyway, just as they make zero effort to respect robots.txt now; they run from servers in countries that aren't friendly towards the USA, so there is zero potential for catching the perpetrators.
Requiring a human would increase costs and it doesn't seem like a good argument against anything.
The only people a robots.txt law would affect are private users who set up a Python script to scrape a single page for themselves to check for something, things like archive.org, researchers, automated website testers, etc. while anyone nefarious can just rent a shady VPN or use a server in Russia, China, Middle East, etc.
Requiring a human barely increases the cost if the data is that valuable in the first place, and it would be a last resort anyway, well after simply running the bots from a shady country. For captchas it's done because it's technically easier/cheaper (although supposedly automated solvers exist too).
But laws that punish outright gross negligence would help protect everyone who uses these American websites (and most of the world does) from data leaks of data that is arguably way more sensitive (emails, unhashed passwords, SS and CC numbers, real names even like in Ashley Madison case, etc.).
LinkedIn used unsalted SHA-1 for passwords as recently as 2012 (when they were hacked), and over 100 million such username + password combinations were stolen. Not only is SHA-1 not good enough for passwords, but thanks to the missing salt, many common and simple passwords (yes, yes, they are bad passwords, but people do use them) can be "cracked" just by googling the hash. The law should either go both ways or neither.
To suggest such heavy-handed laws as treating robots.txt violations as hacking, while multi-million-dollar corporations with millions of users get away with stuff like that over and over again (and I mean true negligence of the most basic practices that every random free my-first-login-page and my-first-SQL-injection-prevention tutorial advises against, not some obscure bug in the underlying software), is absolutely ridiculous and anti-consumer.
I'm not. You can set up a server to serve different versions of robots.txt to different folks. A malicious actor could deliberately feed inputs to a specific crawler that convince it to violate the terms of the robots.txt it serves to everyone else, and then press for criminal charges against the operator of the scraper.
In a sufficiently adversarial relationship, this lets website owners turn any well-behaved site scraper into criminal activity. That's not a power we want to grant.
Okay. Start with something simple then: how would you define a "bot", and thus subject it to your robots.txt rule?
Is my web browser a bot? What about a proxy? What about a blind person's screen reader?
If my web-browser pre-fetches links near my mouse pointer, is that a bot? What if it downloads the whole of an article split over, say, ten pages?
I think of robots.txt similar to posting a "No Trespassing" sign. For a private residence, it's almost not even required, yet for something like a shopping mall during opening hours, the default assumption is that anyone is allowed to be there without a specific invitation, until they are expressly asked to leave and not come back.
Trying to nail down the exact line is a tough issue.
Not only does it make it way too easy to prosecute software developers, it really devalues the term "hacking".
I do think that a company that has been the victim of a SQL injection attack will have a better chance in court than, say, LinkedIn in this specific case. At least this theoretical company has made some small effort to protect their data, however inept.
Really?? That would mean private corporations, or private citizens, can write laws.
The information is available to the public, just not for certain classes. This is and should be legally unenforceable.
If something is truly meant to be private it should not be referenced from a public-facing page or it should have access control enabled.
"If you truly didn't want trespassers you should've put up a gate."
As a property owner, a no-trespassing sign won't protect you from the lawsuits that result when a toddler drowns in your pool. You're expected to do more (like putting up that gate).
Equifax's systems are peppered with "no-trespassing" motds at login. They also have a robots.txt file. We expected them to do more.
Same for leaving keys in your ignition, guns unlocked on your nightstand, etc. "Don't touch" signs won't absolve you of responsibility when either gets stolen and used in a spree killing.
So yes, as the owner of any sort of asset, in most contexts it is your responsibility to implement access controls to keep unauthorized traffic out.
Good analogy. I wonder if operating a poorly secured website that leaks private information could be seen as an 'attractive nuisance' and the owners could be prosecuted for that, rather than the hackers!
Even for those who think that robots.txt should be enforceable, allowing some bots but not others makes it difficult for a new player to have the same equitable access to information as the big players.
But I think what is being argued is that "if it's publicly available on a URL, it's available for any client to download and use." I think the latter argument holds more water: since they are making it publicly available, there is implicit authorization.
# Notice: The use of robots or other automated means to access LinkedIn without
# the express permission of LinkedIn is strictly prohibited.
# Profinder only for deepcrawl
The purpose of robots.txt is to guide bots away from circular links and such that would result in bogging down the site and causing undue amounts of nonsense traffic.
The purpose of robots.txt is not access control.
EDIT: typo fix
I would almost be willing to concede making not following robots.txt a violation of CFAA if the trade-off was Mark Zuckerberg being brought up on several billion felony charges every year.
Actually, I'm considering building "API-fication" of websites, with bindings for major languages (Java, Python, JS). With luck, websites could participate by providing & maintaining a parseable API sitemap.
This would open the door to my 2nd project: orchestration a la BPEL on top of websites, with a visual editor, macros, scripting. Call this PIPES 2.0.
- cleaning (big) data. Automatically reconcile data to canonical format / names using authoritative source (say wikipedia)
Can you understand even the simplest TOS? I'd argue most (all?) are too restrictive to be enforceable.
Those things should be crimes as the data they fetch is not publicly available on some web page but exists only on my personal device and they take it without my consent.
I'm not saying what Linkedin is trying to do is right but it seems to me there needs to be a way to say "Dude, that's not cool." A regular B&M store can refuse service to disruptive people and trespass people who don't comply, why not servers?
Pretty much what rayiner is saying, they posted while I was typing.
1) Blocking TCP connections
2) Returning a 4XX error, perhaps even "401 Authorization Required", "402 Payment Required", "403 Forbidden", or "429 Too Many Requests"
> A regular B&M store can refuse service to disruptive people and trespass people who don't comply, why not servers?
A Brick and Mortar store has to _tell_ you you're being banned. The mechanisms I listed above both tell you and lock the door whenever you attempt to access.
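A sketch of that kind of explicit, per-client refusal (the address and denylist here are hypothetical; a real site would more likely block at the firewall or load balancer):

```python
# Hypothetical set of client addresses whose access has been revoked,
# e.g. the parties named in a C&D letter.
BANNED = {"203.0.113.7"}

def status_for(client_ip: str) -> int:
    """Return 403 for revoked clients, 200 otherwise.

    The 403 is both the locked door and the "you're banned" notice:
    the client is told, in protocol terms, that access is refused.
    """
    return 403 if client_ip in BANNED else 200
```

The point is that HTTP already has a vocabulary for "you are not welcome here" (401, 402, 403, 429); serving 200 OK and then suing is what makes this case feel off.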
Edit: In this case, it's more like someone was looking in the store window from the public sidewalk and asked to stop. Can you really ask someone to stop looking at you from a public place?
Let's try a thought experiment: you're at a supermarket, and you're abusing coupons to the point where you're holding up the line for everyone. Someone complains to the manager, and the manager escorts you out of the store and tells you you're banned for life (as an aside, I wish this would happen to extreme couponers).
The supermarket also has automatic doors and a self-checkout. They're also pretty understaffed, so there's a good chance you won't run into anyone stocking the shelves as you're shopping. A few days after you've been banned, you waltz in through the automatic doors, grab some items off the nearest shelf, go through the self-checkout, and leave without a single employee getting a good look at your face. At the end of the day, the manager starts fast-forwarding though the day's security camera footage looking for anything odd and notices you've been in the store. They call the police and have you charged with trespassing.
Do they have a case, yes or no?
I say yes.
Walking into a store you've been banned from is a clear violation of private space. Is looking in their window?
So, if you had to have an account to view any LinkedIn information, and you got a C&D and your account banned, and you signed up for a new account, I think it would be like entering a store you've been banned from. But we're talking about information available from a public space: in the window, or without an account.
I also take issue with the CFAA being used here. I'm sure there are other laws more applicable to keeping someone from talking with you.
To recap: I don't think LinkedIn is wrong to ask them to stop, I just don't think they're using the appropriate means of forcing them to.
I think it's more like calling the store and asking them what their prices are 20 times a minute.
No one is accusing HiQ of performing a denial of service attack.
I'm down for that.
Also, a phone call consumes, as a percentage of available resources, vastly more than an HTTP request.
Disregarding that, though, I think you'd need a court order telling someone not to talk to you, and you'd have to take action to prevent them as well (blocking their number and telling them to stop) before one would be granted. If they persisted after being told explicitly and having their number blocked, then yes, I do think legal action would occur, and be swift.
I would also assume, presumably, that "you" can be extended to be an automated phone system. (Which is still more limited in capacity than a server would be, but even disregarding that.)
FWIW, I'm not saying that "hiQ Labs" is blameless or acting in good faith. I'm saying that unimpeded access to publicly accessible information requires more than asking someone to stop and that the CFAA isn't the right tool for this.
I'm not an expert in this field, but I doubt the vast majority of anyone in this thread is. It also becomes interesting because I believe the CFAA has been used in similar situations before, but those were cases where the accessed knowledge could be assumed to be private even if made public (client details at a phone company, or articles known to be behind a paywall). Not that I agree with its usage there either, but a reasonable person could assume the data accessed there was not public.
So the key thing here is: if something is publicly available, can I ask you to stop looking at it, or do I need a more stringent court order to prevent you from viewing public information?
And in this case, I do think the capacity constraints disregarded above would come into play. I think the courts would look differently at someone calling your clerk 20 times a day vs looking at a menu you post on the window.
Like, for example, sending a C&D letter?
This whole hubbub is over them sending a C&D, they just made the mistake of trying to use the CFAA as a means to enforce it -- which, honestly, hiQ is fighting the good fight trying to stop.
Edit: If that's your point, I agree with you. A C&D followed by some more appropriate (than the CFAA) remedy seems like an approach that wouldn't get everyone's backs up.
This lawsuit is an attempt to stop competition by curbing access to data, not about ensuring reasonable use of apis and rate limits.
A different situation would be scraping a website to build a business on. The worst is directly using the data; for example, those StackOverflow clones with the original data don't sound OK to me. I am not sure what to think about bots doing various derived work like stats and analysis. I think that if they are part of a business, making money, it shouldn't be legal unless those requests are permitted by robots.txt.
You, however, go the extra mile, here. How about you explain exactly how accessing published information on a public website is like building a network of cameras to monitor a city with?
Boom. Easy to have both opinions.
I would love to limit corporate databases, but not via letting website owners declare arbitrary use to be criminal.
LinkedIn creates an implied covenant, with the public's consent (mostly), to then publish their professional profiles and make them discoverable.
While LinkedIn 100% should have the right to stop others from embedding without permission, since it's possible to claim the data structure and presentation are proprietary to them, this should never extend to the actual data itself, since that was willingly gifted by the actual owners (Joe Public) into the public domain.
I think an argument could be made that LinkedIn is being burdened with a degree of data mining that affects their business, and therefore should be able to charge a minimal fee, e.g. an API firehose to acquire the data in bulk as a raw data stream.
That seems reasonable depending on the charges associated with that offer; this would be the correct compromise, since their data structure is all that actually separates their service from, say, About.me or any other site of that type, none of which disallow scraping as long as it doesn't amount to a DoS attack (of course).
Anyway my comments are as a marketer and not a programmer or lawyer, but personally I'm very interested to see this case resolved in a manner that doesn't suit LinkedIn in the slightest.
Servers are no different. The Internet isn't an abstraction--it's just pieces of private property connected together (servers, routers, switches). When you make an HTTP request, you're accessing a piece of private property. The owner of that property has every right to decide not to let you do so.
> LinkedIn sent hiQ cease and desist letters warning that any future access of its website, even the public portions, were “without permission and without authorization” and thus violations of the CFAA.
The EFF's point about terms of service is a good one, but also irrelevant. Terms of service don't provide adequate notice that someone's implied license to access a website has been terminated. But here, hiQ had actual notice through "human" channels.
There are many ways to do this short of claiming that hiQ doesn't have permission or authorization, an argument that strikes me as wholly without merit. If the data is publicly available on the internet, then how is permission or authorization required?
In my opinion, the bottom line is that if LinkedIn doesn't want to serve data to this company, then they should immediately cease doing so using the many well established means available to them.
For LinkedIn to claim that following a URL and downloading the data is somehow "hacking their website" is entirely ludicrous. I understand they had a lawyer tell this company that they didn't want them to visit the URL, but I don't see how that somehow turns lawful web browsing into illegal hacking.
Also, I would agree that absent a specific order to stop accessing publicly available server resources, there is implicit permission to do so. So in the case of Weev, I think he did nothing wrong; AT&T were the ones in the wrong.
If someone was walked out of a supermarket and explicitly told that they were banned for life, and they tried to claim that the ban was lifted because the automatic doors opened for them, they'd be laughed out of court.
You could extend that further and say that the supermarket has a self-checkout. You may very well be able to walk through the automatic doors, grab something off the shelf, check it out yourself, and leave without anyone noticing you, but it's still trespassing if you've been banned from the store.
It becomes less clear where that delineation lies: a menu posted in a window, or an automated phone system. These are both private things intended for at-large public consumption. My impression is that the EFF and hiQ Labs are taking the stance that it's a menu placed in the window, not being let in after being told you can't come in.
But to respond directly: the paper and tape had to be bought, printed, &c. Capital was expended to place the paper there. Sure, there is no ongoing cost of maintaining the paper in the window, and if that's where your argument lies, then you should be less condescending about it.
Moreover, we're not talking about the costs associated with access, we're talking about the permission granted to access. As such, ignoring the cost of serving an HTTP request is a valid comparison, because it is not at issue here. LinkedIn's argument is just as strong even if their only argument is they denied permission with no reason given.
Thanks for the ad hominem, by the way. Your childishness and inability to conduct a civil discussion has caused this discussion to end.
As a permission issue, the bot _may_ have been authorized and authenticated; however, the company was sent a C&D letter that revoked all authorization. That is why I say that logging in and accessing the resources did not constitute authorization.
If a C&D letter would not have been sent, I think I'd agree with you.
> your server replying 200 OK should implicitly be considered permission to access that resource
I do see your point and how you could disagree with my statement above. However, if the store owner forgets you next time and says "Come on in! Oh and here is a take-home menu with all our items and prices" but then calls the police to have you removed, there is a problem.
Now imagine said store owner actually owns several locations, possibly even under different public names, and doesn't want to serve said customer. They could provide a list of the addresses of all the stores they run, explicitly revoking permission. Otherwise, that customer walking into store B would need to be told again, at time of entry, that they would not be served.
Assuming the CFAA C&D from LinkedIn does have legal standing here... If hiQ were using IP addresses and not DNS resolution to crawl, how would they know a particular IP is a LinkedIn resource they aren't allowed to access? Did the C&D provide all addresses they are not permitted to access?
My point is that its not black and white, and certainly not clear that this should be covered by the CFAA under "hacking".
Edit: You could also make the argument and analogy to a restraining order which places the responsibility for compliance on the banned party. However those don't just happen because one entity sends a letter to another entity, it needs to be explicitly granted via the legal process.
The law applies to people, not computers. The only question is: did LinkedIn convey its revocation of hiQ's implied license in a way a reasonable person would understand? The computer code is only relevant if a reasonable person would take the HTTP status code to take precedence over the C&D letter.
If all requests sent by robots would clearly identify themselves, the server would easily block all of them. But if they fake their user agent to look like a browser and ignore robots.txt, that's not a good faith request and they shouldn't be able to plead ignorance.
If you scrape a site that prohibits it in robots.txt, that should be considered notice that they don't want that, for whatever relevant law. (I don't know if this argument would hold up in court, IANAL.)
LinkedIn wants to make their data available publicly, except under certain conditions. In my opinion, if they can't find a technical solution, they should stop making the data available publicly.
While I don't know about the EFF's overall argument, as an absolute statement I don't think you are correct here. In the USA at least, "public accommodations" (which your cafe example would be) are in fact subject to regulations that limit their ability to discriminate, require accommodations for the disabled, etc., and these apply regardless of whether it's public or private property. Something that is open to the general public is treated differently in law than purely private property (private clubs and religious institutions are specifically excluded from federal law, but that's it). There are also going to be different expectations of privacy and default access levels.
Physical to digital analogies are often a poor match anyway, but in this case I'm not sure even if we accept one that it fully supports your point. Private property open to the public is not legally the same as purely private limited access property in terms of who it may exclude, when it may exclude, and why (as well as lots of other standards).
Neither the corner cafe nor LinkedIn can refuse to serve a request from someone because that person is black. But both the corner cafe and LinkedIn can refuse to serve someone for any non-discriminatory reason, such as, say, because they're a Michigan fan.
Same with a cafe. You can request access and the cafe can turn you away or serve you. If they turn you away and you refuse to leave, then you are breaking the law (like hacking).
Basically if I request something from you and you give it to me, that's your problem, not mine.
Regardless, this is not analogous. If LinkedIn is making information public, then they cannot simultaneously say that this information is private for a specific use and expect the courts to intervene.
But that's not the point; the point is that it's possible to give something away for free and still refuse to give it to certain people under certain circumstances.
They didn't give me a free cup of coffee, and someone could reasonably mistake me for a homeless person based on my (lack of) fashion sense, but that doesn't mean I could just reach over the counter and grab a cup because I saw them hand one to somebody else as I walked through the door.
How can you say that hiQ isn't allowed to have this, but everyone else is allowed to take as much as they like? All that will happen is that hiQ will create a string of shell companies that access LinkedIn as their proxies, and you will be wasting the court's time. Step zero is to establish that no one can have access unless authorized, and LinkedIn refuses to do this.
That's not even a rational argument; ask some hacker sitting in prison how well that one went over.
> How can you say that hiQ isn't allowed to have this, but everyone else is allowed to take as much as they like?
Umm, private property? Terms of service? Take your pick...
> All that will happen is hiQ will create a string of shell companies that accesses LinkedIn as their proxies, and you will be wasting the court's time. Step zero is to establish that no one can have access unless authorized, and LinkedIn refuses to do this.
hiQ isn't fighting the validity of giving some people access to data while denying it to others; they are fighting the misapplication of a totally unrelated law (because it's the right thing to do).
The data itself isn't LinkedIn's property (argued elsewhere), so they don't have control over it after it leaves their servers.
This is wandering... please decide whether you want to argue the article, the case, or hypothetical free coffee.
It can do exactly that. It can respond with an error code or start dropping packets entirely. As far as I'm aware, LinkedIn didn't do that.
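That refusal is a few lines of server code. A minimal sketch, assuming a simplified request handler (the blocklist entries are hypothetical, and a real server would read these fields from the incoming HTTP request):

```python
# "Refusing at the door": deny listed clients with a 403 instead of
# serving content. Blocked agent name and IP are made-up examples
# (the IP is from the documentation range 203.0.113.0/24).
BLOCKED_AGENTS = {"hiq-scraper"}
BLOCKED_IPS = {"203.0.113.7"}

def handle_request(user_agent: str, client_ip: str, body: str):
    """Return a (status, payload) pair for a simplified HTTP exchange."""
    if user_agent in BLOCKED_AGENTS or client_ip in BLOCKED_IPS:
        return 403, "Forbidden"   # access technically revoked
    return 200, body              # content served: access granted

print(handle_request("hiq-scraper", "198.51.100.1", "profile data"))  # (403, 'Forbidden')
print(handle_request("Mozilla/5.0", "198.51.100.1", "profile data"))  # (200, 'profile data')
```

The point being: if the server answers 200 with the content, it has, at the protocol level, granted the request.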
Any access to LinkedIn's data requires that LinkedIn send it in a response. If LinkedIn is sending it in a response, LinkedIn can't claim that it's not authorized.
This is entirely different. LinkedIn wants to make the data available on the public internet... Except sometimes. They can't figure out a technical solution so they are pushing for a legal solution. If you'd like to try to further your coffee shop argument, this seems more like a coffee shop giving away free coffee with a notice letting customers know that there's a limit of three free coffees per person and then being shocked when some customers take four or five. Or all of them.
2. LinkedIn has every right to define what the use policy is for information it makes available publicly through its own product. In this case, the policy was violated, and the violator was notified through appropriate channels that they were in violation. They continued to access LinkedIn and violate the policy, which is illegal. The critical distinction is that what they were doing only became illegal when LinkedIn notified them that they were in violation of the policy, no longer welcome on the site, and they continued to do what they were doing anyway.
LinkedIn has every right to define their use policy through technical means. If they want to make it publicly available, then they understand part of that public is their competitors. In my opinion, website operators should not get any legal protections for things they can easily do themselves through readily available technical means.
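One such readily available technical means is per-client rate limiting; here's a minimal token-bucket sketch (the rate and burst parameters are purely illustrative):

```python
import time

# A token bucket: refills at `rate` tokens/second up to `burst`;
# each allowed request spends one token.
class TokenBucket:
    def __init__(self, rate: float, burst: int):
        self.rate = rate              # tokens refilled per second
        self.capacity = burst         # max tokens held
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                  # caller would answer 429 Too Many Requests

bucket = TokenBucket(rate=1.0, burst=5)   # 1 req/s steady, bursts of 5
results = [bucket.allow() for _ in range(7)]
print(results)  # first 5 allowed; the rest denied until tokens refill
```

A real deployment would keep one bucket per client IP or API key, but the mechanics don't change.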
I wholeheartedly disagree that LinkedIn has any right to define the use policy for data it makes publicly available. A wide variety of data is available to the public and you can't simply sue people who use that data in a way that you dislike. If you would like to keep that data private then do so.
So why don't they do that? If they're responding to a bot's HTTP requests with content, they are choosing to give the bot access.
They were formally told to leave the coffee shop and not return.
Hello, is there a cafe here?
Yes, here’s some coffee! Anyone who asks gets some!
Thanks, I acknowledge receipt!
I’ve changed my mind, I shouldn’t have been giving out coffee! What kind of a business is this? The only way my actions make sense is if you’re a thief. Thief! I will now try to ruin your life via the legal system.
Protect them from what, your unlocked front door? 
 "Hackers selling 117 million LinkedIn passwords" http://money.cnn.com/2016/05/19/technology/linkedin-hack/ind...
I'd also note that these companies are barely (if ever) held liable for life-compromising hacks on their platforms.
Nothing is removed or destroyed, and nothing was hidden or publicly unavailable.
IANAL but, uh, seems legit... ¯\_(ツ)_/¯
The bot says "GET /blah" and LinkedIn says "200 OK".
Not bot's fault.
The linked Snopes article that they use for this viewpoint is badly worded. Although the headline claim is 'It is illegal to take photographs of the Eiffel Tower at night without explicit permission', nowhere in the text does it describe the act of taking a photograph as being illegal. It is all about publishing your photos and sharing them with others.
I really wish articles would refrain from potentially untrue clickbait headlines, but oh well.
LinkedIn is a piece of crap in societal concept and implementation. Recently I was so frustrated trying to remove old connections that I simply deleted my account.
Warning: I am going to be crude at this point: linkedin is an HR circle jerk of pointlessness
I'm actually not sure.
If you don't have a legal right to be on a piece of property, in a given structure, or in a vehicle, you're trespassing.
If you used force to gain access to the property, vehicle or structure, it will often be considered breaking and entering. Typically, these laws use a very loose definition of "force" which includes opening an unlocked door.
If you leave your door ajar, it's just trespassing. If you had to open the door, it's probably B&E even if you didn't break anything to do it.
In that context, a wall around a house at least four feet high would carry an implicit "No Trespassing" sign, but a picket fence would not. However, if the property had an obvious path to an entryway, then walking up that path to the entryway was not trespass. So walking through a picket-fence gate with a low latch would not be trespass, unless the pickets were four feet high or the latch was locked.
If the door is unlocked and nobody says you can't enter, why can't you enter?
I think a better comparison would be comparing LinkedIn to a public property (such as a commercial store) and thus there is an implicit "access allowed until revoked".
I think that realistically, there are strong parallels to this being a customer/company dispute over who has access to the company's store. The door (HTTP protocol) has to be walked through for the customer to see the wares (LinkedIn profiles) and can be guarded by security (some form of authorization).
I think the question being asked is a valid one - should a company have the right to bar access to otherwise public information if the customer is not tampering with your system? If so, to what extent? If undesirable robots shouldn't be turned away what about DDOS traffic? What forms of flow control become legal in this case?
I'm honestly curious what the courts decide and how that may impact other websites that have tried to combat scraping, such as Craigslist.
So it's illegal to, for example, go door to door looking for one that somebody forgot to lock and then spend the night there.
Beyond that, it's only unlawful.
It is 100% legal (castle doctrine) to shoot them. Think about that for a minute: it is not generally legal to shoot someone engaged in a legal activity.
Also legal to shoot them through the door but probably not such a good plan...
- An intruder must be making (or have made) an attempt to unlawfully or forcibly enter an occupied residence, business, or vehicle.
- The intruder must be acting unlawfully (the castle doctrine does not allow a right to use force against officers of the law, acting in the course of their legal duties).
- The occupant(s) of the home must reasonably believe the intruder intends to inflict serious bodily harm or death upon an occupant of the home. Some states apply the Castle Doctrine if the occupant(s) of the home reasonably believe the intruder intends to commit a lesser felony such as arson or burglary.
- The occupant(s) of the home must not have provoked or instigated an intrusion; or provoked/instigated an intruder's threat or use of deadly force.
Here, unless you give them a reason they're just like "yeah, dude opened the wrong door, heh?"
You do know it is impossible to stop all cyber attacks? It's always a matter of when, not if. Zero-day attacks are developed every day, and not even the best-funded cyber security systems are able to thwart them all. The geniuses are on the offensive side; if they want in, they will get in.
Let's not legislate specific practices.
Imagine if we had security legislation from 1995 to follow when programming today. Imagine trying to explain to senators why last year's XSS protection rules need updating. Imagine Oracle lobbying to get their database enshrined as the "security-compliant" one.
The law should focus on outcomes: if a site gets hacked and people are harmed, the site should be penalized.
WRT some defences becoming outdated over time: well, it probably would not be two decades behind, but a couple of years or so at most. Even then, ensuring that is better than nothing.
People need tools to judge whether they can safely use some product, and that's why standards exist. Otherwise companies are going to continue to screw us until they drop the ball.
Not necessarily. What if the law mandates use of, say, an encryption algorithm that has been cracked? You can't move to a new one without breaking the law.
This is a fallacious argument, specifically the Nirvana Fallacy. Perfection not being achievable in no way means that there can't be standard best practices that are a minimum requirement, nor that liability cannot still exist. Certain types of cyberattacks are in fact possible to stop perfectly, merely by virtue of not holding onto information at all. As a trivial example, there should be no plaintext password leaks (or even easily brute-forced password leaks) at all, ever. Adaptive hashes/key stretching have been a thing since the dawn of security: Robert Morris described crypt for unix password usage in 1978, and bcrypt dates from 1999. There has never been a reasonable basis for plaintext or even raw fast hash primitives to be used, yet they have been. In no other industry dealing with these kinds of privacy and safety concerns is that sort of practice considered acceptable, nor should it be.
Holding personal private information at all long term should fundamentally be considered a liability situation, because it's not necessary; it's a commercial choice. Can't be hacked if it doesn't exist. If businesses choose to hold it, they should also be taking reasonable steps to protect it, and accept liability for failures. That's the natural balancing flip side to them profiting from its use. If they're allowed to turn the costs of holding it into externalities, that distorts the market.
For example, bcrypt has been around for how long now? And don't almost all the reports of hacks report that a database was lifted with usernames and passwords either in plaintext (for the love of all that is holy) or hashed with unsalted SHA1, or similar?
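For reference, the baseline being described here, a salted, deliberately slow hash instead of plaintext or raw unsalted hashes, is a few lines of code. This sketch uses the standard library's PBKDF2 since bcrypt itself is a third-party package; the iteration count is illustrative:

```python
import hashlib
import hmac
import os

def hash_password(password: str, iterations: int = 600_000):
    """Return (salt, derived_key); store both, never the plaintext."""
    salt = os.urandom(16)                       # unique random salt per user
    dk = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, dk

def verify_password(password: str, salt: bytes, stored: bytes,
                    iterations: int = 600_000) -> bool:
    dk = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(dk, stored)      # constant-time comparison

salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))                   # False
```

The salt defeats rainbow tables and the high iteration count makes brute-forcing a dumped database expensive, which is exactly what the plaintext and unsalted-SHA1 leaks in those breach reports failed to provide.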
But at the same time there is a line. I would be for holding companies liable if, for instance, the data gets out there and you find it is entirely unencrypted and the passwords are MD5 hashed or plain text. There has to be a baseline.
Mistakes should not be punished as long as there is not also negligence.
If the company is only liable when negligent, it is incentivized to minimize the cost of security to the bare non-negligent minimum. This pushes all the costs onto the people whose data are compromised. These people are not in a position to spend small amounts of money to dramatically lower the expected costs of breaches, so they just end up paying huge costs that cannot be mitigated.
Personal liability is going too far, IMO.
> Mistakes should not be punished as long as there is not also negligence.
The problem with this is that you'd have to enshrine, in law, what "negligence" is. Technology changes too fast to put that into law.
"How many people got hurt and how badly?" is a question attorneys can reasonably address. "Was there sufficient input sanitization?" is not.
The problem isn't that someone is getting IN; it's that the company throws up their hands and says "tough sh*t."
Or in a worse case, when Equifax puts up a compromised site to find if you were hacked that requires a significant amount of your SSN and personal details.
What exactly is your solution to the problem? You are more or less complaining without providing any insights into addressing the issue or without knowledge of the threat landscape.
Full disclosure: I work in security architecture/risk management in the financial services industry.
Instead of investing in securing their customer data, these companies pad their bottom line. So yes, they should be held accountable for failing to follow basic industry-standard data protection practices.
If the criteria is that it must be possible to stop all instances of an action to make it a legal issue, then we should just shut down all the prisons.
Edit: adding context.
I'm doing QA to validate information collected by my recruiting company, both acting within LinkedIn's terms of service via a paid subscription, and violating their terms of service by improving my own company's process. Like the article said: LinkedIn wants to participate in an open internet and also abuse the CFAA.
However: That’s not what this article is about. That we don’t have a perfect solution for whatever weird corner cases (accidentally clicking on child porn?), should not change this very honest, serious and real issue the eff is addressing here. It is a distraction. We can hypothesise about edge cases until the cows come home, but to what end?
I get how a life of working in binary makes us immediately jump to the corner cases. It’s a curse on any legal discussion on HN. But it’s not relevant, and, imo, it dilutes the energy.
Edit: that came out harsh, so I’d like to clarify: I get, 100%, where this “looking for the flaws” mentality comes from. It’s what makes a good programmer. A function that only follows the spec for 75% of its possible inputs is wrong. A law, not necessarily. We need to be careful not to keep our engineering hats on when switching to discussing law.
And, if push comes to shove, one doesn't really need a domain.