LinkedIn, of course, wants to get all the benefit of the public Internet while providing as little as they can. This, coming from someone who used to work at LinkedIn.
These companies have built their fortunes on the public Internet and now that they are successful they seek to not pay homage to the platform that gave them their success. It's very clearly anti-competitive, and bad for users. LinkedIn should be forced to compete based upon the veracity and differentiation of their service, not because they have their users' public data held hostage from competitors.
In a world where there are no (persistent) copies made by third-parties, the user still is in control of the visibility of their data by updating their profile directly on LinkedIn to show/hide pieces as they see fit. With a 3rd-party in the picture, updates to user-data may or may not be respected by the 3rd party, leading to poor user-experience.
Quoting the article, "HiQ Labs uses the LinkedIn data to build algorithms capable of predicting employee behaviors, such as when they might quit."
Based on that one statement alone, as an employee, I would be uncomfortable with the use of my data to supply my employer with my future plans before I choose to disclose them myself. That choice is mine, and mine alone; not something to be monetized simply because the option exists. And while I have no control over the sharing of data, should something like this happen to me, I'd be more inclined to stop using LinkedIn, which in turn affects LinkedIn's ability to do business.
You can't expect someone to forget you once had a bad haircut just because you now got a really cool one.
But don't be surprised when you find out that your expectations and their expectations were different, and you're the one they blame, and they outnumber you by a lot.
When did we ever teach people that you can control where your data ends up on the internet?
Aren't we trending towards teaching people to not even share data on "closed" services like Facebook and Gmail precisely because they are a single source for a lot of data to be misused by the company, or hacked by a malicious actor?
Regarding data that is accessible without "friending" someone or logging in as the user themselves (e.g. Gmail), I hope people already realize that this data can easily be re-used.
> But don't be surprised when you find out that your expectations and their expectations were different, and you're the one they blame, and they outnumber you by a lot.
If the majority of people think that they can share nude photos of themselves on their own blog or Twitter and that this won't be re-used elsewhere, well... I must be living in a different reality, or I misunderstand your point.
Instead of having information about you be owned by whichever corporation collects it, have it at all times be owned by you.
While there are some clear problems with this approach, something needs to be done about companies building databases of ruin, where every moment everyone lives, from the day they are born until the day they die, is cataloged into a database for review, either by society in general or by algorithms looking to make predictions that impact an individual's future.
I should have some level of control over my personal information. Today, even if you actively suppress the 1st party info you put out in the world, the number of 3rd parties adding to your profile dwarfs any data you personally put out there: credit reports, government databases, credit card companies, and soon your web browsing history sold by the ISPs.
It is, IMO, out of control.
I don't think you've granted LinkedIn the right to sue to enforce your claims against third parties, so you'd have to sue directly.
So is your public profile copyrighted by you?
The actual server is controlled by you or gives you a way to take the data down. But by then it could have been republished elsewhere!
What I cannot understand is how such a system could reasonably be enforced. Let's say John Doe posts his resume on a job board. If I print out his resume, but he later updates it, am I now somehow in the wrong for retaining the old copy?
I am also a little puzzled by the notion that "persistence" is a new phenomenon. Of course there have been paper records and such for quite some time, but I'll put that aside for a moment. When I was younger, I was often cautioned to think carefully before acting, as a reputation decades in the making could be permanently ruined in just minutes. It seems to me that when it comes to mistakes and "bad" deeds, society's collective memory has always been rock solid.
Rather than the persistence, I think the new factor is that things are less regional than before. You can't just pick up and move to a new town, because they basically have the same Internet everywhere.
That would be a 26 billion dollar question, and one I would very much love to solve one day! :)
I believe that your example is too simplistic to capture the crux of the issue here. The metadata associated with posts often contains features that are quite revealing, but not necessarily the kind of data that the source would wish to be revealed. E.g. I have noticed multiple times that the number of cold emails I have received from recruiters is higher immediately after I update my LinkedIn profile, leading me to believe that the last-update timestamp is a feature that the LinkedIn search engine may be relying on to rank results. While my evidence is purely empirical, it isn't a stretch to imagine that it would be a reasonable thing to do, given that most users normally update their profiles when they are about to start searching for new opportunities.
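To make the guess above concrete, here is a minimal sketch of what "last-update timestamp as a ranking feature" could look like. Everything in it is an assumption for illustration: the `rank_profiles` function, the `relevance` and `last_updated` fields, the weight and the half-life are all made up, and nothing here is known to reflect how LinkedIn's search actually works.

```python
from datetime import datetime, timedelta


def rank_profiles(profiles, now, recency_weight=0.5, half_life_days=30.0):
    """Sort profiles by a keyword relevance score plus a recency boost.

    Hypothetical scoring: the boost halves every `half_life_days` days
    since the profile was last updated.
    """
    def score(profile):
        age_days = (now - profile["last_updated"]).total_seconds() / 86400.0
        recency = 0.5 ** (age_days / half_life_days)
        return profile["relevance"] + recency_weight * recency

    return sorted(profiles, key=score, reverse=True)


now = datetime(2017, 8, 15)
profiles = [
    # Slightly better keyword match, but untouched for over a year.
    {"name": "alice", "relevance": 0.80, "last_updated": now - timedelta(days=400)},
    # Slightly worse match, but updated yesterday.
    {"name": "bob", "relevance": 0.70, "last_updated": now - timedelta(days=1)},
]

ranked = [p["name"] for p in rank_profiles(profiles, now)]
ranked_without_recency = [
    p["name"] for p in rank_profiles(profiles, now, recency_weight=0.0)
]
```

With the recency boost switched on, the freshly updated profile jumps ahead of the better keyword match; with the weight set to zero the order flips back. That is the kind of signal a third party could also infer just by watching when a public profile changes.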
On the one hand, this is an excellent product that allows recruiters to reach targets that exhibit behavior associated with active job-seekers, resulting in better connections. It results in a win-win situation where the recruiter gets a pretty good return on his/her investment, and the target receives the attention/information they were looking for with their update. False positives in this situation result in a few unsolicited emails/unwanted attention.
On the flip side, this information can be repackaged to present a manager with a graph plotting the probability of an employee quitting. While this is a perfectly good product, no employee would ever use a service that might reveal their future plans ahead of a time of their own choosing. Furthermore, false positives here can have a significant impact and LinkedIn may not remain in business for long if word of this product gets out.
Given that a larger amount of information can now be stored easily, the economics of what is stored is different, and one would expect this would lead to the storage of more information over time. If you can store in a single HD what would have previously taken a large room, for a tiny fraction of the cost, the bar for storing the information vs. discarding it is much much lower.
I also think it's not only storage: the searching of large datasets is another reason that the handling of information is changing over time. Again, if finding information in a large blob of data is easier and, as discussed above, much cheaper, then this will also lead to people storing information for later retrieval. As you say, the fact that all this storage is now networked together means that this information is easily retrievable from anywhere, only intensifying the impact.
What choice? The only choice you have is whether to post public info or not. You have no choice over what others do with it. You can't police what others do with information that you freely publicize.
Secondly, while the updated profile is public knowledge, the temporal characteristics of the update aren't a feature that is directly published by LinkedIn. Call it a product feature; it is tailored to present the qualifications of a user, not to advertise the fact that they may be looking for a job. While one can argue that the update is public knowledge, and must therefore be available for data-mining purposes, there is a subtle, but potentially dangerous leak of information here that is open to interpretation.
LinkedIn's position, that this leak may be potentially harmful to its users, and by extension, its core business, is therefore a fair point.
They collected the data, host it, etc, and incur costs for doing so. Just because they allow the public to access it, doesn't mean the public should have a right to re-use it.
People argue that the data is public. I say that's not the issue. While the data itself might be available elsewhere, it is raiding the _collection_ of it that is being argued, not that 'public' data is 'private'.
The _value_ that LinkedIn adds is that they've built the structure to collect and maintain the data. They are _not_ asking the court to prohibit anyone from collecting the same data on their own, at their own expense. If someone wants to start a rival LinkedIn, they are free to do so.
1. ban the use of ad blockers when accessing the data
2. ban users making an offline copy to view later
3. ban users from disabling auto play or other features
4. otherwise control what you do with data once you get it, which is *huge*.
E.g. what if they want a 1% share of any revenue you get by using the data, etc.
Of course now, they have a technological option to try to force each of the above, but users also have a technological option to try to outsmart them. But I wouldn't want to give them a legal right to force the above.
I would LOVE it if the courts would remove the legal protections of DRM. It seems so strange that this court has gone so far in the viewer-rights direction, but hasn't bothered taking the baby steps to remove the legal protection of DRM.
Hmm, now I'm hoping LinkedIn implements some DRM so this fight can get truly interesting and maybe make some positive difference.
That is exactly what public means. Do not make it public if you don't want 'the public' to use it.
If I post an essay on LinkedIn, and then someone posts it on their blog, copyright has been breached because that is my original work.
If I post the fact that I worked at Dunkin Donuts from 2007 to 2009 on LinkedIn, and then someone records it and feeds it to an employee quitting predictor algorithm, they've done nothing illegal. Me stating the fact that I was employed at a certain place for a certain amount of time is not me publishing an original work.
Public means everybody can do whatever they want with it, no exceptions (except, as with all things, by law). If you want to restrict the information, then do it, but don't make it public and then when a competitor uses it claim it wasn't public 'for them'.
But the EULA says that simply by accessing the site, you agree to its terms.
To view the EULA, you must view the site.
That's just one of many problems wrong with assuming the EULA is binding.
The ruling in this case has said it isn't; instead, the issue is that some people can index and scrape (search engines), but others (startups, for example) can't. Which is an anti-trust issue.
No it didn't. This injunction has temporarily prevented LinkedIn from blocking HiQ, and only HiQ, while the case is argued. The court might rule that LinkedIn can't block anyone, or they might rule that HiQ is not entitled to scrape LinkedIn's data.
> Which is an anti-trust issue.
HiQ claimed it's anti-trust using inflammatory language in their PR statement. I disagree with that assessment. LinkedIn is not preventing HiQ from collecting their own copy of the data, in any way, shape or form. HiQ is claiming they should be able to take LinkedIn's copy because the data is "public" data. Even if that's true, HiQ always has the option to get the data from the same source that LinkedIn did.
> To view the EULA, you must view the site. That's just one of the many problems wrong with assuming the EULA is binding.
Absolutely right, EULAs have all kinds of issues. In practice, the issue of having to access the site to view the license isn't a problem. You can choose after reading the EULA to not agree, and you can choose to not access any other data on LinkedIn.
But there is no reason to assume the EULA is not binding because there are no other legal documents that cover your interaction with LinkedIn, aside from any state and federal laws that might override parts of the EULA.
This is mostly irrelevant to the point I was making though, it doesn't matter if the EULA is binding. Its purpose there is to establish that LinkedIn is not providing a public service. It's communicating that there is no expectation of responsibility on the part of LinkedIn, and that doesn't really depend on whether you are specifically bound by the EULA.
It's just like a sign in a store window that says "we reserve the right to refuse service to anyone, at any time, for any reason." You could say that the sign is not a binding contract, and go into the store naked and yelling and start breaking stuff. When they kick you out, nobody will come to the defense of your right to walk into a store that everyone else is allowed to walk into.
> No it didn't.
True. I should have said, "the ruling in this case, has said it isn't clear if the agreement should be binding".
> HiQ is claiming they should be able to take LinkedIn's copy because the data is "public" data.
Nobody is taking anybody's data. LinkedIn are providing copies of the data to anybody who views the page. You can't take something from somebody else in this context. It is not possible. Copying, and ineffective deleting are the only methods available for transfer.
> In practice, the issue of having to access the site to view the license isn't a problem. You can choose after reading the EULA to not agree, and you can choose to not access any other data on LinkedIn.
It is absolutely a problem. You don't view data. You download a copy.
You are not presented with the agreement upon visiting a public page, you first download the public page, which then links to the agreement.
Thus, when the agreement becomes enforced, you already have in your possession data from before you agreed, which is then governed by rules you were not aware of, and may not become aware of as the agreement doesn't require intervention.
If we have to come up with physical analogies for a problem that is inherently digital:
You walk into a store. The store hands you a CD, that they made just for you, saying it's yours.
You then say thank you, and only then does the store say that there are conditions attached. But you can't give the CD back. You can only agree that you will destroy it at an indeterminate time in the future. And your method of destruction is almost guaranteed to be reversible, but it's all you have.
Oh, and you might not have chosen to even walk into the store. You were stumbling around other stores, and a door led you here.
In common law, once possession is established, new conditions on the possessed item are next to impossible to apply, unless the method of possession was itself a crime.
> But there is no reason to assume the EULA is not binding because there are no other legal documents that cover your interaction with LinkedIn, aside from any state and federal laws that might override parts of the EULA.
A EULA, as its name suggests, is a license agreement. So far as I'm aware, most nations capable of accessing LinkedIn have a definition of a license agreement. Insofar as I'm aware, they all require a license agreement to at least be:
"A valid agreement between two parties, where both parties have read, understood and accepted responsibilities (or had ample opportunity to do so), pertaining to the use of the licensed item."
Prior knowledge is a requirement. You can't agree to something you haven't had the opportunity to comprehend.
But LinkedIn happily gives you a copy of their data before you are able to access the agreement. (Such as if your first visit was to a public profile page).
There are many laws that may invalidate the EULA.
> Its purpose there is to establish that LinkedIn is not providing a public service.
Its purpose is irrelevant if it is not binding.
A store can put a sign up, saying that only customers who buy a product before leaving may enter. But if someone does, the store cannot force the individual to make a purchase, because their policy was in conflict with other systems of rights.
If something is non-binding, and therefore invalid, it cannot be applied as... It has no validity.
If you have a driver's license, but it became invalid for some reason, you would not be permitted to continue driving, until such time as it became valid.
If the ownership of your house became questionable, you would be squatting.
The non-binding status of any agreement that becomes invalid, regardless of intention, is a problem in law, but it isn't a solved one.
If a license is invalid, you are not bound by it.
Caveat: I'm no longer a registered lawyer, as of two years ago. I may not be up-to-date on some things, and my main knowledge was in cross-border and Australian crime, specifically in the realm of IT.
The injunction didn't say that either. The only thing it said is that LinkedIn can't block HiQ for the time being. It is common in lawsuits for both parties to be prevented from acting until a decision is actually made. The decision has not been made yet.
> Nobody is taking anybody's data.
I think I used a poor verb, or you misunderstood me. I meant that HiQ wants to copy LinkedIn's data for their own business. In some sense that can be viewed as theft, and that is the way LinkedIn sees it. Under that view, the verb "take" is appropriate, but it doesn't mean that the original copy is transferred or destroyed, it just means that HiQ is now in possession of a copy.
> There are many laws that may invalidate the EULA.
True, and I don't claim otherwise. "No court has ruled on the validity of EULAs generally". https://en.wikipedia.org/wiki/End-user_license_agreement#Enf...
> Its purpose is irrelevant if it is not binding.
Its (a EULA's) main purpose is for communicating expectations, which I'm arguing is relevant even if it's not binding. If the EULA says "we can refuse service to you", and then service is refused, then it's not a surprise.
In a legal sense, this could (but is in no way guaranteed to) reduce liability. What I'm suggesting is that even if the contract is not binding or valid, if you break the rules and get banned from a site, the EULA may still provide a defense in court from the site being sued by the person to whom service was refused. The site can say "we posted the rules, this person broke the rules" and the person may not have any legal support in favor of getting the service after they broke the site's arbitrary rules.
Also not a surprise when a judge orders you to restore access because your agreement is invalid.
> The site can say "we posted the rules, this person broke the rules" and the person may not have any legal support in favor of getting the service after they broke the site's arbitrary rules.
Absolutely. Sites are largely free to enforce rules arbitrarily, by modifying their HTTP responses.
However, you are not free to exclude individuals whilst including their competitors.
Google has been under the hammer for that recently, though that is under the EU's anti-trust laws. Of particular interest to this case, you might find this quote telling:
> we believe that Google's behaviour denies consumers a wider choice of mobile apps and services and stands in the way of innovation by other players, in breach of EU antitrust rules.
LinkedIn are accused of standing in the way of innovation by other players, in this case, hiQ, whilst simultaneously allowing other players to innovate, such as Google. One can copy the data, the other can't.
> Its (a EULA's) main purpose is for communicating expectations, which I'm arguing is relevant even if it's not binding.
I can expect the rain to move upwards, but that's irrelevant to how gravity actually acts. Unrealistic or false expectations are not taken into account with the rule of law.
A police officer might let you off with a warning for speeding, if you hadn't noticed the speed change. However, if it went to court, your false expectation of a different speed is not a mitigating factor.
If LinkedIn was wrong to prevent access in this case, their liability will not be reduced, if precedent is followed. They will still be responsible for the actions they took, in full, as Intel, Microsoft, Google and Apple before them have been.
If however, LinkedIn are seen by the court as acting correctly, hiQ may be asked to pay legal costs, or counter-sued for damages.
If the EULA is non-binding, then it may as well not exist, because it has no legal relevancy.
That's not what happened here, there has been no ruling on any agreement, and the injunction order that was given only applies to HiQ, only temporarily, and nobody else. It is not a statement on the validity of EULAs or of LinkedIn's EULA, and it is not a statement on whether LinkedIn is being anti-competitive. It is an injunction and nothing else.
I didn't say it was.
> It is not a statement on the validity of EULAs or of LinkedIn's EULA, and it is not a statement on whether LinkedIn is being anti-competitive. It is an injunction and nothing else.
An injunction is not given without merit. It has meaning.
Injunctions are regularly denied when the arguments are clearly in one direction or another.
The injunction strongly suggests that the judge finds hiQ's argument, that LinkedIn's public pages are not bound by the EULA, to "not be without merit".
No precedent has been set, but the conversation is definitively in the opening stages.
If the startup were republishing private information from LinkedIn I would agree with you.
I beg to differ. Their EULA covers "accessing or using" their site in any way shape or form, and defines the term "visitor" for what you're calling "public".
You agree that by clicking “Join Now”, “Join LinkedIn”, “Sign Up” or similar, registering, accessing or using our services (described below), you are agreeing to enter into a legally binding contract with LinkedIn (even if you are using our Services on behalf of a company). If you do not agree to this contract (“Contract” or “User Agreement”), do not click “Join Now” (or similar) and do not access or otherwise use any of our Services.
When you register and join the LinkedIn Service, you become a Member. If you have chosen not to register for our Services, you may access certain features as a visitor.
If the information is restricted, then restrict it. Do not make it publicly available then claim a webpage as the ruling contract of that information when it is used in a manner you do not agree with.
They're not restricting access to the information. HiQ is scraping their site using bots, and LinkedIn doesn't like it. This isn't a debate about anything being publicly available or not, this is a business fight between two private companies.
By reading (or not) this comment, you (“the reader") concede all the points of this discussion. The reader also agrees that the arguments presented by HackerNews user “redial” ("me", "we", "us") are correct even in case of conflict with his or her own previously stated positions, and that he/she will amend all of his/her previous comments to reflect this legally binding agreement.
The real point I was making is that LinkedIn is establishing that they are not offering a public service. It doesn't matter whether you can be bound by their contract, the EULA is more about covering their own asses when they do things like refuse service to HiQ. They wrote the rules so that it's clear what things you can do to get banned. Regardless, they have the right to ban IPs or specific bots or whoever they want, because even though they let anyone access the site, that doesn't mean they have to let everyone access the site always. Like it or not, them's the facts.
So what? The whole point here is that they can say and think whatever they want, but it doesn't make any difference if the law disagrees.
Which the judge didn't agree with. So right now your argument is counterfactual and pointless.
I don't know, I'm not a lawyer, but Wikipedia says "sometimes".
> Typically for a contract to be valid, acceptance has to be actively communicated. You can't be bound by a contract simply by someone saying that you have accepted
Again, not a lawyer, but I imagine that use of a service could legally constitute your active end of the communication. You're right, you can't be bound just because someone says, but when you use a service you've gone one step past.
Honestly, I think the EULA is more of a CYA for them than a contract, in practice. But it does establish the potential legality for two things: 1- that this is a licensed service, and 2- that they can refuse service to anyone they want for reasons of business interest.
These kinds of agreement ostensibly are enforceable, but harder to enforce.
If yes, would it still be if they charged only $0.01 for it?
If yes, would it still be if the price was $0?
It seems silly to me to have the "rules" depend upon the price.
Movies are copyrightable.
Compiled catalogs of personal information are not (https://en.wikipedia.org/wiki/Feist_Publications,_Inc.,_v._R...).
The correct analogy would be if someone took a copy of my personal resume that I put online, freely accessible on the Internet and did something useful with it.
Heck, Google does this already by indexing and providing a directory of public content. The fact that LinkedIn 'allows' them to do this is by virtue that it makes business-sense to do so and drives traffic to their site.
The rule should be plain and simple here: if you put user content online and do not make any efforts to restrict it (i.e. no passwords, no logins), call it "public information", you do not have any rights to say who can and cannot access that content, at the minimum. Unless I'm mistaken, you also cannot claim copyright infringement, as the user technically owns that content as well -- you just have a license to publish it (either to a private or public audience).
It should be up to the user --- and in fact their right --- to police their own content online. Personally, I find it offensive that LinkedIn seeks to restrict the distribution of such content that I have published through their service, where the expectation is that it is public. They are not acting in my interest here, they are very clearly acting in their own selfish interest, which I find odd considering LinkedIn's supposed mission has always been to empower their users to achieve professional success. How exactly are they empowering me by restricting who I have told them can access my public content? And the fact such restrictions are solely decided by LinkedIn with no input from their users -- the ultimate owners here -- is a disgrace and violation of their own mission statement.
This kind of concept is exactly what the Internet was founded on, folks. To say or think otherwise strikes at the heart of the open web and representing yourself as such is an affront against the great platform that has given rise to so many companies and provided so much opportunity in the world for the individual.
This concept is bigger and more powerful than any one company, and deserves to be defended.
In this case, it's not to _your_ benefit. They're going to warn your boss that you will quit soon.
No they didn't. Users input most of their data.
Obviously LinkedIn can't control the information itself. But this case isn't about the information in the abstract. It's about an HTTP request to a piece of private property, and how LinkedIn programs that private property to respond to an HTTP request. It's well-accepted that owners of private property can make it available to the general public, with whatever restrictions they please. There is no good reason to treat web servers differently than store fronts. LinkedIn should be able to control who accesses their web servers and how.
When you type in a url a request is made and the server responds. Linkedin controls that response and can send back whatever it likes.
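As a rough sketch of "the server controls the response", the function below decides what to send back based purely on what the request says about itself. The names (`respond`, `BLOCKED_AGENTS`, `examplebot`) are hypothetical and this is not how any particular site is known to do it; it only illustrates that the decision is ordinary code running on the operator's machine.

```python
# Hypothetical blocklist of user-agent substrings the operator dislikes.
BLOCKED_AGENTS = {"examplebot"}


def respond(path, headers):
    """Return an (HTTP status, body) pair for a request to `path`.

    The server inspects the client-supplied User-Agent header and
    refuses to serve the page if it matches the blocklist.
    """
    agent = headers.get("User-Agent", "").lower()
    if any(blocked in agent for blocked in BLOCKED_AGENTS):
        return 403, "Forbidden"
    return 200, f"public profile page for {path}"


blocked = respond("/in/jdoe", {"User-Agent": "ExampleBot/1.0"})
served = respond("/in/jdoe", {"User-Agent": "Mozilla/5.0"})
```

Note that the User-Agent header is whatever the client chooses to send, so a check like this only works against clients that identify themselves honestly.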
McDonalds can control who they sell a burger to but if I want to give my burger to a homeless man outside they shouldn't be allowed to stop me. In this case it's worse: the place will tell me the price of a burger but won't allow me to tell anyone else.
Once the information is public it has entered the public domain.
That's exactly what this injunction prohibits them from doing: "To the extent LinkedIn has already put in place technology to prevent hiQ from accessing these public profiles, it is ordered to remove any such barriers"
LinkedIn have freely provided public data to any competitor but hiQ. If they were preventing any company from taking the data, say by putting it behind a user login with licensing, it would likely not be under consideration.
I.e. it's OK to build a wall, but it's not OK to skip building the wall and then sue some of the people who take a look at the house.
Then what do you consider the storefront, or public facing data of a website? Just the whois info on the domain?
The general public can't observe the locked contents of a server without hacking it. Hacking is trespassing.
I would modify your analogy to say the store front is public facing, just like some of the LI data is public facing. There is other LI data that is not public facing.
In my opinion, websites have a larger storefront, as well as multiple levels of access to internal data.
The web isn't an abstraction, it's a network of privately owned servers responding to requests. This order tells a company that they can't program their servers to look at who is making the request and refuse to respond on that basis.
What is actually happening, is that somebody is walking into the store, asks a question about the stock or the price of the products on sale, which the store employee willingly answers.
Then, all of a sudden, the store wishes to control what you do with the answer that was willingly given to you.
This is clearly absurd - and so too is wanting to control what people do with publicly-available HTTP data. If it's public, it's public.
I personally do feel that LinkedIn is within their full rights to attempt to detect and restrict content being served to screen-scraping agents, but they must then accept that screen-scraping agents must be allowed to use any means necessary to impersonate a "normal" user browsing the (public) information that they publish.
This can't be a one-sided freedom.
These two statements do not agree with each other. An owner of a brick-and-mortar shop can't (legally) stand out the front and bar black people from entering, for example.
I don't think they can stop you from photographing through the window.
It's an interesting case.
They can't stop you from sharing the price; it's public now.
LIN gives this information (product price) for the first few questions, then if you ask about the 5th price they say: 403, no more for you. WHILE IF, at the same moment, a different person (or you in proxy-IP disguise) comes and asks for the 5th product price, LIN happily (and publicly) gives this information.
And if the person comes wearing a googlebot t-shirt, LIN drops to its knees and gives a ..... full-db-dump-job ;)
Bing/Baidu t-shirts also fit. Source: www.linkedin.com/robots.txt
HiQ simply says: that's unfair, you can't decide who is good and who is bad. Effectively LIN bans any Google competitor (documented case).
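For anyone who hasn't looked at a robots.txt, here is a small sketch of the "t-shirt" mechanism using Python's standard `urllib.robotparser`. The `ROBOTS_TXT` content, the `SomeStartupBot` name and the example.com URL are made up for illustration; this is not a copy of LinkedIn's actual file, just the general allow-some-crawlers, deny-everyone-else pattern.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is welcome, everyone else is not.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(agent, url="https://www.example.com/in/some-profile"):
    """Ask the parsed rules whether `agent` is permitted to fetch `url`."""
    return parser.can_fetch(agent, url)


google_allowed = may_fetch("Googlebot")
startup_allowed = may_fetch("SomeStartupBot")
```

Here `google_allowed` comes out true and `startup_allowed` false. Keep in mind that robots.txt is only a published request to well-behaved crawlers; actual blocking, like the 403 after a few requests, has to be done separately on the server.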
E.g. you may enter and look around. You may not take notes, pictures, or otherwise record what is here.
LinkedIn is not a public service, LinkedIn is a private, for-profit business. A public service is normally publicly funded. https://en.wikipedia.org/wiki/Public_service
> with the intent to your users that such information will be available publicly, you cannot then police what users of that data you consider to be "public" because it serves your business interest.
LinkedIn licenses their service to both free and paid users, and they can legally and do attempt to police what users can access the information. Whether it's enforceable or not, LinkedIn has a EULA and use of their service, whether you have an account or not, is presumably governed by their license.
They are well within their right to restrict requests being made by spammers and DDOS attacks, for example. How do you tell between legitimate requests and abusive ones, and how could you compel public access without enabling abusive ones?
> LinkedIn, of course, wants to get all the benefit of the public Internet with providing as little as they can.
The internet isn't a public service yet either, at least in the US. I think it should be, but it currently is not. This is conflating the knowledge that everyone with a computer and net access can currently access a LinkedIn server with the idea that everyone must be able to.
> LinkedIn is not a public service
A service that is public(ly accessible through Internet) != Public Service.
It's totally fine to call a service that is publicly accessible a public service, but it is going to lead to miscommunication if you are suggesting that this public service should have the same kind of legal requirements and regulations that the other kind of "Public Service" already has.
The very first thing you did in quoting the GP was rearrange those words to suit your hobby-horse; a classic straw-man.
The GP said "you cannot then police what users of that data you consider to be 'public' because it serves your business interest."
Yes, they can.
That sentence is explicitly describing the public expectations of a public service in the government sense. LinkedIn can police whatever they want because of their business interest, precisely because they are a private entity and not a public service in the government provision sense. Despite their offering information to unregistered site visitors, they are within their rights to refuse service to anyone at any time for any reason.
I did not rearrange GP's quote, and GP seemed to be conflating public access with government provision, which is why I brought up the distinction.
Are you saying that you agree with GP that LinkedIn should be compelled by law to provide public access to all?
The judge just said .. no they can't. Until the judge's ruling is overturned. Your statement is incorrect.
And you keep using the phrase "public service" which is not the issue at hand.
A store owner can not dictate who is allowed to read or take pictures of their store window. Effectively the judge was saying that if LinkedIn offers information that does not require a login - LinkedIn can not then tell someone that they can't use the info that is publicly visible.
No mention of "public service" - please stop conflating the two concepts.
This was an injunction, no ruling has been made.
> Effectively the judge was saying that ...
This is an injunction in LinkedIn vs HiQ, the judge did not share an opinion about site visitors who aren't logged in, nor make any ruling about whether publicly visible data can be restricted or not.
> A store owner can not dictate who is allowed to read or take pictures of their store window.
True, from outside the store. But to make a more complete analogy to this case, a store owner can dictate what you can read or photograph while inside the store. And the store owner can legally block the view to the outside anytime she wants.
> "public service" which is not the issue at hand.
I just explained above, and maybe you reacted quickly without understanding what I wrote. I'm not sure why I'm getting heavy pushback on this, it is both accurate and not controversial.
I reacted to @iamleppert's idea that LinkedIn can't police its users. The fact is that they can police their users (except for HiQ until the case is over). My interpretation is that he was saying they shouldn't be allowed to police this "public" data. Do you think I misunderstood? I'm not conflating the concepts, I am distinguishing between them. It may be that one is a red herring or that I misunderstood, but on 2nd and 3rd reading it still looks to me like GP is suggesting that LinkedIn should not be allowed to restrict access to some users. If that were true it would turn LinkedIn into a public service, hence the reason why I'm talking about public services.
They have a right to ignore your question and put rules around who they will respond to and when.
When they respond with data that cannot be copyrighted (a name, address, title of past position, etc.), of course someone can reuse those pieces.
Which is what LinkedIn did, and what HiQ is suing them over.
Boo. OP said it's a service offered to the public. Don't need to move the goal post.
> Whether it's enforceable or not, LinkedIn has a EULA and use of their service, whether you have an account or not, is presumably governed by their license.
Turns out I also have a EULA. And when linkedin responds to my HTTP requests, they opt into my EULA agreement.
It sounds like LinkedIn doesn't want to be accessible from the world wide web. Going offline is always an option.
Nothing moved. OP also said, in the same sentence, "you cannot then police what users of that data you consider to be 'public' because it serves your business interest." Yes, they can. It is clear from context that OP was suggesting that LinkedIn should be held to the legal standards of a "Public Service" in the government entity sense of the term. This is not the case, LinkedIn has no legal requirement to provide anything to the public. Booing me doesn't change that.
> Turns out I also have a EULA. And when linkedin responds to my HTTP requests, they opt into my EULA agreement.
Good luck with that!
Do you also believe you can take the satellite tv signals beamed at your house and decrypt them?
After all, they broadcasted them as widely as they possibly could!
If they didn't want you to watch them, they shouldn't have sent it to you!
(This is a great in-theory argument that simply does not mesh well with our law in reality)
If the stream is unencrypted, the reasoning applies perfectly.
That's a straw man, that is not the issue here. LinkedIn is not recalling the data, they asked to stop HiQ from scraping and collating their data and using it for their own business.
This is already in the EULA, so what HiQ is doing may already be breaking the law.
8.2 Don'ts You agree that you will not:
k. Develop, support or use software, devices, scripts, robots, or any other means or processes (including crawlers, browser plugins and add-ons, or any other technology or manual work) to scrape the Services or otherwise copy profiles and other data from the Services;
m. Copy, use, disclose or distribute any information obtained from the Services, whether directly or through third parties (such as search engines), without the consent of LinkedIn;
ae. Use bots or other automated methods to access the Services, add or download contacts, send or redirect messages;
Here https://github.com/tosdr/tosback2/blob/master/crawl/linkedin... is an older version of the ToS, which one can read while enjoying the irony of a crawler focused on scraping ToS
It's not LinkedIn's right to tell me how to use facts. Just like I can't make LinkedIn liable for my EULA. It doesn't constitute an actual legal agreement.
That said I think the judge is wrong that LI should remove barriers from accessing the information. Information is speech. To mandate information be suppressed or produced is less than ideal.
Yes, you're right about that. But it is LinkedIn's right to tell you how to use their service. Lots of people ignore what they say, and they might not be able to sue someone who breaks their rules, but they can state the rules.
They said they don't want bots scraping their site, and that's their right. They wrote software to detect specific bots & IPs, and refuse service. That's also their right, in my opinion.
Sites who want to use LinkedIn's database are free to collect their own facts instead. I don't know if this is what you were suggesting, but LinkedIn's refusal to serve HiQ's bots is not suppressing any speech, in my view.
> That said I think the judge is wrong that LI should remove barriers from accessing the information.
We totally agree. What are we arguing about?
Unfortunately it's also a little too easy to wind up with corporatism or oligarchy when government is afforded too much power to regulate.
- Websites should not have ToS
- Websites may have ToS, users should be able to violate it without consequences if they don't like it ( = roughly current state of affairs)
- Websites may only have ToS derived from some criminal act
- something else?
As of now deep down in my heart I don't really believe in intellectual property. I do believe in respect, and in giving credit where credit is due.
But when all is said and done, I don't believe in copyrighting a number. I don't believe that it is anyone's right to dictate how bits that enter devices I own are used.
How do you protect inventions and businesses, you ask? I say:
* If you have a secret you don't want to be distributed, don't share it with anyone you cannot trust. Trained models are a good example.
* Provide a service that only provides a limited amount of information per unit time. The user is free to use the information they obtained however they wish, but even a thousand users wouldn't have a way to copy your whole database in a reasonable amount of time. Google search, for example.
* Alternatively, if you are not sure you trust someone, take collateral from them. (e.g. "It is legal to share this information about me, but then it is also legal for me to share this other information about you [that you probably don't want shared]", or "If you share this information about me, you will be kicked out from the company/platform/etc.", or an Ethereum smart contract that causes you to be fined if someone else demonstrates that they got a piece of information that I shared with you.)
* Build businesses that have stronger value propositions than restricting how information is used. Network effects, physical hardware, good service and support, good use of massive amounts of back-end data that only you have, are all options.
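The "limited amount of information per unit time" idea from the list above is usually done with some kind of rate limiter. A minimal token-bucket sketch, assuming one bucket per client; the class name, the parameters and the injectable `clock` are my own choices for illustration and not how any particular service does it:

```python
import time


class TokenBucket:
    """Allow at most `capacity` requests in a burst, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock              # injectable so the bucket can be tested
        self.tokens = float(capacity)   # start with a full bucket
        self.last = clock()

    def allow(self):
        """Return True and spend one token if the request may be served."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Usage: one bucket per client, e.g. one lookup per second with bursts of 3.
bucket = TokenBucket(rate_per_sec=1.0, capacity=3)
print([bucket.allow() for _ in range(4)])
```

With numbers like these, a single client would need N seconds to pull N records, so copying a large database through the front door becomes impractical. It does nothing against someone spreading requests over many clients, which is where the "even a thousand users" estimate has to come from.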
Numbers can be and are used to represent anything and everything. This has implications far beyond DRM. For instance: I don't believe you have the right to use my photo to make a defamatory Facebook account in my name.
- claiming that I am you
and not in copying numbers.
> I don't believe that it is anyone's right to dictate how the knives I own are used.
And yet the law still dictates you keep your knives away from my chest. Point being: your freedom of using your bits is dictated by the same laws that dictate your freedom of using knives.
If I spot a bot scraping my data, I should have the right to block it.
I wonder if the same ruling would apply to companies using Twitter's data feed? If so, it would be important in breaking open data silos.
The judge who issued this injunction, Edward Chen, is also the judge presiding over the class action case about Uber drivers as independent contractors.
This is not quite right.
One of the requirements to get a PI is a likelihood of success on the merits ;)
It's also possible that the judge is giving them affordance before killing their business to prevent appeals.
and cases involving Silicon Valley companies are very often filed here, so quite a lot of the high-profile industry matters end up getting heard by the same judges!
The fact that this judge refrained from doing so may signal that the judiciary is finally willing to bring some nuance and rationality to their interpretation of extremely broad statutes like the CFAA. It's a positive signal, even if ultimate victory remains unlikely.
/me is not a lawyer
Imagine I sued a bank under the theory that the locks on their vault doors are illegally preventing me from opening them. As part of the preliminary injunction, the judge rules that the bank must remove all locks until the case is decided.
Point is, it's not just "maintain the status quo", it's "give them your data for free". IANAL but I don't think preliminary injunctions should change the status quo by intruding on what is plausibly someone's private property.
But what if you only choose to view some of the content (e.g. block ads)? What if you apply your own styles to change the way that information is displayed? You're just changing the way the browser represents that data. You're not redistributing it as your own at this point. What if you store that data, but don't republish it; just use it in some algorithms?
There are a whole lot of interesting grey areas here, but many that already have precedents that side more with the copyright holders.
I'd suspect LinkedIn could argue that the network they create is a creative work and would be covered, but the facts about each person might not be copyright-able.
I'm well aware of the historical precedent, but that number was rather arbitrary even then - it was what a bunch of people agreed upon, based on their ideas and experience, and given the environment. It's doubly arbitrary today, considering how much the environment has changed. Is a 14-year copyright on software reasonable, for example, or too long?
Rather than making it a hard cut-off point, it would be interesting to come up with a scheme that attempts to capture the spirit of term limits.
Consider: why are copyright terms even a thing? Well, copyright is a monopoly on a thing that is not naturally restricted; it does not exist in the absence of society, and is therefore a privilege granted by that society. By itself, copyright is meant to encourage creativity in the interest of public good, and at the same time, to provide some means to derive profit from one's creative expression. So there are two conflicting interests at play here - the desire of the creator to be rewarded for the fruits of his labor, and the desire of the society to enjoy growing, constantly enriched culture. The copyright term, then, marks the point at which the latter trumps the former.
Instead, what we could do is capture the fact that the interests conflict. For as long as you hold copyright, you're effectively denying society the ability to freely enjoy the culture that you have enriched. Why, then, not tax the copyright accordingly? You could consider it a kind of intellectual property tax, but with a twist: the longer copyright is held, the more the interests of society are infringed, and the larger the compensatory payment required to maintain the copyright.
So we could start with a grace period of a couple of years that is completely free, then it starts growing steadily. For some really popular work that makes significant profits, the author could easily afford payments to maintain copyright for a decade or two (or however long - that is something that can be dialed arbitrarily). For things that are too obscure, payments would cease shortly, and they would fall to public domain. There wouldn't be such a thing as "abandonware" anymore.
What use to put the money to? Many possibilities there. Publicly sponsored arts and art education is an obvious choice. Another interesting example would be offering bulk sums of money to authors of culturally important works to surrender their copyrights sooner, so that the public can enjoy them.
There's no "murky water" in how the web works. It's very clear and precise, and anybody can learn how it works. It has to be precise and well defined, because computers can't operate any other way.
If Linkedin doesn't want "public" profile data to be accessible to everybody then they need to stop calling it public and put it behind some kind of access control.
Though that doesn't appear to be the path LinkedIn is using to fight it.
The standing to sue is an issue for user generated content, yes.
So if a company releases a product that predicts likelihood of an employee quitting, you think you're going to have standing to sue because an analysis of a copyrighted passage you wrote comprised 0.000001% of the source material the algorithm was trained on?
The context was that scraping doesn't always get a free pass because "facts". This specific case may skirt it because of the user generated content. Doesn't mean it's not worth mentioning for the larger context that copyright isn't black and white.
Maybe we should just ban User Agent strings and be done with it.
Are the requested profile pages incrementing (/users/1, /users/2, etc.)?
Are there dozens of requests a minute (faster than a typical user would read)?
Is static content (particularly images and CSS) being downloaded too or just the HTML content?
Sometimes the referrer HTTP header can give clues too - though you have to be careful there as that's as unreliable as the user agent header.
However if you're really paranoid about scrapers you can also throw in some honeypots. eg a fake user (/users/13) which is a user account that doesn't exist so that page wouldn't have any links from within your site. ie you only reach it if you're incrementing through the user IDs. Or perhaps a link within your HTML which doesn't render so it's only reachable via automated scripts that don't check what links are rendered inside the display view. Anyone that gets ensnared in your honeypot could then be put on a temporary IP blacklist. Though the danger of doing this is you accidentally blacklist good crawlers if you're not careful about setting appropriate robots rules.
There were websites at the time that would display just fine in Firefox, but would refuse to display anything if they detected a non-IE browser.
Yes, I realize that it's not that simple, but I think browsers would have tried much harder to adhere to standards if we had done it that way.
Private entities own and operate all(most of) the servers, services and conduits, and that does need to be paid for and maintained.
I'm not saying I agree with Linkedin in this particular scenario, but this is about two commercial for-profit entities arguing over money, so let's not make it about something it's not.
And are MORE than happy to send the content of their servers to unsolicited, uninvited, anonymous guests on mere request. No-one is forcing them to do so!
No one should be forcing them to send their content to anyone.
They claim they should be allowed to discriminate at their discretion who they respond to, since they own and operate the servers.
This "no one is forcing you to send your content" goes both ways..
If one side is going to say they're entitled to receive the content on request, the other side wants to be able to say they're entitled to refuse to answer that request..
How is it not a "public space"? They publish publicly visible A records for their site as well as route their public IP space to transit providers in order for the public to be able to reach their site.
A more accurate comparison would be that you put up an advertisement on a billboard along a busy street and then decided to tell people who passed by that they weren't allowed to take a picture of it.
And to continue with this absurdity, you feel entitled to enforce who can or can not look at your billboard because, despite it being publicly viewable, it's your advertisement on the billboard.
There is no "public space" on the Internet.. There's no un-owned territory or resource that is free to use or metaphorically "stand around" in to take those pictures from.
You are consuming privately-owned resources in all your online activities, and as such some will argue that they can decide to limit your consumption of those resources at their own discretion.
Again, I am not siding with either party here, just trying to dispel this notion that "public space" - in the way we understand public space to exist in the physical world - exists on the internet.
In your example, no one is controlling your right to take photos or stand around and look in any direction you choose.
When you use the Internet, a private entity is allowing you to transit through their network and access sites, a different entity is allowing you to access and receive their content, etc..
LinkedIn owns the server you are accessing when you (or others) go to their site, and they are spending resources servicing those requests, and - they claim - can decide how and when they choose to do that..
What rights should scrapers have that they don't right now? Keep in mind that a lot of the scraping going on is just some other private company abusing access and hoping to gather and use the information for their own private profit. How many companies are scraping StackOverflow for example and doing nothing but attempting to copy it and draw traffic to their own site? I can't stand copycat sites, they fill my search results with junk. I would assume the majority of scraping that is currently happening is not doing the public any good.
> I don't want to see the internet partitioned away and owned by a few companies, especially when that information is often called a "public profile".
This sounds like you're suggesting that LinkedIn or Facebook calling your profile a 'public profile' means that the law should treat it as a public service due to use of the word 'public', is that what you mean? The word public may be overloaded here. I can see why tax funded projects should be publicly accessible, but I have a hard time seeing why private companies should be compelled to provide access to anything at their own expense.
Arguably, if they do a better job getting that information in results to people who need it in search, they may be performing a service there as well. (A lot of decently informative sites have absolutely awful search/visibility.)
That may well be true, but that value doesn't mean anyone should just be able to take that value from the company that put up the effort and investment to collect the data, and turn around and use it for their own profit. Nor does it mean that a company shouldn't be able to serve the data to whomever it wants and/or restrict access from whomever it wants. Value to the consumer is still not a reason to compel private companies to offer public services. It would be valuable to both of us if Google gave us free money, but no court is going to compel them to do so just because of the potential value to you and me.
It seems bad, btw, if we choose to rely on private companies to keep the only backups of our personal data. If a site going down has a negative effect on my life, and takes down data with it that I need, it might be an indication that I shouldn't have kept my data there.
Also true that decentralized information is not a bad thing, as a generic ideal or a data backup plan. But for a business, decentralization in this context means loss of profit, as well as possibly theft, cheating, and copyright violation.
Also, the increased value that comes from a company folding will be used against you by these private, for profit scrapers. They can and will hold their copy ransom for more money, if possible.
> Arguably, if they do a better job getting that information in results to people who need it in search, they may be performing a service there as well. (A lot of decently informative sites have absolutely awful search/visibility.)
What is the argument in favor of this being legal? It is currently not legal, and the law currently does not give any credit for 'doing it better'. Why should it?
It isn't being made publicly available in that sense. LinkedIn only offers the data to site visitors (unregistered users) under the guise of a license.
> we are choosing to make our own data publicly available
This isn't true. Putting data on LinkedIn is not making it publicly available, it's sharing a copy of your data with LinkedIn, and allowing them to do whatever they want with it. Those are the terms you agree to when you register.
> Our data shouldn't be what a corporation's profit (or loss of profit) is based on to begin with.
I agree, in an ideal world, but LinkedIn does profit on your data (as do Facebook, Google, Microsoft, etc.). And we are willingly sharing our data with them and allowing this to happen. There are all kinds of crappy trends with data and privacy happening, and lots of people raising red flags. Your choice is to not use those services. If you don't want LinkedIn to use your data for their profit, then don't share your data with LinkedIn. If you share your data with LinkedIn, then LinkedIn now has the right to use your data to their own advantage.
At best, it's a burden for no solid gain for society. At worst, there will be loopholes used to DoS businesses because they can't shut down individuals due to law-given rights, and that will lead to court fights.
These rights would do nothing but save scraper authors from learning to obfuscate their actions.
If one makes information "public" but doesn't really want to share it, then the public is fully justified in taking it.
That's why you routinely publish your bank account, social insurance, and credit card details online, right?
LinkedIn has full control over this, it's their site. What they are fighting for is the ability to choose who gets public access to various pieces of information, which its members do not get control over.
Showing different results to Google than you do to users is called cloaking and it's not allowed.
Apparently, we had something similar at Demand Media before the panda update.
IIRC, this stuff varies quite a bit from region to region, even within a single metropolitan area. Attempting to simultaneously comply with multiple independently developed rulebooks was ... fun.
I can't wait for shipyard startups to disrupt the housing market. /s
The free speech argument is certainly a dud here.
The argument that has likely had any weight at all would be the antitrust/unfair competition one.
A click wrap will not change that.
It's almost certainly about LinkedIn's repeated claims that they don't own these, that they are public info, and that they want to make them public, and about LinkedIn now turning around and saying "just kidding!" and trying to put someone out of business who depended on that, all so they can start their own analytics product.
As long as that is true, they will likely not run into issues. Other issues are not for blocking them, and a case can be made that it's a separate issue. Defending against common internet attacks is an easy case to make to a judge. He can't expect LinkedIn, in this case, to kill their service so someone can scrape.
HiQ argued that LinkedIn has a monopoly on "the professional networking market" and is unfairly exploiting that monopoly to gain an advantage in the data analytics market. HiQ showed that LinkedIn might be developing an analytics product that competes directly with their Skill Mapper product.
What definition of "free" are you using here?
Or do they mean the 'public' profile which you see when logged in? If yes, this would be a real case because this is awesome data I would like to scrape and which you could build interesting business cases with.
I mean it is completely crazy, it is not LinkedIn data it is OUR data
That is actually dangerous. Why can some startup or some judge tell me to whom I can serve content and to whom I cannot?
I was going to do some experiments with larger datasets from businesses in a region, but quickly found that's not possible.
If it becomes illegal to block crawlers, then Google is gonna get hammered with bot traffic.
It will also mean that google won't have to be the front-end to search results, and anyone can build on top of it, which could kill google ad revenue because then you could create anonymous google searches.
Here's a community crawler you can use:
Google requires sites to send the crawler the same content as someone clicking a link on a Google results page would see, so even if some sites get creative covering it up with blurred boxes and similar dark patterns, the data is there in the markup.
Not sure how this complies with google's requirements, I suspect if you're big enough you get a custom arrangement. However, that doesn't explain how hiQ are getting the data.
Recently I was setting up my new phone and thought about installing their app and I thought to myself, why?
Eventually that thought came back to me when I was attempting to update my profile and simply decided to delete it entirely.
This is the same outcome most of us wanted between Swartz and JSTOR, and perhaps with Malamud and PACER. No technical control can be in the right place, but we can hope for a common understanding (maybe eventually law?) that terms of service may demand or prohibit some things but not anything.
On the other hand, privacy is an issue too. LinkedIn lets you download a spreadsheet with the email addresses of all your connections, and if you have a lot of connections you will regularly get e-mail messages from life coaches, "managing directors", software development outsourcers, "SEO experts", and all kind of BS artists.
Anyone know whether this is right?
And BTW in case you're not aware, if you hold data from any EU citizens you'll be required to comply with the GDPR regardless of where you're located.
I tried a year ago and obviously it was impossible.
How can LinkedIn argue that Google is allowed to scrape but other third parties are not?
For instance, "buried in 6 layers of obfuscated XML" and "accessible in O(N^3) time" would both be implementations that are not "blocking" the data but they would still be extremely difficult to use.