They all ended up in a huge collection (773 million records) containing email/password pairs from many different sources: https://www.troyhunt.com/the-773-million-record-collection-1...
With so many password variations for a user, you can do credential stuffing to crawl all the private accounts of an email to build a pretty complete profile of the person (not just correlate some LinkedIn profile like in this post). I am sure someone out there is already doing this for profit.
There was an LSC-Signature header computed as sha1_hmac("POST%2Fosc%2Fpeople%2Fdetails$auth_token$unix_timestamp", "aa15bd5f089eb93a5b2b4a0e11443cb78e44f34d"), which I reversed from the Social Connector DLL. I never found it posted online, but others must have done the same.
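For anyone curious what that computation looks like, here is a rough Python sketch. It rests on several assumptions the comment does not confirm: that the first argument of sha1_hmac is the message and the second is the key, that the key string is used as raw ASCII bytes rather than hex-decoded, that the token and timestamp are concatenated with no separator, and that the result is hex-encoded. The function name `lsc_signature` is made up for illustration.

```python
import hashlib
import hmac

# Key string as quoted in the comment; ASSUMED to be used as raw ASCII bytes
# (it could just as well be hex-decoded first, the comment does not say).
KEY = "aa15bd5f089eb93a5b2b4a0e11443cb78e44f34d"

# URL-encoded method and path prefix, copied from the comment.
PREFIX = "POST%2Fosc%2Fpeople%2Fdetails"


def lsc_signature(auth_token: str, unix_timestamp: int, key: str = KEY) -> str:
    """Hypothetical reconstruction of the header value (HMAC-SHA1, hex digest)."""
    # ASSUMPTION: prefix, token and timestamp are joined without separators.
    message = f"{PREFIX}{auth_token}{unix_timestamp}"
    digest = hmac.new(key.encode("ascii"), message.encode("utf-8"), hashlib.sha1)
    return digest.hexdigest()
```

Calling `lsc_signature("some-token", 1500000000)` returns a 40-character hex string. Whether the real header used hex, base64, or something else is not stated above.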
nonpublic profiles (or nonpublic data from public profiles) from, say, FB would be different.
To humans, not machines. No one joined LinkedIn to be marketed to by random jabronis or to be added to a CRM for BS intro emails that don't have opt-out links despite being automated.
I’m not sure how many people feel like me or feel something else. But I don’t think it’s possible to say people post public profiles only for humans to read. I’m really glad my profile gets harvested by Google and DDG, and I have a public email on purpose.
The GDPR is very relevant here - restrictions around the storage and processing of personal information are the whole point of it.
While this might seem like a grey area (given the information is public), the GDPR is actually very clear here - you cannot store and process PI without the consent of those individuals.
The EU is not a world government, and the GDPR should not apply to non-EU citizens. Europe cannot regulate what I do here in America; the law is simply not applicable. (And I say this as someone who supports more tech company regulation here in the US!) Things like France trying to apply your 'right to be forgotten' to the entire world's Google search results are extremely troubling.
Don't apply your country/region's laws to non-citizens, please :)
One could also argue that by having a fine structure that disproportionately affects small businesses (thus consolidating power, money, and personal data in the hands of a few large businesses), GDPR doesn't protect you even on those sites that are subject to it. Some might say that it is actually a privacy killer. But I'll leave that discussion for another day.
That's simply untrue. From the fines already levied, we've seen small businesses fined a few thousand, while BA is getting a fine of a couple of hundred million pounds.
That seems like the very definition of disproportionate to me.
> Due regard should however be given to the nature, gravity and duration of the infringement, the intentional character of the infringement, actions taken to mitigate the damage suffered, degree of responsibility or any relevant previous infringements, the manner in which the infringement became known to the supervisory authority, compliance with measures ordered against the controller or processor, adherence to a code of conduct and any other aggravating or mitigating factor. The imposition of penalties including administrative fines should be subject to appropriate procedural safeguards in accordance with the general principles of Union law and the Charter, including effective judicial protection and due process.
The proportionality is not in GDPR or any individual law, but set out in the framework treaty under which all EU laws function.
> (23) In order to ensure that natural persons are not deprived of the protection to which they are entitled under this Regulation, the processing of personal data of data subjects who are in the Union by a controller or a processor not established in the Union should be subject to this Regulation where the processing activities are related to offering goods or services to such data subjects irrespective of whether connected to a payment
...so if you're a non-EU company offering any goods or services to people who are in the EU, even non-EU citizens, GDPR still applies if the processing relates to offering them goods and services.
> (22) Any processing of personal data in the context of the activities of an establishment of a controller or a processor in the Union should be carried out in accordance with this Regulation, regardless of whether the processing itself takes place within the Union
> (24) The processing of personal data of data subjects who are in the Union by a controller or processor not established in the Union should also be subject to this Regulation when it is related to the monitoring of the behaviour of such data subjects in so far as their behaviour takes place within the Union.
So monitoring of EU data subjects by non-EU companies and processing data relating to their activities in the EU are definitely covered by GDPR even if you don't intend to offer them goods and services.
Text above quoted from the English text of GDPR as at https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CEL...
"Whereas the mere accessibility of the controller’s, processor’s or an intermediary’s website in the Union, of an email address or of other contact details, or the use of a language generally used in the third country where the controller is established, is insufficient to ascertain such intention, factors such as the use of a language or a currency generally used in one or more Member States with the possibility of ordering goods and services in that other language, or the mentioning of customers or users who are in the Union, may make it apparent that the controller envisages offering goods or services to data subjects in the Union."
In other words, don't offer a site in EU languages, accept EU currencies, or ship to the EU and GDPR does not apply (unless you are based there).
If the author of the parent post isn't in Europe, why is it relevant?
This came up most recently with the Manning/Snowden leaks: while the leaks were illegal, using the info is not. There was a lot of press, and here’s a decent post by a law professor explaining the legality: https://jonathanturley.org/2016/10/17/cnn-it-is-illegal-for-...
Data is different from stolen property in that it isn’t property. Once data is made public it is no longer proprietary so it can be used legally even if it was originally obtained illegally. This is different if you were paying for non-public stolen data, but no one here is talking about that.
The UID->email data dump was not, AFAIK, legal though.
Here, the opinion strongly suggested the CFAA does not prohibit scraping of publicly available data:
"It is likely that when a computer network generally permits public access to its data, a user’s accessing that publicly available data will not constitute access without authorization under the CFAA." (Opinion at 33.)
Since this was a preliminary injunction, this passage won't be binding on other courts; however, it certainly will be cited as persuasive precedent. So will more policy-oriented passages like the following:
"giving companies like LinkedIn free rein to decide, on any basis, who can collect and use data—data that the companies do not own, that they otherwise make publicly available to viewers, and that the companies themselves collect and use—risks the possible creation of information monopolies that would disserve the public interest." (Opinion at 36.)
> I emphasize that appealing from a preliminary injunction to obtain an appellate court’s view of the merits often leads to “unnecessary delay to the parties and inefficient use of judicial resources.” Sports Form, 686 F.2d at 753. These appeals generally provide “little guidance” because “of the limited scope of our review of the law” and “because the fully developed factual record may be materially different from that initially before the district court.”
The opinion does cite other 9th circuit decisions that imply that the 9th circuit believes that the CFAA does not prohibit scraping, but also explicitly notes that the CFAA is not the only relevant law.
> We note that entities that view themselves as victims of data scraping are not without resort, even if the CFAA does not apply: state law trespass to chattels claims may still be
I don't see where my claims overstep what is laid out in that opinion.
Edit: The opinion does indicate that there is a good chance that the 9th circuit will eventually rule that public scraping is not covered by the CFAA, but even if the 9th circuit does make that ruling, that still would not mean that scraping is legal under other laws.
Where I disagreed is with your statement that the case "does not purport to provide any precedent on the legality of scraping." It does. It is persuasive precedent that the CFAA does not bar scraping publicly available data.
The first quote you mention ("I emphasize...") is from one judge's concurring opinion, not the court's full opinion. Further, the "little guidance" part of that quote doesn't mean that the opinion provides "little guidance" in general. The concurring judge was making the point that the parties shouldn't have delayed a full trial while waiting for the appeal. By "little guidance," he meant that the appeal provides "little guidance" to these particular parties about how a full-fledged trial will play out.
The language is very consistent and careful about not doing that because the court did not rule on that matter.
> In common law legal systems, precedent is a principle or rule established in a previous legal case that is either binding on or persuasive for a court or other tribunal when deciding subsequent cases with similar issues or facts. Common-law legal systems place great value on deciding cases according to consistent principled rules, so that similar facts will yield similar and predictable outcomes, and observance of precedent is the mechanism by which that goal is attained.
There was no principle or rule established here regarding the CFAA, thus no precedent that must be considered by other courts.
Courts generally try to restrict their rulings to the minimum needed to decide any particular case.
If the court had made a ruling, it would not have made a point of qualifying all the statements about the CFAA the way it did.
The case did establish "principle[s]" that are "persuasive" for courts deciding subsequent cases. Put it this way: Say someone gets indicted for violating the CFAA by scraping a public site. You bet their attorneys will cite hiQ v. LinkedIn as persuasive precedent for dismissing the indictment. And the court, "when deciding" that case, absolutely will consider the Ninth Circuit's statement that it's "likely" that accessing "publicly available data will not constitute access without authorization under the CFAA."
Here's another point: When the Ninth Circuit decides a case, it chooses whether the decision is "published" or "unpublished." The Ninth Circuit rules expressly say that "unpublished" decisions are not precedent.
> Ninth Circuit Rule 36-3(a): "Not Precedent. Unpublished dispositions and orders of this Court are not precedent...."
Here, the Ninth Circuit chose to issue hiQ v. LinkedIn as a published case. If the Ninth Circuit wanted the case not to be precedent, it would not have done so, and easily could have made it "unpublished."
You might hit an extra jailtime jackpot.
>During my work on the start-up, I developed techniques that allow me to collect and cross-reference a lot of personal data including data from LinkedIn.
From his bio: "...leading the evolution of startups and enterprises to achieve the highest level of security and compliance."
I don't think it's meant to be a security element, but to disambiguate same name collisions, right?
...contains links to all public LinkedIn profiles. I looked for a bunch of people I know with public profiles and they weren't in there (and neither was I).
>The link to the LinkedIn user profile was missing and personal information was lacking. As a result, it was not very useful.
I think he is downplaying the value of that hacked database. Without it what would he have? userid and profile url combos...
Don't they already? There have been SOOO many breaches in this area that I rather doubt there are many active emails that don't have some publicly available dataset linking them to first and last names. The valuable thing here is linking that data to the LinkedIn profile ID.
... serving up over unsecured http
Finally, the title of this article says he did this for “fun and profit”. By the author’s own admission, the “profit” part is missing here. He claims the company “...went out of business without getting funding”.
So in other words, he has access to the main LinkedIn website, found a link to a database from an old data breach, and used to work for a now-defunct company. None of that translates to “Harvesting LinkedIn data for fun and profit”.