
Harvesting LinkedIn data for fun and profit - stremovsky
http://cloudinvent.com/blog/harvesting-linkedin-data-for-fun-profit/
======
zawerf
This is a pretty tame use of the 2012 LinkedIn breach. The breach also
contained unsalted hashes, which have mostly been cracked by now.
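To illustrate why unsalted hashes fall so quickly, here is a minimal Python sketch. It assumes plain SHA-1, which was widely reported as the scheme in the 2012 LinkedIn leak, and it uses made-up users and a tiny made-up wordlist. Identical passwords produce identical unsalted digests, so hashing a wordlist once inverts every matching record at the same time. A per-user random salt (PBKDF2 here, chosen only as an example of a salted, slow KDF) breaks that sharing.

```python
import hashlib
import os

def unsalted_sha1(password: str) -> str:
    # Same password -> same digest for every user, so one precomputed
    # table of common passwords cracks all matching records at once.
    return hashlib.sha1(password.encode("utf-8")).hexdigest()

def salted_hash(password: str, salt: bytes) -> str:
    # A random per-user salt makes identical passwords hash differently;
    # PBKDF2 also makes each guess deliberately expensive.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000).hex()

# Hypothetical users; "dave" picked a password that is not in the wordlist.
users = {"alice": "123456", "bob": "123456", "carol": "linkedin", "dave": "x9!Qv#2m"}

unsalted = {u: unsalted_sha1(p) for u, p in users.items()}
salted = {u: salted_hash(p, os.urandom(16)) for u, p in users.items()}

# Hash a (hypothetical) wordlist once, then invert the unsalted dump by lookup.
wordlist = ["password", "123456", "linkedin"]
lookup = {unsalted_sha1(w): w for w in wordlist}
cracked = {u: lookup.get(h) for u, h in unsalted.items()}

print(cracked)
print(unsalted["alice"] == unsalted["bob"], salted["alice"] == salted["bob"])
```

Only passwords that appear in the wordlist are recovered, which is why "mostly cracked" rather than "fully cracked" is the usual outcome.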

They all ended up in a huge collection (773 million records) containing
email/password pairs from many different sources:
[https://www.troyhunt.com/the-773-million-record-collection-1...](https://www.troyhunt.com/the-773-million-record-collection-1-data-reach/)

With so many password variations for each user, you can do credential stuffing to
crawl all the private accounts tied to an email address and build a pretty complete profile
of the person (not just correlate a LinkedIn profile like in this post). I
am sure someone out there is already doing this for profit.

------
bwblabs
I used to (ab)use their Outlook Social Connector (OSC) through the now-defunct API
([https://outlook.linkedinlabs.com/osc/people/details](https://outlook.linkedinlabs.com/osc/people/details));
they shut it down in 2015. I used it just to get names and profile images for
easy onboarding (like Gravatar).

There was an LSC-Signature header that was
sha1_hmac("POST%2Fosc%2Fpeople%2Fdetails$auth_token$unix_timestamp",
"aa15bd5f089eb93a5b2b4a0e11443cb78e44f34d"), which I reverse-engineered from the Social
Connector DLL. I never found it posted online, but others must have done the
same.
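A minimal Python sketch of the signature described above. It assumes that `$auth_token$unix_timestamp` is PHP-style interpolation (token and timestamp appended directly, with no separator), that the key string is used as raw ASCII bytes rather than hex-decoded, and that the header carried the hex-encoded digest. The endpoint has been gone since 2015, so this only illustrates the HMAC construction; `lsc_signature` and the sample token are hypothetical names.

```python
import hashlib
import hmac
import time

# Key string as quoted above; assumed to be used as raw ASCII bytes.
OSC_KEY = "aa15bd5f089eb93a5b2b4a0e11443cb78e44f34d"
# URL-encoded "POST/osc/people/details" prefix, as quoted above.
PREFIX = "POST%2Fosc%2Fpeople%2Fdetails"

def lsc_signature(auth_token: str, unix_timestamp: int, key: str = OSC_KEY) -> str:
    """Hypothetical helper: hex HMAC-SHA1 over prefix + token + timestamp."""
    # Assumption: "$auth_token$unix_timestamp" means plain concatenation.
    message = PREFIX + auth_token + str(unix_timestamp)
    return hmac.new(key.encode("ascii"), message.encode("utf-8"), hashlib.sha1).hexdigest()

if __name__ == "__main__":
    # "example-token" is a placeholder, not a real OSC auth token.
    print(lsc_signature("example-token", int(time.time())))
```

Because the timestamp is part of the signed message, each request gets a fresh signature, which presumably limited replaying an old header.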

------
gumby
I don't see the problem with collecting public profiles. They are, you know,
public, and people entered their own data in the interest of propagating it.

Nonpublic profiles (or nonpublic data from public profiles) from, say, FB
would be different.

~~~
tomnipotent
> and people entered their own data in the interest of propagating it

To humans, not machines. No one joined LinkedIn to be marketed to by random
jabronis or to be added to a CRM for BS intro emails that don't have opt-out
links despite being automated.

~~~
riku_iki
Emails are not exposed in public profiles.

~~~
Breza
That's true to a point. Most workplaces use a standardized format for email
addresses. If you have somebody's name, you can send them official looking
spam.

------
jszymborski
Just a reminder that using data obtained illegally (e.g. from the 2012 LinkedIn hack)
is also illegal (depending on your jurisdiction, IANAL, et cetera...)

~~~
gumby
At least in the western United States, scraping is fine:
[http://cdn.ca9.uscourts.gov/datastore/opinions/2019/09/09/17...](http://cdn.ca9.uscourts.gov/datastore/opinions/2019/09/09/17-16783.pdf)
. LinkedIn was even a party to this case!

The UID->email data dump was not, AFAIK, legal though.

~~~
shkkmo
That is not at all what that ruling says. The ruling just upholds an
injunction until the case is decided. It explicitly does not purport to
provide any precedent on the legality of scraping.

~~~
calny
That's not exactly right. You're correct the ruling only upheld a preliminary
injunction. Like you imply, that's a provisional remedy before trial, subject
to change. But in practice the court is unlikely to change its views about how
the CFAA operates. And rulings about preliminary injunctions are frequently
cited as precedent.

Here, the opinion strongly suggested the CFAA does not prohibit scraping of
publicly available data:

 _" It is likely that when a computer network generally permits public access
to its data, a user’s accessing that publicly available data will not
constitute access without authorization under the CFAA."_ (Opinion at 33.)

Since this was a preliminary injunction, this passage won't be binding on
other courts; however, it certainly will be cited as persuasive precedent. So
will more policy-oriented passages like the following:

 _" giving companies like LinkedIn free rein to decide, on any basis, who can
collect and use data—data that the companies do not own, that they otherwise
make publicly available to viewers, and that the companies themselves collect
and use—risks the possible creation of information monopolies that would
disserve the public interest."_ (Opinion at 36.)

~~~
shkkmo
I am no legal expert, but the judge explicitly warns against reading too much
into him upholding the injunction:

> I emphasize that appealing from a preliminary injunction to obtain an
> appellate court’s view of the merits often leads to “unnecessary delay to
> the parties and inefficient use of judicial resources.” Sports Form, 686
> F.2d at 753. These appeals generally provide “little guidance” because “of
> the limited scope of our review of the law” and “because the fully developed
> factual record may be materially different from that initially before the
> district court.”

The opinion does cite other 9th Circuit decisions that imply that the 9th
Circuit believes that the CFAA does not prohibit scraping, but it also explicitly
notes that the CFAA is not the only relevant law.

> We note that entities that view themselves as victims of data scraping are
> not without resort, even if the CFAA does not apply: state law trespass to
> chattels claims may still be available

I don't see where my claims overstep what is laid out in that opinion.

Edit: The opinion does indicate that there is a good chance that the 9th
Circuit will eventually rule that public scraping is not covered by the CFAA,
but even if the 9th Circuit does make that ruling, that still would not
mean that scraping is legal under other laws.

~~~
calny
Your point that other laws might (or might not) apply is a good one that
people should note. "Trespass" is one. LinkedIn apparently alleged violation
of the DMCA too. It chose not to press those issues in the appeal, so the
opinion isn't direct authority regarding those other laws.

Where I disagreed is with your statement that the case "does not purport to
provide any precedent on the legality of scraping." It does. It is persuasive
precedent that the CFAA does not bar scraping publicly available data.

The first quote you mention ("I emphasize...") is from one judge's concurring
opinion, not the court's full opinion. Further, the "little guidance" part of
that quote doesn't mean that the opinion provides "little guidance" in
general. The concurring judge was making the point that the parties shouldn't
have delayed a full trial while waiting for the appeal. By "little guidance,"
he meant that the appeal provides "little guidance" to these particular
parties about how a full-fledged trial will play out.

~~~
shkkmo
Please show me where there is a definitive statement in the opinion that the
CFAA does not apply to scraping?

The language is very consistent and careful about not doing that because the
court did not rule on that matter.

~~~
calny
Something doesn't need to be "definitive" to qualify as precedent. See the
quote I mentioned above at page 33 ("It is likely..."). That's precedent.

~~~
shkkmo
"Precedent" has a specific meaning here, in the context of legal cases:

> In common law legal systems, precedent is a principle or rule established in
> a previous legal case that is either binding on or persuasive for a court or
> other tribunal when deciding subsequent cases with similar issues or
> facts.[1][2][3] Common-law legal systems place great value on deciding cases
> according to consistent principled rules, so that similar facts will yield
> similar and predictable outcomes, and observance of precedent is the
> mechanism by which that goal is attained.

There was no principle or rule established here regarding the CFAA, thus no
precedent that must be considered by other courts.

Courts generally try to restrict their rulings to the minimal needed to decide
any particular case.

If the court had made a ruling, they would not make a point of qualifying all
the statements about the CFAA the way they did.

~~~
calny
Sorry for just getting back to this.

The case did establish "principle[s]" that are "persuasive" for courts
deciding subsequent cases. Put it this way: Say someone gets indicted for
violating the CFAA by scraping a public site. You bet their attorneys will
cite hiQ v. LinkedIn as persuasive precedent for dismissing the indictment.
And the court, "when deciding" that case, absolutely will consider the Ninth
Circuit's statement that it's "likely" that accessing "publicly available data
will not constitute access without authorization under the CFAA."

Here's another point: When the Ninth Circuit decides a case, it chooses
whether the decision is "published" or "unpublished." The Ninth Circuit rules
expressly say that "unpublished" decisions are not precedent.

> Ninth Circuit Rule 36-3(a): "Not Precedent. Unpublished dispositions and
> orders of this Court are not precedent...."

Here, the Ninth Circuit chose to issue hiQ v. LinkedIn as a published case. If
the Ninth Circuit wanted the case _not_ to be precedent, it would not have
done so, and easily could have made it "unpublished."

------
radiusvector
Where's the profit part? How did you monetize stolen data?

~~~
skrebbel
It's a figure of speech.

------
tomquirk
Here's a friendly Python library that is ideal for this:
[https://github.com/tomquirk/linkedin-api](https://github.com/tomquirk/linkedin-api)

------
Domenic_S
> _Get rid of duodecimal profile ids. Obscurity is not a solution here._

I don't think it's meant to be a security element, but to disambiguate same
name collisions, right?

~~~
applecrazy
This is true. The profile URL is customizable, and the profile id can be
removed from the url.

------
stremovsky
Hi, people have started looking themselves up in the LinkedIn index. It is
country-based. I updated the article with more examples, such as:
[https://il.linkedin.com/directory/people-a-1/](https://il.linkedin.com/directory/people-a-1/)
[https://www.linkedin.com/directory/people-a-1/](https://www.linkedin.com/directory/people-a-1/)
[https://uk.linkedin.com/directory/people-a-1/](https://uk.linkedin.com/directory/people-a-1/)
[https://de.linkedin.com/directory/people-a-1/](https://de.linkedin.com/directory/people-a-1/)
[https://fr.linkedin.com/directory/people-a-1/](https://fr.linkedin.com/directory/people-a-1/)

------
trackofalljades
I don't really understand the author's claim that...

[https://il.linkedin.com/directory/people-a-1/](https://il.linkedin.com/directory/people-a-1/)

...contains links to all public LinkedIn profiles. I looked for a bunch of
people I know with public profiles and they weren't in there (and neither was
I).

~~~
stremovsky
This specific subdomain lists people in Israel (IL.linkedin.com). There are
other subdomains for other countries.

------
bwb
I think I missed it, but how did he get their emails? That part I didn't
understand as I was hoping LI didn't expose that...

~~~
papreclip
> Searching on Google, I found the database from the LinkedIn 2012 hack. Each
> record had a user id and an email without additional information.

> The link to the LinkedIn user profile was missing and personal information
> was lacking. As a result, it was not very useful.

I think he is downplaying the value of that hacked database. Without it, what
would he have? User ID and profile URL combos...

~~~
Avamander
You can also enumerate users based on phone numbers; you don't need the
database in that case. It's 10k numbers per account, probably also somehow
resettable, but I haven't spent much time on it because LinkedIn didn't
consider it an issue.

------
dlphn___xyz
this seems pretty useless...

~~~
giarc
He took two data sources (LinkedIn and the LinkedIn data hack) and combined them
to get first name, last name, email, and LinkedIn profile ID. Imagine a spammer having
millions of active email addresses with first/last.

~~~
shkkmo
> Imagine a spammer having millions of active email addresses with first/last.

Don't they already? There have been SOOO many breaches in this area that I
rather doubt there are many active emails that don't have some publicly
available dataset linking them to first and last names. The valuable thing
here is linking that data to the LinkedIn profile ID.

------
piqufoh
> For the past 15 years I’ve been leading the evolution of startups and
> enterprises to achieve the highest level of security and compliance.

... serving it up over unsecured HTTP

------
VeryHacker
That's creepy. Good catch, OP.

------
downandout
This is clickbait and does not belong on the front page of HN. The author says
he scraped _public_ profile URLs and names. When you make your profile on
social websites public, then you have chosen to...make them public. He then
claims he has emails from LinkedIn, but those emails are from an old data
breach, and he even admits that the emails are limited to those found in a
2012 data breach.

Finally, the title of this article says he did this for “fun and profit”. By
the author’s own admission, the “profit” part is missing here. He claims the
company “...went out of business without getting funding”.

So in other words, he has access to the main LinkedIn website, found a link to
a database from an old data breach, and used to work for a now-defunct
company. None of that translates to “Harvesting LinkedIn data for fun and
profit”.

