
LinkedIn loses appeal over access to user profiles - isalmon
https://www.reuters.com/article/us-microsoft-linkedin-profiles/microsofts-linkedin-loses-appeal-over-access-to-user-profiles-idUSKCN1VU21W
======
pixelmonkey
The summary here is that LinkedIn tried to argue that it could prevent
scraping of public LinkedIn profile data under their ToS, but the courts have
ruled that if data is public and provided by users, it can be scraped/crawled,
that is, it isn’t LinkedIn property. This is generally a positive outcome for
people/companies turning web text and HTML into structured data, e.g. tools
like Puppeteer and Scrapy can be used more freely on sites like LinkedIn,
Twitter, and Reddit. Now, you might still get into trouble if you re-publish
that data, but you can, at least, safely use the data "internally", and the
act of scraping/crawling (politely) is not, per se, something unlawful.

~~~
eagsalazar2
Not sure "isn't LinkedIn property" is accurate here. They still retain
ownership and control of redistribution just like any other IP. This is more
of a philosophical question about whether "viewing" itself is a violation of
their ownership rights and really about the definitions of "viewing" and
"public" in the context of the internet.

Seems like they've simply determined that viewing any freely accessible URL is
"public" and that "viewing" does include scraping. This seems like a very
reasonable determination as it maps pretty neatly to how we think about
viewing public content IRL where I am free to drive down the road (for profit
or pleasure) and record publicly viewable signage and activities and use that
data any way I see fit.

~~~
rebuilder
Maybe it's more accurate to say "any publicly linked URL"? IIRC, charges have
been successfully brought against people for e.g. iterating through user
identifiers in URLs to gain access to other users' data. (Do correct me if I'm
wrong on that count!)

~~~
komali2
Some kid was charged for that but in my opinion it was stupid. URL to me means
part of the UX. If you search on Google using a query parameter directly
instead of entering the query in their search box, should that count as
wrongful use?

~~~
rebuilder
Stupid or not, that's a matter for the lawmakers. What I'm saying is that, as
far as I know, a ruling that any publicly accessible URL is fair game would
contradict previous rulings.

Now, this is based on my very patchy memory of sensationalist reporting of
legal matters in a jurisdiction I don't reside in, so there's probably some
wiggle room there ;)

------
echelon
This is fantastic. I would like to see wider legislation allowing scraping of
IMDB, Genius, Reddit, Facebook, and Google made legal. These services receive
free input from users. The data should remain free.

Edit (sort of off topic): There's still value in building and providing
services at scale, but this lowers the barrier to cross the moat for small
players. The first step is data liberation. Then we can work to bring down the
other cost barriers. It's a lot easier to build services that scale in 2019
than it was in 2005.

The semantic web was misguided in 200X, but we might want to take another
swing at it in the future.

~~~
polygot
If you add .json to the end of a Reddit URL, it will return JSON data. For
example:
[https://www.reddit.com/r/ubuntu.json](https://www.reddit.com/r/ubuntu.json) .
It also works with comment threads and posts.
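
A minimal Python sketch of the `.json` trick, for anyone who wants to try it. The helper names (`to_json_url`, `listing_titles`) are made up for illustration. The response shape (a "Listing" with `data.children[].data.title`) is an assumption based on what subreddit pages typically return, so the sample below is hand-written rather than fetched:

```python
from __future__ import annotations

import json
from urllib.parse import urlsplit, urlunsplit


def to_json_url(reddit_url: str) -> str:
    """Append .json to the path of a Reddit URL, keeping any query string."""
    parts = urlsplit(reddit_url)
    path = parts.path.rstrip("/")
    if not path.endswith(".json"):
        path += ".json"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def listing_titles(payload: dict) -> list[str]:
    """Pull post titles out of a Listing-shaped response.

    Assumed shape: {"data": {"children": [{"data": {"title": ...}}, ...]}}.
    """
    children = payload.get("data", {}).get("children", [])
    return [c["data"]["title"] for c in children if "title" in c.get("data", {})]


# Hand-written sample in the assumed shape, so the sketch runs offline.
sample = json.loads(
    '{"kind": "Listing", "data": {"children": ['
    '{"kind": "t3", "data": {"title": "First post"}},'
    '{"kind": "t3", "data": {"title": "Second post"}}]}}'
)

json_url = to_json_url("https://www.reddit.com/r/ubuntu/")
titles = listing_titles(sample)
```

To use it for real, fetch `json_url` with any HTTP client and pass the decoded body to `listing_titles`. Check the actual payload first, since the shape above is only an assumption.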

~~~
polygot
Also, it outputs XML and RSS too:
[https://www.reddit.com/r/ubuntu.rss](https://www.reddit.com/r/ubuntu.rss) and
[https://www.reddit.com/r/ubuntu.xml](https://www.reddit.com/r/ubuntu.xml)

~~~
ahbyb
xml and rss seem to be the same exact output

------
perspective1
I'm torn. On the one hand, scraping helps break down walled gardens. On the
other, we're talking about personal details being used in novel ways that no
LinkedIn user probably understands. I doubt any LinkedIn user writes their
profile expecting HiQ to scrape it, assign a "flight risk" score and alert
their bosses.

~~~
Nextgrid
The users agreed to publish their details publicly on LinkedIn. It’s normal
that anyone can access those details and use them however they like.

~~~
smohare
There is a broader ethical discussion of how to treat data, nominally public,
that is increasingly collected, persisted and analyzed indefinitely by
adversarial agents. It seems clear to me that a more nuanced categorization is
required. This data is not public in the same sense as an uttered word was in
a town square a hundred years ago.

Imagine being denied job opportunities because some company has analyzed the
careers of the last 25 generations of your ancestors and deemed your lineage
to be inadequate?

~~~
pergadad
You may mean for your general LinkedIn profile info to be public, but not
for the old profile data or the metadata of when you changed what to be
public. It's not per se secret or hidden but it's also not intended to be kept
and processed.

This is also a clear case where GDPR would come in. This is personal data,
whether intentional or not, and the scraper is obliged to conform to EU laws
if they scrape data on EU citizens - including eg information rights and
deletion.

------
undefined3840
I recently learned from a recruiter that one license for one recruiter for
LinkedIn is $10k a year, so that is what they are protecting.

------
phs318u
I’m a very active user of LinkedIn, effectively cultivating my “professional
brand” on it. I’ve been contracting for years and use my network to find gigs.
While I don’t have an issue with the business that HiQ are in (informing
businesses of employee flight risk), I do believe there’s a qualitative
difference between data that I publish for consumption by human eyeballs for
free (a use of my data that I’ve authorised), and someone harvesting such data
en masse for commercial purposes that I have not authorised. HiQ have not
asked for my permission to use my data, they have not made any commitments
about how they will use and not use my data. Given that they have access to my
contact details (even via LI itself), they are capable of contacting me to
request permission to use my data.

~~~
CosmicShadow
What HiQ did was scrape public data, so if you have your LI profile set to
public, then anyone can access it and do what they will with it, just like if
you posted a print out of it on a bulletin board in a mall. It's in the open
and is fair game for whatever. You can make your entire profile or just aspects
of it private, meaning people need to login to LI to see your stuff, which
then protects you under the TOS.

I think profiles were default public so you could be found on Google and for
SEO purposes for both you and LI.

You'd be hard pressed to find a public profile accessible anymore on LI
anyway, even with public settings, you'll hit an authwall 9 out of 10 times.

~~~
phs318u
I understand what HiQ have done. I'm saying I believe there's a material
difference between public data for consumption by individual human beings, and
systematic commercial harvesting. I appreciate that in the US, there may be no
legal distinction between types of consumption of public data. Public data is
public. However, I'm arguing that any commercial use of my data beyond
fair-use, should require my permission and an explanation of how my data will
be stored and treated, so that I can be assured that my rights (over further
unauthorised use) are preserved.

EDIT: It occurs to me that HiQ's success over LinkedIn does not necessarily
imply they would be successful against actual LI users in a GDPR-like
jurisdiction. Also, what if LI turned around and allowed each user to specify
a style of CC license under which their specific data is published (by LI on
behalf of the user). If I specified a non-commercial license variant, would
that disallow HiQ's actions (without seeking permission)?

~~~
CosmicShadow
I can't say I know much about licensing and/or GDPR stuff, all I know is that
if it's public, I don't have to agree to anything and I can do whatever I
want, which is great for me and my business. From the other side, yes it sucks
that people can take my stuff and profit from me and there is nothing I can do
about it and no way to enforce it and I probably don't even know it's
happening. (sounds like ad tracking!)

The way things work in North America at least to my understanding is that it
doesn't matter what license you use, I don't have to agree to it to scrape it
and use it if there is no click wrapper. I guess if you caught me explicitly
using it in a certain way, I could get in trouble, but that is not easy. What
you propose sounds reasonable, but I don't know how it would be enforced or if
it would still stop people. I'm owed 30k in consulting wages and I can't even
make it worth my while to pursue that from a legal standpoint, let alone try
and sue some unknown and/or potentially massive company or scattered random
ghosts across the interwebs.

------
danielrhodes
LinkedIn has played a very poor strategy here. The value of the service should
be in the network, which is quite defensible. Instead, they’ve made the value
in the profiles, which is not defensible. Few people curate their network on
LinkedIn because you can't see profiles unless you are closely connected, so
you are incentivized to add as many people as possible, thus devaluing the
entire network. Then they go and sell unlimited access to profiles to
recruiters and sales people. Thus, when other services come around and scrape
their data, which LinkedIn needs to make somewhat publicly available for SEO
juice, it becomes an existential threat.

If you look at Facebook, there is some limited profile data publicly
available, but they will go to the wall to prevent people from seeing how
those people are connected. In addition, they started from a very walled-off
position, so they didn't become reliant on SEO traffic.

------
crazygringo
Question:

This seems to mean LinkedIn can't _sue_ to prevent scraping.

I assume it's still legal for them to implement technological anti-scraping
measures? So the two companies can play cat-and-mouse if they wish with
rate-limiting, IP addresses, etc...

~~~
thomascgalvin
An earlier ruling actually ordered LinkedIn to stop attempting to block the
scraping using technological measures, too.

~~~
perl4ever
Has robots.txt been outlawed now? In what jurisdiction exactly?

~~~
sjg007
I don't believe that robots.txt has ever had the backing of law.

------
tempestn
This blog post is an excellent summary, and covers what was actually decided
and what is still unknown:
[https://blog.ericgoldman.org/archives/2019/09/ninth-circuit-...](https://blog.ericgoldman.org/archives/2019/09/ninth-circuit-says-linkedin-wrongly-blocked-hiqs-scraping-efforts.htm)

------
lr4444lr
What cracks me up about this is how these massive companies go to such lengths
to call themselves mere platforms in order to avoid liability for content, and
then when someone actually takes the content in this case they cry, "Foul!
That's _ours_!" Can't have it both ways.

~~~
genidoi
LinkedIn tried to argue that if they put data behind a login wall, then it no
longer falls under the wide umbrella of "public data" and so it's "theirs".
Previous cases already established that if a crawler can see the data without
any session cookies then it's okay. This ruling extended that to any data that
can reasonably be accessed by any member of the public.

There will probably be more cases like this that test the upper bound of what
"public data" means. At what point does publicly aggregated data stop being public
data? And do attempts that companies make to prevent that data from being
captured (ip limiting, captchas, login walls) count as immoral/illegal, since
they are restricting the public from accessing a public good?

~~~
mafuy
> Previous cases already established that if a crawler can see the data
> without any session cookies then it's okay.

I'm interested in this, but I'm not sure how to learn more - can you give me a
hint?

~~~
genidoi
Do you mean crawling without cookies or the legal case?

------
playing_colours
I do not like the hide-and-seek game with the "who viewed your profile"
functionality: upgrade to a paid subscription to see who viewed, upgrade to
another tier to hide that you looked at someone.

It looks like a lack of imagination or business prowess to come up with more
advanced, valuable, and less annoying ways for monetisation. If only they
could make it easier to connect people with matching mutual interests, more
flexible than a plain traditional job board and a database of CVs.

~~~
gnicholas
You don’t have to pay to hide that you viewed someone’s profile. Maybe if you
want to see who viewed yours, but also keep your browsing private — but it
seems more reasonable to charge for that sort of functionality.

------
datelinereader
FYI, this article is from a month ago and this general story was discussed
here at the time (linking the official announcement):

[https://news.ycombinator.com/item?id=20920753](https://news.ycombinator.com/item?id=20920753)

------
xupybd
After finding this
[https://github.com/Greenwolf/social_mapper](https://github.com/Greenwolf/social_mapper),
I strongly recommend against having a profile photo on LinkedIn. It has caused
me to be far more careful about my presence on the internet.

In the post-privacy age I don't want my personal opinions to come back and
haunt me. I grow as a person but the internet remembers all. If I make a dumb
mistake and it's published online that's not a problem for me in 10 years if
that fades away. But people are collecting and correlating info now. I don't
like it one bit. It means someone you've never met, in a country you've never
been to could extort you. It's getting very scary.

~~~
vesche
You can make it so your picture on LinkedIn is only viewable by people who are
connected with you. I do agree that people should be cautious about what they
post/share online however.

------
gist
I think what most people also don't realize is that LinkedIn's current model
makes it difficult to view someone's profile without them knowing, since anyone
who pays for the option on their account can see who is looking at their
profile. As such, the user wanting to look at a person's profile has no
privacy in having done so. There could be many reasons someone looks at
someone else's profile (even just some kind of curiosity or mistake) so this
to me is an issue in itself.

Sure, there are ways around this (you can make up a fake profile, and some info
is public), but normally what I run into is a request to log in to LinkedIn to
view something that I am interested in.

~~~
scarface74
There is a setting that lets you see other people’s profile without them being
notified. You can do it with free accounts. But you also can’t see who viewed
your profile.

If you pay, you can keep your viewing private while seeing other people’s
profile.

~~~
gist
But if they pay, can't they override that? Or are you saying that even a paid
account on LinkedIn can't see if you looked at their profile if you (on a free
account) have said 'don't allow anyone to see'?

~~~
scarface74
That’s what LinkedIn says. So I hope that’s the case.

------
ChrisMarshallNY
Personally, this doesn't bother me too much. I use LinkedIn specifically
because it is public. I'm an "open kimono" type of person. Not particularly
interested in hiding stuff.

However, the general principle of "Data Scraping as a Business Model" bothers
me. This is by no means the only company that does it (I suspect that MS does
it with their access to LinkedIn).

There are far more egregious instances, and many of them have ways to get
users to voluntarily cede information (can you think of a rather obvious
example?).

LinkedIn is a sandwich board. It's meant to be a public showcase. If you want
private, I suspect there are much more focused (and probably valuable) venues
that cater to particular communities.

~~~
alexandercrohde
> Not particularly interested in hiding stuff.

Well, so the company, HiQ, is basically scraping every time you update your
LinkedIn, to tell your employer you might be about to leave.

Now maybe that's cool with you. But it seems super sketchy to me, and one
reason I deleted my LinkedIn altogether.

~~~
ChrisMarshallNY
You made the correct decision.

It is not "cool" with me. It just means that it isn't a factor for me. I would
not have used LI for a job search in any obvious way.

If I keep a fairly current and active generic profile, then LI is useful, and
no one needs to know whether or not I'm looking.

------
hooloovoo_zoo
What if LinkedIn adds a visibility option in addition to public/private
profile that says "I want LinkedIn to prevent robots from scraping my
profile."? What if LinkedIn enables that mode by default? Can they then
continue preventing scrapers?

~~~
alt_f4
I think they can, but they won't because robots includes search engines and
blacklisting search engines from user profiles will very negatively impact
their metrics.

~~~
perl4ever
You're implying they can't discriminate. So does this case make robots.txt
illegal?

~~~
alt_f4
First off, robots.txt is optional. It's neither a technical nor a legal
limitation at this point.

Second, OP's argument suggests a UI option that gives or removes user consent
from all robots in general. Unless they plan to word it: "allow robots that we
like that are good for us. but disallow other robots", I don't think it's okay
to discriminate by either allowing/banning a particular robot, as that is not
what the user agreed to.
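
To make the "robots.txt is optional" point concrete, here is a rough sketch using Python's standard `urllib.robotparser`. The robots.txt content, the bot names, and the example.com URL are all hypothetical. Note that the check only happens because the crawler chooses to run it, and that the file itself can already say "allow robots we like, disallow the rest":

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is welcome, everyone else is not.
ROBOTS_TXT = """\
User-agent: FriendlySearchBot
Disallow:

User-agent: *
Disallow: /in/
"""

parser = RobotFileParser()
# parse() takes the file's lines directly, so no network request is made here.
parser.parse(ROBOTS_TXT.splitlines())

profile_url = "https://example.com/in/some-profile"

# A well-behaved crawler asks before fetching; nothing forces it to.
search_bot_allowed = parser.can_fetch("FriendlySearchBot", profile_url)
other_bot_allowed = parser.can_fetch("SomeOtherBot", profile_url)
```

With the rules above, the named search bot is allowed and any other user agent is refused. A scraper that never consults the file is not stopped by it at all.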

------
myth_buster
Detailed discussion from September when the decision was made.

[https://news.ycombinator.com/item?id=20920753](https://news.ycombinator.com/item?id=20920753)

------
conjectures
IP aside, anyone else concerned about the business of HiQ?

I presume what they are doing is:

* Scrape profiles.

* Calculate time delta in jobs.

* 'Predict' churn rate for (prospective) employee.

With respect to prospective employees in particular this seems likely to
entail lots of risks. Average job time delta is going to be a massively
overdetermined variable, and noisy wrt 'next job delta'. I'm worried how
they're going to sell that to employers.
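
For what it's worth, the presumed pipeline above boils down to something like the following Python sketch. This is purely a guess at the naive approach, not hiQ's actual method: the function names and the sample profile are invented, and it assumes each job ends on the day the next one starts.

```python
from __future__ import annotations

from datetime import date
from statistics import mean


def tenures_in_days(start_dates: list[date], as_of: date) -> list[int]:
    """Length of each job in days; the current job is measured up to `as_of`."""
    ordered = sorted(start_dates)
    ends = ordered[1:] + [as_of]
    return [(end - start).days for start, end in zip(ordered, ends)]


def naive_expected_tenure(start_dates: list[date], as_of: date) -> float:
    """Mean of completed tenures only; the ongoing job is excluded."""
    completed = tenures_in_days(start_dates, as_of)[:-1]
    if not completed:
        raise ValueError("need at least one completed job")
    return mean(completed)


# Hypothetical profile: three jobs, the last one still ongoing.
starts = [date(2012, 1, 1), date(2014, 1, 1), date(2017, 1, 1)]
today = date(2019, 10, 1)

tenures = tenures_in_days(starts, today)         # [731, 1096, 1003]
expected = naive_expected_tenure(starts, today)  # 913.5
```

Here the current job (1003 days) has already outlasted the naive average (913.5 days), which is presumably the kind of signal that gets flagged. With only two completed jobs behind that average, it also shows how noisy the estimate is.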

------
mminer237
For anyone interested more in the law in the case without reading all 30+
pages of the opinion yourself, I wrote a brief for it last month when this
ruling was made:
[https://matthewminer.name/law/briefs/Miscellaneous/hiQ+Labs+...](https://matthewminer.name/law/briefs/Miscellaneous/hiQ+Labs+v.+LinkedIn+Corp).

------
spider-mario
> “And as to the publicly available profiles, the users quite evidently intend
> them to be accessed by others”

How is it evident that the users intend them to be accessed by scrapers and
not just humans? Since the ToS forbid scraping, it seems very reasonable to me
to imagine users making their profiles public _because of that assumption that
scraping is not tolerated_.

------
alkonaut
What is the limit for what is "user provided"? My entire Facebook profile,
including my social graph, is "user provided".

Does this mean that it would likely be possible for a competing network to
have a "click here to import your friend list" for example?

------
brushfoot
This is great news. The data is public; it shouldn't matter whether you hire
humans to parse it or develop a bot. LinkedIn was trying to have its cake and
eat it too.

------
Causality1
Would it really be that difficult for LinkedIn to require users to be logged
in before viewing profiles and include anti-automation rules in the EULA?

------
donohoe
In case it's not clear, this is from September.

------
mherdeg
Hmm, how does this compare versus the Craigslist/3Taps/Radpad litigation? Are
these similar issues?

------
EGreg
It sounded like this was going to be an opinion piece about how LinkedIn is
losing its appeal to users.

------
atombender
Anyone versed in U.S. law who can comment on whether the judgement in this
case sets a precedent?

~~~
gnicholas
Yes, in the 9th Circuit (western US) this is binding precedent. Elsewhere it
can be cited but is not binding.

------
Barrin92
As expected a lot of people here talking about public data and whatnot, but
that is a horrible decision.

_"Circuit Judge Marsha Berzon said hiQ, which makes software to help
employers determine whether employees will stay or quit, showed it faced
irreparable harm absent an injunction because it might go out of business
without access.[...]

“LinkedIn has no protected property interest in the data contributed by its
users, as the users retain ownership over their profiles,” Berzon wrote. “And
as to the publicly available profiles, the users quite evidently intend them
to be accessed by others,” including prospective employers."_

This isn't some sort of empowerment of the public, it's surveillance
capitalism. No end-user in their right mind publishes data on LinkedIn with
the expectation that the information is bought up by a third party, analysed,
and then sold back to your employer in a way that exposes your personal intent
and may even threaten your job. The only thing this accomplishes is enabling
shady business models that feed off a sort of internet voyeurism, and at the
end of the day it'll lead to people turning their profiles private and making
LinkedIn more difficult to use if you're someone who is looking for
information in good faith.

~~~
jakeogh
Your argument is to let corporations effectively make law.

~~~
perl4ever
Corporations do effectively make law, at least in the US. Politicians have
neither the time nor the expertise. There have been some widely read articles
about how sometimes that law is not even freely available to the public.

~~~
jakeogh
Sounds like we agree that's a bad thing. To your first sentence, no, they
propose laws. They don't get to revise their ToS and have it be a violation of
the law when you ignore it. That would be like the EPA making a rule, because
they were granted that power by congress.

~~~
perl4ever
My impression is that effectively, they do. Officially, the laws have to be
approved, just like you have to click "ok" when you see a user agreement, but
it doesn't mean you have any effective control. Control on paper doesn't mean
control in reality, just like accounting is different from economics.

It's possible the current system is "the worst, except for all the others". I
don't think you can do without expertise in making policy, but you also can't
do without good faith/intent, so I don't know how you can resolve that.

~~~
jakeogh
If it was really that easy then we would have SOPA, ACTA, CISPA, PIPA, TPP and
about 20 other bad ideas put on paper.

All of those failed, if they had not, MegaCorp would have significantly more
power. Heck TPP had ways for a foreign corporation to sue a local government
if they didn't like them banning fracking (for example).

Same thing with NN, if it was named honestly it would be called "The More
Government Regulation of The Internet Act of 2019". Next up will be some sad
attempt at a US GDPR, but fortunately our beautiful 1st Amendment throws a
wrench in that, it's effectively the gov telling people (corps are made of
people:) what they can and can not remember.

But in general, I agree with your sentiment, and if it's more than half a page
long (written in crayon) it shouldn't even be considered.

------
onetimemanytime
>> _that required LinkedIn, a Microsoft Corp unit with more than 645 million
members, to give hiQ Labs Inc access to publicly available member profiles._

Not sure this is a win for the web. Sure it's user submitted but the users
agreed that LinkedIn owns that after they submit.

------
rgross1
Are there any useful bots for scraping LI profiles out there?

------
buboard
OK, how is that going to work for Facebook?

------
NKosmatos
This whole situation with public data, personal information, data scraping,
GDPR and us putting our own info on various sites displaying them publicly and
then complaining if someone collects them and uses them, has gotten out of
hand :-( I think I’ll have to side with hiQ on this.

------
pkilgore
> September 9, 2019 / 1:34 PM / a month ago

