LinkedIn loses appeal over access to user profiles (reuters.com)
574 points by isalmon on Oct 13, 2019 | 163 comments

The summary here is that LinkedIn tried to argue that it could prevent scraping of public LinkedIn profile data under their ToS, but the courts have ruled that if data is public and provided by users, it can be scraped/crawled, that is, it isn’t LinkedIn property. This is generally a positive outcome for people/companies turning web text and HTML into structured data, e.g. tools like Puppeteer and Scrapy can be used more freely on sites like LinkedIn, Twitter, and Reddit. Now, you might still get into trouble if you re-publish that data, but you can, at least, safely use the data “internally”, and the act of scraping/crawling (politely) is not, per se, something unlawful.

Not sure "isn't LinkedIn property" is accurate here. They still retain ownership and control of redistribution just like any other IP. This is more of a philosophical question about whether "viewing" itself is a violation of their ownership rights and really about the definitions of "viewing" and "public" in the context of the internet.

Seems like they've simply determined that viewing any freely accessible URL is "public" and that "viewing" does include scraping. This seems like a very reasonable determination as it maps pretty neatly to how we think about viewing public content IRL where I am free to drive down the road (for profit or pleasure) and record publicly viewable signage and activities and use that data any way I see fit.

> Not sure "isn't LinkedIn property" is accurate here.

It is very accurate. Users retain the copyright on their works insofar as their works are able to be copyrighted. Anything that is a "mere fact", and can't be copyrighted, is also not LinkedIn's property.

From LinkedIn's terms of service[1]:

> you are only granting LinkedIn and our affiliates the following non-exclusive license:

> A worldwide, transferable and sublicensable right to use, copy, modify, distribute, publish, and process, information and content that you provide through our Services and the services of others, without any further consent, notice and/or compensation to you or others.


1. https://www.linkedin.com/legal/user-agreement#rights

If I enter my employment record and my profile pic, birthdate, etc., I don't think that is the IP of LinkedIn. Maybe the way they display it, or any way they transform it, could be considered IP. But if someone scrapes all that user-entered data and then displays it somewhere else in a different format, I can't imagine LinkedIn being able to claim their IP has been infringed.

I think all of this should be the user's choice, since every company should put the user at the center of these decisions. If I want my data to be shared in any way, I can simply tick a box and allow that. If I don't, then keep it just for me and the people I chose to share it with on that platform.

It should also be made clear to the users if that data is being used as payment for the services provided by mentioning explicitly and in a detailed way where that data goes.

I think (hope?) that's what this decision did. LinkedIn must allow scraping publicly available data, but not private data that a third party wouldn't have access to normally.

Maybe it's more accurate to say "any publicly linked URL"? IIRC, charges have been successfully brought against people for e.g. iterating through user identifiers in URLs to gain access to other users' data. (Do correct me if I'm wrong on that count!)

Andrew Auernheimer, more commonly known as weev, got all of AT&T's iPad users' email addresses at that time by enumerating all the possible SIM-card IDs against a public-facing AT&T website. He was charged and convicted under the Computer Fraud and Abuse Act (CFAA), and sentenced to 41 months in federal prison for that. His sentence was vacated after 13 months due to a technicality of the venue; that judge did not address the substantive question on the legality of the site access.

Weev may be an odious person, but everyone has rights in a court of law, even white supremacists.

> His sentence was vacated after 13 months due to a technicality of the venue; that judge did not address the substantive question on the legality of the site access

So the way the American legal system works is:

  if (venue == correct && facts == bad) {
    // convicted
  } else {
    // not convicted (wrong venue: facts never evaluated)
  }

If the venue is not correct, the facts of the case are not evaluated. If you go read some lawsuits, you'll see that the first page or two is an argument about why the judge reading it is the correct judge to read it.
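
To make the analogy concrete, here is a minimal Python sketch of the short-circuit behavior that the if statement leans on. The function names and the `case` dictionary are made up for illustration; this is a toy model of the analogy, not a description of actual court procedure.

```python
# Toy model only: both check functions are hypothetical stand-ins.
evaluated = []  # records which checks actually ran


def venue_is_correct(case):
    evaluated.append("venue")
    return case["venue_ok"]


def facts_are_bad(case):
    evaluated.append("facts")
    return case["facts_bad"]


def outcome(case):
    # `and` short-circuits: facts_are_bad() runs only if the venue check passes.
    if venue_is_correct(case) and facts_are_bad(case):
        return "convicted"
    return "not convicted"


result = outcome({"venue_ok": False, "facts_bad": True})
print(result, evaluated)
```

With a wrong venue, only the venue check is recorded, which is the "facts are not evaluated" part of the analogy.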

Generally, that is the way it works, but it is foolish to try and understand the legal system like it's software. If the venue is incorrect, the judge may more or less tell them to get lost. That's not the same as "not guilty". A lot of rules are adhered to to make sure that courts don't get gummed up with meaningless cases and to make sure that judges with the appropriate authority handle the appropriate cases.

You are right; I wanted to give a general idea. And if you've ever written software for Itanium, you'd know that relying on evaluation rules in an if statement is a dangerous thing to do!

> If the venue is not correct, the facts of the case are not evaluated.

More precisely, the facts of the case are not evaluated by that court. Usually the case will be transferred to a different venue (i.e., federal court in a different district) or dismissed and refiled in a different forum (e.g., state court instead of federal court).

In Mr. Auernheimer's case, had he been successful in his improper venue motion, he probably would have faced prosecution in either his home district or the district where the AT&T servers were located. The result of that trial might have been the same, but there wouldn't have been a vacatur.

Some kid was charged for that but in my opinion it was stupid. URL to me means part of the UX. If you search on Google using a query parameter directly instead of entering the query in their search box, should that count as wrongful use?

Stupid or not, that's a matter for the lawmakers. What I'm saying is that, as far as I know, a ruling that any publicly accessible URL is fair game would contradict previous rulings.

Now, this is based on my very patchy memory of sensationalist reporting of legal matters in a jurisdiction I don't reside in, so there's probably some wiggle room there ;)

No, it should not. But what if you try some SQL injection to do something nasty?

The modern legal system distinguishes between result and intent.

If I guess your password in the password form input, should that count as wrongful use?

If I rifle through your personal papers because your door was open, should that count as wrongful use?

I think that's fine, but I also think the end-user should decide. With Google (edit: I meant Facebook) I'm able to determine whether or not I want to show up in search results. This shouldn't be an absolute is or isn't public situation.

LinkedIn already allows discrete control over your profile's public visibility, along with the ability to micro-manage some of it. The URL you're looking for: https://www.linkedin.com/public-profile/settings

You can decide to not use LinkedIn, and use a service that does not make profiles public.

"If your apple looks a little banged up, eat an orange"

Even better, the decision here is only concerning profiles of people who have elected to make that profile public. It's very simple to make your LinkedIn profile private.

The concern I had is that the court forces LinkedIn profiles public regardless of user settings. Courts sometimes go a little further. I'm sure LinkedIn will do their best to not allow private profiles to be crawled.

The challenge for LinkedIn is that they still want Google to crawl them.

This is about the copyright on the items that people post, i.e. creative works, right? But what if LinkedIn collects facts (where you work, your age, etc.), wouldn't that be covered by sui generis property right (better known as database copyright)?

Does this judgement say anything about that, i.e. whether it matters that users contributed the facts in their collection (so I'm not talking about posts, descriptions, etc.) rather than that they collected it themselves and therefore get a form of property right?

Edit: wait, database copyright is not a thing in the USA. Of course they wouldn't say anything about that.


> But what if LinkedIn collects facts (where you work, your age, etc.), wouldn't that be covered by sui generis property right (better known as database copyright)?

I don't think so.

> Under the Copyright Act, a compilation is defined as a "collection and assembling of preexisting materials or of data that are selected in such a way that the resulting work as a whole constitutes an original work of authorship." 17. U.S.C. § 101 [1]

The thing is, LinkedIn is not authoring the compilation. The individual users are.


1. https://www.bitlaw.com/copyright/database.html

LinkedIn may be the author of the compilation because they curate the database by removing fake profiles and encouraging users to complete their profiles. Also, the graph of connections between profiles may constitute a non-trivial organization method which takes the database out of the trivially-organized databases which were held uncopyrightable in the past. (e.g., Feist v. Rural[0])

In any case, this decision was mostly about upholding the lower court's granting of an order preventing LinkedIn from blocking hiQ's scrapers for the duration of the lawsuit. HiQ could still lose on the copyright questions or other issues.

[0] https://en.wikipedia.org/wiki/Feist_Publications,_Inc.,_v._R....

My understanding is that the contract (TOS) portion is not decided. This decision stated that LinkedIn does not have a protected property interest in the profiles, so it cannot claim copyright there. It's possible they could claim things like compilation copyright; that is as yet undecided. Also, the appeals court only dealt with the CFAA issue, I believe; there's still the contract (TOS) to consider, as well as a possible trespass claim.

Now, the CFAA was the only criminal statute involved, so I guess that supports what you said, that scraping is not unlawful. There still may be liability though, and using the data only internally would not necessarily protect from that. It remains to be seen.

"it can be scraped/crawled, that is, it isn’t LinkedIn property"

I thought it was pretty established that putting something on a website didn't eliminate your copyright. Has that changed now?

To me, it seems like common sense would be that if you make a public website, you are implicitly permitting some copies, but surely it's not all or nothing?

Facts and tables are not copyrightable. The phone numbers in a phone book are not copyrightable, merely their presentation order[0]. If you were to copy, say, the LinkedIn website, or the LinkedIn branding, or the name LinkedIn, or any of their ads, those would be eligible, but the simple collection of names, emails, and phone numbers is ineligible for copyright.

0: https://en.wikipedia.org/wiki/Feist_Publications,_Inc.,_v._R....

This depends on jurisdiction though. In the European Union specifically there exists sui generis legislation that grants certain rights to the assembler of a database [1]. However, it’s a more interesting situation when the database keeper just provides a means for individuals to fill in their own data.

[1] https://en.wikipedia.org/wiki/Database_right

> I thought it was pretty established that putting something on a website didn't eliminate your copyright. Has that changed now?

No, if anything, that supports the decision.

To the extent that the material is copyrightable, it belongs to the users, who have chosen to make it public; copying incidental and necessary to that access is allowed under an implied license doctrine. Microsoft's efforts to restrict access had nothing to do with copyright, only with the ToS.

Perhaps it depends on intent. Clearly, the creators of the content, and those who posted the content, did so for the sole intention of making it public and usable outside the LinkedIn system. Their posting of it on LinkedIn is incidental; what site is used or who owns it is largely irrelevant to them, whereas such things clearly do matter to any company or person creating and posting their own unique content to their own site.

At one point, to fight scraping, Craigslist changed their terms so that users assigned them copyright of listings rather than just a license. It didn't work well for them, but it's an interesting approach.


My understanding is that Facebook uses similar clauses to disallow web scraping. Does that mean Facebook is fair game too?

I'm pretty sure you would get a big GDPR fine if you start taking data people agreed to put on LinkedIn without their express permission.

This is fantastic. I would like to see wider legislation making scraping of IMDB, Genius, Reddit, Facebook, and Google legal. These services receive free input from users. The data should remain free.

Edit (sort of off topic): There's still value in the building and providing services at scale, but this lowers the barrier to cross the moat for small players. The first step is data liberation. Then we can work to bring down the other cost barriers. It's a lot easier to build services that scale in 2019 than it was in 2005.

The semantic web was misguided in 200X, but we might want to take another swing at it in the future.

If you add .json to the end of a Reddit URL, it will return JSON data. For example: https://www.reddit.com/r/ubuntu.json . It also works with comment threads and posts.
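
For anyone who wants to try it, here is a rough Python sketch of the `.json` trick using only the standard library. The helper names and the user agent string are my own placeholders, and whether the endpoint still answers unauthenticated requests (or how hard it rate-limits them) is an assumption, so treat `fetch_listing` as untested against the live site.

```python
import json
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def to_json_url(url):
    """Append .json to the path of a Reddit URL, keeping any query string."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    if not path.endswith(".json"):
        path += ".json"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def fetch_listing(url, user_agent="example-script/0.1 (hypothetical)"):
    """Fetch and decode the JSON version of a page (makes a network call)."""
    request = urllib.request.Request(to_json_url(url), headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


print(to_json_url("https://www.reddit.com/r/ubuntu/"))
```

Only the URL rewriting runs here; `fetch_listing` is defined but not called, so nothing hits the network until you call it yourself.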

Wonderful feature also used by Trello https://trello.com/b/rq2mYJNn/public-trello-boards.json

Now that is an ergonomic API.

xml and rss seem to be the same exact output

I have adopted this in other projects and added the functionality there as well; it is a brilliant idea.

Yeah no need to scrape Reddit, their content is accessible via their API.

PRAW is also a great Python Reddit "scraper" that allows you to pull data via their API very easily.

Another side of this is that the entity doing the scraping is more often than not another company. Which means that if your proposal is implemented, a user can voluntarily give their personal data to Google/Reddit/Facebook etc but that company then has to make the user's personal data available to another company.

It's not quite like that. The first company cannot prevent scraping by individuals or another company of information that it already shows to everyone. Which, to me, is a good thing. My 2c.

Eh. I want my picture and name uploaded to LinkedIn, since it's a professional network and people use it to find me for good reasons. It may seem dumb, however not having a LinkedIn with a good picture can genuinely hurt your career.

I do NOT want my picture run through facial recognition software, or my name/email sold to marketers who will add it to a drip campaign.

Then don't make the data public. You can't have the cake and eat it too. Scraping is irrelevant here - a human can just as well take your picture from your LinkedIn page and include it in their face-recognition DB.

No they can't, not legally.

How's that? Obviously they can't keep the photo. But I don't see what would stop them from "viewing" the publicly available photo and saving markers that let them recognize the face again. After all, that's what any person does when they look at a photo.

The huge difference is having a human do it at scale is cost prohibitive.

PIIs and biometrics are special. So if I upload my photo to LinkedIn, I want it to be available when viewing my LinkedIn profile, but I expect that any other entity that scrapes it off LinkedIn can't process it without my explicit consent (thanks to GDPR). Similarly with other data that's about me, a person.

But all other data, I'd argue, should be fair play. If an e-commerce site publishes a list of products and prices, I believe it's desirable for other parties to be able to scrape it and process it, e.g. for offering a price comparison service.

Exactly. I don't like or "enjoy" LinkedIn but I do find it useful professionally.

Now it sounds like this ruling implies that by creating a profile on one platform, I have to accept that every company that comes along can include me in their corpus.

Maybe I should be able to set a pass-through GDPR flag on my profile such that third parties (subject to that regulation) will have to exclude me from their datasets.

So Google would need to allow scraping of search results? That would be a huge change, they currently prevent that pretty aggressively.

This is a problem because you're talking personal data, not because of scraping. Personal / personally identifiable data is special and special protections apply to it. But regular data would fare just fine under GP's proposal.

It only applies to data displayed publicly, though. If Facebook and such started requiring logins to see personal data would that be such a bad thing?

I'm not certain, but it kind of sounds like even things behind a login are still scrapable. Assuming the general public can get a login easily anyway. Basically, just requiring an account is not enough to forbid scraping.

When you talk about data here, these are people. This HiQ software is actually a bit scary. What if it gives a false signal which ends in an employee's termination? Data on people should not be freely attainable; the person should give explicit access. If I don’t want HiQ processing my information (I don’t) then they shouldn’t be able to. Especially now with some employers requiring a LinkedIn profile.

Reddit has a decent API

The golden rule is to use the API before you start raw scraping.

for IMDb, they have a lot of data that is easily accessible, not sure what is missing though: https://datasets.imdbws.com/...

Only for personal and non-commercial use, which is probably not what startups need.

process it on your personal computer and use the output in your startup

That intermediate step doesn't get around the license.

it depends

Why would they have to make it available to startups in an easily accessible manner?

IMDB was originally crowdsourced, wasn't it?

So, have the data available. But I see no reason why they should have to go out of their way to change their site to make it easy for someone to get it.

It started as lists and shell scripts to query them in the newsgroup rec.arts.movies.

It's already legal. Adding law adds restrictions.

You're not wrong. In the general case though, adding law can mandate that a certain already occurring activity must be done.

What gives you a rightful claim to information that I gave to someone else, if neither I nor they consent?

You did consent. This is about information you gave to LinkedIn and told them to give to the general public.

I'm torn. On the one hand, scraping helps break down walled gardens. On the other, we're talking about personal details being used in novel ways that no LinkedIn user probably understands. I doubt any LinkedIn user writes their profile expecting HiQ to scrape it, assign a "flight risk" score and alert your bosses.

User privacy shouldn't be dependent on draconian anti-scraping laws.

Besides, LinkedIn is already sharing every last bit of their users' information with the highest bidder.

To me this is conceptually the same problem as DRM - with your position similar to those trying to build DRM systems.

One can’t both hand over data freely to a service (in this case LinkedIn) and also subsequently prevent all sharing of that data. Or to put it another way, you can’t both put your information on a public billboard hoping a recruiter sees it to offer you a job AND keep it private from strangers you hope won’t misuse it.

The users agreed to publish their details publicly on LinkedIn. It’s normal that anyone can access those details and use them however they like.

There is a broader ethical discussion of how to treat data, nominally public, that is increasingly collected, persisted and analyzed indefinitely by adversarial agents. It seems clear to me that a more nuanced categorization is required. This data is not public in the same sense as an uttered word was in a town square a hundred years ago.

Imagine being denied job opportunities because some company has analyzed the careers of the last 25 generations of your ancestors and deemed your lineage to be inadequate?

You may mean for your general LinkedIn profile info to be public, but not for the old profile data, or the metadata of when you changed what, to be public. It's not per se secret or hidden, but it's also not intended to be kept and processed.

This is also a clear case where GDPR would come in. This is personal data, whether intentional or not, and the scraper is obliged to conform to EU laws if they scrape data on EU citizens - including eg information rights and deletion.

I don't agree to republish them. Just because they are publicly accessible on SiteA doesn't mean I have agreed to have them be published on SiteB does it?

That isn't what's really being ruled on here, though. Republishing the data is still restricted via copyright, but the data may exist in an internal database that SiteB uses to do X.

The question at hand is whether or not SiteB (or more appropriately CompanyB) is able to automatically download the content on SiteA or have to make an intern manually copy and paste the data into a spreadsheet.

There are privacy settings though, and recruiters used to be able to see more than other non-contacts. I'm not sure if that is still the case, but what is being shared is not always exactly obvious to the user.

What if there could be a robots.txt kind of setting that users could use to prevent being scraped?

Pointless unless you want a one world government to enforce a law monoculture.

I'd personally call HiQ's business model bottom feeding.

However restricting access to public information on the internet will benefit only the established titans. So this ruling is great news.

Users know their information is public and they have the option to make it private on LinkedIn. If LinkedIn is worried about the privacy of their users they should let them know about the risks of having a public profile.

The consequences of information leakage can be zero for an indefinite amount of time before something surfaces with catastrophic effect. Everybody knows this, because everybody sees it happen constantly, even though it is relatively unlikely it will happen to you, today.

Human beings are hardwired to do things that they see others doing unless there is a really clear connection between the actions and disaster. There has to be another mechanism to deal with probably small, but unquantifiable risks.

Same here. On one hand, this lessens the monopoly power of large tech companies, on the other hand, it gives users less control over their data.

IMO if you set up a profile on LinkedIn there's a pretty clear expectation that your bosses will be able to see it.

Doing so in Europe is a clear GDPR violation.

I think that's a reasonable balance - you can scrape data, but not personal data without consent of the scraped person.

I'd say it's not only reasonable, it's also "carving nature at its joints". The reason scraping personal data is problematic is because it's personal data, not because it's being scraped - so protection should be applied from the direction of personal data regulations.

I recently learned from a recruiter that one license for one recruiter for LinkedIn is $10k a year, so that is what they are protecting.

I’m a very active user of LinkedIn, effectively cultivating my “professional brand” on it. I’ve been contracting for years and use my network to find gigs. While I don’t have an issue with the business that HiQ are in (informing businesses of employee flight risk), I do believe there’s a qualitative difference between data that I publish for consumption by human eyeballs for free (a use of my data that I’ve authorised), and someone harvesting such data en masse for commercial purposes that I have not authorised. HiQ have not asked for my permission to use my data, and they have not made any commitments about how they will use and not use my data. Given that they have access to my contact details (even via LI itself), they are capable of contacting me to request permission to use my data.

The difference isn't as clear as you are making it out to be.

If you have a public LinkedIn profile, should an employer be able to look at it without your explicit consent and reach out to you for job opportunities (or disqualify you from one)?

Should the employer be able to pay someone else (say a recruiting agency) to look at LinkedIn profiles on their behalf?

Should the recruiting agency be able to use automated tools (which scrape public profiles) that make things easier for them?

What HiQ did was scrape public data, so if you have your LI profile set to public, then anyone can access it and do what they will with it, just like if you posted a printout of it on a bulletin board in a mall. It's in the open and is fair game for whatever. You can make your entire profile or just aspects of it private, meaning people need to log in to LI to see your stuff, which then protects you under the TOS.

I think profiles were default public so you could be found on Google and for SEO purposes for both you and LI.

You'd be hard pressed to find a public profile accessible anymore on LI anyway, even with public settings, you'll hit an authwall 9 out of 10 times.

I understand what HiQ have done. I'm saying I believe there's a material difference between public data for consumption by individual human beings, and systematic commercial harvesting. I appreciate that in the US, there may be no legal distinction between types of consumption of public data. Public data is public. However, I'm arguing that any commercial use of my data beyond fair use should require my permission and an explanation of how my data will be stored and treated, so that I can be assured that my rights (over further unauthorised use) are preserved.

EDIT: It occurs to me that HiQ's success over LinkedIn does not necessarily imply they would be successful against actual LI users in a GDPR-like jurisdiction. Also, what if LI turned around and allowed each user to specify a style of CC license under which their specific data is published (by LI on behalf of the user). If I specified a non-commercial license variant, would that disallow HiQ's actions (without seeking permission)?

I can't say I know much about licensing and/or GDPR stuff, all I know is that if it's public, I don't have to agree to anything and I can do whatever I want, which is great for me and my business. From the other side, yes it sucks that people can take my stuff and profit from me and there is nothing I can do about it and no way to enforce it and I probably don't even know it's happening. (sounds like ad tracking!)

The way things work in North America at least to my understanding is that it doesn't matter what license you use, I don't have to agree to it to scrape it and use it if there is no click wrapper. I guess if you caught me explicitly using it in a certain way, I could get in trouble, but that is not easy. What you propose sounds reasonable, but I don't know how it would be enforced or if it would still stop people. I'm owed 30k in consulting wages and I can't even make it worth my while to pursue that from a legal standpoint, let alone try and sue some unknown and/or potentially massive company or scattered random ghosts across the interwebs.

LinkedIn has played a very poor strategy here. The value of the service should be in the network, which is quite defensible. Instead, they’ve made the value in the profiles, which is not defensible. Few people curate their network on LinkedIn because you can't see profiles unless you are closely connected, so you are incentivized to add as many people as possible, thus devaluing the entire network. Then they go and sell unlimited access to profiles to recruiters and sales people. Thus, when other services come around and scrape their data, which LinkedIn needs to make somewhat publicly available for SEO juice, it becomes an existential threat.

If you look at Facebook, there is some limited profile data publicly available, but they will go to the wall to prevent people from seeing how those people are connected. In addition, they started from a very walled-off position, so they didn't become reliant on SEO traffic.


This seems to mean LinkedIn can't sue to prevent scraping.

I assume it's still legal for them to implement technological anti-scraping measures? So the two companies can play cat-and-mouse if they wish with rate-limiting, IP addresses, etc...
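
For readers unfamiliar with what IP-based rate-limiting looks like on the site's side, here is a minimal Python sketch of a sliding-window limiter. The class name, the limits, and the example IP address are all made up for illustration; this is not how LinkedIn or anyone in particular does it, and real deployments usually sit in a proxy or CDN rather than application code.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # client id -> timestamps of recent requests

    def allow(self, client, now):
        recent = self.hits[client]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True


# Hypothetical policy: 3 requests per 60 seconds per IP address.
limiter = SlidingWindowLimiter(limit=3, window=60)
decisions = [limiter.allow("203.0.113.7", t) for t in (0, 1, 2, 3, 61)]
print(decisions)
```

The fourth request inside the window is refused, and the client is let back in once the oldest requests age out. A scraper that rotates IP addresses gets a fresh counter each time, which is where the cat-and-mouse comes from.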

An earlier ruling actually ordered LinkedIn to stop attempting to block the scraping using technological measures, too.

I believe that LinkedIn were enjoined from using measures to limit HiQ specifically from scraping their site, not from general authorization-based measures that might have the corollary effect of limiting access by HiQ. The idea is that LinkedIn can't make the data public and freely accessible and then turn around and say that the data isn't public if you're a potential competitor who is using an automated tool to access it in bulk.

That sounds bizarre. Why would they order them to do that? What if they're trying to block spammers or something? What about pages that users want public but not indexable, e.g. Dropbox shared links?

Has robots.txt been outlawed now? In what jurisdiction exactly?

I don't believe that robots.txt has ever had the backing of law.

Robots.txt doesn't restrict anything. It's just a request of what should and shouldn't be scraped by search engine spiders.
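
To illustrate that it is purely advisory: a well-behaved crawler has to choose to consult the file before fetching. Below is a small Python sketch using the standard library's `urllib.robotparser`. The robots.txt content, the `example.com` URLs, and the `example-bot` agent name are made up for the example and are not any real site's rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, not any real site's file.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(EXAMPLE_ROBOTS_TXT.splitlines())


def may_fetch(url, user_agent="example-bot"):
    """Return True if the parsed robots.txt does not disallow this URL for the agent."""
    return parser.can_fetch(user_agent, url)


print(may_fetch("https://example.com/profile/alice"))
print(may_fetch("https://example.com/private/notes"))
```

Nothing stops a crawler from skipping the `may_fetch` check entirely, which is the point being made above.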

Not too hard to surpass those with things like residential proxies, randomised user agents, headless browsers, etc. Bring on the anti scraping measures...

> This seems to mean LinkedIn can't sue to prevent scraping.

Zillow and similar companies have shut down numerous startups which relied on scraping their data.

How is this different?

LinkedIn data is provided freely by its users. MLS, on the other hand (which Zillow and all other such sites/agents get their data from), is a private database.

This blog post is an excellent summary, and covers what was actually decided and what is still unknown: https://blog.ericgoldman.org/archives/2019/09/ninth-circuit-...

What cracks me up about this is how these massive companies go to such lengths to call themselves mere platforms in order to avoid liability for content, and then when someone actually takes the content in this case they cry, "Foul! That's ours!" Can't have it both ways.

LinkedIn tried to argue that if they put data behind a login wall, then it no longer falls under the wide umbrella of "public data" and so it's "theirs". Previous cases already established that if a crawler can see the data without any session cookies then it's okay. This ruling extended that to any data that can reasonably be accessed by any member of the public.

There will probably be more cases like this that test the upper bound of what "public data" means: at what point does publicly aggregated data stop being public data? And do attempts that companies make to prevent that data from being captured (IP limiting, captchas, login walls) count as immoral/illegal, since they are restricting the public from accessing a public good?

> Previous cases already established that if a crawler can see the data without any session cookies then its okay.

I'm interested in this, but I'm not sure how to learn more - can you give me a hint?

Do you mean crawling without cookies or the legal case?

What's scarier is when they editorialize their platforms (also read censorship), therefore becoming content producers themselves. Today it's whoever you disagree with being censored, tomorrow it's your own voice.

Who's responsible for privacy then? That's another situation where you can't have it both ways - can't tell the platform they don't own the data and simultaneously hold them to GDPR.

I do not like the hide-and-seek game around the "who viewed your profile" functionality: upgrade to a paid subscription to see who viewed, upgrade to another tier to hide that you looked at someone.

It looks like a lack of imagination or business prowess to come up with more advanced, valuable, and less annoying ways to monetise. If only they could make it easier to connect people with matching mutual interests, in a way more flexible than a plain traditional job board and a database of CVs.

You don’t have to pay to hide that you viewed someone’s profile. Maybe if you want to see who viewed yours, but also keep your browsing private — but it seems more reasonable to charge for that sort of functionality.

FYI, this article is from a month ago and this general story was discussed here at the time (linking the official announcement):


After finding this https://github.com/Greenwolf/social_mapper, I strongly recommend against having a profile photo on LinkedIn. It has caused me to be far more careful about my presence on the internet.

In the post privacy age I don't want my personal opinions to come back and haunt me. I grow as a person but the internet remembers all. If I make a dumb mistake and it's published online that's not a problem for me in 10 years if that fades away. But people are collecting and correlating info now. I don't like it one bit. It means someone you've never met, in a country you've never been to could extort you. It's getting very scary.

You can make it so your picture on LinkedIn is only viewable by people who are connected with you. I do agree that people should be cautious about what they post/share online however.

I think what most people also don't realize is that LinkedIn's current model makes it difficult to view someone's profile without them knowing: if they pay for it, they have the option on their account to see who is looking at their profile. As such, the user wanting to look at a person's profile has no privacy about having done so. There could be many reasons someone looks at someone else's profile (even just some kind of curiosity or a mistake), so this to me is an issue in itself.

Sure, there are ways around this (you can make up a fake profile, and some info is public), but normally what I run into is a request to log in to LinkedIn to view something that I am interested in.

There is a setting that lets you see other people’s profile without them being notified. You can do it with free accounts. But you also can’t see who viewed your profile.

If you pay, you can keep your viewing private while seeing other people’s profile.

But if they pay, can't they override that? Or are you saying that even a paid account on LinkedIn can't see if you looked at their profile if you (on a free account) have said 'don't allow anyone to see'?

That’s what LinkedIn says. So I hope that’s the case.

Personally, this doesn't bother me too much. I use LinkedIn specifically because it is public. I'm an "open kimono" type of person. Not particularly interested in hiding stuff.

However, the general principle of "Data Scraping as a Business Model" bothers me. This is by no means the only company that does it (I suspect that MS does it with their access to LinkedIn).

There are far more egregious instances, and many of them have ways to get users to voluntarily cede information (can you think of a rather obvious example?).

LinkedIn is a sandwich board. It's meant to be a public showcase. If you want private, I suspect there are much more focused (and probably valuable) venues that cater to particular communities.

> Not particularly interested in hiding stuff.

Well, so the company, HiQ, is basically scraping every time you update your LinkedIn profile, to tell your employer you might be about to leave.

Now maybe that's cool with you. But it seems super sketchy to me, and it's one reason I deleted my LinkedIn account altogether.

You made the correct decision.

It is not "cool" with me. It just means that it isn't a factor for me. I would not have used LI for a job search in any obvious way.

If I keep a fairly current and active generic profile, then LI is useful, and no one needs to know whether or not I'm looking.

What if LinkedIn adds a visibility option in addition to public/private profile that says "I want LinkedIn to prevent robots from scraping my profile."? What if LinkedIn enables that mode by default? Can they then continue preventing scrapers?

I think they can, but they won't because robots includes search engines and blacklisting search engines from user profiles will very negatively impact their metrics.

You're implying they can't discriminate. So does this case make robots.txt illegal?

First off, robots.txt is optional. It's neither a technical nor a legal limitation at this point.

Second, OP's argument suggests a UI option that gives or removes user consent from all robots in general. Unless they plan to word it: "allow robots that we like that are good for us. but disallow other robots", I don't think it's okay to discriminate by either allowing/banning a particular robot, as that is not what the user agreed to.

Detailed discussion from September when the decision was made.


IP aside, anyone else concerned about the business of HiQ?

I presume what they are doing is:

* Scrape profiles.

* Calculate time delta in jobs.

* 'Predict' churn rate for (prospective) employee.

With respect to prospective employees in particular this seems likely to entail lots of risks. Average job time delta is going to be a massively overdetermined variable, and noisy wrt 'next job delta'. I'm worried how they're going to sell that to employers.
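A toy sketch of the presumed steps above, to show how little signal there may be. Everything here is an assumption for illustration: the job history is made up, the function names are hypothetical, and this is a guess at the general idea, not hiQ's actual method.

```python
from datetime import date
from statistics import mean, pstdev

# Made-up job history for one hypothetical scraped profile: (start, end) per job.
jobs = [
    (date(2010, 1, 1), date(2012, 1, 1)),
    (date(2012, 1, 1), date(2013, 1, 1)),
    (date(2013, 1, 1), date(2018, 1, 1)),
]


def tenures_in_days(history):
    """Step 2: time delta, in days, for each job in the history."""
    return [(end - start).days for start, end in history]


def naive_next_tenure(history):
    """Step 3, naively: 'predict' the next tenure as the mean of past ones.

    Also returns the population standard deviation, to show how wide
    the spread around that mean is.
    """
    days = tenures_in_days(history)
    return mean(days), pstdev(days)


avg_days, spread_days = naive_next_tenure(jobs)
print(f"mean tenure: {avg_days} days, spread: {spread_days:.0f} days")
```

With only a handful of jobs per person, the spread is of the same order as the mean itself, which is the noise problem described above.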

For anyone interested more in the law in the case without reading all 30+ pages of the opinion yourself, I wrote a brief for it last month when this ruling was made: https://matthewminer.name/law/briefs/Miscellaneous/hiQ+Labs+....

> “And as to the publicly available profiles, the users quite evidently intend them to be accessed by others”

How is it evident that the users intend them to be accessed by scrapers and not just humans? Since the ToS forbid scraping, it seems very reasonable to me to imagine users making their profiles public because of that assumption that scraping is not tolerated.

What is the limit for what is "user provided"? My entire Facebook profile, including my social graph, is "user provided".

Does this mean that it would likely be possible for a competing network to have a "click here to import your friend list" for example?

This is great news. The data is public; it shouldn't matter whether you hire humans to parse it or develop a bot. LinkedIn was trying to have its cake and eat it too.

Would it really be that difficult for LinkedIn to require users to be logged in before viewing profiles and include anti-automation rules in the EULA?

In case it's not clear, this is from September.

Hmm, how does this compare versus the Craigslist/3Taps/Radpad litigation? Are these similar issues?

It sounded like this was going to be an opinion piece about how LinkedIn is losing its appeal to users.

Anyone versed in U.S. law who can comment on whether the judgement in this case sets a precedent?

Yes, in the 9th Circuit (western US) this is binding precedent. Elsewhere it can be cited but is not binding.

Technically not. This was just a preliminary injunction. The case itself still has to be decided. But assuming this was indicative of how the court will rule, it will then be binding precedent in the Ninth Circuit.

As expected, a lot of people here are talking about public data and whatnot, but this is a horrible decision.

"Circuit Judge Marsha Berzon said hiQ, which makes software to help employers determine whether employees will stay or quit, showed it faced irreparable harm absent an injunction because it might go out of business without access.[...]

“LinkedIn has no protected property interest in the data contributed by its users, as the users retain ownership over their profiles,” Berzon wrote. “And as to the publicly available profiles, the users quite evidently intend them to be accessed by others,” including prospective employers."

This isn't some sort of empowerment of the public, it's surveillance capitalism. No end-user in their right mind publishes data on LinkedIn with the expectation that the information is bought up by a third party, analysed, and then sold back to your employer in a way that exposes your personal intent and may even threaten your job. The only thing this accomplishes is enabling shady business models that feed off a sort of internet voyeurism, and at the end of the day it'll lead to people turning their profiles private and making LinkedIn more difficult to use if you're someone who is looking for information in good faith.

> No end-user in their right mind publishes data on LinkedIn with the expectation that the information is bought up by a third party, analysed, and then sold back to your employer in a way that exposes your personal intent and may even threaten your job.

Yes they do. Do you think people who are afraid of their employer finding out about something would show it on their public LinkedIn profile in the first place? If a manager or colleague who they've likely already "connected" with simply opens your LinkedIn profile in their web browser and sees the same info that hiQ sees, then it's game over. If you don't want your employer to know, don't publish it on your public profile. It's absurd to suggest that some minimal manual effort to load a few profiles is a serious privacy defense.

Your argument is to let corporations effectively make law.

Corporations do effectively make law, at least in the US. Politicians have neither the time nor the expertise. There have been some widely read articles about how sometimes that law is not even freely available to the public.

Sounds like we agree that's a bad thing. To your first sentence, no, they propose laws. They don't get to revise their ToS and have it be a violation of the law when you ignore it. That would be like the EPA making a rule, because they were granted that power by Congress.

My impression is that effectively, they do. Officially, the laws have to be approved, just like you have to click "ok" when you see a user agreement, but it doesn't mean you have any effective control. Control on paper doesn't mean control in reality, just like accounting is different from economics.

It's possible the current system is "the worst, except for all the others". I don't think you can do without expertise in making policy, but you also can't do without good faith/intent, so I don't know how you can resolve that.

If it was really that easy then we would have SOPA, ACTA, CISPA, PIPA, TPP and about 20 other bad ideas put on paper.

All of those failed; if they had not, MegaCorp would have significantly more power. Heck, TPP had ways for a foreign corporation to sue a local government if they didn't like them banning fracking (for example).

Same thing with NN: if it was named honestly it would be called "The More Government Regulation of The Internet Act of 2019". Next up will be some sad attempt at a US GDPR, but fortunately our beautiful 1st Amendment throws a wrench in that; it's effectively the gov telling people (corps are made of people :) what they can and cannot remember.

But in general, I agree with your sentiment, and if it's more than half a page long (written in crayon) it shouldn't even be considered.

How did you get that out of my post? My argument is that people should make laws that end the business model of companies like hiQ, and that LinkedIn, although obviously acting in self-interest, is legitimately defending its platform here against third parties who are trying to use public information in privacy-violating ways.

By letting ToS have the force of law...

What is a contract if not something backed by the force of law?

There's a big difference between "I want a public profile so colleagues and employers can contact me about opportunities" vs "I want a public profile so some third party I have no knowledge of can enable others to discriminate against me".

Try contacting the police if someone breaches your ToS. They will tell you to get a lawyer and not bother them.

You have to sue for breach of contract. It's a civil matter rather than a criminal one if breached.

hiQ didn't sign a contract. It's public info. They broke the ToS, but unless the ToS get inserted into a contract and signed by both parties, we aren't even in civil law territory, which is what this excellent court decision confirmed.

> No end-user in their right mind publishes data on LinkedIn with the expectation that the information is bought up by a third party, analysed, and then sold back to your employer in a way that exposes your personal intent and may even threaten your job.

Doesn't that get covered by laws such as GDPR (where applicable)? Just because I can scrape your profile doesn't mean I can publish it, sell it, etc. (or even keep it). I can do it with your consent, and LinkedIn can't complain, isn't that it?

GDPR requires affirmative and explicit consent to data-sharing. I am not a lawyer so I'm happy to be corrected by someone who has more regulatory knowledge here, but I am reasonably certain this company would not be allowed to operate this way in Europe.

They would continue to ignore the law. I have seen the forced "consent" buttons countless times.

>>that required LinkedIn, a Microsoft Corp unit with more than 645 million members, to give hiQ Labs Inc access to publicly available member profiles.

Not sure this is a win for the web. Sure, it's user-submitted, but the users agreed that LinkedIn owns that after they submit.

Are there any useful bots for scraping LI profiles out there?

OK, how is that going to work for Facebook?

This whole situation with public data, personal information, data scraping, GDPR and us putting our own info on various sites displaying them publicly and then complaining if someone collects them and uses them, has gotten out of hand :-( I think I’ll have to side with hiQ on this.

> September 9, 2019 / 1:34 PM / a month ago
