How LinkedIn detects browser extensions (github.com)
382 points by siddg on Jan 8, 2019 | 102 comments

The repo says "A look at how LinkedIn spies on its users"

I'm not convinced this is LinkedIn spying on users... rather, it's LinkedIn protecting its users from the spammy people using these extensions. There's not a single extension on that list that doesn't result in someone getting an unsolicited email.

Here's the full list; they're all spammy recruiting/sales extensions (nothing legit like uBlock or LastPass):

    colabo extension
    Linkedin-Hubspot Connector
    data Scraper
    Lead Generator
    Email Hunter
    Contact Out
    Linked Helper
    Get Email
    Leonard for Linkedin
    Loxo Social import
    eLink Pro
    LinkMatch for zoho CrM
    LinkMatch for zoho recruit
    inkMatch for CatS
    LinkMatch for PCrecruiter
    LinkMatch for Pipedrive
    LinkMatch for Greenhouse
    Snapaddy Grabber
    Spider for Linkedin
    Sales Lead Multiplier
    Email Finder
    Linkedin assistant Lily
    auto Connect tools Lily
    adapt Prospector
    instant data Scraper
    LinkMe tool
    Lusha (FireFox Extension)
And here's the code you can run yourself: https://pastebin.com/Ux684VtL

> There's not a single extension on that list that doesn't result in someone getting an unsolicited email.

Nimble is just a CRM. Their extension does not crawl for email addresses, as far as I remember. Why does LinkedIn need to "protect its users" from it? Isn't it rather to protect itself against the competition?

iMacros is a legit extension. But yeah, I guess there are recruiters using it to spam people.

Agree. iMacros is a completely fine macro recorder. Similar extensions like Kantu and Selenium IDE are not in the list.

Well, iMacros looks like another scraping tool, while those others look like testing tools that can be used for scraping.


That's malware, isn't it?

No idea, but if it is, and this is about protecting users from malicious addons, why did LinkedIn not just report that extension to Google?

> result in someone getting an unsolicited email.

That's pretty ballsy to bring up in a defense of LinkedIn.

Right. It ought to be "... result in someone getting an unsolicited email that didn't earn income for LinkedIn".

Well, another good reason for LinkedIn to do this is to protect their revenues. I heard of headhunters who don’t want to pay their (high) monthly subscription fees and instead “hack” their system. I guess “hacking” includes using these extensions.

Right. I once worked for a company that was trying to perform a business “matchmaking” service. We considered, as part of the implementation possibility-space, scraping people’s connections from LinkedIn in order to enhance our results. But LinkedIn has many advanced anti-scraping heuristics in their backend; this is just one of many. So we scrapped that option (before ever getting around to considering the ethics of it.)

Can you link to where they say that? I would figure someone doing something so helpful for users would at least document it. There's no reason to be surreptitious when doing such a favor. One wonders if they'll start offering a LinkedIn AntiVirus download with such an altruistic approach towards protecting users from what they have installed.

I think you're misunderstanding. LinkedIn isn't protecting the people with the extensions installed; they're protecting users FROM the people with the extensions.

Where they'll happily throw the same people under the bus if the user with the extensions installed is paying for an expensive recruiter license. Curious!

Ah, as an anti-scraper/anti-bot method, every user has all these local network requests made? Maybe it's the true reason, maybe not. Transparency is key here to assume anything more than the worst. Of course any of the rest of us with a modicum of smarts would just side load a custom extension via CLI args (or we'd just browser automate, headless if not detected). Even given the most generous justification, it reeks of careless decision makers playing whack-a-mole (likely fruitlessly) with the users in the crossfire.

I think it's a cat and mouse game. The more that Linkedin publishes about their anti-spam techniques, the more information spammers have to try to evade those anti-spam techniques.

I know of at least one company that sets up a proxy physically near their clients to obscure that they have a team in the Philippines manually assisting clients with their LinkedIn profiles.

Ultimately they need to police actual negative behaviour, not the mechanics of how people are doing it. But that means potentially restricting engagement of some of their most active users as well.

It can seem that way with server-side anti-scraping techniques with brute force detection and the like. But at some point you have to accept that playing the game on the client-side needs to stop escalating once you're making dozens of local extension resource requests in a user's browser. It makes me want to publish and maintain a legit scraper for LinkedIn that replicates human interaction. They'd DMCA the repo I'm sure, but it goes to show who fights against the open web. I see a "get X, Y, and Z features for free when you use LinkedIn Desktop instead of the website" coming.

> I see a "get X, Y, and Z features for free when you use LinkedIn Desktop instead of the website" coming.

...and then the scrapers replicate interaction with that app instead. UI automation isn't hard these days.

Why do we accept this argument of obscurity, when discussing security vulnerabilities proper doesn't elicit the same response?

Why is obscurity OK in these situations? Wouldn't we all benefit from removing scammers if everyone legitimate worked together in public? It's easy to defeat a single adversary. It's mighty hard to defeat a cooperating team.

Security vulnerabilities tend to be pretty binary. Either the vulnerability is there and it's exploitable, or it isn't. And once there's a fix, deploying that fix will permanently solve that problem.

Fighting abuse is different. The abusers are using the same request endpoints that the real users are, but just in a way that the service provider doesn't approve of. (Whether it's sending spam, payment fraud, scraping, or something else). There's no single hole to plug, unless you block all the requests outright which also affects real users. Instead you have to classify the incoming traffic, find out the abusive subset, and then act on it appropriately. But unlike with vulnerabilities this doesn't solve the problem permanently.

The moment attackers find out which signals are used by a site, they can start faking them or working around the signals. As a simple example, early email spam classifiers worked by simple matching against a blacklist of highly spammy terms. So the spam adapted to using creative mis-spellings like "v1agra".
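To make the "v1agra" point concrete, here is a minimal sketch of a naive blacklist check and one normalization countermeasure. The term list and the look-alike table are made up for illustration and are not taken from any real spam filter.

```typescript
// Hypothetical blacklist; real filters used far larger term lists.
const SPAMMY_TERMS: readonly string[] = ["viagra", "lottery", "free money"];

// Naive classifier: flag a message if it contains any blacklisted term.
function isSpamNaive(message: string): boolean {
  const text = message.toLowerCase();
  return SPAMMY_TERMS.some((term) => text.includes(term));
}

// Illustrative (not exhaustive) table of character substitutions spammers use.
const LOOKALIKES: Readonly<Record<string, string>> = {
  "1": "i",
  "0": "o",
  "3": "e",
  "@": "a",
  "$": "s",
};

// Undo the substitutions above, character by character, before matching.
function normalize(message: string): string {
  return message
    .toLowerCase()
    .split("")
    .map((ch) => LOOKALIKES[ch] ?? ch)
    .join("");
}

// Same blacklist check, applied to the normalized text.
function isSpamNormalized(message: string): boolean {
  const text = normalize(message);
  return SPAMMY_TERMS.some((term) => text.includes(term));
}
```

The naive check catches "viagra" but misses "v1agra"; the normalized check catches both, until the spammer picks a substitution that is not in the table. That is the cat-and-mouse dynamic in miniature.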

LinkedIn doesn’t publish anything about their anti-spam techniques. This was published by a third party, because it was a client-side feature that third parties could discover. Most of LinkedIn’s anti-bot logic is on the backend and completely opaque.

In my mind I am associating this with LinkedIn's failed attempt to keep scrapers off their website by suing them (ref: https://arstechnica.com/tech-policy/2017/08/court-rejects-li...)

Other companies are making money with these extensions on LinkedIn's website and LinkedIn is not happy about it


I'm failing to see how these extensions "circumvent the privacy of our members", but normal use of the website doesn't. Either you're safeguarding the information properly, or you aren't.

I am fine with huge GDPR fines to teach companies that data is a liability as well as an asset, and needs to be protected appropriately (which this measure doesn't seem to do, since it is trivial to bypass).

I'm not so sure I'm OK with you probing my browser to detect ToS violations/scraping, but not transparently mentioning it makes it worse.

> I'm failing to see how these extensions "circumvent the privacy of our members", but normal use of the website doesn't.

Of course the website does. But the website does it to provide revenue for the website, whereas the extensions probably do it to avoid generating revenue for the website.

LinkedIn is infamous for its dark patterns. They probably do this to protect their revenue model. That in this case it involves protecting the privacy of their users, makes for nice PR.

Though for that matter, their users (which include me) have at least chosen to share their information with LinkedIn. LinkedIn may scam them into sharing more data than they want (which also happened to me), which is absolutely questionable, but at least the users have chosen to do something with LinkedIn, and haven't chosen to do something with the scrapers.

To use rape as an analogy: it's the difference between a guy you wanted to have safe sex with puncturing the condom, and a stranger jumping from the bushes to pull you from your bike. Both are rape, but in very different ways.

Was I trying to defend LinkedIn here? I guess it's different shades of very dark grey.

Exactly. It's not to protect their users, it's to protect their money. They have way too many dark patterns for me to believe they have any good intentions.

If LinkedIn has a 'delete profile' button that works, and extensions let recruiters scrape profiles and thus keep records on deleted users, who do you think is in the wrong?

If they have a "delete profile" button that works, then recruiters shouldn't be able to scrape that. It doesn't really matter, the country I live in has robust anti-spam and privacy laws. When I lived in the US, I was spammed often (CAN-SPAM is pretty weak compared to GDPR and CASL), but you don't see LinkedIn campaigning for better privacy laws in the US.

Apart from campaigning for GDPR-style protections, there are enough other solutions. For example, GitHub has a nice email forwarding feature to preserve user's privacy/email address.

LinkedIn could provide a similar option where when I opt-in, my email address is replaced by an address they provide. They could even go a step further and make it look like a real email address. Any emails to that address are forwarded to my real address, and after the initial forwarding I could reply from my real email address. Meanwhile LinkedIn could do analysis of who is sending what to these email canaries (like they transparently said they would when you opt in), and catch scrapers that way. For users, if I'm getting too much spam, I can simply request a new canary email address from them.
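A rough sketch of how such a canary scheme could be wired up, purely as an illustration. Every name here (`CanaryMailbox`, the `canary.example` domain, the threshold) is hypothetical; nothing below is an actual LinkedIn or GitHub API, and a real system would issue random, realistic-looking aliases rather than a counter.

```typescript
// Hypothetical in-memory model of forwarding aliases ("email canaries").
class CanaryMailbox {
  // alias -> member's real address
  private aliasToOwner = new Map<string, string>();
  // sender -> set of distinct aliases that sender has written to
  private senderToAliases = new Map<string, Set<string>>();
  private counter = 0;

  // Issue a fresh alias for a member. A counter keeps the sketch deterministic.
  issueAlias(realAddress: string): string {
    this.counter += 1;
    const alias = `member-${this.counter}@canary.example`;
    this.aliasToOwner.set(alias, realAddress);
    return alias;
  }

  // A member who gets too much spam can drop the old alias and request a new one.
  revokeAlias(alias: string): void {
    this.aliasToOwner.delete(alias);
  }

  // Record who wrote to which alias; return the real address to forward to,
  // or undefined if the alias is unknown or revoked.
  forward(sender: string, alias: string): string | undefined {
    const owner = this.aliasToOwner.get(alias);
    if (owner === undefined) return undefined;
    const seen = this.senderToAliases.get(sender) ?? new Set<string>();
    seen.add(alias);
    this.senderToAliases.set(sender, seen);
    return owner;
  }

  // Senders who mailed at least `threshold` distinct aliases look like scrapers.
  suspectedScrapers(threshold: number): string[] {
    const suspects: string[] = [];
    this.senderToAliases.forEach((aliases, sender) => {
      if (aliases.size >= threshold) suspects.push(sender);
    });
    return suspects;
  }
}
```

The point of the design is that detection happens on mail the member opted in to routing through the service, not by probing anyone's browser.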

This is what a user-focussed solution would look like. Not some weird semi-legal hack they are doing now.

Still the recruiter can just "Save the page..."

Who's in the wrong? The scraper, obviously. Why would you think LinkedIn would be on the hook? It's not like their "Delete Profile" button can wipe Google's/Internet Archive's[1] cache.

1. I haven't read their robots.txt, but nothing I've read in GDPR remotely suggests "Right to be forgotten" features that attempt to erase data internet-wide.

Could you explain what holes these extensions are using to extract the hidden personal data? Or is it that the personal data is already in plain view and these extensions are just collating it?

Probably the latter.

Otherwise LinkedIn would just fix the holes and not bother blocking extensions.

> but I do know the intention is unequivocally to protect the PII of our members

The first item from the block list is "Daxtra", who make a very widely used ATS.

Please could you explain the difference from Microsoft's GDPR perspective from when a recruiter accesses this information via a normal browser and manually enters the data into Daxtra vs when someone with the Daxtra plugin accesses it, and uses that to pull the data over?

What if I just run the same JS in the browser console without installing the extension? You can't protect against this unless you make proper changes on the backend.

Calling this "nefarious-linkedin" when it's obvious that LinkedIn is trying to protect itself from unauthorized data collection shows that the developer is either seeking attention or didn't really look into the purpose of those extensions (https://github.com/dandrews/nefarious-linkedin/pull/1)

But how is this data accessible to the extension? I’m not an expert, but it seems that this data has to be publicly available for an extension to find and parse it. Extensions don’t have magic Auth rights or credentials.

Extensions have the same auth rights as your logged-in account (the ability to see people who are out of network, for example). It’s against LinkedIn’s ToS to scrape data.

This should go both ways. It is against my ToS for LinkedIn to scrape which extensions I have installed.

I'm on the anti-LinkedIn side of this scraping debate.

But that said, LinkedIn never agreed to your ToS.

True, and I accept this is a potentially good legal refutation of this kind of argument. However, I do consider ToS-es untenable and unjust because of this power asymmetry.

If my computing node is interacting with your computing node, we should either both be able to put restrictions on the use of obtainable information or neither.

You can avoid them collecting your data by not visiting their site.

And they can avoid me storing their data by not offering it to me. Both are rather lazy arguments.

This is besides the fact that many sites (LinkedIn included) aren't very upfront about what exactly they collect. Also, after a certain point, it gets impractical to have to make this decision for each and every site you visit.

Yes, I get that the extension operates within the user's auth realm. But still it should not be able to access data you as a user cannot access. Maybe this is already enough to do damage though.

I don’t get it. How can a browser extension mine data that otherwise is inaccessible? This should be covered by basic RBAC. Or are they just convenient scrapers, saving time but otherwise not accessing privileged information. If so, the LinkedIn story about “protecting our users” seems a bit shaky.

The extensions are basically bots to collect info for the user with the extension installed, not steal info from that user. Most are either scraping email, names, and job titles as quickly as a bot can, or mass sending out messages to users based on some criteria.

Here's a video for one of the extensions https://www.youtube.com/watch?v=2XvtuZjblCc (Warning: loud music)

> The extensions are basically bots

No, most appear to be plugins for ATS/CRMs, which allow recruiters -- having found a lead on LinkedIn -- to then add them to their CRM. This is profoundly different.

For anyone who is asking what/who LinkedIn are protecting with this, it's not the users with the extensions installed, it's to protect the other users on the sites. I poked through some of the listed extensions and most are basically bots that you can turn on that will crawl through LinkedIn pages very quickly and either collect info (like email addresses) or send out messages to other LinkedIn users.

I found this video for one of the extensions that is a good example of what I'm talking about https://www.youtube.com/watch?v=2XvtuZjblCc (Warning: Loud music)

In 2015 I wrote and published a Chrome extension for LinkedIn that calculated the age of a person and put that age next to the name in their LinkedIn profile. It quickly went viral and showed up in several places including Product Hunt.

Someone from BuzzFeed reached out to me asking questions about it and then later that day wrote an article claiming that LinkedIn had asked me to take it down (until that point they hadn't). That night I received a cease and desist letter, so I took it down.

There were many valid reasons to ask for my extension to be removed, but I never got the impression that they were doing it to protect the users whose age was being augmented or at least it didn't feel that was their angle.

It felt more like "this data is ours, so back-off". Just to be clear, I'm not saying that they were rude in their communications or anything like that. But the C&D letter focused a lot on the techniques and uses of my extension and not so much on the "this violates user's privacy" or "this is not representing accurate data".

I just think that in general LinkedIn doesn't like people poking around and trying to scrape data in any way. In the end, that's their most valuable asset (users' data).

For anyone curious, I still have the website: http://www.whoisjuan.me/age-insight-linkedin/

C&D letters are written by lawyers. They don't appeal to your empathy over the PII of other users, they state facts and appeal to the legal standing LinkedIn (or $company...) has over the data being used.

That said, I have no idea of the reasons LinkedIn sent you a C&D. It could well be any of the proposed options, or something else entirely. I'm just highlighting that the language in a C&D will rarely give any indication of intent, at least not "well written" ones anyway.

>> In the end, that's their most valuable asset (users' data)

Some might say it's their only valuable asset...

> I poked through some of the listed extensions and most are basically bots that you can turn on that will crawl through LinkedIn pages very quickly and either collect info (like email addresses) or send out messages to other LinkedIn users.

I'm going to take the top ten from the list as an example:

daxtra -- Nothing like what you've described, plugin for a CRM

SalesloftProspector/SalesLoftCadence -- I don't see any crawling capability at all

discoverly -- Nothing like what you've described -- more like rapportive

Ecquire -- Nothing like what you've described, plugin for a CRM

Ebstabullhorn / EbstaSalesforce -- Nothing like what you've described -- plugins for Bullhorn and Salesforce CRMs only

ProspectHive -- apparently defunct, no idea

talentbin -- this is a social media aggregator

Entelo -- ATS plugin

Ignoring everything else, it seems a bit weird a page can make requests to an extension's assets without originating from that extension.

I guess this comes down to extensions that inject code / modify the page.

Extensions can choose if their assets are public or private, and if they reference the asset from injected code - it needs to be public.

It sounds like a better solution might be to track the injected / modified code, and only allow it to read the assets. But I'll bet there is some tradeoff I've no clue about preventing that from happening.

Imagine an extension modifying a page and adding an image. How would it allow the image to load if that wasn’t possible?

I would have hoped for some shared secret approach where the extension can generate one-time use urls for their bundled resources on demand and use those instead of easily predictable urls.

It seems that extensions like ad blockers that are explicitly targeted by such detection methods have ways to work around that (see https://github.com/gorhill/uBlock/blob/master/src/web_access...). I honestly would have expected that to be the enforced default behavior.

I was thinking if an image is injected, it'd be injected by a script loaded from the plugin thus trusted.

It’s a logical thought but that isn’t how it works.

A script doesn’t really inject an image, it injects an image tag which contains a reference to the image. As the image gets loaded there is no check who created the tag.
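Because nothing checks who created the tag, any page can try the same URL itself. Here is a minimal sketch of that probing idea, not LinkedIn's actual code. The signature shape and the `canLoad` callback are assumptions: in a real page `canLoad` might create an image or issue a `fetch()` and report whether it loaded, and it is injected here so the logic stays testable outside a browser.

```typescript
// Hypothetical signature: an extension ID plus one file the extension
// exposes as a web accessible resource.
interface ExtensionSignature {
  name: string;
  id: string;
  resource: string;
}

// Chrome serves such files at chrome-extension://<id>/<path>.
function probeUrl(sig: ExtensionSignature): string {
  return `chrome-extension://${sig.id}/${sig.resource}`;
}

// Try every signature and return the names of the extensions whose resource
// loaded. `canLoad` should resolve to true on load and false on error.
async function detectExtensions(
  signatures: readonly ExtensionSignature[],
  canLoad: (url: string) => Promise<boolean>
): Promise<string[]> {
  const results = await Promise.all(
    signatures.map((sig) => canLoad(probeUrl(sig)))
  );
  return signatures.filter((_, i) => results[i]).map((sig) => sig.name);
}
```

An extension that exposes no web accessible resources gives a probe like this nothing to load, which is why the technique only catches extensions that inject page-visible assets.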

> LinkedIn violates their own users' privacy in an effort to detect the usage of browser extensions. At the time of writing this, LinkedIn is scanning visitors for 38 different browser extensions.

No it is defending against malicious actors from abusing its API.

> No it is defending against malicious actors from abusing its API.

I do not really understand the concept of "abusing an API". If an API is amenable to a "bad" use, it seems entirely to be the fault of the API designers, not of its users. The designers built an API that enabled a usage that they did not want. That is their fault, how could it be otherwise?

That is exactly what LinkedIn is doing: they are preventing bad actors from calling their API, essentially blacklisting them. They can't be blacklisted via IP since they are scattered across the internet, so they are banning them productively. Simple and easy.

Why not simply rate-limit everyone reasonably?
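For reference, per-client rate limiting can be as small as a token bucket. This is a generic sketch, not a description of anything LinkedIn runs; the capacity, refill rate and the choice of client key (account ID, session, etc.) are all assumptions.

```typescript
// Token bucket: each client may burst up to `capacity` requests, and tokens
// refill continuously at `refillPerSecond`.
class TokenBucketLimiter {
  private capacity: number;
  private refillPerSecond: number;
  private buckets = new Map<string, { tokens: number; last: number }>();

  constructor(capacity: number, refillPerSecond: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
  }

  // `nowMs` is passed in (rather than read from a clock) so the behaviour
  // is deterministic and easy to test.
  allow(clientId: string, nowMs: number): boolean {
    const bucket =
      this.buckets.get(clientId) ?? { tokens: this.capacity, last: nowMs };
    const elapsedSeconds = Math.max(0, nowMs - bucket.last) / 1000;
    bucket.tokens = Math.min(
      this.capacity,
      bucket.tokens + elapsedSeconds * this.refillPerSecond
    );
    bucket.last = nowMs;
    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= 1;
    this.buckets.set(clientId, bucket);
    return allowed;
  }
}
```

Keying on the logged-in account rather than the IP sidesteps the "scattered across the internet" problem, though a scraper that paces itself like a human would still stay under any reasonable limit.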

Changing the name of the extension resources and any extra elements they add to the page would be enough to stop this. (It reminds me of another "trick" pages like to use: randomising the element IDs. Easily defeated by searching for other properties of the desired element.) Just like DRM, it's a stupid cat-and-mouse game, and the mice will always win...

Even if the intent by LinkedIn is legit this will soon get used by data tracking scripts to further de-anonymise people

Is that inherently wrong if the website doesn't want to serve anonymous clients?

There are already a dozen ways to fingerprint users, I don't think this is specifically revolutionary.

OK, but why does LinkedIn scan extensions?

To flag accounts that are scraping data or "revealing" email addresses.

Negative view: they're blocking people from circumventing their paid features

Positive view: they're protecting their other users from getting spammed

A lot of these are used as CRM type applications where people would love it if LinkedIn just charged for access to a more comprehensive API instead. LinkedIn's messaging UI sucks, and ironically one of the reasons to want to use CRMs like Nimble to interact with your LinkedIn connections is to be able to better track communication with them so you don't spam. But of course people will use it to spam too.

If LinkedIn offered API access to messaging in a way that let CRMs work with them instead of feeling forced to circumvent them, I think most who want to use it legitimately would be perfectly happy to have LinkedIn impose various usage limits and protections, even if paid.

They should see this as revenue potential: there is lots of potential to get companies with legitimate reasons for more integration than the current API to upsell their customers on paid LinkedIn features if they are able to offer it in an approved way, and I bet many would be happy to let LinkedIn monitor how it's used.

If they try to block access instead, they'll find more and more companies keep offering the same, but manually.

They want to block tools that offer functionality similar to their paid offerings.

LinkedIn contains lots of personal data, a large part of which is only available to users who are signed in and/or paid members. They want to protect this information from potential exfiltration by these extensions and their backing companies.

They have zero scruples. This is a company infamous for spamming people.

And, do they need to do it or are they just data mining?

If they don't need it, isn't it illegal to collect that data under the GDPR?

I don't see the repo actually saying they're collecting this data, i.e. sending it back to their servers. It may just be a "reverse adblocker" - a list of signatures of extensions that the website's JS will try to interfere with on the user end.

Who is going to stop them? I'm sure the data is worth >0 to them (or someone)

To fingerprint you via which extensions you have installed, to track your browsing habits.

More metadata to shape information... do you have ublock, authy, lastpass, bitmoji, etc. Could be anything from metrics, to useful interactions.

Got the dropbox extension, show an option to upload your resume from dropbox directly. etc.

Blocking ads, show integrated ads through a secondary channel.

I really don't understand the downvotes.

LinkedIn's reputation isn't so great. Primarily because they harvest user mailboxes, and spam endlessly about Mirimir (for example) inviting recipients to join their associate at LinkedIn. And it's not just annoying. Sometimes it hurts people's careers.

Oh, LinkedIn's reputation is deplorable... what they did to bypass security on iOS (and I think Android too) is particularly interesting (mail proxy). I'm not saying that metadata collection is good, or that there aren't nefarious reasons... I stated that was one reason, and it could be to offer features.

I only created a linkedin account to stop all the email invites... and even then, refuse to install their app (links pervasive in mobile web) and only accept connections to those I've met personally, and very few recruiters.

Fair enough. But some people just downvote anything even neutral about something that they hate.

That's a funny story. But I have a funnier one. Not long ago, maybe the last time LinkedIn came up on HN, I created a test LinkedIn account as Mirimir. Or at least, I attempted to. Given that I use VPNs, I got a cellphone text authentication prompt. But Mirimir doesn't have a cellphone, so I blew it off.

And here's the funny part. A few days later, Mirimir received email from LinkedIn, inviting him to join Mirimir's network on LinkedIn!

I admit one of the extensions on the list is mine. But it is not as malicious or spammy as some like to picture it. Most of them are complements, add-ons to help the user with their CRM. I don't know of any intended to steal data (I believe anyone who wanted to would use scrapers or other means instead of asking users to pay for an extension). There are well-known CRMs like Hubspot or SOHO that aim to sync data. Yes, some others are used to send messages to connections, just as unsolicited as InMails, LinkedIn's paid version (but at least they go to connections). They also block extensions that block their ads, and extensions that help users filter out "sponsored" content (we did that).

Regarding GDPR, even LinkedIn says it is not actually their data; the users are the data controllers (owners): https://legal.linkedin.com/dpa . Obviously, this is not OK with LinkedIn because they are a walled garden and not an open platform. The point is that they do not let the user decide, customize or adapt their experience to suit their needs. Any feature that is not on their revenue agenda gets killed, even if thousands of users cry for it (this happens regularly), and they do not let anyone else offer it. Nobody likes spam, but it is up to the user not to do it. It is as if Gmail would not let you send an email to more than one person at a time or connect to any other app (yes, I know there are limits in Gmail).

Notice that LinkedIn does not take the legal route to stop these services, because in reality they are a monopoly (and, as pointed out earlier, courts have ruled against LinkedIn). Nor do they take an educational or marketing path, telling users why it is better FOR THEM not to use those extensions. No, they use FUD (fear, uncertainty and doubt) to scare users, and cancel the LinkedIn accounts of the people who create this "competition". It happened to me, to the people of hunter.io, findthatlead and many others. Mafia style.

This is not a moral justification from me; it is a business decision to offer extensions that give people capabilities they want.

Looks like they are trying to block spiders and protect their users

No, they're trying to protect their LinkedIn Recruiter license revenue.

> Furthermore, there's no good reason to use web accessible resources in an extension! You can always find a solution to your problem that does not require them.

How would I, e.g., inject an extension-provided image into a web page without using web accessible resources?

The only ways I can think of would be copying the image to a blob or drawing it on a canvas - both seem significantly more complex than just injecting an IMG tag and would still be detectable as side effects.

I'm not familiar with writing browser extensions, but data URI comes to mind.
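A minimal sketch of the data URI idea, assuming the image is an SVG whose markup the extension already has as a string. The function names are made up for illustration; a bitmap would need base64 encoding instead of the percent-encoding shown here.

```typescript
// Build a data: URI for an inline SVG, so the page never sees a
// chrome-extension:// URL that it could probe for.
function svgDataUri(svgMarkup: string): string {
  return `data:image/svg+xml,${encodeURIComponent(svgMarkup)}`;
}

// Markup for an <img> tag that a content script could insert. The alt text
// is escaped only minimally, which is enough for this illustration.
function imgTag(svgMarkup: string, alt: string): string {
  const safeAlt = alt.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
  return `<img src="${svgDataUri(svgMarkup)}" alt="${safeAlt}">`;
}
```

Since the image bytes travel inside the tag itself, there is no extension-hosted file for a page to request.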

Ah, right, I forgot those. That's true of course.

I think you could still use them for side-effect detection (watch for images/scripts/etc with a known data uri suddenly appearing in your DOM) - but at least you couldn't actively query it without the extension doing anything.

How is a webpage able to query the local file system? That sounds pretty bad.

It doesn't, it queries the local assets of installed extensions. Chrome (and I guess other browsers?) provide a way to do this, so the HTML etc injected by an extension can reference assets shipped with the extension.

You can query a static asset of an extension by its path: chrome-extension://<extension-id>/asset.css

I am really in two minds about LinkedIn. I cancelled my account years ago after getting spammed by recruiters. This could be an attempt to clean up, but it looks quite sinister in the attempt.

LinkedIn has been an issue for me for years, because they simply disclosed your email to anyone connected. This enables some people and/or corporations to scrape profiles and build spam email databases. After being annoyed about this, I started to change my LinkedIn-dedicated email address frequently, 4-5 times a year. The conclusion was obvious: within a few days of each change, I began receiving spam and proposals on the new dedicated email address, thus confirming the email scraping problem.

Yesterday I went back to LinkedIn to configure a new email address, and found that the account settings now include an option to hide your email address from everyone (inactive by default...). I've enabled it and changed again to a new dedicated email address, to see if it works. I hope this time LinkedIn did things right.

Maybe I should try again, I am not looking to hire or be hired so not really sure if there is a point anymore

The written tone used in the repo comes off as too drastic, especially as it only reports the collection of analytics on how LinkedIn users use the website.

Is the detection result reported back to LinkedIn?

In their [Privacy Policy](https://www.linkedin.com/legal/privacy-policy#your_device_an...) they do mention they collect information on "web browser and add-ons".

This reminds me of similar approaches used in other environments. For example in the game industry, anti-cheat techniques of detecting the running software in mobile devices to flag users. How do you think this differs?

Talking of LinkedIn. Any suggestions of how I can bulk-remove contacts? I was wondering if there’s a Chrome Extension? I’m assuming all I’m missing is the motivation to script it?

One aspect is that LinkedIn is protective of plugins that incidentally cover up their own ads. Notably several entries on this list once had such grievances filed against them.

Is this issue unique to Chrome? Does it happen with Firefox?

I do have the same localStorage item in my Firefox. The shown way of decoding the content works too.

The technique of attempting to load web accessible resources does not work in Firefox. For starters, Firefox uses moz-extension: instead of chrome-extension:. That's obviously trivial to adapt to, but Chrome uses the extension's global identifier in those URLs, while Firefox uses a locally generated identifier, specifically to avoid this sort of fingerprinting.

So they detect the extension... What do they do after that? Hide the email?

Why is this dangerous?
