> *With such approach, the fake profiles would have to be created with many year...

mr__y · on Dec 31, 2019

>and it doesn't show others when you connected

If I had actually scraped all the profiles in 2009, then in 2014 and then in 2019, I could tell whether an account is 10+ years old simply by checking whether it appears in my 2009 snapshot. It does not matter whether the social network displays or leaks profile age in one way or another. If a profile is in neither my 2009 nor my 2014 snapshot, then it is at most 5 years old. With frequent enough snapshots I would get even better time resolution. And given that it's neither that hard nor that expensive to scrape or store that amount of data, such an approach would actually be feasible.
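To make the bracketing idea concrete, here is a minimal sketch, assuming each full scrape is stored as a set of profile IDs keyed by the year it was taken. The `Snapshots` layout, the `age_bounds` helper and the example IDs are all hypothetical. The first snapshot containing a profile gives a lower bound on its age, and the latest earlier snapshot that lacks it gives an upper bound, provided the scraper didn't simply miss it.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical data layout: year of each full scrape -> set of profile IDs seen in it.
Snapshots = Dict[int, Set[str]]


def age_bounds(profile_id: str, snapshots: Snapshots, now: int) -> Tuple[int, Optional[int]]:
    """Bracket a profile's age (in years) using only snapshot membership.

    Returns (min_age, max_age). max_age is None when the profile is already
    present in the earliest snapshot, i.e. it could be arbitrarily old.
    """
    years = sorted(snapshots)
    seen_in = [y for y in years if profile_id in snapshots[y]]
    if not seen_in:
        raise ValueError(f"{profile_id!r} does not appear in any snapshot")
    first_seen = seen_in[0]
    # The profile existed when the first snapshot containing it was taken.
    min_age = now - first_seen
    # It was absent from every earlier snapshot, so (assuming the scraper did
    # not simply miss it) it was created after the latest of those scrapes.
    earlier = [y for y in years if y < first_seen]
    max_age = now - earlier[-1] if earlier else None
    return min_age, max_age


snapshots = {
    2009: {"alice"},
    2014: {"alice", "bob"},
    2019: {"alice", "bob", "carol"},
}
print(age_bounds("bob", snapshots, now=2019))    # (5, 10)
print(age_bounds("carol", snapshots, now=2019))  # (0, 5)
```

Adding more snapshot years narrows each bracket, which is the "better time resolution" point above.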

caseysoftware · on Dec 31, 2019

Valid points. With the (no longer available) index pages on LI, you could get to most of the profiles too.

The drawbacks are:

a) Not having a profile isn't definitive. You could have missed it, it could have been locked down, or the person joined late.

b) You can't go back to build your baseline. You had to have the foresight to scrape it then or count on one of the breaches to establish who had accounts when.

The primary mitigation here would be LinkedIn (or any social network) itself. Whatever controls they had to block spidering, limit further than immediate contacts, etc would have to kick in.

mr__y · on Dec 31, 2019

>Whatever controls they had to block spidering, limit further than immediate contacts, etc would have to kick in.

On the other hand, their business requires that HR people be able to discover candidates, so I guess completely disabling search/discovery is out of the question. Of course, a simple limit on the number of queries, or on their reach, would still be a huge problem for a scraper while not being a problem for most users, and therefore would not hurt the business.
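A per-account query cap of the kind described above could look roughly like this sliding-window limiter. It is only a sketch: the `QueryLimiter` class and its limits are hypothetical and say nothing about what LinkedIn actually does. An HR user running a few dozen searches a day never hits the cap, while a single account trying to walk the whole member base does almost immediately.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class QueryLimiter:
    """Sliding-window cap: each account may run at most `max_queries`
    searches within any `window_seconds` period."""

    def __init__(self, max_queries: int, window_seconds: float) -> None:
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        # account -> timestamps of its recently allowed queries, oldest first
        self._history: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, account: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        history = self._history[account]
        # Forget queries that have slid out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_queries:
            return False  # over the cap: refuse (or challenge) this query
        history.append(now)
        return True


limiter = QueryLimiter(max_queries=3, window_seconds=60)
print([limiter.allow("hr_user", now=t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
print(limiter.allow("hr_user", now=61))  # True: older queries have expired
```

Because the limit is keyed per account, it is exactly the sort of control that spreading queries across many botnet IPs and hacked accounts is meant to dilute.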

Then again, such massive scraping is probably already illegal, and if the operation is run by an intelligence agency, legality is not an issue anyway, so we can do a lot more than simple scraping through some proxies. This could include using botnets (free resources and a much wider, more realistic pool of IPs) and/or hacked accounts (to scrape as a verified, reputable user).

All of this of course makes such a scrape a lot harder and probably not something a single person on a personal budget could do, but I believe it is still within the reach of even a small organization. And I'm 100% certain that it does not require multi-billion black budgets or large datacenters hidden underground.

>a) Not having a profile isn't definitive. You could have missed it, it could have been locked down, or the person joined late.

Of course you are right about that, but then I could run full scrapes once a year or even more often. While missing a profile once is obviously quite realistic and actually expected, I assume it would be unlikely for the same profile to be omitted 20 times in a row, given that the scraping has generally proven to be effective.
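The "20 times in a row" intuition is just repeated multiplication. Assuming each scrape misses a given profile independently with some per-scrape probability `p_miss` (both the name and the 20% figure below are illustrative, not measured), the chance of a miss streak of length n is `p_miss ** n`. The independence assumption is the weak spot: a locked-down profile is missed every time, so correlated misses make the real figure higher.

```python
def miss_streak_probability(p_miss: float, n_scrapes: int) -> float:
    """Probability that an existing profile is absent from n consecutive full
    scrapes, ASSUMING each scrape misses it independently with probability p_miss."""
    if not 0.0 <= p_miss <= 1.0:
        raise ValueError("p_miss must be between 0 and 1")
    if n_scrapes < 0:
        raise ValueError("n_scrapes must be non-negative")
    return p_miss ** n_scrapes


# Even a pessimistic 20% per-scrape miss rate gives 0.2**20, about 1.05e-14.
print(miss_streak_probability(0.2, 20))
```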

Additionally, I was initially thinking about using such data as one metric among many, not as a definitive spy detector. Your account being missing from my 2009-2017 scrapes and appearing only recently does not make you a spy, but it does increase the likelihood that you are one.
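One standard way to treat "account age" as a single signal rather than a verdict is Bayes' rule in odds form: each signal multiplies the prior odds by its likelihood ratio. The sketch below assumes the signals are conditionally independent, and the base rate and likelihood ratio in the example are made-up numbers for illustration only. Even a signal that makes someone five times more likely to be a plant leaves the posterior well under 1% when the base rate is low.

```python
from typing import Iterable


def posterior_probability(prior: float, likelihood_ratios: Iterable[float]) -> float:
    """Combine signals with Bayes' rule in odds form, ASSUMING the signals are
    conditionally independent:
        posterior_odds = prior_odds * LR_1 * LR_2 * ...
    where LR_i = P(signal_i | spy) / P(signal_i | not spy)."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)


# Made-up numbers: base rate 1 in 1000, "account appeared recently" LR = 5.
print(round(posterior_probability(0.001, [5.0]), 4))  # 0.005
```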

>You can't go back to build your baseline. You had to have the foresight to scrape it then or count on one of the breaches to establish who had accounts when.

That's true. And even the data available from breaches might be inaccurate, or even intentionally altered. But then again, not everyone runs an intelligence agency.