The Fight to Mine Your Data and Sell It to Your Boss (bloomberg.com)
303 points by artsandsci on Nov 15, 2017 | 130 comments

The poignant part of the headline here is "sell it to your boss". I think we have already lost the battle over data mining. It is happening, and will continue to happen and it seems to me there is really nothing we can do short of drastically restructuring the entire Internet.

The biggest problem is when you combine intrusive massive surveillance with an ultra-powerful entity that can be capricious, malicious, and often abuses and punishes its underlings. This is why I am far more concerned (as a selfish American, thinking for now only about the concerns of Americans) about the FBI having unlimited surveillance powers than the NSA: the FBI has a history of committing crimes, horrifying suppression of speech, dissent, and political movements, of destroying the lives of innocents, etc. The FBI can knock on your door with guns and throw you in a cage for decades. It has massive power over us, so it is important we do not give it massive information about us at the same time.

The same is true with bosses. One technique to deal with this is to try to firewall the information off from the powerful. This is what we did to the FBI before parallel construction (bills that say data will be collected "only for terrorism" -- a lie, obviously, as it always is, but one that made the collection palatable). The article focuses on this approach, which is in my view a lost cause, but a lost cause worth fighting for nonetheless.

The other approach is to take away power from bosses. Right now a boss has complete and total power over his employees without labor unions or worker protections (i.e. the current state of affairs in the US). It is this combination with the information that is so disgusting and dystopian. So we could, instead, talk about taking away the power, rather than taking away the information -- which would have the added benefit of solving many other social problems along the way.

I see many people argue that fighting for privacy is a lost cause. That it'd take too much effort to reverse current trends.

I think that you and others are too cynical, although I agree that it might become worse before it gets any better. If we look at history we can see that other societies had the will, if not the means, to engage in data mining on the scale that ours does through the internet and massive analysis through machine learning/statistics. The Stasi is an example of an organization that certainly displayed the will to violate privacy as much as possible.

But it also becomes apparent that the privacy violating priorities of those organizations and societies can change, in an almost Hegelian dialectic manner, to protecting privacy. These days counter culture is still going strong in Germany, relative to other parts of Europe. People that belong to a counter culture will resist tools that exert control over them, such as extensive surveillance, by definition. This is in addition to the fact that Germany is more privacy friendly than most of its European neighbours.

Things can and do change over time. Through history a pendulum swings from one extreme to another. We just need to wait until something goes horrendously wrong with the way our privacy is currently being violated before it starts to swing in the other direction.

I think privacy feelings are often misplaced. Using the internet may feel private, but it's like setting up a giant antenna on your roof and using radio to chat with someone else.

IMO, the fight should focus on things like secretly turning on the microphone on your internet-connected device (cellphone etc.) and listening in. That's the kind of situation where we could get useful long-term precedent, and most people would agree with the intent.

Because, if we are not careful, that can quietly change and AI will make listening to every conversation anyone has in the country a real possibility.

I always found it crazy that they didn't carry over the hardware "recording" LED to cellphone cameras. I'd really like to see that become a thing, and maybe another LED for the microphone, wired into the circuit so that the camera and/or microphone cannot be in use without the LED lighting up. I'd basically like a row of status LEDs like an old Thinkpad for monitoring all systems on my devices.

Conversely, surreptitious cellphone camera use is also a big defense against police abuse of power.

Not sure I'd want to be unable to mask the camera and mic being active either.

i assume the LEDs suggested by the parent are for the protection of the device's owner.

for your use case, you might consider tape that's the same color as the material surrounding the light. though, that'd be a huge inconvenience to apply in the heat of the moment, and it'd somewhat defeat the purpose of an easy indicator if you had to peel it back every time you wanted to check the light. though... you might also worry about recording when it's in your pocket, in which case you'd still have to pull it out to look at it.

i know that police will often try to interfere with recording despite the legal precedents, but it seems pretty well accepted at this point that recording police activity is completely lawful as long as you're not interfering with said activity. again, i know police have blatantly disregarded that in practice, but it's something.

oh, another low tech option for surreptitious recording: finger over the light. still gotta remember to do it, but again, it's something.

Presumably the status lights would face the user, so unless you're filming the cops with the front-facing camera, they won't be able to see them.

Discreet hardware switch?

Yeah, this is what I want for my money if I'm expected to drop $999 on a cell phone.

I'll note that those are soft switches at least as far back as the T510, because installing Windows 10 disabled the microphone mute button.

> Because, if we are not careful, that can quietly change and AI will make listening to every conversation anyone has in the country a real possibility.

Are we sure mass recording isn't already happening? It wasn't that long ago that the idea of bulk metadata collection was thought of as tinfoil hat material by many.

That's pretty trivial to detect by inspecting network traffic. Although perhaps we should make those kinds of monitors easier to use, with much better UX.

how many people can inspect network traffic on any given device? I don't think that's even a common skill among software developers unless you have a devops/IT focus, much less the entire populace

It is pretty common amongst System/Network engineers though. Modify a few switch configurations and set up a sniffer.

There are quite a few ways to do it.

I expect my android phone exchanges encrypted information with google for updates and other purposes. How would I know if it slipped some other information into one of the encrypted transmissions?

You could try something like TaintDroid. Marks data on your phone with a tag that is detected if said data is transmitted off the device without first being encrypted. Disclaimer: haven't personally tried it yet, but that's the general concept.

This doesn't address the concern that my phone, which Google has a large degree of control over, can leak my data to Google without me being able to detect it. As far as I can tell there is no way for me to use Google Play Services and have any assurance of privacy from Google.

One datapoint from a decade ago is Room 641A:


> The biggest problem is when you combine intrusive massive surveillance with an ultra-powerful entity that can be capricious, malicious, and often abuses and punishes its underlings.

I am not so worried about that as I am about the cost. Consider: You could always get the dirt on someone if you were willing to drop a few grand on a private dick. That created a bit of friction - you had to really want to get at that person. But what if you can get all the dirt you could ever use for just $1, or some trivial cost? Someone so inclined could go on a fishing expedition on a whim. Your HR department could routinely do everyone in the company, and every supplier and customer, because why not? That's the way this will be weaponized.

Enabling the average worker to take risks like searching for a better job or maybe even starting a company of their own (if they can argue the business model to a funding entity or have sufficient savings or grant incentives) is the proper way of defusing that power.

Everyone should have access to quality healthcare (as a birth/emigration right); tax everyone, single payer (the government) negotiates with doctors in an area for the best price; everyone that needs a service done picks from among the open local doctors and just gets treated. No secondary bills, no annoying billing department, just treatment.

Second would be to actually have a national ID, and among other things, to have (an abstracted) public shipping address for registered mail to reach that person. First Class (normal post) would be able to lookup the address as part of the service. Anyone could thus mail to #NATIONALID# and not need to worry about where someone has moved or re-located.

Third is enforcing market competition for basic supplies (of all kinds, but I'll focus on housing since that's a major concern on HN). This would mean that if there isn't enough housing in an area people want to be, more would be encouraged, and if that isn't enough to match demand special arrangements would be called for to do things like buy out whole neighborhoods (at once) at something like 2-5 times current market value per unit and re-develop an area to a correctly targeted density.

It doesn't need to even be a #nationalid#. It could be like an email address you register with USPS.

Even better would be unlimited aliases to your national id where only you and USPS know who the alias points to (unless, of course, you tell someone else yourself).

Corporate surveillance is, in my life, a bigger deal than government surveillance.

I think Zeynep Tufekci described the situation really well with the title of her recent Ted Talk: "We're building a dystopia just to make people click on ads".

I don't think it makes sense to separate those two... if that data is collected by a third party and made available for sale, the government can buy it too.

Most people, myself included, wouldn't survive if directly targeted by the government - so I exclude them from my threat model. But I do hope that being wary of commercial surveillance will at least reduce the footprint I leave behind.

> I think we have already lost the battle over data mining
Who is we? GDPR comes into force in 6 months' time: https://en.wikipedia.org/wiki/General_Data_Protection_Regula...

Really? In a lot of cases, we give our consent for our data to be processed, and the terms of which - do you really think, even with a more up front disclosure statement and post-process record keeping, that for most people it'll have a large impact on data from opted-in networks? And the subsequent (consented) use of it in all manner of ways?

It's not a combative question - I'm genuinely interested as to whether I've misinterpreted GDPR requirements or not. From what I've seen, as a lot of personal data is willingly traded by people, and given the greyness around "legitimate interests" of the controller, the landscape won't look massively different (just perhaps a more invasive privacy notice on first visit?)

I think I have a slightly different perspective, from various business dealings I have.

Firstly, I am elbow deep in the recruitment sector, and GDPR is a Major Fucking Deal for everyone. Software companies in the recruitment space are spending a huge amount of time and treasure making sure they are on top of the game for GDPR. Recruitment consultancies are too. The industry is taking it super seriously. Recruitment is (amazingly, and contrary to what people tend to believe) generally pretty good about this sort of stuff (with almost all shady shit coming from individual recruiters with too little oversight), so maybe the industry is over-reacting, but I don't think so.

Secondly, I am pretty aware of a lot of shady internet marketing and retargeting type shit. I don't know how that industry is reacting because I'm not close enough, but I'm pretty sure they should be in a panic if they're not.

The law as I understand it will make it difficult for companies to use catch-all privacy policies without being specific, and companies will have to make an explicit case for -- for example -- reselling your data to advertising companies if that's not explicitly what you signed up for.

Funnily enough, I work in recruitment marketing and do quite a lot with programmatic targeting and retargeting so am close to it too.

One of the benefits of recruitment is we tend to have a lot of touchpoints to gather informed consent, whether through the points in an ATS, expressions of interest, candidate contact and follow ups.

It's serious, don't get me wrong, but I think given the grey areas in definitions that there are (and the rationales developed for collecting information in the first place), if you're a responsible data controller in the first place, the impact will only be felt in a few ways.

For the purposes of advertising, it's always been sketchy to "sell" data to an ad company. Most of the time it's owned data passed along and processed by them on behalf of the original acquirer (something comparatively easy to get consent for, because it also ties into personalisation) for retargeting. Initial targeting is going to be the hard part, as the right to be forgotten is going to be hard to manage across multiple DSPs.

Analytics is going to be a *, as we'll need to delay any analytics firing until someone's opted in, but this will become blind after a while once someone's opted in or out. Persisting this option will, perversely, mean having to store more personal data around choices, but hey ho.
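To make the "delay analytics until someone's opted in" point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `ConsentGatedAnalytics` class, the `send` callback and the event dictionaries are hypothetical names, not any real analytics library's API, and whether holding events back until a choice is made is acceptable under GDPR is a policy question this sketch does not settle.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ConsentGatedAnalytics:
    """Hypothetical wrapper that holds events back until the visitor chooses."""

    send: Callable[[dict], None]      # assumed transport, e.g. an HTTP call
    consent: Optional[bool] = None    # None means the visitor has not chosen yet
    pending: list = field(default_factory=list)

    def track(self, event: dict) -> None:
        if self.consent is True:
            self.send(event)            # opted in: fire immediately
        elif self.consent is None:
            self.pending.append(event)  # undecided: hold the event locally
        # consent is False: the event is dropped and nothing is sent

    def set_consent(self, granted: bool) -> None:
        # The choice itself has to be stored somewhere, which is the
        # "more personal data around choices" irony mentioned above.
        self.consent = granted
        if granted:
            for event in self.pending:
                self.send(event)
        self.pending.clear()            # never keep held events after a choice
```

With this shape, an opt-out means the held events are discarded and later calls to `track` do nothing, which is the "blind" state the comment describes.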

An approach we can take on an individual level is to be mindful of the data we are giving up. If you are employed and in the process of job hunting, then stop "liking" the article headlined "10 Reasons why you deserve better from your company".

This is giving in to surveillance: allowing it to define what you are allowed to say and do.

It is also not feasible. Sure, you can stop doing that, but there are millions of other 'tells' that you will give up through your actions online that can predict this information. Machine learning will pick up features you cannot control or even imagine that will accurately predict whether or not you are on the job hunt, no matter what you do to hide it short of just disconnecting from the Internet. Which is likely a tell itself.

That's just acquiescing to surveillance. That's not a solution.

Surveillance capitalism and government spying on the populace (enabled by the former) have gone far enough. To acquiesce to a degree like that is to give them an inch, and if you give them an inch, they will take a mile and we'll be back where we started.

I call BS on the comments that say doing what you suggest is giving up... To me, "Liking" something implies expressing your opinion publicly, probably via Facebook. When I receive a job application, the first thing I do is check to see what the person has publicly posted. That's not surveillance. Before the general population starts to worry about surveillance, they need to think about the public image they are portraying online. I can't hire someone who posts stupid shit publicly. It could damage my business if my clients look them up.

> That's not surveillance

Yes it is. No less than if you got their postal address from their CV, parked outside their house and watched who came and went. After all they volunteered their details and the street outside their house is public... right?

I mean, it is clearly "less" isn't it?

This continuum fallacy stuff really just makes privacy advocates look like kooks.

You have to use the language other people use if you want to convince them, and normal people do not draw an equivalence between someone googling their name, and someone staking out their house, because there is an ocean between them.

> This continuum fallacy stuff really just makes privacy advocates look like kooks

You've clearly not read the article, in which a judge says that planting a tracking device on a car is materially different than tailing it.

As a non-USAian, I would suggest that USAians take the approach of referring to the FBI and its ilk as the Gestapo, KGB or the Stasi (or any other well known like group). Keep the references up and you might just see some changes.

It's possible it won't make changes but, hey, ya neva no wot can appen.

As a fellow non-USAian, I don't think it's helpful to equate US law enforcement agencies with Nazis and Communists.

The FBI is nowhere near as evil an organisation as the Gestapo, the NKVD, and so on. Using Nazi allegories all the time limits your ability to escalate rhetoric when they actually do something Nazi-like.

I think referring to the FBI as "the KGB" or "the Gestapo" would be counterproductive, making it harder for open-minded friends and family to hear valid concerns, and more likely that they just roll their eyes and tune us out -- similar to the effect of the euphemism "non-USAian" on this non-nonUSAian.

I stopped reading when I reached "USAian". What a stupid term.

> I think we have already lost the battle over data mining. It is happening, and will continue to happen and it seems to me there is really nothing we can do short of drastically restructuring the entire Internet.

Let's say we've lost so far, but certainly we can change. Europe, for example, has much stronger protections than the U.S. Laws cleaned up the environment in cities, brought a revolution in civil rights to billions .... certainly we can provide privacy to end users if we put our minds to it. (I'm not saying it will be easy.)

What do you mean lost? I deleted my LinkedIn over a year ago when I learned about them selling my data to recruiters and it being used to assess my likelihood of leaving a job.

a boss only has power over you so far as you're being paid to act on behalf of the company, and that's the proper power you should have when you need to run a company, clearly having employees be able to harness some law and create a kind of capture on the employer wouldn't be ideal

you could end up with job seekers that are just seeding for predatory lawsuits, or people end up squatting on a company to get whatever they can regardless of what cost/value impact they are having on the company, terrible for both businesses and competitive job seekers alike

even if they made a law that says "you can't fire people for things discovered using data mining" you now run into the situation of having to provide some reason for firing someone, which is a messy situation, because you shouldn't need to provide proof or reasoning for no longer wanting to work with someone, it would make it dangerous to work with anyone that showed any potential risk for making things difficult, and we don't want to make it more risky and expensive for people to get jobs

>a boss only has power over you so far as you're being paid to act on behalf of the company

This is obviously untrue. You can be fired for any reason, anytime. Whether what you're doing is at work or not. Your boss can see you at a protest and fire you. Your boss can call you at 2am on your personal line and fire you if you don't pick up the phone. Your boss can fire you for not liking the right music or restaurants, or for refusing to go to certain parties (it's just not a good culture fit).

> clearly having employees be able to harness some law and create a kind of capture on the employer wouldn't be ideal

No, it wouldn't be ideal, but it would be very, very good. Asking for something ideal is a little Utopian, so I'll take the massive improvement you are suggesting.

>people end up squatting on a company to get whatever they can regardless of what cost/value impact they are having on the company

You have just described most bosses and corporate owners.

>you now run into the situation of having to provide some reason for firing someone, which is a messy situation, because you shouldn't need to provide proof or reasoning for no longer wanting to work with someone

This is how it works in nearly every developed country in the world, and how it used to work in the US. It works just fine. There is no need to give bosses ultimate complete power over all aspects of their employees' lives. There is no reasonable justification for it I have ever seen except ideological devotion to the unquestioned authority and power of the wealthy.

I don't know how many times I say it, but I'm glad I don't live in a country where somebody would think it acceptable to fire somebody for the type of music they like. I know you are joking, but even the fact that you think this is in any way acceptable implies that you are happy with the situation.

I see no joke there.

I believe the paragraph you are referring to was a statement of fact, not an approbation. I've read it as: "here's what US employers can do, and why it sucks big time".

At the same time, I feel that a boss should not be able to deprive someone of their livelihood on a whim. I feel there should be an actual reason why the person is being fired.

Absolutely. Privacy is basically security through obscurity. It's never going to stop a malicious government or even employer. We need better mechanisms to prevent such entities from becoming malicious in the first place.

Privacy is more than just security. It's about being able to act freely without being judged and controlling one's personhood.

I'm going to posit that most of us are spoiled by how much attention we get from recruiters, which is why the attention on this article seems to be on the legality of scraping someone's site or not.

The focus should be on how disturbing it is that a company is using metrics like "independence from employer brand" to take the power out of the hands of the worker and put it into the hands of the corporation who already wields so much power and influence over our society.

Programmers are lucky, a last bastion of decent treatment by corporations. Companies like HiQ are looking to tip the scale back in favor of the corporation, run by people like Mark Weidick who want to be a useful and well kept pet and identify as a "Silicon Valley entrepreneur. Hollywood wanna-be." (twitter)

I fear secretive preemptive firing and hiring. "Talks" from your manager, based on encroachments into your private internet browser. This will force developers to combine their personal identity with their corporate identity (which far too many developers do already, wearing their respective company's T-shirt like a big walking free advertisement) and curate their online life to reflect how grateful they are to the lords of their fiefdom.

Give it time, and Snow Crash is going to look like a utopian vision.

The day I wipe my ass with dollar bills will be one of the saddest days in my life. (Note: Employer, NSA, FBI, DHS, this one's for you!)

> Programmers are lucky, a last bastion of decent treatment by corporations

Hah, maybe in the Bay Area - in the rest of the world we're treated like scum, just the same as everyone else.

The neural network said so

I think there are two main questions raised in this case:

1. Is it always ok for someone to build a bot to do something which can be legally done by hand? Example: Building a LinkedIn scraper that tracks all public data. Or using a GPS tracker to track a car, instead of manually following it

2. Is it anti-competitive practice, and a violation of anti-trust laws, for LinkedIn to allow the general public, and other companies like Google, access to its public data, but ban others such as HiQ?

On question 1, I tend to lean towards LinkedIn's position. Just because something can be legally done by hand, shouldn't automatically mean that we should allow it to be done at massive scale by automated scripts. I wouldn't want companies having the right to surveil the movements of every citizen 24/7, just because they have the right to follow someone on foot, and I think a similar argument can be made against HiQ.

On question 2 though, I agree with HiQ. LinkedIn's attempt to ban HiQ doesn't seem like an attempt to protect their users, but rather, an anti-competitive attempt to kill off a potential competitor, and secure the market for themselves.

> I wouldn't want companies having the right to surveil the movements of every citizen 24/7, just because they have the right to follow someone on foot

I feel like there's an inherent contradiction there, and the right thing to do isn't to "lean into" the contradiction, but rather to resolve it in the other direction. That is: we probably need to change what is legal for private individuals to do, if we want to effect change in what is legal for automation to do.

After all, at its most basic, we've got "automation" like Mechanical Turk, or people who you can hire to stand in a line for you to buy a new iPhone. Any law against automation won't work out if exceptions like that still exist; and those exceptions can't be stopped except by changing what deals it is legal for a human to make.

>I feel like there's an inherent contradiction there, and the right thing to do isn't to "lean into" the contradiction, but rather to resolve it in the other direction. That is: we probably need to change what is legal for private individuals to do, if we want to effect change in what is legal for automation to do.

It can't be solved this way, because some things cannot be made illegal or, if they can, the law cannot be enforced at the level of an individual doing it manually.

Some individual can always sit at a street corner and read the license plates of the cars that pass. How can one make that illegal?

But systematically mass-collecting such data could easily be made illegal.

>After all, at its most basic, we've got "automation" like Mechanical Turk, or people who you can hire to stand in a line for you to buy a new iPhone.

That still has costs and limitations -- so it's nowhere near any competition for automation.

I'd say making their automated version illegal is a good starting point.

> Some individual can always sit at a street corner and read the license plates of the cars that pass. How can one make that illegal?

Have you heard of conspiracy laws?

We disincentivize gangs from using drug mules, by making it hard for them to find willing drug mules, by making it illegal to be a willing drug mule—the drug mule will, if they were complicit in the act, be charged with conspiracy to commit a felony.

This (apparently) works to decrease the prevalence of drug-muling, even though it's very hard to detect drug mules (which is the whole point of drug mules.) The law only really gets applied when you end up finding a mule through some other investigation (i.e. busting the gang itself.) But that still happens often enough to scare all the other potential willing drug mules.

This is how I'd see individuals sitting on street-corners counting license plates being charged: not for doing anything that is illegal prima facie, but rather for their willing complicity in a conspiracy to commit the novel crime of, say, building a database through espionage/surveillance without a license.

In other words: if the company's business model is based on a crime, then they're a criminal organization; and it's illegal to profit from dealings with a criminal organization, so you're doing something illegal by doing what they say, even if the thing they're asking you to do isn't illegal.

> That is: we probably need to change what is legal for private individuals to do

It is already illegal for a private individual to follow you 24/7. That is harassment. An extant law can, with modifications, potentially be applied to the GPS tracker scenario -- we only need apply the same rules binding individuals to their scripts as well.

Yes, the two questions are in fact quite orthogonal.

- Should data truly intended to be public be scrapable by HiQ (or anyone)? The answer has to be yes.

- Should you be able to use "public" data to seek out essentially private information behind the user's back (like what HiQ is doing)? The answer has to be no.

On this latter question, there is a notion of continuous consent and discoverability. There should be a feedback trail all the way back to the original provider of data (the user) about who is accessing the data and for what purpose, even if (or especially because) it is public; and this fact should be made known to the accessor.

There has to be a semblance of symmetry in how a public exchange of information should take place. Both the provider and the accessor should be in the open if it is truly public. Then the user can take actions to affirmatively provide continuous consent or withdraw.

>1. Is it always ok for someone to build a bot to do something which can be legally done by hand?

I'd say no. Technology is a multiplier, and if the same thing that took painstaking work and devoting resources to be done manually can be done automatically, it can be the difference between a democracy and a police state.

(E.g. the police targeting some suspects by tailing them and having some people listen to their conversations, versus everybody in the country being monitored 24/7.)

The kind of crappy arguments usually taken as OK in courts however see this otherwise, as if 1000x automated something is the same as 1x.

As a LinkedIn user, though, I don't want HiQ to have access to my data. I'm certainly not going to claim LinkedIn is an angel of a company and has its users' best interests at heart, but its users voluntarily gave LI their information with the expectation that it would be used according to LI's privacy policy and ToS. If that doesn't include "allowing 3rd parties to scrape the site without LI's or the user's consent", then under what grounds does HiQ believe that they have the legal right to use and transform my data for commercial purposes?

Haven't the courts already decided that if things can be done by hand then they can be automated? For example, license plate scanners were ruled legal because officers can view that information in public, but the automation allows the police to follow everyone's movements with only a few machines set up. This ship has sailed.

On the other hand the Supreme Court ruled 9-0 that "just because law enforcement can follow someone manually" doesn't mean it can also put a GPS tracker on whoever it wants, without a warrant.


I think we as the society are ultimately responsible for setting certain "rights" in stone. For instance, should the government be able to record everything anyone says "in a public place", even while on the phone or having a "private conversation" (but in public) with a friend, through highly advanced CCTV cameras?

Sure you could argue that "because it's a public place, then yes, we can conclude that the government should be allowed to do that". But that doesn't necessarily have to end the discussion. We, the society, can decide that "HELL NO, that's not acceptable, and we'll put anyone who tries to do that in prison."

Automated license plate scanning is probably legal for local municipalities (Neal v. Fairfax County, SC appeal pending).

That doesn't mean it's legal for private entities to do so, nor is necessarily legal for such data to be, for example, aggregated nation-wide.

See https://www.theatlantic.com/politics/archive/2014/02/mass-su... for Conor Friedersdorf's article, which is credited with bringing down the last attempt at a "Homeland Security" database of where you were last summer.

The correct solution for this concern is to make it legal to obscure the IDENTIFYING part of the license plates (but require that state and 'tabs' are still exposed) while it is parked.

Moving the obscurity would constitute modification of private property and should require either a valid documented probable cause or a warrant for the search.

You're probably correct in how it will turn out, but laws can change. One precedent does not mean nothing will ever break it. It does make it harder, though.

Traditionally this is controlled by robots.txt, which does let you control which search engines can scrape your site. It's not enforceable, but generally considered fair play to respect it.
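For anyone who hasn't looked at how that works in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the bot name "HypotheticalScraper" and the example.com URL are all made up for illustration; this is not LinkedIn's actual file. It only shows what a well-behaved crawler is expected to check before fetching, since nothing forces a scraper to run this check at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: admit one named crawler, turn everyone else away.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    profile = "https://example.com/in/someone"
    print("Googlebot:", may_fetch("Googlebot", profile))
    print("HypotheticalScraper:", may_fetch("HypotheticalScraper", profile))
```

Note that the decision is made entirely on the crawler's side, which is why it only counts as "fair play" rather than access control.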

I think it's well within a website's rights to control who scrapes them. The problem with LinkedIn (and other social media sites) is that they don't let their users control this, in turn. It should be up to each user to decide whether they want to appear in search engines, and which ones.

It's pretty bizarre that this was never brought up in the article. The idea that users should control their own information was entirely ignored.

>Is it always ok for someone to build a bot to do something which can be legally done by hand?

Funny that you put it that way; when computers were in their infancy, people would scoff along the lines of, "...and they spent all that money/time teaching the damned thing to do something that you could get anyone off the street to do for $0.50/hr..."

Of course, before computers were invented in the first place 'computer' was a job description for a person who computes. How things change in...what, 50-80 years?

Well... 2 could easily be both, or even just what LinkedIn said. If HR gets an alert when you start sniffing around for new opportunities, people will sniff elsewhere. LinkedIn doesn't want that and (coincidentally) neither do users. Either way, I don't think it's the interesting question.

On question 1, I think you are really on the money. Concise.

We're used to thinking in terms of principles when it comes to our principles, especially laws. Realistically though, our moral codes are not like physics. They're approximations that seem to work for the most part, for the situations we know about. Being a somewhat anal retentive species, we don't generally like this. It feels like the principles are unsound.

>1. Is it always ok for someone to build a bot to do something which can be legally done by hand? Example: Building a LinkedIn scraper that tracks all public data. Or using a GPS tracker to track a car, instead of manually following it

This is a thorny question. What if I build a scraper that requires me to press a button in an app to actually run a scrape? Is that "by hand"? This seems hard to effectively regulate without sliding into a criminalization of a lot of benign things programmers do on the web.

This approach strikes me as confusing precision for accuracy. It is possible to use this sort of information to formulate an extremely precise model. That apparent precision becomes believable because ... it's precise and complex and uses a lot of data. That then becomes a salable product.

However, whether this all is actionable and accurate is another and unfortunately later question. The promoters and customers of that approach might want to read this chapter in the CIA's Psychology of Intelligence Analysis:

Chapter 5: Do You Really Need More Information?


But then they might not have a product to sell or buy.

I dunno, I was acutely aware that as I prepared to find a new job by updating my resume, updating my LinkedIn profile and contact list for the first time in years, etc., that it would be very obvious to anyone watching what I was doing. I correctly assumed my employer wasn’t watching (which would have made things potentially uncomfortable for me), but with tools like this it becomes so low effort to watch for this stuff that they can afford to do it.

No, not everyone is like me and ignores LinkedIn while actively and happily employed, but I imagine enough people are and that it’s not so hard to build models of other user types that work as advertised (as long as they have access to the public data).

HiQ is a threat to LinkedIn on two fronts: as a direct competitor and as a reason for some people to opt out of using the service (though it appears LinkedIn offers similar services, so perhaps the threat is more the Streisand effect at work). That doesn’t mean HiQ should be locked out of the data of course, but it makes for an interesting and complicated situation.

A former coworker has a theory that you can accurately assess corporate morale by monitoring the volume of http requests to linkedin over time...
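A rough sketch of what that theory would look like as code, assuming a proxy log where each line is "DATE HOST" (that format, the function name and the default domain are my assumptions, not anything from the article). It counts requests per day in aggregate and says nothing about whether the resulting number actually tracks morale.

```python
from collections import Counter


def daily_hits(log_lines, domain="linkedin.com"):
    """Count requests per day whose host is `domain` or one of its subdomains.

    Each log line is assumed to look like "2017-11-15 www.linkedin.com".
    Lines that do not have at least two fields are skipped.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        day, host = parts[0], parts[1].lower()
        # Match the domain itself or a real subdomain, not a lookalike suffix.
        if host == domain or host.endswith("." + domain):
            counts[day] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    sample = [
        "2017-11-13 www.linkedin.com",
        "2017-11-13 example.com",
        "2017-11-14 linkedin.com",
        "2017-11-14 www.linkedin.com",
    ]
    print(daily_hits(sample))
```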

I'm not saying that social media doesn't exist and that anything you say can't and won't be used against you in a court of law. It will be. What I am saying is that relying on a complex model based on this is just rank silliness. Still, I guarantee that that rank silliness will be a product, it will be sold and it will be bought.

For a big market with small bets (ads) this makes sense. For a small market with big bets (employment) it doesn’t make sense, to me at least.

I think you make the mistake of not thinking in statistics.

It doesn't matter if it's not always right, it only matters if it's better than what they have at the moment (ie: almost nothing).

Like the quoted bank said in the article, improving retention by 1% can lower costs by 100 mil per year. Offering a raise to people polishing their LinkedIn page sounds like low hanging fruit to me.

And of course, employees will try to game the system, which will become smarter and so on.

A statistician went duck hunting. His first shot was a foot high. His second shot was a foot low. When asked about it he said, "On average, that's a dead duck."

As I said, there are markets, large markets like ad auctions, where this approach makes sense. But for HR, I'd have to see this being a demonstrable success story before I'd touch it. I think there are better and easier approaches to retention like treating your employees well rather than looking at a statistics dashboard and following its sage if soul-less advice.

I'm not willing to take that chance. There are enough shitheads in management that would see something like this and pounce on it, to where it might become an actual problem for lots of people.

Remember, we live in a world where coffee shops and Jimmy Johns (sandwich shop) decided that they needed to subject their employees to pretty draconian non-compete agreements.

What HiQ are doing will be illegal in Britain when the new data protection regulations are introduced in May 2018.

You will no longer be allowed to collect and store personal data without consent.

Well now it is time to delete my LinkedIn account. Their terms of service say that they would not allow people to steal my information and that my information is my own. I have never agreed for my info to be resold by HiQ. Now that it is illegal for LinkedIn to protect me from HiQ's theft the only option I have is to drop LinkedIn.

I believe it's done only on publicly visible accounts; modify your privacy settings _if_ you don't want to go all the way to deactivation.

Now I'm just glad I never signed up for LinkedIn or any of these. Can't trust any data repository that's not my own.

You know, I think LinkedIn actually has a point that blocking certain scrapers is in the interest of their users' privacy. Of course, they are really just trying to stifle a competitor (glad the judge called them out on it) but I still think they should have a right to block scrapers at will.

The judge's argument is fine, if HiQ had to contact every profile owner and ask for their consent to store and use their data. Otherwise, how does an individual know where their information is going?

It's interesting to see EPIC and EFF on different sides of the same issue. I don't feel like I've ever seen that happen.

I wonder if it might've been best for the EFF to just stay out of this one entirely, because it seems like sticking up for either side is defending and normalizing a shitty proposition. I see their point that LinkedIn is abusing the CFAA to stop competitive scrapers, but on the other hand, evil as LinkedIn may be, at least the users on it agreed to its terms. If I sign up for LinkedIn and give them my data, I agreed to that. If HiQ slurps it up, I'm not consenting to whatever sinister things they decide to do with it. That feels wrong and not something the EFF should be sticking up for.

(Disclosure: I used to work for Microsoft, which now owns LinkedIn, although I like to think I'd have this opinion either way)

Yes, I don't quite get EFF's point.

So they are saying that anybody should be allowed to scrape anything found online, and use that data for any purpose?

In this case it can be argued they use the data against the very person the data is about.

So they wouldn't object if banks or credit companies would use the data to reject non-desirable people.

My armchair impression is that the EFF doesn't like the precedent this might set. I think you are correct in them wanting individuals to be able to scrape or otherwise mechanically process anything which they can normally view.

The real chilling effect is from the lack of actual privacy on the data itself, and the existing asymmetry of power in the employer / employee model as we know it today.

Generally, the EFF fights for freedoms to use digital technology. Preventing bots inhibits that freedom, so it does make sense.

I am so thankful I don't have a LinkedIn, Facebook, etc. account to mine for data.

Since they can fingerprint your web travels to every page that has Google Analytics or a share widget, there's no need for you to create an account.

I use Ghostery and noscript and an ad blocker, so hopefully there isn't too much of that happening.

Micro images, and browser fingerprinting, and IP location are widely used as well. Even if you're using lynx via VPN you're still relatively identifiable since not too many users do that. Good luck!
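To make the "not too many users do that" point concrete, here is a toy sketch. The attribute tuples (browser, screen size, language) are invented for illustration and real fingerprinting uses far more signals; the idea is only that the fewer visitors share your exact combination, the easier you are to single out.

```python
from collections import Counter


def anonymity_set_sizes(visitors):
    """For each visitor, return how many visitors share the same attributes.

    `visitors` is a list of hashable attribute tuples. A size of 1 means the
    combination is unique in this sample, so it identifies that visitor.
    """
    counts = Counter(visitors)
    return [counts[v] for v in visitors]


if __name__ == "__main__":
    # Hypothetical sample: three common setups and one unusual one.
    sample = [
        ("Firefox", "1920x1080", "en-US"),
        ("Firefox", "1920x1080", "en-US"),
        ("Firefox", "1920x1080", "en-US"),
        ("Lynx", "80x24", "en-US"),
    ]
    for visitor, size in zip(sample, anonymity_set_sizes(sample)):
        print(visitor, "shared by", size)
```

Blocking scripts and trackers removes some signals but can shrink the group you blend into, which is the trade-off being described.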

I assure you that Facebook knows all about you whether you have an account or not.

Highly doubtful it has any idea of even gender and age. They have a broad idea of location and perhaps a pretty laughable attempt at "interests". Absolutely guaranteed to be nothing of value though.

Oh they have much more than this.

Messenger asks nicely to upload your whole address book, and since half your college buddies clicked yes they know when and where you studied, and who you knew, and your parents' landline, thus some idea of their location (and how wealthy it is). Ditto your colleagues at each place you worked, who all still have your old corporate email, and some have newer info...

And that's before anyone posts anything on facebook.

And yet they can't show even a semi-related ad if their entire existence counted on it.

They have the data, yes. But they will ruin society before having any idea on how to use it.

Most importantly, they cannot correlate that with my IP or my browsing history. So that information is beyond useless when trying to track me and only slightly useful for researching the society in which I grew up (which facebook isn't interested in anyway).

This is also true. I don't see how they're making money off their extensive knowledge about me.

When I make the mistake of buying something online in a browser logged into facebook, then I get ads for a week trying to sell me what I just bought... this does not lead to me buying another one! And all this strategy seems to need is basic cookies, not a shadow profile.

> Highly doubtful has any idea of even gender and age.

That depends. If they know that you visit websites which are also visited a lot by other males aged 20 to 30, then they can reasonably assume that you are a male aged 20 to 30.

Big woop, what are they going to do with that information? They can "target" me with an ad meant for males in their 20-30s (which again, is just a very crude guess), which is guaranteed to be a much worse fit than just targeting the content of that site.

No tracking needed. Much better accuracy and return.

Big woop? You just said it's highly doubtful they have any idea about your demographics. I explained how they can easily do that, and much more, like knowing your race and income bracket.

The days of just targeting you based on the content of the site are over; now demographics and your search/click history play a role too.

Because their accuracy is abysmal.

Maybe they have a few percent better accuracy than guessing at random. And that might be worth a few billions. But on an individual level they just don't have the slightest clue.

edit: State of the art tracking seems to be "look at what products user x has looked at, show the exact same product everywhere after he/she probably bought it".

Now if you do have and use a facebook account that is another thing.

> Because their accuracy is abysmal.

Can you give some sources for your claim?

Not really, but it is kind of obvious. I have a very weird setup which would be quite easy to fingerprint (probably a handful in the world and likely only one in my town uses my display setup, for one). I am the only one using my devices, no one else is using my IP (and I don't switch IPs either), and I don't use any ad-blockers. Yet ad networks haven't got a clue, and often even get gender wrong. (And as a male interested in tech, that should probably be the easiest scenario imaginable.)

I guess the most tangible data point is when google allowed you to look at what google thought of you (gender, age, interests etc.) and how bad they are at it unless you actually use their services.

It is obvious they would be far better off targeting sites than users without accounts. But that is not where their competitive edge is, so they downplay that. Also, I guess they are big enough to not care about people without accounts.

Don't forget shadow profiles based on data like others mentioning you on pictures etc.

They can not correlate that data with any form of identity with any accuracy.

They can't even detect fake and duplicate accounts, which would be trivial in comparison.

> They can't even detect fake and duplicate accounts, which would be trivial in comparison.

They almost certainly can, but due to capitalism, they have a business interest in not really removing these accounts.

That would be incredibly short-sighted given the problem such accounts are for them. And the fact that they go out of their way to ruin real accounts in search of them kind of reveals that they can't do it.

Much like the twitter bots to them, lip-service will be paid, action will be taken on some low hanging fruit so they look like they're working on the problem, but they will not make a concerted, legitimate effort to clean them up.

How do they get that information? My family and friends don't really use it much (and to the best of my knowledge I've never been written about or been in any photographs that have been posted). I use things like Ghostery, noscript, ad block, etc. to keep tracking to a minimum. I don't search Google directly but only through duckduckgo. I'd be interested in hearing what other means they have of finding out information about me so I can block those as well if possible.

There are very few perfectly anonymous Internet users left, so that makes you a member of a quite elite group. Thus, you have become identifiable by your extensive efforts to be anonymized.

Well, maybe not all about you, but they have a good guess.

In a few years not having a meaningful Facebook profile might be suspicious in itself.

But you're missing out on all the valuable Services they provide!!! /s

There's a positive framing of the idea of mining (or collecting) information about yourself and selling it to your boss: it can help get you raises and promotions.

After all, when you ask for a raise you're in a wonderful negotiating position when your value is clearly quantified. When you mine/collect the information yourself, you get to choose how and when you present it. Thus you are selling the information to your boss and reaping the rewards yourself.

At first, it will help you get raises. Then over time as everyone does it, you won't get a raise unless you participate. Over time, you won't get hired at all unless you participate.

Why does HiQ have any right to my data in the first place?

Because people self-submitted their own information to LinkedIn with the desire for it to be publicly accessible.

And why do you think people want their own information to be publicly accessible?

My public information on LinkedIn is public - to market myself. I didn't make it public so that another company could store it in their own databases, analyse it and sell it to "my boss".

This is a discussion touched upon in the article: "What were the public squares and private rooms of the web? Who got to determine access? Should data be protected as speech?"

Anyway, as mentioned in another comment, the highly unethical work HiQ is doing will be hard to capitalize on with the GDPR coming up.

Where in that did I give permission to HiQ to do this highly unethical thing?

When you explicitly chose to broadcast that information to the public via LinkedIn, which explicitly makes that information available to the general public and does not require you to log in (in contrast to, say, Facebook).

> highly unethical

How is a company leveraging intentionally public data unethical? That's quite a stretch. Stop making your information public if you don't want people or companies to look at it. It's as simple as that.

If you want to talk about unethical practices, let's talk about financial companies selling your transaction data to advertisers and insurance agencies, or social networking platforms re-purposing the content of your direct messages, or ISPs selling your browsing history. Those are actual things which have a reasonable expectation of privacy. These are actual instances of problematic PII abuse and what laws like GDPR are intending to curb.

I think LinkedIn's mistake was letting the lawyers fight over it. They could have just blocked the bots with a captcha and called it a day.

Captchas are easy to beat these days, even fancy reCaptchas. They make the process harder and a bit more expensive, but they don't solve the problem.

The problem is that LinkedIn wanted to let some bots in (e.g. Google) but not competitors'.


My Boss is going to find out I volunteer at soup kitchens and help old ladies across the street. Oh, and I also love his favorite band and agree with him politically and didn't even have to gratuitously mention this around the office like a big suck up. A promotion is just around the corner!

Wasn't it just last week we saw articles on the front page almost every day about how we can't trust information from the Internet because Facebook and Google are big lying spying monopolies and sell clicks to anyone and don't responsibly curate like they are supposed to and we should all go back to reading the newspaper to get the real facts on life? Ya, pretty sure that was the gist of them. Anyway, hopefully my boss didn't read any of those. We'll just leave him believing that data mining and scraping and twitter sentiment analysis have super secret powers to reveal the hidden truth about the world.

This is stupid. While my boss has vested interest in my data, I am the only one that can benefit or change because of it.

If your employer knows you are looking for other opportunities, they could stop giving you projects and start transitioning you before you are ready, or just fire you.

That's a benefit to the employer, that probably is not a benefit to you, since you would likely want to quit on your own terms.

Another quick example is knowing that your employees are researching how to unionize, and thus knowing to bring in anti-union resources.

I'm sure there are more.

Or even create a self-fulfilling prophecy. What if someone is listed as "may leave" when they have no intent to do so? If the employer then starts freezing the employee out, that could lead to the employee eventually leaving, even if they didn't originally want to.

The goal is to figure out if you're close to quitting so you can be preemptively fired. This hands the final key away: That while the corporation without notice or reason could fire you, you could always show the same lack of loyalty and quit without notice. The goal is to deny the human that small balance.
