I Got Access to My Secret Consumer Score (nytimes.com)
884 points by pseudolus on Nov 4, 2019 | 325 comments

There is a scene at the beginning of the movie Brazil [1] where a literal bug falls into a teletype and causes an error that starts the chain reaction framing the events of the movie. The bug is a metaphor in the movie, just some nice visual story-telling to indicate how errors in automated processes can have unintended consequences.

I honestly fear that entire lives will be ruined by AI systems mis-analyzing some data and locking people out from education, work, credit and health opportunities. For some reason, which will be mostly inexplicable to even the engineers that trained the ML system, people will get denied or flagged for spurious reasons.

As we automate everything this is inevitable. We're actively creating digital gatekeepers. 99% of the time you won't even realize it has happened to you nor would you have any recourse. It will be as innocent as an application you don't get a response from, or a simple generic "sorry" email. Then the brick wall of literally no customer service, or automated customer service that refuses to escalate you to a human tier. Then maybe, if you are lucky, a clueless customer service rep who compassionately explains there is nothing they can do and they wish you the best with your continued search. Fair enough: their TOS likely allows them to deny service to anyone for any reason. Why would they waste the effort to figure out why their multi-million dollar system targeted you as an edge case? It works for 98% of cases which is probably good enough for them.

Brazil is a fictionalization that imagines an extreme case and the events of the movie get quite dark. The reality will be more mundane but IMO just as insidious. In the not very distant future AI will be choosing who is healthy and who is rich.

1. https://www.youtube.com/watch?v=XGge4rj4v_Y

We’re already seeing the impacts of these kinds of systems affecting many of the most vulnerable in the Australian community. I’m referring to the Dept. of Human Services’ punitive “robodebt” system.


> AI systems mis-analyzing some data and locking people out from education, work, credit and health opportunities.

College admissions? Resume screens? Mortgage applications?

We already have that at scale. There wouldn't be so much money in gaming those things if the AI wasn't doing such a poor job. You likely just happen to be someone that isn't on the wrong side of it.

> There wouldn't be so much money in gaming those things if the AI wasn't doing such a poor job.

This isn't true at all. When the process works well, those who it excludes often attempt to spend money to subvert it. There's no reason that interfering with the process should only happen when it's working poorly.

A simple example is the recent college admissions scandal. You often hear this scandal demonstrates that college admissions are just a way to let in rich people, but actually it demonstrates the opposite -- that the normal admissions process had (presumably correctly?) kept these people out and so they had to pay to subvert it.

Of course, you're not claiming that these things are just ways of separating the rich from the rest; just that they're messed up in some way or another. Which, if not very specific, seems broadly accurate. But the following claim isn't -- people on the losing end of a judgment are going to do their best to subvert that judgment whether it's correct or not, so one cannot infer much of anything about the judgment's correctness based on whether there's then an attempt at subverting it.

Call AI for what it is.

Garbage in, garbage out.


But is this really about AI? Almost all aspects of our lives are already controlled by the enormously complex soulless machines of bureaucracy we call firms and governments. Is there any essential difference between a computer bug and a clerical error? If anything, I would assume the former is easier to fix.

But that’s not how it works, it’s not one in place of another; it’s one that depends on another.

If your physical bureaucratic entity delegates decision making to an AI, you have no hope of redress unless due process was mandated.

If you cut the bureaucratic middleman then you have to build an AI that can question itself and correct errors.

AI will never do that. So what you get instead is negligence because a computer makes a mathematical computation and the human element treats it as infallible.

I agree. I think the real question (which is much broader and beyond the scope of OP) is how does humanity have to operate differently with every order of magnitude it grows in size? Humanity seems to be transitioning from single digit billions to crossing the double digit billions barrier in the foreseeable future. Until 1900 humanity had been on the order of a billion people, and we are currently transitioning towards 10x as many people sustaining the planet.

I do not mean this to be about finite resources and our planet. I am referring to the way that organizations require more layers of bureaucracy as they grow. A rule of thumb is for every 10x in people, you have another layer of management. So what will this extra layer of management be for people, not just from a government perspective but also a capitalistic corporation perspective? How do Google, Facebook, Apple, Stripe, Amazon, and all of those other companies handle so many customers? How much do they use automation, how many of their rules are archaic, how hard is it to challenge the system if you think there was an error?

I am not convinced AI can solve many of the fundamental problems humanity is facing - and I think overpopulation might actually be the problem. Modern capitalism has conditioned us to believe "growth is good" but growth for the sake of growth is no different from cancer.

The whole purpose of humans inside bureaucracy is to provide humanity, to subvert/override the systems when that makes sense. If they don't do that, and instead take a wage as a motorized application-stamping arm, then they are failing horribly at their job.

You're right: this could very well be about Modernist systems incorporating racist, sexist, ableist paradigms into their rote performance.

I'm not sure we can solve this as it relates to AI/ML if we can't even solve it as it exists in human society today. Various heuristics are used, knowingly or not, to gate access to employment, education, healthcare, etc.

As an example I had to drop out of education (in the US) due to mental health issues. 6-7 years later, I'm still having problems getting back in as I wasn't able to get the problems documented in time. I am also unable to get any money, as most of my classmates were, despite my far above average performance pre-uni.

>As we automate everything this is inevitable.

Which is why it's important to ask the question "should we" instead of just "can we".

Even more important is not dividing people into groups that we treat differently. Everyone should have the same education, work, credit and health opportunities. If we can free ourselves from the desire to segregate and separate people in the name of progress, whether or not an "AI" guesses wrong about you is irrelevant.

It comes down to ego... people have to accept that no one can predict someone's potential or value to the world with enough certainty to make restricting people's opportunities worth it.

Humans are not much better gatekeepers. Studies have shown that, for example, having a certain name can be a serious disadvantage when applying for work.

Algorithms can be bad as well, but at least you can look into them to understand the reasons.

Maybe. They can also be inscrutable.

And that's just basic algorithms and regular human-written code.

Then you've got ML systems which are a whole 'nother level of inscrutable.

Or buggy. We're already running plenty of buggy code, but at least the errors can usually be found. If an AI overflows on some intermediate calculation sometimes because of implementation error I'm almost sure nobody will find that.
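To make the overflow point concrete, here is a minimal sketch of how that kind of bug could hide. The scoring formula, the function names and the 0.5 threshold are all made up for illustration; nothing here is taken from a real system. The intermediate products overflow to inf without raising, inf / inf becomes nan, and every comparison against nan is False, so the applicant is simply rejected with no error anywhere.

```python
import math


def naive_score(features):
    """Hypothetical eligibility score: prod(x) / prod(x + 1)."""
    numerator = 1.0
    denominator = 1.0
    for x in features:
        numerator *= x          # float multiplication overflows to inf silently
        denominator *= x + 1.0  # same problem here
    return numerator / denominator  # inf / inf quietly becomes nan


def stable_score(features):
    """Same quantity, computed as a product of per-feature ratios."""
    score = 1.0
    for x in features:
        score *= x / (x + 1.0)  # each ratio stays between 0 and 1
    return score


def is_approved(score, threshold=0.5):
    """Hypothetical gate: any comparison with nan evaluates to False."""
    return score >= threshold


extreme = [1e200, 1e200]
print(naive_score(extreme), is_approved(naive_score(extreme)))
print(stable_score(extreme), is_approved(stable_score(extreme)))
```

On ordinary inputs the two functions agree, which is why a test suite built from typical cases would likely never notice the difference.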

This has already happened in Australia. The AI that runs the government equivalent of social security and the tax office has been programmed to send debt notices to tens of thousands of people on welfare. The Aussies call it "Robodebt".

It's already happening to hundreds of business owners who sell their own products on Amazon. Anyone who had a successful FBA business is now diversifying away from Amazon as it seems almost impossible to resolve mistakes caused by their algos.

Will these systems ever forgive or forget, like humans?

"The chains of tormented mankind are made out of red tape" - If Kafka were to live today, he'd write about AI bureaucracy.

We need to stop that future don’t we?

I’m curious what sort of due diligence these companies must do to authenticate you as a person prior to satisfying an information retrieval request. Given that the exchange is entirely digital, it seems plausible that there are bad actors who would pose as someone else to gain access to their personal information. What sort of liability does one of these data controllers bear when they fail to properly authenticate a person prior to handing over all their data? Is it limited to tort liability, in which there needs to be proof that the transgression ultimately led to some particular damage? Given that this data is being traded on the free market, what’s to stop abusive employers, ex-spouses and criminals from exploiting this information?

I suspect we as a society need new legislation to deal with these sorts of issues. Before, much of this was incredibly difficult to impossible, so legislation and regulation were entirely avoidable and the relatively rare occurrences could be dealt with on a case-by-case basis.

At this point, technology has enabled this sort of behavior at mass scale, now revealing far more personal and useful information about individuals.

A feasible model to work from may be to look at the healthcare industry and HIPAA requirements/liabilities and adapt as needed. Certainly not perfect but it's a good starting point for widespread data laws.

The question is, will our representatives actually give teeth to real data protection legislation (not a facade with no teeth only enacted by name) in the US or are they so deeply in bed with industry that they'll protect business rights over real people who suffer real direct damages?

So I originally studied chemical engineering, and one of the common catchphrases is, "regulations are written in blood." The reason there's a requirement to put a pressure relief valve on that tank is because at some point a tank like it didn't have one and it blew up and killed people. However, safety in chemical engineering has been evolving over centuries as we have understood more about the dangers and consequences of dealing with chemical manufacturing.

For software, it feels like we're only at the beginning of understanding the dangers and consequences of the software, business models, and technology our industry has created. Unfortunately, if it follows the same path as other engineering disciplines, we likely will experience decades or centuries of bloodshed before it is regulated appropriately.


But that's not an interpretation of that phrase at all.

It makes me think of signing inescapable contracts with demons, which seem convenient but ultimately doom you for eternity.

Nothing about 'regulations are written in blood' is about convenience. Regulations specifically exist to make things less convenient. The reason such regulations exist is because people did the convenient thing and it killed somebody.

they are convenient solutions



What the hell are you talking about? I'm not libertarian or ranting, I'm saying this is a complex issue and the "LETS REGULATE IT" feels premature given how complex and subtle the issue is, and how hard laws are to change once they're on the books.

Methinks you're projecting more than a little bit. Also your comment is some of the most toxic I've seen on HN in a little while, and I'm not exactly a ray of sunshine on here...

Definitely getting a flag from me, and I think that might be a first.

> I'm saying this is a complex issue and the "LETS REGULATE IT" feels premature given how complex and subtle the issue is, and how hard laws are to change once they're on the books.

There is a difference between "This specific regulation fails to take X into account" or even "This is a complex issue which requires carefully crafted regulation" and your stance which seems to be that regulation is impossible or undesirable due to the complexity of the issue or because it might be annoying to change later. That very much suggests your problem is with regulation itself.

Just out of curiosity, if you think regulating this is a bad idea, what do you propose to fix the problem? Appealing to the good will of companies who would gladly exploit us to make money?

Why do you think I said regulation is impossible or undesirable?

I suppose it was somewhere between when you said that government shouldn't regulate or try to fix the problem until they have a better understanding of it ("this is a complex issue and the "LETS REGULATE IT" feels premature given how complex and subtle the issue is...We need to understand the full scope of the problem, rather than try to fix what isn't fully grasped") and that you don't think the government is capable of doing exactly that. ("I have very little faith in the US government to accurately grasp the subtleties and nuances of the advertising and privacy problems")

I acknowledge that you may have just worded things poorly, or intended your words to mean something else, which is why I prefaced my question with "If".

If you actually do think the US government should regulate this, how should that be done given your concerns of their inability to understand the issue and taking into account the fact that any mistakes they make will take effort to correct once it's been put into law?

Are you open to the possibility that you treated me overly harshly, and that I've never actually taken the position you seem to badly want me to have taken? Or is that beyond the pale, and this is exclusively my fault?

To try and contribute to this conversation a small amount, it makes the most sense to me to establish a Professional Engineer like system for the software industry. I don't know all of the details about how these systems work, but a friend of mine is a structural engineer and as he's described it to me it creates a personal ethical obligation that currently feels entirely absent from the software industry.

This ties the regulation to the industry, lets the experts decide what is and isn't okay, while also involving personal ethics and keeping individuals accountable for their decisions.

Keep getting called by a spam bot? A licensed engineer can be tied to that system, and can be punished.

A self driving car had a bug that couldn't tell the difference between a plastic bag and a child, forcing the vehicle into a concrete barrier? Someone stamped that code, and is now accountable.

Is a company using your tracking data to raise your medical bills based on your online food order history? That company's engineers are liable, you can sue them personally.

Again, not sure of the details about how this works in the construction industry, but when you put professionals at personal risk, they tend to care more about the outcomes, and it doesn't end up forcing the US government (specifically legislators) to understand all the details.

I'm fully open to the possibility that I've misinterpreted some of what you've said.

I wouldn't mind seeing that kind of system applied to some software companies. I'm not sure that fully solves the problem, but I really do like the idea of having someone willing to be held accountable for problems.

I think one issue with this would be that companies who keep their activity opaque enough are highly unlikely to be caught by users. A major data leak or a hacked system might be detectable, but a backroom deal to share data with a medical or insurance company would likely go undetected without a whistleblower to expose what was happening. If a bridge collapses, or a fire starts, or a floor sinks, it's a little easier to see when building/fire codes weren't followed.

I think a system like this would be best used when some rules are already in place for what a company can and cannot do. Then a company's certified security/privacy person would be responsible for making sure everything was done in full compliance with whatever guidelines were established. This would probably be needed anyway if you tried to sue the company's engineers later. If nothing they do with your data is actually illegal those lawsuits won't get very far.

I do agree that compliance checking and having that level of accountability could help developers rein in marketing teams, greedy shareholders, and stupid managers who put pressure on them to add tracking and anti-consumer code to their products.

I do worry about what it would mean for small startups, volunteer/personal projects, and single developers. It might be a more secure world if anyone who wants to write an app can't just slap a boilerplate notice that they aren't responsible for anything if you choose to use it, but my guess is that there might also be a lot fewer apps.

That's what regulation does, it restricts the players in a field to the ones who have the resources to follow those regulations. It is definitely a limiting factor on an industry, but I thought we were sort of assuming that the rampant "anyone can play" nature of software was the source of many of the industry's problems right now.

My point is that if someone has to do it, I would A) want the industry to police itself (let me finish) and B) would want individuals to be unable to hide behind corporations, to the extent that is reasonably possible. These together could, I believe, make more of a dent in the problems in the software industry today than any one specific topic of legislation (e.g. privacy laws or digital advertising laws) could.

Imagine if you could lose your software license for writing adware! Would it stop everyone/everything? No, definitely not. Would it give your average corporate software engineer a leg to stand on when they try to say no to their bosses when the business wants to start selling customer data unethically? Hell yes!

"I could lose my license if I add that feature." <- huge impact.

I mean, you saw "regulations are written in blood because people died and resulted in them getting passed" and thought a proper response was "let's not have any regulations at all because they're hard to get rid of and slow business down" so you're at least taking the libertarian position, regardless of whether you identify as a libertarian.

And no, nobody else has ever interpreted "regulations are written in blood" in that fashion, nor is that a valid interpretation of that saying. That is in fact the opposite of what the saying means.

There are wrong answers to questions. I understand that it may be uncomfortable to be told that, but it doesn't change the fact that nobody else ever interpreted that saying that way, nor does it really make any sense to interpret it that way.

Your jumping into semantic arguments along the lines of "well, I said it, therefore it's a viewpoint" is especially unproductive to the overall discussion. Like sure, but (a) that's not relevant, and (b) not all viewpoints are good; it's not a valid viewpoint just because it's a viewpoint. It's also where you start seriously leaning off into sealioning/tone argument instead of, you know, discussing why regulations are written in blood.

Hope this helps, I'm trying to keep this as even-keeled and matter-of-fact as possible - which certainly poses a challenge once you've poisoned the well by throwing around accusations of toxicity. That's a burden on me, everything I argue now has to be sugarcoated lest it further reinforce you as being the victim, which is of course the whole reason you dove into tone arguments.

That is part of the problem with "no offending anybody" type rules - it becomes very easy to cry "toxic" and play the victim and that lowers the quality of discourse, because nobody's opinion can ever be wrong lest they get offended.

Honestly I didn't read this past the first two words, because I don't think a continued conversation with you will be productive.

Take care, have a good day!

Not familiar with that interpretation, though there is another similar one that refers to what you mentioned: “written in stone”

>will our representatives actually give teeth to real data protection legislation

No. The majority of them don't even understand what data is being collected, how it can be used, or the technology behind any of it. Until the people in office change, the idea that our representatives can or will protect us is toothless and spineless.

Yes, because that's what they are - representatives. Of a general public.

The only way to make these chameleons (pretend to) care about an issue is to make the general public (aka voters, source of their wellbeing) care about the issue.

What trouble Congress is in! They have to be experts in law, warfare, every industry from oil refining to adtech, science (at least as far as funding it goes), ecology, economics, medicine...

In fact I would say that it's impossible to expect anyone to understand enough to know what they're legislating. Aside from being a powerful libertarian argument against the excessive involvement of government, I think it shows that we have to figure out a way to work around the fact that congressmen don't understand everything that they're in charge of, rather than trying to remedy it.

Congress used to have an office which would inform them of how things work, but unsurprisingly, proceeded to dismantle it. https://en.wikipedia.org/wiki/Office_of_Technology_Assessmen...

This is why members of Congress have staffers and various groups like http://www.loc.gov/crsinfo/ which have subject-matter experts on staff to offer advice and research arbitrary topics of legislative interest.

That's a cop-out. Data use and technology are major issues that are required understanding going forward. Understanding the oil industry is not at all the same as having a basic understanding of technology. Technology usage is only going to grow in scale.

A quick refresher on the details of how congress actually goes about their day-to-day plus the roles of the various permanent government and de facto government organs, shows "expertise doesn't scale" isn't really a problem. If it was, the whole thing would have come toppling down ages ago. So the libertarian argument is only "it's hard to do well", not "it's impossible".

I'll leave you to find your own references to get a detailed picture, but in short two basic forms of organizing do a lot of the work: big teams/staff for each congressman to help out, and special interest committees and caucuses. On top of that at least some take the idea of "representation" somewhat seriously, so if they need to know about a topic they're not experienced in, they tend to meet with supposed experts in those topics (which can come from a dedicated pool, lobbyists, state university department, and occasionally private industry owners or high level employees) for information and sometimes policy recommendations.

> I suspect we as a society need new legislation to deal with these sorts of issues.

Why not ban targeted advertising altogether?

Because it has value, both to the advertiser and to the target, compared to un-targeted advertising. Advertising is an attempt to transfer information. Targeted ads means that the information is more likely to be relevant, whatever its other downsides. Let's not throw out that baby with the bath water.

"Advertising is an attempt to transfer information" is deeply misguided unless you adjust that to say "transfer very biased information". Advertising is an attempt to influence one party to give their money to another party, generally through manipulation and guile.

Targeted advertising is like the difference between spam and spear phishing. The person being advertised to rarely (if ever) benefits.

> The person being advertised to rarely (if ever) benefits.

Instagram has gotten so good at targeting me that I probably click on at least 1/2 the ads because I find them interesting. I've purchased things because of those clicks too, and have enjoyed the purchases. Things I would have never known about were it not for the ads.

I benefit if Youtube shows me advertisements for desk chairs and lathes instead of beauty products and kids toys. Targeted advertisements help me discover new products and product categories, while untargeted advertising just wastes my time. This is in essence no different than companies buying advertising time in specific TV shows with a known audience.

Of course the abuses are also plentiful and dangerous. We probably do need a lot of regulation. But in general targeted advertisement is a useful thing to have.

The equivalent of buying advertising time in specific TV shows with a known audience is... buying adverts on specific Youtube channels with a known audience. Funnily, a lot of Youtubers now perform advertising of this sort - I'm sure you've seen the VPN ads.

This, like TV, doesn't require tracking of individual people who haven't consented to such.

> This, like TV, doesn't require tracking of individual people who haven't consented to such.

That's a stretch. How do you think they determine any information about the "known" audience? Obviously, they do all sorts of research to determine more about the audiences watching certain shows and channels, likely by sifting through social media, show reviews (IMDB, etc), and elsewhere to determine who's watching what shows, then looking into their profiles to determine their interests. Aggregate that data, find what they all have mostly in common, and cater to those interests. It's just a more generalized and difficult process of the same thing, and eventually it's going to become more narrowed and specific to individual targets.

The difference is that if I don’t want to let anyone know that I watch My Little Pony... I don’t have to! There’s no little black box connected to my TV that tells an advertiser that I’m watching it without asking me.

I can also have a discussion with my friends about My Little Pony and how amazing the last episode was without an algorithm picking that up, unless I publish it to the world or explicitly send it to the advertiser.

I understand your point, and I agree that it's easier to keep your television viewing habits hidden than your internet viewing habits, for obvious reasons, but I'm just pointing out that there are still ways to aggregate data to the point where they are able to determine, with significant accuracy, which ads are the most effective based on demographics, locality, cross-referencing social media, web traffic, consumer surveys, department store sales information, and probably tons of other details.

They benefit when they use valuable services for free, which is paid by targeted ads.

“Targeted advertising” is just a euphemism for the secretive creation and sale of dossiers on unwitting Internet users. It should already be illegal under existing laws.

In most cases, this isn't true. The dossiers are maintained very closely by Facebook, Google, etc. in order to leverage their knowledge of a target to produce the most effective targeted ads. If they sold the dossiers, they would lose the continued revenue.

No advertising is really untargeted. Even billboards tend to carry ads for the type of driver that passes by them and broadcast TV has ads for the type of person that watches that show.

Ads targeted to an individual without that individual's consent should probably be banned. In addition, the law should make it clear that when a company does hold data about individuals, the individual should have some rights to that data including the ability to block the sharing or transfer of that data.

So, when Google buys FitBit, every user of FitBit should have to explicitly opt-in to their data being transferred to Google.

Most things have value by some definition. The question is whether the sum is positive. I'm not sure that is the case for targeted advertising.

Targeting doesn't work. There is no baby.

I recently bought a set of clamps on Amazon to join two shipping containers together. Since then I've seen a bunch of related ads, such as for shipping container vents, shelving, etc. I clicked on a couple of those ads, and while I didn't buy anything, I did learn some useful information. I probably would have learned less, if anything, from untargeted ads in those same slots.

> while I didn't buy anything

And so targeting didn't work!

Yes, "targeting" can put stuff in front of your eyeballs that may occasionally have marginal utility, but for sales? Nu uh.

In the baby/bathwater metaphor, you saw some ads for strollers. Still no baby.

It didn't work for the advertiser, but it did for me. I learned about options and prices for container accessories, which will help me make future choices. I would not have been better off if those ads had been less interesting to me.

> Advertising is an attempt to transfer information.

If you want to inform me, just give me a website I can visit where I can enter my needs and provide perhaps some context information, and which deletes/forgets this information when I leave. There is no need AT ALL to figure this info out behind my back.

The use-case isn’t advertising, it’s QoS. So you collect the data and give your best customers fast access, or maybe you balance it out: your most complaining customers get preferential treatment and the most reasonable wait a little longer on the line because you know they won’t complain.

Or, framing it more broadly: mass personalization implies a worse deal on average for everyone on the atomized/consumer/low-power (individual) side, because the high-power participants (companies) will optimize their outcomes at the expense of individuals.

Isn't GDPR enforcement in this vein? You could build something similar in North America.

Look at how the spam act played out: it allows implicit opt-in, compared to Europe’s explicit opt-in.

I believe US govt will always be at the mercy of corporations.

Now some things are great. Like CCPA, California’s version of GDPR. But getting it accepted throughout the US is a mammoth task.

I have hope though.

I've been wondering: if there was a world government, corporations would probably reach a point in which they are unstoppable from a law point of view. The bigger the machine, the harder to steer it in a good direction.

This is already a big (and largely unaddressed) problem with the big 3 CRAs: if you know enough about a person you can very easily request their credit report and get the keys to the kingdom (so to speak) - everything you didn't already know.

I mean, I already request credit reports for my husband without issue, for example (with his permission! - he finds it much easier to just ask me to do those things for him rather than doing them himself).

In this case, since email address is part of the report they could only send the report to the email address on file for "security," which would be a big improvement over what the big three CRAs are doing with annualcreditreport.com.

CRAs are a scam from top to bottom.

A bank makes a lot of money, but in theory, it's doing an important job which benefits society. That job is independently assessing creditworthiness. Of course, it's hard to assess creditworthiness if you don't know if someone is making a lot of loans at different places. So, there needs to be a system for credit monitoring. But credit monitoring is not credit rating. Credit rating is the one job a bank is supposed to do. Letting someone else do it undermines the whole purpose of the independent financial system. We might as well just dissolve the banks and move to a centralized planned economy if that's what we're doing, so that at least the centralized rating agencies will be democratically controlled.

So, to begin with, CRAs shouldn't exist and undermine the basic purpose of the financial system. On top of that, they are incredibly incompetent and corrupt as seen by the Equifax breach. It was clear in the early 2000s that the old system in which people would present a few pieces of relatively obscure personal identity to open a line of credit was no longer workable because the data was now subject to trivial duplication. Instead of fixing this, the industry created the concept of "identity theft" in order to falsely shift blame onto an unrelated third party.

I "had my identity stolen" a few years ago. The event had nothing to do with me, so all of the language around this is wrong. What actually happened was first a criminal learned some information about me, then Verizon chose to give the criminal a line of credit on a cellphone, then the CRAs reported that I was profligate to anyone who asked. Saying "my identity" was stolen makes it seem like I was somehow a party to any of this. "My identity" is not a property of mine; it is a property of the reliability of the CRAs' data. What actually happened was the CRAs had their data polluted by the combination of a criminal and lax identity checking at Verizon, and then the various guilty parties forced me to do their data cleanup for them.

What should have happened in the mid-00s was that the credit monitoring agencies created systems where you can prove your identity to a notary public and get some sort of signed certificate gizmo that you can use to get a cellphone or take out a car loan. But because the whole US financial system is corrupt, it instead outsourced all of the liability onto consumers.

Agreed. The whole idea of "identity theft" stinks of PR lubrication. Your identity cannot be stolen, but "identity theft" is a clever, cynical sleight of hand that obscures what really happened: credit fraud, specifically fraud that is the responsibility of the party issuing credit and the criminal propagating it.

I agree with this idea in principle, but I am unsure of how it would work in practice.

Say you open a credit card, then try to say it wasn't you..... what would a bank need to do to prove it was you? The things they would provide are already the things they have... your signature, your information, etc. What EXTRA bit would they start collecting that would prevent fraud?

Currently, banks ARE on the hook for fraud.... if you dispute fraudulent credit opened on your behalf, they have to eat the cost.

I don't quite see what the difference would be in this alternative world... currently, someone applies for credit, the credit issuer decides it is legit, and issues the credit. That would still be the same. Say it was fraudulent; the fraudster doesn't pay it off, so the issuer tries to collect... at that point, you say "hey, I didn't open this credit!"

Well, the issuer is going to say "yes you did, here is the information I have saying it was you"... how would that be different in your alternative world? Maybe you would require more verification steps... but what? Picture of you holding a sign saying you signed up for the credit? Video? What could you possibly provide that couldn't be faked for fraud? What could the issuer require?

At this point, things are no different than now. The issuer says it was you, you say it wasn't, and then someone has to arbitrate and decide who was correct.

I guess I just don't see how we can do it better (although I would LOVE to do it better!)

It has been done better in other countries and could be done better in the US if there were political will to do it. There's a whole universe of cryptographically signed certificates that we in the US don't use. The hard part is not signing a cert; it's making sure the certs are given to the correct people and having a procedure for when a cert is lost. Estonia has done this well; Korea has done this poorly. But it's quite doable if you have political will for it.

Does that then make the cert a high value target? How do you get people to protect it and use it properly? If it is stolen, does it actually make it HARDER to prove you didn't open the credit line?

This is not a theoretical topic. Other countries have electronic signature systems and you can evaluate how well or poorly they are implemented: https://en.wikipedia.org/wiki/Electronic_signatures_and_law

The issuer and arbitrator, if not the same entity outright, are so far in bed together they should be common-law spouses. Let me explain the two possible outcomes, which depend on socioeconomic factors.

A) you have money to keep a lawyer on it; No problem.

B) you don't have money for a lawyer; Tough shit.

Nothing beyond real unilateral enforcement of the system already in place is required.

> What actually happened was first a criminal learned some information about me, then Verizon chose to give the criminal a line of credit on a cellphone, then the CRAs reported that I was profligate to anyone who asked. Saying "my identity" was stolen makes it seem like I was somehow a party to any of this. "My identity" is not a property of mine; it is a property of the reliability of the CRAs' data. What actually happened was the CRAs had their data polluted by the combination of a criminal and lax identity checking at Verizon, and then the various guilty parties forced me to do their data cleanup for them.

Right -- and really, that should be some class of negligent libel on behalf of the CRAs.

> We might as well just dissolve the banks and move to a centralized planned economy if that's what we're doing, so that at least the centralized rating agencies will be democratically controlled.

In the history of centrally planned economies, never have they been “democratically controlled.” Despite the name, places like the Democratic People's Republic of Wherever are never democratic, nor are they republics.

> because the whole US financial system is corrupt

Is that actually true or just hyperbole?

It's actually true. If you don't believe me, go try to start a bank. Here's a handy link to get you started:


How does your link support your assertion of corruption?

The link doesn't. The experience you will have if you actually go try to start a bank will. I only provided the link to point you to the entrance of the rabbit hole.

I don't see how the fact that the process of becoming a bank is strictly regulated means that it's corrupt?

It doesn't. What makes it corrupt is that banks are not in fact strictly regulated, it only appears that way. But there is no way I can prove that to you in an HN comment. I would have to write a book. I can point you to a lot of circumstantial evidence but there is nothing probative in the public record (see below). The only definitive evidence I have of the corruption in the system is my personal experience, some accounts of which I have published on my blog. But if you don't put credence in my conclusion, then you will likely not put credence in my evidence either, because I can't prove any of it. The only way I can prove it is to invite you to undertake the same journey I did. If you do that, I predict you will learn first-hand the same thing I did: the system is corrupt, and whether or not you succeed in penetrating it will ultimately depend entirely on whether you have the right connections and whether they decide that you can be trusted to toe the line on the unwritten rules, the first of which is that you never talk about the unwritten rules to anyone who is not a member of the club.

Note that I never joined the club, so my theory of the unwritten rules is pure speculation. No one ever sat me down and said, "Look, son, this is how it is..." But I did spend ten years of my life on this, and during that time I accumulated a lot of evidence that I have a very hard time explaining in any other way. It eventually led me to a serious existential crisis.

BTW, it's not just the financial system. Academia is corrupt in much the same way, and in that case I did join the club so I can speak to that with some authority. That experience is one of the things that allowed me to recognize what I was seeing in the financial industry. But both academia and finance are centuries-old industries. They have become very skilled at hiding their corruption from prying eyes, and a big part of the strategy is making it appear that anyone who accuses them of corruption is a crackpot. (Which is, of course, exactly what a crackpot would say, and that, too, is part of the strategy. It's a horrible catch-22.)

So you have to decide whether to believe me or not, whether you think I'm a crackpot or not. Before you jump to a conclusion I invite you to look up my record. My life is pretty well documented on the web.

Having joined academia at one point and seen "how the sausage is made," and subsequently left for ethical reasons that I have no way of using to hold anyone to account, I totally understand this comment.

It is so hard to put into words how these systems are corrupt, because these systems create an enculturation / religion around themselves. By the time you see how the entire system works, you are powerless to simplify the machinations that make that system corrupt (if you even choose to recognize the corruption). You can't "just start an alternative," because the system exists at a local maximum and will crush your alternative or assimilate it into the existing system.

When people are taken advantage of by these secular religions, it is so normal and engrained in the societal fabric that we almost don't have the language to expose the fundamental dishonesty and fraud of these systems. Victims will say that there may be some bad actors at the edges, but on the whole, "this is the way it's supposed to be."

Yes. Exactly.

I think this might come down to how you define corruption.

I define it the way the dictionary does: dishonest or fraudulent conduct by those in power in order to advance their own interests over those of others.

How would you define it?

the magnitude of it matters.

Order a Big Mac - does it look like the ad? Probably not. Drink a Cola - does it feel like your life has turned around? Probably not. Is advertising dishonest? Of course, but we all know that and we learned to deal with it. Is advertising corrupt? I would not say that.

Thus for something to be truly corrupt it needs to go beyond a certain level of illegality.

There are plenty of small banks and credit unions out there thus the point that you cannot open a bank is not quite valid. Are some of the rules onerous? Probably. Are some of the rules unfair and ridiculous? Probably. Does that mean it is corrupt? I don't think so.

> the magnitude of it matters.

The cost to consumers of financial corruption runs into the many billions of dollars.

> There are plenty of small banks and credit unions out there thus the point that you cannot open a bank is not quite valid.

I did not say that you couldn't open a bank. I said that if you tried you would see firsthand evidence of the corruption of the system.

The problem is not that the rules are onerous. The problem is that the rules are not applied evenly and transparently.

> The problem is that the rules are not applied evenly and transparently.

Of course not. Never are. Again, you are not saying much here. Same with the billions of wasted dollars: of course, but that is a natural consequence of dealing with immense scope - it is going to be very inefficient and stupid. Still a far cry from actual corruption.

I feel that people tossing around the word corruption don't really understand what it means, and it is hyperbole that only undercuts the message.

A bit like the Soup Nazi in Seinfeld - he is not really a nazi in any shape or form - don't even mention real nazis in the same context.

> I feel that people tossing around the word corruption don't really understand what it means

I see. So your position is: I "don't really understand what [corruption] means" -- but you do. And because you possess the true understanding and I don't, nothing in my personal experience can possibly be evidence of corruption because you alone possess the true understanding.

Have I got that right?

> > The problem is that the rules are not applied evenly and transparently.

> Of course not. Never are

This is normalization of deviance. It might be true that the rules are never applied evenly and transparently anywhere and never have been, but it is one thing to posit this as a fact, and quite another to dismiss it as being inevitable (and hence acceptable) by saying, "Of course it's that way." No, it's not "of course." It's corruption, not just because the rules are not applied evenly and transparently, but because this is done by a group of powerful people for their own benefit at the expense of everyone else. Its inevitability is a self-fulfilling prophecy. By accepting it, you have made yourself part of the problem.

You can claim that people tossing around corruption don't understand it... but you in the first place don't understand the scenario OP is even describing (as they're unable to provide details, you couldn't possibly be making an accurate judgment). So it is far fetched for you to confidently claim OP is misusing corruption etc. here.

I am simply responding to what others also called out: none of the evidence the poster claimed to exist was indicative of corruption, hence the logical and reasonable assumption that the poster is misusing the term. Obviously I can only comment on what is stated here.

> Letting someone else do it undermines the whole purpose of the independent financial system.

Even small regional banks have their own internal credit rating algorithms. Credit ratings from CRAs are generally consumed either in aggregates (a buyer on the secondary markets wants a tranche with an average credit rating of X) or by less sophisticated parties such as landlords.

> since email address is part of the report

I don't have a business relationship with any CRA. If they have my e-mail it isn't because I gave it to them intentionally. Nor is it guaranteed that they have a valid email that I still control.

I meant the Sift reports, not the big three CRAs. The Sift reports will contain your email address since it's reporting mostly on online services (that require an email address to use)

I wonder what the additional risk is though given that, iirc from the times I've requested credit reports, the amount of info needed to retrieve it is enough to have already stolen my identity. So in that case it seems the additional risk is low.

This problem is much bigger than just the CRAs. The fundamental problem is that the information that is used to authorize transactions (including financial transactions) is not bound to the transactions. It's re-usable. That makes phishing trivial and hence inevitable.

The situation has gotten slightly better recently because of the widespread deployment of chip cards, but these only protect POS transactions. They don't help with e-commerce or non-financial transactions like credit report requests.
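To make the "not bound to the transaction" point concrete, here is a minimal Python sketch. It assumes a hypothetical shared secret (`CARD_KEY`) and made-up merchant names, and it is only in the spirit of what a chip card does, not a description of any real payment protocol. The authorization code is computed over the details of one specific transaction, so a code captured by a phisher fails when replayed against a different one.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-card secret. In this sketch it is just a random key;
# the idea is that it never has to be typed into a form or read out loud.
CARD_KEY = secrets.token_bytes(32)


def authorize(key: bytes, merchant: str, amount_cents: int, nonce: str) -> str:
    """Return a code that is only valid for this exact transaction."""
    message = f"{merchant}|{amount_cents}|{nonce}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, merchant: str, amount_cents: int, nonce: str, code: str) -> bool:
    """Recompute the code for the claimed transaction and compare in constant time."""
    expected = authorize(key, merchant, amount_cents, nonce)
    return hmac.compare_digest(expected, code)


nonce = secrets.token_hex(8)  # fresh value per transaction
code = authorize(CARD_KEY, "coffee-shop", 450, nonce)

# The code checks out for the transaction it was issued for...
ok_same = verify(CARD_KEY, "coffee-shop", 450, nonce, code)
# ...but not when someone tries to reuse it for a different merchant and amount.
ok_replayed = verify(CARD_KEY, "phisher-store", 99900, nonce, code)
```

Contrast that with a card number, SSN, or date of birth: those are the same string every time, so whoever sees them once can reuse them anywhere.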

Not only that, but so much "consumer data" is probabilistic - they believe it's you who's doing something, but it could have been your ex logging into Pinterest on your phone one time, which then linked across her other devices and now "your data" is actually leaking her private data from the rest of her life.

The one that offered an online form requested a copy of a government issued id to confirm identity. How long til someone poses as one of these agencies to collect data too?

I've sent a couple of GDPR Subject Access Requests out of sheer curiosity. The answer is - in my experience - 'it varies'.

One requested a copy of a photo ID or passport and was happy to accept a partly redacted copy with 'FOR PROOF OF ID ONLY - (company), (date)' over-typed on the scan in red.

Another requested I email them from an email address they had in their records, or log in to change it and resubmit the request.

Most of the others sent the data without any ID checks whatsoever.

Curiously the one who did the most ID checking used an exemption (I forget the specific GDPR Article number) to redact nearly everything which wasn't shown on their web portal. Their position was that "revealing this data (specifically messages on trouble-tickets and their staff's internal notes) would adversely affect the privacy of our staff".

Amusingly I could just log into their portal and view the trouble-ticket history (but of course, not the internal notes). I can only assume their refusal was because there were comments in the internal notes (on my or other tickets) they weren't comfortable disclosing.

> Another requested I email them from an email address they had in their records

Wow. They didn't mail it and require confirmation of receipt, they just required a (trivially forged) mail with that email address in From?
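For anyone wondering why a From address on its own proves so little, here is a small Python sketch using the standard library's `email.message.EmailMessage`. The addresses are made up, and nothing is sent. The From header is just text chosen by whoever composes the message.

```python
from email.message import EmailMessage

# Build a message locally; nothing here is sent anywhere.
msg = EmailMessage()
msg["From"] = "customer@example.com"      # whatever the composer types
msg["To"] = "privacy@company.example"     # hypothetical recipient
msg["Subject"] = "Subject access request"
msg.set_content("Please send me a copy of my data.")

# The header says whatever the composer wrote; nothing in the message
# format itself checks that the composer controls that mailbox.
claimed_sender = str(msg["From"])
```

Receiving servers can run SPF, DKIM and DMARC checks that catch a lot of this, but that only helps if the company actually looks at the results. Mailing a confirmation link to the address on file and waiting for the click is the more robust check.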

> I can only assume their refusal was because there were comments in the internal notes (on my or other tickets) they weren't comfortable disclosing.

more likely just following an established policy that applies to all such information, to take personal judgement out of the equation. almost certainly absolutely nothing to do with your “file” in particular.

>Another requested I email them from an email address they had in their records

So our email providers can get all this data? They can DKIM-sign such a mail and simply never have the relevant emails show up in your inbox.

What items did you redact from your ID/passport?

I sent a driving licence and redacted the licence number, then overtyped the whole thing with the text in my previous post.

My goal was to make certain that if it leaked, the overtyped expiry date should make it useless after a certain point, and the company name should make any company other than that one question the source.

As a side effect, it'd also clearly identify the source of any leaks - but that wasn't my primary goal.

I moved the text so the name and address (which they already had) and date of birth (which isn't really a secret) were clearly readable.

>My goal was to make certain that if it leaked, the overtyped expiry date should make it useless after a certain point, and the company name should make any company other than that one question the source.

>As a side effect, it'd also clearly identify the source of any leaks - but that wasn't my primary goal.

Wouldn't a malicious actor just add block text over your text saying "only for identity verification for SomeOtherCompany" ?

They could do the same for the expiration date, although I wonder if any company actually bothers to check the expiration date

I don't know how GP did it, but I would have made the text translucent and made sure it covered the entire image so that there's no easy way to obfuscate the watermark without also obfuscating the license itself.

Ahh, of course - I didn't think of that. Thanks, obvious now that you mention it :-)

translucent text is nearly trivial to remove

Great questions! This conversation is already pretty mature I think. Take a look at Bruce Schneier to see some great synthesized commentary on the questions you're posing.

There is a pretty great talk about how companies fail at this - https://www.youtube.com/watch?v=gd3DiRuvr8A ("SOUPS 2019 - Personal Information Leakage by Abusing the GDPR 'Right of Access'")

"Of the responses, 24 per cent simply accepted an email address and phone number as proof of identity and sent over any files they had on his fiancée. A further 16 per cent requested easily forged ID information and 3 per cent took the rather extreme step of simply deleting her accounts."


Examples that I've seen with GDPR requests are:

* Scan of ID

* WebID - video call where they take screenshots of ID and your face

* Cookie - kind of makes sense since third-party web trackers like Quantcast rely on cookies to identify you

If you're a company you can buy the data. You just need a name and address.

The liability would be the same as for a GDPR data leak, or comparable. If the data leaked is sensitive enough, you might get fines and punishments from other regulatory or industry bodies (the next PCI audit might fail, so no more card processing).

The victim might be able to sue for damages incurred by the identity theft.

In my experience, some companies put a lot of diligence into the process. A bank I requested information from sent a postal letter containing a code to my mailing address. The German postal service has a special service for identifying people, so the letter contained the code and then I had to bring part of the letter to the nearest post office along with my ID card.

Once they got the confirmation of the ID card along with the code sent to me, they sent a CD-ROM with the information encrypted and the password via mail.

Wow, that's a good point.

I am surprised these guys are able to operate outside the normal credit reporting laws. He’s referencing things that happened in 2009, which is well outside the usual 7 year limit for credit report data.

Clearly, these companies are going to make the argument that this is not credit report data subject to consumer credit laws, but I’m curious if that has been tested at all. I would think an enterprising lawyer could make that argument.

Seven years applies only to certain delinquencies, not all information in general. Most "aged" reports will have information dating back well past seven years (loan origination dates, address history, former name(s) used, etc.).

Usually a closed account will stop being reported to credit bureaus 10 years after the last activity on that account. I don't know if that's due to law or custom. My own credit report contains accounts closed more than seven years ago but less than 10 years ago and it does NOT include the one account closed 13 years ago.

The impetus for a company like Sift was better fraud detection. I haven't evaluated the platform in some time, but if I remember correctly, it democratized the fraud decision process by feeding Sift data on successful orders. Clients would score their own orders based on whether each one was returned, charged back, or simply a successful no-friction order.

Something like 80% of customers have issued a chargeback, 86% of chargebacks are "friendly fraud", increasing at a rate of 41% every two years. The dollar figure I've heard is $20+ Billion in friendly fraud.
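To put the 41%-every-two-years figure on a yearly footing, here is a back-of-envelope Python sketch. It assumes the growth compounds and applies to the $20 billion dollar figure, neither of which the numbers above actually state, so treat the result as illustrative only.

```python
# Figures quoted above (taken at face value, not independently checked).
two_year_growth = 0.41      # "increasing at a rate of 41% every two years"
base_billions = 20.0        # "$20+ Billion in friendly fraud"

# Assumption: growth compounds, so the yearly rate r satisfies
# (1 + r) ** 2 == 1 + two_year_growth.
annual_growth = (1 + two_year_growth) ** 0.5 - 1


def project(base: float, years: int) -> float:
    """Project the dollar figure forward under the compounding assumption."""
    return base * (1 + annual_growth) ** years


in_two_years = project(base_billions, 2)
in_six_years = project(base_billions, 6)
```

Under those assumptions the yearly rate works out to a bit under 19%, and the $20 billion figure would nearly triple in six years.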

So, obviously there's an altruistic nature to why a company like Sift has this data. But, I'm of the opinion that they saw the dollar value in this type of scoring and collection system.

Well sure. And a pretty core impetus for a company like Equifax keeping data on loan repayments is also better fraud detection. That's even more true in the insurance business which uses consumer credit agency data extensively.

Fundamentally what they are doing seems like the same thing. They're keeping a file on people that corporations are using to decide if they're trustworthy.

We decided a long time ago that consumers need to have the right to see and challenge that data for accuracy, as well as to have limits on how long it can be held against them. I see absolutely no reason why those principles should not apply here as well.

And it seems at least arguable that they already do. Clearly there are arguments on both sides, but reading the basic definitions here on what comprises a credit report and a credit reporting agency, and the prohibition on reports more than seven years old, it seems like a non-frivolous case could be made:



Aren't the credit reporting laws pretty strictly scoped to making decisions about loans? I imagine these companies are pretty explicit with their clients that scores can't be used for those purposes.

They’re definitely used for rental applications and tenant evaluations, which seems like a very short hop to Airbnb.

They are also used as a factor in determining car and homeowners insurance premiums in most states. (People who pay their bills on time are less likely to make insurance claims)

> Aren't the credit reporting laws pretty strictly scoped to making decisions about loans?

Absolutely not. The Fair Credit Reporting Act is one of the laws governing background checks for hiring (in addition to credit reporting), for example.

(and if it comes out that the reports referenced in the article were used by someone somewhere to reach a hiring decision, I expect lawsuits will quickly follow)

They’re able to operate this way because everything has been secret. With the new California law it’s probably just a few months until there’s a class action against any or all of these for discrimination against multiple protected classes.

Two points: First, the very act of requesting your data is in a way confirming and verifying the accuracy of the data.

Second: Every prescription you've ever filled with insurance - and even some without - is recorded by companies like Milliman.[0] When you want to buy life insurance, health insurance, etc. they can request to see what medications you're on, have been on, etc.

[0]. https://clark.com/insurance/how-to-see-your-secret-health-cr...

How does that not violate HIPAA? I see it talking about "you can proactively opt out with hipaa" but everything I've ever seen about HIPAA is that all "opting in" needs to be explicitly granted by the patient.

If you want life insurance then you are required to give permission to view this data about you. If you don't want to give permission, then you don't get life insurance.


>How It Works

>1 Applicants sign a HIPAA-compliant authorization, enabling insurers to retrieve their medical information

>2 Insurers electronically query Milliman IntelliScript in real-time

>3 Milliman instantly gathers information from multiple data sources

>4 Irix interprets the data and generates automated decisions based on the insurer's guidelines

But I never gave Milliman permission to hold this data, so didn't they already violate HIPAA?

I think you misunderstand HIPAA. As long as they have a business associate agreement with the pharmacies and serve some vaguely care-adjacent purpose, the pharmacy can share your data with them without your knowledge or consent.

I can't tell if you're being cynical or serious, but if the latter I can't see how this is correct.

From "Pharmacy privacy Requirements" here [1], I don't think a "business associate agreement [with] some vaguely care-adjacent purpose" meets the standards for information-sharing. Rather, the information must be shared as part of specific treatment for a patient (discussing actual care) or payment.

[1] https://www.uspharmacist.com/article/hipaa-privacy-security-...

GP comment is being deadly serious and not at all cynical or sarcastic.

HIPAA is a fig leaf. Cardboard covers on clipboards to inconvenience the nurses and receptionists, but an unencumbered infobahn for anyone who touches the money to drive straight through.

For decades, every visit, test, procedure, and medication you've ever had paid for by a health insurer in the US got dumped straight into MIB, where any insurer could look at it. HIPAA functionally changed this not a whit.

> HIPAA has a rule that permits disclosure of PHI for health care operations, treatment, and payment.

I think this is what the GP was referring to. It's not just for care but for payment and operations too. A few operational examples, in the case of pharmacies, they must make sure that a patient doesn't fill the same prescription twice. Health insurance companies receive clinical data from health care providers for the purposes of HEDIS reporting.

Don't forget, the P in HIPAA is portability.

Sounds like they have hooks into the back end systems to query multiple insurance companies and pharmacies on the fly. - "Milliman instantly gathers information from multiple data sources"

Likely deep within the terms and conditions you agreed to with your private health insurance company, you did indeed give permission.

I imagine somewhere in their quoting algorithm they have a conditional like `if potential_customer.opted_out_of_hipaa then quote = max_possible_quote, msg = 'you can lower your quote by sharing your data with us'`.
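Spelled out as runnable Python, that imagined conditional might look like the sketch below. Everything in it is hypothetical (the `Applicant` fields, the `MAX_POSSIBLE_QUOTE` ceiling, the numbers). It illustrates the speculation above and is not a description of any insurer's actual system.

```python
from dataclasses import dataclass

MAX_POSSIBLE_QUOTE = 999.00  # made-up ceiling, purely illustrative


@dataclass
class Applicant:
    name: str
    shared_prescription_history: bool  # hypothetical stand-in for the signed authorization
    risk_based_quote: float            # what the insurer would quote with full data


def quote_for(applicant: Applicant):
    """Return (monthly_quote, message) under the imagined opt-out rule."""
    if not applicant.shared_prescription_history:
        # No data shared: fall back to the worst-case price and nudge the applicant.
        return MAX_POSSIBLE_QUOTE, "You can lower your quote by sharing your data with us."
    return applicant.risk_based_quote, ""


opted_out = quote_for(Applicant("A", shared_prescription_history=False, risk_based_quote=120.0))
opted_in = quote_for(Applicant("B", shared_prescription_history=True, risk_based_quote=120.0))
```

The point of the sketch is that "opting out" can be technically available while being priced so that almost nobody takes it.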


I'm guessing you opt-in when you request a quote.

It still sounds sketchy since it sounds like your data is maintained at a 3rd party for a purpose that is not intended (unrelated to your care) on the idea that you may one day opt-in to the unintended use.

We are all most likely explicitly authorizing this activity in the reams of fine print we sign without reading, or otherwise accept by agreeing to service from our local hospital group or pharmacy combine.

> First, the very act of requesting your data is in a way confirming and verifying the accuracy of the data.

Could you explain what you mean by that? And also hypothesize some implications of 'confirming and verifying the accuracy of the data'?

It's obviously contributing more data with a high probability of being all signal and zero noise. Some government IDs even include social security #s, but you're adding a photograph.

In the sense of confirming the data, it seems likely to at least attest to the identity being a genuine one associated with that data, rather than something like a burner phone set up with a bunch of random stuff. Of course people can also have a fake photo ID, but it seems unlikely someone using a totally fabricated existence would be querying for this information and potentially exposing themselves to closer scrutiny.

For example, checking if you ever had an account at companies mentioned in your data takeout - to spot if someone does shit while using your name/personal information, or some incompetent/malicious clerk mixed up data.

That doesn't verify any data though. At most, it implies a likelihood of having an account, but nothing about the specific data.

If they're uncertain of your middle name for instance, uploading your ID just confirms that they had it right or wrong.

And the history of every life insurance application an individual filled is collected and shared through a company called MiB[0] to help mitigate fraud.

[0]. mib.com

> Two points: First, the very act of requesting your data is in a way confirming and verifying the accuracy of the data.

Same dilemma as hitting the "unsubscribe" link on spam: when doing so you confirm to them that you have a valid email address. It's a pure act of faith, you hope that they will comply and not take advantage of you even more.

How is that even legal? Isn't patient data specially protected?

Your healthcare providers, pharmacies included, can sign BAAs with other firms that allow them to share the data. Your consent is not required.

The fact that the data is being sold to third-parties (e.g., Sift) by their collectors (e.g., Airbnb) is troubling. But what is even more troubling is that we don't know how scoring companies (e.g., Sift) are using the data and how they are generating scores.

Their models, if they are using ML, are opaque. Journalists haven't yet cracked this nut, instead just reporting on the fact that company A has bought personal data from company B. (That is likely to require anonymous leaks from the scoring companies.)

I think the hacker ethos could be applied to this problem by viscerally illustrating the threat. ('Hacker' used in the same way as the infosec community.)

Hackers could request their own data, hypothesize what could be gleaned from it, and use models (potentially academic ones trained on more general datasets) to produce derivative information.

Then hackers should make tools to make this process easier for the average journalist or consumer.

> The fact that the data is being sold to third-parties (e.g., Sift) by their collectors (e.g., Airbnb) is troubling.

It looks like Airbnb & co paid Sift instead, in addition to sending it their data: "Sift has this data because the company has been hired by Airbnb, Yelp, and Coinbase"

My guess is that Airbnb will stop using Sift after this article is published, because of bad press Sift generated for Airbnb by giving Airbnb messages back to the original user (the author of that article).

Imagine Airbnb changing their behavior because of bad press.

But the data is already out there and no longer working with the company that has it doesn't make it go away. If Sift goes out of business, this data won't go away.

I'm starting to feel like these PII databases are the superfund sites of the electronic world. Even after the company goes out of business, we're dealing with the damage decades later.

I created a free service which makes it easier to get Sift (and others) to delete your data: https://opt-out.eu/?company=sift.com. It automates the process of sending GDPR erasure requests. I sent my request this morning. Support for GDPR SAR requests (get a copy of your data) is coming soon.

Your website styles are off slightly, such that when I zoom in, then zoom back out, there's white margin at the left, next to any colored backgrounds. Also, on too small a screen, the browser extension text goes off both sides, rather than reflowing and fitting the screen size...

I just tried requesting my information from one of the links provided in the article. As part of the process I had to upload an image of my government-issued ID. After that, I was told to expect an email confirmation link that I would have to click on before they could proceed. That was an hour ago and the email has not yet arrived.

I don't really have any reason to suspect that this is a scam, but I can't help but notice that if one were to set up a phishing site for government IDs the UX would likely be indistinguishable from what actually happened here.

To get this personal information, asking for government-issued ID gives them one more data point on you and reinforcement of identity. So a win for them as well I guess?

This might be entirely legit, and legally mandated.


It sounds like "KYC", i.e., know your customer.

KYC is only to prevent banking fraud, i.e. money laundering. The SEC is the Securities and Exchange Commission, who govern banking, the trade of financial securities (stocks, bonds), etc.

These data brokers do not handle your money, and therefore do not need to "know their customers", i.e. have no legally mandated right to ask for your identification, at least according to this statute.

For one, I think it's super weird that they need a government issued document to know who I am, but are perfectly happy to sell my data, marketed as accurate, to third parties.

For the record: I finally did receive the confirmation email, two hours after making the initial request.

Did you receive any information yet?

No, but I would not expect that so soon. (A confirmation email, by contrast, generally arrives within minutes. A two-hour delay is very unusual.)

Same here. Did confirmation and sent emails to all the other services linked. Will post here when and if they reach out.

They have to verify your identity somehow, or anyone could get at the data.

Ideally you'd verify your identity to your government and they'd provide a limited time verification ID you could use for this sort of thing.

> They have to verify your identity somehow, or anyone could get at the data.

Yes, but the mere fact that I possess a photocopy of my drivers license does not prove that I'm me. That is manifestly true because they now possess a photocopy of my drivers license and they are not me.

Well, you don't have to give them a perfect copy of your DL. They now possess a heavily redacted and aggressively watermarked image of my driver's license that is unlikely to be of any use outside of this specific request.

Heh, now you tell me. :-/

[UPDATE] If the heavily redacted version was good enough to convince them that you are you, then it will almost certainly be enough for them to convince someone else that they are you, should they choose to do so.

Right. The redaction is to avoid giving them any information (no matter how seemingly useless) that I don't want to give them if they don't already have it (e.g. eye color, weight, etc).

The watermark is what (hopefully) limits the usefulness of the image, by stating when, who and why the image was furnished in a way that is relatively difficult to manipulate.

Identity thieves are masters of faking ID scans, their livelihood requires it. I have no doubt that an experienced person could get these reports on my behalf without me ever knowing.

It’s appalling to me how much we depend on easily photoshopped pictures to prove identity. I want a smart card photo ID in my country.

> When I told Mr. Tan that I was alarmed to see my Airbnb messages and Yelp orders in the hands of a company I’d never heard of before, he responded by saying that Sift doesn’t sell or share any of the data it has with third parties.

What a line to say with a straight face. They are the third party that he's uncomfortable sharing his data with!

With what we know about the likelihood of major data breaches, even if you liked Sift, and thought their purposes were noble, you would have to admit that they are one big sitting duck for a targeted hack that could provide you with massive amounts of aggregated information on an individual.

Could Sift be the next Equifax?

I have all of your personal data. But it's OK; I don't share it with anyone outside the Milky Way Galaxy.

At least, it will take a while for our radio transmissions to reach that far. By the time it does reach another galaxy it will be a part of history.

And the first party in the situation is being referred to as the third party.

We, in the US at least, need a privacy Bill of Rights.

Electronic transmission of personally identifiable information and the storage and mining of that data has so many permutations, and technology is so far ahead of legislation consistently, it seems like it’s time for a proper governance framework that exceeds any particular industry, and that has to be based around the individual (I think, there’s more to that).

The Sift report is based on your email address, among other things. My email box is full of random people's insurance quotes, medical and electric bills, birthday greetings, and welcome emails to services I've never heard of. We all know the company is going to spend approximately zero time to sanity check or sanitize that data firehose. Turns out the movie Brazil was a documentary.

I have this problem too. E-mail addresses are terrible identifiers.

So many services fail to implement validation loops, and their customer support teams have no process when I call. The presumption everywhere is that a provided e-mail address is correct. Even after I explain the situation, some representatives will refuse to help because I couldn't validate a birthday or last-four of a credit card.

Out of curiosity, do you have a particularly simple email address? Over the years, my various email addresses have been some combination of my name and separator characters and I've had exactly one time where I got someone else's confirmation for an airline booking. For most people, I think email is a reasonable identifier. For people whose email is "bob@gmail.com" or the like (who are probably more common on HN than in the general population) not so much, but I'm pretty sure they're the exception not the rule.

I have my first and last name at gmail.com, with no middle initial. There are approximately 1,700 people in the United States with the same first and last name. I receive e-mail intended for others daily—personal correspondence, account-related messages, utility bills, quotes for work, messages from schools to parents, minor league events, mortgage documents...

(Gmail ignores periods in the local part, so I receive e-mail sent to my name both with and without them.)
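
To make the collision problem concrete, here is a minimal sketch of how a matching system might collapse Gmail variants into one key. The function name `normalize_gmail` is made up for illustration. It assumes only two Gmail behaviors: dots in the local part are ignored, and anything after a "+" is treated as a tag. Other providers are left untouched.

```python
def normalize_gmail(address: str) -> str:
    """Collapse Gmail variants to one canonical mailbox (illustrative helper)."""
    cleaned = address.strip().lower()
    local, sep, domain = cleaned.rpartition("@")
    if not sep:
        # Not an e-mail address at all; return it unchanged.
        return cleaned
    if domain == "gmail.com":
        local = local.split("+", 1)[0]  # drop an optional "+tag" suffix
        local = local.replace(".", "")  # Gmail ignores dots in the local part
    return f"{local}@{domain}"


variants = ["Jane.Doe@gmail.com", "janedoe@gmail.com", "j.a.n.e.doe+bills@GMAIL.com"]
print({normalize_gmail(v) for v in variants})
```

All three variants land in the same mailbox, which is why a typo or a guess by any of the other people sharing the name ends up in one inbox.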

“I don’t really care that these data analytics companies know I made a return to Victoria’s Secret in 2009, or that I had chicken kebabs delivered to my apartment“

People really should care. There is so much data about us being sold without our knowledge. A while ago there was a discussion here that your full salary history is available to be bought.

All this stuff is super creepy, and you may increasingly be outnegotiated or rejected by companies that have your info without your knowing it, or even knowing that the info exists. For example, I find it scandalous that Airbnb messages or order histories are being passed on. That’s just not ok.

Dating apps too? Wtf, imagine if they knew your preferences were same sex and you flew to Dubai or China and you’ve criticized them, they could sell that data and get you arrested or honeypot you. Great way to get dissidents overseas with other compromising information

It's a fun game: Brainstorm hypothetical horrible abuses of private data, and then find a news article that confirms the abuse has already happened.



This is incredibly high-value data to have on politicians and other high-profile people, too. Say, suppose you had a congressperson's dating app information in this report, and the usage overlaps with the period of time they've been married and in office. Great way to compromise them.

Oh boy. Considering that most of the authentication that happens seems to be based on government IDs (which are well within the capability of any government to fake in such a way that a random startup would be fooled), this could actually be a serious problem for any dissident who is high-profile enough to attract personal attention from a state.

(On that matter though, how hard is it to fake a convincing scan of a government ID? Do any GDPR data controllers actually verify with the authorities that John Example has a passport with number soandso that expires on soandso?)

I was at a loss for why Sift was collecting data like this from companies like airbnb / etc, I worked a project using them around curbing some pretty gnarly levels of credit card fraud. I think it must be that these companies are utilizing their user content fraud scanning (“content integrity”) systems. I don’t know about calling this a consumer score though, but it is truly frightening if that’s how companies are utilizing that technology. I really enjoyed working with them — so I might be biased to see the good here — but they totally pushed back on biz folks on our side when they tried to nudge on things that the technology was not designed to do. So I would imagine / hope that they would have done the same in the case of something like constructing a “consumer score”, the tech is for flagging outright fraud, not for relatively scoring how good a customer is...

A binary score is still a score.

There's absolutely a difference between a binary "fraud/not fraud" flag and a continuous variable for quality of customer. They measure different things and have very different use cases.

Is there a perceptible difference to a consumer who has been flagged as "fraud" when they are not, in fact, someone who has committed fraud?

If not, then there's no difference.

They are literally two very different types of data. This isn't an ambiguous issue: One is a continuous variable, the other categorical. Whether or not one or the other could erroneously flag someone doesn't change that, and doesn't mean they're the same. Because the nature of those differences means the use cases are different. A continuous score has much wider versatility than a black & white binary value, and allows for more nuanced use. Even if it's just in the realm of fraud detection, a continuous score allows for more safeguards, e.g., if it's low but not too low then it triggers further review rather than outright rejection.

You were replying to a comment that said "a binary score is still a score"

The comment you appear to be arguing with, is completely true. A binary score is of course a score, and in fact that's almost always how credit and trust scores are actually perceived.

From the company's perspective many scores are continuous, but from the consumer's perspective that's mostly a distinction without a difference.

Usually from the consumer side you're being told you got the job or didn't, or got the credit card, or loan, or apartment, or didn't.

I never said it wasn't a score; my issue was with the implication that, both being scores, they were somehow equivalent. They are not. A continuous score may be used with a threshold to perform the binary categorization. In fact that is all but guaranteed. But the continuous score, as I stated, has more possibilities for nuanced use. The binary score is derived from something continuous.

And you aren't usually told you got the loan or not. That is one possibility, but the more likely one is that the continuous score of the credit rating translates into a continuous score for the interest rate. This is why the distinction between a binary and continuous score is important: this continuous assignment of interest rate isn't possible with a single binary variable.

Categorization into groups that you will or will not do business with is scoring. Pedantically explaining what sort of scoring it is doesn't change that.

Except that sort of will/won't is the only possible result of binary variables, while continuous allow more possibilities, like assigning a continuous variable interest rate on the basis of a continuous input.
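
A minimal sketch of that continuous-in, continuous-out mapping, assuming a simple linear interpolation. The 300-850 score range is the usual FICO range; the 5% and 25% rate bounds and the function name `interest_rate` are invented for illustration and are not any lender's actual pricing.

```python
def interest_rate(credit_score: float,
                  lo_score: float = 300, hi_score: float = 850,
                  max_rate: float = 0.25, min_rate: float = 0.05) -> float:
    """Map a continuous credit score to a continuous rate (hypothetical numbers)."""
    # Clamp out-of-range inputs so the rate stays within [min_rate, max_rate].
    clamped = min(max(credit_score, lo_score), hi_score)
    # Fraction of the way from the worst score to the best score.
    frac = (clamped - lo_score) / (hi_score - lo_score)
    # Better score -> lower rate, linearly.
    return max_rate - frac * (max_rate - min_rate)


for score in (550, 650, 750):
    print(score, round(interest_rate(score), 4))
```

A single fraud/not-fraud bit can only pick between two rates; the continuous input gives a different rate for every score.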

I wasn't explaining some basic aspect of statistical variables for its own sake. In this context it is not pedantic hair-splitting; it's a relevant aspect of the types of things that can be done with this data.

First off no one is saying "you look like fraud", they're saying details of your purchase looked suspicious. This is the first place where people need to pump the brakes, there is not some dire sinister plot, you went to make a purchase, it looked weird, we want to make sure it all checks out. This happens _all the time_ with credit card purchases, historically from the card issuer, but card issuers are actually encouraging retailers to be more vigilant in the face of massive increases in card fraud, so now it's distributed. Honestly, whenever I've had the minor inconvenience / world-concept-shattering experience of having a card transaction declined & needing to go through around 15 mins of rectification, I've actually appreciated that they are running systems to prevent rampant fraud, because it's an even bigger pain in the ass to clean up after that than it is to endure some extra information checks on rare occasion.

Yes, this is why I wanted to distinguish between a binary score and a continuous one. A continuous one allows for a flag for review at one threshold or an outright refusal at another. It's exactly what credit cards do. A purchase may go through, but it is suspicious so they call you to verify. Or it could be really suspicious so they block it and you have to contact them. It's happened to me both ways.
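
Roughly, the two-threshold behavior described above looks like this. This is a sketch only: the score is assumed to be a risk value in [0, 1], and the cutoffs 0.6 and 0.9 and the names `decide` and `binary_flag` are made up, not anything a card issuer or Sift has published.

```python
def decide(risk_score: float, review_at: float = 0.6, block_at: float = 0.9) -> str:
    """Turn a continuous risk score into approve / review / block (hypothetical cutoffs)."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0 and 1")
    if risk_score >= block_at:
        return "block"    # really suspicious: decline, customer must call in
    if risk_score >= review_at:
        return "review"   # suspicious: let it through but verify with the customer
    return "approve"


def binary_flag(risk_score: float, cutoff: float = 0.6) -> bool:
    """The same score collapsed to a single fraud / not-fraud bit."""
    return risk_score >= cutoff


for s in (0.2, 0.65, 0.95):
    print(s, decide(s), binary_flag(s))
```

Once the score is collapsed to the flag, 0.65 and 0.95 are indistinguishable, so the "call to verify" middle tier can no longer be reconstructed from it.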

Seriously. If companies are relying on the fraud estimates, it isn't going to be fun to be a false positive.

I mean people have been subjected to fraud scoring for decades, you get declined, you follow up, and you get in w/ a new data point for the next time.

They've been subject to fraud scoring across a large group of disparate companies for decades?

I would say that's absolutely not true at all. And to the extent that it is true, via credit reporting agencies, it's extremely aggressively regulated to allow consumers to see and challenge how that data is used, as well as to completely opt out of that system.

Credit card companies use your past history across merchants in risk scoring transactions, yes, and they also pull in data from external sources that are outside of their walls to further enrich that data.

Anyways -- this is all massively orthogonal to what I originally wrote, which was around Sift being used as a consumer scoring system. I shouldn't have "taken the bait" so that's my bad.

Not when these new automated systems don't allow for a follow-up because it would be too expensive to code or hire a human to do it.

How many companies don't have a support@____ email address? Seriously, they'll figure it out, it's really not that large of a haul to get stuff corrected, and it's the difference between "we can't take credit card payments" and "a very low-percentage of real users have some extra friction / bounce somewhere else".

Right, it's not a score, it's just a thing you have to argue with them to change. Not a score though.

Anyone know if:

1. This information was shared in accordance with the privacy policies and user agreements in place at the time the sharing started or were the policies retroactively updated?

2. Any company actually been prosecuted / sued successfully for violating their terms of use sharing data with third parties? Have any users actually received $$, or just regulators via fines?

This will be near impossible to answer of course unless you were actually involved in the sharing with Sift, but it seems to me the more this happens with all of our data, along with the total lack of enforcement of any lapses / breaches these kind of problems / proxy scores will only get worse and more difficult to reason about as a customer.

Just requested all of my data from the companies listed. I'm very curious to see the data they return.

I'd frankly be surprised if your request for records doesn't become a juicy data point on the record.

I agree.

Similar companies are the credit reference agencies - Equifax, Experian, TransUnion.

Every time you request your data from credit reference agencies, the request is logged. My log has many such entries, due to repeated checking I did while trying to get corrections sorted, and due to the third party companies I used to help with this.

So I'd expect the same to be true for the customer rating agencies.

In the case of credit agencies, they say that information ("soft" enquiries) is not used to assess credit risk - and that it's either not made available to companies that process applications, or must not be used by those companies in the assessment.

To be honest, seeing the kinds of errors I've seen, as well as seeing the inner workings when it is being corrected, some of it shows very shoddy, and in some cases seriously unethical processes (that the companies know about).

So I simply don't believe that companies are diligent about following the "must not be used" rule for data they "may" receive and are supposed to ignore. To convince me, it would require a level of auditing, or quality of audit, that companies plainly are not getting.

And these are companies I still do business with because they are good enough. Goodness knows what to think of companies I wouldn't do business with, if I knew about them and had any choice in the matter.


How did you submit your government ID securely? Email?

The only company to request proof thus far in the process is Zeta Global which presents a secure portal to upload your ID. However, to be honest, I'm not that concerned about emailing my ID if necessary.

On the Sift Google Form:

"In order to process your rights request, Sift’s Privacy Team needs to collect the below information about you. We are unable to process your request without complete submission of this form and a copy of your valid government ID for verification purposes."

They haven't provided me the Google form for Sift. I'll still submit it to them though. I've once been a Coinbase customer so they likely have it already.

Sift has since taken this down and replaced it with an "email us for instructions" text. Clearly not liking the press they're getting.


But does that content submitted securely through this form ultimately land on a publicly accessible cloud bucket just waiting to be discovered?

The data breaches from these companies are going to be an awful spectacle.

For all of these linkages there needs to be a key that associates data from disparate services together. So what’s the key? Email address? IP addresses? Credit cards? Is it implausible to try to take steps to circumvent these firms’ abilities to connect our dots?

Quite possibly a composite key comprising name, gender, age and locale.

"87% of the U.S. population is uniquely identified by date of birth, gender, postal code." [0]

[0] https://dataprivacylab.org/projects/identifiability/paper1.p...
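
For anyone wanting to check this kind of claim against their own dataset, here is a minimal sketch of the measurement: count how many records have a combination of quasi-identifiers that nobody else shares. The field names (`dob`, `gender`, `zip`), the function `unique_fraction`, and the four toy records are invented for illustration; this is not the paper's methodology or data.

```python
from collections import Counter


def unique_fraction(records, keys=("dob", "gender", "zip")):
    """Fraction of records whose key combination appears exactly once."""
    if not records:
        return 0.0
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


people = [
    {"dob": "1980-01-01", "gender": "F", "zip": "10001"},
    {"dob": "1980-01-01", "gender": "F", "zip": "10001"},  # shares a key with the row above
    {"dob": "1975-06-30", "gender": "M", "zip": "10001"},
    {"dob": "1980-01-01", "gender": "F", "zip": "94110"},
]
print(unique_fraction(people))                    # all three fields
print(unique_fraction(people, keys=("gender",)))  # gender alone
```

The more fields go into the composite key, the larger the share of people it pins down uniquely, with no name or email address required.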

I would expect it's more complicated than that. I doubt that they just start over every time you move to a different city.

I seem to have slipped through these cracks easily. When I do anything that involves pulling my credit, it's a hassle because I give my correct address information, however the credit agencies have out of date information.

Tips to slip through the cracks: have a different mailing vs home address, don't apply for new loans/update work info with existing creditors over more than one job cycle, move frequently. When your credit card asks how much you make, they are doing so because the agency doesn't have good info on you and they are updating their files.

The next natural step is a unified score, social credit score, no?

Well it sure seems like we managed to have a social score, the American way! Privatized.

I would be more interested in knowing how the companies scoring customers operate. If their goal is to detect fraudsters they must be able to aggregate accounts with a different name and/or email from different systems, as surely a fraudster will use a different identity on each service?

I think the point is exactly that fraudsters have no history. The system probably starts with a low score that increases as more data is collected. And then it can actually score you on the data rather than a "probably fraudulent account" flag.

>“We’re not looking at the data. It’s just machines and algorithms doing this work,” said Mr. Tan.

That's still looking at it. When are we going to stop letting people get away with this lie?

So Airbnb was sharing all your messages to hosts with Sift? Food ordering apps were sharing all information about every order with them?

Selling is probably the correct word... as in "They are selling the information like Facebook sold users' private messages."

It says so in the article. Those companies actually pay Sift to do fraud detection. So no, not selling.

My focus was on them transferring it to third parties, regardless of any accompanying money flows, so sell vs share doesn't matter to me.

Since I didn't provide the relevant context in my last post, here's the quote from the article:

>As of this summer, though, Sift does have a file on you, which it can produce upon request. I got mine, and I found it shocking: More than 400 pages long, it contained all the messages I’d ever sent to hosts on Airbnb; years of Yelp delivery orders; a log of every time I’d opened the Coinbase app on my iPhone. Many entries included detailed information about the device I used to do these things, including my IP address at the time.

The fact that coinbase is sharing this information with some third party is absolutely infuriating. I almost don't believe that.

At this point assume that anyone getting your data anywhere is sharing it. Not a question of if.

Looks like Coinbase is buying the data (Sift Score) to help them make better decisions on account takeover vs. not account takeover, credit card fraud vs. not credit card fraud, ACH fraud vs. not ACH fraud.

In this context it makes perfect sense. Unless they force 2FA for every login, how else are they going to protect good users from account takeover? Same goes for buying crypto: they need a tool to help determine if someone is using a stolen payment method or not.

The article author claims that part of the information they got from sift was information about one of their own logins. So it would appear that coinbase is sharing information.

The reason I'm so flabbergasted by this is that this seems to really, really damage account security. Now there is one company that has a massive profile on me, that also knows very specific details about when I log into my account, from where, from what devices, etc.

Completely unjustifiable imo.

This should help everyone better understand if interested: https://sift.com/developers/docs/curl/apis-overview/overview

Sift makes risk predictions in real-time using your own data and data from the 100s of millions of users in Sift’s global network. Our machine learning systems identify patterns of behavior across thousands of devices, user, network, and transactional signals. These are often patterns that only a machine learning system can spot. Using Sift, businesses have stopped 100s billions of dollars of fraud worldwide.

There are many abuse use cases that Sift can stop:

Payment Protection - Reduce friction at checkout to increase revenue and stop chargebacks before they hurt your business.

Account Abuse - Stop fake accounts from polluting your service and block bad users before they harm your business.

Account Takeover - Stop bad actors from hijacking users accounts. Keep your users safe and ensure that they always trust your service.

Content Integrity - Stop spammy and scammy posts from polluting your service. Keep your users safe from malicious actors.

Promotion Abuse - Make sure you’re only rewarding real users by stopping referral rings and repeated use of promotions.

Sending Data to Sift

To use Sift, we need to know about your users, what they do with your service, and what actions you take in response to your users. This includes:

How your users are interacting on your website and/or mobile apps (eg what pages they are visiting, which devices they are using, how long they spend on each page, etc). We automatically collect this data when you add our JavaScript snippet to your website and our Mobile SDKs to your app.

What actions your users are taking, usually key user lifecycle events (eg creating an account, placing an order, posting content to other users, etc.). You will send this data from your application to Sift via our REST API.

What actions your business is taking in response to users (eg approve an order, ban user due to fraud, cancel order due to chargeback, etc). You will also send this data from your application to Sift via our Decisions API.

Coinbase is a massive fraud target.

All this makes me wonder: How long until being a privacy-minded individual screws you over in the same way that paying cash for everything screws over your credit rating?

I shouldn't have to choose between privacy and being able to return a device at Best Buy.

If you don't have a credit score or a credit score of 0 due to a thin file, you can get one by using a cash-secured loan of $500. The loan amount doesn't really matter so you can pick the smallest loan they offer. Then you're basically buying yourself a credit score for $50 or less. Don't use online lenders because the one I used (selflender.com, name & shame) kept my $500 principal despite claiming over and over that they sent a check that never arrives. If you do this, use a CU or bank branch instead so that after you pay it off, you can stare them in their money-loving bankster eyes to politely demand your $500 back.

I asked Mr. Tan how many people had requested their data from Sift since the company introduced the option to get it.

“Honestly, we haven’t seen much of a response,” he said.

Methinks that will soon change after hitting HN.

How to get your data

There are many companies in the business of scoring consumers. The challenge is to identify them. Once you do, the instructions on getting your data will probably be buried in their privacy policies. Ctrl-F “request” is a good way to find it. Most of these companies will also require you to send a photo of your driver’s license to verify your identity. Here are five that say they’ll share the data they have on you.

Sift, which determines consumer trustworthiness, asks you to email privacy@sift.com. You’ll then have to fill out a Google form.

Zeta Global, which identifies people with a lot of money to spend, lets you request your data via an online form.

Retail Equation, which helps companies such as Best Buy and Sephora decide whether to accept or reject a product return, will send you a report if you email returnactivityreport@theretailequation.com.

Riskified, which develops fraud scores, will tell you what data it has gathered on your possible crookedness if you contact privacy@riskified.com.

Kustomer, a database company that provides what it calls “unprecedented insight into a customer’s past experiences and current sentiment,” tells people to email privacy@kustomer.com.

Just because the companies say they’ll provide your data doesn’t mean they actually will.

Make it illegal to aggregate data about a person.

"You're the product"

So if/when these companies get bought out, go out of business or liquidate their assets in a downturn, then what?

Until they legislate some sort of data expiration date or "do not track" lists (similar to do not call lists), it looks like the onus is on the individual to protect their interests/data.

Having your drivers license fall into the hands of a bad actor must be a complete nightmare.

Uploading a photo of government ID is also the backdoor to all sorts of things. The one that worries me is my Facebook account, including all my private messages.

A better idea: the data collectors should ask me for a notarized letter confirming my identity (the notary has seen my government ID) which references a unique support-ticket number from the data collector (avoid replay attacks). The data collector would verify the letter out-of-band with the notary too? I’d be a lot happier with that.

What else would such a pen and paper protocol need to work? The notary network is a handy piece of national infra — it can be trusted in this way? Legislation would be needed to force the data collectors to adhere to the protocol.
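
The replay-protection part of this idea can be sketched in a few lines. Everything here is hypothetical: the `Collector` class, the `notarize` function, and the message format are invented, and an HMAC with a shared key stands in for the notary's signature purely to stay within the standard library. A real scheme would use a public-key signature so the collector could verify but not forge the attestation.

```python
import hashlib
import hmac
import secrets


def notarize(key: bytes, name: str, ticket: str) -> str:
    """Notary's attestation binding a verified name to one specific ticket."""
    message = f"{name}|{ticket}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


class Collector:
    """Data collector that issues one-time tickets and checks attestations."""

    def __init__(self, notary_key: bytes):
        self._notary_key = notary_key
        self._open_tickets = set()

    def issue_ticket(self) -> str:
        ticket = secrets.token_hex(8)  # unguessable support-ticket number
        self._open_tickets.add(ticket)
        return ticket

    def accept(self, name: str, ticket: str, attestation: str) -> bool:
        if ticket not in self._open_tickets:
            return False  # unknown or already-used ticket: replay rejected
        expected = notarize(self._notary_key, name, ticket)
        if not hmac.compare_digest(expected, attestation):
            return False  # attestation does not match this name and ticket
        self._open_tickets.discard(ticket)  # a ticket is good for one use only
        return True


key = secrets.token_bytes(32)
collector = Collector(key)
ticket = collector.issue_ticket()
letter = notarize(key, "John Example", ticket)
print(collector.accept("John Example", ticket, letter))  # first use
print(collector.accept("John Example", ticket, letter))  # same letter replayed
```

Because the letter names the ticket, a copy that leaks is useless for any later request, which is the property a bare photo of an ID lacks.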

Notarization essentially puts a $30-$100 price tag on the operation, which meets with a lot of resistance.

Considering how many people are notaries and how easy it is to become a notary, I don't think it's any form of security. It's just an artificially created market for people to make quick bucks signing documents.

The point of a notary is that there's a witness to you signing papers. You can't serve as an unbiased witness if you're the person signing the papers.

Notaries typically cannot notarize their own documents.

So get your friend to be the notary? It seems like an incredibly insecure system to me for the 21st century world when we have much better ways, e.g. signing keys.

It shouldn’t be that expensive. Which state do you live in? I’ve had documents notarized for only nominal charges, through a notary I shared an office with in California.

Most banks and credit unions provide their customers notary services for free or low cost. Even Bank of F̶e̶e̶s̶ America provides free notary services.

That sounds high. In New York, at least, a notary can’t charge more than $2.00 in most circumstances.
