I think the issue right now is that private user information is viewed as an asset, not a liability. If we could find a way to make it more of a liability, companies would be less likely to collect it just for the sake of having it, and they would be more proactive in securing it.
Alternatively, if it's truly an asset, can it be taxed as an asset?
If I give a company a car, that is taxed. If I give a company my data which is worth more than a car, it isn't.
Is it possible that current accounting/tax law can be interpreted so that these are viewed similarly?
Using the black market as a standard, your identity-related information isn't worth enough to be taxable.[0][1][2]
The more common data you give away is worth even less. Your "gift" is akin to giving away a few grains of sand to a glassmaker who provides a free grain counting service.
Now let's say you dumped a lot of sand that we could value at $10K. Any smart glassmaker will claim his once "free" sand-counting service costs $10K, which amounts to an equal, zero-profit trade.
Value is derived from user data when it's used to target ads. Black market data is never used for that purpose, so its value is much lower. (A company would never take the risk of using black market data.)
You (and every other responder) miss the larger point of my comment. Let's use Google as an example. Your clicks throughout the internet, like sand, don't amount to much of value. They're a very unrefined, raw material, with limited quantity. Even if Google were forced to value that raw material, they could argue they're trading it in equal exchange for whatever service they offer you, so there would still be no tax.
In any situation, potentially derived value is not taxable. A car is worth whatever a car was bought/sold for, not including some hypothetical such as whatever I could make by driving it for Uber/Lyft. What matters here is what will actually occur in the transaction. If Equifax chooses to sell its data, that income will be taxed at whatever price Equifax chooses to sell the data.
Note this doesn't change that your "gift" of peanuts of data is not taxable because (a) your data alone isn't worth squat, (b) even if it was, you got something in exchange for it.
I'll go one step further and say that framing the discussion around clicks as a product is incorrect altogether. I think user metadata is part of a user's identity, and the friction we run up against is whether it ought to be legally protected. It's currently not illegal to sit outside a restaurant and record information about all of its patrons. You'd certainly be in hot water if you tried to do that at any federal building. At some level we know collecting that data is wrong because it can be used against us. Even the judicial branch knows this and requires the storage of user data to be encrypted by security agencies. That's not conclusive proof, but it is evidence of our general outlook on the legality of tracking people.
If we believe in a truly free society, then collecting and monetizing metadata should be illegal. If we don't mind giving up that freedom then there's nothing wrong with companies creating a profile on you and tracking you no matter where you go and what you do. But the internet has spoken and we're gladly, albeit unknowingly, giving up any right of protection. I find it worrisome to think of what society will be like in another 50 years if nothing is done to curtail the fleecing of user data.
Like data in general, user data is essentially worthless until aggregated en masse and refined into insights. But when you consider how much data companies are hoarding, it doesn't take much of an assessed value to create a nontrivial taxable asset.
Also, it doesn't matter that there's an exchange happening. Sales tax and income tax are assessed on fair exchanges of goods, services, and currency.
I shouldn't have used vehicles since they are a special case in some states. At the federal level, taxes on barter income match my description. [0] Taxes are still zero if the company can claim the fair market value of the services they offer you equal the value of your clicks.
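A toy model of the barter argument in that last sentence, assuming both sides of the exchange are booked at fair market value. The function name and the figures are hypothetical, and this is a simplification for illustration, not how any tax authority actually computes barter income:

```python
# Toy model only: net taxable gain from a barter, assuming both sides are
# valued at fair market value (FMV). Not tax advice.

def net_barter_income(fmv_of_clicks_received: float, fmv_of_service_given: float) -> float:
    """What the company received, minus the value of what it gave in exchange."""
    return fmv_of_clicks_received - fmv_of_service_given

# Hypothetical figures echoing the "$10K of sand" example earlier in the thread.
clicks = 10_000.0
service = 10_000.0

print(net_barter_income(clicks, service))  # 0.0
```

If the company can argue the two values are equal, the taxable difference is zero no matter how large each side is.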
Never heard of this, probably because in my state it doesn't apply. In 42 out of the 50 states (excluding California, Hawaii, Kentucky, Maryland, Michigan, Montana and Virginia [Oregon has no sales tax]), however, this does apply.
Still though, the example is a good one. You're trading your data for a service. There isn't anything to tax.
It isn't clear where the data comes from. A business in a non-US friendly jurisdiction can make an online business selling black market data in a way that looks legal to data warehouse groups who in turn sell it to the company using it. Once you get to the actual company using the data, there is no indication of it being black market data. The middle company, if it is buying enough data, only takes a few employees who care more about getting the data than verifying the ethics.
It's kind of like arguing a bank would never create fake accounts because the risk of doing so is too large.
Value is derived from the potential application of data. Ads as an application isn't worth much since you can still be shown ads just fine without any personal data targeting.
Black market data is worth way more because it's often more personal than just demographic markers and interests, and can potentially lead to large sums of money.
Personal information is less valuable on the black market because of the difficulty of monetising it and extracting the cash. If I have your bank account login details I can move cash out of your bank, but I have almost no hope of sending cash from a U.K. domestic savings account to my friendly Philippines bank over the web UI. That's why Nigerian princes still send out emails: to find the one idiot willing to walk into his bank and move the 5 million that just arrived in his account.
Alternatively you need real criminal gangs - dozens of people willing to walk up and down a London street withdrawing 5k at a time from 1,000 pre-prepared cards and putting the money in their rucksacks. They don't come cheap. And it's still cash and still in the U.K.
Get Amazon to send you two dozen laptops to the same address with two dozen cards. All as "gifts". Yeah, right. Now you've got to sell them, and fences run at 10% if you are lucky.
The lowest-effort option is simple impersonation for loans, but you still have to take the money and move it somewhere. Into cash? Into the Philippines? See the problems above. Open a credit card account? How do you intercept it and the PIN sent by post?
All in all, it's actually pretty darn hard to take personal details and monetise at the "real money" level. These things stop being scalable. You could probably fund a student lifestyle off any combination of the above but millions - not really.
Cf. an interesting Microsoft paper on this from a few years back.
You're correct, of course, but missing the implications raised by the parent poster, and they are important.
Sensitive personal data is necessary but not sufficient to rip someone off. And if you want to try to make a living ripping people off, there is even more business overhead, making the cost of sensitive personal data an even smaller portion of overall operational costs.
From the point of view of the thief, our personal data is a vital but cheap input into an operation that tends to have very high security costs, viciously expensive liquidity issues and terrible personnel problems, among other more quotidian business headaches.[1]
I suggest trying to think like a crook now and then. Trying on other people's lives is a useful way of shaking up one's thinking habits, empathy (don't confuse with sympathy) is always useful, and it can help you keep yourself more secure.
[1] I am leaving out things like several potential fates far worse than bankruptcy and related issues because they aren't opex-related, but they probably do affect retirement planning.
The black market value is something like $10-15 for a basic credit card number, I believe. Gold and platinum cards can go for several times more. But a driver's license and SSN are worth way more, for what I believe are obvious reasons. My prices might be a little off; I haven't checked in ages, but you get the idea. Say 130 million cards at $10 each. Isn't that $1.3 billion? Not chump change.
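The arithmetic behind that estimate, using the commenter's remembered price range (those prices are the commenter's rough recollection, not sourced figures):

```python
cards = 130_000_000

# Low and high ends of the remembered $10-15 per-card range.
low = cards * 10
high = cards * 15

print(f"${low:,} to ${high:,}")  # $1,300,000,000 to $1,950,000,000
```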
Well no, that's incorrect. A targeted ad is worth significantly more than one without targeting. I buy ads at a $0.25 CPM and at a $40 CPM; the only difference is the targeting data.
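For readers outside adtech: CPM is the price per thousand impressions, so the gap between those two rates scales directly with volume. A quick sketch with a hypothetical one-million-impression campaign:

```python
def campaign_cost(impressions: int, cpm: float) -> float:
    """Cost of a campaign priced per thousand impressions (CPM)."""
    return impressions / 1000 * cpm

for cpm in (0.25, 40.0):
    print(f"CPM ${cpm:.2f}: 1M impressions cost ${campaign_cost(1_000_000, cpm):,.2f}")
# CPM $0.25: 1M impressions cost $250.00
# CPM $40.00: 1M impressions cost $40,000.00
```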
It still holds the data alone isn't worth much. If you've built an ad platform with customers, reach, and the ability to target people given data, then sure, you can convert that raw material into something more valuable. And once you sell the derived product (ads), you'll be taxed on your income.
I feel like you're arguing that dirt is worth as much as the farm that one could build with it.
You can still show the ads and there are a lot of other signals and context to use. Also other than Facebook or google with strong identity, 3rd party data on the open web is next to useless. If you’re paying $40cpm for data, you’re getting ripped off.
Sorry to be that guy, but: I spend over $5m a year on RTB ads. I literally spend 50 hours a week doing this. If the money I spend doesn't produce verifiable results, I lose it.
For example, that $40 CPM is to reach a pool of <1000 users who are in charge of purchasing for networks of hospitals, and my ads are for MRI machines. 3rd party data is unbelievably valuable; probably $1.5 million of my budget goes to data costs alone.
That doesn't add up. I've managed budgets an order of magnitude higher, know the founders of every major SSP and DMP, and now specialize in B2B marketing for F1000 companies with long sales cycles. If you're really trying to reach a pool that small, open web advertising is incredibly inefficient.
This is well understood by the adtech community and even the flashy new "ABM" companies will tell you the same. 3rd party data is universally terrible. At best, it'll work at scale (of millions) on general demographic details but will definitely not recognize 1000 people on the open web.
That kind of list might work on LinkedIn or Facebook with email targeting, but it would be easier to focus on niche trade sites without any data, or just use a direct sales team. That $1.5M in data you're paying for would have much better ROI with a good VP of sales.
The targeted viewer may see the ad and start thinking "we need a better modern MRI with whatever fancy feature I read about here" or "we could hire a new MRI from xyz cheaper than our current contract!".
Sometimes you can prod your potential customers into action.
That anyone would be influenced in what MR scanner they buy due to Facebook advert is amazing. It would also explain things I hear from radiographers overseas. Wow. Are you able to state which vendors buy Facebook adverts (I assume you can’t)?
Might be better off spending £5k+ on personal gifts for each decision maker than bothering with Web advertising if it's that few people you're targeting :-)
I may not be a lawyer, but gifts of $5k to induce someone to purchase a thing for their workplace feels like it should count as corruption and bribery.
If someone tried to do that to me, I’d report the attempt to the company lawyer, and I’d doubt the quality of the thing they were selling was as good as the quality of the thing the other poster was advertising.
Welcome to the advertising industry. You'd be surprised what goes on when media buyers control so much money. There are 20 year old planners with control of 7+ figure budgets for major brands - you can bet they're getting plenty of gifts.
I'm sure he meant what he said. Do the math. He was trying to point out that it would cost the same either way. But if you give the mark the money, it would have a much greater effect than 5k on silly web ads.
That honestly sounds borderline mafia or cartel style... "We're going to bribe you, you're going to take it, and since we all know it's illegal, you're on the hook with us all. Welcome to the game kid."
"None of us is as [valuable] as all of us," is a saying that has been around for decades, surely there are business rules that have cropped up to support a valuation of this scenario in the meantime.
I'm not saying anything different. Whenever a business/individual sells information at a value, they'll be taxed on the sale, just like anything else. OP mentioned selling a car's worth of his own data, which doesn't exist.
If one company sells anything to another company, that sale is taxed as income like any other sale. So yes, if Yahoo sold 3B records, that would be taxed, and the recipient company can choose to book that purchase as an asset on their balance sheet and will likely expense the purchase. Hell they could even choose to depreciate the value too for as long as they follow GAAP.
OP mentioned his data alone, which isn't worth squat unless the transaction says otherwise. Meaning, if OP sold his data to a company for a taxable amount, he would be taxed on that income.
> Wouldn't black market identities be worth MORE if they weren't so easy to get?
No. It's already illegal to buy and sell identities, so black market demand for identities is likely at a maximum already. I'm just using that number as a proxy for what your clicks on the internet must be worth. I'm basically making the assumption that Value(Clicks) < Value(Black Market Identity), which I'd say is a fair bet.
> So the more we tax / regulate it, the harder it is, the more valuable they get. Win-win for everyone.
Again, this wouldn't be true for black market identities, but let's look at clicks.
What I'm saying is that the click you give away is worth too little to be taxed at all.
Say 1M click data points are worth $1K (which I think is still very generous given the amount of noise); that means each click is worth 1/10¢. Any company that sells the 1M clicks to another company will pay taxes on the $1K of income. So if you increased taxes, you would discourage them from selling your data to another company. This doesn't change your behavior as a consumer, though. You still give your data away one click at a time (1/10¢), which isn't taxable as a gift because of the size of the amount (even 1K clicks is only $1) and because the company can easily argue they provided you with a service in exchange for that 1/10¢.
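The per-click arithmetic, using exact fractions so the tenths of a cent don't pick up floating-point noise (the $1K valuation is the commenter's own hypothetical):

```python
from fractions import Fraction

dataset_value_usd = Fraction(1_000)   # hypothetical: 1M clicks valued at $1K
clicks = 1_000_000

per_click_usd = dataset_value_usd / clicks    # $0.001 per click
per_click_cents = per_click_usd * 100         # a tenth of a cent
value_of_1k_clicks = per_click_usd * 1_000    # $1 for a thousand clicks

print(per_click_cents)     # 1/10
print(value_of_1k_clicks)  # 1
```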
What you're looking for is a way to penalize companies for receiving data (i.e. for every data point you gain, you owe some $x in taxes.) This would need to be legislated since that's not currently how tax law works.
> Is it possible that current accounting/tax law can be interpreted so that these are viewed similarly?
Yes, that actually happens to be the status quo.
A collection of data is an intellectual property asset just like a patent, or movie rights, or your brand.
If you buy a database, you will, depending on the costs, have to depreciate it over its useful lifetime. That means your tax burden in the first year will be higher than if you blew the money on the company Christmas party. That's the same as if you bought a software license, or Coca-Cola Co.
If you collect the data yourself, that mechanism doesn't kick in. The reason is that it's difficult to value intellectual property unless it's traded, and it would allow for too much manipulation of a company's profits.
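A rough illustration of why capitalizing the database raises the first-year tax base, assuming straight-line depreciation and made-up figures. Actual depreciation or amortization rules for data assets vary by jurisdiction and are not modeled here:

```python
def first_year_taxable_profit(revenue: float, other_costs: float,
                              purchase: float, useful_life_years: int = 1) -> float:
    """Year-one taxable profit, deducting the purchase straight-line.

    useful_life_years=1 models a pure expense (the Christmas party),
    deducted in full in the year it is paid.
    """
    return revenue - other_costs - purchase / useful_life_years

# Hypothetical company: $1M revenue, $600K other costs, $300K to spend.
party = first_year_taxable_profit(1_000_000, 600_000, 300_000)
database = first_year_taxable_profit(1_000_000, 600_000, 300_000, useful_life_years=5)

print(party)     # 100000.0
print(database)  # 340000.0
```

Same cash out the door, but only one fifth of the database cost is deductible in year one, so taxable profit is higher.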
Now these assets aren't taxed on an ongoing basis in the way you imply. That's because no assets are, except real estate in some jurisdictions.
Does that mean that your ID number (social security for U.S. readers) is taxable upon receipt (birth or immigration)? Also we will need a birthday tax as your age (a key demographic data point) changes then. A marriage tax, moving (address change) tax, employment change tax etc. Tax law will have a concept of taxable data event much like a liquidity event.
Obviously this is a silly thought exercise but it is fun to think about.
I do not know what any of that means. Googling it led to a bunch of conspiracy sites and equally incomprehensible shady semi-legal advice and advocacy sites
The problem is you aren't "giving" a company anything. The company is observing how you interact with their products.
This is like saying by walking into a store you are "giving" the company your image on their security camera. It would take a very odd definition of "gift" to make that claim.
Sure, some of the data companies collect is of that form, but typically it isn't Personally Identifiable Information, and even when it is, that isn't what people are worried about with the Yahoo breach.
Yahoo was "gifted" data. People explicitly gave them names, email addresses and passwords. That is what Yahoo failed to protect.
> The stolen information included names, email addresses, phone numbers, birthdates and security questions and answers.
The asset concept is interesting. If you introduce taxes into the mix then you will also need to value your asset. If you sell your asset then you need to record the fair value price at which you bought it and when, so that you can record a short- or long-term capital gain. The problem with digital assets is that you can easily copy them. So what does it mean to sell an asset that you actually still own/have a copy of? It's a bit tough to conceptualize, but I think there's something there... it requires a bit more brainstorming.
That’s a fascinating premise. Revenues generated from the use of one's data are taxed like anything else (unless routed through Ireland ;-) ) but I don’t think assets are taxed at rest. I could be wrong.
Indirectly, they are. Governments don't let you blow all your profits on assets that are as-good-as-cash, and then claim you didn't make any taxable profits.
So if you make $X in profit and then use it to buy a tractor, then (from the government's perspective), you've just swapped $X for an asset worth $X. No change in book value, no reduction in profit, no reduction in tax liability.
You are, however, allowed to treat the tractor as an expense that's distributed over several years of its useful life, which is called "depreciating" it.
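A minimal sketch of the tractor example, assuming a hypothetical $50,000 tractor, no salvage value and a five-year straight-line schedule. The purchase itself leaves total assets unchanged; the cost is then expensed a slice per year as the book value declines:

```python
def straight_line_book_values(cost: float, salvage: float, life_years: int) -> list[float]:
    """Book value of an asset at the end of each year of its useful life."""
    annual_expense = (cost - salvage) / life_years
    return [cost - annual_expense * year for year in range(1, life_years + 1)]

# Buying the tractor just swaps one asset (cash) for another (the tractor).
assets_before = {"cash": 50_000.0, "tractor": 0.0}
assets_after = {"cash": 0.0, "tractor": 50_000.0}
print(sum(assets_before.values()) == sum(assets_after.values()))  # True

# The deduction then arrives gradually as the tractor depreciates.
print(straight_line_book_values(50_000.0, 0.0, 5))
# [40000.0, 30000.0, 20000.0, 10000.0, 0.0]
```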
So yes, to the extent that your cash is exchanged for assets, that counts as a higher book value and higher tax liability (than if it were a pure expense). I don't know if you'd have to treat a "data purchase" more like a tractor or more like buying electricity (a pure expense) though.
> Indirectly, they are. Governments don't let you blow all your profits on assets that are as-good-as-cash, and then claim you didn't make any taxable profits.
Similar experience here:
In an earlier career my company reinvested all profits back into growth, only to learn that the taxman didn't care about such silly things. The IRS demanded the tax from the profits that had been reinvested and were no longer available.
Plus they wanted the tax from the profits of the growth that had only happened from reinvesting the earlier profits that they wanted tax from. Their demands were in excess of the actual realized profit that had been made by the company.
I reached this same conclusion from a very different angle. If you're seriously worried about the unchecked power of monopolies, and understand the effects of Metcalfe's law, then we should measure the degree of monopolization of tech companies differently than we do traditional industries. Businesses are locked in to Facebook and Google the same way that businesses were locked in to doing business with Standard Oil in the Gilded Age. The impossibility for regular users to leave the network makes competition de facto impossible, even if the company does not actively engage in anti-competitive behavior (in many cases, they do anyway).
A tax on social software companies proportional to their network size would be an interesting proposal to solve both of these issues. It would also greatly increase the ability for 100,000 - 1,000,000 person "decentralized" social networks (like Mastodon or other competing networks) to thrive.
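Metcalfe's law treats a network's value as growing with the number of possible pairwise connections, n(n-1)/2, which is why a network ten times larger is roughly a hundred times more "valuable" under that model. A small sketch; the user counts are illustrative, taken from the range in the comment plus one hypothetical very large network:

```python
def metcalfe_connections(n: int) -> int:
    """Possible pairwise connections among n users: n(n-1)/2."""
    return n * (n - 1) // 2

for users in (100_000, 1_000_000, 1_000_000_000):
    print(f"{users:>13,} users: {metcalfe_connections(users):,} possible connections")

# Ten times the users gives roughly a hundred times the connections.
ratio = metcalfe_connections(1_000_000) / metcalfe_connections(100_000)
print(round(ratio))  # 100
```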
This is how the civil legal system is supposed to function. There need to be some very large class action lawsuits brought against these companies, and huge awards need to be extracted in order to increase the financial risk of having shitty infosec.
For PCI Compliance purposes, at least, holding user credit card information is already seen as a liability because maintaining compliance is a cost center. That's why there's been some shift towards tokenizing transactions on the fly and directly submitting to the CC company via javascript so that shopping websites never see your CC number even when you enter it on their website - even if you're scheduling future payments.
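A toy sketch of the tokenization flow described above. `HypotheticalProcessor` and `Shop` are invented stand-ins, not any real payment provider's API. In practice the tokenize step runs in the customer's browser via the processor's JavaScript, which is what keeps the card number out of the merchant's systems:

```python
import secrets

class HypotheticalProcessor:
    """Stand-in for a card processor's vault. Not a real API."""

    def __init__(self) -> None:
        self._vault: dict[str, str] = {}  # token -> card number, held only here

    def tokenize(self, card_number: str) -> str:
        # Stands in for the browser-side call that sends the card number
        # straight to the processor and hands back an opaque token.
        token = "tok_" + secrets.token_hex(12)
        self._vault[token] = card_number
        return token

    def charge(self, token: str, amount_cents: int) -> bool:
        # Only the processor can map the token back to a card number.
        return token in self._vault and amount_cents > 0

class Shop:
    """The merchant stores tokens only, never card numbers."""

    def __init__(self, processor: HypotheticalProcessor) -> None:
        self.processor = processor
        self.saved_payment_methods: dict[str, str] = {}  # customer -> token

    def save_card(self, customer: str, token: str) -> None:
        self.saved_payment_methods[customer] = token

    def bill(self, customer: str, amount_cents: int) -> bool:
        # Scheduled/future payments reuse the stored token.
        return self.processor.charge(self.saved_payment_methods[customer], amount_cents)

processor = HypotheticalProcessor()
token = processor.tokenize("4111111111111111")  # a widely used test card number
shop = Shop(processor)
shop.save_card("alice", token)

print(shop.bill("alice", 1999))                                    # True
print("4111111111111111" in shop.saved_payment_methods.values())   # False
```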
Maybe we should include other data elements under PCI or similar regulations. SSN?
I imagine HIPAA has similar requirements and associated costs.
> You really do need user accounts to run an email service
Exactly. Regardless of that, companies keep asking users for a whole collection of personal data, not always making it obvious which fields are actually required, because it's good business for them to get as much personal data as possible.
Average users are usually unsure about a lot of this stuff and naive enough to enter their real data for fear of getting caught "lying".
This happens because companies see this data as an asset instead of a liability; from the companies' view, not asking for that data or tricking users into giving it away means missing out on assets.
But if you instead make the personal data a liability, by enforcing standards for keeping/sharing it with hefty fines, then fewer companies will go out of their way asking users for personal information they have no business asking for in the first place because it would put them in a position of liability for what happens with said data.
>> User accounts? Really? This is Yahoo we’re talking about. You really do need user accounts to run an email service
> Exactly, regardless of that companies keep asking users for a whole collection of personal data, not always making it obvious which fields are actually required
You literally don't need any user information to run an email service. You only need a means to identify them which could just amount to giving them a long, randomly generated password. Even the username is only necessary for the purpose of being able to identify them as a recipient, not for login itself.
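A minimal sketch of that scheme using only Python's standard library: the server hands out a long random password and stores only a salted PBKDF2 hash, with no name, phone number or birth date anywhere. The iteration count is an assumption and should be tuned; account recovery (raised further down the thread) is deliberately not handled:

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumed work factor; tune for your hardware

def new_account() -> tuple[str, bytes, bytes]:
    """Create credentials that involve no personal information at all."""
    password = secrets.token_urlsafe(32)  # 32 random bytes, shown to the user once
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return password, salt, digest          # the server keeps only salt + digest

def verify(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison

pw, salt, digest = new_account()
print(verify(pw, salt, digest))       # True
print(verify("wrong", salt, digest))  # False
```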
> You literally don't need any user information to run an email service.
I know that and you know that, but the average user does NOT know that and is too good-natured to enter fake information.
There are plenty of email services out there, among them many of largest and most established ones, where the real name is a required field during registration.
Sure you can always argue "Well just enter fake details" but that's missing the point. The point being that once personal information becomes a liability, instead of something you can just haphazardly hoard as an asset, companies would be much more careful about what kind of information they are asking from the users in the very first place.
Companies abuse the goodwill of the average users by asking for more information than they should because it comes at no cost to them while at the same time being a very big asset. Even if they fail to secure these assets and a breach happens, most of the costs of that are externalized onto the users whose data actually got leaked; the consequences for the company are often only cosmetic: some bad PR, and the stock price takes a little downturn.
But the brunt of that will be over after a couple of weeks and after that, it's back to business as usual.
That needs to change, companies need to be held liable for:
A) Needlessly asking for and hoarding personal information
B) Sloppy treatment of information resulting in a leak
Yes, this could very well be opening Pandora's box, but something about the current state of things really needs to change.
That's a dangerous semantic game along the same lines as "Metadata is harmless and can't identify anybody".
Emails can sometimes contain very detailed and very revealing user information. Trying to differentiate between users' "personal information" and users' "personal content" is imho a rather dangerous thing to do, because who decides where to draw the line between the two?
As a user, I expect my data, regardless of which data, to stay private unless I explicitly intend to publish it to the public or to somebody else. I most certainly do not expect some employees to be reading through my private emails for their lunch-break entertainment.
> That's a dangerous semantic game along the same lines as "Metadata is harmless and can't identify anybody".
That's a complete strawman argument that has nothing to do with what I wrote. The distinction is correct and factual in this exact situation. You are attempting to redefine terms for apparently no reason other than to argue.
Whether emails contain detailed information or not is irrelevant to the term "user information" in this context, meaning information about a user. The discussion is about whether an email service requires personal information to operate.
> As a user, I expect my data, regardless of which data, to stay private unless I explicitly intend to publish it to the public or to somebody else. I most certainly do not expect some employees to be reading through my private emails for their lunch-break entertainment.
In the real-world, you either need to change your expectations or encrypt your data.
Sure, but you don't need first name, last name, phone number, birth date or gender. All of which are asked on the signup and of which only Gender is specified as optional: https://login.yahoo.com/account/create
At my small business we ask only for an email address, a password and a password confirmation. Everything else is excessive.
Tax obligations can be another problem which may require an address, but often have a simpler way to resolve them by simply picking the appropriate country and state off a list or even with just a checkbox for "are you in X jurisdiction which I am required to tax?". I believe Tarsnap handles it that way.
Tarsnap has a "are you Canadian" checkbox. Unfortunately if you are Canadian I have to collect your name and address because I have to provide[0] invoices/receipts which contain this information.
Mind you, there's no requirement that you give me truthful information. If you claim to be John Smith living at 123 Main Street, you'll get an invoice which says that at the top of it. You won't be able to use it to claim a tax rebate; but if you're not running a business it's not useful for that purpose anyway.
[0] IIRC I technically don't have to provide such invoices to everybody, merely to anyone who asks for one. But collecting the information up front and emailing PDFs to all the Canadians is much easier than handling individual requests later.
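A sketch of the "collect extra fields only when a jurisdiction requires it" approach from the last few comments. The `Signup` class and `validate` function are hypothetical; this is not how Tarsnap actually implements it:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Signup:
    email: str
    password: str
    is_canadian: bool = False       # the single jurisdiction checkbox
    name: Optional[str] = None      # only needed for Canadian invoices
    address: Optional[str] = None   # only needed for Canadian invoices

def validate(form: Signup) -> list[str]:
    """Return a list of problems; an empty list means the signup is acceptable."""
    problems = []
    if not form.email or not form.password:
        problems.append("email and password are required")
    if form.is_canadian and not (form.name and form.address):
        problems.append("name and address are required for Canadian invoices")
    return problems

print(validate(Signup("a@example.com", "hunter2")))                    # []
print(validate(Signup("b@example.com", "hunter2", is_canadian=True)))
# ['name and address are required for Canadian invoices']
```

Nothing checks that the name and address are truthful, which matches the "John Smith at 123 Main Street" point above.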
My memory of implementing COPPA compliance a decade ago was that DOB was an implicit requirement, the explicit requirement being “confirm they’re over 13; a checkbox isn’t good enough because they’ll clearly just lie.” (paraphrased, not quoted).
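The age check itself is simple once you have a date of birth; the fiddly part is deciding whether this year's birthday has passed yet. A sketch, with the 13-year threshold taken from the comment and `old_enough` as a hypothetical helper name (this is not a statement of what COPPA itself requires):

```python
from datetime import date

def age_on(dob: date, today: date) -> int:
    """Whole years elapsed since dob, as of today."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)

def old_enough(dob: date, today: date, minimum_age: int = 13) -> bool:
    return age_on(dob, today) >= minimum_age

today = date(2024, 6, 1)
print(old_enough(date(2011, 6, 1), today))  # True: 13th birthday is today
print(old_enough(date(2011, 6, 2), today))  # False: turns 13 tomorrow
```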
For a consumer mail service, you need to know enough to let them recover their account (possibly with decades of un-backed-up correspondence with, and photos of, since-deceased friends and relatives) when they’ve forgotten their password, and without letting someone else recover their account. This is a hard problem.
(I’m expecting some idealized “solutions” from people with idealized beliefs about mass market tech skills.)
You also need a process for resolving ownership disputes. Facebook takes the tactic of having the person claiming ownership upload government-issued ID, which seems like it would be the only foolproof way to do so, yet they're constantly maligned for it.
> My personal data is an asset. And it belongs to me.
> Anyone who has my data for any purpose owes me my cut.
It's well established in the US that you do not in fact own your data. You don't own your school records or employment records. You don't own your medical records or your credit records. In general the best you have is a right to view those records and that's only in certain cases.
Pretty sure it's the same across the world, really. Try getting your medical records expunged in Cuba or your search records expunged in China. I bet you have the same success as in the US.
I honestly don't know how to distinguish between "me" and "about me".
These discussions always go full meta. Makes my head hurt.
Being a simple bear, I try to distill these paradoxes (freewill, love, death, what is art) down to something actionable. Hence my conclusion, after much thought and effort (eg securing medical records), that "I am my data, my data is me." and therefore I own it.
If privacy is the ability to control what is publicly known about yourself, the best (practical, prescriptive) way I can think to do that is via property rights.
---
I appreciate your reply. I'm going to revisit my beliefs, conclusions. Starting with the currently generally accepted definitions.
> I honestly don't know how to distinguish between "me" and "about me".
> These discussions always go full meta. Makes my head hurt.
You go to the store to buy a carton of eggs. The store now has data about you and your purchase. If you pay with credit card, they have a record tied to your identity. If you pay with cash, they still have a record of what you bought with your eggs, and nothing stops them from scribbling your name on the copy of the receipt they keep.
You have no right to demand that the store cease possession of this data. They might use this data (in aggregate) to determine when they need to restock eggs. They might use this data (along with other purchase records) to determine that butter should be stocked next to the eggs. They might discard this data as soon as books are reconciled or they might retain this data in perpetuity. This was the case in 1920 and it's the case now. We like to talk about "big data" as if it changed the fundamentals, but all it actually changed was the scale.
I absolutely believe in the right to privacy. But I don’t think the right to privacy extends that far. I think it’s kind of unreasonable that everyone else loses their rights to record data to protect your right to privacy. This runs counter to the First Amendment and makes journalism impossible. It also makes it impossible to do things like monitor the police.
I'm torn between liking this view of personal data as property and also liking the view of Richard Stallman and the FSF that "intellectual property" is a legal fiction that we ought to resist. What does it actually mean to "own" data, and is "property" the best metaphor to represent a set of personal data control rights?
I hear ya. Pinko commie liberal me abhors the idea.
Legal fictions like "property", "money", and "rights" are practical innovations that make society work better (eg more moral, greater public good). Kinda like the tech tree in games like Civilization.
The books "The Mystery of Capital" and "Nonzero" influenced me a lot. Good starting points, optimistic, and more right than wrong.
RMS' objection to "intellectual property" isn't an objection to the concept of property in general, but rather an objection to conflating the rights that are associated with tangible property with the rights that are granted by copyright law and patent law. The canonical example is the language of "theft" used by organisations like the MPAA to refer to copyright infringement - RMS would say that illegally copying a copyrighted work doesn't deprive anyone else of their copy of said work, unlike stealing a sandwich from someone.
RMS might disagree with me here, but I think the same thing can be said in this case - we need more precise terminology that accurately describes the types of infringement when it comes to misuse of PII.
In Switzerland, anyone collecting data about other people must make a public declaration of that collection, and may not keep such records about people who disagree with being thus documented ("fiché").
Okay? People signed up for Yahoo accounts so that Yahoo could provide them email, messaging, fantasy sports, and other account-based webapps. How is their possession of the records of those accounts some kind of property crime? This isn't Equifax.
I'd buy a negligence argument, but there's not much to find fault with regarding possession.
You agree to give up your data in return for services. Yahoo Mail, or Gmail for that matter, isn't actually free. You are trading your data for a service.
I don't understand. You believe you have the right to have no records of yourself written anywhere. You believe it is impossible to contract away this right. You've given HN an individual identifier for yourself, and also furnished your political views (a specially protected category under the GDPR) to its database.
Aren't all your comments proof of Y Combinator's human rights violation against you? Shall we have HN shut down and its operators jailed? Obviously this isn't the world we live in, but isn't it the one you're arguing for?
Maybe you can't give permission to have data treated carelessly, but it seems absurd to say you can't give permission to have data collected at all. Opening an account with Yahoo is surely consent to let Yahoo have a record of that account.
Some rights, but not all rights. You cannot (in any country I am aware of) contract away your right to life, nor turn yourself into a slave in exchange for your debts being forgiven.
The latter used to be possible, if I understand serfdom correctly.
Question is, should data rights be alienable or inalienable?
If you're talking about what rights you should have, that's fine. Currently, however, there is no defined right in the US such that service providers like Google, Facebook, etc. can't make use of the data they acquire from you. If you consider data on you to be more valuable than the service provided, don't use the service.
I believe EU's GDPR made some efforts in that direction, but I'm not sure it went far enough.
We need laws that give companies incentive to store very little data on us outside of what's absolutely required for the functioning of the service. And if they do store additional info, and their servers are breached, then automatic hefty fines should be paid (right after the mandatory notification to authorities and the public).
That should encourage companies to either minimize data collection or use end-to-end encryption, where most of that additional data would be stored on the client's device. This would have to exempt them from liability, and it should since the data wouldn't be on their servers if breached.
I was about to mention the GDPR - it definitely is a step in the right direction. I don't agree that it doesn't go far enough - compared to previous regulations, it is quite severe, and it is already a pain to implement as it is. If it went any further, many companies would probably not even bother and would either somehow do their business outside the EU or just prepare to be fined.
> if they do store additional info, and their servers are breached, then automatic hefty fines should be paid (right after the mandatory notification to authorities and the public).
This is already in the GDPR - you have to notify everyone affected about breaches, and the fines can go up to €20 million or 4% of annual turnover, whichever is greater.
GDPR already does more than any existing law in either the US or Europe (not sure about other countries). Like every law, it will be reviewed and can be made stricter. Changes are likely needed anyway as companies try to circumvent the regulation with "creative" ways or incentives for users to give up privacy.
But it's a huge step forward compared to the existing situation.
This could be a voluntary insurance that companies purchase on behalf of their users. If the company suffers a breach, they will be bound to pay X amount to their users depending on the data lost.
Dress it up with a fancy badge to slap on the front of their site. Maybe a silver badge means user data is insured up to $10 each; a gold badge is up to $100; platinum up to $1000.
So Yahoo would have been insured for somewhere between $30 Billion and $3 Trillion in this scheme? That seems untenable. Good luck collecting from the bankrupt insurer.
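The figures above come from multiplying the account count by each badge's per-account cap. Here is a quick sketch of that arithmetic. It assumes every one of the 3 billion accounts claims the full cap, and it uses the silver/gold/platinum caps proposed in the parent comment:

```python
# Worst-case payout if every breached account claims the full cap of its tier.
ACCOUNTS = 3_000_000_000  # accounts reported as affected

# Per-account caps from the badge proposal (USD).
TIER_CAPS_USD = {"silver": 10, "gold": 100, "platinum": 1_000}


def worst_case_payout(accounts: int, cap_usd: int) -> int:
    """Total liability if every account is paid the full per-account cap."""
    return accounts * cap_usd


for tier, cap in TIER_CAPS_USD.items():
    print(f"{tier:>8}: ${worst_case_payout(ACCOUNTS, cap):,}")
```

One person can hold several accounts, so this is an upper bound on what individuals could actually claim.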
Good point, although the report states 3 billion user accounts were breached but this doesn't mean 3 billion people. I am guessing the vast majority of accounts did not contain any sensitive information.
And maybe insurance isn't the right word; the risk should probably fall to the company holding the data, not a third party who would never be able to audit every single step to ensure there is no weak link.
The first step towards this is having useful industry standards for auditing and certification that actually work... then you can think about an insurance market where insurers force certification.
True, it wouldn't be 3 billion individuals claiming the benefit. Still the scale is so large that it would utterly bankrupt most companies to pay out for a single breach.
If the cost of disclosure was a dollar a user there's pretty much no way we'd see them voluntarily tell us they were hacked. We'd have to wait until the information got out some other way.
I think hehheh is saying that a policy like this would strongly encourage hiding breaches. No one would openly admit a breach if they knew it would kill the company. The net effect would be less transparency, not better security.
Sure, I like the thinking. In theory some of these costs will hit the errors and omissions insurance, which will drive up their costs in the long run (I know they are being absorbed by Verizon, but typically...). In turn, as part of the insurance evaluation, insurers would assess the collection of the data as a risk, as well as the company's track record in keeping it secure.
3 billion - we live in an age where half the population of the earth can exist on a service, and everyone is vulnerable.
Yes, a good chunk of these are probably duplicates for business / spam / anon accounts, but this is where the world is trending. How long is it until facebook or google have a massive breach?
waayyyy back in the days, like 2002, Yahoo Pool was pretty big and people used bots to make many, many accounts that played against each other (in something like a pyramid structure) to boost account scores. They were usually used with proxies to get around Yahoo's protections.
I don't remember if I did it, but I knew how to do it.
There were also 2 big auto-aimers, hell they were fun and tourneys were fun too :)
+1. I did the same. At the time I had to do this (around 2015), Yahoo was the least concerned about identifying duplicate accounts. I was testing for an actual paying job, not some side interest investigation, mind you. Some of the services I had to test were clever enough to reject fakeinbox accounts so I used Yahoo.
> I was testing for an actual paying job, not some side interest investigation
What difference would it make?
Do you mean to imply that creating test accounts is a little bit "wrong", and would be wrong for an individual to do at home, but it's OK to do it if someone else is paying you for it?
If so, I disagree on both counts: it's not wrong to create test accounts, but if it was, it would still be wrong even if someone is paying you to do it.
We've created a catch-all *@test.company.com with AWS SES & Lambda, all forwarded to a single test@company.com (a GApps group where the QA staff had access). Took a few tries to get right, but worked flawlessly from then on, saving testers' time every single day.
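The routing rule described above can be sketched as a pure function. This is a hypothetical illustration, not the actual SES/Lambda code, and the SES receipt-rule and Lambda wiring are not shown. It assumes the function only needs to map any recipient under test.company.com to the single group address test@company.com and leave every other recipient alone:

```python
# Hypothetical catch-all rule: any *@test.company.com recipient is rewritten
# to one shared QA mailbox; all other recipients pass through unchanged.
CATCH_ALL_DOMAIN = "test.company.com"
QA_MAILBOX = "test@company.com"


def route_recipient(address: str) -> str:
    """Return the mailbox a message for `address` should be delivered to."""
    local, sep, domain = address.rpartition("@")
    if sep and local and domain.lower() == CATCH_ALL_DOMAIN:
        return QA_MAILBOX
    return address


def route_all(recipients: list[str]) -> list[str]:
    """Route every recipient, dropping duplicates while keeping order."""
    routed: dict[str, None] = {}
    for r in recipients:
        routed.setdefault(route_recipient(r), None)
    return list(routed)
```

Because every test sign-up collapses into one mailbox, testers can invent a fresh address per test run without creating any accounts.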
I know a guy who uses a service that creates a unique email account for every service he signs up for. That way, he tells me, if he ever gets any spam, he can delete the account and it doesn't affect any of his other email accounts.
This can be done easily if you own a domain and use a service that lets you specify a catch-all address. I do this with my own domain and G Suite. Then, you don't even need to do any preparation before giving out the address.
It does sound weird to the person writing it down and I've had more than one person say something like "well, if you're just going to give me a fake address, then don't bother" before I explained myself.
One other downside is that it is not as easy to reply to mail as the other, generated identity (in G Suite, you need to create a new account in the domain to write as that username and also maybe jump through a hoop or two). Replying casually can often reveal your main identity, which is often the one you are trying to strongly protect.
+1. I also use *@mydomain.com feature in G Suite, and it's very convenient to understand which companies sell/pass email databases to others w/o my permission.
In some cases, you need to reply from that "aliased" address -- in this case, I go to Settings, add an alias, get a confirmation code, and confirm it. Then this new "address" is available in GMail in the "From:" drop-down menu when you write a new email.
username+anything@mydomain.com is also a useful feature (as is u.s.er.nam.e@gmail.com – dots are ignored in consumer @gmail.com addresses, though not on G Suite custom domains; some services don't allow "+" in the email address field, so you can use a finite number of variants with ".").
These little tricks make GMail convenient for geeks :)
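A sketch of how anyone (including a marketer or spammer) can mechanically undo these variations for consumer Gmail. It assumes Google's documented behavior for @gmail.com: the +tag is dropped, and dots and letter case in the local part are ignored. Other domains are returned unchanged, because whether "+" or "." means anything there depends on how that provider's mail server is configured:

```python
def canonical_gmail(address: str) -> str:
    """Map a consumer @gmail.com address to the mailbox it delivers to.

    Other domains are returned unchanged: whether '+' or '.' mean anything
    there depends on how that provider's mail server is configured.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or domain.lower() != "gmail.com":
        return address
    local = local.split("+", 1)[0]          # drop the +tag
    local = local.replace(".", "").lower()  # dots and letter case are ignored
    return f"{local}@gmail.com"
```

For example, canonical_gmail("First.Last+news@gmail.com") gives "firstlast@gmail.com".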
I’ve run a fair amount of email campaigns where we strip out the + if gmail is the domain to ensure it doesn’t end up in some weird filter.
Dick move, I know. Tell marketing that though.
I personally use gmail through a vanity domain and have a catch all rule, so I end up signing up with a fake email account for every domain (hn@mydomain.com) and then the catch all forwards it to my real account (me@mydomain.com).
> I’ve run a fair amount of email campaigns where we strip out the + if gmail is the domain to ensure it doesn’t end up in some weird filter.
At which point you should wind up in the "how widely can I advertise that you're a spammer and all your outbound email should all be routed straight to /dev/null for sending mail to an email address you were never given" filter.
Depends on your isp and which email provider they use. The big marketing email services generally do have the feedback loop setup with Gmail though, so yes, you are right.
> I’ve run a fair amount of email campaigns where we strip out the + if gmail is the domain to ensure it doesn’t end up in some weird filter.
Which works, until the Gmail users who bother using + addresses with filters start giving all legitimate senders + addresses and sending everything that doesn't have one and doesn't come from Google straight to deletion (possibly with a stop at “mark as spam” en route.)
The problem is not all legitimate sites/sources will actually accept '+' as a valid email character even though the RFC says it's a valid email character.
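To illustrate why "+" gets rejected, this sketch compares a common over-strict pattern with a check based on RFC 5322's dot-atom form for the local part: runs of "atext" characters (which include "+") separated by single dots. It deliberately ignores quoted local parts and does no domain validation, so it is not a complete address validator:

```python
import re

# Over-strict pattern of the kind many sign-up forms use: no '+' allowed.
NAIVE = re.compile(r"[A-Za-z0-9._-]+@[A-Za-z0-9.-]+")

# RFC 5322 atext: letters, digits and these specials (note the '+').
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
# dot-atom-text: one or more atext runs separated by single dots.
DOT_ATOM_LOCAL = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def naive_ok(address: str) -> bool:
    return NAIVE.fullmatch(address) is not None


def local_part_ok(address: str) -> bool:
    """Check only the local part against RFC 5322 dot-atom-text."""
    local, sep, _domain = address.rpartition("@")
    return bool(sep) and DOT_ATOM_LOCAL.fullmatch(local) is not None
```

For example, naive_ok("user+tag@example.com") is False, while local_part_ok accepts it.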
I wonder if it could be argued that this violates anti-spam regulations. Depends on how “plus” addresses get interpreted. Are they a different recipient?
It really baffles me that people are still suggesting this as advice for spam reduction. All it takes is a third of a brain and a couple seconds of thought to realize that spammers know this is a thing and can adapt.
Well, not exactly. That would only work if your address was registered as foobar@gmail.com but not if it was registered as foo@gmail.com. Essentially, periods don't matter in gmail addresses.
Adding a . in between any of the characters (or removing, if you registered the account to have .'s included) will still go to the same email address.
But you can't add .anystring to your address and still receive the message as you can with +anystring.
This always blows the mind of the average gmail user who thinks they registered first.last@gmail.com when they find out that firstlast@gmail.com also works.
My favorite is one site I encountered that let you create and login with such an email address, but the forgot password form couldn’t handle it and would 500.
A lot of services don't allow + in email addresses. With gmail you can also insert a . anywhere you like which works more often. But sometimes, catch all addresses really help to test.
I don't think there's a standard which says that "X+Y delivers to X" - it's a configurable option in exim and you could equally well make it "XqY delivers to X" if you were wilfully perverse.
I'm not being snarky, but do you think they would tell us if they did? We have to assume they are prime targets. They might have slightly better personnel, but is that enough to outdo the nefarious and the determined? And can we discount a rogue employee?
I would go further and say 3b is an order of magnitude too large. Bots aside, if there are only 3 accounts per user, our estimate is at 1b. Now we take into account malicious agents like bots and spammers, which easily carry a bloat factor of 3-5. A closer estimate might be hundreds of millions of unique human users, and maybe half of those users actually care.
TLDR+Edit: Didn't see your other post and accidentally straw manned you. Anyway, I agree it's gonna be "a lot lot less" than 3b unique human accounts.
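Here is the back-of-the-envelope estimate above, written out. The inputs (3 accounts per person, a bot/spam bloat factor of 3 to 5, and half of users caring) are the commenter's guesses, not measured figures:

```python
ACCOUNTS = 3_000_000_000  # accounts Yahoo says were affected


def unique_humans(accounts: int, accounts_per_user: int, bloat: int) -> int:
    """Accounts -> distinct people: divide out duplicate accounts per person,
    then the bot/spam bloat factor."""
    return accounts // accounts_per_user // bloat


for bloat in (3, 5):
    people = unique_humans(ACCOUNTS, accounts_per_user=3, bloat=bloat)
    print(f"bloat {bloat}: ~{people:,} people, ~{people // 2:,} who might care")
```

This gives roughly 200 to 333 million people, of whom about 100 to 167 million might care.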
Well 'smaller' and 'close to 50%' will have different effects, and I'm willing to bet that the number of individuals affected will be a lot, lot less than 3bn.
Serious question: at what point do we reach the "everyone is vulnerable so no one is vulnerable"?
EDIT: Or maybe not "no one is vulnerable", but just that everyone's information is assumed compromised and our current societal infrastructure accounts for it.
> A massive data breach at Yahoo in 2013 was far more extensive than previously disclosed, affecting all of its 3 billion user accounts, new parent company Verizon Communications Inc. said on Tuesday.
Does anyone have insight on how this works? Do you just sue the pants off the execs, or the lawyers who did due diligence, or maybe the SREs? Do they claw back the difference in goodwill + legal costs from the selling investors?
Is there recourse at all?
It'll probably be some poor schmuck SRE getting the blame, like always, right?
There'll be a small chunk of the purchase price left in escrow for a year for any extra liabilities that weren't discovered in DD. They'll be claiming that. But it won't be much.
It works like this: lawyers come up with a security checklist. Managers make sure the checkboxes are checked. Engineers are all ignored because fuck you, your opinion isn’t on the checklist.
It’s security theatrics, not actual security. And if you stand up for something more, get ready to quit because you won’t be listened to.
I used to work at AOL. Neither company trusted the other's networks or security processes. Integration planning meetings were like negotiating a prisoner exchange.
Looks like both Equifax (2.5m additional accounts) and Yahoo chose today as a good day to bury bad news (the papers being filled with Las Vegas, Puerto Rico, etc). Slimy moves from their PR teams.
There never is a break-in where they get 1/3 or 1/2 of the accounts. It has to be nearly all or some much smaller fraction. (My own presumption, based on the idea that nothing large does mere 2- or 3-way replication or partitioning.)
It depends. It's possible a company could catch a breach while the data is being dumped to s3/russia/wherever and cut it off before everything is extracted.
Another possibility is that only one particular system is breached, which wouldn't actually affect all users of a given company. If Facebook were hacked, it's possible that only the ad-buy system is compromised and not their entire user store, for example, thus exposing only people who have purchased ads and not all users.
If you store EU user data in the EU and other user data somewhere with less restrictive privacy laws, an attacker could get hold of one or the other reasonably.
On the other hand, yeah, it's much more likely the entire account database was dumped.
> It depends. It's possible a company could catch a breach while the data is being dumped to s3/russia/wherever and cut it off before everything is extracted.
At that point, honest behavior would still be to assume all accounts were transferred. You don't know that data wasn't transferred earlier, and it's also hard to estimate what part of the data was sent successfully.
If data could be accessed it should be treated as compromised.
When I was on Facebook today, I saw an ad with a photo of a minivan, and some copy about finding a new vehicle. The ad was posted by Yahoo. When I clicked it, it took me to the search result for minivans. This company feels like an AI experiment.
A spokesman for Oath, the new name of Verizon’s Yahoo unit, said the company determined last week that the break-in was much worse than thought, after it received new information from outside the company.
Can they claw back money from Yahoo shareholders because of this?
Just an ancillary comment, but Yahoo has a whole bunch of password requirements. So many that my passwords don't cut it and I can never remember my password. And/or I need to validate every new device. Is this all just for show? It's adding insult to injury that they force all these things and then they get broken into.
Hopefully your experience is characteristic of most yahoo users, and this breach is less effective because people are using a unique password for their breached account.
Somewhat off-topic, but does anyone know what top-level domains are in practice "safe" to use for email addresses if we're going to migrate to our own domain?
I mean "safe" in the sense of being unlikely to cause confusion or problems with less-than-well-written software (or humans).
Obviously .com is okay, and I haven't heard of problems with .edu/.gov/.org/.net, but I'm a little afraid of getting a domain for email addresses that isn't a well-established 3-letter TLD, on the off chance that someone has hard-coded a requirement like this in their code. I'm not sure if I'm just being paranoid about this though. Any suggestions on what's considered safe?
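The failure mode being worried about looks something like the first pattern below, which was written when TLDs were assumed to be two or three letters long. This is a hypothetical sketch that contrasts it with a looser check accepting any alphabetic final label. Neither pattern is full validation, and neither handles internationalized TLDs:

```python
import re

# Over-strict: assumes every TLD is 2-3 letters (.com, .org, .uk ...).
OLD_STYLE = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,3}")
# Looser: any alphabetic final label of two or more letters.
LOOSER = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")


def accepts(pattern: re.Pattern, address: str) -> bool:
    return pattern.fullmatch(address) is not None
```

With the old-style pattern, me@example.com and me@example.co.uk pass, but me@example.info and me@example.services are rejected.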
For about 9 years or so, I've used the .CC TLD for my personal/family email without any technical issues...though it is important to know that throughout the entire time, I've used G Suite as my email provider (it used to be called Google Apps for Your Domain, etc.). So, one could speculate that perhaps my lack of technical issues was due less to the TLD I used and more to Google considering my domain name "not spammy".
HOWEVER, an annoying problem that I've had over the years - and while it has diminished slightly, it still persists - is that people (or at least people here in the U.S.) are not used to hearing domain names that don't end in the usual .COM, .ORG, .NET...so I ALWAYS have to clarify and explain that my email ends in .CC and not .COM, etc. I find myself still doing this even today - almost a decade later - with so many lay people "being online". I sort of expect lay people to need more explanation, but you'd be surprised how many technical people also are not used to hearing domain names that don't end in the usual top 3. I like the .CC TLD, I really do...but having spent these last 9 or so years constantly explaining to the people I plan to correspond with that there are soooooo many other TLDs out there (beyond just .COM, .ORG, .NET) does get really tiring. If I had to do this all over again, I would have gone with .NET or .ORG (the .COM back then was already taken for my domain name). Oh well.
> I'm not sure if I'm just being paranoid about this though. Any suggestions on what's considered safe?
Maybe a bit, but I don't think it's paranoia; you have technical reasons. I just recently had to strip TLDs from URIs, and boy was that harder than expected!
That being said, domains like co.uk and co.jp have been around for a long time. I would stay away from "fancy .named" domains, but country-level names should work fine.
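Stripping the TLD is harder than expected because the "registrable" part of a name depends on multi-label suffixes such as co.uk. Here is a minimal sketch of that idea. A tiny hard-coded suffix set stands in for the Public Suffix List (publicsuffix.org), which real code should use instead:

```python
# Tiny stand-in for the Public Suffix List; the real list has thousands of entries.
PUBLIC_SUFFIXES = {"com", "org", "net", "cc", "io", "uk", "co.uk", "jp", "co.jp"}


def naive_registrable(host: str) -> str:
    """Wrong for co.uk-style names: just keeps the last two labels."""
    return ".".join(host.lower().split(".")[-2:])


def registrable_domain(host: str) -> str:
    """Longest matching public suffix plus one more label.

    Scanning from the left finds the longest suffix first. If the host is
    itself a public suffix, or no suffix matches, it is returned as-is.
    """
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[max(i - 1, 0):])
    return host.lower()
```

For www.bbc.co.uk, the naive version returns co.uk, while the suffix-aware version returns bbc.co.uk.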
Oh, good point! Do .info, .name, .me, .io, etc. also cause problems like custom ones? Or do you just mean the newer ones like .services and .online and whatnot?
Kind of both. The world of emails works in mystical ways! It really depends on the admin. He or she might very well block .me etc. since no serious company is going to have a .me email.
A lot of admins do block something like .services.
Your best bet is to go with a tld that is also commonly used by companies.
Also, if this is important to you, you need to check if whois privacy is supported for a specific tld.
There is an unpatched server at some IP address long forgotten and no longer used by Yahoo but still nevertheless works. The page still shows the Yahoo portal with news on the front page from when Yasser Arafat was alive. I believe the page has not been updated since 2003.
The IP address is in the 200 range. I used to remember the IP address for many years due to photographic memory even though I had only seen it briefly once. But I just cannot dig up that memory anymore.
Does anyone have a good solution to deleting a Yahoo account? I've got one that is 99.9% spam mail now, but I've never deleted it because, if I remember correctly, someone else could open up that email address in my name and continue to get my emails. They also don't support automatic email forwarding, if I remember correctly. It remains the dark spot among my email accounts.
So, let's see: We have a server farm and it is working along. We want to know right along, in real time, if it is sick or healthy. So, we do some monitoring.
There are two kinds:
(1) The first kind looks for problems seen before. Here we get to use data of two kinds, (i) when the system was healthy and (ii) when the system was sick and we detected the problem, understood it, found out why, and tried to prevent that problem in the future.
(2) The second kind looks for problems never seen before, that is, zero-day problems. Here we have no data on the problems but likely do have a lot of data on when the system was healthy or at least seemed to be, not just on the day of the data collection but also later.
In both cases we have two ways to be wrong:
(A) Say that the system is sick when it is healthy -- a false alarm.
(B) Say that the system is healthy when it is sick -- a missed detection.
So, from (A) and (B), we get two rates and want both to be low.
We can get data on many variables at high data rates.
Now, what do we do?
Okay, it's a problem in, say, data analysis, data science, statistics, AI/ML, right?
Hmm .... What do we do?
Uh, be warned: If the false alarm rate is too high, then the monitoring will be ignored.
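The comment leaves the method open, so here is one simple option rather than anything it prescribes. The sketch first computes the two error rates (A) and (B) from a labeled history of checks, assuming each record is an (alarm_raised, actually_sick) pair. For the zero-day case, where only healthy data exists, it then sets an alarm threshold at the empirical (1 - alpha) quantile of one monitored variable measured while healthy, so the false alarm rate on similar healthy data stays at or below alpha. It assumes "too high" means sick and that healthy behavior doesn't drift. The missed detection rate can't be estimated this way without examples of problems:

```python
import math


def error_rates(records: list[tuple[bool, bool]]) -> tuple[float, float]:
    """Given (alarm_raised, actually_sick) pairs, return
    (false_alarm_rate, missed_detection_rate), i.e. the rates of (A) and (B)."""
    healthy = [alarm for alarm, is_sick in records if not is_sick]
    sick = [alarm for alarm, is_sick in records if is_sick]
    false_alarm = sum(healthy) / len(healthy) if healthy else 0.0
    missed = sum(not alarm for alarm in sick) / len(sick) if sick else 0.0
    return false_alarm, missed


def threshold_for_false_alarm_rate(healthy: list[float], alpha: float) -> float:
    """Empirical (1 - alpha) quantile of values seen while healthy: at most a
    fraction alpha of those healthy values lie strictly above it."""
    if not healthy or not 0 < alpha < 1:
        raise ValueError("need healthy samples and 0 < alpha < 1")
    xs = sorted(healthy)
    k = math.ceil((1 - alpha) * len(xs)) - 1  # index of the quantile
    return xs[max(k, 0)]
```

A new reading would then raise an alarm when it exceeds the threshold. Lowering alpha cuts false alarms at the cost of more missed detections.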
This still is a huge concern for us web app developers. Most people re-use their email addresses and passwords across multiple sites. One breach at one internet company affects all the others.
IMO, password reuse is the #1 web application security problem in the world right now, and there's very little in the way of accepted industry standards to mitigate it.
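One mitigation that has gained traction is refusing passwords that already appear in breach dumps. This sketch covers the client side of the Have I Been Pwned "Pwned Passwords" range API, which uses k-anonymity. Only the first five hex characters of the password's SHA-1 are sent (to https://api.pwnedpasswords.com/range/{prefix}), and the response lists matching hash suffixes as SUFFIX:COUNT lines. The HTTP call is left out so the sketch runs offline, and the parsing assumes that line format:

```python
import hashlib


def sha1_prefix_suffix(password: str) -> tuple[str, str]:
    """Uppercase hex SHA-1 of the password, split into the 5-character prefix
    that is sent to the range API and the 35-character suffix kept locally."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def breach_count(range_body: str, suffix: str) -> int:
    """Find our suffix in a range response made of 'SUFFIX:COUNT' lines.
    Returns 0 when the suffix is not listed."""
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0
```

In use, you would fetch the range for the prefix with any HTTP client, pass the body to breach_count, and ask for a different password if the count is nonzero.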
The statistical analysis on the password database here would be fantastic! You've likely got demographics, geolocation, age, when the password was made (going back maybe 20 years!) and more. It'd be a great research tool if it ever leaks.
I swear, if a company is about to go under, the executives are just selling off the data, calling it a breach, making some bank, and getting an excuse to close down that isn't their fault.
I tried to enable MFA on a Yahoo account I was helping someone with at work 24 hours ago.
Their MFA is still SMS-based, which I’m pretty sure is a bad thing. They don’t allow an app like Duo (although they do reject VOIP numbers which I guess is good).
Download the Yahoo Mail app and set it up. They call their MFA Account Key, and it uses the Mail app to push, similar to Duo. I think other apps include Account Key, but it was just being pushed out when I last worked with Yahoo. The SMS is just a bootstrap; once you have the app, you can pick it as your second factor at login.
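For comparison with SMS codes, here is a sketch of TOTP (RFC 6238), the standard scheme behind most authenticator apps. It is not what Yahoo's Account Key or Duo Push use (those are push-based). It only illustrates how an app-held shared secret avoids relying on the phone network. The defaults of HMAC-SHA1, 30-second steps and 6 digits are assumptions taken from the RFC's common profile:

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 over an 8-byte big-endian counter,
    dynamically truncated to `digits` decimal digits."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # low 4 bits of the last byte pick the window
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP with counter = floor(unix_time / step)."""
    t = time.time() if at is None else at
    return hotp(secret, int(t // step), digits)
```

The phone app and the server share the secret once (usually via a QR code). After that, codes are computed locally on both sides and never travel over SMS.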
Just come out at the start honestly and say, "All 3 billion accounts affected at Yahoo", or whatever.
I feel angry when I see the numbers gradually going up, I think one reason is because I see it like they're trying to dupe us, or "cook the frog slowly".
I understand they have to protect "shock" to their stock price, or reputation, or prevent panic, but honesty is still valuable, right?
When you have a natural disaster, surely there are experts who have already mapped out such situations and they can say, roughly 20,000 homes will be destroyed in an event like this. Wouldn't it be good to start off at a big estimate and then revise down?
I hate to think this is to some extent driven by the media's need to "drip drip" out a story, instead of giving people the truth.
related to this: I got a "someone has your password" from Google and they blocked access...diff country, different device.
My question: Now I assume that one way or another they got that from Google or from one of the many hacked forums/websites (yeah, I used the same ID and password on many sites). Do they try to log in manually, or try 10,000 at a time via bots? I doubt they went manual, since they must have millions of accounts. If my username were cpowell or hclinton, I suppose, but....
Since leaks are now measured in billions here and there, with critical info exposed, and Facebook, Google, etc. can track your movements and even your thoughts and opinions (your daily life in general), I assume we have officially entered a world of no privacy with no turning back. I think we need a new technical design to cope with this: something like a new style of identity using biometric info, dynamically generated tokens and such, and browsers in anonymous mode by default.
More than half of the people in the world still don't have internet access! I believe the estimate of 3B includes all Yahoo accounts created until 2013 - maybe an opportunity to indicate the clout it once had.
Interesting to see the comments here. One thread arguing that free services should be incentivized against collecting/profiting off user data, followed by a thread lamenting the WSJ content paywall...
Why does that even work? I get technically what is going on - WSJ sees the link as having been referred from Facebook. But why does the WSJ display the content for free to somebody who seems to be coming from Facebook, yet ask for a subscription for direct access (or for a Google search result referral)?
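Nobody outside WSJ knows its exact logic, but the behavior described fits a paywall that waives the subscription check based on the HTTP Referer header. This is a hypothetical sketch of such a rule, and the facebook.com allow-list is an assumption. The point is that the Referer header is supplied by the client, so anyone can set it:

```python
from typing import Optional
from urllib.parse import urlparse


def may_read_free(referer: Optional[str], is_subscriber: bool) -> bool:
    """Hypothetical paywall rule: subscribers always get in; everyone else
    gets in only when the Referer header points at facebook.com."""
    if is_subscriber:
        return True
    if not referer:
        return False
    host = (urlparse(referer).hostname or "").lower()
    return host == "facebook.com" or host.endswith(".facebook.com")
```

Under a rule like this, a Google search referral or a direct visit hits the paywall, while a click from Facebook does not.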
That looks pretty handy. I'm at work (maybe I shouldn't be on HN at all, haha...) and Facebook is blocked. But I guess that's my problem. :) I'll check it out when I get home.
Maybe me. ;-) I've been suggesting it for a while. It works wonders and has worked every time I've tried it.
It seems likely that they know about this backdoor and have opted to allow it. They must surely know the IP addresses associated with archive.is and have yet to make any effort to block them.
It's a very short article, here are the salient bits:
===
A massive data breach at Yahoo in 2013 was far more extensive than previously disclosed, affecting all of its 3 billion user accounts...
...
...Oath ... said the company determined last week that the break-in was much worse than thought, after it received new information from outside the company. ... declined to elaborate on the source of that information. Compromised customer information included usernames, passwords, and in some cases telephone numbers and dates of birth...
...
The number of individuals affected by the 2013 attack is smaller than 3 billion, because some people have multiple accounts ... Oath will immediately begin notifying the users who own the additional roughly 2 billion accounts. That is expected to take several days and occur via email...
I'd be pretty surprised if an attacker could actually get away with a lot of sensitive, actionable bulk user data from Facebook. DMs would probably be way too big in total, unless they just looked for DMs of high-profile people.
As for passwords, they're probably not stored in a very crackable format (probably some kind of super-bcrypt-esque algorithm with a pepper). Of course, they could hijack the login procedure and harvest passwords in real-time until they're detected. That would still be really bad depending on how long they can evade detection - maybe millions of passwords - but at least it wouldn't be retroactive. And the password dump could still be bad for people looking to target individuals within the dump.
Maybe advertising data could be trimmed down enough to dump the whole thing? Every ad that accounts have clicked?
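This sketches the kind of password storage speculated about above, with one substitution: Python's standard-library scrypt stands in for bcrypt, which is a third-party package in Python. The pepper is an application-wide secret kept outside the database (assumed here to come from configuration). It is mixed in with HMAC before the memory-hard hash, so a stolen password table alone isn't enough to start cracking. The cost parameters are illustrative, not a recommendation:

```python
import hashlib
import hmac
import os

# Illustrative scrypt cost settings (roughly 16 MiB of memory per hash).
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2 ** 14, 8, 1


def _peppered_scrypt(password: str, pepper: bytes, salt: bytes) -> bytes:
    # Mix in the server-side pepper first, then run the memory-hard KDF.
    peppered = hmac.new(pepper, password.encode("utf-8"), hashlib.sha256).digest()
    return hashlib.scrypt(peppered, salt=salt, n=SCRYPT_N, r=SCRYPT_R,
                          p=SCRYPT_P, dklen=32)


def hash_password(password: str, pepper: bytes) -> tuple[bytes, bytes]:
    """Return (salt, hash) to store in the user table; the pepper is not stored there."""
    salt = os.urandom(16)
    return salt, _peppered_scrypt(password, pepper, salt)


def verify_password(password: str, pepper: bytes, salt: bytes, stored: bytes) -> bool:
    """Recompute the hash and compare in constant time."""
    return hmac.compare_digest(_peppered_scrypt(password, pepper, salt), stored)
```

As the thread notes, none of this helps against an attacker who hijacks the live login flow, because the plaintext passes through it before hashing.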
> Of course, they could hijack the login procedure and harvest passwords in real-time until they're detected.
Facebook makes it really hard for people to log off. Unless one is using a shared computer, I doubt she types her password more than a couple times a year.
I know many mention facebook or google being hacked will be an even bigger deal. But I wonder, with all the online spaces google/facebook has under control (ads, analytics, cdns, dns, crawlers, your phone, etc.) if they suspect a breach, they could literally disable any website or device that tries to share that information.
Nothing would stop someone from being able to share it via the darknet. Tor, Freenet, Zeronet, etc. There's no way that news would be able to be stopped.
Plenty of leaked stuff (like credit cards) moves around on the "dark web" via TOR sites, which most of those won't impact. Not unless they went as far as to make the Android OS check that you're visiting onion sites and scanning for leaked information to block them, which would be pretty extreme.