Is it possible that current accounting/tax law can be interpreted so that these are viewed similarly?
The more common data you give away is worth even less. Your "gift" is akin to giving away a few grains of sand to a glassmaker who provides a free grain counting service.
Now let's say you dumped a lot of sand that we could value at $10K. Any smart sand-counting glassmaker will claim his once-"free" sand counting costs $10K, which amounts to an equal, zero-profit trade.
In any situation, potentially derived value is not taxable. A car is worth whatever a car was bought/sold for, not including some hypothetical such as whatever I could make by driving it for Uber/Lyft. What matters here is what will actually occur in the transaction. If Equifax chooses to sell its data, that income will be taxed based on whatever price Equifax sells the data for.
Note this doesn't change that your "gift" of peanuts of data is not taxable because (a) your data alone isn't worth squat, (b) even if it was, you got something in exchange for it.
If we believe in a truly free society, then collecting and monetizing metadata should be illegal. If we don't mind giving up that freedom, then there's nothing wrong with companies creating a profile on you and tracking you no matter where you go and what you do. But the internet has spoken, and we're gladly, albeit unknowingly, giving up any right of protection. I find it worrisome to think of what society will be like in another 50 years if nothing is done to curtail the fleecing of user data.
Also, it doesn't matter that there's an exchange happening. Sales tax and income tax are assessed on fair exchanges of goods, services, and currency.
And if you grow oranges on your property that you never sell, you never pay taxes on those oranges.
If you accept a trade-in on the sale of a vehicle, the allowance for the trade-in cannot be excluded from the amount on which tax is based.
For example, if you sell a car for $20,000 and accept a trade-in valued at $4,000 as partial payment, tax is based on the $20,000 selling price.
Still though, the example is a good one. You're trading your data for a service. There isn't anything to tax.
It's kind of like arguing a bank would never create fake accounts because the risk of doing so is too large.
Black market data is worth way more because it's often more personal than just demographic markers and interests, and can potentially lead to large sums of money.
Alternatively you need real criminal gangs - dozens of people willing to walk up and down a London street withdrawing 5k at a time from 1,000 pre-prepared cards and put the money in their rucksacks. They don't come cheap. And it's still cash, and still in the U.K.
Get Amazon to send you two dozen laptops to the same address with two dozen cards. All as "gifts". Yeah right. Now you gotta sell them - fences run at 10% if you are lucky.
The lowest-effort option is simple impersonation for loans, but you still have to take the money and move it somewhere. Into cash? Into the Philippines? See the above problems. Open a credit card account? How do you intercept the card and the PIN sent by post?
All in all, it's actually pretty darn hard to take personal details and monetise at the "real money" level. These things stop being scalable. You could probably fund a student lifestyle off any combination of the above but millions - not really.
Cf. the interesting Microsoft paper on this from a few years back.
These criminals trade data because it makes them money, otherwise there wouldn’t be much of a market.
Sensitive personal data is necessary but not sufficient to rip someone off. And if you want to try to make a living ripping people off, there is even more business overhead, making the cost of sensitive personal data an even smaller portion of overall operational costs.
From the point of view of the thief, our personal data is a vital but cheap input into an operation that tends to have very high security costs, viciously expensive liquidity issues and terrible personnel problems, among other more quotidian business headaches.
I suggest trying to think like a crook now and then. Trying to try on other people's lives is a useful way of shaking up one's thinking habits, empathy (don't confuse with sympathy) is always useful, and it can help you keep yourself more secure.
I am leaving out things like several potential fates far worse than bankruptcy and related issues because they aren't opex-related, but they probably do affect retirement planning.
I feel like you're arguing that dirt is worth as much as the farm that one could build with it.
For example, that 40 CPM is to reach a pool of <1000 users who are in charge of purchasing for networks of hospitals, and my ads are for MRI machines. 3rd party data is unbelievably valuable; probably $1.5 million of my budget goes to data costs alone.
This is well understood by the adtech community and even the flashy new "ABM" companies will tell you the same. 3rd party data is universally terrible. At best, it'll work at scale (of millions) on general demographic details but will definitely not recognize 1000 people on the open web.
That kind of list might work on Linkedin or Facebook with email targeting but it would be easier to focus on niche trade sites without any data, or just use a direct sales team. That $1.5M in data you're paying for would have much better ROI with a good VP of sales.
Sometimes you can prod your potential customers into action.
If someone tried to do that to me, I’d report the attempt to the company lawyer, and I’d doubt the quality of the thing they were selling was as good as the quality of the thing the other poster was advertising.
Fortunately for me, I don’t control any budgets.
Now that’s not allowed in most places, plus you need to hire fancy salespeople to deliver those gifts.
I bet LinkedIn does.
OP mentioned his data alone, which isn't worth squat unless the transaction says otherwise. Meaning, if OP sold his data to a company for a taxable amount, he would be taxed on that income.
Wouldn't black market identities be worth MORE if they weren't so easy to get?
So the more we tax / regulate it, the harder it is, the more valuable they get. Win-win for everyone.
No. It's already illegal to buy and sell identities, so black market demand for identities is likely at a maximum already. I'm just using that number as a proxy for what your clicks on the internet must be worth. I'm basically making the assumption that Value(Clicks) < Value(Black Market Identity), which I'd say is a fair bet.
> So the more we tax / regulate it, the harder it is, the more valuable they get. Win-win for everyone.
Again, this wouldn't be true for black market identities, but let's look at clicks.
What I'm saying is that the click you give away is worth too little to be taxed at all.
Say 1M click data points are worth $1K (which I think is still very generous given the amount of noise); that means each click is worth 1/10 of a cent. Any company that sells the 1M clicks to another company will pay taxes on the $1K of income. So if you increased taxes, you would discourage them from selling your data to another company. This doesn't change your behavior as a consumer, though. You still give away an untaxable click at a time (1/10¢), which is not taxable as a gift because of the size of the amount (even 1K clicks are only worth $1) and because the company can easily argue they provided you with a service in exchange for that 1/10¢.
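To make that arithmetic concrete (the $1K-per-million-clicks figure is the assumption from above, not a market price), a quick sanity check:

```python
# Assumed valuation from above: 1M click data points ~ $1,000 (illustrative).
value_per_million_clicks = 1_000.00
value_per_click = value_per_million_clicks / 1_000_000

print(value_per_click)  # 0.001 dollars, i.e. 1/10 of a cent per click
print(1_000 * value_per_click)  # a thousand clicks are worth about a dollar
```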
What you're looking for is a way to penalize companies for receiving data (i.e. for every data point you gain, you owe some $x in taxes.) This would need to be legislated since that's not currently how tax law works.
Yes, that actually happens to be the status quo.
A collection of data is an intellectual property asset just like a patent, or movie rights, or your brand.
If you buy a database, you will, depending on the costs, have to depreciate it over its useful lifetime. That means your tax burden in the first year will be higher than if you blew the money on the company Christmas party. That's the same as if you bought a software license, or Coca-Cola Co.
If you collect the data yourself, that mechanism doesn't kick in. The reason is that it's difficult to value intellectual property unless it's traded, and doing so would allow for too much manipulation of a company's profits.
Now these assets aren't taxed on an ongoing basis in the way you imply. That's because no assets are, except real estate in some jurisdictions.
Obviously this is a silly thought exercise but it is fun to think about.
This is like saying by walking into a store you are "giving" the company your image on their security camera. It would take a very odd definition of "gift" to make that claim.
Yahoo was "gifted" data. People explicitly gave them names, email addresses and passwords. That is what Yahoo failed to protect.
> The stolen information included names, email addresses, phone numbers, birthdates and security questions and answers.
Record companies forced plenty of DRM related BS down our throats to drag us into a "license not own" rental model. I suggest we return the favor.
Yet we manage to sell IP (or just some intangible right to use it)
So if you make $X in profit and then use it to buy a tractor, then (from the government's perspective), you've just swapped $X for an asset worth $X. No change in book value, no reduction in profit, no reduction in tax liability.
You are, however, allowed to treat the tractor as an expense that's distributed over several years of its useful life, which is called "depreciating" it.
So yes, to the extent that your cash is exchanged for assets, that counts as a higher book value and higher tax liability (than if it were a pure expense). I don't know if you'd have to treat a "data purchase" more like a tractor or more like buying electricity (a pure expense) though.
My previous, longer comment on the constraints of the tax code and how they result in needing the concept of depreciation: https://news.ycombinator.com/item?id=15060604#15061439
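For anyone unfamiliar with the mechanics, a minimal straight-line depreciation sketch (the tractor figures below are made up for illustration):

```python
def straight_line_depreciation(cost, salvage_value, useful_life_years):
    """Annual expense when a purchase is spread over its useful life."""
    return (cost - salvage_value) / useful_life_years

# A $50,000 tractor with a $5,000 salvage value and a 5-year useful life:
annual_expense = straight_line_depreciation(50_000, 5_000, 5)
print(annual_expense)  # 9000.0 deducted per year, instead of $50,000 in year one
```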
Similar experience here:
In an earlier career my company reinvested all profits back into growth, only to learn that the taxman didn't care about such silly things. The IRS demanded the tax from the profits that had been reinvested and were no longer available.
Plus they wanted the tax from the profits of the growth that had only happened from reinvesting the earlier profits that they wanted tax from. Their demands were in excess of the actual realized profit that had been made by the company.
A tax on social software companies proportional to their network size would be an interesting proposal to solve both of these issues. It would also greatly increase the ability for 100,000 - 1,000,000 person "decentralized" social networks (like Mastodon or other competing networks) to thrive.
Maybe we should include other data elements under PCI or similar regulations. SSN?
I imagine HIPAA has similar requirements and associated costs.
Other impacts may not have direct monetary damages, but could be even more devastating (e.g. Ashley Madison).
Until the courts start clamping down on negligent handling of personal data, firms will continue to cheap out on infosec.
Exactly. Regardless of that, companies keep asking users for a whole collection of personal data, not always making it obvious which fields are actually required, because it's good business for them to get as much personal data as possible.
Average users are usually unsure about a lot of this stuff and naive enough to enter their real data for fear of getting caught "lying".
This happens because companies see this data as an asset instead of a liability; from the company's view, not asking for that data/tricking users into giving it away means missing out on assets.
But if you instead make the personal data a liability, by enforcing standards for keeping/sharing it with hefty fines, then fewer companies will go out of their way asking users for personal information they have no business asking for in the first place because it would put them in a position of liability for what happens with said data.
> Exactly. Regardless of that, companies keep asking users for a whole collection of personal data, not always making it obvious which fields are actually required
You literally don't need any user information to run an email service. You only need a means to identify them which could just amount to giving them a long, randomly generated password. Even the username is only necessary for the purpose of being able to identify them as a recipient, not for login itself.
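As a sketch of that point, an account can be identified with nothing but random tokens (this uses Python's secrets module; the function name and token lengths are illustrative):

```python
import secrets

def create_account():
    """Create an account identified only by random credentials:
    no name, phone number, or other personal data is needed."""
    account_id = secrets.token_urlsafe(16)  # public identifier (the "username")
    password = secrets.token_urlsafe(32)    # long random login secret
    return account_id, password

account_id, password = create_account()
print(account_id, password)
```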
I know that and you know that the average user does NOT know that and is too good-natured to enter fake information.
There are plenty of email services out there, among them many of the largest and most established ones, where the real name is a required field during registration.
Sure you can always argue "Well just enter fake details" but that's missing the point. The point being that once personal information becomes a liability, instead of something you can just haphazardly hoard as an asset, companies would be much more careful about what kind of information they are asking from the users in the very first place.
Companies abuse the goodwill of average users by asking for more information than they should, because it comes at no cost to them while at the same time being a very big asset. Even if they fail to secure these assets and a breach happens, most of the costs of that are externalized onto the users whose data actually got leaked; the consequences for the company are often only cosmetic: some bad PR, and the stock price takes a little downturn.
But the brunt of that will be over after a couple of weeks and after that, it's back to business as usual.
That needs to change, companies need to be held liable for:
A) Needlessly asking for and hoarding personal information
B) Sloppy treatment of information resulting in a leak
Yes, this could very well be opening Pandora's box, but something about the current state of things really needs to change.
Emails can sometimes contain very detailed and very identifying user information. Trying to differentiate between users' "personal information" and users' "personal content" is imho a rather dangerous thing to do, because who decides where to draw the line between the two?
As a user, I expect my data, regardless of which data, to stay private unless I explicitly intend to publish it to the public or to somebody else. I most certainly do not expect some employees reading through my private emails for their lunch-break entertainment.
That's a complete strawman argument that has nothing to do with what I wrote. The distinction is correct and factual in this exact situation. You are attempting to redefine terms for apparently no reason other than to argue.
Whether emails contain detailed information or not is irrelevant to the term "user information" in this context, meaning information about a user. The discussion is about whether an email service requires personal information to operate.
> As a user, I expect my data, regardless of which data, to stay private unless I explicitly intent to publish it to the public or somebody else. I most certainly do not expect some employees reading through my private emails for their lunch-break entertainment.
In the real-world, you either need to change your expectations or encrypt your data.
At my small business we ask only for an email address, a password, and a password confirmation. Everything else is excessive.
Tax obligations can be another problem which may require an address, but often have a simpler way to resolve them by simply picking the appropriate country and state off a list or even with just a checkbox for "are you in X jurisdiction which I am required to tax?". I believe Tarsnap handles it that way.
Tarsnap has a "are you Canadian" checkbox. Unfortunately if you are Canadian I have to collect your name and address because I have to provide invoices/receipts which contain this information.
Mind you, there's no requirement that you give me truthful information. If you claim to be John Smith living at 123 Main Street, you'll get an invoice which says that at the top of it. You won't be able to use it to claim a tax rebate; but if you're not running a business it's not useful for that purpose anyway.
IIRC I technically don't have to provide such invoices to everybody, merely to anyone who asks for one. But collecting the information up front and emailing PDFs to all the Canadians is much easier than handling individual requests later.
That's what I had thought. However, users will lie about a DOB if they will lie about an age.
If it's a user experience thing, fine, but at least make it an optional field.
(I’m expecting some idealized “solutions” from people with idealized beliefs about mass market tech skills.)
i.e. when I sign up with you, I can choose my own preferred vendor to handle identity recovery.
Having my phone number isn't even good 2fA.
Anyone who has my data for any purpose owes me my cut.
Making this a property rights issue solves all the privacy & identity issues.
> Anyone who has my data for any purpose owes me my cut.
It's well established in the US that you do not in fact own your data. You don't own your school records or employment records. You don't own your medical records or your credit records. In general the best you have is a right to view those records and that's only in certain cases.
For instance, the EU and Switzerland.
These discussions always go full meta. Makes my head hurt.
Being a simple bear, I try to distill these paradoxes (freewill, love, death, what is art) down to something actionable. Hence my conclusion, after much thought and effort (eg securing medical records), that "I am my data, my data is me." and therefore I own it.
If privacy is the ability to control what is publicly known about yourself, the best (practical, prescriptive) way I can think to do that is via property rights.
I appreciate your reply. I'm going to revisit my beliefs, conclusions. Starting with the currently generally accepted definitions.
> These discussions always go full meta. Makes my head hurt.
You go to the store to buy a carton of eggs. The store now has data about you and your purchase. If you pay with credit card, they have a record tied to your identity. If you pay with cash, they still have a record of what you bought with your eggs, and nothing stops them from scribbling your name on the copy of the receipt they keep.
You have no right to demand that the store cease possession of this data. They might use this data (in aggregate) to determine when they need to restock eggs. They might use this data (along with other purchase records) to determine that butter should be stocked next to the eggs. They might discard this data as soon as the books are reconciled, or they might retain it in perpetuity. This was the case in 1920 and it's the case now. We like to talk about "big data" as if it changed the fundamentals, but all it actually changed was the scale.
No entity has any right whatsoever to retain any data about me, for any purpose.
Your rights end where mine begin. My right to privacy trumps everyone else's profit motive.
Edit: Scratch that. Our disagreement is more fundamental. I believe humans have a fundamental right to privacy. You don't.
Legal fictions like "property", "money", and "rights" are practical innovations that make society work better (eg more moral, greater public good). Kinda like the tech tree in games like Civilization.
The books "The Mystery of Capital" and "Nonzero" influenced me a lot. Good starting points, optimistic, and more right than wrong.
Until something better comes along...
RMS might disagree with me here, but I think the same thing can be said in this case - we need more precise terminology that accurately describes the types of infringement when it comes to misuse of PII.
Hm. How would this work for (say) a social security number? Does it belong to you? Does every number anyone chooses to identify you belong to you?
Which itself is based on an EU directive: https://en.m.wikipedia.org/wiki/Data_Protection_Directive
This is to be replaced with an updated version between now and Brexit (that’s just going to add to the fun!): https://en.m.wikipedia.org/wiki/General_Data_Protection_Regu...
I'd buy a negligence argument, but there's not much to find fault with regarding possession.
Aren't all your comments proof of Y Combinator's human rights violation against you? Shall we have HN shut down and its operators jailed? Obviously this isn't the world we live in, but isn't it the one you're arguing for?
Maybe you can't give permission to have data treated carelessly, but it seems absurd to say you can't give permission to have data collected at all. Opening an account with Yahoo is surely consent to let Yahoo have a record of that account.
A common example is an arbitration clause, where you sign away your rights to use the courts to resolve disputes.
The latter used to be possible, if I understand serfdom correctly.
Question is, should data rights be alienable or inalienable?
We need laws that give companies incentive to store very little data on us outside of what's absolutely required for the functioning of the service. And if they do store additional info, and their servers are breached, then automatic hefty fines should be paid (right after the mandatory notification to authorities and the public).
That should encourage companies to either minimize data collection or use end-to-end encryption, where most of that additional data would be stored on the client's device. This would have to exempt them from liability, and it should since the data wouldn't be on their servers if breached.
> if they do store additional info, and their servers are breached, then automatic hefty fines should be paid (right after the mandatory notification to authorities and the public).
This is already in the GDPR - you have to notify everyone affected about breaches, and the fines can go up to €20 million or 4% of annual turnover, whichever is greater.
But it's a huge step forward compared to the existing situation.
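The "whichever is greater" rule is just a max; the figures follow GDPR Article 83(5), and the turnover numbers below are hypothetical:

```python
def max_gdpr_fine(annual_turnover_eur):
    """Upper tier of GDPR administrative fines: EUR 20 million or
    4% of worldwide annual turnover, whichever is greater."""
    return max(20_000_000, 0.04 * annual_turnover_eur)

print(max_gdpr_fine(100_000_000))    # the EUR 20M flat cap dominates
print(max_gdpr_fine(2_000_000_000))  # 4% of turnover (EUR 80M) dominates
```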
Dress it up with a fancy badge to slap on the front of their site. Maybe a silver badge means user data is insured up to $10 each; a gold badge is up to $100; platinum up to $1000.
And maybe insurance isn't the right word; the risk should probably fall to the company holding the data, not a third party who would never be able to audit every single step to ensure there is no weak link.
Yes, a good chunk of these are probably duplicates for business / spam / anon accounts, but this is where the world is trending. How long is it until facebook or google have a massive breach?
What difference would it make?
Do you mean to imply that creating test accounts is a little bit "wrong", and would be wrong for an individual to do at home, but it's OK to do it if someone else is paying you for it?
If so, I disagree on both counts: it's not wrong to create test accounts, but if it was, it would still be wrong even if someone is paying you to do it.
It does sound weird to the person writing it down and I've had more than one person say something like "well, if you're just going to give me a fake address, then don't bother" before I explained myself.
One other down side is that it is not as easy to reply to mail as the other, generated identity (in gsuite, you need to create a new account in the domain to write as that username and also maybe jump through a hoop or two). Replying casually can often reveal your main identity, which is often the one you are trying to strongly protect.
(I do similar on my domains and the Gmail alias is easier to do than logging into Admin CP and adding aliases there)
email@example.com is also a useful feature (as well as firstname.lastname@example.org – dots are all ignored in Gmail; some services don't allow "+" in the email address field, so you can use a finite number of variants with ".").
These little tricks make GMail convenient for geeks :)
Dick move, I know. Tell marketing that though.
I personally use gmail through a vanity domain and have a catch all rule, so I end up signing up with a fake email account for every domain (email@example.com) and then the catch all forwards it to my real account (firstname.lastname@example.org).
At which point you should wind up in the "how widely can I advertise that you're a spammer and all your outbound email should all be routed straight to /dev/null for sending mail to an email address you were never given" filter.
Which works, until the Gmail users who bother using + addresses with filters start giving all legitimate senders + addresses and sending everything that doesn't have one and doesn't come from Google straight to deletion (possibly with a stop at "mark as spam" en route).
First because it leaks the underlying email (you can safely assume all spammers are well aware of this feature).
Second because if you start receiving spam, you can't stop it. All you can do is to try to deal with the firehose.
A much better solution is randomly generated aliases that you can delete.
And you're confident of this how?
I'm not actually convinced this is true. It's definitely a widespread belief though.
Adding a . between any of the characters (or removing one, if you registered the account with .'s included) will still reach the same email address.
But you can't add .anystring to your address and still receive the message as you can with +anystring.
This always blows the mind of the average gmail user who thinks they registered email@example.com when they find out that firstname.lastname@example.org also works.
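For anyone deduplicating addresses, the dot and plus rules can be sketched like this (this only applies Gmail's rules; most other providers treat dots and plus signs as significant):

```python
def normalize_gmail(address):
    """Collapse Gmail's dot and plus-suffix variants to one canonical form.
    Other providers' addresses are returned unchanged (apart from lowercasing)."""
    local, _, domain = address.lower().partition("@")
    if domain in ("gmail.com", "googlemail.com"):
        local = local.split("+", 1)[0]   # drop any +suffix
        local = local.replace(".", "")   # Gmail ignores dots in the local part
    return f"{local}@{domain}"

print(normalize_gmail("First.Last+shop@gmail.com"))  # firstlast@gmail.com
print(normalize_gmail("first.last@example.org"))     # unchanged: dots matter here
```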
I’m sure spammers have already figured that out.
I get addresses rejected as invalid when signing up for some service at least once a month.
Search for "verified Yahoo accounts" on Fiverr.
You'll get a sense for how many people generate these for resale to spammers and other shady purposes.
TLDR+Edit: Didn't see your other post and accidentally strawmanned you. Anyway, I agree it's gonna be "a lot lot less" than 3B unique human accounts.
> The number of individuals affected by the 2013 attack is smaller than 3 billion, because some people have multiple accounts ...
EDIT: Or maybe not "no one is "vulnerable", but just that everyone's information is assumed compromised and our current societal infrastructure accounts for it.
Imagine the buyer's remorse.
Is there recourse at all?
It'll probably be some poor schmuck SRE getting the blame, like always, right?
It’s security theatrics, not actual security. And if you stand up for something more, get ready to quit because you won’t be listened to.
Another possibility is that only one particular system is breached, which wouldn't actually affect all users of a given company. If Facebook were hacked, it's possible that only the ad-buy system is compromised and not their entire user store, for example, thus exposing only people who have purchased ads and not all users.
And a third possibility, especially given today's trend to distributed systems, is that the attacker gains access to one shard (or its dump) only.
On the other hand, yeah, it's much more likely the entire account database was dumped.
At that point, honest behavior would still assume all accounts were transferred. You don't know whether data was transferred earlier, and it's also hard to estimate what part of the data was sent successfully.
If data could be accessed it should be treated as compromised.
Can they claw back money from Yahoo shareholders because of this?
Is proper audit capability just not seen as important at these companies?
It's relatively minor. Equifax's preliminary estimate was off by less than 2%, Yahoo's ~300%.
They are probably counting my 25+ Craigslist accounts I guess. And just maybe all the 'princes' I've been over the years. Lol
I mean "safe" in the sense of being unlikely to cause confusion or problems with less-than-well-written software (or humans).
Obviously .com is okay, and I haven't heard of problems with .edu/.gov/.org/.net, but I'm a little afraid of getting a domain for email addresses that isn't a well-established 3-letter TLD, on the off chance that someone has hard-coded a requirement like this in their code. I'm not sure if I'm just being paranoid about this though. Any suggestions on what's considered safe?
HOWEVER, an annoying problem that I've had over the years - and while it has diminished slightly, it still persists - is that people (or at least people here in the U.S.) are not used to hearing domain names that don't end in the usual .COM, .ORG, .NET...so I ALWAYS have to clarify and explain that my email ends in .CC and not .COM, etc. I find myself still doing this even today - almost a decade later - with so many lay people "being online". I sort of expect that lay people will need more explanation, but you'd be surprised how many technical people are also not used to hearing domain names that don't end in the usual top 3. I like the .CC TLD, I really do...but having lived these last 9 or so years constantly explaining to people (with whom I plan to correspond) that there are soooooo many other TLDs out there (beyond just .COM, .ORG, .NET) does get really tiring. If I had to do this all over again, I would have gone with .NET or .ORG (the .COM back then was already taken for my domain name). Oh well.
Maybe a bit, but I don't think it's baseless paranoia; you have technical reasons. I just recently had to strip TLDs from URIs, and boy was that harder than expected!
That being said, domains like co.uk and co.jp have been around for a long time. I would stay away from the fancy new TLDs, but country-level names should work fine.
Would love to hear other opinions as well.
A lot of admins do block something like .services.
Your best bet is to go with a tld that is also commonly used by companies.
Also, if this is important to you, you need to check if whois privacy is supported for a specific tld.
The IP address is in the 200 range. I used to remember the IP address for many years due to photographic memory even though I had only seen it briefly once. But I just cannot dig up that memory anymore.
There are two kinds:
(1) The first kind looks for problems we have seen before. Here we get to use data of two kinds: (i) when the system was healthy, and (ii) when the system was sick and we detected the problem, understood it, found out why, and tried to prevent that problem in the future.
(2) The second kind looks for problems never seen before, that is, zero-day problems. Here we have no data on the problems but likely do have a lot of data on when the system was healthy or at least seemed to be, not just on the day of the data collection but also later.
In both cases we have two ways to be wrong:
(A) Say that the system is sick when it is healthy -- a false alarm.
(B) Say that the system is healthy when it is sick -- a missed detection.
So, from (A) and (B), we get two rates and want both to be low.
We can get data on many variables at high data rates.
Now, what do we do?
Okay, it's a problem in, say, data analysis, data science, statistics, AI/ML, right?
Hmm .... What do we do?
Uh, be warned: If the false alarm rate is too high, then the monitoring will be ignored.
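A toy threshold detector makes the two error rates concrete (all the numbers here are made up for illustration; real monitoring data is far messier):

```python
import random

random.seed(42)

# Simulated metric: healthy readings centered at 10, sick readings at 15.
healthy = [random.gauss(10, 1) for _ in range(10_000)]
sick = [random.gauss(15, 1) for _ in range(10_000)]

threshold = 12.5  # raise an alarm when the metric exceeds this

# (A) false alarms: healthy readings that trip the alarm.
false_alarm_rate = sum(x > threshold for x in healthy) / len(healthy)
# (B) missed detections: sick readings that do not trip it.
missed_detection_rate = sum(x <= threshold for x in sick) / len(sick)

print(false_alarm_rate, missed_detection_rate)  # both low for this easy case
```

Raising the threshold trades fewer false alarms for more missed detections, which is the tension the warning about ignored monitoring is pointing at.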
IMO, password reuse is the #1 web application security problem in the world right now, and there's very little in the way of accepted industry standards to mitigate it.
Their MFA is still SMS-based, which I’m pretty sure is a bad thing. They don’t allow an app like Duo (although they do reject VOIP numbers which I guess is good).