- Businesses must disclose what information they collect, the business purpose for collecting it, and any third parties they share that data with.
- Businesses would be required to comply with official consumer requests to delete that data.
- Consumers can opt out of their data being sold, and businesses can’t retaliate by changing the price or level of service.
- Businesses can, however, offer “financial incentives” for being allowed to collect data.
- California authorities are empowered to fine companies for violations.
I totally understand that this will impact a lot of tech companies' profits...but that's to be expected if you're making money selling people's data to third parties without their permission.
This is something I object to. It's just fundamentally stupid and doesn't make sense. The entire premise of free exchange is that I give you my services in exchange for something of value of yours. Making it illegal to withhold services if you don't give up your data is crazy. The only reason those services are being provided at all is to get that data. That's effectively a requirement that people provide services for free.
EDIT: I'll also add that it strongly favors incumbent tech companies by explicitly carving out "selling to third parties" as a disfavored tactic. Google can monetize your data internally; your average startup may not be able to. That favors large, incumbent players over small startups. But then, regulation always does.
That way it feels more transparent to the end user, and they have to make the conscious choice of either giving up a bit of their data in exchange for a free service or being prepared to pay to keep their data private.
What's the practical difference between these two scenarios?
1) I offer a free service if you let me collect your data, but charge you $3/mo if you refuse to let me collect it.
2) I offer a $3/mo service, but give you $3/mo back if you let me collect your data.
Option #1 is explicitly illegal under the new California law. If option #2 is legal, then what is the law supposed to ban?
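The equivalence being pointed out can be made concrete with a toy calculation. This is a minimal sketch using the hypothetical $3/mo figure from the two scenarios above; the function names are illustrative, not from any real codebase:

```python
# Net monthly cost to the user under each scenario from the comment above.

def net_cost_scenario_1(allows_collection: bool) -> int:
    # Scenario 1: free with data collection, $3/mo if you refuse.
    return 0 if allows_collection else 3

def net_cost_scenario_2(allows_collection: bool) -> int:
    # Scenario 2: $3/mo list price, with a $3 rebate for allowing collection.
    return 3 - (3 if allows_collection else 0)

# The user's out-of-pocket cost is identical either way.
for allows in (True, False):
    assert net_cost_scenario_1(allows) == net_cost_scenario_2(allows)
print("net cost identical in both scenarios")
```

The only difference between the two is framing: a surcharge versus a rebate on the same underlying trade.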
What's meant by saying you can offer incentives is that you must present all the information at the same time and that users are opted out by default.
Basically, the user must go through a sign-up page with a box that says "You may sell my data in exchange for a $3 discount," and it cannot be selected by default. You also cannot have a sign-up page with a box that must be selected to opt out, whether that box is selected by default or not.
The no-retaliation clause also has additional requirements that are not related to money. For example, you cannot refuse to let a user access your site because they didn't allow their data to be sold. You also can't impose access limitations like slowing page response, increasing the number of advertisements on the page, or sending spam only to users who opt out.
The difference isn't really about charging a fee to users who won't let you sell their data. It's more about not allowing websites to harass users with increased advertising on behalf of whoever they wanted to sell your data to. It's only illegal for them to sell the data; internal use is still legal, so they're closing a loophole.
Either way though, I’d be happy if it were just required that there BE an opt out even with a price tag. Google can then finally tell me how much money they want for their services.
It might inadvertently assign a price to the data collected.
It also amounts to direct discrimination against poor and/or young people who might not be able to afford the service.
As opposed to what, treating everybody badly? McDonald's discriminates between people who can afford a Big Mac and people who can't, and that's not a problem.
If you don't have the money to buy a Big Mac, you either eat a cheeseburger off the dollar menu or go home and cook a burger yourself.
If you don't have the money for the software library, you either choose a lesser library, or go write your own.
A larger problem, however, is that many consider the internet a fundamental human right, which it isn't. It might be nice to have, but a fundamental right it is not.
Average revenue per user is around $25 for global users.
This brings up a related issue that not all users are equal. Even in non-data mining business models, you still have some segment of the users subsidizing another.
The problem is that you then cannot effectively offer an ad-based service level. Say I want to offer some web services. I provide a free tier with ads (information gathering and selling) and a paid tier with no ads. This law says I cannot charge more for the no-ads version. OK, so I make both cost the same, either free or some fixed price. If both are free, what stops everyone from just using the no-ads version? It's free too. If both cost money, what is the point of even providing the ad version, since people can pay the same and get the no-ads one? Not to mention, everyone who is fine giving up their information in order to receive the service for "free" is now unable to do that.
It's not the end of the world, it's the end of some business models.
We impose these restrictions for the good of the public. If we decide that exchanging personal data for goods and services should be illegal under certain conditions, I don't see why that's any different than the precedents I mentioned.
Of course, it's all up for debate. I used it as an example because it's the status quo, so it's a good refutation of the statement that this new law is "crazy" in the sense of it being radical and unprecedented.
> - Businesses can, however, offer “financial incentives” for being allowed to collect data.
This just means your service always costs money, but you can refund the full amount for allowing you to collect data.
Not if the thing I value is my personal data. Because I, as a typical short sighted consumer, will consistently underestimate the value of my data, as well as how my data affects those around me (extreme case: my data is my social graph, which you can use to reconstruct the social graph of my acquaintances).
If I don't know the actual value of what I am giving up, I am naturally going to get swindled.
Do you, for example, not use Google because you are concerned they are getting too good a deal from you? Duck Duck Go exists.
If Friendo doesn’t have the same relationship I have with Tech.co, how is Friendo benefitting? Where is the exchange there?
Any concern regarding whether consent is likely to be withheld really just points out how stupid a company's business model is. Incumbent vs. startup is also a false premise, because Facebook is an incumbent that relies on selling user data. They do make some money from purely internal advertising, but in total Facebook makes less than $5 per user from selling data, and that's lifetime, not per year. So the only way to make real profits from selling data is to sell millions of people's data. Their internal data use, however, is just for ad targeting, which they make more money from each year. This internal use earns more per user and is therefore a more viable strategy for a small company than selling data.
There are already regulations, applying to more than just internet-collected information, that make it impossible to sell the user data that would actually be valuable. So any data-selling strategy a startup or incumbent company might have will either not earn significant profits or is very likely already illegal.
In any case, advertising revenue rarely covers a company's expenses. If a company is only staying afloat by selling data, then that company is achieving a very poor return on investment and is wasting the time of everyone involved. Either they need better business processes or strategies, or they're not offering something of sufficient value to be worth the time put into the business.
In this view, a company offering a discount in exchange for sharing is inherently deceptive.
The service isn't free. The exchange just isn't money. When money is involved a price is put forth. It's an agreed to and published exchange mechanism. When it's data on someone the price isn't shared. It's kept secret. There's always a price. It's just not always money or public. This will make more information public.
I wonder if this will start to change how businesses operate. Right now things are skewed against consumers and toward businesses. I wonder if consumer protections like this will cause businesses to re-evaluate their business models.
Giovanni Buttarelli, European Data Protection Supervisor: "There might well be a market for personal data, just like there is, tragically, a market for live human organs, but that does not mean that we can or should give that market the blessing of legislation."
Privacy is a fundamental human right that you can't trade away. Just as you can't sign a lawful contract to be someone's slave, even if you want to.
However, customers aren't well informed, and if they were, they wouldn't use these services. Currently customers are getting scammed for their data. This is why we need regulation.
The problem is that these companies advertise their products as "free," which means something very different. And they usually don't mention at all that they're collecting shit tons of personal data. It's taking advantage of people's ignorance, and it's practically fraud, IMO.
Every other business tells the customer up front what the price is, and maybe it's time tech companies started doing the same.
Without telling users what data they collect, how much they collect, how they use it, and who they give it to, there's no way for users to make an informed decision.
I never explicitly agreed to serve these companies by giving them my data. Consent cannot be assumed; why do you think Terms of Service are so long? We as users should have Terms of Service too, then.
EDIT: Several comments below point out the innumerable businesses operating in such a way as to make mine an impossible claim.
Enumerating the established methods to avoid meeting the legal parameters of price discrimination is not the same as price discrimination being legal.
My bar for following just laws is higher than the minimum required to avoid being prosecuted. I’m sure experts can help work around any future privacy laws too, so why even worry?
Surveillance already creates a different product one can charge more for, though your lawyer might advise you call it “personalization” in memos.
It is illegal to charge a different price based on a protected class like race or gender. But travel sites, for example, are notorious for charging different people different prices based on their GeoIP, whether they're on mobile or desktop, or Windows vs Macintosh.
Robinson-Patman Act says "Please read me, and understand me and the Clayton Anti-Trust Act which I amend."
From refurb's link below:
"Price discriminations are generally lawful"
They have no clue about the Robinson-Patman Act amendments to the Clayton Act.
Maybe you can show me generic ads based on your content instead of targeting them based on where I've been on vacation last month, etc.
Or simply stop building businesses based on people data. Sell something people want to pay for. Humankind has been able to do that for thousands of years, did we forget how to do it?
Couldn't a financial incentive be charging more?
> including by charging the consumer who opts out a different price or providing the consumer a different quality of goods or services, except if the difference is reasonably related to value provided by the consumer’s data
Doesn't that second part cancel the first part? Businesses can also pay money for data... so Facebook could switch to subscription-only but pay its users with virtual coins for viewing ads.
I think it means that if you opt out of data collection, then things like personalized recommendations will no longer work, which makes the services worse. That is a part of the service directly related to the data.
Sure, if it wanted to fall out of the Fortune 500.
(Tangentially, they would have to cut costs to remain profitable, but I figure it should be possible for 6k employees to maintain a set of services that people are willing to pay $2.5/year for.)
If people don't find your product valuable enough to pay for, then your company should go out of business.
Companies have been exploiting people's privacy behind their back, and now that they've been called on it they're throwing a tantrum.
"Something" doesn't mean "anything". You can't offer your services in exchange for e.g. my body parts. Why are we willing to ban that but not our data?
> The only reason those services are being provided at all is to get that data. That's effectively a requirement that people provide services for free.
Well, no — they could charge money for them. But even this is a false dilemma; surely we can come up with at least one business model other than "charge directly for our service" and "surveil our users".
Because if you ran a survey of the general population, the large majority would be fine not having to pay for Gmail, Google Search, Maps, and other "free" services, even if some data is collected along the way, while a much smaller percentage thinks it's OK to sell their organs for that?
If someone went back in time to tell people that this new internet thing was going to lead to the US having more surveillance than East Germany or the Soviet Union, there would have been laws passed and systems designed from the start to prevent it from happening. The fact that it didn't happen this way, and our culture's privacy norms have been destroyed as a consequence, is more reason to act to prevent further damage, not a reason to just give up.
Sure, but if you had phrased the question more fairly and said "Should it be ok for a company to give you free services in exchange for tracking everywhere you go?" I think almost everyone would have said yes.
> If someone went back in time to tell people that this new internet thing was going to lead to the US having more surveillance than East Germany or the Soviet Union, there would have been laws passed and systems designed from the start to prevent it from happening. The fact that it didn't happen this way, and our culture's privacy norms have been destroyed as a consequence, is more reason to act to prevent further damage, not a reason to just give up.
My intuition here is the same as yours: that this is a bad thing, that privacy matters in an intrinsic sense, not an operational sense. But I'm not so sure that's really true anymore. The privacy ship sailed off into the sunset a while ago, and at least to me, the harms don't seem that substantial. All that data is mostly used to show me ads that are more targeted to what I might want to see anyway. There could certainly be lurking latent risks, but it's a little hard to see exactly what they are.
Then you should add shadow profiling, data banks linking together data of dozens of different companies, politically targeted advertisement that pretends to be from human users.
Yes, the free market is nice and all, but the internet is still a hell of a creepy place. It is undeniable that the reason this was allowed (compared to direct mail advertising, for example) is that it was always believed to be impossible.
Most people don't understand the risks and how badly mass surveillance done by Google / Facebook / etc. can be abused. So, akin to the situation with the tobacco industry, someone has to educate them about this.
This is totally dodging the issue.
Would they pay instead of allowing that collection?
Given the choice of "pay for this service", "stop using this service", "give up your data", even with a full understanding of the risks and the extent of this data, the general public has shown time and time again they will take the last option.
People do not like to pay for things.
I see way too many arguments about privacy that don't seem to grasp that the core of today's tech industry is built around this fact.
People will pay $5 a day for 50 cents' worth of coffee beans and enough sugar to drown a fly, but they will never pay $5 a month for a search index that spends millions of dollars a year putting nearly the entirety of mankind's collective knowledge at their fingertips in milliseconds.
I pay a lot more than $5/month for Internet service.
I recall people were talking about turning back to piracy in response to the fragmentation of streaming services.
Disney+ will cost around 7 dollars a month, Netflix costs 13, HBO costs 15. That's about 23, 43, and 50 cents per day respectively.
Yet this change was enough to cause some people to angrily proclaim that they'll go back to illegally downloading movies and tv shows.
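For the per-day figures quoted above, a quick sanity check (assuming a 30-day month; the monthly prices are the ones given in the comment):

```python
# Convert the quoted monthly subscription prices to cents per day.
monthly_prices_usd = {"Disney+": 7, "Netflix": 13, "HBO": 15}

for name, dollars in monthly_prices_usd.items():
    cents_per_day = dollars * 100 / 30  # 30-day month assumed
    print(f"{name}: {cents_per_day:.0f} cents/day")
```

This comes out to roughly 23, 43, and 50 cents per day for the three services.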
Look at music, this seems solved there.
Nobody can make an informed decision because nobody knows exactly what data Google collects or exactly how it's used.
And unless you've actually run a survey you can't even speculate about whether people are okay with it or not. If your claim was true, why would this law even exist? You're okay with it, but privacy is a highly personal issue.
How is this an argument against the law? It doesn't make it illegal to share data but requires that users can opt out. If said large majority is fine with surveillance, I guess Silicon Valley can relax.
"Hey, I want to use your free service but I want to go ahead and opt out of the part that enables it to be free" You don't see a problem with that?
It would be like if there were a restaurant that gave free food in exchange for filling out surveys (data collection). So you eat the free meal and then "opt out" of doing the survey.
in this case, NOPE
This is simply not true. When informed about it, a clear majority of people would prefer not to be surveilled, and are uncomfortable with unknown third parties having profiles on them, ads following them around the internet, etc.
The fact that these people continue to use these services anyway (even after being informed) is not a sign that they are 'fine' with it. It is because they have no choice. Even as an engineer, I've found it unfeasibly difficult to avoid being surveilled while continuing to live a normal life. For most people it's a non-starter. They have no choice.
Freedom to choose is essential to a free market. Regulation, done well, can increase market freedom.
At least I know where my data is. Have you ever read a GDPR popup partner list? Do the people you mention understand how Gmail's business model works?
No need to come up with something new, we had content based ads long before Google and Facebook took over.
Lots of us feel that the data thing should be allowed. That's all.
Okay, cool, don't opt out then. I on the other hand would prefer to see ads based on the content I'm viewing for which no other data from me needs to be passed around.
In fact, ideally, the User Agent would advertise our preferences, like a DNT option, and websites would respond with a paywall or something instead and be expected (by law) to respect the DNT. Then, on first load, browsers could ask, "Do you wish to allow tracking?" and warn about the paywall cost vs. the tracking cost.
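A minimal sketch of the server side of that idea. The `DNT` request header is real (a value of `"1"` means "do not track"), but the paywall response and the price figure here are hypothetical illustrations, not anything mandated by the law being discussed:

```python
# Sketch: a server that switches between an ad-supported response and a
# paywall based on the client's DNT header. All response fields are
# illustrative placeholders.

def handle_request(headers: dict) -> dict:
    """Describe the response for a request with the given HTTP headers."""
    if headers.get("DNT") == "1":
        # User signalled "do not track": offer the paid, tracking-free tier.
        return {"mode": "paywall", "price_usd_per_month": 3}
    # No opt-out signal: serve the ad-supported (tracking) version.
    return {"mode": "ad_supported", "price_usd_per_month": 0}

print(handle_request({"DNT": "1"}))  # paywall branch
print(handle_request({}))            # ad-supported branch
```

In practice browsers dropped DNT precisely because sites had no legal obligation to honor it, which is the gap the comment's "expected (by law)" clause is aiming at.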
Well, I quite like that part. It pushes content providers to look for surveillance free business models, such as content based ads that aren't personalized. Why should selling your data vs. paywall be the only options? To protect Google's ad cartel?
That seems like a weird spin. Businesses are being regulated here, not individuals. I don't see how the law would control me.
It doesn't. That's merely one way how businesses could comply with the law, the other option would be to look for better business models as I said in the comment you replied to.
I don't think they do, given that most people don't read them, and a large percentage of them are written so it's very hard to tell what they are really saying unless you're a lawyer.
It's possible people genuinely don't care. It's also possible they don't understand the implications, and/or that they trust these companies more than they ought to.
First, having to read a long block of legalese on every visit to a website (which would be necessary because those privacy policies can change without notice at any time) is untenable. If people actually did that, it would render the web completely unusable. The average person can't be blamed for saying "screw that," nor can their attitude about privacy really be inferred from it.
Second, even for those of us who are more concerned than the average person about these issues, reading privacy policies is a pointless waste of time. Once you have a lawyer interpret them for you, it turns out that the vast majority of them say the same thing -- they are reserving the right to do anything they want with my data. That means that I can safely predict what privacy policies say, so there's no need to read them.
Just because people use a service doesn’t mean they have read the terms and conditions. Therefore, your anecdote is just that.
A similar analogy: I may not want to read my credit card bill, but that doesn't eliminate my responsibility to pay it. Or, I may not want to read the notice my Visa card issuer sends informing me of a change in the APR or other conditions of service. However, my continued use of the card is the standard way one accepts new or changed TOS.
Maintaining ignorance doesn't excuse the actions.
Not really relevant to a change in the law.
Then again GDPR requires opt-in, but most sites ignore that and many make the opt out extremely time consuming.
> This is not evidence of numbers, but I have an existence proof in that I also personally know a lot of people who know and insist on not opting out.
Most people I know would interpret that as "I know I'm not the only one because". After all, I specifically say "not evidence of numbers" and I also specifically say "existence proof". I find your response baffling.
We could definitely have a system where eg all CC processing must be done through a firewalled system such that the business only ever learns that the payment was successful (unless there is a dispute).
Edit: Europe tries to approximate this with the GDPR system of “you can only use information for the purposes for which it was collected” but that doesn’t stop them from seeing your identity like I described.
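The firewalled flow described above could look something like this toy sketch, where the merchant only ever learns a success flag and an opaque reference token, while the card details stay inside the processor. All class and function names here are hypothetical, invented for illustration:

```python
# Toy model of a payment flow where the business never sees card data or
# cardholder identity -- only an outcome and an opaque dispute-reference token.

import secrets

class PaymentProcessor:
    """Holds sensitive card data; exposes only opaque results to merchants."""

    def __init__(self):
        self._vault = {}  # token -> card details; never shown to the merchant

    def charge(self, card_number: str, amount_cents: int) -> dict:
        token = secrets.token_hex(8)
        self._vault[token] = card_number  # retained only for dispute handling
        return {"success": True, "token": token}  # all the merchant sees

def merchant_checkout(processor: PaymentProcessor,
                      card_number: str, amount_cents: int):
    result = processor.charge(card_number, amount_cents)
    # The merchant learns the outcome and a reference token -- not who paid.
    return result["success"], result["token"]

ok, ref = merchant_checkout(PaymentProcessor(), "4111111111111111", 500)
print(ok, len(ref))
```

Real-world card tokenization (as used by payment gateways) approximates this shape, though disputes and refunds still leak some identity back to the merchant, as the edit above notes.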
Is this a real question?
For many, many people, that is _fine_.
The real issue I see here is how this law is written and how we're interpreting it. If the law makes it so that people who want to opt out of free internet services because of privacy concerns are free to do so, then we're all good. If the law is written such that people who want to avail themselves of free services in exchange for their data are no longer able to do so because that business model is broken, then I see a lot of problems.
I would argue this isn't the case. For many people it is out of sight and out of mind. They don't realize what's happening.
I came to this conclusion after talking with people about it. Numerous people didn't believe it. I had to show people documents and articles for them to believe me. Most of the people I've spoken with do not like this behavior and would rather not use a service or pay for it if they had known.
The average person is unaware. That does not make it ok.
This is where the line is drawn for most people: whether humans at Facebook, Amazon, or Google look at the information, rather than just machines. It's totally fine if it's just a computer, with a strong law like GDPR safeguarding it.
From my conversations with Googlers at least, it does seem they're extremely careful about insider threats.
You can easily find articles about sufficiently low-level drones getting fired from Facebook for snooping if this is an authentic query.
If you have trusted friends in tech, you can ask them even better.
Any actual data other than anecdotes?
If your business can't survive by charging a fair price for the service it provides without stealing data, then it shouldn't exist.
Please note that these are deliberately constructed as strawmen; I am not advocating for them or saying you do.
1/ Would you approve of a law which forbade companies (perhaps outside of eg healthcare or finance) from collecting or storing PII?
2/ Would you approve of a law which forbade a company which bought or sold PII (perhaps with the above exemptions) from doing any other kind of business?
3/ Would you approve of a law which required companies to explicitly price and purchase PII from consumers?
4/ Would you approve of a law holding employees or executives criminally responsible for data breaches?
5/ Would you approve of a law standardizing the requirements for anonymizing data?
6/ Would you approve of a law banning online advertising?
7/ Would you approve of a law which placed the same requirements on any company which stored PII as are currently imposed on credit agencies?
Would you approve of a law which forbade advertising of any sort?
Would you approve of a law which required service providers to offer their service for free?
Would you approve of a law which forbade service providers from offering their service for free?
Would you approve of a law banning the sale of digital goods or rights to digital goods?
Would you approve of a law banning subscriptions or recurring payments?
Would you approve of a law forbidding the use of cameras in public places?
Would you approve of a law forbidding companies from having more than $X in revenue, for some X?
No. People are entitled to speech.
> Would you approve of a law which required service providers to offer their service for free?
No. Making money is Virtuous and Good. Acquiring money by theft, deceit, etc is not.
> Would you approve of a law which forbade service providers from offering their service for free?
No. Charity is fine. In cases where people are actually paying with their data and the "free" is just legal trickery, then yes. You already can't pay people in blood. Data is just as important.
> Would you approve of a law banning the sale of digital goods or rights to digital goods?
No. However, DRM should be illegal.
> Would you approve of a law banning subscriptions or recurring payments?
No. The only issue here is predatory sign-ons.
> Would you approve of a law forbidding the use of cameras in public places?
This one is difficult. At any earlier time, no.
> Would you approve of a law forbidding companies from having more than $X in revenue, for some X?
No, I wouldn't bother.
> This one is difficult. At any earlier time, no.
I suspect the best solution here is laws around how the data from public cameras is stored and processed. Cameras that store a temporary loop of data, delete it after a day or two, and don't upload that data to the cloud are one thing. Cameras that upload every image to the cloud for permanent storage, where face recognition software is used to track your movement around the city or your emotional state throughout the day, are a different beast.
I'm not sure what you mean by "at any earlier time". Can you clarify?
You mention predatory sign-ons. In this context it seems you view the data-for-service trade as predatory. Do you think it would remain predatory if it were explicitly priced -- e.g., if Google offered you a $5 credit on a $5 service for use of your data?
Would you approve of a law explicitly allowing blood-for-service or similar?
This is intuitive because regulation and law can erect a real competitive barrier favoring established incumbents, who (and arguably, they would be the target for lawsuits here) have the resources to implement and comply with these regulations.
The law does sound great as a consumer, but I think the question is still up in the air about how will it be enforced and what will be the unexpected side effects?
Definitely something to watch for.
P.S. We've been working on a developer-friendly SaaS that helps companies automatically comply with jurisdictional controls + data security / privacy controls. Feel free to email me: mahmoud - @ - https://verygoodsecurity.com and I can dive deeper to answer any questions.
We don’t scrap seatbelt and airbag regulations just because they’ve had some unintended side effects.
Regulations aren’t set in stone forever, either, and a functional legislative body can always modify and update them as their effects become more well known.
Or, well, maybe I sound too idealistic. But...I think most of us can all agree that consumer data protections are highly lacking.
Often the downsides do outweigh their utility.
Guess we don't need food safety laws anymore, prohibition was bad!
Surely you can understand the difference between prohibition and consumer-protection laws? They aren't even like, the same field of laws.
From the perspective of ordinary users, GDPR was a pretty successful law with no real downsides. Businesses have to be more careful with data, that's about it.
There is over-regulation in some areas, but let's be real: it's already extremely difficult to break into existing sectors, simply as a result of industrialization, brand recognition, etc. The idea of the scrappy upstart is overwhelmingly a fantasy. This regulation won't prevent any business worth forming from getting underway, in my opinion.
More like, "almost never do the down sides outweigh their utility, because they aren't designed by idiots".
My only point is to support your case that it's not always idealistic to believe that some regulations, even ones that have been in place for decades, even with today's questionably functional government, can go away.
The emissions check program is aimed at carbon monoxide, nitrogen oxides, volatile organic compounds, and particulate pollution. Properly functioning emissions systems ensure a vehicle operates efficiently, but the effect on a car or truck's overall greenhouse gas emissions is fairly small - and it is greenhouse gases (primarily carbon dioxide) that are driving climate change.
Vehicles are still the largest source of carbon pollution in Washington. We are working to change that by making sure new cars are more efficient than old ones, supporting zero-emission vehicles and cleaner fuels, and advancing transit, ridesharing and other alternatives to single occupancy vehicles.
We expect air quality to continue to improve as older vehicles are replaced with newer, cleaner cars. Ecology will continue to monitor air quality conditions throughout Washington. If we see any reasons for concern, we will certainly take action.
Yup. Especially here in the U.S.
Often, they do. Regulations frequently don't even achieve their own goals. Sometimes they even produce the opposite.
My point is that this can be factored into the growth of the company, rather than seen strictly as a barrier to entry, as suggested by the comment above. Are there really fresh startups that really need to worry about this kind of data gathering out of the gate? I'm imagining they're going to be spending a lot more time engineering features and interfacing with important clients to get data to improve their product directly. I often see business/marketing folks wanting user data for what amounts to very premature optimization.
So does a lack of regulation. That's how we ended up here.
There are, broadly, two camps here:
- People who believe that privacy means being able to anonymously use services.
- People who believe that privacy means being able to control what other people do with data about you.
These are not compatible views, and they often conflict with each other -- both philosophically and practically.
If you believe you should be able to compel a business to delete data you gave them, then necessarily there needs to be a way for that business to confirm your identity and link you to that data. You become more concerned with this idea of "owning" information about yourself.
If you believe you should be able to do everything anonymously, then it becomes much harder to control information after it's been leaked. You can't implement things like geo-locking users because what you do with the information doesn't matter -- just collecting it is a problem.
If you're in the "everything should be anonymous" crowd, you're also less likely to agree with efforts like Right to Be Forgotten; you may even reject the idea of data ownership entirely. For someone in the "I control my own data" crowd, the Right to Be Forgotten is absolutely critical -- it's one of the most important safeguards we have against a future where everything is permanently indexed forever.
I'm oversimplifying, but at the moment, the majority of pure-tech solutions for privacy are on the "everything should be anonymous" side, and (at least for the moment) most legislative solutions are falling into the "you should control your own data" side. That leads to conflict. Not always, but sometimes.
It's important to keep in mind that even though the privacy movement is aligned on many issues, there is no binary "pro" or "anti" privacy, because there's disagreement from privacy advocates on both where we're going and how to get there. In this case, California's law is very much a "control my data" law. Points like, "Businesses would be required to comply with official consumer requests to delete that data" conflict with the way that "be anonymous" privacy advocates see the world.
necessarily there needs to be a way for that business to confirm your identity and link you to that data
Why would linking you to that data require confirming your identity? My password links my HN account to me and me alone, while revealing nothing about my identity.
There are no conflicting views here. They're two completely compatible aspects of the same view.
One is how much or how little data each service gets about you. The other is how much control you have over what those services do with the data they do get.
That's why technical and legislative solutions work in tandem, reducing the former (amount of data) and increasing the latter (control over data). That's also why technical solutions are preferable: there's no need to legislate control over data that services are unable to collect about you in the first place.
Let's say someone else uploads a photo of my face to an image sharing site. Is there a way for me to prove to that site that the face belongs to me without sharing additional information?
This principle also applies in the opposite direction. Let's say a third-party noncommercial site uploads a photo of my face and makes it publicly available. In order to demand they remove the photo, I need to be able to link that website to an owner.
It's not that the systems can never be combined. It's that following either system in the absolute results in conflicts with the other.
As for the use of photos of me that are owned by other people, I'm pretty certain that neither CCPA nor GDPR cover those. The EU might have some relevant privacy laws, but they're not relevant to the "dichotomy" you brought up, because no one expects to be unidentifiable in a photo in which their face is identifiably visible.
I am 100% not okay with HN doing anything at all with my location information. Why would they have that information in the first place? Why would they keep it?
The discussions I frequent are more of a gray area; I favor Maciej Ceglowski's Six Fixes a lot: https://idlewords.com/six_fixes.htm
But all of this is irrelevant to the thread you're replying to, because my user id is anonymous.
IP based geolocalization?
> But all of this is irrelevant to the thread you're replying to, because my user id is anonymous.
Just because your hacker news user id is anonymous doesn't mean your account can't have an ad targeting profile built based on it if HN decided. They are sort of independent, and most sites don't care so much about who you actually are, but rather that you can be shown relevant ads.
> I favor Maciej Ceglowski's Six Fixes a lot
Those are very interesting. Thanks for sharing!
Everyone knows no one reads terms of service, but even if people did, they don't generally specify location, just that they collect "information about your visit". Or they might mention that they collect your IP, but most people don't know what an IP address is, much less have an intuition for how specific of location information it reveals (some probably think you can backtrace the IP and send the cyberpolice after them).
That's why ad targeting profiles shouldn't be built on that kind of incidentally-provided behavioral data. As per the Six Fixes, ads should only be targeted to the content of the page I'm looking at, just like a dead-tree newspaper.
Note that this can be achieved either technically, i.e. anonymous browsing, or legally, e.g. requiring my permission to track me to build an ad targeting profile. That's not a conflict, that's anonymity and control over data about me as two sides of the same coin.
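A toy sketch of that contextual-versus-behavioral distinction (the ad inventory and keywords here are invented for illustration):

```python
# Contextual ad selection, per the comment above: the ad is picked from
# the page content alone, so nothing about the reader needs to be
# collected or stored. Inventory and keywords are made up for this sketch.

ADS_BY_KEYWORD = {
    "running shoes": "shoe_ad",
    "mortgage": "bank_ad",
}

def contextual_ad(page_text):
    """Pick an ad based only on the page being viewed -- no user profile."""
    text = page_text.lower()
    for keyword, ad in ADS_BY_KEYWORD.items():
        if keyword in text:
            return ad
    return "generic_ad"
```

A behavioral system would instead key the lookup on an accumulated per-user history, which is exactly the data this approach never needs to exist.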
Btw I highly recommend Maciej Ceglowski's other work, e.g. https://idlewords.com/talks/haunted_by_data.htm
> As for the use of photos of me that are owned by other people, I'm pretty certain that neither CCPA nor GDPR cover those.
You're talking about specific laws, and I'm talking about general principles. The dichotomy I'm supposing is between complete anonymity and complete ownership over my own data. Owning data means being able to restrict how other people use it.
No one expects to be unidentifiable in a photo where their face is visible. But if I own my face, I expect to be able to issue a takedown request to sites who post images of my face without my permission. If I genuinely own my address, or my contacts, and a third-party site makes them publicly available, I should be able to do something about that, the same way that I would be able to restrict them from distributing a piece of IP I owned.
CCPA and GDPR don't cover individuals, just companies. This is a compromise, because it's just not feasible right now to have a version of GDPR that covers what ordinary citizens share. But there's nothing special about companies. If I'm committed to a world where I own my data, I don't want my rights to vanish just because the information was posted on a blog instead of Facebook.
And that's where you start to see this conflict -- because in order to maintain control over data, you have to, well... maintain control over data. You have to know who's posting it and who it belongs to. People who advocate primarily for anonymity are opposed to that kind of world. Their extremes look different.
In the real world, most people will be somewhere in the middle. They'll lean towards data ownership on some things, and anonymity on others. For example, I doubt that many people on HN believe that companies should be able to operate anonymously. Even though anonymity/ownership is not a binary choice, different people are going to lean towards different sides of the continuum, and that's where you see these conflicts.
If someone publishes a link between my address and my name, or my address and an online handle of mine or something, in order to remove that link I would have to prove that I am the person with that name at that address, or that I am the person with that online handle or whatever, but there's no fundamental reason I would have to further identify myself in any way. There's no conflict between anonymity and control here. Their having that information about me has hurt my anonymity, but controlling that information in no way further hurts my anonymity.
The possibility of corporations being anonymous is not an example of a tradeoff between anonymity and control either. If the operator of a company had the option of being anonymous, that would be giving them more control over information about them. More control = the option of more anonymity, because there's no tradeoff or conflict, they're two aspects of the same thing.
I feel like I'm missing something fundamental with your point, because this sounds to me like you're saying the same thing I'm saying, and then coming to a different conclusion. How would you prove that you were the owner of a name and address without revealing additional information about yourself?
The best way I can think of is to use a trusted third-party for verification. For example, send a censored utility bill that's in your name. But this still seems to me like it's a compromise -- it means that 3rd party needs to know your name/location, and it means you need to reveal your relationship with the third party.
Is there another strategy I'm not familiar with? I suppose in some cases, like with a government ID, your relationship with the third party wouldn't be new information. But it seems like a stretch to say that a government database of names and addresses wouldn't impact anonymity.
> The possibility of corporations being anonymous is not an example of a tradeoff between anonymity and control either.
If a website operator is anonymous, how are you going to contact them and require them to take down your information? If they refuse, or if you get their website removed and they just keep buying and posting it on new domains, how are you going to stop them from doing that?
I'll fully admit I've been talking a lot about theoretical extremes, but this particular problem isn't theoretical at all. Onion sites are already a thing, and it is notoriously difficult to get illegal information removed from the dark web because the site owners, site visitors, and even server locations are anonymized.
Tor makes it easy for me to stay anonymous, but it seems to me like Tor makes it very hard for me to control my data.
Theoretically, there could be a system where, if the publisher of a name+address gets an anonymous request to remove that information, then they have to mail a letter with a nonce to that address, and then if they get a followup request with that nonce, then they have to comply and remove it.
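As a rough sketch of that nonce flow (every name and function here is hypothetical, invented to illustrate the protocol, with the mailing step stubbed out):

```python
import secrets

# Hypothetical sketch of the mail-a-nonce removal flow described above.
PENDING = {}  # nonce -> record id awaiting confirmation

def mail_letter(address, nonce):
    """Stand-in for physically mailing a letter containing the nonce."""
    pass

def request_removal(record_id, mailing_address):
    """Step 1: an anonymous requester asks for removal; the publisher
    mails a one-time nonce to the address listed in the record.
    (Returning the nonce here just simulates the letter arriving.)"""
    nonce = secrets.token_urlsafe(16)
    PENDING[nonce] = record_id
    mail_letter(mailing_address, nonce)
    return nonce

def confirm_removal(nonce, records):
    """Step 2: whoever actually received the letter echoes the nonce back,
    proving control of the address without revealing anything else."""
    record_id = PENDING.pop(nonce, None)
    if record_id is None:
        return False  # unknown or already-used nonce
    records.pop(record_id, None)
    return True
```

The publisher learns nothing new about the requester: only someone with access to the mailbox at the published address can complete the loop.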
In practice, a system involving third-parties such as a lawyer or government agency is more likely, I'm just proving my point that there's nothing fundamental that requires compromising anonymity.
It's also worth noting that if a client is known to their lawyer but unknown to anyone else, it's pretty widely accepted to describe that client as anonymous, even if they're not technically anonymous in the ultimate, purist sense.
> If a website operator is anonymous, how are you going to contact them and require them to take down your information? [What] if they refuse?
First of all, that's not a conflict or tradeoff between my ability to anonymously use services and my control over my data.
And the obvious solution, which is the one we have, is that to run a website, someone has to give up a little anonymity and be subject to laws. But there's no fundamental reason people can't anonymously use services and maintain control over their data.
In today's world, data about a person is an asset. A person should own their assets, and have control over them. If there were only one option, this would have to be it - it's the only one that aligns with business interests. If you instead go purely the route of anonymous data collection, because data is such an asset it just gives businesses a strong incentive to find ways to de-anonymize your data. That is an unstable situation, and a socially counterproductive incentive.
Since data on me has value (clearly, since businesses are run off it), make it a true product. Give it value, allow me as a consumer to trade it, allow businesses to quantify its value in the marketplace, and let them explicitly bid on it. As data becomes more (or less) valuable, businesses will adjust their prices for it. This aligns incentives: the more data is worth to a business, the more they'll be willing to pay for it. IMHO this is clearly the right approach.
That said, there is still a world where both methods exist. Users choose, when they interact, whether they want the interaction to be anonymous (or pseudoanon), and people provide services to ensure it is. HN and Reddit are forums people would likely choose pseudoanonymity for, but there would be a firewall between someone's pseudoanon identity and their true identity. People would be pseudoanon knowing the risks, taking care not to divulge identity-linking info, and businesses would ensure there are limits on data preservation/recording of pseudoanon interactions. The same way people post on HN and use care in what they choose to disclose, HN would also ensure data expiry is short enough and would not share/sell pseudoanon data to 3rd parties.
Again, if you can only pick one, it has to be consumer ownership of their data, but I think they both can work together.
The first issue is that many privacy advocates who believe in anonymity do not believe in data ownership (or believe it should be much weaker). To them, the jump from "data is an asset" to "I own my data" to "because I own it, I should be able to control what people do with it" is begging the question.
The second issue is that in practice, most anonymous systems also make it hard to verify data ownership. In order for regional restrictions to work, you need a way to tell what regions your users are in. The "anonymous" side's solution here is, "anybody should be able to convincingly and legally lie about their physical location to (virtually) any business." If that solution is implemented, GDPR and Right to Be Forgotten don't work because it's impossible to verify jurisdiction.
This is part of why efforts around GDPR and Right to Be Forgotten are focused on businesses. Businesses have a physical address, they're easy to track, it's easy to prove that they're advertising to a specific region, and you can force them to share internal data with a judge. It's a compromise, because applying GDPR to non-commercial entities or individuals would require tracking them on a mass scale.
It's not impossible to compromise -- I mean, privacy advocates do compromise all the time. We work together even though we're different, because we have lots of shared goals. But the differences aren't trivial in the real world. When someone sits down to build a system, they're either thinking about managing data, or eliminating data. That approach is a big indicator into whether you'll end with GDPR or Tor.
How is that? GDPR protects EU residents when they are outside the EU, so you already can't just look at someone's location and decide not to give them GDPR protections. If the GDPR applies to you, your GDPR-related features need to be accessible to all your users.
https://ec.europa.eu/taxation_customs/sites/taxation/files/i... (PDF) section 2.1.
Which makes it hard to charge money to people without knowing where they are!
The GDPR applies to a data subject regarding data about their activities when inside the EU, whether the data controller is in the EU or not.
If both the data subject and the data controller(s) and data processors(s) are outside of the EU, then the GDPR does not apply.
How does the company know they are an EU resident when they are outside the EU? Probably because they have an account with the service, which makes it unlikely that they are anonymous.
You have a couple of choices with a law like this:
1. Just comply with GDPR anyway. That's honestly the easiest choice, especially if you're already privacy conscious. But you're lucking out, because GDPR is a relatively mild law and comes with a bunch of exceptions that make it easy to comply with. It's not a good long-term strategy to say, "I'll just comply with every country, and that way I'll never need to figure out who my customers are."
If you're not interested in complying with GDPR, then you have to stop selling to EU citizens.
2. At the point of sale, you can use something like billing information to try and figure out where your customer lives and block them if they're an EU citizen. This is unacceptable to someone who wants universal anonymity for citizens, because it requires billing information to be tied to identity/location. You're basically guaranteeing that you can't ever move to a payment system that doesn't provide that information.
Maybe you can skip billing information, and use some kind of government ID number instead. But no matter what, you need some way to tie the thing giving you money to the person who legally has a citizenship in a country.
3. If you don't want to verify, you can just ask the person if they're European and block them if they say 'yes.' This is probably the compromise that would make anonymity-advocates happiest, because it doesn't require any extra data to be collected and customers can lie. But that's also the problem -- customers can lie.
In the US, the most direct analogy here is COPPA. COPPA is a set of privacy restrictions for what information can be collected about children under the age of 13. There are traditional ways you can fall foul of COPPA (some sites are just obviously targeting children). But for the most part, the US went with option 3 -- you ask people their age before they sign up for your site, and you block them if they're under 13.
Again, option 3 is great for people who love anonymity. But it takes all the teeth out of COPPA, because children just lie and use the services anyway, and then their privacy gets violated. And the company winks and very coyly says, "Oh, we had no idea 10 year olds were signing up for Facebook. It's not our fault."
If you wanted a COPPA that did more to restrict data collection, you would probably prefer something like option 2 -- where we collect enough information about children so that they can't fake their age, and use that to block access. Except doing that reliably would require either building a national identity database or collecting other data that would itself be considered by some people to be a violation of privacy.
Why? If you have decided to become GDPR compliant then you don't need to know which customers are EU residents. If you really want to know if some customers are not EU residents, you can ask them. There is nothing in the GDPR that requires EU residents to prove their residency before you must comply with the GDPR.
> "I'll just comply with every country, and that way I'll never need to figure out who my customers are."
Like most things, you need to comply with the laws of every country you do business in or face the prospect of penalties (which may or may not be enforceable without a legal presence in that country). This is nothing new.
> If you don't want to verify, you can just ask the person if they're European and block them if they say 'yes.'
You could, and some companies have thrown a hissy fit and decided to do location based blocking when there is no evidence that this is sufficient or necessary to indicate that you don't do business in the EU.
This isn't strictly necessary. As long as you don't target EU residents, you don't need to comply with the GDPR. Just don't advertise to Europeans, don't talk about having European customers, don't ship to European addresses, and/or don't localize into languages from countries where you don't do business.
> In the US, the most direct analogy here is COPPA. COPPA is a set of privacy restrictions for what information can be collected about children under the age of 13. There are traditional ways you can fall foul of COPPA (some sites are just obviously targeting children). But for the most part, the US went with option 3 -- you ask people their age before they sign up for your site, and you block them if they're under 13.
This is not very accurate. COPPA covers more than just what data can be collected. It also has provisions that require the ability to opt out of the data being shared with 3rd parties and provide notices that clearly detail what you and 3rd parties will use the data for (and who they are and what they do). Additionally, COPPA requires parental consent to collect this data.
COPPA is IMHO a flawed law; the general privacy protections should have just been extended to everyone. Age verification and consent validation were never going to work, and it seems patently ridiculous to require companies to collect more data to protect privacy.
> Like most things, you need to comply with the laws of every country you do business in
How do you know if you are doing business in the EU without verifying the citizenship of the people who buy from you? If I'm selling a digital product, how do I know whether or not EU citizens are buying it?
You suggest below:
> As long as you don't target EU residents, you don't need to comply with the GDPR. Just don't advertise to europeans, don't talk about having european customers, don't ship to european adresses and/or don't localize to languages from countries where you don't do business.
This is the COPPA strategy, choice #3. It suggests that as long as you can pretend you don't know your customers are EU residents, it's fine to collect data on them. If that's the case, that's a much less effective law than we could otherwise have.
In regards to COPPA, you bring up the central problem yourself:
> COPPA is IMHO a flawed law, the general privacy protections should have just been extended to everyone. Age verification and consent validation were never going to work and it seems patently ridiculous to require companies to collect more data to protect privacy.
You're right, age requirements are a joke. We still don't have a reliable way to validate age without violating privacy. These types of laws only work if they're based on one of the three choices I listed in my post:
1. Universally applying the law to everyone, regardless of context.
2. Accepting that validation requires collecting and managing data, and being OK with the fact that we're going to collect and manage data to do validation.
3. Trusting consumers to self-validate and self-sort themselves.
The first option has sovereignty problems -- it doesn't work in a multi-nation, multi-state world. Even with something like COPPA, this strategy falls apart because a big part of COPPA is parental consent, and there's no way to universally apply a parental consent law. At some point, you have to decide whether or not you're going to validate the relationship between the child and the parent.
The second option is fine if you want to control your data, but means that we need to give up some anonymity -- maybe make a national database, or have some kind of proof-of-age or digital passport or something.
The third option is fine if you want to stay anonymous, but means that data protection laws have fewer teeth, because consumers will lie, which gives companies plausible deniability over violations.
What we can't do is have both 2 and 3. We can't say, "we won't require anyone to do any invasive validation, and also the validation will be really good and accurate." With GDPR, we either accept that many EU residents will unwittingly (or deliberately) do business with companies that are not beholden to GDPR, or we accept that businesses will need to validate the citizenship of their customers.
> How do you know if you are doing business in the EU without verifying the citizenship of the people who buy from you?
The GDPR lays out guidelines for what qualifies as doing business in the EU, and it has nothing to do with verifying the nationality of your customers (or doing geoip blocking). It has to do with the sorts of things I already explicitly mentioned, such as advertising that specifically targets EU residents, localization into EU languages, shipping to EU addresses, etc.
> The first option has sovereignty problems -- it doesn't work in a multi-nation, multi-state world.
Why not? We have plenty of other types of regulation that differ between countries. Companies that wish to do business in multiple countries have to comply with all the laws for those countries. If you want to make a single car model that you can sell in two different countries, it has to meet both countries safety standards. If a company has no legal presence in a country, there is not much those countries can do to enforce the laws. (This last point is the actual weakness of these privacy laws and will have to be addressed by international treaties. This is an issue with enforcing rules in general (i.e. copyright) and doesn't just apply to privacy laws.)
> because consumers will lie, which gives companies plausible deniability over violations.
How so? At worst all this might mean is that consumers who choose to lie won't be protected. Plenty of other people would be.
A combination of #1 and #3 should work just fine.
> - Consumers can opt out of their data being sold, and businesses can’t retaliate by changing the price or level of service.
Seems to me that it’s a distinction without a difference. Is there something I’m missing?
Basically this puts Facebook in a real tight situation, I honestly wonder how they will survive it.
However, a mass movement to opt out would absolutely affect them. This lays the groundwork for that, and that's what they should be afraid of.
On top of that, many people on HN, aware of the privacy implications, continue to have a Google Home / Alexa in their homes. Myself included.
Sadly, I (and many people) simply don't care about privacy. The probability/expected negatives of surveillance abuse is far less than the benefit of being able to turn my lights on and off with my voice.
The US will for sure become like China in 10-20 years, with regards to surveillance.
I very much doubt it. It is one thing when a bunch of small and big companies collect pieces of data about you, and another when the centralized government does it.
Also, it really matters what that data is used for. I would be hard-pressed to compare the onslaught of personalized ads with "disappearing" people who speak out against the government. Just look at US-related political posts on twitter. Why would you even need surveillance if people feel perfectly safe to publicly air out their negative feelings towards the government to a giant audience.
For example, I think it would be legal for Verizon to get around this law by increasing their prices by $5 across the board, and offering a $5 rebate for customers who want to opt-in to their data being sold.
But it would not be legal for Verizon to allow customers to opt-out of data collection by paying an extra $5 a month.
In the first approach, Verizon will not track by default and customers have to opt-in.
In the second approach, Verizon will track by default and customers have to opt-out.
A user isn't "paying" for privacy. Privacy comes by default (for a price) and a company can pay a user to harvest their data.
Financially, they both work out the same, but based on opt-in, opt-out behaviors and perception I presume they have a different impact. The wording specified by the law says that users should have a default expectation of privacy. They can actively choose to give that up for a fee.
The business profit loss side of it says, you can't build a business that assumes it gets to profit off harvesting user's data for free. User data has value, place a value on it, and make it an explicit part of the transaction.
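A quick arithmetic check of why the two Verizon schemes net out identically (the $5 figure comes from the example above; the function names are invented):

```python
# Opt-in scheme: raise everyone's price by $5, rebate $5 to customers
# who consent to tracking. Privacy is the default.
def opt_in_monthly_surcharge(consents_to_tracking):
    return 5 - (5 if consents_to_tracking else 0)

# Opt-out scheme: track by default, charge $5 to customers who opt out.
def opt_out_monthly_surcharge(consents_to_tracking):
    return 0 if consents_to_tracking else 5
```

Either way a tracked customer pays $0 extra and a private customer pays $5 extra; what the law distinguishes is the default, not the arithmetic.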
Ah! I was always saying I'm surprised by the brazenness of gas stations to offer cash discounts in violation of their credit processing agreements. Today I learned that as of 2010 the law protects them. Thanks!
Which makes sense, most other countries actually roll sales tax and others into the advertised prices, and what you see is what you'll pay out the door.
I agree that there is no practical difference between the two, but given how deceptive companies can be when disclosing various pricing and hidden fees, this might be an attempt to curb that behavior?
1) allowing customers to opt in to data collection by giving them a reward. In this case, the default behavior does not involve data collection
2) allowing customers to opt out of data collection by paying a penalty. In this case, the default behavior involves data collection.
People usually don't bother to opt out.
The second line would allow companies to pay people to permit them to collect information. I imagine that because of the first statement, a company that is charging for the service cannot take advantage of this statement.
Way too hard to enforce, the definition of 'customer data' is going to be a constantly moving target. Does every click count? How about aggregated clicks important for general product optimization?
What constitutes 'selling' user data? Very few companies actually sell your data, instead they place ads based on your data. Will that be banned as well? Many companies, including Google would have to significantly change their pricing model if so.. yet that is apparently illegal.
Yes a click counts as personal data if you can reference it back to a real person. Aggregated clicks probably wouldn't.
Selling ads based on personal data is selling your personal data. The personal data provides the value to the transaction.
Yes lots of companies may need new business models, but for the most part what I've seen is dark patterns, non compliance or wriggling to avoid any real change.
In Europe at least, it's back to the regulators to make a move.
> Selling ads based on personal data is selling your personal data. The personal data provides the value to the transaction.
What about selling ads based on aggregated data? e.g. put users into buckets, then sell ads for those buckets.
I fully expect this part to be just as relevant over here
HIPAA manages with "patient data". The standard techniques include non-reversible addressing of users. Patient N has an internal number and an external number. Without having Patient N's internal record in hand, you can't correlate it back to that user, which is particularly useful in a legal defense.
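A minimal sketch of that internal/external-number pattern (the names are made up, and real HIPAA de-identification involves much more than this):

```python
import secrets

# The link table lives only inside the covered entity. Exported rows carry
# only the random external number, so a downstream holder of the data
# can't correlate it back to a patient without this private table.
LINK_TABLE = {}  # internal id -> external id

def external_id_for(internal_id):
    """Issue (or reuse) a random external number for an internal record."""
    if internal_id not in LINK_TABLE:
        LINK_TABLE[internal_id] = secrets.token_hex(8)
    return LINK_TABLE[internal_id]

def export_record(internal_id, fields):
    """Share a record keyed only by the external number."""
    return {"patient": external_id_for(internal_id), **fields}
```

Because the external id is random rather than derived from the internal one, there's nothing to reverse; the only path back is the table itself.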
I sure hope so.
> Many companies, including Google would have to significantly change their pricing model if so
Good. It would be even better if they have to change their business model.
The problem is the rampant data collection that is engaged in without my permission and/or knowledge, and that companies often forward that data to others.
Targeted/behavioral advertising in particular has monetized that abuse. I would be thrilled if that business model dies an agonizing death.
The amount of abuse on that score has been so extreme, and going on for so long, that strong legislation is clearly the only remaining option to at least stem the worst of it.
I hope that's an accurate summary of your view.
I have a few points.
First, I don't think targeted advertising is much more effective than your typical magazine ad, TV commercial, radio ad, or movie trailer. In all of those cases, the advertisers know the demographics of who is consuming that media and can target pretty effectively. And if targeting was far more effective than old school ads, advertisers would pay significantly more for targeted ads, but they don't seem to. CPC for Facebook and Twitter is about the same (~50 cents), despite one allowing for much more fine-grained targeting. And I doubt LinkedIn does better targeting than Facebook, but LinkedIn's CPC is 10x higher.
Second, is this an instrumental value or a terminal value for you? eg: If all advertising were banned (or at least heavily taxed), would you no longer want a ban on companies storing customers information? If so, I'd like to hear why.
Third, what if we could pass laws that caused the benefits of targeted ads to outweigh the downsides? For example: We could ban targeting for politics. Then people could still get ads for more things they want, such as some software that would help them or some new barbershop that opened near them. That seems like a win-win to me.
Lastly, whenever I see an argument between two sides, I pay attention to emotions. In my experience, the less emotional side tends to be the more correct one. If a side mostly engages using disgust, anger, and catastrophizing, I get very suspicious. If I had to pick the more emotional side in this thread, it would definitely be the side that accuses the other of "violating basic human decency", of being "horribly unethical", and doing "stupid privacy violating shit". On the other side, pretty much everyone against this law seems to think it's mistaken, not evil.
It's not, really.
I object to data about me or my use of my machines being collected without my permission at all. Past that, I object to it being shared with others. What it's used for is beside the point.
Marketing companies come into it simply because they are the most egregious bad actors when it comes to those two points.
> I object to data about me or my use of my machines being collected without my permission at all. Past that, I object to it being shared with others. What it's used for is beside the point.
What can I engage with here? I'm trying to figure out what bad consequences you're worried about, but you seem to have a terminal value that companies shouldn't be allowed to keep information you sent them.
Let's try another tactic: What do you think of casinos that share surveillance photos of card sharps with other casinos? Or wifi hotspots at coffee shops that keep logs of mac addresses to limit abuse? Or news sites that limit non-paying visitors to 3 articles per month? Your statement seems to be so absolute that it forbids all of these legitimate uses.
I do not object to data collection that is technically necessary to provide services. For instance, I don't object to web server logs.
I also don't generally object to sites using data I have willingly provided for their own purposes, as long as they aren't doing things like sharing it with other entities or combining it with data about me that they have obtained with other entities (unless I have given consent, of course). So I have no problem with your wifi hotspot or paywall examples.
Anything beyond that -- which would include your casino example -- requires my express consent. I do think that casinos violate this principle, because they do not inform me of their surveillance before I set foot on their properties. If they did, though, then I would not object.
The essential principle that I operate under is that of informed consent. It's really that simple. Having a relationship with a specific entity carries with it a certain amount of implied consent (web server logs are an example of this) -- but even that data should remain private between me and that entity. Any sharing of it with others requires my express permission. "Sharing with others" includes indirect mechanisms such as using that data to match me with advertisers, even if the raw data itself is not transmitted to others.
I fully agree, but it seems like under this legislation it becomes illegal to conduct a business in this manner. If I am fine with trading a certain set of my data for access to a service, I should be able to do so (if the service provides that option, obviously).
> Businesses can, however, offer “financial incentives” for being allowed to collect data.
The change is from opt-out to opt-in.
Why not just outlaw that?
That said, I am also not arguing against advertising as a funding model. I'm arguing against business models that rely on invading people's privacy.
Jeff Hammerbacher: ‘The best minds of my generation are thinking about how to make people click ads… That sucks.’
Those best minds are now having to change the way they generate revenue.
Of course, they’d hate the idea of having a fixed revenue per user. They want to keep sucking out more revenue per user until the well runs dry.
Ergo, Facebook will charge a different rate per nation; per state; ideally per user (they already have all the data they need to calculate exact revenue per user based on their data).
There is an incessant amount of whining about GDPR for example, and how "confusing" the regulations supposedly are. What it comes down to is many HN denizens are doing things that are explicitly prohibited by these data collection laws and want to continue doing the things that have been outlawed.
As they say, it is difficult to get a man to understand something when his salary depends on his not understanding it.
And for some companies doing shit like selling customer data is the only reason they’re in business. Good riddance to them though.
Many of the people here who work for these companies truly and honestly believe the online services they are offering are changing, or will change, the world for the better.
As such, they view hindrances to this as threatening to the progress they are trying to help bring about.
Personally, I support this privacy initiative and think SV companies are many times viewed through rose tinted glasses by their employees, but that's just my perspective.
I can totally see how viewed through the lens of a hindrance to progress, some people would feel very strongly that I'm wrong in supporting such legislation.
- How do you identify what is customer data? There may be personal information stored in logs somewhere. Do you now have to write log parsers to extract personal data from everything that previously you just stored for general debugging and security purposes? How do you even know all the permutations of personal data that can be stored in the logs? There are practically infinite ways personal information can manifest in logs. How do you ensure compliance with something when you don't fully understand what can come out of it? Every engineer must now fully understand the consequences of anything they log and design delete mechanisms for it. This extends to any 3rd-party software you use that generates logs. You must now fully and deterministically understand your entire system just to comply with this law. Such a request is essentially NP-complete.
- How do you prune said data from logs?
- How do you delete data that are archived on write-once media and/or sitting in cold storage somewhere? You'd have to physically destroy the media and make a copy of everything minus the part you want to exclude. This dramatically increases archive storage complexity and cost.
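The first bullet can at least be partly mitigated mechanically: scrub known PII patterns at the logging layer instead of trying to parse it back out of logs after the fact. A minimal Python sketch; the pattern list is illustrative, not exhaustive, and regex scrubbing is best-effort, not a compliance guarantee:

```python
import logging
import re

# Hypothetical redaction list; a real deployment would need many more
# patterns, and regex scrubbing alone is best-effort, not a guarantee.
REDACTIONS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),          # US SSN
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),  # email address
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "[IP]"),     # IPv4 address
]

class RedactingFilter(logging.Filter):
    """Scrub known PII patterns from log records before they are emitted."""
    def filter(self, record: logging.LogRecord) -> bool:
        msg = record.getMessage()
        for pattern, replacement in REDACTIONS:
            msg = pattern.sub(replacement, msg)
        record.msg, record.args = msg, None
        return True

logger = logging.getLogger("app")
logger.propagate = False  # keep the unredacted record away from root handlers
handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())
logger.addHandler(handler)

logger.warning("login failed for alice@example.com from 10.0.0.7")
# emits: login failed for [EMAIL] from [IP]
```

It doesn't solve third-party logs or the long tail of weird PII shapes, but it moves the problem from "parse everything later" to "decide at the call site", which is at least tractable.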
YES YES YES YES YES.
Are you not already doing this for passwords, credit card numbers, and social security numbers?
> Such a request is essentially NP-complete.
I think you mean undecidable, or equivalent to the halting problem, or subject to Rice's theorem. NP-completeness is irrelevant. I think you'll find that HN is the last place you'll win arguments by inaccurately using technical terms in the hopes that it will go over other people's heads, Legally Blonde-style (https://www.youtube.com/watch?v=8rNVaY7Stt4).
To anyone who knows what they're talking about, this is an obviously nonsensical argument. It's similarly undecidable to verify whether the data that you expose publicly contains customer data, or customer passwords, or your own passwords, but you do it anyway, by restricting your engineers to only write and deploy code that they understand.
Nonsense. You write the log statements. You know what data structures you are logging.
If you're using some server's built in logging, or some logging library or middleware you don't understand, turn that off until you understand what it's logging.
Logs that do not contain explicit PII are still rife with pseudo-identifiers that could possibly (but not typically) be used to join activity with PII.
- You have one set of logs that stores anonymized click activity
- You have another set of logs that stores purchase transactions
- Both have millisecond timestamps
You could potentially link a click record in the click logs database to a purchase record in the transaction database when, during a single millisecond, there is only one transaction and one click. Now your anonymized click ID, and all your click activity, is linked to the PII in your transaction.
Sometimes it's off by a few milliseconds. Sometimes the logs are obfuscated down to the second, but then you'll still have instances of a single click and a single transaction within one second. Does this activity still need to be removed from logs, despite not being directly linked to your PII or even being identifiable? These are the challenges that need to be addressed.
Emplify, for example, makes it a point not to reveal averaged responses for subgroups of size less than 5, for similar reasons: https://intercom.help/emplify-insights/en/articles/1731829-c...
We should be making an effort to take such care with all customer data, even just when storing it. Mistakes are inevitable, of course, so small gaps that are soon fixed should be let off with a warning, with any fines proportionate to the amount of exposure and negligence involved. How would we do that? Maybe have an agency of experts tasked with determining the fines, and allowing companies to appeal those fines in open court. Like what GDPR does.
I'm a software engineer who works for a SaaS data analytics startup that has to comply with GDPR. It's not cheap, just like it's not cheap complying with all the laws restricting pollutants emitted by my car, but it's still completely worthwhile.
(My employer is not Emplify, although we are a customer of theirs. Good service.)
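The minimum-group-size rule Emplify applies is simple to sketch. The field names and the cutoff of 5 below are illustrative, not their actual implementation:

```python
# Hedged sketch of the minimum-group-size idea: suppress any aggregate
# computed over fewer than MIN_GROUP respondents (threshold illustrative).
MIN_GROUP = 5

def averaged_responses(scores_by_group):
    """Return per-group averages, withholding groups below the threshold."""
    out = {}
    for group, scores in scores_by_group.items():
        if len(scores) < MIN_GROUP:
            out[group] = None  # suppressed: too few responses to stay anonymous
        else:
            out[group] = sum(scores) / len(scores)
    return out

print(averaged_responses({"eng": [4, 5, 3, 4, 5], "exec": [2, 3]}))
# {'eng': 4.2, 'exec': None}
```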
Emplify is gating database records from surfacing through their UI. Those records still exist in the database and are admin-accessible. Deciding whether a given clustering of the data reveals too much still requires having the data, i.e. the data existing.
I'm saying that the thought they put into anonymizing the data they surfaced through their UI, that same amount of thought should be put into the data we all store.
If the data can't be clustered in a way that preserves anonymity, it should be deleted (after the desired aggregate statistics are computed). Emplify probably isn't required to, and so they probably don't. I'm saying they should be required to.
Saying, "Just don't store log data" is a fundamental misunderstanding of how web development works. This data is crucial for operational uptime, debugging, and running an online business. The scope of the data is so large that there inevitably are factors that can be used for deanonymization, which is exactly GP's point.
The reason I asked if you replied to the right comment is that log data is fundamentally different from database records, which is the working example you gave with Emplify.
I didn't suggest not storing any logging data, actually. I suggested deleting it. Old, stale logs are unnecessary for operational uptime or debugging, and low-value for usability or security investigation.
Old logs also cumulatively present risks to customers. A gay blogger in Russia who used LiveJournal in 2004 might regret that decision now, even though in 2007, when LiveJournal was sold to a Russian company, few reasonable people would have foreseen the country's later turn towards homophobia. If LiveJournal had, for example, replaced all IP addresses with cities in historical, pre-2007 HTTP logs, they would have lost nothing of value to them, while their customers would be that much safer. If they had gone so far as to keep only aggregated statistics of requests and unique visitors per tuple of (user agent, city, timestamp truncated to 15-minute intervals), and then deleted all detailed HTTP logs older than 90 days as suggested by Maciej Ceglowski, can you think of anything of value they would have lost?
But of course I'm sure they didn't, because they weren't required to put that much thought into the data they stored.
I'm saying we should be required to.
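The coarsening described above is mechanical. A toy sketch, where the IP-to-city table and the log layout are made up for illustration:

```python
from collections import Counter

# Toy IP-to-city table standing in for a real geolocation database.
ip_to_city = {"203.0.113.7": "Moscow", "198.51.100.2": "Berlin"}

def bucket(ts: int) -> int:
    """Truncate a Unix timestamp to the start of its 15-minute interval."""
    return ts - ts % (15 * 60)

def aggregate(log_lines):
    """Collapse (ip, user_agent, timestamp) rows into coarse count tuples."""
    counts = Counter()
    for ip, user_agent, ts in log_lines:
        counts[(user_agent, ip_to_city.get(ip, "unknown"), bucket(ts))] += 1
    return counts

raw = [
    ("203.0.113.7", "Firefox/1.0", 1_100_000_100),
    ("203.0.113.7", "Firefox/1.0", 1_100_000_500),
    ("198.51.100.2", "MSIE 6.0", 1_100_000_900),
]
print(aggregate(raw))  # two entries: the Firefox requests share one bucket
```

After running this, the raw lines can be deleted: the aggregate answers the operational questions (traffic per region, per browser, per interval) without retaining anything that identifies an individual visitor.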
I don't think this is quite the dichotomy you make it out to be.
So we can create optimizing compilers, but we can't figure out what to log?
This seems like a problem of never having motivation to solve the problem before.
"We can't do that, it's too hard" is often a mea culpa I'm industry when they oppose regulation. Then they will come up with a solution from having actually spent some effort to actually think of potential solutions.
It can trivially backfire and make things less secure if the requirements are poorly defined, which is generally a given. The GDPR made exfiltration as easy as compromising an account. One could argue that's an acceptable trade-off for transparency, but then the regulators must bear full responsibility for their constraints.
"We can't log sensitive customer data slowing down debugging and worsening data integrity" is one thing but now imagine "can't log customer IDs in a read only way as sensitive information" oops there goes a lot of useful auditing information as it is excluded or forced to be writeable.
Don't bring complexity theory into it.
The Backups question is a bit more complex. One source I've seen: "According to France’s GDPR supervisory authority, CNIL, organisations don’t have to delete backups when complying with the right to erasure. Nonetheless, they must clearly explain to the data subject that backups will be kept for a specified length of time (outlined in your retention policy)."
Paired with that is that if you're keeping data (or backups) for any length of time beyond the immediate needs of the customer then you need to be able to justify it.
Think this law is going to allow me to require 7-11 to delete me from their DVR records? Not part of the business model, it's a matter of security. Nice straw man though.
Their terrible system design can't handle "DELETE FROM PornPrefs WHERE SSN = '999-11-2222'". They brought this on themselves.
It’s like lines of code in a program — each one makes the application worse, so each one should have a purpose that it achieves.
None of the big names in tech will have any trouble at all complying with this; I'd be very surprised if any at all are not already compliant today.
At the same time, the percentage of tech startups that are already compliant with this law is likely around zero, and few will ever become so. Unless this is precisely what your startup is about, small firms, especially with venture funding, can't afford to invest anything at all into privacy beyond the surface. If your startup fails because it gets sued into oblivion, that's no worse (and way less likely) than it failing because nobody actually wanted a chat app for dogs.
Just like I don't want startups making unsafe medicine, or losing my medical secrets.
If a company is unable to even articulate what data it collects and cannot do basic operations on it (e.g. remove a piece of it), then it shouldn't be in the business of handling personal information. The same way a clinic that can't even keep track of blood samples shouldn't be in business.
And if a company's earnings depend strictly on being able to collect and sell personal data, what they need is a better business plan, not having everyone turn a blind eye.
I’ve always been told that it’s good practice to take periodic backups. In the absolute worst cases, you can simply restore directly from these.
If a customer requests that their data are deleted, in addition to my production instance, does that mean that I have to remove their data from my backups? If so, I’m uncertain of the best way to do this. I’m uncertain if many managed services will allow me to mutate backups. And even if I were managing my database and backups directly, it seems painful to load each backed up database, remove the data, and rewrite the backup.
Note: I’m not saying that any of this is impossible. However, it does require a lot of ancillary engineering work, which is difficult for a small company that’s just trying to get to product-market fit.
IANAL but the guy who told us how it's done was, and in addition to all the legal stuff, of which I have absolutely no recollection because I don't really understand it, he pointed us to this as a useful resource for people who are also not lawyers: https://ico.org.uk/for-organisations/guide-to-data-protectio... .
Turns out it's acceptable for data to remain backed up for a while (as long as you inform your users), provided you have systems in place that guarantee it's no longer used.
Just sayin, it's not rocket science. Reading Internet forums you'd think the GDPR was like Apocalypse Lite, but in my experience, it took very little effort to implement it for companies that weren't engaging in shady practices.
Implementation-wise, is the best approach to do this to store some token for "user XX requested YY data be deleted" and check those tokens whenever you restore a backup?
I feel like that'd run afoul of a true solution because, in the event of a leak, it could be used to tie the information in the backup to the user who requested their data be deleted. Or am I misunderstanding such that that'd actually be acceptable under GDPR?
Is there a better way to do it?
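One alternative that avoids keeping a list of who asked to be deleted is crypto-shredding: encrypt each user's records under a per-user key that is never included in backups, and honor a deletion request by destroying the key, which renders every backup copy of that user's ciphertext unreadable at once. A toy sketch; the XOR "cipher" is a stand-in for real authenticated encryption, and the dict key store stands in for a proper key-management service:

```python
import secrets

def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR stand-in for real authenticated encryption; do not use as-is."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

toy_decrypt = toy_encrypt  # XOR is its own inverse

key_store = {}  # lives only in production, never copied into backups
backup = {}     # representative of what every backup snapshot contains

def store(user_id: str, record: bytes) -> None:
    """Encrypt a user's record under their own key before it hits storage."""
    key = key_store.setdefault(user_id, secrets.token_bytes(32))
    backup[user_id] = toy_encrypt(key, record)

def erase(user_id: str) -> None:
    # Backups still hold the ciphertext, but with the key gone it is
    # unreadable; no per-user tombstone list is needed on restore.
    key_store.pop(user_id, None)

store("alice", b"alice@example.com")
erase("alice")
```

The trade-off is that the key store becomes critical infrastructure: lose it and you've erased everyone; leak it and the backups' encryption buys you nothing.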
Also, I don't know if it's the best technical approach -- I did it that way just because it's code I wrote a very long time ago, for a friend who was just starting their business. I took care of it because we're still friends and he asked me if I could take a look, but it's the first time I've done backend/web development in more than 12 years.
I think this is sufficient, even considering things like the potential for data breaches. It complies with both the explicit requirements and the general spirit of the GDPR. IANAL and all but I think that, since data leaks aren't a form of data processing by the company who collected the information, they are outside the scope of Art. 17. There are already requirements in place about the secure storage and administration of personal data.
Plus, if you think of it, the framework of this whole construction provides sufficient assurance. If you have live, online backups which can be restored immediately, with a single click, then it's clearly not a problem to erase data from them immediately. If you have offline backups, you're required to have a retention policy for them anyway, and you can't process data from them anyway -- not until they're restored and you've had a chance to purge them. It's certainly possible that someone might break into your storage unit and run away with your archive tapes or hard drives or whatever, but at that point there's a lot of legislation that you have to worry about having broken before you even get to the damn GDPR :).
But if that is really a problem, just apply the requirements to companies above a minimum number of users. Venture-capital-funded companies don't need to cut corners and shouldn't be allowed to when it comes to privacy.
This is complete nonsense. The biotech industry is very large and vibrant and is most certainly not Big Pharma.
If you're ad dependent, would this basically mean you have to give your service to this user for free after this?
Basically Hillary's private email server getting bleachbitted, but for everyone now. Makes running an organized crime gang, political corruption graft ring or Chinese espionage ring much easier. Same with banning facial recognition. Makes getting away with crime a lot easier than it would otherwise be. If you are a corrupt politician, this is really important stuff.
Are these the right laws to regulate SaaS companies that build business software? Should a consumer be allowed to request that data about them be deleted if that data are records of legitimate business transactions? If you buy a car from a dealership, do you "own" the data in their systems about your transaction and should you be able to request its deletion?
Yes, it is also subsidizing what would normally be paid services. Before online advertising, people would pay for services like email. Sure, $5 / month is cheap for us, but what about the developing world and the lower class?
If the price of the service is based on the ability to sell data, how is it reasonable to disallow the business from changing the price of the service for those who opt out?
Also, how can you reconcile this with being allowed to offer financial incentives for being allowed to collect it?
Remember last year when there was a big hoopla about that Massachusetts court case that hinted that any online vendor would now be responsible for collecting and reporting sales tax on purchases coming from any US state (even if you as the business owner didn't have a business location there)? Basically, it shifted the buyer's obligation to report sales tax on out-of-state purchases onto the business. The only problem is there are 9,998 different tax jurisdictions (as of 5 years ago) spread across 50 states.
“It is difficult to get a man to understand something, when his salary depends on his not understanding it.”