Operators should be free to log traffic at the network level; PII should only come into play once you're asking someone to provide personal information.
Household IP Targeting - https://www.vicimediainc.com/ip-targeting-direct-mail-intern...
Or even just your ISP (who for sure know your IP addr and your address) - https://arstechnica.com/information-technology/2017/03/how-i...
The larger issue is that we (HN tech people) treat IPs as fallible because we're thinking of them as absolutes. The advertising side of the internet looks at them like a goldmine, because even a 75% correlation to "truth" can still make ads reach the people they're trying to reach much more cheaply.
I wouldn't be surprised if some policies pertaining to record keeping in some sectors contradict that requirement as well.
Amusingly enough, California consumers will not have privacy rights regarding any written comments sent to the California Attorney General.
The IP is an identifier, so unlike a password salt (where the user is the identifier), you need a way to know which salt was used to hash the IP, and it needs to be consistent.
You can keep a lookup table of IP-to-salt, but this either gives away your list of addresses (if it only contains IPs you've seen) or is huge (the entire IPv4 range), and either way it doesn't prevent rainbow tables.
You can have a static salt for the entire site, but again this doesn't really help against rainbow tables (beyond requiring the table to be recalculated once).
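To make the problem concrete, here's a minimal sketch of the brute force, assuming the attacker knows the salt (say, a leaked site-wide one). Pure Python for illustration; a GPU does the same sweep in seconds:

    import hashlib
    import ipaddress

    def recover_ip(target_hex, salt):
        # Walk the entire IPv4 space (2**32 addresses) and hash each
        # candidate with the known salt until we hit the target digest.
        for n in range(2**32):
            ip = str(ipaddress.IPv4Address(n))
            if hashlib.sha256((salt + ip).encode()).hexdigest() == target_hex:
                return ip
        return None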
Is there a mitigation I'm not thinking of?
If you need the ability to group ciphertexts without decrypting them, you could create a scheme which will make cryptographers cringe, but could be justified in this specific case.
For instance, you can probably assert things like which IP blocks are likely to comprise most of the entries in the table or which IP blocks or addresses cannot be in the table.
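Concretely, the sort of scheme I mean (just a sketch, and the key handling is hypothetical): an HMAC under a secret key gives you deterministic tokens you can group on, without a per-IP salt table.

    import hashlib
    import hmac

    # Hypothetical key; in practice load it from a KMS, not from source.
    PSEUDONYM_KEY = b"rotate-and-guard-this-key"

    def pseudonymize(ip: str) -> str:
        # Same IP always yields the same token, so log entries can be
        # grouped; reversal needs the key, not just an IPv4 sweep.
        return hmac.new(PSEUDONYM_KEY, ip.encode(), hashlib.sha256).hexdigest()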
That just makes me all sorts of uncomfortable.
From a security point of view, if you use a fixed (unrelated to input) salt, the attacker will have a harder time discovering the function f (unless you store the salt next to your IP hashes). But from a privacy point of view, in the relationship between me (the user) and you (the service provider), you are the attacker. And you know your function f. Hashing IPv4 addresses, salted or not, gives me no privacy protection, since you can trivially reverse the hash, just due to the small domain size. With IPv6 this problem will resolve itself somewhat; till then, I'd prefer if you encrypted those IPs with keys that have a finite and short lifetime, in a way that a third party could audit if need be.
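Something like this, say (a sketch assuming the third-party cryptography package; the rotation policy is the point, not the library):

    from cryptography.fernet import Fernet

    # Generate a fresh key per day; destroy keys older than the
    # retention window so old logs become unreadable even to the operator.
    day_key = Fernet.generate_key()
    f = Fernet(day_key)

    token = f.encrypt(b"203.0.113.7")  # what actually goes in the log
    ip = f.decrypt(token)              # possible only while the key lives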
Modern GPUs can manage several billion SHA-256 hashes per second, so even with a salt per hash it's not going to take long to recover a given entry, given the 32-bit address space of IPv4.
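Back of the envelope, assuming a (hypothetical) 5 billion hashes per second:

    >>> 2**32 / 5e9  # seconds to sweep the whole IPv4 space for one salt
    0.8589934592

So a per-entry salt only multiplies the work by the number of entries, each of which falls in under a second.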
If I get hit by a DoS attack or spam, I need the IP to find out to whom I should file an abuse complaint.
Do we need to sanitize SMTP headers too?
How about shutting down DNSBLs?
However, I'm not sure myself that it makes sense. Some people can be identified by just a partial IP or even a partial hash.
Simply out of curiosity, what do you mean by that?
All my life experience and knowledge tells me that's exactly how you get around the law, unless the court has its own agenda or a strong bias.
Combine that with cross-site tracking and phone companies selling your info...
If IP addresses were as anonymous as claimed, there would be little incentive to keep them in any long-term storage.
Until I hear otherwise, I'm going to gamble that, for now, this isn't the kind of reckless mishandling of personal information that regulators are trying to crack down on.
And you're probably right until they do otherwise.
The problem with badly-drafted laws is that they can be used to attack people who are annoying but who haven't done anything wrong... except for technically violating a law which is "supposed to" mean something else but which can be read to penalize some harmless activity the gadfly happened to engage in.
So, maybe you'll be patient when I'm not comforted by people telling me to not worry about it.
But we also have to have a certain pragmatism when deciding how to behave in a society with an impossible legal system. How much effort should I, as a developer or as a consultant to business owners or as a systems administrator, spend on purging IP addresses versus all the other things that need attention?
For that we look to how the law is applied in practice.
I was active on Slashdot back when the DMCA was first proposed and then fought its way into becoming law. There is no topic about which HN is as rancorous as Slashdot was about the DMCA. What does the situation look like now, twenty years later? Yes, there are and have been and continue to be abuses of the DMCA, but not at the internet-destroying scale that Slashdot predicted.
So I'm not going to tell you to ignore IP addresses in your log files. That's up to your judgement. But I'm going to ignore them in mine, until I see a reason to do otherwise, and when it's a topic of discussion with others, I'll tell them that according to a strict reading of the law, logged IP addresses may be a liability, but that there have been exactly 0 cases to date which have been only about some business having IP addresses in its logs for abuse and diagnostic purposes.
There's lots of talk about consent as a basis for processing. For lots of purposes "Legitimate Interests" is likely a better basis. You'll have to perform a legitimate interests assessment and be able to justify that the potential negative impact of your processing is outweighed by the benefits.
The ICO has an interactive tool for selecting a basis for processing https://ico.org.uk/for-organisations/resources-and-support/l... with links to more information.
This is great!
The Parties mutually agree that any and all disputes arising from or relating to this Agreement, including the interpretation or application of this Agreement, will be submitted exclusively to final and binding arbitration pursuant to the Federal Arbitration Act. The arbitration will be conducted in the state of Delaware, or such other location as the Parties may agree, by a single arbitrator in accordance with the substantive laws of the State of Delaware.
Boom. No more pesky California law.
California does this for labor violations through its Private Attorneys General Act (PAGA): https://www.dir.ca.gov/Private-Attorneys-General-Act/Private...
Glancing at the Wikipedia page for CCPA, it's possible that the CCPA is structured similarly--"Companies ... can be ordered in civil class action lawsuits ... subject to an option of the California Attorney General's Office to prosecute the company instead of allowing civil suits to be brought against it."
That said, I don't think California's PAGA has ever been tested vis-a-vis the FAA in the Supreme Court because it was only recently that they decided to strictly apply the FAA to employment contracts.
Any provision of a contract or agreement of any kind that purports to waive or limit in any way a consumer’s rights under this title, including, but not limited to, any right to a remedy or means of enforcement, shall be deemed contrary to public policy and shall be void and unenforceable
Why not consider this reply on its merits?
In other words, laws aren't code or mathematics. They're not pure exercises of abstract thought to be considered in isolation. Trying to treat them that way is going to lead to trouble.
It's fine for people to speculate about medical ideas, legal ideas, etc. Especially on a forum like this where there is no pretense that people are offering genuine legal advice.
After all, most of the time, people are writing about things they don't know all that much about.
Either wild speculation on medicine and law should be fine (this is my position), or people should fear medical speculation as much as they do legal speculation (I think this is the more pathetic option).
carbocation is right about the underlying point, btw. This is an internet forum, the purpose is good conversation, and speculation is a normal part of conversation. It can of course be dumb and low-information, but it needn't be.
Judges aren't complete morons and will take a dim view of "hacks". There could be loopholes somewhere but you'd need a lawyer to spot them.
A similar thing happened with Perl's Artistic License. Its version 2 is basically also a lawyer-approved rewrite.
In other words, hackers, don't try this at home. There are professionals who can do this for you.
My gut feeling is this "legal hack" wouldn't work, because if it did someone would have used it by now against some other law that provides for damages, and someone else would have figured out how to neuter the hack. Which is to say, there's probably an existing law that prevents this hack from working. But you'd need a lawyer to be able to say whether that's true or not.
If my identity gets stolen, there is much more than $750 at stake on my end.
Are there any guidelines for determining actual compensation?
>Additionally, if a data breach occurs, the law permits consumers to recover up to $750 per incident (or actual damages, if greater).
So that might just be $750 as part of a punitive fee.
The idea with statutory damages is that determining actual damages can be difficult and uncertain, so some laws allow plaintiffs to elect damages from a standard range instead, and the court decides where in that range the award should fall based on the facts of the case. It's basically saying "just give me about what is typical for cases like this one".
"Leaked (e-mail) adresses"
"Leaked nude photographs".
At least in the US; the rest of the world isn't that shocked by our natural form.
I'd love to know what they mean by reasonable... I've seen some demos of tech that can do some pretty amazing things at re-identifying "de-identified" data.
Who uses PII in test data derived from real customers? That's just an absurd practice to begin with, and no one who takes security seriously would even consider doing this.
As others have said, we've found a lot of smaller companies will test with production data because of their need/desire to move quickly. But we've also seen much much larger companies use production data in their dev/staging environments. Sometimes there will be production-like safeguards and security measures in place but not always. People shy away from practices that slow down development and testing.
We think synthetic data is the right solution for a few reasons. Most importantly, we believe it provides the right level of security while still allowing your team to be productive, i.e., your business logic and test cases still work. It also allows you to scale really easily, since you effectively have a ruleset for generating data of any size. Finally, it's a great way to share data throughout your organization and can help facilitate sales and partnerships. If you're curious about scaling, check this post out: https://www.tonic.ai/blog/condenser-a-database-subsetting-to...
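To illustrate the general idea only (a toy sketch using the open-source Faker package, purely for illustration, not how any particular product works):

    from faker import Faker

    fake = Faker()

    def synthetic_rows(n):
        # Production-shaped records containing no real PII.
        for _ in range(n):
            yield {"name": fake.name(),
                   "email": fake.email(),
                   "ip": fake.ipv4()}

    # Seed dev/staging at production scale rather than ten rows.
    for row in synthetic_rows(1_000_000):
        pass  # insert into the dev database here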
In summary, when doctors were testing a new electronic patient journal system, they used real social security numbers (our version of them). And just for kicks, they tested in production, so the people whose numbers were used got all kinds of prescriptions for stuff they didn't need, etc.
I think culturally there may be a difference since I'm in a place where some data (addresses, phone numbers, ...) is public info, i.e. given your name I can get your address and phone number from a public DB anyway.
The flip side of the coin is dev databases not being representative of production, which can cause performance issues. "It works with 10 rows on localhost, why doesn't it work with a million in production?"
Some small companies will refuse to use generated data if it takes even a minute more to generate it vs. import it from production. In the consulting world I've seen multiple examples of companies complaining bitterly about other security-minded consultants' efforts to improve security and privacy through even small amounts of additional development time.
Making a copy is probably more effort than most developers out in the wild are going to make.
As a result, you get shockingly mature companies that do exactly this obviously absurd thing, because it's a ton of work to stop: work with no obvious payoff compared to feature work.
Since this involves computers, nothing above is a hard rule, but it matches my experience.
The other two categories specifically target companies that really should comply with this law. I assume the $25mil clause is there to make sure large companies can't loophole themselves out of this somehow (e.g., offloading PII responsibility onto a subsidiary or a "third party" incorporated in Bermuda by the owner of the company).
Throwaway for obvious reasons.
Also, I'm a bit annoyed at laws only affecting companies of a certain size. Right at the threshold, there's a sudden penalty to having 50,001 users instead of 50,000. (Really I'm annoyed at how these data protection laws are implemented in general, and I wish the discussion were about that instead of being idealistic and only looking at the supposed intent.)
Let’s do that, shall we?
Before GDPR there were laws in each European country protecting private data (GDPR is basically Sweden’s data protection law in that regard).
Not a single “poor company that will need comply” gave a damn.
Then GDPR was introduced, discussed, amended. Quite publicly. Not one of the “poor devs that would be hit by it” gave a damn.
GDPR was passed and companies were given two years to adjust their software/systems/business practices to comply. Hardly any of the “let’s have a discussion shall we” devs gave a damn until the last few months of the transition period.
And only when they realized that they had to actually do something, something they should have done literally years ago, we had (and still have) this fake outcry of “boohoo these laws make us work hard and do right things and we don’t wanna”.
Cry me a river.
Compliance/legal is a company risk, and as I indicated in the challenger article here a few days ago, as an engineer I can advise on what the risks are and the potential consequences of bad outcomes, as well as the costs to reduce them. The business decides what level of risk to take. I personally would have preferred a robust response to GDPR and thorough internal procedures, but it was not my call to make.
Of course, I personally believe that we humans should own our data and digital footprints, so I agree with a lot of the concepts behind GDPR and CCPA, even if I do not agree with all and as an engineer may think some are ... silly/overzealous/misguided or what have you. Case in point: the IP tracking discussion above. If I hit your network, that's on me (barring externalities or bad actors, etc.). Retention periods and use definitions are fine, but a requirement to treat it as PII or other super sensitive data seems a bit much to the engineer in me.
It's true, businesses (or the people who run them) will in the end decide the direction the company takes, and their judgment is often worse than that of developers.
So yes, I would replace "devs" etc. with just "companies" in my comment.
However, the better integrated and communicated the company's goals and rationales are, the more aligned the judgements become.
GDPR was law two years before it came into effect, and everyone left it until the last month or so.
What about all of the state actors (and 'hackers') who are cracking corporations for data and building a massive database on everyone?
This argument only works if you feel the thing being outlawed is good (it is most commonly used in the context of privacy). To your statement I would respond the same way as I would respond to "When shooting people is outlawed, only the outlaws will shoot people": sounds good to me!
It is a state law; they can't hassle you if you're not Californian and do not serve their target market. Most of America doesn't live there, and California seemingly doesn't want you to do business there.
You could say that about Europe too wrt GDPR, but you should note that almost everyone is becoming GDPR compliant anyway because it's a big market.
Silently redirect them to a similar-enough site run by a partner company that's based in another state/country.
It's now GDPR + CCPA, so you are cutting off the EU and California. Probably more to come.
For example, it seems the LA Times no longer blocks the EU.
Also, Mr. Mactaggart, what a guy!
There are limits on campaign contributions, perhaps there should be limits on individual contributions for these signature drives, which are essentially just large marketing efforts.
And just because this guy got x number of signatures, I don't see why he should now have the power to make compromise deals with the government.
This isn't my area of expertise so I may be missing something here.
Boohoo, poor devs need to finally pay attention to people’s private data.
For example, minimum bedroom sizes for rental units. Seems nice to have enough space to live comfortably, right? End result though is the $20M apartment complex has 35 units instead of 40, and is only built later when rents have gone up to make the project make sense financially, exacerbating a housing shortage.
Invasive and pervasive surveillance. Private and sensitive data sold wholesale not even to the highest bidder, but to anyone.
Hell, when news about NSA surveillance broke, it was a huge scandal that was the focus of attention of all media for more than a year. Now Facebook alone is reported to show the same level of maliciousness and willful ignorance on a monthly basis, and it's business as usual.
So yes, I don’t give a rat’s ass about the “poor developers” who couldn’t get their shit together and provide privacy and security to the common people. And who now pretend they are being unfairly punished by governments.
And yes, I’m a developer myself.
>If anything, startups benefit: they have less data and systems.
It's not about the absolute costs of regulatory compliance, which are relatively small. It's about the costs of compliance relative to the economic value of the regulated activity. Google has roughly a million times more revenue than a ten-person start-up ever will, but privacy compliance is not a million times more expensive for Google than it is for the start-up. If compliance costs the start-up a day of engineering effort and costs Google ten million dollars, Google still comes out ahead in relative terms.
This is a pretty general pattern; established businesses get a competitive advantage from regulation, since it prevents competition from arising. If it costs $400 to get your setup inspected before you can sell lemonade that you make, this helps Nestle sell more bottled lemonade at the cost of your kids' lemonade stand.
Let's not gloss over the fact that the specific class of people who are "hurt" are the ones causing the hurt. If they only collected data they needed and secured the data they did collect, the regulation wouldn't be needed in the first place.
It's not mean-spirited to expect people who have widely profited from collecting bulk data to foot the bill for securing that data.
I see one clear net benefit: without regulation companies had zero care for private data. Well, time to suffer for it. I won't shed a tear.
Does the GDPR also have a lower limit like this? It should.
So if you make more than $25 million, OR you have more than 50k users or devices, OR you make more than 50% of your money selling data
You can end up with that many users on a side project all of a sudden if it gets posted to the front page of a site like this one.
Assuming you have any way to reliably identify which state your users are in -- which means we're back to "privacy regulations" encouraging companies to collect more data on their users.
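Case in point, a sketch of the usual approach (assuming MaxMind's geoip2 package and a GeoLite2 database file): to decide whether the law applies, you process the very IP you were worried about.

    import geoip2.database

    reader = geoip2.database.Reader("GeoLite2-City.mmdb")

    def is_california(ip: str) -> bool:
        # Ironically, this "compliance" check itself processes the IP.
        resp = reader.city(ip)
        return (resp.country.iso_code == "US"
                and resp.subdivisions.most_specific.iso_code == "CA")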
That Google employee must be somewhat nervous these days...
>Derive >50% of revenue from selling PII
So if I forward all of the data to another company outside of CA, does my company count as processing data?
What if the code that forwards that data is written by another company and I'm just hosting it on my site? Everything goes through their code, and I'm paid just to set up a website to host their code.
Maybe I do collect info in CA but I sell the data for $1, but the company also buys some consulting services for the actual price of that data that I'm selling them?
You are still processing that data. Part of processing that data involves you shipping it off...
> What if the code that forwards that data is written by another company and I'm just hosting it on my site? Everything goes through their code, and I'm paid just to set up a website to host their code.
You are as responsible, if not more so, for making sure that compliance is met. You are the one hosting the code. The data is moving through your servers.
> Maybe I do collect info in CA but I sell the data for $1, but the company also buys some consulting services for the actual price of that data that I'm selling them?
That's just being a jerk. But better hope you don't pass the 50k mark...
>The data is moving through your servers.
So if a random company gets breached, is everyone involved, from cloud providers to ISPs, also responsible because they facilitated moving and storing the data and were just hosting code?
This is problematic. Cloud providers give you permission to publish code. I could position myself to allow another company to publish code on my popular website to collect data, and my role would be basically no different from a cloud provider's. We don't have to agree that that's what it's specifically for; I just need to give them access to upload their own code for whatever expensive fee.
ISPs aren't (supposed to be) "storing" that data; they are transferring bits between computers. You, on the other hand, are hosting a website with some sort of form that people input PII into. You are accepting that PII; whether or not it gets forwarded is irrelevant. You are processing it. So do your due diligence, contact your users and let them know what is going on, and speak with a lawyer for more information.
That's what cloud providers do! If there's a spirit-of-the-law that is supposed to protect them, this would be a good time to write that in!
This is especially true if you use a service that allows others to inject code into your code base. If NPM has a security failure that leads to a breach at a company, who is at fault? Both? Or only the company that chose to use the code? An NPM package might be processing PII after all. Does that mean NPM can never be held responsible for security breaches?
Secondly, your example would need to be backed up by historical cases, and this law is brand new, so it is not clear. I'm not even sure how you guys can confidently argue that the new law ISN'T outright vague.
You could define a hilarious number of ways in which your chef could pee in the broth you ordered at a local diner. But it generally doesn't happen, does it?