I'm curious about one aspect, though: have you put much thought into what happens, and what the side effects are, when you do something like key rotation because your encryption service has potentially been compromised or leaked?
The second aspect I'm curious about: you mention that services in your general layer are not able to talk to the encryption service to decrypt data, but what about encrypting data? The reason I ask is that the tricky part with anonymization is that I don't necessarily have to decrypt PII to unmask it.
I don't really know what your service does, but say it's tracking location, and one of the pieces of PII is a phone number. If I can go to the encryption service and ask for the encrypted version of a phone number I know, I then have the encrypted phone number, which I can use to search the dataset.
The service acts more like a key value store (this is a simplified explanation, but for your questions it will do).
You give it a value, it gives you back a token, which you can later exchange for the original value.
This means the real value is stored in the encryption service, not in the receiving application's database.
This gives us the flexibility to perform key rotation (and even upgrade our ciphers as the crypto landscape evolves) at any time without having to worry about where the encrypted value is being used, as the only data stored outside the service are opaque tokens.
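To make the token idea concrete, here is a rough sketch of how such a vault could behave. It is not our actual implementation: `TokenVault`, `tokenize`, `detokenize` and `rotate_key` are made-up names, and the XOR keystream is a deliberately toy stand-in for a real authenticated cipher. The only point is that a token issued before a rotation still resolves afterwards, because callers never hold ciphertext.

```python
import hashlib
import secrets


def _keystream(key, nonce, length):
    """Toy keystream (SHA-256 in counter mode). Illustration only, NOT secure."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def _xor(data, stream):
    return bytes(a ^ b for a, b in zip(data, stream))


class TokenVault:
    """Hypothetical vault: values live here, callers only hold opaque tokens."""

    def __init__(self):
        self._key = secrets.token_bytes(32)
        self._records = {}  # token -> (nonce, ciphertext)

    def tokenize(self, value):
        token = secrets.token_urlsafe(16)  # random, carries no information about value
        nonce = secrets.token_bytes(16)
        data = value.encode()
        self._records[token] = (nonce, _xor(data, _keystream(self._key, nonce, len(data))))
        return token

    def detokenize(self, token):
        nonce, ciphertext = self._records[token]
        return _xor(ciphertext, _keystream(self._key, nonce, len(ciphertext))).decode()

    def rotate_key(self):
        """Re-encrypt every record under a fresh key; issued tokens are untouched."""
        new_key = secrets.token_bytes(32)
        for token, (nonce, ciphertext) in list(self._records.items()):
            plaintext = _xor(ciphertext, _keystream(self._key, nonce, len(ciphertext)))
            new_nonce = secrets.token_bytes(16)
            self._records[token] = (
                new_nonce,
                _xor(plaintext, _keystream(new_key, new_nonce, len(plaintext))),
            )
        self._key = new_key
```

Note that in this sketch two calls to `tokenize` with the same value return two different tokens, which is what keeps the tokens opaque.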
As for de-anonymizing, the service is not designed to take an encrypted value and return its token.
If that were possible, we wouldn't have done a very good job encrypting it ;)
For de-anonymizing, the idea is to give the encryption service the plaintext and get back a matching token. But then the token would be more of a hash. If you are encrypting where all the tokens are different, you can't do a join or analysis. You can't, for instance, count how many unique phone numbers you have. And if a user is using your app, how do they see their own PI data?
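A small sketch of the trade-off I mean, with made-up function names and an HMAC standing in for whatever "more of a hash" scheme a service might use. Deterministic tokens let you count uniques and join, but they also let anyone who can request a token for a known number find it in the dataset. Random tokens block both.

```python
import hashlib
import hmac
import secrets

SECRET = secrets.token_bytes(32)  # hypothetical service-side key


def deterministic_token(value):
    """Same input always yields the same token (hash-like behaviour)."""
    return hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()


def random_token(value):
    """Fresh random token on every call, unrelated to the value."""
    return secrets.token_hex(32)


phones = ["+15550001", "+15550002", "+15550001"]  # made-up sample data

det = [deterministic_token(p) for p in phones]
rnd = [random_token(p) for p in phones]

unique_det = len(set(det))  # duplicates collapse, so unique counts and joins work
unique_rnd = len(set(rnd))  # every token differs, so duplicates are invisible

# The lookup attack: tokenize a number you already know and search for it.
known_number_found = deterministic_token("+15550001") in det
```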
> If you are encrypting where all the tokens are different, you can't do a join or analysis.
That would hopefully be part of the reason for doing it this way.
I once worked on a system where we encrypted most customer data on registration and took it entirely offline once a day (so new data was in encrypted form online for a day, and then was air-gapped permanently).
The fact that marketing etc. had to request reports to be run manually on the air-gapped customer database was an important barrier that made them think about how they could meet their needs without it.
Sometimes, of course, they had genuine needs that needed access to the unencrypted data, but it was rare.
I'm a big fan of making it take extra effort to do these things - time and resources seem to be a far stronger barrier than requiring authorization.
You're correct that it does make certain kinds of analysis more difficult.
However, that doesn't mean we can't ever get access to the original data.
Most of our current BI needs can be met using the unencrypted data, but if, for example, we did want to answer your phone number question, we could craft a special-purpose program to perform the analysis without compromising user privacy:
1. Select all phone number tokens
2. Decrypt
3. Produce counts (total unique, etc)
Said program would have to go through normal code review and approvals, and then be deployed into the secure zone (so it could access the encryption service).
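Roughly, the three steps above could look like the sketch below. `StubVault` and `phone_number_counts` are hypothetical names; the stub only imitates the token-for-value exchange so the example runs on its own, and the real program would call the encryption service from inside the secure zone instead. Only the aggregate counts leave the function, never the decrypted numbers.

```python
import secrets


class StubVault:
    """Stand-in for the encryption service (assumption, for illustration only)."""

    def __init__(self):
        self._values = {}

    def tokenize(self, value):
        token = secrets.token_urlsafe(16)
        self._values[token] = value
        return token

    def detokenize(self, token):
        return self._values[token]


def phone_number_counts(tokens, vault):
    """Decrypt each token, then return aggregate counts only."""
    total = 0
    unique = set()
    for token in tokens:                        # 1. select all phone number tokens
        unique.add(vault.detokenize(token))     # 2. decrypt
        total += 1
    return {"total": total, "unique": len(unique)}  # 3. produce counts
```

For example, three tokens covering two distinct numbers would come back as `{"total": 3, "unique": 2}`.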