
If I understand this correctly, you basically store an HMAC or password hash of the value as a separate field and index on it? That may work for SSNs (which are unique), but it won't work for other things you commonly need to encrypt, such as personally identifiable data (in healthcare scenarios).

If you (for example) encrypt first names of people (or any other data point that is not unique per entry) using this scheme, then the HMAC will reveal all the rows that have the same first name. You can then use frequency analysis to determine with high probability what the encrypted names are.
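To make the concern concrete, here is a minimal sketch of the leak, assuming the index column is a plain HMAC-SHA256 of the plaintext. The key, the `blind_index` helper, and the sample names are all made up for illustration and are not taken from the scheme being discussed.

```python
import hashlib
import hmac
from collections import Counter

# Assumption: a secret key that lives on the app server, not in the database.
KEY = b"hypothetical-index-key"

def blind_index(value: str) -> str:
    """Deterministic HMAC-SHA256 of the plaintext, stored next to the ciphertext."""
    return hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

# Made-up rows, for illustration only.
names = ["James", "Mary", "James", "Linda", "James", "Mary"]
index_column = [blind_index(n) for n in names]

# Someone who sees only index_column (no key, no plaintext) can still count repeats.
counts = Counter(index_column)
ranked = [c for _, c in counts.most_common()]
print(ranked)
```

The attacker never learns the key, but the shape of the distribution survives, and that shape can be matched against public name-frequency tables.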




> If you (for example) encrypt first names of people (or any other data point that is not unique per entry) using this scheme, then HMAC will reveal all the rows that have the same first name. You can then use frequency analysis to determine with high probability what the encrypted names are.

This is where things get difficult to explain, because most health care programs are going to care about compliance first and foremost. So with that in mind:

1. You probably don't need to encrypt their first name to be e.g. HIPAA compliant, but...

2. Using a very short Bloom filter increases the odds of false-positive collisions, so several different names end up sharing one index value. Combine this with Argon2 and aggressive rate limiting, and now you've greatly frustrated frequency analysis and chosen-plaintext attacks.
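A rough sketch of the second point, under some loud assumptions: the index is cut down to 4 bits, and `hashlib.pbkdf2_hmac` stands in for Argon2 only because it ships with the standard library (a real deployment would use an Argon2 binding and far costlier parameters). The key, the `truncated_index` helper, and the sample names are hypothetical.

```python
import hashlib

# Assumption: a secret key held by the web server, used here as the KDF salt.
KEY = b"hypothetical-index-key"
INDEX_BITS = 4  # deliberately tiny: only 16 possible index values

def truncated_index(value: str, bits: int = INDEX_BITS) -> int:
    """Slow keyed hash of the plaintext, truncated to its top `bits` bits."""
    # PBKDF2 is a stdlib stand-in for Argon2; 1000 iterations is demo-sized.
    digest = hashlib.pbkdf2_hmac("sha256", value.encode("utf-8"), KEY, 1000)
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - bits)

# Made-up distinct values, for illustration only.
names = [f"name{i}" for i in range(40)]
buckets = {}
for n in names:
    buckets.setdefault(truncated_index(n), []).append(n)

# With 40 distinct names and 16 buckets, some bucket must hold several names.
largest = max(len(v) for v in buckets.values())
print(len(buckets), largest)
```

The trade-off is that a lookup now returns every row in the matching bucket, and the application has to decrypt those rows and discard the false positives itself.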

Given the threat model that we've given (database is not the same machine as the webserver, and the database server is what gets compromised), I can't see a better solution.


This would depend on how large the database is. If the names in the database are not representative of the general population, then frequency analysis is not going to help much.



