If I understand this correctly, you basically store HMAC or password hash as a s...

CiPHPerCoder · on May 31, 2017

> If you (for example) encrypt first names of people (or any other data point that is not unique per entry) using this scheme, then HMAC will reveal all the rows that have the same first name. You can then use frequency analysis to determine with high probability what the encrypted names are.

This is where things get difficult to explain, because most health care programs are going to care about compliance first and foremost. So with that in mind:

1. You probably don't need to encrypt their first name to be e.g. HIPAA compliant, but...

2. Using a very short Bloom filter increases the odds of false positive collisions. Combine this with Argon2 and aggressive rate limiting, and now you've frustrated frequency analysis and chosen plaintext attacks greatly.
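To make the trade-off concrete, here is a rough Python sketch, not the actual implementation. The key, the sample names, the 4-bit width, and the function names are all made up for illustration, and PBKDF2 from the standard library stands in for Argon2 only because Argon2 needs a third-party package. A full-length HMAC index hands the attacker exact per-name row counts, while a heavily truncated index maps many different names into the same small set of buckets.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical key; in the threat model above it lives on the webserver,
# not on the database server.
INDEX_KEY = b"hypothetical-index-key-kept-on-the-webserver"


def full_index(name: str) -> str:
    """Full-length HMAC-SHA256 index: equal names always give equal values."""
    return hmac.new(INDEX_KEY, name.lower().encode(), hashlib.sha256).hexdigest()


def truncated_index(name: str, bits: int = 4) -> int:
    """Slow keyed hash truncated to a few bits.

    PBKDF2 is a stdlib stand-in for Argon2 here (an assumption of this
    sketch); the key is passed as the salt so the output depends on it.
    """
    digest = hashlib.pbkdf2_hmac("sha256", name.lower().encode(), INDEX_KEY, 10_000)
    # Keep only the top `bits` bits, so there are just 2**bits possible values.
    return int.from_bytes(digest, "big") >> (256 - bits)


names = ["alice", "bob", "alice", "carol", "alice", "dave", "bob"]

# What someone who dumps the index column can count in each case.
full_counts = Counter(full_index(n) for n in names)
short_counts = Counter(truncated_index(n) for n in names)

print(sorted(full_counts.values()))  # one exact count per distinct name
print(dict(short_counts))            # counts per bucket, shared by all colliding names
```

With only 16 buckets, every bucket in a real table would hold many unrelated names, so the application has to decrypt the candidate rows and discard the false positives itself. That extra work is the price of blurring the frequency counts.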

Given the threat model we've described (the database is on a different machine from the webserver, and the database server is the one that gets compromised), I can't see a better solution.

danieltillett · on May 31, 2017

This would depend on how large the database is. If the names in the database are not representative of the general population, then frequency analysis is not going to help much.