Use synthetic data instead? https://tonic.ai/

Even with synthetic data, there is an inherent trade-off between privacy risk and the usefulness of the data produced.

However, this trade-off can be of a different nature, which can work in favor of synthesis, for example when protecting high-dimensional data.

Are there good ways to measure how much of the original subject-level data can be extracted from a synthetic dataset, or to calculate the risk of re-identification? With k-anonymity that calculation is nice and easy, provided your assumptions are valid.

Risk of re-identification is hard to estimate, mostly because you have to assume some state of background knowledge, i.e. which fields the adversary already knows something about.

If I'm looking for a white male in New York City, it's going to be harder to find my target than it would be if I also knew their birth date and zip code.
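
To make that dependence on background knowledge concrete, here is a rough sketch of the k-anonymity style calculation. The records, field names, and helper functions are all made up for illustration, and it assumes the adversary knows exactly the listed fields and that the target is in the dataset.

```python
from collections import Counter

# Hypothetical toy records; every value here is invented for illustration.
RECORDS = [
    {"race": "white", "sex": "M", "city": "NYC", "birth_date": "1980-01-02", "zip": "10001"},
    {"race": "white", "sex": "M", "city": "NYC", "birth_date": "1975-06-30", "zip": "10001"},
    {"race": "white", "sex": "M", "city": "NYC", "birth_date": "1980-01-02", "zip": "10027"},
    {"race": "white", "sex": "M", "city": "NYC", "birth_date": "1975-06-30", "zip": "10027"},
]


def class_sizes(records, known_fields):
    """Count how many records share each combination of the known fields."""
    return Counter(tuple(r[f] for f in known_fields) for r in records)


def k_anonymity(records, known_fields):
    """Size of the smallest group of records that look identical to the adversary."""
    return min(class_sizes(records, known_fields).values())


def max_reid_risk(records, known_fields):
    """Worst-case chance of picking the right record: 1 / k."""
    return 1 / k_anonymity(records, known_fields)


# Assumed adversary knowledge: first only coarse attributes, then birth date and zip too.
coarse = ["race", "sex", "city"]
fine = coarse + ["birth_date", "zip"]

print(k_anonymity(RECORDS, coarse), max_reid_risk(RECORDS, coarse))
print(k_anonymity(RECORDS, fine), max_reid_risk(RECORDS, fine))
```

With only the coarse fields, all four toy records fall into one group, so the worst-case risk is 1 in 4. Once birth date and zip are assumed known, every record is unique and the risk goes to 1. The number you get is only as good as the guess about which fields the adversary has.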