δ-presence, for when being in the dataset is sensitive (desfontain.es)
84 points by p4bl0 on Aug 12, 2018 | 5 comments

For those wondering how this compares to differential privacy, the author of this post has also written a short nontechnical post about that: https://desfontain.es/privacy/differential-privacy-awesomene...

The theoretical guarantees of differential privacy (e.g. resilience to postprocessing or outside data, which removes the need for most threat modeling) make it the one to beat from a privacy standpoint, IMO.

Thanks, yeah, that's what I was wondering -- it doesn't compare them very much though.

The main complaint against differential privacy is that it's a very restrictive definition. However, at least in theory, it often seems to turn out that if an algorithm releases enough information to violate differential privacy, then it has released almost enough info to blatantly violate any privacy definition.

So the key challenge for any other privacy notion is to show circumstances where it can be satisfied easily but DP can't (or where DP incurs a much heavier accuracy loss).

Hi, I wrote these articles.

It's doable to compare syntactic definitions (k-anonymity vs. k-map vs. δ-presence, for example) and I tried to do that a bit. Differential privacy is something pretty much fundamentally different, as it applies to the mechanism and not the output dataset.
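To make the "mechanism vs. output dataset" distinction concrete, here is a minimal Python sketch. It is illustrative only and not taken from the linked articles: the function names, the row format (a list of dicts), and the choice of a counting query are all assumptions. The first function checks a syntactic property of a table that already exists; the second is a randomized procedure, and the differential privacy guarantee is a statement about that procedure rather than about any single output.

```python
import random
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """Syntactic check: a yes/no property of the released table itself.

    `rows` is a list of dicts and `quasi_identifiers` a list of column
    names (both hypothetical, chosen for this sketch).
    """
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    # Every combination of quasi-identifier values must occur at least k times.
    return all(count >= k for count in groups.values())


def laplace_count(rows, predicate, epsilon, rng=random):
    """Laplace mechanism for a counting query (sensitivity 1).

    The guarantee is about this randomized procedure, not about one
    particular number it happens to return.
    """
    true_count = sum(1 for row in rows if predicate(row))
    # The difference of two exponential draws with rate epsilon follows a
    # Laplace distribution with scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


if __name__ == "__main__":
    table = [
        {"zip": "8553*", "age": "20-29"},
        {"zip": "8553*", "age": "20-29"},
        {"zip": "4217*", "age": "30-39"},
        {"zip": "4217*", "age": "30-39"},
    ]
    print(is_k_anonymous(table, ["zip", "age"], k=2))
    print(laplace_count(table, lambda r: r["age"] == "20-29", epsilon=1.0))
```

Running the check twice on the same table always gives the same answer, while the noisy count changes from run to run, which is one way to see why the two kinds of definition are hard to compare directly.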

I'm trying to fix this gap: my PhD thesis is about rephrasing syntactic definitions in terms of relaxed versions of differential privacy. Typically, I'm trying to get to a point where I can say "hey everyone, remember this old definition that is easy to use but we don't really know what guarantee it gives? Turns out it's differential privacy with weakened assumptions, so here, have a formal guarantee for free". These are (I feel) natural questions that are still open problems. I will most definitely write about them when I have time (and solid results) =)

Sounds like a great line of research!

I’m well aware this is an extreme example, but I feel like the anonymized dataset lost almost all of its value in the process. It might as well not include age at all, since the ranges it uses are nearly useless.

I would think that anonymizing ZIP Codes would be far more useful to researchers in this particular case. Perhaps even inventing a system where nearby regions can be coded as such without giving away identifying populations or perhaps even physical sizes.
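One simple way to read "coding nearby regions together" is prefix generalization of ZIP codes. The Python sketch below is only an illustration under loud assumptions: it treats a shared ZIP prefix as a rough stand-in for "nearby", which is only approximately true, it assumes all codes are strings of the same length, and the function names are made up for this example. It masks trailing digits for every record until each generalized code is shared by at least k records.

```python
from collections import Counter


def generalize_zip(zip_code, keep_digits):
    """Keep the first `keep_digits` characters and mask the rest with '*'."""
    return zip_code[:keep_digits] + "*" * (len(zip_code) - keep_digits)


def coarsen_until_k(zip_codes, k):
    """Mask as few trailing digits as possible, using the same prefix
    length for every record, so each generalized code appears >= k times.

    Assumes every code is a string of the same length (hypothetical
    input format). If there are fewer than k records in total, the
    fully masked codes are returned even though the target is not met.
    """
    if not zip_codes:
        return []
    width = len(zip_codes[0])
    generalized = list(zip_codes)
    for keep in range(width, -1, -1):
        generalized = [generalize_zip(z, keep) for z in zip_codes]
        if min(Counter(generalized).values()) >= k:
            break
    return generalized


if __name__ == "__main__":
    print(coarsen_until_k(["85535", "85531", "85535", "85532"], k=2))
```

This applies one prefix length to the whole column, so a single sparse area forces every record to be coarsened. A per-region scheme like the one suggested above would need a different approach.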
