
K-map, the weird cousin of k-anonymity (2017) - duck
https://desfontain.es/privacy/k-map.html
======
rntz
The article gives the hypothetical example of redacting a table row

    {zip: 85535, age: 79}

to

    {zip: 85xxx, age: 79}

on the grounds that there are fewer than k (for some tunable constant k,
larger = more anonymity) 79-year-olds in zip code 85535, but many more in zip
85xxx. However, if I see the second record, then because it has been redacted
I also know that, whatever zip code the person actually had, there were fewer
than k 79-year-olds in it! This may narrow the set of candidates considerably.

So it doesn't seem sufficient to count the mere number of people a redacted
row could possibly match. You have to consider the meta-level information that
knowing the row had to be redacted gives the attacker.
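
A minimal sketch of that inference, assuming the attacker knows both the
redaction rule and the per-zip population counts. The counts, the threshold
K = 3, and both function names are made up for illustration; nothing here
comes from the article.

```python
from collections import Counter

K = 3  # hypothetical threshold: redact when fewer than K people match

# Hypothetical population counts: (zip, age) -> number of people.
population = Counter({
    ("85001", 79): 50,
    ("85535", 79): 2,
    ("85701", 79): 40,
})

def naive_candidates(prefix, age):
    """Everyone of that age whose zip starts with the redacted prefix."""
    return sum(n for (z, a), n in population.items()
               if z.startswith(prefix) and a == age)

def informed_candidates(prefix, age, k=K):
    """Same, but keep only zips that would have triggered redaction."""
    return sum(n for (z, a), n in population.items()
               if z.startswith(prefix) and a == age and n < k)

print(naive_candidates("85", 79))     # 92
print(informed_candidates("85", 79))  # 2
```

With these made-up numbers the redacted row looks like it hides among 92
people, but an attacker who reasons about why it was redacted is down to 2.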

~~~
btilly
This is true.

In fact it would be better to not indicate that the data was redacted. Instead
of redacting it, change it to something else in the redacted range, preferably
more common. With no hint about which pieces of data were changed, the
attacker can't use what you describe.

Heck, merely including random small (unidentified) changes makes matching
much, much harder.
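
One way this might look, as a rough sketch rather than a vetted scheme:
swap a rare zip for a common one under the same prefix and emit the row with
no marker. The counts, K, the two-digit prefix, and `perturb_zip` are all
hypothetical.

```python
import random

K = 3  # hypothetical threshold

# Hypothetical number of 79-year-olds per zip code.
counts_79 = {"85001": 50, "85535": 2, "85701": 40}

def perturb_zip(record, counts, k=K, prefix_len=2, rng=random):
    """Return a copy of record; a rare zip is silently swapped for a
    common zip sharing the same prefix, chosen in proportion to its count."""
    z = record["zip"]
    if counts.get(z, 0) >= k:
        return dict(record)  # common enough already, leave untouched
    prefix = z[:prefix_len]
    common = [c for c, n in counts.items()
              if c.startswith(prefix) and n >= k]
    if not common:
        # No safe replacement exists; a real system needs a fallback here.
        return dict(record)
    weights = [counts[c] for c in common]
    return {**record, "zip": rng.choices(common, weights=weights, k=1)[0]}

row = perturb_zip({"zip": "85535", "age": 79}, counts_79)
print(row)  # zip is now 85001 or 85701, with nothing marking the change
```

The output row is indistinguishable in form from an unmodified one, which is
the point: there is no "xxx" to tell the attacker a rule fired.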

~~~
nixpulvis
Yes, but now when you find out that some important property you're testing for
was caused by a mold outbreak in 85535, you'll be disturbed to know that the
research was published for a subject in 85001.

~~~
oh_sigh
You could publish the criteria for how your data may be jittered, but just not
the specifics. Then, future users of the data could know to what extent they
can rely on accuracy of the data.

