

About 33 bits - microarchitect
http://33bits.org/about/

======
randomwalker
Hey, I'm the author of this blog. Much of my previous deanonymization research
has been discussed on HN; see
[http://www.google.com/search?q=33bits.orgsite:news.ycombinat...](http://www.google.com/search?q=33bits.orgsite:news.ycombinator.com)
Also, if you find the premise of the blog interesting check out the sitemap
linked from the page.

But since this post is about the About page, let me share a couple of lessons
I've learned from the blog, which has been more successful in communicating my
research than I'd dared to hope for when I started it 3.5 years ago.

1\. Those of us working in technical areas often struggle to explain our ideas
to less technical audiences without oversimplifying or losing essential
meaning. Sometimes you'll discover an analogy or metaphor or phrase that
avoids both pitfalls. Seize those chances; they're powerful.

2\. Coming up with a name is more important than you might think. If a good
name will make your idea or product even 5% stickier, it follows that it may
be worthwhile to spend 5% of your time just coming up with the name. One way
to do it is to be constantly on the lookout for a good name while you're
working on the product.

3\. If you're writing about something that has policy implications, and want
it to be read in Washington, it's hard but not impossible. Two important
requirements are to network and build up an audience — they aren't going to
read your blog just because it ranks high in Google searches — and to use
language that non-technical people can understand.

Happy to answer any questions!

~~~
chrisacky
I've been trying to think of a way of using typing cadence to capture "bits"
of information. Think you would have a good solution for that?

Take this scenario (and also check out the disclaimer at the bottom!).

All users on earth type the same paragraph, or perhaps some password
(clearly the longer the text, the more distinct the fingerprint, but bear with
me on this).

For this sequence of keypresses, I capture the timestamp at which each key is
pressed and the duration for which each key is held down.

Based on this information, how would you recommend that a person go about
deriving some unique fingerprint from these values?

I was thinking the best way would be to treat each keypress as a point in some
space and the duration it is held down as a vector. Then, if each user is
entering the same paragraph, the distance across all of the vectors could be
used to calculate some identifying fingerprint.

Disclaimer: I'm absolutely not interested in the slightest in tracking users.
Every weekend I try and research something that interests me. Last weekend was
user fingerprinting based on the typing speed and cadence of users.
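
One way to make the idea above concrete, as a minimal sketch rather than a tested method: turn each typing sample into a vector of hold durations plus gaps between successive key presses, then compare two samples by Euclidean distance. The function names `features` and `distance`, the event format, and all timing values below are hypothetical and invented for illustration.

```python
from math import sqrt

def features(events):
    """Build a timing vector from (key, press_ms, release_ms) tuples in typing order."""
    # Hold duration of each key.
    dwell = [release - press for _, press, release in events]
    # Gap between the press times of consecutive keys.
    flight = [events[i + 1][1] - events[i][1] for i in range(len(events) - 1)]
    return dwell + flight

def distance(a, b):
    """Euclidean distance between two equal-length timing vectors."""
    if len(a) != len(b):
        raise ValueError("samples must come from the same text")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Made-up samples of three keypresses each (times in milliseconds).
alice_1 = [("c", 0, 95), ("a", 180, 260), ("t", 390, 470)]
alice_2 = [("c", 0, 100), ("a", 175, 250), ("t", 400, 485)]
bob = [("c", 0, 60), ("a", 110, 165), ("t", 205, 270)]

same = distance(features(alice_1), features(alice_2))
other = distance(features(alice_1), features(bob))
print(same < other)  # True for these invented numbers
```

A real system would need many samples per person and some way to handle typos and natural variation, which this sketch ignores.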

~~~
randomwalker
This is a well-known technology :-) See
<http://en.wikipedia.org/wiki/Keystroke_dynamics>

In the research community it's a proven and accepted concept. There are
products in the market that do two-factor authentication based on password +
keystroke dynamics, but I don't know how well they work.

~~~
chrisacky
Thanks! I was sure it must have been called something!

------
orthecreedence
I understand the log2 concept of narrowing something down via binary search,
but I have a question: don't the "facts" about a person have to divide the
remaining population in half (or into smaller chunks)?

For instance if you know "Frank" doesn't wear a Rolex, that would not rule out
very many people. So statistically, it would probably be better to know if
Frank has red hair, as that could rule out a lot more people.

Also, let's say you have it narrowed down to four people, but the last bit of
information is common to all of them. You now have to get another bit, and
possibly another, correct?

EDIT: Felt like I didn't express my main point well enough: while you can
certainly narrow down people with "bits" of information, information is most
of the time not just 1 or 0 and can be too fuzzy (or too common) to be useful
in a binary search, although with the right bits of information it can of
course be fruitful.

I'm really interested in this concept and also curious whether anyone is
employing it on a mass scale.

~~~
jerf
"Also, let's say you have it narrowed down to four people, but the last bit of
information is common to all of them."

The definition of a bit is something which removes half the possibilities. If
you have 4 people and acquire a "bit" of information that breaks them into two
categories, one with 4 people and one with 0 people, you, by definition, in
fact have 0 bits.

Fractional bits are not only possible, they are by far the common case. With a
lg2 in the definition of the bit, it's pretty uncommon to have integral bits.

Critical insight: What we call a "bit" in a computer and a "bit" in
information theory are related but not the same thing. You can't have a
fraction of a bit stored in your computer's RAM; the words are meaningless. It
is best to simply flush your idea of what you think a bit is and start over
again from scratch when studying information theory, then when you are
comfortable with it the connections will become obvious. Starting from the RAM
side is actively harmful.
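
The point about fractional bits can be checked with a short calculation. This is a sketch assuming the standard self-information formula, log2(1/p), where p is the fraction of remaining candidates for whom the fact is true. The function name `info_bits` and the 99% and 2% figures are hypothetical, chosen only to illustrate a very common fact and a rarer one.

```python
import math

def info_bits(p):
    """Bits gained by learning a fact that holds for a fraction p of the candidates."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return math.log2(1 / p)

print(info_bits(0.5))   # 1.0: the fact halves the candidates
print(info_bits(1.0))   # 0.0: the fact is true of everyone left
print(info_bits(0.99))  # a tiny fraction of a bit (assumed 99% share the fact)
print(info_bits(0.02))  # about 5.6 bits (assumed 2% share the fact)
```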

~~~
orthecreedence
This makes sense, thanks for explaining the semantics. So basically, you're
confirming that you'd need 33 "bits" of information, which doesn't necessarily
mean you can use 33 pieces of information because a bit is a lot more specific
than a one-off piece of information.

~~~
pohl
33 bits of information can amount to fewer than 33 pieces of information.
Quoting from the article: "...knowing your hometown gives me 16 bits of
entropy about you".

 _Edit: a relevant anecdote: when I was in high school, a friend went as an
exchange student to Costa Rica. As an experiment, I had her send me a blank
postcard with nothing on it but "pohl 68320 USA". It arrived in my post office
box without a hitch._

------
twiceaday
I think the premise is false. You would need about 33 _unique_ bits. I doubt
that you can prove the existence of a person-independent algorithm to gather
these.

~~~
randomwalker
See <strike>comment #12</strike> the comment posted at February 12, 2010 at
5:15 am in the blog post.* The term entropy refers to uniqueness.

As for the development of algorithms to gather those bits, that's what my
entire Ph.D. is about and what my blog is mostly about. This is what I've been
proving for the last 6 years.

*Just realized comment numbers are unstable. Bad WordPress.

------
jmatt
On anonymity and privacy... I always thought this was an interesting fact:

Birthday, gender and zip code are enough to identify someone uniquely
approximately 85% of the time.

Here's a quickly googled source, though the meme is older than that:
[http://godplaysdice.blogspot.com/2009/12/uniquely-identifyin...](http://godplaysdice.blogspot.com/2009/12/uniquely-identifying-people-by-birth.html)

~~~
reedlaw
Really? That would imply there are only 730 people living in a given zip code
(365 days in a year * 2 genders). Unless you mean birth date.

~~~
gxs
Yes, he meant birth date.
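
To see why the distinction matters, here is a rough count of how many (date, gender) combinations each reading allows within one zip code. It is only a sketch: the 80-year span of birth years is an assumption, and it treats all combinations as equally likely, which real populations are not.

```python
import math

DAYS_PER_YEAR = 365
GENDERS = 2        # as counted in the comment above
BIRTH_YEARS = 80   # assumption: span of birth years among living residents

# Birthday only (day and month) versus full birth date (day, month, year).
birthday_buckets = DAYS_PER_YEAR * GENDERS
birth_date_buckets = DAYS_PER_YEAR * BIRTH_YEARS * GENDERS

print(birthday_buckets, math.log2(birthday_buckets))      # 730 buckets, about 9.5 bits
print(birth_date_buckets, math.log2(birth_date_buckets))  # 58400 buckets, about 15.8 bits
```

If a zip code has fewer residents than there are buckets, many residents can end up alone in theirs, which is the intuition behind the high uniqueness figure.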

------
TamDenholm
Just out of curiosity, how many bits would it take to include all the people
that have ever lived? Also, how many to realistically cover the future?

~~~
andreasklinger
Stop me if I am wrong but…

According to…
[http://www.wolframalpha.com/input/?i=how+many+people+have+li...](http://www.wolframalpha.com/input/?i=how+many+people+have+lived+on+earth)
<http://www.wolframalpha.com/input/?i=106+billion+in+binary>

37 bits

~~~
chris_dcosta
Isn't the answer always 42?

~~~
andreasklinger
If you take the timespan of HHGTTG into account, yes. ;)

------
plq
> There are only 6.6 billion people in the world, so you only need 33 bits
> (more precisely, 32.6 bits) of information about a person to determine who
> they are.

I think you should count the dead as well. But then, 33 bits ~= 8 billion,
which should still be enough, I guess.
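
The figures quoted in this thread can be checked with a few lines. This is a sketch that takes the 6.6 billion figure from the quote above and the 106 billion estimate linked earlier in the thread as given; the helper name `bits_needed` is hypothetical.

```python
import math

def bits_needed(population):
    """Bits of information required to single out one member of a population."""
    return math.log2(population)

living = 6.6e9       # figure quoted from the About page
ever_lived = 106e9   # estimate linked earlier in the thread

print(round(bits_needed(living), 1))       # 32.6
print(2 ** 33)                             # 8589934592, roughly 8.6 billion
print(math.ceil(bits_needed(ever_lived)))  # 37
```

Under these numbers, 33 bits covers everyone alive with some headroom, but it falls short of 106 billion, which needs 37.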

