
PIN number analysis - Anon84
http://www.datagenetics.com/blog/september32012/
======
lifeisstillgood
On a slightly different note related to the XKCD comic, Irish police in 2005
started a manhunt for a serial and prolific traffic offender - they knew he
was Polish, his name was "Prawo Jazdy" and he had committed over two hundred
speeding offences across the country.

Eventually they got a Polish officer through Interpol to help. On his first
day on the case he asked - why are you looking for a guy called "Driver's
License"? Cops at ordinary traffic stops had simply written down the words
from the international license that sat where you would usually see the
name on national licenses.

~~~
shabble
Possibly due to disobeying the instructions signposted here:
<http://news.bbc.co.uk/1/hi/wales/7702913.stm>

The apparent number of people who request tattoos in foreign scripts without
bothering to check their translation is also surprising.

~~~
philwelch
To be fair, are there any Welsh people who don't speak English? Having road
signs in Welsh seems like a purely political requirement, not a functional
one.

------
furyg3
My bank issues a PIN which is not of my choosing, and since it's a smart card
to change it means actually going down to the bank to select a new PIN. I
think the majority of people never do this, so it seems like a great way to
counter people selecting "1234" or their birthday.

At some point I'd really like to see six to eight digit PINs, though...

~~~
lifeisstillgood
Apparently (urban myth time) the bank issued pin is a hash of your bank
account number plus some other details that are known without an expensive
core database lookup.

This way your pin can be verified without needing to call back to the central
bank core - the ATM cloud can just move straight on to transactions.

It may save a lot of cash, or it may be a myth - anyone got details?
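
A minimal sketch of what such a scheme could look like, assuming a keyed hash (HMAC-SHA256) over the account number. The secret key, the function names `derive_pin` and `verify_pin`, and the reduction to decimal digits are all hypothetical illustrations, not any bank's actual method:

```python
import hashlib
import hmac


def derive_pin(account_number: str, secret_key: bytes, digits: int = 4) -> str:
    """Derive a fixed-length PIN from an account number with a keyed hash."""
    mac = hmac.new(secret_key, account_number.encode("ascii"), hashlib.sha256).digest()
    # Reduce the MAC to `digits` decimal digits (the tiny modulo bias is ignored).
    value = int.from_bytes(mac, "big") % (10 ** digits)
    return str(value).zfill(digits)


def verify_pin(account_number: str, entered_pin: str, secret_key: bytes) -> bool:
    """Check a PIN locally, with no lookup against a central database."""
    expected = derive_pin(account_number, secret_key)
    return hmac.compare_digest(expected.encode("ascii"), entered_pin.encode("utf-8"))
```

Anything holding the key could recompute the PIN from the account number alone, which is the whole appeal of the idea. It also means a customer-chosen PIN would need some extra stored value to be checked against.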

~~~
StavrosK
The fact that you can change it disproves it.

~~~
benvd
If enough people never bother to change their PIN, it might still be worth it.

~~~
StavrosK
Yeah, but then you have to make the lookup anyway, to check if they _did_
change it. You might as well just authenticate. Unless the original PIN always
works, which is not secure.

------
brittohalloran
He quickly glossed over the fact that these are _computer_ passwords which
happen to be 4 numeric digits.

"Given that users have a free choice for their password, if users select a
four digit password to their online account, it’s not a stretch to use this as
a proxy for four digit PIN codes."

Though there probably wouldn't be a drastically different phenomenon, I think
the actual PIN distribution would be noticeably different. Particularly 1234.
It may still be the top PIN, but I don't think it would have the same
dominance. People choose trymynewwebservice.ly passwords a lot more callously
than their bank password. I have to believe on average people put a little
more effort into disguising it.

~~~
rm999
I wonder about the 2580s (vertical numbers on a PIN pad). It makes perfect
sense for PINs, but not a computer keyboard. It could be people keeping their
passwords consistent with their PINs e.g. using their ATM PIN for their online
bank account password.

A lot of sites (correctly) prevent people from using very short or simple
passwords, including four digit numbers. I'm curious what sites his database
comes from.

~~~
maxerickson
This paper (PDF) uses the RockYou password database to do a similar analysis:

[http://www.cl.cam.ac.uk/~jcb82/doc/BPA12-FC-
banking_pin_secu...](http://www.cl.cam.ac.uk/~jcb82/doc/BPA12-FC-
banking_pin_security.pdf)

(RockYou discussed here:

<http://en.wikipedia.org/wiki/RockYou#Controversy>

)

The patterns displayed by the codes in the paper are similar to the blog.

~~~
rm999
Wow, it's worse than I thought. Rockyou doesn't allow 4 character passwords so
they were extracting the numbers from WITHIN longer passwords. 'asdf1975'
would equate to a 'pin' of 1975, or 37489 would equate to two pins, 3748 and
7489. What a terrible assumption to build all your research on top of.

~~~
maxerickson
The paper is quite clear about the methodology used: they only extracted exact
4 digit sequences. I repeated that much of the analysis because I wanted to
examine the patterns, not just look at the frequency plot. I got the same ~1.7 million
sequences using a regex that excluded 5 digit sequences.
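
A rough sketch of that kind of extraction, assuming lookarounds are used to reject any 4 digits that sit inside a longer digit run. The pattern and the helper name `extract_pins` are my own illustration, not the paper's code or the exact regex mentioned above:

```python
import re
from collections import Counter

# Exactly four digits, not preceded or followed by another digit.
FOUR_DIGITS = re.compile(r"(?<!\d)\d{4}(?!\d)")


def extract_pins(passwords):
    """Count standalone 4-digit sequences found inside a list of passwords."""
    counts = Counter()
    for password in passwords:
        counts.update(FOUR_DIGITS.findall(password))
    return counts


sample = ["asdf1975", "37489", "1234", "x1234y1234"]
pins = extract_pins(sample)
```

With this pattern 'asdf1975' contributes 1975, while 37489 contributes nothing because no 4 digit window in it stands alone.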

The blog is less clear.

------
orjan
Previous submission with discussion:

<http://news.ycombinator.com/item?id=4535417>

------
hayksaakian
Do you use your PIN number at an ATM machine?

~~~
nnnnni
ctrl-f atm machine

upvoted.

~~~
hayksaakian
After reading the comment on your LCD display and interacting with a GUI
interface? As if this was an IRC chat.

------
wereHamster
Please. Stop it already. The 'N' in PIN stands for Number. If you say 'PIN
Number' you're really saying 'Personal Identification Number Number'. Same
with ATM Machine, HIV Virus etc.

~~~
chayesfss
All of which everybody does so stop caring about it. My favorites this last
year include CAC card & SMS message.

~~~
PawelDecowski
SMS message is correct. SMS stands for short message _service_.

~~~
martinced
I think I think you are you are right sir right.

------
newishuser
A database of 4 digit passwords is not suitable as a replacement for PINs.
Especially not for anything called a fucking analysis.

Banks don't let you choose 4 consecutive numbers or 4 of the same number
making his first conclusion completely invalid. A lot of banks assign pin
numbers now making the rest of his conclusions invalid.

I get wanting to analyze PINs, it's interesting, but pretending any old data
that looks similar will work is misleading, disingenuous, and half-assed.

~~~
veb
> Banks don't let you choose 4 consecutive numbers or 4 of the same number
> making his first conclusion completely invalid

I'm not sure where YOU live, but there's a WHOLE world out there. In New
Zealand, you can choose anything you like on the number terminal they give
you. They try to educate you on a good PIN, but it's ultimately up to you.

------
borplk
Interesting analysis but I don't understand the obsession with randomness of
the PIN and such.

Unlike the 80s, there are many countermeasures to defend against brute force
attacks. No one is going to sit down and guess/brute-force your PIN. Probably
not even your other passwords.

It's either going to be fully exposed or not at all, so how random or
complicated it is doesn't protect you as much as most people make it sound
like.

~~~
demallien
Well, yes and no. I actually mostly agree with your point, but just to play
devil's advocate:

When we talk about secure passwords (Not PINs for the moment, I'll get back to
those in a moment), we're mostly worrying about how easy it is to recover
someone's password from a database once a system has been compromised. However
these days, we don't store the password, we store a hash of the password. To
recover the password, the hacker has to successfully 'un-hash' the hashed
password, which is done by applying a brute force cryptographic algorithm. The
computer detects hits by examining the frequency of letters in the output,
knowing that letters don't have an even distribution in passwords (or any
other text). Choosing a random password is actually useful in this case,
because even if the computer correctly finds the right key for de-hashing, it
doesn't recognize your password as 'de-hashed' and just passes right over it.

So, for passwords, it is fairly clear that a random password is going to be
safer than a word / name or other commonly used password constructs, in its
ability to resist extraction from a compromised database.

Can we make the same claim for PIN codes? It would be harder - you only have 4
digits to work with, which makes it much harder to determine if digits are
distributed in a pattern or randomly. You would expect a lot of false
positives and false negatives. Nevertheless, distributions in real life
numbers do exist - <http://en.wikipedia.org/wiki/Benford%27s_law> is an
obvious example, which leads us to the non-obvious conclusion that making PIN
codes longer probably leads to a lower level of security. On the other hand,
the scenario in which this is a problem is when a database is compromised, and
bank databases are amongst the most hardened targets on the planet, so the
increase in risk is probably negligible.

~~~
VexXtreme
" Choosing a random password is actually useful in this case, because even if
the computer correctly finds the right key for de-hashing, it doesn't
recognize your password as 'de-hashed' and just passes right over it."

That's assuming someone is trying to brute force a system 'from the outside',
without having access to the list of password hashes for the system. In
reality such attempts rarely if ever happen and they are easily defeated by
using rate limiting and other techniques.

The problem arises when the attacker has a list of hashed passwords and is
running checks against that list. In those cases, the fact that a password is
random won't do any good as the program knows which password hash it is trying
to crack.

Furthermore, there is no "de-hashing". Password cracking software is actually
_hashing_ commonly used words, letter combinations and/or even random
characters, and comparing the output of the hashing operation with the
password hash that it is trying to crack. Cryptographic hash functions are
unidirectional and cannot be reversed, you can only try to hash raw data and
hope to produce a hash which matches the hash you're trying to "reverse".
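
A small sketch of that hash-and-compare loop, assuming unsalted or simply prefixed-salt SHA-256 purely for illustration. The function names `sha256_hex` and `crack` are made up, and real systems should use a slow password hash instead:

```python
import hashlib


def sha256_hex(candidate: str, salt: str = "") -> str:
    """Hash a candidate the same way the (hypothetical) target system did."""
    return hashlib.sha256((salt + candidate).encode("utf-8")).hexdigest()


def crack(target_hash: str, candidates, salt: str = ""):
    """Hash each guess and compare; nothing is ever 'de-hashed'."""
    for candidate in candidates:
        if sha256_hex(candidate, salt) == target_hash:
            return candidate
    return None


# Every possible 4 digit PIN is only 10,000 guesses.
all_pins = [f"{n:04d}" for n in range(10_000)]
stolen_hash = sha256_hex("2580")
recovered = crack(stolen_hash, all_pins)
```

The attacker knows a guess is right because the hashes match, so a random-looking password is not skipped over. It just takes more guesses to reach, and a 4 digit space runs out almost immediately.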

~~~
demallien
You're talking about rainbow tables. They aren't used very much anymore, since
the salting of hashes has become commonplace.

As for your idea that hash functions can't be reversed, not so much. They
can't be _easily_ reversed, but that's not the same thing.

In reality, brute forcing is pretty much the only viable attack left now that
salting is commonplace. Still, if you have managed to get your hands on the
password table, you can brute force _without_ having to worry about rate
limiting etc.

~~~
Evbn
A quick search of HN shows how frequently passwords are in fact not salted.

~~~
demallien
Yes and a quick search of the evening news shows how frequently people are
murdered.

------
monkeyfacebag
This is an interesting look at the data but some of the inference isn't quite
there. The author uses the relative prevalence of the pattern 2468 over 1357
to justify the conclusion that people prefer even numbers to odd, completely
ignoring the pattern-based analysis he just used to understand the prevalence
of 2580!

------
eduardordm
Passwords and PINs are not the same thing (at least in most places).

In my company (credit card company) and in most competitors the PIN is a
random number generated when the smart cards are being written. Sequential,
repetitive, years, et al are all discarded. PINs can be used as a password
for transactions. Because my company focuses on low-income families, this is
actually great, they don't need a phone or website to create/change passwords.
The problem here is delivering the password securely.

There are banks that do not use PINs at all, the password is stored in their
database. This is usually better because if you lose a password you can reset
it. This isn't possible using PINs. They are hardwired in the smart card and
cannot be changed.

PINs cannot be changed or chosen; if you can change it, it's not a PIN, it's an
awfully insecure 4 digit password.

~~~
Evbn
Why awfully insecure?

------
Sami_Lehtinen
Because we all know this, it's just stupid to let people select PINs. It's
much better to pre-assign a completely random PIN, as most banks in Finland do.

Based on a properly working random number generator, I would say that 1234 is
exactly as common a PIN as any other.
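
As a sketch of what "properly working" could mean, assuming a uniform draw from a cryptographic random source. The helper name `random_pin` is hypothetical and this is not how any particular bank issues PINs:

```python
import secrets


def random_pin(digits: int = 4) -> str:
    """Draw a PIN uniformly from all 10**digits possibilities."""
    return str(secrets.randbelow(10 ** digits)).zfill(digits)


pin = random_pin()
```

With a uniform draw, each of the 10,000 four digit values, 1234 included, comes up with probability 1/10000.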

------
shurcooL
I thought 1337 would be popular enough to be mentioned.

------
bsims
This was a great write-up, thanks for sharing. Similarly here is an article
about the most common passwords for 2012.

[http://www.cbsnews.com/8301-205_162-57539366/the-25-most-
com...](http://www.cbsnews.com/8301-205_162-57539366/the-25-most-common-
passwords-of-2012/)

------
tripzilch
So I wondered, and tried to google the number, one of his graphs shows a peak
for '1472'. Anyone got an idea what's special about this number?

And from the longer digit-sequences 292513 (#12) and 38317 (#15) and 42059
(#20). I didn't google those last two, maybe they're common US ZIP codes?

------
neil_s
Is there a link anywhere to the full set of totals, so I can look up my PIN and
see how common it is? Surely that would be the most useful takeaway from this
article?

------
SolarUpNote
Laughing at the XKCD cartoon -- I remember having the same idea (and
realization) for having a license plate that's all O's and zeros.

