
Randomly-generated passwords still have to be legal strings - ikeboy
https://blogs.msdn.microsoft.com/oldnewthing/20160316-00/?p=93163
======
munificent
News flash: taking a randomly generated sequence of bytes and treating it as
correctly following some encoding may fail.

This is like saying "Why isn't this random sequence of bytes a valid JPEG
file?" If you want a random image, you generate the image randomly, _then_ you
encode it. Strings are no different — though the encoding is simpler.

~~~
mikeash
Unicode mystifies a lot of people. Many people don't even realize that a
sequence of 16-bit Unicode values is UTF-16 (or UCS-2), let alone that not all
sequences of 16-bit values are valid UTF-16.

The password bit is really just the hook, the main info in this article is the
fact that these values are interpreted as UTF-16, and that UTF-16 has
requirements beyond "sequence of 16-bit values."

Which is kind of obvious, but still not well known for many.
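
As a rough illustration of that point, here is a minimal Python sketch using only the standard library. The helper name `is_valid_utf16` is made up for this example. It packs 16-bit values into bytes and asks the UTF-16 decoder whether they are well formed; an unpaired surrogate is a perfectly good 16-bit value but not valid UTF-16.

```python
import struct

def is_valid_utf16(units):
    """Return True if the 16-bit code units form well-formed UTF-16."""
    raw = struct.pack("<%dH" % len(units), *units)  # little-endian 16-bit values
    try:
        raw.decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        return False

print(is_valid_utf16([0x0041, 0x00E9]))  # True: ordinary BMP characters
print(is_valid_utf16([0xD83D, 0xDE00]))  # True: high + low surrogate pair
print(is_valid_utf16([0xD83D, 0x0041]))  # False: high surrogate with no partner
print(is_valid_utf16([0xDE00]))          # False: low surrogate on its own
```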

~~~
sdegutis
What doesn't make sense though is why the password's bits MUST be a valid
UTF-16 string. Password comparison should just be comparing bits and checking
if they're equal.

Whether it just so happens to represent real, correctly typed English
words, or a valid ASCII string, or a UTF-16 string, or whether it was typed
from your keyboard or generated by a program, should all be
incidental.

~~~
mikeash
If your passwords are entered by humans (which is usually what "password"
means) then you want to treat it as a real Unicode string so that you can
normalize it before you do the dirty work on it. Otherwise you could end up
with a situation where, for example, a user's password contains "é" and fails
when entered on certain systems because that system composes é differently
from the system on which the password was set.
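
A small Python sketch of the "é" case, assuming NFC as the normalization form (other forms such as NFKC are also used in practice). The helper `canonical_password_bytes` is a hypothetical name, not something from the article.

```python
import unicodedata

composed = "\u00e9"     # é as a single code point (U+00E9)
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT

def canonical_password_bytes(password):
    """Normalize to NFC, then encode, so equivalent inputs give equal bytes."""
    return unicodedata.normalize("NFC", password).encode("utf-8")

print(composed == decomposed)  # False: the raw strings differ
print(canonical_password_bytes(composed) == canonical_password_bytes(decomposed))  # True
```

Without the normalization step, the two spellings would hash to different values even though they look identical on screen.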

------
eterm
I second the "hex encode it" solution. 256 bits of generated hex looks
something like this:

232e2252b8d5a294dd8564020f0cf609f511b3d196ed83030ce9615c6737f578

That's readable, typable, copy-pastable. Not very memorable but 256 bits is
rarely memorable.

It's still short enough to send around by normal messaging.

This is specifically about randomly generated passwords, this is not about
dealing with human generated "must have capitals, not start or end with a
number" etc rules.
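
A minimal sketch of generating such a value in Python, assuming the standard `secrets` module is an acceptable source of randomness. The function name `random_hex_password` is illustrative only.

```python
import secrets

def random_hex_password(bits=256):
    """Return `bits` bits of CSPRNG output as lowercase hex."""
    if bits % 8 != 0:
        raise ValueError("bits must be a multiple of 8")
    return secrets.token_hex(bits // 8)

password = random_hex_password()
print(password, len(password))  # 64 hex characters for 256 bits
```

Because the result only contains `0-9` and `a-f`, it is a legal string in any encoding that covers ASCII, UTF-16 included.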

~~~
Spooky23
Remember that on Windows you are effectively stuck with a maximum password
length of 15 characters in many scenarios.

And before you Google and say "No! It changed to 127 in Windows XP!",
remember:

\- Somewhere on your network, NTLM is being used.

\- Office 365 supports 16 max.

\- Lots of random apps won't work.

~~~
kyberias
I think you're wrong.

I login to Office 365 with a password that has over 16 characters. I have
similar length passwords in Windows networks that use NTLM. Clearly it works
just fine.

~~~
Alupis
Just a guess, but I've seen systems that just ignore the extra characters. VNC
is one, and there are others.

Example:

Your password is 12 characters on a system that says make it 8 characters or
longer. If you type only the first 8 characters of your password, it lets you
in. As someone else said, "silent truncation".
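
A toy Python sketch of what silent truncation looks like from the outside. The 8-character limit and the function `truncating_check` are hypothetical and do not model any specific product; a real system would also compare hashes rather than plaintext.

```python
import hmac

MAX_LEN = 8  # hypothetical limit, chosen to match the example above

def truncating_check(stored, attempt):
    """Toy verifier that silently ignores everything past MAX_LEN characters."""
    return hmac.compare_digest(
        stored[:MAX_LEN].encode("utf-8"),
        attempt[:MAX_LEN].encode("utf-8"),
    )

stored = "correcthorse"  # 12 characters
print(truncating_check(stored, "correcth"))      # True: first 8 characters suffice
print(truncating_check(stored, "correcthXYZ!"))  # True: the tail is never looked at
print(truncating_check(stored, "correctX"))      # False: differs within the first 8
```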

------
alyandon
Is there really any need to generate non-alphanumeric passwords as long as
they are of sufficient length? Sticking with plain old A-Za-z0-9 seems like it
avoids a lot of interop issues.
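
For what "sufficient length" could mean, here is a sketch that picks the length from a target number of bits. The 128-bit default and the name `alnum_password` are assumptions made for illustration.

```python
import math
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # A-Za-z0-9, 62 symbols

def alnum_password(min_bits=128):
    """Return a random alphanumeric password carrying at least `min_bits` bits."""
    length = math.ceil(min_bits / math.log2(len(ALPHABET)))
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

pw = alnum_password()
print(pw, len(pw))  # 22 characters cover 128 bits
```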

~~~
jfoutz
Every extra supported character makes the solution space much bigger. With
2-character passwords over A-Za-z0-9, your set is 62^2 = 3844 combinations. If
you can manage to add in "!@#$%^&*()", that goes up to 72^2 = 5184: a 16%
increase in options leads to about a 35% increase in search space.

No, it's not necessary, unbounded length is probably better. But it's pretty
nice to have a lot more characters, and ought to be an easy win.
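
The arithmetic can be checked with a few lines of Python. This is just a worked calculation; `search_space` is an illustrative helper.

```python
def search_space(alphabet_size, length):
    """Number of distinct passwords of exactly `length` symbols."""
    return alphabet_size ** length

base = search_space(62, 2)      # A-Za-z0-9
extended = search_space(72, 2)  # plus the ten symbols !@#$%^&*()
growth = extended / base - 1

print(base, extended)     # 3844 5184
print(f"{10 / 62:.1%}")   # 16.1% more symbols
print(f"{growth:.1%}")    # 34.9% more 2-character passwords
```

The gap widens with length, since the ratio is (72/62) raised to the number of characters.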

~~~
Retric
Every _required character type_ makes the search space dramatically smaller.
[https://xkcd.com/936/](https://xkcd.com/936/)

The goal should be long, easy to type and remember passwords. Adding a few
extra digits is worth far more than forcing passwords to look like: 2K#@1j*

Compare, say, a 20-digit numeric password with a 9-character password drawn
from [0-9], [a-z], [A-Z] and special characters.
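
Worked out in Python, assuming 94 printable non-space ASCII characters for the mixed alphabet and uniformly random choices in both cases:

```python
import math

numeric_20 = 20 * math.log2(10)  # 20 random decimal digits
mixed_9 = 9 * math.log2(94)      # 9 random printable ASCII characters

print(round(numeric_20, 1))  # 66.4 bits
print(round(mixed_9, 1))     # 59.0 bits
```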

~~~
Ntrails
Usability is key.

Once you hit 12+ characters it is extremely likely that the password is
vulnerable to dictionary/phrase attacks combining whole words - or you're
using a password manager.

The xkcd is funny - but it's also a terrible password suggestion.

~~~
_kst_
Why is it terrible?

Certainly a passphrase consisting of 4 English words is going to be easier to
attack than a password consisting of a similar number of random letters. But
as long as you judge a password or passphrase by its information content
rather than by its length, xkcd-style passphrases can be just as good as
random letters, but are likely to be easier to remember.

Plug: I've written a random passphrase generator, inspired by xkcd's
[https://xkcd.com/936/](https://xkcd.com/936/) ("correct horse battery
staple") that picks random words from a dictionary, subject to requirements
you can give on the command line. It optionally tells you how much randomness
is contained in the generated passphrase (assuming the attacker knows the
parameters you used). Sample output (picking 4 words of 6 to 8 letters):

    
    
        $ gen-passphrase -v 4 6 8
        verity nonfat shunted specking
            27405**4
            5.6405e17 possibilities, equivalent to:
            58.97 random bits
            12.55 random lowercase letters
            9.90 random mixed-case alphanumerics
            9.00 random printable ASCII characters
    

The reported numbers will vary depending on which dictionary you use
(/usr/share/dict/words by default). In this case, the dictionary contained
27405 words of 6 to 8 letters.

[https://github.com/Keith-S-Thompson/random-passwords](https://github.com/Keith-S-Thompson/random-passwords)

Of course this won't work for sites that restrict the length of a password, or
that require special characters.
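
The equivalences in the sample output can be reproduced with a short calculation. This is a sketch of the arithmetic, not the tool's actual code. The alphabet sizes (26, 62, 94) are assumptions inferred from the printed numbers.

```python
import math

def equivalents(word_count, dictionary_size):
    """Entropy of a random passphrase, plus equivalent random-string lengths."""
    bits = word_count * math.log2(dictionary_size)
    return {
        "bits": bits,
        "lowercase": bits / math.log2(26),
        "alnum": bits / math.log2(62),
        "printable": bits / math.log2(94),
    }

e = equivalents(4, 27405)
print(f"{27405 ** 4:.4e} possibilities")  # 5.6405e+17
print(f"{e['bits']:.2f} random bits")     # 58.97
print(f"{e['lowercase']:.2f} random lowercase letters")           # 12.55
print(f"{e['alnum']:.2f} random mixed-case alphanumerics")        # 9.90
print(f"{e['printable']:.2f} random printable ASCII characters")  # 9.00
```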

~~~
carussell
> Why is it terrible?

Ars Technica published an article a few years ago about password crackers, and
one of the subjects they interviewed mentioned the XKCD strip in an offhand
remark that was flawed and hinted towards what was probably a poor
understanding of the subject matter.

Now whenever the XKCD strip is brought up, people who've been shown the
article make an appearance and talk about the scheme being vulnerable
while using the phrase "dictionary attack" in a non-specific sort of way.

That's probably what's going on in the GP comment.

[http://arstechnica.com/security/2013/05/how-crackers-make-mi...](http://arstechnica.com/security/2013/05/how-crackers-make-minced-meat-out-of-your-passwords/)

------
teraflop
> We generate our password from a cryptographically secure random number
> generator. ... We found that sometimes (no predictable pattern), we have
> interoperability problems between systems.

If there _was_ a predictable pattern, that would be even worse!

~~~
thecatspaw
Well, no. A predictable pattern would be "every time the password starts with
0100101 they are not compatible". It's not about the number having a pattern.

------
accounthere
It has happened to me that a registration form would not allow characters like
'<' which any keyboard will most likely have, while it allowed Unicode
characters like (ಠ_ಠ)

~~~
_kst_
Sounds like a clumsy attempt to guard against HTML injection.

------
DenisM
It has occurred to me recently that 12 random words from a 2000-word
dictionary will give you about 132 bits of entropy. More than you will ever
need.

There's a number of advantages to this scheme:

\- It's easy to write down and put in a safe.

\- No funny characters required.

\- Not case-sensitive.

\- If a user makes a spelling error, we can correct it for him without any
compromise in security.

\- Guaranteed to be free from offensive words, unlike a randomly generated
password.

\- People are using words anyways, misspelled if necessary, whether we like it
or not.

We can also make it even more user-friendly by carefully degrading security,
such as trying common word substitutions for the forgetful users, or doing
some kind of middle-ground "remember me" that does not let you straight in,
but only demands the first three words (to keep the family members at bay).
Like I said, this requires careful calculation of how much security is
sacrificed when we do that.
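
A sketch of the scheme and its numbers in Python. The word list here is a placeholder of 2000 made-up tokens; a real deployment would load a curated list. The function names are illustrative.

```python
import math
import secrets

def passphrase(words, count=12):
    """Pick `count` words uniformly at random (with replacement) via the OS CSPRNG."""
    return " ".join(secrets.choice(words) for _ in range(count))

def entropy_bits(dictionary_size, count):
    """Entropy of `count` independent uniform picks from the dictionary."""
    return count * math.log2(dictionary_size)

# Placeholder list standing in for 2000 real, curated words.
words = ["word%04d" % i for i in range(2000)]

print(passphrase(words))
print(round(entropy_bits(len(words), 12), 1))  # 131.6 bits for all 12 words
print(round(entropy_bits(len(words), 3), 1))   # 32.9 bits if only 3 words are demanded
```

The last line puts a number on the "first three words" idea: that mode leaves roughly 33 bits between an attacker and the account.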

~~~
oniony
This was made clear many moons ago by the now famous xkcd strip
[https://www.xkcd.com/936/](https://www.xkcd.com/936/)

~~~
DenisM
My innovation :) is to prevent users from selecting their own words, so that
we can actually guarantee the entropy required. Otherwise a Markov chain will
tear it to pieces.

~~~
cuckcuckspruce
You mean like Diceware[1], which has been around since at least 2000?

[1]
[http://world.std.com/~reinhold/diceware.html](http://world.std.com/~reinhold/diceware.html)

~~~
DenisM
Good stuff. All we need now is to marry it with a password manager so that it
generates random passwords based on this one passphrase and then autofills the
web forms for you.

------
matt_wulfeck
The article mentions that hex encoding solves most of the problems but isn't
very space efficient. A good alternative may be base64k (base65536) encoding,
which was proposed here on Hacker News[0].

This gives you safe strings as well as space-efficient encoding. As a bonus
virtually all systems support Unicode without any tweaking necessary.

I hope the author of base64k gets around to writing an RFC because it would be
a great standard to use.

[0]
[https://www.npmjs.com/package/base65536](https://www.npmjs.com/package/base65536)

------
oniony
All of these systems suffer from the differing password restrictions imposed
by the various services you need to register for.

• Some services require both upper and lowercase characters.

• Some services require punctuation characters whilst others prohibit them.

• Some services prohibit repeated characters.

• Some services impose a minimum password length; some impose a maximum
password length; some impose both.

Covering all these situations is nearly impossible.

~~~
eterm
The first line of the article makes it clear this is for passwords used in
_programmatic use_.

These restrictions exist for when humans are allowed to generate passwords.
This article is discussing systems where humans don't generate passwords and
don't in regular operation enter them.

Think of automatic password generation for a secure API endpoint, or automatic
generation of private keys secured by a password that typically will be used
in a system with it on the keychain but where the backup copies will be stored
offsite.

In that case it's acceptable to have a password that is secure and can be
typed when required, but doesn't have to be easy to type (i.e. short) for
regular use.

The internet of things means that more and more devices will be creating
"users" which are actually just other IoT devices. Those devices/device-users
require these sorts of passwords.

------
lyle_nel
By just looking at the title and before reading the article, I thought they
meant that they were checking for strings that are illegal to transmit.
[https://en.wikipedia.org/wiki/Illegal_number](https://en.wikipedia.org/wiki/Illegal_number)

------
13of40
Throw in the complexity requirements a Windows domain imposes on the password,
and this turns into even more of a headache. Of course you can just append A1!
to an otherwise sufficiently complex password to meet the requirements, but it
will make people freak out in the code review.

------
adzm
Ascii85 is a great compromise between binary and base64, which works well for
passwords.
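
To put rough numbers on the size trade-off, here is a sketch comparing three encodings of the same 256-bit value with Python's standard `base64` module. Note that `a85encode` abbreviates an all-zero 4-byte group to a single `z`, so its output is at most 40 characters rather than always exactly 40.

```python
import base64
import secrets

key = secrets.token_bytes(32)  # 256 random bits

encodings = {
    "hex": key.hex(),
    "base64": base64.b64encode(key).decode("ascii"),
    "ascii85": base64.a85encode(key).decode("ascii"),
}

for name, text in encodings.items():
    print(name, len(text))  # hex 64, base64 44, ascii85 at most 40
```

One caveat: the Ascii85 alphabet includes characters such as quotes, `<` and backslash, which may need escaping in shells, config files or HTML forms.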

------
JoeAltmaier
Another solution is to treat every password as binary. Text works. UUIDs work.
Everything works. But folks went down the text-password road a long time ago,
so we're stuck with this for now.

~~~
dchest
Password-based systems assume input, and most UI frameworks use text strings
for inputs.

A binary password is usually not called a password, it's a _key_.

~~~
smsm42
Exactly, so if they are not meant to be input, as the article claims, why are
they passwords and not keys? Why are they even checked to be valid Unicode
strings? I know of no encryption algorithm that cares about the key being a
valid Unicode string. Looks like just laziness in distinguishing which data
should be binary and which should be a Unicode string.

------
pmontra
I hope the developers of those sites that insist that passwords must be, let's
say, between 6 and 18 characters are reading this.

