
Millions of accounts compromised because there is no specialised user database - andrewstuart
http://fourlightyears.blogspot.com/2015/04/millions-of-accounts-are-being.html
======
Spooky23
There is such a beast, it's called LDAP. There are any number of directory and
authorization services, and they have probably been around for decades.

There are dozens of directory server options, and probably a few dozen secure
authorization solutions, from Kerberos and SAML to proprietary solutions like
SiteMinder. Together, these solutions give you EIAM capability that does what
you described.

~~~
andrewstuart
My uninformed perception of OpenLDAP is that it is a large, generalised
enterprise class directory service that does a lot of stuff. The post suggests
a minimal, lightweight user database that does nothing at all else - is that
the same thing as OpenLDAP?

~~~
Spooky23
Pretty much.

I wouldn't characterize the magical detection of uncommon access patterns as a
minimalist feature (it can be done, but not out of the box in a meaningful
way), but other than that, all of the stuff you mentioned is there.

Typically, you use LDAP as a directory store, and a dedicated authentication
protocol like Kerberos or SAML for authentication. You can also use LDAP for
auth if desired.

If you want to go deep on identity management, the NIST documents are pretty
much the canonical source of information.
[http://csrc.nist.gov/projects/iden_ac.html](http://csrc.nist.gov/projects/iden_ac.html)

In any case, don't reinvent the wheel. Just because bozos implement user auth
in a MySQL table doesn't mean solutions don't exist. It just means that they
are bozos. It's one of the reasons I disapprove of the use of SaaS apps that
don't support federated identity at work.

~~~
andrewstuart
Does OpenLDAP meet the stated need for a minimal solution that does nothing
else apart from the small set of required functions?

I get the impression that OpenLDAP is a big beast that does much, much more.

I'm wanting a fast motorbike and you're pointing at a 16 wheel heavy hauler
saying "see, same thing, the hauler has at least two wheels and an engine".

~~~
Spooky23
Note I didn't endorse OpenLDAP at all. Not sure if that's the best solution --
I'd probably steer towards 389 server or something similar with Netscape
heritage.

You also said:

"Another problem is that developers roll their own user and password
management systems and get things like salting and hashing wrong, making the
data vulnerable."

You're asking for a distributed, secure system for processing user data and
credentials. The most common systems in production globally for doing this
(Microsoft Active Directory) are based on technologies developed in the 80's
and only being supplanted in recent years. There's a reason for that, and if
it were a trivial problem, we wouldn't have pervasive solutions using 1980s
RPC in 2015 still running most businesses!

------
copsarebastards
I disagree. I think millions of accounts are compromised because users are
required to give up too much information and can't verify that their user data is
stored in a secure way. The solution to this is on the client side, in the
browser, not in the server.

The core proposition of this article is that if we make it really easy to
store user data correctly, people will do it. This proposition is simply not
borne out by the data. Many parts of user security _are_ easy. Take password
storage, for example: it's so easy a caveman could do it. Every major language
has a library that takes a password and a salt and a number of iterations and
runs it through pbkdf2 or scrypt. Many have these functions built in. But do
people do it? No. It seems like once a month some company exposes X-thousand
passwords because they stored it in plaintext or MD5. And I bet a significant
portion of HN readers think SHA1 is a valid way of storing passwords, eager to
sacrifice security as efficiently as possible. pbkdf2 and scrypt aren't hard
to use, people are just idiots.
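
For illustration, the kind of library call described above looks roughly like this in Python, using the standard library's hashlib.pbkdf2_hmac. The iteration count, salt size, and function names are assumptions picked for the sketch, not values from the thread.

```python
import hashlib
import hmac
import os

# Hypothetical parameters for illustration; tune them for your own hardware.
PBKDF2_ITERATIONS = 200_000
SALT_BYTES = 16


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) for storage; the password itself is never stored."""
    salt = os.urandom(SALT_BYTES)  # fresh random salt per user
    key = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS
    )
    return salt, key


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key from the candidate password and compare in constant time."""
    key = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS
    )
    return hmac.compare_digest(key, expected_key)
```

A caller would store the salt and key returned by hash_password and later pass both back to verify_password at login.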

Making a specialized database just gives people another easy thing to not use.

Ultimately there's no reason to be giving your password to a service in the
first place. You only should be giving them cryptographic proof that you have
your password, not giving them your password. This is possible with
zero-knowledge password proofs[1], but it would require browsers and other
client-side software to implement a ZKPP algorithm. This would force servers to
either implement a ZKPP algorithm correctly or expose their incompetence to
their users. Similar algorithms can be hypothesized for many other user data
needs, but passwords would be an excellent start.

[1] [https://en.wikipedia.org/wiki/Zero-knowledge_password_proof](https://en.wikipedia.org/wiki/Zero-knowledge_password_proof)

------
andrewstuart2
The problem with this assertion is that this data still has to hit persistent
storage somehow. Should that be in an arbitrary binary format? How should it
be indexed for constant-time access? Essentially the problems that a
specialized user database would need to solve would mean re-implementing
exactly what generalized databases are built to do.

It's not a problem of _how_ it's being stored. It's a problem of _what_ is
being stored.

I would suggest that the world doesn't need a specialized database for this,
it needs killer libraries that make all necessary operations dead simple with
any backend. And they should be dead simple to find and install.

More fundamentally, though, I'm not sure this problem is entirely solvable.
Even with all the libraries and informational websites in the world, if
developers don't go out and look for that information (and find the right
stuff), they may never know they're doing anything wrong.

P.S. Hello again, alter ego.

~~~
wmf
I don't think the article is suggesting NIHing the stack all the way down. As
I read the article I was imagining a layer on top of SQLite that exposes a
very restricted API.
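
A minimal sketch of what such a layer might look like, assuming Python's built-in sqlite3 module. The class name, table layout, and iteration count are all hypothetical. The point is only that callers get "add" and "check" and no way to read hashes back out.

```python
import hashlib
import hmac
import os
import sqlite3


class UserStore:
    """Hypothetical wrapper: callers can add and check users, never read hashes."""

    _ITERATIONS = 200_000  # assumed value for the sketch

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS users ("
            "username TEXT PRIMARY KEY, salt BLOB NOT NULL, pw_hash BLOB NOT NULL)"
        )

    def _derive(self, password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac(
            "sha256", password.encode("utf-8"), salt, self._ITERATIONS
        )

    def add_user(self, username: str, password: str) -> None:
        salt = os.urandom(16)
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT INTO users (username, salt, pw_hash) VALUES (?, ?, ?)",
                (username, salt, self._derive(password, salt)),
            )

    def check_password(self, username: str, password: str) -> bool:
        row = self._db.execute(
            "SELECT salt, pw_hash FROM users WHERE username = ?", (username,)
        ).fetchone()
        if row is None:
            # Early return leaks timing for unknown users; a real system
            # would want to equalise this.
            return False
        salt, stored = row
        return hmac.compare_digest(self._derive(password, salt), stored)
```

Anything with direct access to the SQLite file can still bypass the wrapper, so the restriction only holds if the file lives somewhere the application cannot reach.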

------
BinaryIdiot
This entry really boils down to separating user data from other data. But some
of the items I don't think work in real usage (or at least I would like some
clarification if possible).

> It should be accessible only via its specialised API which is designed to
> constrain the ways that it is accessed.

Constrained in what ways? Especially in management / support roles you may
need to do bulk actions against users (full listing, bulk changes, etc.), so
I'm not sure what the constraints would be compared to a normal database.

> Its API should have password salting and hashing built in.

I disagree; I think password hashing should happen before the data hits the
database.

> Its API should throttle access with some sort of algorithm designed to
> prevent downloads of large quantities of user data.

Does someone have an implementation idea for this? Depending on the type of
user data we're talking about, the data could be very small. So the throttling may
have to be quite significant but even then what's to prevent the attacker from
bypassing it by using multiple connections? Multiple threads that each request
a subset? I don't think you can get around those and still have a performant
system.
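
One possible shape for this, offered only as a sketch: a single token bucket that counts records served and is shared by every connection and thread, so splitting a download across connections does not raise the limit. The class name, rate, and burst size are made up for the example, and it does nothing to resolve the tension with legitimate bulk admin work.

```python
import threading
import time
from typing import Callable


class GlobalRecordBudget:
    """Token bucket counting records returned, shared by every connection."""

    def __init__(
        self,
        records_per_second: float,
        burst: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._rate = records_per_second
        self._capacity = float(burst)
        self._tokens = float(burst)
        self._clock = clock
        self._last = clock()
        self._lock = threading.Lock()  # one budget for all threads

    def try_take(self, records: int) -> bool:
        """Return True if `records` rows may be served now, else False."""
        with self._lock:
            now = self._clock()
            # Refill in proportion to elapsed time, capped at the burst size.
            elapsed = now - self._last
            self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
            self._last = now
            if records <= self._tokens:
                self._tokens -= records
                return True
            return False
```

The server would call try_take(n) before returning n user records and refuse or delay the request when it returns False.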

> It should encrypt data internally.

How? Does it encrypt using a passcode specified by the system administrator
(which means it exists somewhere on the system or nearby system making it
decrypt-able anyway)? Or does this mean encryption using the user's password
thus separating the data? If so how do you handle the cases when
administrators need to modify the user's information for support or other
reasons?

~~~
halostatue
I’m working on an application where we have separated the Authentication
concerns from the rest of the user data. When a user is registered, we
generate a UID and communicate that with the other services. We’re going to be
using common role terminology across the services, but Authorization is up to
each service in relation to those roles.

Our needs are a little different as well, because each service may need to
reach out to a third-party service and we are holding a bearer token for the
third-party service on behalf of the user. We provide that to the services
behind an auth middleware through a header.

(We had considered changing this from an Auth Service to a general User
Service, but I am strongly leaning toward having a separate User Service and
leaving Auth tightly constrained.)

~~~
bagels
Right, there are a few classes of data that I can see that you would do well
to separate from the database that a website may have unfettered access to.

1) Authentication data (login id?, password)

2) Personal data that is not used in an online manner (depends on the site, of course) (full name, email address, etc.)

3) Data that shows up on the site

If this data is segregated into separate silos with no 'read everything' APIs
exposed to the web servers, you might stand a chance of not leaking all of
your users' data when the web servers are compromised. This of course will not
help you if the authentication servers are also compromised.

------
voltagex_
Posted as a blog comment but I'd be interested in people's thoughts:

LDAP can be painful to work with, even when using "industry standards" such
as Active Directory.

I think a specialised server (or even Postgres extension) could be useful but
couldn't you set up the following?

* Create a stored proc or similar that does this: SELECT COUNT(PasswordHash) WHERE Username = @Username AND PasswordHash = @PasswordHash

* Create a user account on the database that only has EXEC permissions, not SELECT or anything else

* Restrict the user account to only be able to log in from the app servers (or whatever)

* Restrict the DB admin to only login via a jump box / VPN.

------
bearclough
Welcome to the world of IAM (identity and access management). There are many
solutions to the above stated problem. If you don't absolutely need to store
it, don't. That includes passwords, SSNs, dates of birth, or anything of the sort.

There are a ton of services you can federate with, and it's easier for the
user: fewer passwords to remember.

If you really want users to authenticate natively, take a look at one of the
newer players out there, Storm Path. It's basically your IAM backend to go.
Don't write your own security if you don't have to :)

~~~
aikah
> If you really want users to authenticate natively. Take a look at one of the
> new-er players out there Storm Path. It's basically your IAM backend to-go.
> Don't write your own security if you don't have to :)

But depend on an NSA aware third party to store your client's credentials?

------
viraptor
I wonder why the call is for a dedicated database, rather than for a
simplified API? Many frameworks provide it already, but base it on the
existing database which reduces the integration issues.

For example Symfony has FOSUserBundle [1] which provides nice support for most
listed things. Django comes with its auth module [2]. There are more examples
like that. They allow easy auth / user management without touching any of the
underlying details. And once you actually want to go the direction of
LDAP/kerberos/..., it's usually just a matter of switching the backend
implementation for something that already exists.

[1]
[https://github.com/FriendsOfSymfony/FOSUserBundle](https://github.com/FriendsOfSymfony/FOSUserBundle)
[2]
[https://docs.djangoproject.com/en/1.8/topics/auth/default/](https://docs.djangoproject.com/en/1.8/topics/auth/default/)
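
As a rough illustration of switching the backend: in Django the list of authentication backends is a setting. The fragment below assumes the third-party django-auth-ldap package is installed and uses a made-up server URI. It is a settings sketch, not a complete configuration, and a real deployment would also need the user lookup settings.

```python
# settings.py fragment (sketch). Assumes the third-party django-auth-ldap
# package; the server URI below is a hypothetical placeholder.
AUTH_LDAP_SERVER_URI = "ldap://ldap.example.com"

# Backends are tried in order; the default ModelBackend stays as a fallback.
AUTHENTICATION_BACKENDS = [
    "django_auth_ldap.backend.LDAPBackend",
    "django.contrib.auth.backends.ModelBackend",
]
```

Application code that calls the framework's authenticate and login functions would not need to change.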

------
elchief
I agree with OP that the user authentication data should be segregated,
simple, and very well protected. It should act like a hardware security
module: it will authenticate users, but never give up their keys.

I tried to find decent hashing in OpenLDAP and Apache DS but didn't. Perhaps I
didn't look hard enough. Do they throttle by default?

Who's gonna make LDAP webscale and hipster friendly?

------
arielm
Great post, and something I've been thinking about for quite some time. Would
love to see an open source framework made available that simplifies many of
the nuances necessary to maintain a "secure" user database.

Don't get me wrong, this won't solve everything, but in my opinion it's a
pretty big step in the right direction.

------
SrslyJosh

      1. s/specialized/separate/
      2. Use SASL, or another authentication protocol to talk to your auth DB.
      3. Profit!
    

If you're trying to solve a fundamental problem like authentication, odds are
someone else has already spent a lot of time thinking up a good solution.

Re: LDAP

This may not be as good as a specialized solution, but it's probably a step up
from throwing all your data into the same MySQL DB. ;-)

It's not particularly hard, either, just different from what you may be used
to.

------
ridruejo
Wesabe's Grendel is a good way to store sensitive user data

[https://github.com/wesabe/grendel](https://github.com/wesabe/grendel)

------
ams6110
Easier said than done. User data is needed in some way, shape, or form in
almost any transaction the system performs. Yes you can separate the
authentication credentials, and the sensitive attributes from the more routine
stuff, and that's a good thing, but things often don't break that cleanly.

------
jay_kyburz
Why is user data any more important than the rest of the data in your database?

~~~
vacri
You can impersonate a user with their login data, and given that people
frequently re-use passwords, also do this on other sites.

------
andrewstuart
OP here. It's a fairly simple concept - any wizards with a few spare hours
willing to try to put a prototype together today?

From HN front page to solution within hours!

If it said "built with Golang" or "built with Rust" it would certainly hit the
front page.

~~~
nedmcclain
My beta-quality Golang version of this with pluggable backends, including S3:
[https://github.com/nmcclain/glauth](https://github.com/nmcclain/glauth) I
would love suggestions/help.

~~~
andrewstuart
Ned - would you mind replying to this with a list of the features specified in
the blog post and a YES/NO to confirm which are supported by your code?

thanks

~~~
nedmcclain
It should be accessible only via its specialised API which is designed to
constrain the ways that it is accessed. YES - API is LDAPS.

It should not provide generalised database query functionality. YES - it only
provides the subset of LDAP functionality necessary for authentication.

Its API should have password salting and hashing built in. NO - hashed but
needs salting and scrypt.

Its API should throttle access with some sort of algorithm designed to prevent
downloads of large quantities of user data. NO - todo.

It should encrypt data internally. YES (hashed not encrypted).

It should communicate only over encrypted connections. YES.

It should be distributed. YES.

It should not be run on any web server, should run "behind the scenes" and be
accessible only via its API. YES.

It should include triggers and alerts based on uncommon access patterns or
recognised nefarious access patterns. NO - todo.

It should have no other purpose. YES.
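
For the "triggers and alerts" item, one simple starting point might be a sliding-window count of failed logins per source. This is a sketch under assumed thresholds, with hypothetical names, and it is not a description of how glauth works or plans to work.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class FailedLoginMonitor:
    """Flags a source that exceeds `max_failures` failed logins within a window."""

    def __init__(self, max_failures: int = 5, window_seconds: float = 60.0) -> None:
        self._max = max_failures
        self._window = window_seconds
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def record_failure(self, source: str, now: float) -> bool:
        """Record one failed login; return True if `source` should raise an alert."""
        events = self._events[source]
        events.append(now)
        # Drop failures that have aged out of the window.
        while events and now - events[0] > self._window:
            events.popleft()
        return len(events) > self._max
```

Here `source` could be an IP address or an account name, and `now` would normally come from time.monotonic(). Real detection of nefarious patterns would need far more than a per-source counter.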

------
teen
"What developers need is a minimal, single purpose database specifically
designed for protecting user information and designed to move user data access
away from the rest of the application data to minimise the impact of access by
hackers." - Yeah, this sort of homogenization of security worked great for
OpenSSL

~~~
andrewstuart2
I'd argue that it worked exactly as expected. The bug was found, fixed, and
patches released immediately. For most people that was just a package-manager
update away.

If anybody imagined that openssl had no bugs, they've obviously been proven
wrong several times already, and I'm sure they will be proven wrong again.

The point of a single library (or few libraries) is (at least) twofold. It's
much less likely to have defects, and when those defects that remain are
found, answering the "does this affect me" question is easier and prompts
quick action.

