

Ask HN: How do startups restrict employees from accessing private user data? - GuiA

How do startups which hold private user data deal with restricting employee access to that data?

I'm not talking about data that's sensitive enough to fall under a specific jurisdiction (credit cards, SSNs, etc.) - more data like private messages between friends, photos, and so on, which users consider to be private.

In the very early days of a startup, you don't really worry about this because you don't have the time, and you trust your cofounders enough not to snoop around on messages/documents/etc. that the few users you have consider to be private.

But as the employee count hits the double digits and keeps going up, you probably shouldn't trust everyone to that level? Yet in most early stage startups, all employees have command line access to production applications, admin panels, databases, etc.

How do you solve this problem? Do you add restrictions on which trusted employees can access production services? Do you encrypt the user data you store?

If there are any insights about how larger companies have handled this, I'd love to hear them. Surely at Instagram, OkCupid, Facebook, etc. the average employee can't read the private messages of their ex-partner?
======
patio11
One of the first things your DevOps team is going to do as you grow is make
sure that employees _don't_ have unrestricted access to the production DB /
console access / root on the production web tier, since down that path lies
madness.

This comes down partly to policy and partly to tech. The policy, disclosed
early and often, is that misuse of customer data is an instant firing offense.
Google, Facebook, etc. have indeed terminated people over this, often literally
count-the-minutes after the misuse became known to other people at
the company.

Tech-wise, it's spiritually similar to other security measures. You lock down
access on a need-to-have basis, you log the heck out of extraordinary requests
for access, and you audit those requests.

e.g. _Many_ companies will eventually develop a Use The Software As User X
feature. At some, this requires you to a) be logged in as a privileged
employee, then b) click to activate the feature, c) write an explanation why
you need access to User #12345's account, and d) checkbox that you have
received #12345's consent for this. (I know some companies that skip D, largely
in B2C.) When you hit submit, that logs it to the DB and fires an email to the
audits@ email address, which goes out to 5 different people, or pipes "Patrick
just logged in as #12345 because [chasing down display bug -- customer reports
unescaped HTML in the message window, can't reproduce on staging or with own
account]" into your team's HipChat/etc channel.
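
A minimal sketch of that a)-d) flow in Python. Every name here (AuditLog, notify_chat, impersonate) is illustrative, not any real company's API; the point is the shape of the control, not the implementation:

```python
import datetime

# Sketch of the flow above: impersonation requires a written reason and a
# consent confirmation, and every use is recorded and announced.
# All names here are hypothetical.

class AuditLog:
    def __init__(self):
        self.entries = []

    def record(self, employee, target_user, reason, consent):
        entry = {
            "when": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "employee": employee,
            "target_user": target_user,
            "reason": reason,
            "consent": consent,
        }
        self.entries.append(entry)
        return entry

def notify_chat(entry):
    # In practice this would POST to HipChat/Slack/etc.; here we just
    # format the message that would be piped into the team channel.
    return ("{} just logged in as #{} because [{}]"
            .format(entry["employee"], entry["target_user"], entry["reason"]))

def impersonate(audit_log, employee, target_user, reason, consent_given):
    if not reason:
        raise ValueError("a written reason is required")
    if not consent_given:
        raise PermissionError("customer consent not confirmed")
    entry = audit_log.record(employee, target_user, reason, consent_given)
    return notify_chat(entry)
```

The important property is that there is no code path into the user's account that skips the audit write and the notification.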

~~~
GuiA
Thanks, that's super helpful.

While we're not big enough yet to have a dedicated dev ops team to implement
the first things you've described, the logging approach is totally realistic.
I especially like the idea of having a bot that monitors extraordinary admin
actions and pipes them into the group chat.

Thanks again!

~~~
caw
From a more technical standpoint (being in devops, and having worked in least
privilege environments), the big thing you need is centralized logging, and
from there you can do what you need. Whether it's syslog or logstash or
something else, if you get the logs in one place you can then filter over them
for instances that you'd need to alert via email or chatbot.
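
As a rough sketch of that filtering step (the patterns and alert hook are my own assumptions, not a prescribed setup):

```python
import re

# Scan centralized syslog-style lines for "extraordinary" events and hand
# matches to an alerting hook (email sender, chatbot, ...). The patterns
# below are illustrative; tune them to your own environment.

ALERT_PATTERNS = [
    re.compile(r"sudo: .* COMMAND="),            # any sudo invocation
    re.compile(r"session opened for user root"), # direct root sessions
]

def scan(lines, alert):
    for line in lines:
        if any(p.search(line) for p in ALERT_PATTERNS):
            alert(line)

# Example: only the sudo line triggers the alert hook.
alerts = []
scan(
    ["Jan  1 10:00:01 web1 sudo: alice : TTY=pts/0 ; COMMAND=/bin/bash",
     "Jan  1 10:00:02 web1 CRON[123]: pam_unix(cron:session): session closed"],
    alerts.append,
)
```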

That works great and all until you realize only "sudo" is logged and not root
terminal actions, and even then root could delete any logs of its actions.
That's why something that immediately ships logs off-host is nice. I like "rootsh"
(available on SourceForge) for forcing any sudo users to either use "sudo" or
"sudo rootsh" to get a root terminal. You're not preventing anyone from doing
their jobs, you just have an audit trail. Someone asked me why you need an
audit trail unless it's to fire people for doing something wrong -- no, it's
for root cause analysis, preventing certain operator errors from happening
again, and catching maliciousness from either an employee or someone
impersonating an employee.

The one other big thing to do is get rid of any shared accounts that can
access data. If it's AWS, gen up some keys for each user/application or use
IAM roles for the hosts. If it's Linux accounts, separate out the accounts. If
you must have a singular account for something, only allow sudo access to
switch to them. Going back to the previous paragraph, you'll at least get a
log of who switched to the shared account.

If you want to chat more about this, my email is in my profile.

------
jlawer
I've really only seen 2 core approaches to this:

1.) Free access to everything. You trust everyone and hope it works out right.
This is the simplest solution, but offers nothing to prevent someone from
violating that trust.

2.) Lock production down to a trusted team. Typically this is done with a full
dev / ops separation: dev builds and tests the site, ops runs it live and has
access to the live database. You trust the 3 people in ops fully and no one
else. Locking down production often entrenches rivalries between dev and ops,
and makes debugging performance issues a PITA, as dev typically don't have
access to the dataset that is exhibiting the problem.

I've seen both work and both fail, and neither protects you from a trusted
employee screwing you over.

What I have seen work rather well though is audit logging. Logging all access
to the key systems and periodically (and randomly) auditing access. I've seen
this done at the db level, system level, and app level before. Basically the
story is not that you will be prevented from doing something you shouldn't,
but that if you do, you will be held accountable for it (typically on-the-spot
termination). However, to be effective, the company needs to be able to take the
high ground and be consistent. This won't work if someone (co-founder,
manager) is doing a similar thing and getting away with it.
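
The "periodically (and randomly)" part can be as simple as sampling the audit log for human review. A minimal sketch, with the entry format assumed:

```python
import random

# Pull a random sample of audit-log entries for a human reviewer, rather
# than reviewing every access. A fixed seed makes this demo deterministic;
# a real review job would not seed the generator.

def sample_for_review(audit_entries, k, seed=None):
    rng = random.Random(seed)
    return rng.sample(audit_entries, min(k, len(audit_entries)))

entries = [{"employee": "user%d" % i, "action": "read_message"}
           for i in range(100)]
picked = sample_for_review(entries, 5, seed=42)
```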

As long as the data isn't high-value enough (credit cards, etc.) to make the
opportunity cost worthwhile, knowing you will lose your job is enough to
make most people hesitant to break the rules.

------
donavanm
Trust but verify, i.e. tamper-resistant auditing, works for low sensitivity
data. It's also an investment you'll never outgrow. When you get larger, or
more sensitive, it's time to implement something like a Two Man Rule. Take your
pick of implementations of Shamir's Secret Sharing. And lastly, my favorite:
operators of the service _cannot_ access customer data without explicit
permission from the customer. A friend implemented this, Grendel, as an
internal service at Wesabe.
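
For illustration only (use a vetted library in production), a toy Shamir split/combine over a prime field, where any k of n shares reconstruct the secret and fewer reveal nothing:

```python
import random

# Toy Shamir's Secret Sharing: encode the secret as the constant term of a
# random degree-(k-1) polynomial, hand out n points on it, and recover the
# secret by Lagrange interpolation at x = 0 from any k points.

PRIME = 2**61 - 1  # prime modulus; secrets must be smaller than this

def split(secret, n, k, rng=None):
    rng = rng or random.Random()
    # random polynomial of degree k-1 with constant term = secret
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n + 1)]

def combine(shares):
    # Lagrange interpolation evaluated at x = 0 recovers the constant term.
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret
```

For a Two Man Rule you would split the master key 2-of-n, so no single operator can unlock customer data alone.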

Personally I dislike identity impersonation schemes, even between internal
services. It leads to poor visibility and accountability. A proper
AuthN/AuthZ/RBAC scheme where the customer explicitly grants specific
privileges to your internal service, and delegates, works better long term.

------
eshvk
Also, what if you are a data scientist? Yes, you could encrypt user IDs.
However, when even publicly released anonymized datasets can be reverse
engineered, surely a person in charge of feature selection, with access to
company datasets, could wreak havoc even under that restriction?

I am specifically curious how Facebook/Google/NFLX, companies with
massive data science teams, handle this.
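
One partial mitigation, sketched below with a hypothetical key name: pseudonymize user IDs with a keyed HMAC before data reaches the analytics environment, so analysts can still join on a stable token without seeing real IDs. As the comment notes, this does not defeat re-identification from the attributes themselves; it only removes the direct identifier.

```python
import hashlib
import hmac

# Replace raw user IDs with a keyed HMAC token. The key ("pepper") must be
# kept out of the data warehouse, or the mapping is trivially recomputable.
PEPPER = b"example-secret-keep-out-of-the-warehouse"

def pseudonymize(user_id: str) -> str:
    return hmac.new(PEPPER, user_id.encode(), hashlib.sha256).hexdigest()
```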

------
edoceo
My early stage employees and partners are too busy interviewing customers,
writing code and getting shit done (or commenting on HN) to waste time
looking at inane things posted by GP.

