I won't post to say "I haven't looked at the contents of the
file, but it's named 'cat.jpg'" either. I won't even post to
announce that the one hundred millionth file has been stored.
[...] This is because I have no way to obtain that information.
The contents of files [...] is all hidden from me by Tarsnap's
strong client-side encryption.
Forgive me cperciva, but to me your post looks just like a giant plug for your own service. Client-side encryption is not warranted for everything, nor is it a reasonable goal for every app that shares data on the web. It's fine that Tarsnap does this, and frankly I would expect the same from a service like, say, DropBox - but it's not a reasonable expectation when it comes to the type of apps 37signals provides.
We'll have to disagree there. I'd be very surprised if they did any more than looking at their log files -- most likely using tail -f -- as the 100 million mark approached.
Admins will, and are completely expected to, look at the data - if only to make sure everything is working.
How does looking at individual files help to confirm that things are working? Once you're operating at scale, looking at individual files doesn't tell you anything useful; if there's a big problem users will notice it before you do, and if there's a small problem the files you look at probably won't be in the affected set.
Forgive me cperciva, but to me your post looks just like a giant plug for your own service.
We'll have to disagree there. I'd be very surprised if
they did any more than looking at their log files
How does looking at individual files help to confirm that things are working?
But I didn't write that post because I wanted to plug Tarsnap;
I wrote it because I saw the trust-is-fragile post on HN
and wanted to point at the right response.
I'm inclined to agree with you. That's what I was getting at with my "even if 37signals doesn't want to offer cryptographically secure storage, they could at least remove the temptation to look at file names in log files by not writing sensitive information to log files in the first place" line.
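A minimal sketch of that idea: instead of writing raw filenames to the log, write an opaque, keyed token. Support can still correlate a log line with a specific upload (by recomputing the token from the filename the user gives them), but the name itself never lands in the logs. The secret, function names, and token length here are all hypothetical choices for illustration.

```python
import hashlib
import hmac
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("uploads")

# Hypothetical per-deployment secret; kept out of the logs themselves.
# HMAC (rather than a bare hash) stops anyone from reversing short,
# guessable names like "cat.jpg" by dictionary attack.
LOG_SECRET = b"example-logging-secret"

def filename_token(filename: str) -> str:
    """Stable, opaque token standing in for a filename in log lines."""
    return hmac.new(LOG_SECRET, filename.encode(), hashlib.sha256).hexdigest()[:12]

def log_upload(user_id: int, filename: str) -> str:
    """Record the upload event without logging the filename itself."""
    token = filename_token(filename)
    log.info("user=%d uploaded file_token=%s", user_id, token)
    return token
```

Because the token is deterministic, a support rep who is told "the file was called cat.jpg" can still find the matching log entry; an admin idly tailing the logs sees only hex.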
http://cl.ly/0Y1M1D0z1g123I0S0u1R - cat.jpg ;)
"Hello, thanks for calling tech support. How can I help you?"
"I uploaded a file but it's not showing up in my account."
"What was the name of the file?"
"Ok, give me a moment to look at the logs..."
These are the kinds of questions that come up all the time in supporting a SaaS product with non-technical and semi-technical users. Debugging is not something only programmers do. Oftentimes bugs are found only after a client interacts with support.
Oh, another thing: deletes. At my last company I can't tell you how many times customers wanted us to restore deleted data. After many frustrating support experiences we implemented soft deletes for most objects. Hard deletes required written confirmation from the user and 48 hours to purge it from all backups.
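The usual way to implement soft deletes is a nullable `deleted_at` timestamp column: "deleting" stamps the row, normal queries filter on `deleted_at IS NULL`, and restoring just clears the stamp. A minimal sketch with SQLite (table and column names are illustrative, not from the original post):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE documents (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        deleted_at TEXT  -- NULL means the row is live
    )
""")
conn.execute("INSERT INTO documents (name) VALUES ('q3-report.pdf')")

# "Delete": stamp the row instead of removing it.
conn.execute("UPDATE documents SET deleted_at = datetime('now') WHERE id = 1")

# Normal application queries only see live rows...
live = conn.execute(
    "SELECT COUNT(*) FROM documents WHERE deleted_at IS NULL").fetchone()[0]

# ...but when a customer calls support, restoring is a one-line update.
conn.execute("UPDATE documents SET deleted_at = NULL WHERE id = 1")
restored = conn.execute(
    "SELECT COUNT(*) FROM documents WHERE deleted_at IS NULL").fetchone()[0]
```

A periodic job can then hard-delete rows whose `deleted_at` is older than the purge window (48 hours in the example above).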
Aren't there some instances where they'd be justified in looking at data...
Attitudes like yours give us stupid privacy-violating terrorism and "protect-the-children" laws.
While it is reasonable to expect that they would contact the FBI in such instances, I would also hope that noticing such details elicits an "I shouldn't have been able to see that, so we're not doing enough to protect the privacy of our clients" response and corrective action.
Surely there's some line somewhere...
edit: apparently HN thinks there is no line anywhere.
Investigating the contents of each safety deposit box, or even having the ability to do so, is outside the scope of what bank vault services are sold to do.
A bank vault, like encryption, sells protection. It is for all intents and purposes neutral. It can be used for good and can be used for bad. 95% of the time a bank vault or encryption is either being used for an ethically neutral or at worst ethically ambiguous use.
When any technology or product is used for bad, it is a social failure. Crime will always exist. The quantity of crime committed can be mitigated by sound long-term policies that address the causes statistically most likely to contribute to crime occurring in the first place.
Ever. I don't have the original source in front of me, but with enough bits, assuming there isn't some fundamental flaw in the encryption algorithm, you couldn't brute force a key before the heat death of the universe even if you recruited every particle in the visible universe for your computation.
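A back-of-envelope check makes the point without needing the "every particle" framing. The numbers below are stated assumptions, not measurements: a 256-bit key space and a single attacker checking a billion billion (10^18) keys per second, which is far beyond any real hardware.

```python
# Brute-force sanity check for a 256-bit key.
KEYS = 2 ** 256                 # size of the key space
GUESSES_PER_SEC = 10 ** 18      # assumed: a billion billion guesses/sec
SECONDS_PER_YEAR = 3.15e7
AGE_OF_UNIVERSE_YEARS = 1.4e10  # roughly 13.8 billion years

# Expected time to exhaust the key space:
years = KEYS / (GUESSES_PER_SEC * SECONDS_PER_YEAR)
# ~3.7e51 years -- more than 10^40 times the current age of the universe.
```

With these assumptions the search takes on the order of 10^51 years, so the conclusion holds with enormous margin even if the attacker is trillions of times faster.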
I suspect he also means that they shouldn't be looking through files in the first place on the grounds that "there might be something illegal in them"
Was I plugging Tarsnap?
That he happens to be an expert in the field of digital privacy and has a way to prove that he is such an expert shouldn't be held against him.
Reading his post was like an "oh shit, he's right" moment for me and using Tarsnap as an example was key in helping me understand it.
Considering that, in the discussion around cat.jpg, many people here were talking about a secure backup service which encrypts all data client-side with auditable source code as if it were an unrealistic, unobtainable goal, I have zero problems with that.
Would that be worse than admitting/pretending they actually saw a file called cat.jpg? If there was such a file, it could have been a JPG for catalog of some kind, etc.
I think they are responding to people's assumption that there actually was a file containing the image of a cat. I doubt it; I think it was just an attempt at being funny which backfired, and they felt they had to take responsibility for the perceived breach of trust, since any other explanation, even a truthful one, would have been seen as a weak excuse.
I don't see any compelling reason an admin should
have access to user data like uploaded files.
The overhead of decrypting an image is minimal compared to the latency introduced by a network fetch and by handling the rest of the request cycle in Ruby.
(And FWIW, people don't often have access to production encryption keys like this. Privacy is a big deal.)
Private keys are marginally more complex, but not much. If a password is sufficient security, then the private key can be stored remotely (in S3 or whatever) but encrypted (symmetrically) with a password.
So say Alice wants to share a file with Bob. Alice's client downloads her encrypted private key, and prompts Alice for a password. The private key is decrypted with the password and stored in memory. Alice then downloads the file she wants to share with Bob, and decrypts it with her private key. Then she downloads Bob's public key, and re-encrypts the file with Bob's public key. She can now send the file to Bob securely without the server being aware of the content.
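The flow above can be sketched end to end with nothing but the standard library. Everything below is a toy standing in for real primitives: a small-prime Diffie-Hellman key pair instead of X25519/RSA, a SHAKE-256 keystream instead of an AEAD cipher, and made-up passwords and parameters. It shows the *shape* of the protocol (password-encrypted private key at rest, public-key sharing with no server-side plaintext), not something to deploy.

```python
import hashlib
import secrets

# Toy DH parameters: a Mersenne prime keeps the sketch short.
# Real clients would use X25519 or RSA from an audited library.
P = 2**127 - 1
G = 3

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHAKE-256 keystream derived from key (toy stream cipher)."""
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))

def keypair():
    priv = secrets.randbelow(P - 3) + 2
    return priv, pow(G, priv, P)

# --- Alice's private key sits on the server, encrypted under her password ---
alice_priv, alice_pub = keypair()
password_key = hashlib.pbkdf2_hmac("sha256", b"alice's password", b"salt", 100_000)
stored_blob = keystream_xor(password_key, alice_priv.to_bytes(16, "big"))

# --- Sharing a file with Bob ---
bob_priv, bob_pub = keypair()
plaintext = b"quarterly numbers, eyes only"

# 1. Alice's client downloads the blob and recovers her key with her password.
recovered_priv = int.from_bytes(keystream_xor(password_key, stored_blob), "big")

# 2. She derives a shared secret against Bob's public key and encrypts.
shared = pow(bob_pub, recovered_priv, P)
file_key = hashlib.sha256(shared.to_bytes(16, "big")).digest()
ciphertext = keystream_xor(file_key, plaintext)

# 3. Bob derives the same secret from Alice's public key and decrypts.
#    The server only ever relays stored_blob and ciphertext.
shared_bob = pow(alice_pub, bob_priv, P)
decrypted = keystream_xor(
    hashlib.sha256(shared_bob.to_bytes(16, "big")).digest(), ciphertext)
```

Note that at no point does the server hold a decryption key: it stores a password-encrypted private key and ciphertext, which is exactly what makes the re-encrypt-for-each-recipient scheme work.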
Adding members to the group is trivial; just send them the group's key pair. Removing a member would be more difficult. Perhaps the most convenient way would be to add an additional layer of security on top (so members would need server access permissions, plus the private key). The only other option would be to create a new group and to re-encrypt all the existing files with a new key.
AViD's answer on this security stack exchange is useful.
Unless you are using a service like tarsnap, your admins can and will peek at your data. If you use a service like tarsnap, and you lose your password, your data is deader than disco. Pick one - security, or an admin who can save your account.
And while it's theoretically possible to develop a rich web app without seeing user data, it just doesn't happen. You need realistic data to do testing. The most realistic data you can possibly get is your user's data. Guess what 99.999% of websites use for testing?
If you have sensitive information, use good encryption. Better still do what the professionals (i.e. the government) do, and leave it on an internal-network only computer, in a steel reinforced room. If you're paranoid, lock the hard drives in a safe when you leave the room. And use encryption.
But don't make a fuss when the admin peeks at your data, in a semi-random way. If they are stalking you specifically, or leak any damaging information, that's another matter. But if you just don't trust them, don't give them your data.
There's a simple way to eat your cake and have it too, though: put a copy of your passwords in a safe-deposit box. Passwords don't strictly have to be private to protect you from would-be attackers—they just have to only be accessible to people who have absolutely no incentive to help any would-be attacker.
The whole business model of a safe-deposit box relies on other people not being able to get into them without the owner's consent—so if anyone, including the bank itself, took a peek in there, that would instantly lose them all the trust they had ever accrued as a safe-deposit-box provider—and thus a lot of money. They have much more of an incentive to keep your data private than they have an incentive to help those who want it, because keeping your data private is what keeps them in business. That's the meaning I was going for.
The rare triple-negative.
Bam! Quad-negative! Top that.
Humans can't be — and aren't — trusted to follow their stated intentions.
A commenter named Trevor even pointed this out to 37signals on their blog post:
Did you know that Oracle provides Database Vault?
What it allows you to do is set it up to prevent
even DBAs from viewing or modifying data.
Idea being, DBAs should be able to “administer” the
database, but should not be allowed to either VIEW or
even MODIFY customer/employee data (e.g. credit card #,
SSN, salary data, etc.)
There is another product Oracle provides which is called
Transparent Database Encryption. What it does is encrypt
your customer data on disk, but then when a database
select is issued, it decrypts the data on the fly
without needing to modify your application code.
Unfortunately, no such product exists for MySQL.
Given the size of your company now and how much
sensitive customer data you are now storing, might be
worthwhile for you guys to seriously consider using
Additionally, every service requires some level of trust. How am I to know that the source code you show me is what you're actually using? (obviously client-side encryption services are better in this area). How do I know you won't sell my personal information, or abuse my billing information?
I plead guilty to taking advantage of the opportunity to mention my service (although most of my readers are already very much aware of tarsnap), but I would have written the blog post anyway.
that market is going to have to trust them to some extent.
Sure, but I still think there's a huge gap between "we don't log sensitive information" and "we have a policy which says that we shouldn't look at the data we've logged".
I think it might be lame if cperciva reacted to the 37s thing by changing his product, but he called it long before it happened.
Tarsnap's position here is assailable, and we will all benefit from the discussion.
How do you feel Colin's point is in any way disingenuous? Do you think he doesn't believe what he says? Because that's the only way I could see it as being "disingenuous."
Personally, I don't think it's disingenuous to opportunistically state what you believe to benefit yourself, assuming you truly do believe it.
I don't mind him wanting to do PR, but it does seem a bit distasteful. This was basically an ad couched in something that was supposed to look like content.
As one of the previous posters said, there are tradeoffs made when using a SaaS service, and it is not possible to run a system like theirs while using strong client-side, opaque encryption. Besides, comparing a backup system to an online file management system is apples to oranges.
I make lots of posts about security and cryptography. I happen to think that Tarsnap does things right; if I didn't, I would have Tarsnap do things differently.
I usually decide to blog about something based on (a) whether I think it's interesting, and (b) whether I think people will learn from it. (There are exceptions like calling out jungledisk for not fixing weaknesses in their cryptography, but those are rare.) The question "will this give me a chance to advertise Tarsnap" doesn't come into it -- for one thing, the vast majority of my readers are already aware of Tarsnap.
I'm imagining a group of friends and one of them mentions an interesting book he saw in X's house. The friends are immediately scandalized: what if instead of a book, you saw naked pictures of X's wife? Apparently you'll just blab anything you see, so you can't be trusted in people's houses anymore.
It's a completely innocent disclosure. That it would not have been innocent if the file had been different seems completely irrelevant. Either they would have been discreet in that case, or they would not have, but we can't tell which from this one instance.
A backup service that just needs to move around opaque blobs can and should encrypt its data; an application that needs to react to the type and contents of the data it stores, not so much. It seems like cperciva would know this better than anyone, so the post seems pretty disingenuous.
Encryption these days only adds 1-2% extra load.
Regardless, even if the load was higher, like it used to be before modern hardware, you are still essentially informing your customers that "speed is more important than securing your data" - which is a terrible approach to take.
TL;DR: If you are given the privilege of maintaining customer data, it's your obligation and responsibility to do so with the most care possible.
Tarsnap can treat data opaquely and have the client encrypt / decrypt it; most web applications that aren't just moving data around need to be able to access its contents in order to work.
You'd have to contend with what is probably a large performance hit, and I don't know of any libraries that do this so you'd need to spend a considerable amount of time writing one. I suspect that this approach would only be practical for very simple web applications. For instance, an encrypted image or file hosting web application might be a possibility.
My own company will never store sensitive data with an outside firm like 37signals but that is only because we have a great IT staff. For companies that don't have an IT staff, outsourcing to 37signals makes sense and is probably worth the tradeoff to trust them with data.
Just as you trust the bank to guard your money, and many of their employees have access to your current account balance, the convenience of using these kinds of services requires you to trust the organization.
(sure, the bank could perform other tricks behind your back, like doing bad investments with the money you put in, but hey they'll get bailed out anyway...)
Luckily in the case of files you can easily do something about it, by encrypting them client-side or using a storage provider client that handles that for you.
cperciva is giving 2 examples: (1) use a service provider that doesn't require your trust. (2) limit the exposure of customer sensitive information to your employees that you must trust to keep it private.
Razvan Tirboaca 12 Jan 12
And a Basecamp user uploaded the 100,000,000th file
(It was a picture of a cat!)
Are you looking at your users photos?
Taylor 12 Jan 12
Razvan, absolutely not. The file was named cat.jpg and
that was logged, which was what we saw. We do not look
at user’s files.
There's no mention of what the filename was. Neither the basename, nor the extension.
My comment merely quoted a 37signals employee giving the filename, both basename and extension. You can believe them or not, but they did explain the situation, and they explicitly denied looking at the image.
So, no, I don't believe them at all when they say they never looked at the file. They say--with confidence--it was a picture of a cat. Sorry, but going from "cat.jpg" to such a conclusion is IMO quite a leap. It's just three letters: it could be a CAT scan, a screenshot of the Linux `cat` command, three DNA nucleotides, a picture of a tiger, something else named "cat", or something related to but not involving cats.
I don't know that if I saw a filename like that I'd say "It was called 'cat.jpg', so probably a picture of someone's cat," because it could be anything that somebody named "cat.jpg" for any number of reasons, and I wouldn't know for sure until I looked at it.
And even then, just looking at the filenames is not right. Of course I understand that if it was `company-passwords.xls` or something more sensitive, they wouldn't have said anything. But even before they could judge whether the filename was sensitive or not, there really is no reason why they needed to be looking at filenames in the first place!
Sure, some admin can always go in as root and look at everything, but you don't need to invite temptation by putting the filenames right in the face of someone who really has no business looking at them, since they're just collecting statistics.
Being in a different jurisdiction provides a small amount of protection.
Even though I completely agree that the systems we build should have the least possible level of permissions required to do their job, the temptation to leave a backdoor open to peek once in a while, "just in case," is real and has its own benefits...