Trust is Fragile (37signals.com)
266 points by jaxonrice on Jan 16, 2012 | hide | past | web | favorite | 105 comments

If the 100,000,000th uploaded file was actually named Startup-Revenue-Forecast.ppt or 2011-Tax-Return.pdf I'm fairly certain that the name or contents wouldn't have been mentioned. I think the fact that it was cat.jpg spawned the idea of referencing it at all, and honestly if I were in their place I would have made the same joke. I think most people would have. But good on them for pulling a reverse Streisand Effect (http://en.wikipedia.org/wiki/Streisand_effect) which frames the discussion in a positive light ("how can we change and do better?") instead of a negative one.

They didn't mention the filename at first. What they actually said was: "And a Basecamp user uploaded the 100,000,000th file (It was a picture of a cat!)".

So people got the impression that they actually saw the picture. Hence the backlash.

Users of cloud services may be aware that engineers, sysadmins, or DBAs may occasionally see their data, but they certainly prefer not to think about it.

I wonder what they would have done had it been named "Basecamp Competitor Business Plan.pdf". It would have been awfully tempting to take a peek. Exactly why they shouldn't even be looking at filenames.

Ultimately a company storing files is almost certainly going to require its staff to look through directories, log files, database tables. And it is certainly going to require staff to have the ability, even if they never have to use it.

By giving them your files you are trusting them not to screw you over.

> Ultimately a company storing files is almost certainly going to require its staff to look through directories, log files, database tables.

Why? (Or at least, why should they see anything private in raw form?)

> And it is certainly going to require staff to have the ability, even if they never have to use it.


> By giving them your files you are trusting them not to screw you over.

By giving themselves the technical ability to examine private user data, they are making a strong (or indeed legally compelling, in some cases) argument for not using their service to store anything private at all. That's a death sentence for most cloud services.

We don't accept companies storing passwords in plain text. We don't accept companies transmitting credit card data in the clear, and PCI DSS requires quite strict controls on access to such data even when it's stored internally on the company network. Businesses dealing with sensitive data such as health records are subject to all kinds of regulations on the privacy of that data. Professionals dealing with privileged communications such as between lawyers and clients don't get a pass. Off-site backup services give all kinds of strong guarantees about the security and privacy of the data entrusted to them.

Why should we give a pass to anyone else, because they can't figure out how to set up a security system where only the end user can access the unencrypted version of their own data?

> We don't accept companies storing passwords in plain text. We don't accept companies transmitting credit card data in the clear

This is in case somebody gains unauthorised access to the data, not in case staff can't be trusted. For example, when paying by credit card over the phone you hand over your card number to whoever is taking your order, but if they were to enter it into a system, that system then has to comply with regulations.

As to plain text passwords, again this is in case of the data being stolen. It's all very well saying "Google shouldn't store plain text passwords", but if Google as a company wanted to read my email, they could a) just replace the encrypted password in my database entry with one that they can use, b) place code in their login system that would secretly log the plain text password, or c) go straight to where my emails are stored and access them there.

How do you get around this and make it impossible for them to access your data?

> This is in case somebody gains unauthorised access to the data, not in case staff can't be trusted.

Actually, no. This is very much for both reasons. Part of PCI compliance is ensuring CC data is encrypted with a key that is split among a few people, each of whom knows only a part. So, say three people each know one part of the key. The goal is that in production, no one person can access the full key, but data can still be encrypted and decrypted.

To put it plainly, it's not just a matter of encrypting and salting your CC data.
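The "split knowledge" idea above can be sketched in a few lines. This is a toy illustration, not how PCI deployments actually implement it (those use HSMs and formal key ceremonies): the key is XOR-split into shares so that no single custodian can reconstruct it alone.

```python
import secrets

def split_key(key: bytes, n: int = 3) -> list[bytes]:
    """Split a key into n XOR shares; all n are needed to rebuild it."""
    shares = [secrets.token_bytes(len(key)) for _ in range(n - 1)]
    last = key
    for s in shares:
        last = bytes(a ^ b for a, b in zip(last, s))
    shares.append(last)
    return shares

def combine_shares(shares: list[bytes]) -> bytes:
    """XOR all shares back together to recover the key."""
    key = bytes(len(shares[0]))
    for s in shares:
        key = bytes(a ^ b for a, b in zip(key, s))
    return key

master_key = secrets.token_bytes(32)
shares = split_key(master_key, 3)        # one share per custodian
assert combine_shares(shares) == master_key
```

Any subset of fewer than n shares is statistically independent of the key, which is exactly the property the PCI key custodians rely on.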

As for your "Google can just" remarks: yes. This can happen in many places. However, you mitigate the risk of it happening with procedures and security. I guarantee you that merely working for Google doesn't give you access to the emails. I'd be surprised if the number of people with direct access to emails at any time is in the double digits. And getting your code into production, I imagine, isn't just a quick cherry-pick.

You can't prevent people from having access to data you give them. However, a provider can mitigate the risk of that access being abused.

> How do you get around this and make it impossible for them to access your data?

The same way everyone else does: encrypt the data using a secret known only to the customer, isolate internal systems that have access to the decrypted data so that no one person can ever access that data on their own authority, and ensure that whatever procedure does permit access with the requisite authority creates a robust audit trail. If security really matters, the whole system and its logs should be regularly audited by an independent party, too.

You have to have some sort of trust, because obviously if everyone in the company is crooked then nothing but encrypting everything client-side using auditable code is bulletproof. But you can certainly engineer systems so that access requires multiple people's consent and gets securely logged, which would eliminate casual snooping and provide robust evidence for legal action in the event of collective abuse.
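One way to make such an audit trail tamper-evident (a sketch with hypothetical names, not any particular vendor's scheme): each log entry embeds a hash of the previous entry, so retroactively editing any record breaks the chain.

```python
import hashlib, json, time

def append_entry(log: list, actor: str, action: str) -> None:
    """Append a log entry chained to the hash of the previous one."""
    prev = log[-1]["hash"] if log else "0" * 64
    entry = {"actor": actor, "action": action, "ts": time.time(), "prev": prev}
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()).hexdigest()
    log.append(entry)

def verify_chain(log: list) -> bool:
    """Recompute every hash; any retroactive edit breaks the chain."""
    prev = "0" * 64
    for e in log:
        if e["prev"] != prev:
            return False
        body = {k: v for k, v in e.items() if k != "hash"}
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if digest != e["hash"]:
            return False
        prev = e["hash"]
    return True

log = []
append_entry(log, "admin_a", "decrypt customer 42")
append_entry(log, "admin_b", "approve access")
assert verify_chain(log)
log[0]["action"] = "nothing to see here"   # tampering...
assert not verify_chain(log)               # ...is detected
```

For the independent-audit part, the auditor only needs periodic copies of the latest hash to detect truncation or rewriting after the fact.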

nothing but encrypting everything client-side using auditable code is bulletproof

That sounds like Colin Percival's (cperciva) startup: http://www.tarsnap.com/

"The same way everyone else does: encrypt the data using a secret known only to the customer, isolate internal systems .., and .. creates a robust audit trail."

Who? Who does that?

If you give your data to someone, chances are they might look at it. If you store files, use email or surf the web at work, chances are the IT guys can look at it. Of course, they should not, and they probably have better things to do etc etc, but believing this will never happen just seems naive.

And then somebody loses their private key material and they look to you to fix their problem.

Client-side encryption with end-user key management is not yet practical for the average end user. Until it is (I'm hopeful we'll get there), the average service will require some sort of administrative back door that is controlled by process and people.

A friend of mine is working on that auditable client-side system you mention:


This is in case somebody gains unauthorised access to the data, not in case staff can't be trusted. For example, when paying by credit card over the phone you hand over your card number to whoever is taking your order, but if they were to enter it into a system, that system then has to comply with regulations.

Not exactly. The purpose is to limit the number of people who have access to your credit card number so that if one of them uses it fraudulently, it's easy to isolate and verify the source of the fraudulent transactions. Yes, the guy taking the order over the phone will have access to your credit card number. Yes the waiter running your card at the restaurant will have access to your credit card number. If the system is well designed, though, no one else will, and it'll be easy to find the person to blame if fraudulent transactions are made.

I've never worked on PCI compliant systems myself, but I know many developers who have, and they say that the sysadmins take solid measures to ensure that no one, not even the developers, gets any data from a database that handles credit card information. Any data pulled from those servers is first sanitized to ensure that credit card numbers and other personally identifying information is removed. Credit card numbers are replaced with a "sample" number that can be used for validation purposes. Names and other information are replaced with sanitized data that has the same "shape" (e.g. number of characters and identical punctuation) as the original.

The purpose of these regulations is to ensure that there's always a clear chain of custody over your credit card numbers. Preventing unauthorized access is only one part of maintaining that chain of custody.

Have you ever had to support a CRUD application?

Unless explicitly authorized by the customer, or for the purpose of providing the service, your staff should not be allowed to look at customer data, and what data they look at should be limited to what's necessary to perform their function.

If you do want the right to spelunk through customer data, you need to declare that in the privacy policy. If you declare otherwise, you're breaching the contract with the customer.

The problem is that incidents and attitudes like this make the market lose trust with the cloud services industry, which is poison to everyone.

I agree; however, it's somewhat disturbing how often I have to view customer data in my current job. The bigger companies with good processes in place probably don't have people do it much, but some companies with older applications that have seen better days (like the one I work for) end up having people make a lot of manual database updates, and also end up giving developers access to the production DB in case of emergencies.

I'm not sure I understand what you're saying here.

The only contract with the customer is the privacy policy. The privacy policy is just a promise from a site to abide by certain rules. In the case that there is not a privacy policy, then whatever you tell that site can and will be used against you. From tracking cookies to the most sensitive of files, if you are providing information to a site then you have to assume that it will be used in any way the company sees fit unless they promise otherwise.

Ethically there may be different obligations, but to say that there is some implicit "contract with the customer" is simply not the case.

Since 1890 in the United States, tort law has had concepts of invasion of privacy and breach of trust. Further, on a state-by-state level there may be laws, such as California's Online Privacy Protection Act of 2003, which requires a privacy policy to be published. Canada and the EU have even more protective laws if you trade there.

I feel it is safer and more realistic to presume the first paragraph I wrote is the case and cover yourself with a privacy policy if you want to do otherwise as I mentioned.

As always, ask a lawyer if you want professional advice.

I came here to post exactly that. In most serious cloud teams you have very few select people with authorization to look at customer data (the operations team), and everybody else is outside of that group. When debugging the service, you have to pass instructions to that team so that in case confidential data is revealed, only they get to see it.

If a file storage company that claims to be protecting users' data isn't storing it in an encrypted manner that requires people to jump through all manner of technical and procedural hoops to get access to it, then they are failing quite badly.

Whatever encryption is there (and we don't know what they are doing in this respect), their staff who manage the systems still have access to look up the file name of the Xth file, or if they like to go snooping through all files.

This is true. The problem here is that they went looking through private user data when they didn't need to. If they were only doing it when essential, e.g. to debug a problem, people wouldn't be complaining. It's the fact that they did it without there being an urgent need that has bothered people, I think. What other trivial reasons have they used to look through people's data?

If (in the course of essential sysadmin duties) I saw a file named "Basecamp Competitor Business Plan.pdf", or in fact any name, I wouldn't look at it because that would be wrong, no temptation.

Unfortunately, when around 1 in 3 sysadmins spy on their own colleagues (depending on whose report you read), it's apparent that not everyone has your moral fibre and something more than simply "trust the individual to do the job professionally" is called for.

Same here. There are countless sysadmins who would look though. Especially if they had a financial interest. There are even more who wouldn't look, but would just mention, "Guess what our 100 millionth file was named" to a Director, who also happened to have access.

All the data should be encrypted such that only the user's password can decrypt it. That way you eliminate the problem of someone peeking into your files.

How would a 'forgot password' function work in this case? If you're using something like GPG to encrypt and the password is basically the passphrase for the key, a customer forgetting his/her password would become an irreversible event, and he/she would end up losing all data.

>>How would a 'forgot password' function work in this case?

You are screwed. Simple as that.
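A sketch of why you're screwed, assuming the key is derived from the passphrase with PBKDF2 (the parameters and names here are illustrative): the server stores only the salt, so a lost passphrase means the key, and hence the data, is unrecoverable.

```python
import hashlib, secrets

def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Derive an encryption key from the user's passphrase."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, 200_000)

salt = secrets.token_bytes(16)   # the server may store this; it's not secret
key = derive_key("correct horse battery staple", salt)

# The server never sees the passphrase or the key. A "password reset" can
# only set a *new* passphrase -- it cannot re-derive the old key:
new_key = derive_key("my new password", salt)
assert new_key != key            # anything encrypted under `key` stays locked
```

This is exactly the trade-off being discussed: the same property that keeps staff out of your files keeps you out of them too if you forget the passphrase.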

The filename could contain enough information to be a big breach of privacy by itself. Think "microsoft bankruptcy proposal.doc"; "google downsizing plans 2012/2.xls"; "ipad 3 presentation draft.ppt".

Temptation can be resisted, but in cases like these you are in trouble just for glancing.

Very true. What if it was a screenshot of a secret CATalog? Or a scan of some secret Company Anonymous Transfer or whatever. So I hope they never took a look at the "cat.jpg" file because you just cannot tell what's inside.

Another interesting case would be "Plan to kill the President.ppt".

It's funny you mention that because I actually do have such a file! I also went ahead and used Basecamp too to get a sense of what could be improved or simplified. But of course I didn't even consider uploading that file and this was long before this whole debacle started. We need to trust services like Basecamp, Google Docs and the others... A lot. But we also need to be smart about that trust. A healthy distrust is definitely in order in certain circumstances.

I think the issue is access control. Clearly they can and do look at their customers' personal data. That's not very funny, even if it happens to be a picture of a cat.


That's kind of the point of log files. As a contrived example, what if users suddenly couldn't upload files ending in .jpeg, while .jpg still worked? How would they diagnose the problem if they didn't store data about image filenames?

Obviously they filter passwords and other sensitive data, but I think they should rightly have access to whatever they judge necessary to do their job.

There will always be people who have the ability to access data they are not supposed to, but in the end it comes down to who you will trust with your data.

To me, the transparency and contributions of 37signals qualify them for that trust. With that, I trust them to make good decisions about who they hire and what they store in their log files.

I wonder if parent was merely advocating obfuscating sensitive data so that engineers don't accidentally see things like "Downsizing-2012.xls". As long as the obfuscation is reversible, the data is still there for those who need it.

Of course, encryption per se is overkill for that. Something like ROT13 would do the trick.

If you're going to obfuscate reversibly, it is much better practice to use strong obfuscation and log (irreversibly) any time the raw data is accessed so there is an audit trail.
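A minimal sketch of that "strong, reversible obfuscation" for filenames in logs, assuming a scrubbing secret held outside the log pipeline (the keystream-XOR here is purely illustrative; a real system would use an authenticated cipher and, as noted above, log every de-obfuscation):

```python
import base64, hashlib

SECRET = b"log-scrubbing-key"   # hypothetical; kept away from log readers

def obfuscate(filename: str) -> str:
    """XOR the name with a SHAKE-derived keystream, then base64 it."""
    data = filename.encode()
    stream = hashlib.shake_256(SECRET).digest(len(data))
    return base64.urlsafe_b64encode(
        bytes(a ^ b for a, b in zip(data, stream))).decode()

def deobfuscate(token: str) -> str:
    """Reverse the transform; callers should be audited (not shown)."""
    data = base64.urlsafe_b64decode(token)
    stream = hashlib.shake_256(SECRET).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream)).decode()

token = obfuscate("Downsizing-2012.xls")   # nothing readable hits the log
assert deobfuscate(token) == "Downsizing-2012.xls"
```

The point isn't cryptographic strength; it's that an engineer skimming logs sees opaque tokens, and recovering a name becomes a deliberate, loggable act rather than an accident.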

I would be happy if just the filename (not the extension) is at least obfuscated.

I can trust someone and still not be comfortable if they accidentally see that I uploaded "how to file a divorce.pdf", for example.

And now the problem is that files with "," or spaces in the name fail... The "reversible scrambling" proposal above might work (though not ROT13); that way the data is there _if needed_, but it takes conscious effort to take a look.

If, in the process of debugging the "," issue, a set of files is uncovered (including "how to file a divorce"), tough luck. Ideally, the usernames could be unscrambled separately, so at least there's no immediate connection to a single user.

A file-based POST request always includes the name of the original file as it was uploaded from your computer; you could just as easily blame IE or Firefox as you could a webserver log.

I doubt most web apps encrypt file names before they're written to logs.

I'd think the number of apps doing this is much smaller than those that don't, and even then only in cases where file names are replaced with hashes or GUIDs for directory reasons, not for the sake of information security.

Honestly, people are delusional if they don't think this happens everywhere.

I've seen tens of thousands of pieces of private data across all the companies I've contracted for. Data guys need to explore, they need to learn what types of customers use what type of features and why.

Heck, I talked to a guy online (I didn't know his real identity, or I would call him out personally) who wrote a script that automatically checked his employer's database against outstanding warrants in the US (fuzzy-matching first name, last name, city, age) and pulled in 2 to 3 times his salary just from the rewards. That is how bad some people are.

What you can trust is that a company almost certainly won't intentionally leak your data to the public, but rest assured that they do flip through it. Some awesome companies will obfuscate the email addresses or company names so that it is much harder to back calculate who owns what, but honestly unless a company is promising full encryption on their side I would just assume they can see everything.

If you want real privacy use encryption (or some other zero trust protocol) it really isn't that hard to use.

I don't think it has to be this way. We often run internal reports on usage of certain features, but it's always aggregated, and never looks at the individual data. I feel bad enough looking at a customer's account when they've specifically asked me to do so from a support request.

I would certainly terminate any account with a company that willfully was reading my private data and opening files for the mere sport of it.

There's this small, harmless incident from 37signals and then there is this attitude. Some questions:

* Should it happen everywhere?

* As a data guy, do you have professional obligations to uphold the privacy policy and operate within the law?

* What are the mechanisms available to the market and the industry to prevent the deterioration of customers' trust with us?

> Should it happen everywhere?

Yes. The market has spoken. People find terms of use acceptable, which includes looking at personal data. The alternative is to restrict your data team to the point where conversions would be half or a third of what they are. Are most people willing to pay triple just to remove the off chance that some random data guy comes across their info? Probably not.

> As a data guy, do you have professional obligations to uphold the privacy policy and operate within the law?

Any person, employee or not, professional or not, has obligations to uphold just laws; certainly including measures of privacy.

Here is a typical privacy policy:

"We use personal information in the file we maintain about you, and other information we obtain from your current and past activities on the Site, to provide to you the services offered by the Site; resolve service and billing disputes; troubleshoot problems; bill any amounts due from you; measure consumer interest in our products and services, inform you about online and offline offers, products, services, events and updates; deliver information to you that, in some cases, is relevant to your interests, such as product news; customize your experience; detect and protect us against error, fraud and other criminal activity; enforce our Terms of Use; provide you with system or administrative messages, and as otherwise described to you at the time of collection. On occasion we use email address or other contact information to contact our Users to ask them for their input on our services, and to forward to them media opportunities.

We may also use personal information about you to improve our marketing and promotional efforts, to analyze Site usage, to improve our content and product offerings, and to customize the Site's content, layout, and services. These uses improve the Site and better tailor it to meet your needs, so as to provide you with a smooth, efficient, safe and customized experience while using the Site."

That bottom paragraph is fully communicating the nature of the relationship. Outside any law that would render the above unlawful, it is well within the law for an employee to run "SELECT * FROM users WHERE created_at > '2010-02-01'" or "SELECT * FROM todos JOIN users ON todos.user_id = users.id WHERE users.profession = 'developer'". There are perfectly valid reasons to do these types of things: anti-fraud measures, site optimization, etc.

> What are the mechanisms available to the market and the industry to prevent the deterioration of customers' trust with us?

This is a problem of mismatched expectations and priorities. It's a lot like politics. In an ideal world a politician would be able to say something like 'I think the American people acted irresponsibly financing homes and that is a good part of the reason for the financial crisis' because it is the truth and it would help people in the long run to hear it, as well as help any policy formation in response to it. But practically they can blame others and get away with it.

Unless the industry is willing to educate politicians, site users, etc., there is no reason to go out shouting that this happens. It's already in the terms of use and the privacy policy. Do you think people want to know what Air Miles does with their data?

The only mechanism besides general silence (as well as inclusion in the privacy policy and/or terms of use) would be full, 100% truth when asked. But why make an issue of it? The netizens don't really care. If they did there would be competition around this angle of the market.

dhh hides behind: "I don't think it has to be this way. We often run internal reports on usage of certain features, but it's always aggregated, and never looks at the individual data. I feel bad enough looking at a customer's account when they've specifically asked me to do so from a support request.

I would certainly terminate any account with a company that willfully was reading my private data and opening files for the mere sport of it."

That is carefully worded bullshit. Internal reports are not exploring. Internal reports are what you show at the monthly marketing or board meeting. CxOs get internal reports. Data guys test recommendation models. Data guys find the interesting patterns to include in custom reports.

Also, his last paragraph is ridiculous. Obviously we don't read data for sport. In fact, it is boring. You go through data for trends.

> checked his employer's database against outstanding warrants in the US ... That is how bad some people are.

Helping to serve justice is now a bad thing?

Those ends are fine, the problem are the means.

What's wrong with "the means"?

Breach of privacy/data protection laws in some countries.

Is it possible to find "wanted" people without breaching privacy laws?

Yes, it's OK for the police to do that. I doubt Joe Soap is allowed to treat employees' personal details in such a wanton manner.

Sure, that's what warrants are for.

In order to get a warrant, the police need some reason to believe that the company's database contains "wanted" people.

Obviously the police do not have such a reason and therefore cannot get a warrant.

I guess I'm biased because my business makes me deal with fraud on a daily basis.

I'm not really worried that 37signals are maliciously going through customer data, because I honestly believe they aren't.

However, I'm disgusted by the number of people in this thread that justify the violation of customer privacy because it's what's normal.

As an industry, we all face in our sales cycle the fear from customers that we will violate their privacy. Self-regulation by holding each other to account is the cheapest and best way to address the issue.

While I would be stupid to believe software vendors don't look at my data because I know better, that isn't my expectation.

It's not my expectation that my lawyer, my accountant, my doctor, my therapist, my social worker, or my librarian trade on or reveal or delve through my private information. That's why they as professionals are licensed and self-regulated by their professional colleges.

As information professionals, we should act professionally with information as well. This is not crazy talk. We also see credit card numbers and personal information stolen every month. Last year over 100 million credit cards had to be reissued due to data theft. That's why the card industry created PCI compliance to self-regulate the industry, as imperfect as it may be.

No, as information professionals, we should be building tools that enable users to store and manage their data privately, without asking them to trust some anonymous system administrator. Allowing them to become complacent and implicitly accepting of remotely-hosted services does users and society as a whole a great disservice.

While the tone of the post is fantastic, I can't quite believe that anyone would be as offended as they suggest. I would like to believe that people can apply common sense to this situation and realise that they disclosed 'cat.jpg' exactly because the name was entirely inoffensive and anonymous.

You miss the point. The point is with regards to privacy. If I'm paying them for their service, and I upload files, I can limit who sees them. If that can be circumvented, this is disconcerting. What if I had a file named "How to beat 37Signals.docx"? Or "Next iPad Specs - Official.pages" uploaded? And then someone reviewing the logs happens to see that. And they get curious.

The idea isn't that cat.jpg is bad. It's that over at 37Signals, someone was browsing the logs, reviewing the file uploads, and did see "2011 Financing Report for X Public Company - Unreleased" or something akin to that.

I understand your point of view. But the people offended by this are in the right. It's not what happened, but that it happened, and what it shows.

The idea isn't that cat.jpg is bad. It's that over at 37Signals, someone was browsing the logs, reviewing the file uploads

Rather, they did "SELECT filename WHERE row_num = 100000000".

Honestly, if you're concerned about something like this then you should not be using a third party solution to store your files. Of course 37 Signals can look at the names of the files you are storing- they could probably hide that information from themselves, but then they'll get a support request saying "we can't open file-x.jpg" and they won't be able to do anything about it.

Rather, they did "SELECT filename WHERE row_num = 100000000".

They're the ones who have repeatedly described it as "looking at the logs". That struck me as weird -- to have a log that ordinally attributes every upload -- however that's how they describe it and is hence why others describe it so.

Honestly, if you're concerned about something like this then you should not be using a third party solution to store your files.

I engaged in the prior argument, and there too this was the common last line of defense.

It misses the point.

Everyone knows that SaaS vendors can access your data and files, so it is bizarre that this keeps getting mentioned like it was unknown. Yet critical businesses engage vendors to hold their most confidential files -- the sorts that auditors grill them over and various bureaucratic organizations monitor them on.

Because they know, or at least believe and hope, that the organizations they entrust with their data use discretion, and have standard policies and standards -- if not actual data security and auditing controls -- to ensure that data is only used on a need basis. For instance for support purposes.

Writing a blog post that flippantly mentions a customer's data sends the wrong message. While we all know it is possible, it gives the entirely wrong impression to customers. Data security is the #1 impediment to the adoption of SaaS.

SaaS depends upon the trust of customers, and DHH is approaching this in the right way. It is quite a contrast from the many laissez faire responses on here.

Like you say, everyone knows they can access your files. It's naive to assume that they won't. Of course you wouldn't expect them to be doing this on a large, detailed scale but I think we all assume they occasionally see someone's file. The laissez faire responses wouldn't be the demise of SaaS, we're just being realistic about things. Trust is absolutely paramount when using these services but I think you're focusing on the wrong thing. Trusting that they won't see the files isn't the thing to trust. You trust that you have a better chance of being struck by lightning than of having an employee or attacker read and/or share the contents of that file.

I agree, Jason. While it's unavoidable that we at times will see things through log files in order to ensure the performance and uptime of the system, it certainly shouldn't be for purposes like these, and it double certainly shouldn't be in order to reveal anything publicly.

>What if I had a file named "How to beat 37Signals.docx"?

They wouldn't have released it, and you would be a moron for storing that data on _their_ servers.

> Or "Next iPad Specs - Official.pages"

You'd be a moron for storing that data on their servers.

The existence of cloud solutions doesn't preclude the use of self-controlled servers for truly critical data.

The reality is that the vast majority of business data is not interesting to anyone but the business itself and possibly its competitors. And the vast majority of those businesses and competitors are well outside the scope of 37signals. Thus your data is relatively safe. If you're that worried about the hosting provider being able to view your data, host it yourself. Simple...

I think if you upload your data to some cloud service, you should assume that the employees of the company can look at it. The error here was some employee leaking information to the public (the name of a file).

You've almost changed my opinion to agree that the people who are offended are right to be offended. But honestly, isn't it a little naive to think that the company holding your data won't have some access to it? I tend to believe that unless it's on a machine you own, someone else can and will look at it in some way. They may not look at the contents, but they most certainly will check out the file type, size, name, date created, etc.

Now if you upload "HowToBeat37Signals.docx" to Basecamp you should probably assume two things: there's the possibility, however remote, that someone unauthorized will see it (that possibility exists on every farmed-out service; no server is hacker-proof, despite GoDaddy's little badges), and if someone does see it and it gets leaked or used against you, you'll have a damn good chance of suing the bejesus out of them.

The word trust is the key word here. Whenever you use a service to store sensitive material there has to be some level of trust. I think it's a mistake to trust that absolutely no one, within the company or as a result of a security breach, will ever see what you've stored. What you do trust is that the odds of that happening are supremely low, and that if someone were to see your data (at least within the company), they won't use it against you or share it. History has shown us that no web service is 100% secure and reliable, so if you aren't comfortable with your odds then you shouldn't use the service. I for one assume everything I've ever put online is not secure. I'm comfortable with my odds, though, and bank on the fact that no one will take something written or created by a nobody like me very seriously or care at all.

If the app is written appropriately for this sort of thing (see Tarsnap), then no, the company has no access to any of your data.

Can you actually make that work for a collaborative app?

User A uploads file_a.txt and you want to encrypt it. What key do you use for that? It can't be attached to User A (e.g. their password or password hash) only otherwise User B won't be able to decrypt it. How would you set that up in a way that's still reasonable considering Basecamp use-case? (meaning: one of their goals is to make project collaboration simple)

I dunno, maybe generate a keypair to en/decrypt the content, then encrypt multiple copies of that keypair with per-user keys? A key-getting-key or something like that.

There's probably some huge issues there, but it's a start to answering the question.
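That key-wrapping idea can be sketched concretely. Everything below is my own illustration (the function names, the blob layout, and especially the toy XOR-keystream cipher, which stands in for a real AEAD like AES-GCM and is NOT secure): each file gets one random file key, and that key is wrapped separately for every authorized user. A symmetric file key is enough; no asymmetric keypair is strictly needed.

```python
import hashlib
import secrets

def toy_encrypt(key: bytes, data: bytes) -> bytes:
    # Placeholder cipher: XOR the data with a SHA-256 counter-mode
    # keystream. Illustration of the structure only -- in a real
    # system this would be an authenticated cipher such as AES-GCM.
    keystream = bytearray()
    counter = 0
    while len(keystream) < len(data):
        keystream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, keystream))

toy_decrypt = toy_encrypt  # XOR with the same keystream is its own inverse

def upload(plaintext: bytes, user_keys: dict) -> dict:
    """Envelope encryption: one random file key, wrapped per user."""
    file_key = secrets.token_bytes(32)
    return {
        "ciphertext": toy_encrypt(file_key, plaintext),
        "wrapped_keys": {user: toy_encrypt(k, file_key)
                         for user, k in user_keys.items()},
    }

def download(blob: dict, user: str, user_key: bytes) -> bytes:
    # Unwrap the file key with this user's key, then decrypt the file.
    file_key = toy_decrypt(user_key, blob["wrapped_keys"][user])
    return toy_decrypt(file_key, blob["ciphertext"])
```

Adding User C to the project would then only require wrapping the existing file key under C's key; the file's ciphertext never has to be re-encrypted. The hard part Tarsnap-style systems solve is keeping the per-user keys client-side, so the server stores only the wrapped blobs.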

> isn't it a little naive to think that the company holding your data won't have some access to it?

That's a good question. Hopefully my answer does it justice.

First, having access to something and accessing something are two completely different things. I'm not suggesting that they should not have access to something they need to do their job. However, that doesn't mean we can't expect them to minimize the risk.

Next, you argue that someone not authorized will see the document, however remote. You mention suing, and while it sounds great, it's a long, painful struggle that I imagine isn't a quick fix. More importantly, would you knowingly hand over private data to someone who has proven incapable of keeping your trust? This is the reason 37Signals is jumping on this so quickly and doing damage control (and I say that in a complimentary way). It's not just their paying customers they have to concern themselves with, but also all the people that use their various services in one form or another.

Finally, you mention trust. In this case, you suggest trusting the odds. I'd prefer to trust that the company isn't banking on odds, and is instead actively working to mitigate that risk. Odds are a funny thing. I'm not under the belief that they can provide 100% security and privacy, but that doesn't mean I need to blindly accept failure.

> I for one assume everything I've ever put online is not secure.

But I bet you still actively work to ensure everything is as secure as possible. You don't share the password to your bank. You won't hand over your credit card data; you make sure you are using SSL before making a purchase, use SSH, use different passwords. A variety of things to mitigate the risk.

Honestly, I think part of the reason people are defending 37signals is that for many of us (myself included), we never really think of these things, and we see how easy it would be for us to make the same mistake. Instead, we should be focused on the fact that even a company like 37signals can make mistakes.

They can also admit to them, apologize, and work to correct the problem. We should learn from this, and try to improve.

You make a good case, but I'm still torn. You may be partly right about why we're defending them, but what's foremost in my mind when I defend them is how innocent what they did was. All they did was look at the file name of their 100 millionth upload. They were proud, wanted to brag about it, and I really empathize, as they meant no harm. I trust that they only did this a single time, to mark the special occasion, and mentioning the name of (or, in this case, assuming the contents of) the file only happened because it lent itself well to the joke they referenced; otherwise I have no doubt they would have just said they hit 100 million uploads and left it at that.

When I talk about odds we're on the same page in a way. We absolutely should expect them to minimize the risk, but let's not fool ourselves into believing that no one will ever take the opportunity to access one of our files. The best we can do is mitigate the risks and hope for the best. I don't feel that their pulling up the file name in this situation is a meaningful breach of trust. As programmers we like nice, neat, black-or-white answers, absolutes, but in this case you have to take the circumstances and the company's track record into account. 37signals has never shown itself to be untrustworthy, and I really think this is much ado about nothing. I'm having a hard time arguing your point because I agree with you for the most part. I just think that this one instance is very obviously a special circumstance and any casual observer would certainly let it slide without a single red flag being raised.

I usually don't go down this road, but I've yet to figure out what person or group made this an issue. Did 37signals bring this up on their own? I know there were a few comments questioning them when the original post came out, but it didn't seem like anyone was upset enough over it that a blog post was necessary. There are a lot of individuals who are just haters and take any opportunity to come out of the woodwork, point out any itty-bitty flaw they see, and make it into the end of the world. I hope that's not what started this. I also wonder if some competitor or "enemy", for lack of a better word, decided to make this an issue. Or maybe it was really just some of their users, in which case all I can say is: fair enough. I don't agree, but a company does serve at the pleasure of its customers to a large degree.

I can respect your opinion, despite disagreeing with it. =)

> any casual observer would certainly let it slide without a single red flag being raised.

Casual observer, sure. But, I imagine it was more than just a casual observer making a fuss, as I suggest below.

> I usually don't go down this road but I've yet to figure out what person or group made this an issue?

From what I know of 37Signals, they aren't the type to bow to the pressures of haters. I imagine there was some real concern here brought forth by people not in the public eye.

Anyways, thanks for the good discussion.

This is why our policy at Fog Creek is to explicitly get permission from users before accessing their data. It's enforced by the sys admins (whom we screen more extensively during the hiring process), who give temporary access to the person who needs it once the user has given their permission. When we're done, the sys admins remove access to that account again.

It's a pretty painless process (we have snippets to ask permission from the user and shortcuts to request access from the sys admins) and it helps prevent both willful and accidental leakage or modification of our users' data.

I have to interject here, because the hypothetical example in this thread is what would happen if somebody kept plans to compete with Basecamp in Basecamp itself.

I actually have a bug open in FogBugz called "Build a better FogBugz" where we discuss in some length its shortcomings for our workflow and how to fix them. It's not exactly an active project, but it will probably be when I get sufficiently fed up, and I would have no problem organizing its development in FogBugz.

The point is, if you can't trust the people who are writing your tools, why are you using their tools in the first place?

Is there anything stopping the sys admin from doing the snooping?

Ultimately, someone needs to have the keys, and they do, which is why they go through additional screening. Since there's a whole team, they would either have to collude or cover their tracks very well. Considering a major part of their job is keeping our data and our users' data safe, a breach like that would not be taken lightly.

So ultimately, it may be possible for one to snoop on our users, but it's much easier to trust (and keep tabs on) a small, well screened team than the entire company.

Speaking personally, I can say that I would (and do) absolutely trust my own data to our sys admins.

What additional screening do you put your system administrators through? Are we talking security clearances and background checks? Definitely curious to hear the specifics.

I've not been directly involved in the hiring of a sys admin, since I'm a dev, but I do know that we at least do background checks.

Can you ask? I'm curious to learn what these background checks entail. Also, if they actually do prevent bad apples from joining, why not do them for programmers as well (given that they have the power to program backdoors etc. into systems)?

I'd introduce some more checks and balances if you can. Even Google got burned by a rogue engineer snooping on people, a lengthy interview isn't enough.

It doesn't really matter if you put someone through extensive interviews and then hire them, or if you use a third party who can't afford to lose its reputation, it all comes back to trust, which is why David's post is so on-point.

Quis custodiet ipsos custodes? (Who watches the watchmen?)

Do they have to go through this same process when looking at certain logging data, such as, web server GET requests? Those would show file names.
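That logging question has a partial technical answer: filenames can be scrubbed from access logs before they're stored for routine review, so the permission process is only needed for the raw data. A rough sketch, assuming a common-log-format line; the regex, function names, and hashing scheme are my own invention, not anything Fog Creek has described:

```python
import hashlib
import re

# Matches the quoted request in a common-log-format line, e.g.
# 1.2.3.4 - - [16/Jan/2012] "GET /files/cat.jpg HTTP/1.1" 200 1234
REQUEST_RE = re.compile(r'"(GET|POST|PUT|DELETE) (\S+) (HTTP/[\d.]+)"')

def scrub(line: str) -> str:
    """Replace the last path segment (typically the filename) with a
    short hash, so routine log review doesn't reveal what users named
    their files, while still letting admins correlate repeat requests."""
    def repl(match):
        method, path, proto = match.groups()
        prefix, _, name = path.rpartition("/")
        digest = hashlib.sha256(name.encode()).hexdigest()[:12]
        return f'"{method} {prefix}/{digest} {proto}"'
    return REQUEST_RE.sub(repl, line)
```

The trade-off: hashed names still leak whether two requests hit the same file, and the unscrubbed stream has to be discarded (or locked down) for this to mean anything.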

37signals really doesn't want to be the bad guy. They're not. And this whole thing is ridiculous. If you were to evaluate 37signals on a 0-to-9 scale, based on how "evil" they are, you might give them a 0 or a 1. What if the scale went the other way as well? There isn't just evil, there's apologist. And it too can lay the groundwork for unfruitful results.

37signals' target demo is smart, well-to-do, logical. They shouldn't have to apologize. As their logical demo, we should know better. We know that if the filename was MyBossIsAnAsshole.docx or even MyWeddingPhoto.jpg that 37signals wouldn't have had to think for a second on the appropriate thing to do. As logical thinkers, we know why cat.jpg is funny as it pertains to our demographic. We know that MyWeddingPhoto.jpg wouldn't be funny.

The whole burn 'em at the stake routine is asinine.

Well done. I wish more companies would own up to mistakes instead of weaseling out of them.

To DHH, if you are reading these comments: since 37signals is such an industry leader, why not take this opportunity to release a "trust manifesto" that other SaaS companies can learn from, instead of just updating your privacy policy? Present it as a few straightforward bullet points instead of paragraphs of legalese.

I'd be very happy if, once we come up with a good privacy policy, it could help others revisit theirs. Most policies, including ours, talk about shit people generally stopped caring about 5 years ago (what do you do with my COOKIES?!?!).

These days people care much more about the privacy of the data that they actively share through uploads etc and much less about the tracking. At least on apps like ours.

The real lesson here is that if you really want your data to be private, you have to take responsibility for encrypting it or not uploading it anywhere.

Even in the best-case scenario, at least some employees can access data as part of their jobs. This has been true of every job I've ever worked at.

This post and its comments have made me reconsider the word "trust" entirely.

"Trust" is an emotion-laden, rhetorical word used by a someone who wants you to do something. "Just trust us."

Trust is not fragile, trust is an illusion.

Replace the verb "trust" with the word "assume" or "take a calculated risk" and you're closer to reality.

Instead of "trust us," how about, "Look at our record. Note that we have had not a single incident of data disclosure in 6 years. Decide for yourself if it is likely that we'll have one now, with your data."

Instead of "trust us," how about, "Think about our business: imagine the consequences if we were found to have looked at our customers' data, and see if that disincentive allays your concerns sufficiently."

Instead of "trust us," how about, "Here are the ways we are protecting your data. Consider whether they meet your requirements or not."

I'm voting "trust" off the island.

I think this post highlights a missing component of the IT ecosystem -- a professional code of ethics.

Many companies, especially large ones with lots of lawyers, have developed policies and procedures relating to what's acceptable and what's not. But most smaller companies and startups don't seem to have time to formulate these policies.

A professional code of ethics, specially with regard to privacy and user data, would be very useful.

Right now, most developers operate on a "do unto others" philosophy. While this may be good intentioned and work well a lot of the time, it's highly subjective -- as evidenced by the comments on this thread.

When I was in college and working for a bootstrapped startup, I was handed an unprotected thumb drive with an Excel spreadsheet containing all the credit card numbers, expiration dates, addresses, social security numbers, etc. of all the clients (thousands) and told to take it home for when I was working out of the office. I was too ignorant to realize how terrible, and I'm sure illegal, this was. Of course I never abused it, but it definitely makes me wary of my data these days. I imagine this happens much more than people think. Don't worry, the company I speak of was local and has since failed, I think. It's prudent to cancel your cards once or twice a year and be careful whom you trust. Some companies value convenience over security and put way too much trust in their employees.

There is a strong case to be made that 37signals should have access to this data for debugging purposes or similar. And as others have suggested, customers trusting 37signals with data should expect this at some level, unless the customers are encrypting everything at their end first.

But should everyone in the company have that level of access, or should access be restricted to the minimum necessary? What I don't see in others' comments here (except tghw's [1]) is any recognition of that. It's all very well saying you want to give your devs access, and that you can be trusted, but over time and as your company grows you're exposing yourself to the risk of a rogue operator. And it only takes one person doing something bad to severely damage the trust your customers hold in you.

It's a balance, to be sure, but I'm inclined to think a blanket "we trust our devs, so they have the access they need" could expose you to a large risk you don't need to take.

[1] http://news.ycombinator.org/item?id=3471338

FWIW, there's an Italian startup called Iubenda trying to do something around privacy policies, so that it's easy to have a good one:


Reminds me of something my ex-colleague came up with in a discussion over lunch: trust is a complex variable. It has some real part/value that both parties involved can be secure about, and an imaginary part (or two?) where both parties have a guess about what else they will/can trust the other about. In this case, I think the filenames would count as the imaginary part. Not to imply it's not private, but I'd be surprised if it had been in the terms of service.

HN Discussion of the incident a few days ago: http://news.ycombinator.com/item?id=3456819

Seems reasonable, but can adults stop boasting that they're behaving like adults? Unless 37 signals is being run by children, in which case: good job kids.

At any large organization with millions of customers and public opinion affecting the stock, even mentioning that random people on the dev team have access to customer data can be pretty career-altering. The fact that you're even looking at confidential information is generally highly frowned upon.

What if Mint.com celebrated their one billionth processed transaction by posting what it was? Wouldn't that cause outrage?

I routinely look into my customers' data and have never really had second thoughts about it.

I've even automated the process of looking into customers' data. The main goal is to catch spam and scams and delete such accounts.

Maybe it's specific to my business (a job board), but aren't spam and scams a risk in any business, to at least a certain extent?

You can do whatever you want as long as it's explicitly stated in the EULA, hopefully that's the case in your situation.

I'm impressed that this mea culpa ended with a reasonably tasteful plug for the next 37s product release.

I'm not too troubled by employees reading file names in the logs, but for some reason it bugs me that the apology post included a promotion (a link) for "Basecamp Next." In this context it didn't seem necessary.

I was one of the people who was vocal about this being a serious gaffe when the post went up.

This is the absolutely the best response conceivable. Bravo!

People really got that upset over them knowing the name of the 100 millionth file? They're not going to look at the log anymore? I don't know how they operate on the server side of things but if no one is going to look at logs anymore then why have the log at all? I look up to 37Signals a lot but I think they were a little too apologetic this time. Why not apologize but explain that the log files tell you basically nothing about the contents of your files? How does anyone not make the connection between the cat joke and the file named cat.jpg? I mean, they even spelled it out in the original post! I'm not trying to be critical, I'm just kind of left wondering how something like this offended a single person. Weird.

Keep in mind that the original post said "it was the picture of a cat". It implied they looked at the content of the file.

No they didn't. They talked about how sharing pictures of cats was a running internet joke. All they said about that particular file was that it was named cat.jpg.

From http://37signals.com/svn/posts/3076-i-heard-you-like-numbers:

And a Basecamp user uploaded the 100,000,000th file (It was a picture of a cat!)

In the comments, they clarify that it was called cat.jpg, and that's how they knew it was a picture of a cat.

Yeah, I remember that. But they also mentioned the file name, so it wasn't really a secret as to how they assumed what was in it. I very rarely go into conspiracy land, but now I'm wondering where the uproar came from. Does anyone know who started this backlash? Was it a group or an individual? I ask because for a fleeting moment I wondered if a competitor took their original post and used it as an opportunity to knock 37signals down a couple of notches. I may be totally off in the deep end with that thought, however.

> But they also mentioned the file name too

They only mentioned that it was from the filename in the comments. So initially, it wasn't clear that's how they did it.

Sometimes I feel like 37signals should change the name of their blog to "Much Ado About Nothing"

