Hacker News new | past | comments | ask | show | jobs | submit login
I think tumblr has a huge security hole (pastebin.com)
165 points by adrinavarro on March 19, 2011 | hide | past | favorite | 198 comments

In the abstract, i understand, why anyone would post this. I get it, and maybe i'm getting older or something, but part of me equates posting this sort of thing with pointing at an acquaintance on the street with his fly down and having a laugh with your friends rather than telling him and having a chuckle after he's zipped up.

But the whole self-righteous and incredibly passive-aggressive "man, i think these guys have a huge problem..." followed by the "man, these guys need to shape up" and "n00b mistake!" are unproductive to the extreme.

I mean, find a bug, report it, move on. Maybe i've got some unrealistic notion of karma or general human benevolence or something, but it seems hard to believe that this is such a difficult path to take especially when nearly everyone commenting has to deal with bs of this sort day-in/out.

If someones fly is unzipped, I'd point it out because that's the sort of accident that can happen to even the most competent and discerning.

If someones pants are sagged around their knees, I expect them to have noticed this themselves, and by walking around in public they've accepted the possibility of ridicule.

I consider having a non-beta site to be 'walking around in public', and revealing code and internal data (however small) when there is a malformed request to be 'walking around in public with your pants sagged at your knees'.


That said, rappers don't have to buy belts and maybe the culture of the 'new economy' is such that some people do not have to value security.

You wouldn't take a photo and project it on a wall. You'd whisper into an ear.

This isn't a bug, it's a failure of basic security principles. Imagine if the super to your apartment complex accidentally mailed a box full of duplicate keys to a local methadone clinic. That's not an embarrassing mistake, it's a catastrophic error bordering on criminal negligence. Drawing attention to it is meant to not only deepen the embarrassment and thus encourage fixing the underlying problem so that it never happens again but also to let other people know (such as tumblr users) about the amateurishness of the tumblr operation and finally to encourage other developers out there to avoid making the same mistakes.

You lost me at "bordering on criminal negligence". They gave away a lot of details about how their internal systems are structured, but surprisingly little as far as actual usable data. Passwords can be changed, API keys can be disabled and regenerated, local IP addresses can be switched up. No user data was revealed. How is this even close to criminal, let alone catastrophic? This is pants-down embarrassing, at worst.

I meant in reference to the analogous hypothetical.

Tumblr isn't guilty of criminal negligence, but they are guilty of a very serious failing of basic security precautions. Luckily there are other layers of security at play preventing this from being a catastrophic disaster for tumblr. However, if a group of thieves break into your bank and drill into your vault you do not go home and rest easy because they only managed to drill through two feet of your vault's hardened steel and there was an entire 3 or 4 inches more. Less so if you'd done something dumb like leave the keys to the vault in a coffeeshop.

I think you fail to give them credit for what they're attempting. Security is the focus of many readers of HN, but Tumblr's focus is the user experience.

This isn't to say security isn't important, but they're rushing to make Tumblr as fun to use as possible so they can survive.

Yes, they received money. They also have monstrous growth. Now they can afford to expand the engineering processes beyond, "get it working" to "make it work really well and securely".

Good things are still to come from Tumblr so let's go easy on them when they use duct tape instead an arc welder.

Security isn't something you just bolt on after the fact, it's part of the design, and involves so much more than just code.

If they failed to take security into account in the early stages, never mind implement it at the beginning of development, then odds are they won't be implementing it effectively any time soon, especially with the rate at which they'll be expected to keep growing and adding functionality.

This kind of issue that they're showing now could (and probably should) have been detected and handled early on, even with a simple third-party code review.

And the fact that they are as big as they are, and growing as quickly as they are, means that they should have an increased sense of responsibility when it comes to security and protecting their users.

The existence of one bug doesn't imply complete disaster everywhere. It should be treated as an anecdote. Good science demands it.

Good science would also suggest Tumblr should get some experts to help them discover anything else that might be lingering, which they're planning to do. Much like a peer review process.

Your attitude is important for those in the security industry as it pushes things forward, but remember that not everyone has the time to spend on it that you might. It can either be an asset or the bane of your existence. As an asset, you get paid for the things you understand because others don't. As the bane of your existence, you fight society for not knowing what you know.

Tumblr is hiring. Maybe you should apply and help them fix it?

To be clear, I didn't suggest a "complete disaster everywhere". That being said, you can tell a lot about the state of the nation by something rather simple and isolated as what they've experienced here. There are some rather simple best practices that probably should be employed that apparently are not, and even a cursory review probably would detected it.

I was, more than anything, responding to the parent's post regarding "they can just add security later" idea.

It's true that I tend to work on projects where security is a huge deal (online banking, government, global video game services including in-game payments, etc). As the architect of these systems, a key part of the design is security, and while other projects don't have to be quite as diligent, that doesn't mean they should just ignore it altogether.

I'd also like to hope that my attitude is not just for those of us in the security industry, but for everyone making web-based applications.

Personally, I think any online service does their current or potential clients a disservice if they don't take security into account early on.

As soon as you take money from someone, I consider that to be a responsibility that has been accepted to not only provide the functionality you offer, but to do it in an appropriately secure manner.

It's the classic techie vs. sales guy argument; we don't want it perfect, we want it on Wednesday.

The problem is that if even simple and effective security is overlooked or not dealt with early on, you'll almost always be forced to accept a compromise rather than take the required time to implement it properly.

As to the job, I'm already quite busy, thanks. Between implementing Oracle clusters and my new startup that's just closing on our financing, I've got my plate full.

Besides, I hate PHP. ;)

I think this touches on PHP's real security issue.

It's easy to get going quickly but becomes an issue fast. Smart people tend to move away from PHP, but startups find this difficult.

Hiring is then an issue and the bad practices persist.

With all due respect, I'm not sure I buy that "hiring is an issue" argument.

You don't need someone full-time to help develop best practices, help design or architect code/systems, or to do code reviews.

Any startup that gets funding should, in my opinion, get a short-term consultant to come in, take a look around and offer suggestions and advice. Even if it's just for a day. These are resources that have been there, done that, and wouldn't be interested in a full-time gig with the company to begin with. Even if the company could afford them.

Hopefully this isn't too far off topic, but it's something that I see missing from a lot of clients that I get called into. (Not saying these guys haven't done this, either.)

I think there's a lot of value for a startup to validate their work with outside help, especially if they're relatively new to the game. Even if it's just pointing them to some articles or reading for them to follow up on, or just mentioning ideas of things they should look into; it can prove to be huge.

For instance, I'm currently mentoring a few developers/teams on a part-time, couple hours a week basis. Some of it is just being available on MSN to answer a quick question every now and then, other times it's doing a couple code reviews. Other times it's grabbing lunch/beer with them to discuss the concepts of things like how to implement continuous integrated testing, or managing other processes, or discussing new tech that they've heard of. For the most part, they're not paid gigs either. I enjoy helping people do stuff well, and the little bit of good faith help usually leads to some good future work.

Sometimes all it takes is asking the right question to get them to think about things in a different manner.

The last gig I got was for a major video game company. The job involved a .NET stack, which I knew nothing about and had never worked with, not even a little bit. I got the job despite that fact because in the interview I asked them general (admiteddly leading) questions, such as "how are you handling _____, or what is your plan for _____". Even though all of my experience had been in non-MS tech stacks, the technical director that was interviewing me was furiously taking notes. It was clear that they hadn't planned for some of the things I was mentioning, never mind thought of them. But once they heard the question, it was painfully obvious they should have. As a result, a 3-day consulting engagement turned into full-time for two years.

In my case right now, I've sought a mentor in some new tech I'm working with in my current startup. It's allowed me to hit the ground running, and take advantage of their experience. A half day one-on-one made all the difference for me, and will drastically improve the quality of what I'm doing.

I don't care what they're attempting, if they're hosting user content and storing user information then security needs to be a focus. Gizmodo certainly wasn't attempting security and that turned out quite well for all they're users didn't it? When you provide a service, no matter how trivial, there are basic commitments you sign up for, tumblr hasn't quite failed those commitments yet but they've come unnervingly close.

You are so right. this isnt about some kiddo being a smartass by not even hiding some of the keys. It's heroes like Him who save us from catastrophic errors bordering on criminal negligence. Guys like him will descent into the Fukishima reactors to save humanity. Songs will be written to remember his life. Personally, i farted.

I agree with you, but the problem is we will never know about the times when others may have found a bug, reported it and moved on. In this kind of situation, it's only the negative stories which will get out.

Just in case… I'm really really sorry in case I shouldn't have done this. I think I already said that, but when I saw this I wasn't as lucid as I am right now and just thought about dropping it here as nobody in my twitter TL would have done anything.

Yet I have worked in a lot of different environments with PHP over the time and this never happened to me (but I was close to). It's a big, big mistake, not just a tiny error.

TL;DR: Amateurish PHP developers at Tumblr fuck up; HN developers who don't know PHP that well make wildly incorrect assumptions about PHP.

People, I know it's en vogue to bash PHP (just wait, in a few years it'll be Ruby and Python - remember, PHP was once hyped, too, and now it's going in the other direction) - but if you criticize PHP, could you at least try to sound like you've actually developed in PHP for more than a week?

Because most of the negative comments here about PHP have absolutely nothing to do with PHP as such - the Tumblr error in question has to do with incompetent programmers. If you read The Daily WTF you'll know that incompetent programmers can screw up no matter what language they're using.

Amateurish? Incompetent? It seems a bit extreme to make those generalizations because someone made a typo in a PHP file that managed to hit production.

They made a typo. Then they didn't test on their local machine. Then they didn't test on a dev server. Then they didn't test on a staging server. Then they didn't test on a live server. At no point did they do any form of testing.

Judging by the mistake, I wouldn't be surprised if they edited the file live on the server.

A production config file is typically an exception to the rule. If you have a development team, you will probably not want each one of them to know what your production passwords are. If you have sysadmins, they will probably want none of the devs to know.

For what may be a thoroughly tested and properly deployed release, what happens when a sysadmin needs to update a password for a database? In many cases, without custom tools or a formal process, they'll just pop the config file open with vi and rsync the change out to their web cluster. I'd bet this is what happened in Tumblr's case. A sysadmin did it. Probably to avoid doing a full production redeployment for a simple config update. They will put resources into formalizing this now that they've been bitten.

In my opinion, of course, sys admins shouldn't be touching configurations that affect production code. You can have config files kept from the development team, but only allow access to the actual config file to a few select individuals if you need to. Sensitive data can be kept separated.

> For what may be a thoroughly tested and properly deployed release, what happens when a sysadmin needs to update a password for a database?

They coordinate the efforts with someone on the development team to deploy this. A sysadmin touching source code is as bad as a developer making changes to the networking side, especially if neither are talking back and forth.

This is how we work. Any changes made by networking are first vetted on by me, for example, for the systems I'm responsible for. I work with them to ensure that deployment is done at the proper time, and we handle any possible problems on our end. The networking team doesn't touch anything we work on, and vice-versa. Communication becomes key.

Coordinating between teams for any changes to a production config file is a hard sell, but yes, this is the only really solid way to make certain stupid mishaps don't happen. This is how we did it at my last company as well. In addition to using production branches.

Most companies pin down their processes in response to issues that pop up.

Yes, but of course this is all speculation: we don't know how this change managed to make its way to production. The fact that it happened is unfortunate, but that doesn't mean the entire engineering staff is incompetent.

Of course. =) But it's fun to speculate while I'm busy doing chores around the house. As for being incompetent, I didn't mean to imply that. Rather, my focus was on the fact that the mistake wasn't just a typo. There were a LOT of mistakes that were made.

Is it possible that someone accidentally pushed a file live?

Of course. =) I don't know what their deployment strategy is like. My point was, they made a series of mistakes, and it wasn't just a typo. Though, if they did push a file live, they would still have skipped numerous steps, like testing.

I haven't read the update, so I really don't know.

I'm not suggesting they are morons. Rather, they made a series of mistakes. Every programmer makes mistakes. The key is to have a strategy to catch mistakes. Something as simple as a staging server where something like this could have been caught, and having a deployment strategy where you cannot get past staging without going through deployment is a good idea.

Maybe I went a little overboard with those words. However, it's not so much the typo; it's the fact that the typo caused passwords to be visible.

Again though, PHP apps storing configuration info in PHP is a fairly common practice. If that's the way you store that information, any typo that takes your configuration file out of "PHP mode" is going to leak data. Before today, I'm sure a lot of PHP coders didn't consider that when dealing with configuration files for applications: they chose to design that way because it's simple to deal with.

> Again though, PHP apps storing configuration info in PHP is a fairly common practice.

This isn't a problem with PHP. It's a problem with ANY app storing configuration options inside the actual language. It would be like storing vhost information in C and forcing you to recompile all of apache to get a new site up.

Yes, config options are easy to add to interpreted languages. I see it happen all the time, and not just in PHP. Perl, Python, Ruby.

The problem is that to change the config options, you are essentially changing what amounts to the application, and if you break your config, you break your application.

The standard way of storing config options in PHP is with .ini files. At least, in my circle. Sure for quick scripts, we throw it in the actual code, but whenever we have to manage more than a few config options, it get's moved out.

I realize I'm expanding on what you said a bit. =) I just want to make it clear that this isn't a PHP problem. It's a basic programming problem.

The key isn't so much that the configuration language is the source code language, but that both are a template language. As far as the parser is concerned all three are written interchangeably in an HTML superset that's sent down the wire raw by default, no matter where they're located or what called them. PHP gains its low barrier to entry partly from the fact that HTML becomes valid PHP just by changing its extension, but that useful equivalence also cuts the other way.

Many other frameworks use the source code language for configuration without risking this particular error, because they keep templates separate. (In Django, for example, templates use a different language and are read out of template-specific directories.)

It's not even a problem unique to configuration files. A disclosure of source code would have occurred by replacing the first character of a PHP source file. The fact a configuration file was involved makes the effect both more severe and less likely to be caught ahead of time. More severe, because it discloses passwords, rather than just proprietary code. Less likely to be caught, because standard testing methods tend to rely on a staging or test config instead of the production one.

There are other posts disagreeing over various single villains, but a wider view is that Tumblr was bitten by a combination of a single-character typo, an otherwise useful PHP language design choice, the common and otherwise harmless practice of configuring via a programming language, and the blind spot typical test suites have for production-specific issues.

A very good point

Sure, but if they'd made just a tiny effort to test their code before releasing it they'd have caught that bug instantly. We're not talking about time-consuming tests here - we talking about loading the page in the browser once. If the PHP code is showing, you're doing it wrong.

Oh, totally. This should have been caught very quickly.

Any programming language's flaws can be made up for by sufficient skill and discipline on the part of the developers. That does not make those flaws any less real.

Mirrored to http://codepad.org/UP3FxRNv

Just in case.

edit: oh man. Check out what Google has on this[1]

edit: in readable form on github gist[2]

[1] - http://www.google.com/webhp?hl=en#sclient=psy&hl=en&...

[2] - https://gist.github.com/29c5c5970d1f3313abd1

And it looks like the hole has been there for at least a month:


We saw this from facebook few years ago. Now with tumblr. Is there something at core of php that makes this inevitable? I ask this as a concerned php dev(and not out of snark).

I think PHP has so many gotchas, so many ways to trip up even seasoned developers, that it's inevitable to miss something.

Putting passwords in the php file? That's not a seasoned developer in my mind.

I'm going to disagree with you. The problem with PHP is that there's no easy way of sharing state between requests.

If you have a separate (ini-style) configuration file, every time you get a new request, the file will have to be read in from disk and parsed. On heavily-loaded web servers this can be a significant performance issue.

Configuration stored in a PHP file will be cached by your opcode cache and so doesn't incur any per-request parsing/reading overhead.

The problem here is not that Tumblr stored their configuration in PHP. The problem is their lack of testing their changes.

(As I mentioned elsewhere in the thread, this particular nasty issue can be solved by simply enforcing that every .php file begins in '<?'.)

> If you have a separate (ini-style) configuration file, every time you get a new request...

it will just pull it from the cache, not needing to be parsed or read from disk.

Configuration stored in a PHP file is bad. Changing working code simply because you want to add a new slave is a horrible idea. It would be like having to recompile Apache from source every time you want to add a new vhost because all the entries are written in C.

Interpreted languages like PHP, Python, Perl, etc all make it easy to put config options directly into the language. When your app is small, it's fine. But as it gets larger, move it out.

But yes, that they didn't even catch this is in testing is the real problem.

> it will just pull it from the cache, not needing to be parsed or read from disk.

I don't believe PHP's ini_read caches anything. Sure, your OS cache is going to have that file in, but PHP still has to do the fopen(), fread(), parse, fclose() dance on every page hit. This does not happen with an opcode-cached source file.

I have actually benchmarked this. Amortized across billions of page hits per month it produces a notable saving.

Kudos points out that you could use APC - this is absolutely true but IMO it basically equates to doing the same thing in a more complex way.

> Configuration stored in a PHP file is bad. Changing working code simply because you want to add a new slave is a horrible idea. It would be like having to recompile Apache from source every time you want to add a new vhost because all the entries are written in C.

This is a false comparison between interpreted and compiled languages. Not to mention that plenty of C programs use #defines for configuration. Varnish even translates its configuration language into C and dynamically loads it as a module, which provides significant flexibility.

In a dynamic language, and where you trust the person doing the configuration, there's no reason why your configuration shouldn't be in a source file. Both Django and Rails do it this way.

"In a dynamic language, and where you trust the person doing the configuration, there's no reason why your configuration shouldn't be in a source file. Both Django and Rails do it this way."

Not true. In Rails, sensitive information like database passwords or AWS keys are not stored in the source, you store them in configuration files, YAML files by default. It's also widely recommended that these not be checked into version control.

My bad - it's been a while since I touched Rails and my memory was rusty.

> I don't believe PHP's ini_read caches anything.

I wasn't referring to straight PHP. There are multiple ways to cache the config file's contents so you don't have to hit the disk to read.

> This is a false comparison between interpreted and compiled languages. Not to mention that plenty of C programs use #defines for configuration.

Your misunderstanding me here. I'm not saying #defines are bad. However, they are a different level of configuration from vhosts. You aren't #defining your virtual hosts. Same with php. You have the C code with specific options being set, and then php.ini for the user facing stuff.

> In a dynamic language, and where you trust the person doing the configuration, there's no reason why your configuration shouldn't be in a source file.

Except we saw at least one reason today.

> I don't believe PHP's ini_read caches anything.

so check if the ini file key is in memcache, if so get it from there, if not, ini_read it and populate the cache. Not hard.

Sure there is, APC. But that aside, Linux won't read that file every time, it will end up living in the disk buffers that it maintains.

The disk buffers make the operation cheap but not free, i.e. keeping the information cached is the right thing to do.

If that is a big problem, pass secrets as environment variables. These are getting passed and read anyway.

A huge mistake, but Tumblr's staff are probably quite experienced, nonetheless.

How about that gotcha though? A single character is mistakenly changed from "<" to "i" and that exposes the source code to the browser. Think about that.

I wonder if the 'i' has anything to do with vim

Clearly they should ban vim from their development process! ;-)

No, joking aside, the normal way to handle this is to have the perimeter server (nginx or varnish) catch any 5* responses and turn them into a user-friendly error-page. That way you never expose sensitive stack traces to your users.

So, this is standard stuff and easy to fix. However who of us hasn't screwed up on a similarly trivial issue before? I wouldn't judge them too hard on this one, happens to the best of us.

Clearly they should make vim mandatory on every machine;-) I guess the 'i' comes from someone who is used to use vim but in this case using another editor.

That wouldn't have caught this though, this code wasn't even executed, just sent as plain text.

Nginx/varnish can also catch Content-type: text/plain (or text/php?) and treat that like a 5xx response. Yes, that's a bit of a hack, but unless tumblr needs to serve text-files (which I doubt) it's a legit safeguard.

I know you were kidding, removing all text editors from your production servers might be a good idea

This is my guess. Anyone who's used VIM for years knows that the first couple of times using a different editor, your text is littered with "i" and ":w".

It could simply come from forgetting to exit from insert mode, and then coming back thinking the editor is in normal mode.

That was my first thought as well, as a long time vim user I kind of cringed.

Well, a single character change + a developer not testing their code locally + not running high level unit tests + deploying untested code to production.

Tumblr has at least one full time sysadmin. I think it's more reasonable to suppose he made the change to a production config file, rather than a developer. A definite failing in their deployment process, but it really says nothing about the quality of the application code or developer testing.

It's needed to connect to databases with authentication. What would you suggest?

I believe the way to do it "correctly" would be to put the actual file containing the credentials in a non-public location and just do an include where you need to access it. At least that's how I do it. I could be wrong...

If one would made a similar typo in that file, its contents will be displayed too.

A solution would be to have a .ini-like (or some other simple-to-parse format) config file and PHP code to read its contents. PHP code could be leaked, but config file contents wouldn't.

Yeah seen a few client libraries that are doing that. Even putting them in a config only file lessens the chance of something like this happening as it is modified a lot less often.


I believe you are wrong. include()/require() would equally happily display any file's contents outside of `<?php ... ?>` scope (the case with "i?php"), within document root or not.

Edit: I've tested this:

    $ php -v
    PHP 5.2.6-3ubuntu4.6 with Suhosin-Patch (cli) (built: Sep 16 2010 19:51:25) 
    Copyright (c) 1997-2008 The PHP Group
    Zend Engine v2.2.0, Copyright (c) 1998-2008 Zend Technologies

    $ cat test.php
        require "/tmp/test2.php";
    $ cat /tmp/test2.php
        define("TEST", "test");
    $ GET http://localhost/test.php
        define("TEST", "test");

Off course, but in this case, passwords would only be exposed if the config file had a miss-typed opening PHP tag. If "test.php" had it, you wouldn't be able to see the contents of "test2.php".

Yes, you are right. And in this exact case they (mis-)edited the file, that contained passwords (i.e. test2.php in my improvised example).

Sure, I was replying for this hypothetical situation that you guys ware discussing, where they would store passwords in a different file outside of webroot ...

it's always included to the document root (the bootstrap file). it doesn't matter where you're including from, a broken open tag would cause errors like this one.

If you're doing an include, wouldn't the passwords still be in a PHP file?

if the php file is printed rather than executed, the include will not be followed. You'd see the "include /path/to/inaccessible/file" but you wouldn't see the passwords within the include.

I think. I haven't seriously used php since 2003 or so.

Yes, but included files still need a <?php tag at the top, or PHP just prints them out.

That wouldn't help in this case, since a typo in the config file is making the file not be parsed as PHP.

If you want to avoid that possibility, you can use a non-PHP format (eg: YAML) and parse it.

Client SSL certificates? I don't know what DB they're using, but at least Postgres and MySQL support certificate-based auth.

How is a client certificate different from a long random password? Both have to be stored somewhere. The correct solution is to keep this (and everything else) out of your web root.

If that i?php blunder happened to me, my users would see 10 lines of code. One include for the framework (which lives outside the web root), plus three calls to get the framework to handle the current request.

Those 10 lines would be totally unproblematic

.pem file won't be leaked in this ("i?php") manner. Only file's location would be revealed.

Of course, you are completely right, it should still be inaccessible from the web.

Typically, these files are not web accessible. Problem solved.

This bug was nothing to do with the file being web accessible. If you miss the opening <?php tag out in any file, regardless of whether it's in the web root, it will get printed straight out to the browser.

But you would never `include()` a PEM file.

No, better words would be: "Putting passwords in php file under doc root, why not keep it outside doc root?"

Good point. Minus the keys/passwords the "harm" would be limited.

Please name at least 5 of them.

Tsk, tsk. It's trendy to blame PHP for every problem out there.

Also, it's trendy to drop unseasoned developers onto it and expect them to build complex applications that are publicly accessible yet 100% foolproof. Go figure.

If you asked me, I couldn't even tell. I've encoundered some in the past, sure, but found (in docs) both rationale and the correct way to use stuff.

There was that reference-vs-value matter when passing objects around, in PHP v4.x, but that's fixed by v5.

I understand that view.

But I am talking specifically about something that would just dump the code of the php file to the browser.

I think this was more of a systemic failure - the file that failed in this case was also relied upon to disable development mode/error output on production servers. I'm pretty sure my production web servers return a HTTP code 500 with no further output when they encounter an error with the php.ini-production configuration file provided in the PHP source distribution.

The hitch was of course that the file in question had a "i?php" (someone was using vi and hit i too many times?) instead of "<?php" which lead the php interpreter to output that section of code as if it were HTML. That feature is something PHP could provide an option to disable and only allow explicit echo/print/printf which should be what templating engines use.

Actually, PHP returns a 200 when it encounters a fatal error.

This explains why it was indexed by Google.

they just had a single typo i?php instead of <?php I guess somebody's using vim

At Last.fm we had a Subversion pre-commit hook which ensured that all .php files began with <?.

That would have easily caught this problem, no drama.

Unless the file was checked in separately. If they were doing something like updating the db password, so they need to update "current running" and "up-and-coming" code. As a result, they may have made the change in production, with the typo, and the change in SVN, without the typo.

I would put my money on something like this being the cause.

+1 to russ. PHP syntax checking on precommit would have caught this.

Edit: I'm wrong. It does not catch this.

Not necessarily. Makes sense for syntax check to only check between php tags.

Root of the issue is that by default php outputs anything not between php tags to the browser.

Perhaps some sort of buffer control at the top most include can disable output to browser until a later time. That's pretty much how templating systems work except they don't disable output afaik thus leaving possibility of this open.

Exactly right. There's no reason for a config file (or any of the setup code) to be writing output. A good way to force this is by immediately starting an output buffer with a callback that returns nothing, then calling ob_end_clean() after setup is complete.

You could do ob_start(); include '...'; ob_end_clean();, but it turns out that die() (say, in a custom error handler) () causes an implicit buffer flush. So it's not a foolproof solution.

You can attach a callback to the buffer to prevent that implicit flush.

    ob_start(function() {});

Thanks, I'll investingate and probably adopt it for my code :D

Or register_shutdown_function - http://blog.kevburnsjr.com/php-fatal-error-500

No chance. The problem was a missing <?php tag -- PHP syntax cheching would NOT have raised alarm, because everything before a <?php tag is not considered PHP.

Forcing every file to start with <?php is just PITA for developers working on templates.

> Forcing every file to start with <?php is just PITA for developers working on templates.

Yep, our templates were .tpl files, which is probably a good convention to have even if you use raw PHP as your templating language.

But you get the drift. This is something you have to deal with when you use PHP.

If the templates are raw PHP, yeah. Constructing HTML in code like that is a PITA and a recipe for errors and XSS vulnerabilities. It's common to use templating engines like Smarty or Mustache. In that case, you can standardize on PHP files starting with an opening tag and having no closing tag. Then syntax checking is a piece of cake.

After using eZ Publish with its templating language (also implemented in PHP) for about a year, I've found `raw' PHP just more expressive and concise -- and my team, coming from various backgrounds, has that preferrence as well.

Whaddya know, PHP is a templating language, after all.

Also, it takes us less effort to ship product that matches performance requirements with `raw' PHP than with another layer of abstraction.

That's why I always git diff and run tests before committing, I do that all the time :-/

And this is the reason one should separate production and development environments and NEVER edit any files on production servers (let `git push` or something similiar handle it)

Shouldn't this have been caught by checking chsnges before committing them or on a staging server.

Just put: display_errors = Off is your php.ini and you will never see any errors. This is recommended everywhere for a PHP deployment.

    ; This directive controls whether or not and where PHP will output errors,
    ; notices and warnings too. Error output is very useful during development, but
    ; it could be very dangerous in production environments. Depending on the code
    ; which is triggering the error, sensitive information could potentially leak
    ; out of your application such as database usernames and passwords or worse.
    ; It's recommended that errors be logged on production servers rather than
    ; having the errors sent to STDOUT.
    ; Possible Values:
    ;   Off = Do not display any errors
    ;   stderr = Display errors to STDERR (affects only CGI/CLI binaries!)
    ;   On or stdout = Display errors to STDOUT
    ; Default Value: On
    ; Development Value: On
    ; Production Value: Off
    ; http://php.net/display-errors
    display_errors = Off

I guarantee you that Tumblr has display_errors = Off. This wasn't a display_errors related fuckup, it was a missed opening PHP tag.

Well, based on the code that was dumped, it looks like they're using their own error handler which was ignoring display_errors (search for "Use of undefined constant" in the dump).

That custom error handler was never set, because the code you see above never even executed. The reason was, like said many times already, a miss-typed opening PHP tag, which tells PHP where PHP code is. If that tag is missing or miss-typed the code is just returned to the browser like HTML.

The point is that no server configuration can save you from an error like this.

The point is that no server configuration can save you from an error like this.

Huh? Yes you can protect against errors like this; http://news.ycombinator.com/item?id=2343675

Well, off the top of my head, mod_security has rules that scan outgoing data for password or code leakage.

A deployment strategy that requires testing that pages show what you expect them to show would also likely catch it.

If the custom error handler was never set, the PHP notices wouldn't look like that.

Well, this is partially true. The output clearly contains error statements, but this directive would not have stopped the first 748 lines of sensitive information from leaking before the first error.

Facebook had one or more servers in their rotation not configured to actually execute PHP, so it was instead delivered as text/plain. This is a different error - the file was executed by PHP, but since they mangled the opening tag (i?php instead of <?php) it was passed through as HTML by the PHP interpreter.

Except for the obvious – never edit files on the live server – other ways to protect against this would be to have multiple opening tags (first line of file would just be <?php ?>, then another opening tag on the second line), have your VCS check that certain files begin with <?php, store the config in a non-executable way (in a YAML file, or in the server environment), or using a combination of file() and eval() to always prepend the '<?php'.

And people should really install a PHP error handler first thing - before they load anything else - that delivers errors with a HTTP code 5xx they can catch in their caching layer.

Yes. With PHP, you generally put code in the web root. This is a terrible idea, because if your PHP parser becomes non-functional for some reason (or is accidentally disabled) your source code is exposed to the world.

With other web frameworks it is generally encouraged to put source code and static content in different directories. With the code completely outside the web root. This is much safer: never put code (or passwords) in your web templates.

It is possible to do this with PHP, but in practice almost no one does that. PHP, in my experience, has a lot of these insecure-by-default issues.

- Nope, in PHP you don't generally put code in the web root.

- PHP and generally all the frameworks based on PHP are strongly encouraging putting code outside of the web root. Just take a look at directory structures in Zend Framework (http://framework.zend.com/manual/en/learning.quickstart.crea...) or Symfony2 (http://symfony.com/doc/2.0/book/page_creation.html#the-direc...)

- ...and almost everyone that uses PHP professionaly does that.

This bug is nothing to do with putting code in the web root. If you miss the opening <?php tag out in any file, regardless of whether it's in the web root, it will get printed straight out to the browser.

This sounds like "insecure by default" to me - you have to have the opening stanza correct in order for the file contents to be treated as code.

Perhaps a more secure idea would be to require an opening stanza for the reverse - for content that should simply be printed to standard out? i.e. <?out to make content that should be outputed, not <?php for content that should be interpreted.

Sure - I'm not defending PHP. This is all the case for backwards compatibility.

The easiest solution would be to define a new file extension/MIME type which is "PHP-by-default".

that is a great idea. I check if any of the 'hardened php' projects do this, and they don't

Files outside the web root are not accessible by the user via HTTP, so I don't see the issue with that?

Unless you include it from somewhere in the web root, but that's the other insecure-by-default behaviour I was hinting at. With a secure-by-default web framework, it's not possible to get the code to show at all because it's not intermingled with the content.

If there are no PHP files in the web root then what does your web site do?

Every block of PHP code must begin with '<?php', regardless of where it's located, or whether it's included from another file.

I do agree with you that this is a silly behaviour. But it's nothing to do with the web root.

no. this was a config file with a broken <?php-tag. it'll be treated as a plain text file when included/required.

> Is there something at core of php that makes this inevitable?

Yeah, the fact that there is no way to separate code and content. In my past days as a PHP dev, I've done this and similar things many, many times (also putting a space at the beginning or end, and having random other stuff fail because something can't set a header anymore).

It has always appeared obvious to me that PHP should define a "pure code" file, one in which "<?php" would be illegal syntax, and which would never have its content written to the client. That's how nearly all PHP is written, anyway, and it would eliminate a good deal of stupid errors.

It looks like they included something that when the PHP parser fails, it reveals the php code. Hence the connection protocols with username and password.

Try including the sensitive connection protocols from a non-www directory?

I believe the main reason you're seeing it with php apps more often, is because php is a very common platform.

Yes, its is that the web root contains code, at the very least an index.php but likely much, much more.

They've got their Twitter / Facebook / oAuth secret keys in there. Doesn't that mean everyone who sees this can act as Tumblr post to those services on behalf of users?

I hope they've changed them.

Nope, you need the users' token to perform that (given that they gave post to wall permission). However using Facebook secret key, you can ban users from using your app. But then again, you need user id.

But you could have users log into your site and then act as tumblre. If they already OK'd the tumblr app, they would never know. Otherwise they'd just have to ignore the screen saying what App you're OKing (and most people, I'd wager, would, and just think you were using tumblr for something).


Whilst I hope tumblr correct the problem rather quickly as it is a major problem, I find those jumping to blame are forgetting one small problem. No programmer is perfect, typos are easy to mistake on any keyboard and it will happen to everyone no matter how much of a ‘ninja rockstar poodle’ they think they are.

I hate to see someone else work in the clear like this. It’s like popping a zit before your first date. It’s painful and will show up for day/s afterwards. Now I know what will be today’s headline I can bypass techmeme.

Yes its a big boo boo. It’s a massive security risk and to some it may feel like the end of the world but by then it will be tomorrow. Passwords will be reset, keys will be replaced and the valley will be talking about something else. Hopefully it won’t be someone else’s mistake.

P.S Don’t forget to test your code before deploying – now you know why.

Which is why you have test servers and never ever make live edits to deployed code. I find it exceedingly easy to say that this was kind of incompetent.

Typos are very easy to make - but that's why you need to first test your code locally, test your code in a development environment, have others test and approve your code in a staging environment before a small typo gets to production where something like this can happen.

I always use a include for any hashes or passwords in a separate file. When I started learning PHP I exposed my MySQL database password more times then I could keep track of.

It does hammer home the point of staging before deploying. Also the point of making sure you vary your passwords between sites.

Do not store configuration in code. Store it in files that aren't part of the software. Store this file outside the web root.

I know it's easy to criticize, but far out Tumbler, you guys really have to get your act together - the downtime and general laggy-ness is at least understandable, but there is no excuse for absolute newbie foul-ups like this.

Although, on the plus side, having a site that mashes up tumbler as a content provider certainly has given us plenty of opportunity to fine tune and improve our caching strategy.

I sure hope they realize they just broadcasted the pass for the "tumblr3" database user, as well as their Twitter, Facebook, Recaptcha and other secret keys.

Well, at least they're using strong passwords. Fat lot of good it's doing them, though.

If they're clever, that password isn't the actual password, it's a salted hash that the Database class breaks down to the real password before connecting. In theory, that hash alone shouldn't be enough for a breach, unless someone is able to figure out how it's encrypted and salted.

If you can "break down" something to real password, it's not hashing but rather encryption. Anyways, I don't see how it helps if the actual php code is exposed. Adding layers of super-duper-secret php code (Database class) is not going to help if you display them in browser.

My point is more that there's at least one layer of protection between that page and the actual password, which I'd say is more important than displaying some code. It would take more than one mistake to really do damage, which is better than nothing.

You can clearly see all the routes in the app. 400+ routes and only 11 controllers. Most routes are concentrated on 3-4 controllers. Each of those controllers has got to be 10,000-20,000 lines apiece.

The dashboard controller alone has approximately 120 actions.

Where do you see this?

the routes are defined on lines ~1160-5600

They were throwing this when opening a tumblr blog. A twitter search reveals some people have seen this message too.

Have you sent them an email about this?

Just did.

Thanks, I really didn't know where to email them (nor I'm sober enough right now to do that, IMHO)

AUTHORIZE_ID and AUTHORIZE_SECRET_KEY... anyone know if those are for Authorize.net? Yikes.

There was a similar "gaping" hole 2 years back. http://news.ycombinator.com/item?id=164422 Better email them.

Found a link with better formatting on twitter: https://gist.github.com/29c5c5970d1f3313abd1

If we want security, programming languages must either make secure code easy to write or insecure code impossible to write. (Or both!)

PHP doesn't do well on either count.

Writing secure PHP code isn't impossible, but it's tedious even for seasoned developers.

Can someone please explain this a little more? My basic understanding is that they incorrectly opened a PHP tag and exposed the code. If that's the case, wouldn't the page have appeared broken in development? Or have been found during testing?


Their site has a front-end controller (www/dispatch.php) to which all requests get routed (by a mod_rewrite rule, perhaps), rather than having .php files in the web root for each page. This file sets up the environment -- among other things, it registers a custom error handler and includes a config file (config/config.php) that defines a bunch of constants. Then it dispatches the request to the appropriate controller, based on the URL.

Someone (probably a sysadmin) edited the config file and accidentally changed the opening tag (<?php). This caused PHP to output its contents, rather than parsing and executing it. Since there was no output buffer active, those contents were sent directly to the user, which caused HTTP response headers also to be sent automatically. That's the big first line you see in the output.

Since no error had actually been triggered by this point, execution continued. It tried to set an HTTP header ("P3P: CP="P3P_CP"", whatever that means). However since HTTP headers had already been sent, this did trigger an error, which was passed to the custom error handler, which sent some debug output (the rest of the output you see) and stopped execution.

By looking at the code !quality, I wouldn't be surprised to learn that they have no tests at all.

How can you look at this code and make any statement about general code quality? The stuff that was exposed was pure initialization code that would look the same anywhere you look.

Granted. This should move outside of the web root, but just looking at this one file that just sets up configuration values provides no ground to say anything about general code quality

> How can you look at this code and make any statement about general code quality? The stuff that was exposed was pure initialization code that would look the same anywhere you look.

No it doesn't. Having to modify live code just to change configuration options is bad. If I want to add a new slave, or change some configuration option, it should have no affect on running code. I shouldn't be changing any software logic, and that's exactly what editing this file does.

Their are standard ways to avoid this. But no, if changing your config file means changing what essentially amounts to changing the executable of your code, you're asking for trouble.

It doesn't really matter if the configuration is in an executable form or not. Without testing, you can break it either way. With testing, you probably won't. They're equivalent.

This issue isn't the site breaking. The issue is the leaking of the configuration options. By removing it from the code, you can help avoid that. And you can even go the extra step of making sure that even if you load a broken config, the system will continue to use the old config.

If one accepts that errors like these happen, I guess it would be a good idea to have an automated way to quickly change passwords on all the services that are used. Does anyone have some citations for literature on how to deal with that?

Good point.

Just like you'd want to have a well described (preferably automated) way to restore from backups, you should also have one for resetting all passwords. Such a process is also useful for protecting against disgruntled employees.

"this is not what I had in mind when I said we should open source the backend"

Regardless of how it occurred, and given that this isn't a language issue, I have two questions....

Does anyone know how long was it actually in this state? (There's a heck of a lot of entries in the Google search quoted in another comment, but then how often does Google index tumblr?)

Did no-one at least press F5 or CMD-R after making the edit, let alone run tests? Quality control is the real issue. I can easily imagine myself making this mistake, typo's are the source of the majority of my bugs, but I find it hard to imagine taking more than 10 seconds to notice it.

I haven't used PHP heavily in about 2 years, but it surprises me that at this point there's not a configuration setting that negates the need for the opening PHP tag and instead just treats everything in a .php file as php.

I doubt anyone using PHP at Tumblr's level is mixing raw HTML into their PHP files and thus has no need for the php tags.

If you have a PHP app, place nothing inside of web root other than an include to your front controller. This is all that should ever get exposed:

    include '../app/front_controller.php';

Then, if you accidentally edit your front controller file so that it starts with 'i?php' instead of '<?php', it will get exposed.

This is what happened with Tumblr.

You are right, it wil expose the front controller, but it wont expose the includes from there. If you screw up the open tag on the config file, even if it is outside of webroot it will be exposed

I just tested a few variations on one of my old projects and the reason why I didn't see it is because I had ob_start() and was killing the buffer in my templating class (which is a good idea)

not to mention that the db passwords were in an external yaml file anyway (most of the frameworks do this) which would get loaded and cached

The ob_start thing is an exceedingly good point and I don't know why I didn't think of it myself. Probably because I haven't touched PHP in 3 years.

> Probably because I haven't touched PHP in 3 years.

we should start a support group

Sure but you will expose only that line, which tells you almost nothing.

Down voter: care to explain why? My statement is correct ...

I didn't vote on your remark, but elsewhere in this thread, it's demonstrated with six lines of PHP that the contents will be exposed.

All this could have been avoided by using:

  if(getenv('tumblr_env') == 'production') {
    //See: http://php.net/manual/en/function.register-shutdown-function.php

I bet the tumblr developers are happy 100+ people are debugging their mistake.

They could make their app completely open source then ; in that way they'd probably have a better quality assurance process than "suffer, apologize & fix".

A human error caused some sensitive server configuration information to be exposed this morning. Our technicians took immediate measures to protect from any issues that may come as a result.

We’re triple checking everything and bringing in outside auditors to confirm, but we have no reason to believe that anything was compromised. We’re certain that none of your personal information (passwords, etc.) was exposed, and your blog is backed up and safe as always. This was an embarrassing error, but something we were prepared for.

The fact that this occurred at all is still unacceptable, and we’ll be seriously evaluating and adjusting our processes to ensure an error like this can never happen again.

How to prevent a leak like Tumblr's - http://news.ycombinator.com/item?id=2343561

Don't edit files on the production server?

I rarely go off like this but man, look at this PHP file!

It's a mess! Very long, not dry, no wonder the dev fat fingered it...

Is the rest of the framework like this?

No headless output testing? Manual output checking (at the least)?

For the record, as much as I hate to admit it now I was a big long time PHP developer. My code is used round the web today...

It's a config file. How can you make dozens upon dozens of unique memcache, database, and defined constants DRY?

Thats the point. Who says it has to be a config file? Think outside the box.

Here is how Flask can do configuration (a Python micro framework):


class Config(object):

    DEBUG = False

    TESTING = False

    DATABASE_URI = 'sqlite://:memory:'

class ProductionConfig(Config):

    DATABASE_URI = 'mysql://user@localhost/foo'

class DevelopmentConfig(Config):

    DEBUG = True

class TestinConfig(Config):

    TESTING = True

Zend Framework also handles configs with inheritance. It's an elegant way to handle overrides, no doubt. They do one better by allowing the definition to be in ini files, which would have circumvented the Tumblr bug.

Still, I think config files are often best when they're verbose and as straightforward as possible. There are fewer surprises this way. If I'm working on a project by myself or with a very small team of people I trust, I'd go the inheritance route. If the team is larger, I'd go the verbose route.

I pretty much agree with jjm comment. I never have ONE config file but a directory of config files, everything is separated : database, memcache, heywatch, s3, routes, ... It's way easier for the developer, the sysadmin, the deployment, ... Of course it would multiply the risk of this happening by n files.

I feel really bad for them right now, but I can't stop reading it.

This looks like a vim switching mode mistake to me. Oh humans...

that means the edit was done on the production server?

Clicked on link, just saw a huge ad and no obvious way to dismiss it.

I tought tumblr.com was build with Ruby on Rails.

the attacker left their IP address in the dump, and their LAN interface address

Edit: it's whoever dumped this as opposed to an attacker

I don't think tumblr have acted on this yet. The other exposed pages have S3 API secret keys, facebook api secret keys, the username and passwords for vimeo, clickatell (whatever that is), twitter oauth secret key, etc. they are going to have to revoke and re-setup each of these.

It doesn't really look like there was any attack. One of the developers made a mistake with the PHP opening tag in the config file. Anyone visiting the page would have seen this output.

there are dozens of pages that have been exposed on this one server. I have no idea how the opening tag can get mangled in so many different files

By having a central config file that is included throughout the pages. Once included, it's not parsing it as PHP because it has the typo in it - thus showing it on all pages that include that file...

If the typo is in a top-level layout file or something that would happen.

I hope their firewalls are good because this may instigate an attack. Tumblr, 80,443,established,related accept, all else DROP.

Firewalls have got nothing on exposed API keys.

This is very true. In this case the obvious choice is to create new keys although how inconvenient it will be I dont know.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact