Update on Git.php.net Incident (externals.io)
136 points by gslin 4 days ago | 69 comments
Okay, disregarding the whole incident part of things, I have two main points I wish to address:

First, why on earth would you still be using non-parameterized queries? Hell, I even remember that back in PHP 5 the docs explicitly said: please use parameterized queries. To think that after all that evangelizing, mysql_query et al. would still be in use is really saddening.
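For reference, the parameterized version is barely more code. A minimal sketch using PDO (SQLite in-memory here so it's self-contained; a MySQL DSN works the same way, and the table and data are hypothetical):

```php
<?php
// Parameterized queries: user input is bound separately and never
// becomes part of the SQL text.
$pdo = new PDO('sqlite::memory:');
$pdo->exec('CREATE TABLE users (name TEXT)');

$ins = $pdo->prepare('INSERT INTO users (name) VALUES (?)');
$ins->execute(["O'Brien"]);          // embedded quote handled for us

// The classic injection payload is just data now, not SQL:
$q = $pdo->prepare('SELECT COUNT(*) FROM users WHERE name = ?');
$q->execute(["' OR '1'='1"]);
echo $q->fetchColumn();              // prints 0 - no injection
```

Compare that with interpolating `$name` straight into a `mysql_query()` string, which is exactly the pattern the docs warned against.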

Second, huge fucking kudos to them for the incredible response time, transparency, and great logging (looking at you Ubiquity).

All in all, PHP was my first real programming experience and will always remain my sweetheart. I hope it grows an even stronger community.


> First, why on earth would you still be using non-parametrized queries.

Legacy software no one wants to maintain, and no one is paid to maintain.


It’s worse than that: the advice to use parameterized queries goes back to the late 1990s before that legacy software was even started. The problem is that it’s ever so slightly more work and some people will say “I’m busy, I’ll be smart enough to always escape my inputs” and will get the CVEs to prove it.

Unfortunately, some of the core developers were in that camp so PHP 4 came without improvements, PDO avoided the opportunity to be safer, etc.

register_globals had a similar arc: people knew it was a risky feature before the turn of the century but turning it off would inevitably get someone whining about how hard it was to explicitly import their variables or check both GET and POST, as if this wasn’t trivially abstracted.

A lot of this goes back to 90s C / Unix culture. PHP was a product of that world and had the same “Real Programmers™ check their inputs & return codes. If your code breaks, it’s your fault for not being a Real Programmer™ and I don’t want to be slowed down by safety checks intended for you.” attitude which has taken decades to stamp out.


> It’s worse than that: the advice to use parameterized queries goes back to the late 1990s before that legacy software was even started

MySQL didn't introduce prepared queries until 4.1, which was released in 2004.


Yes - this was a common criticism if you were trying to get MySQL into an environment with a different database, since prepared statements were a SQL92 feature valued for both performance and security. They had often been seen as a performance move or a way to avoid confusing error messages, but that started to change around the turn of the century as the technique became more visible and so many web apps were fronting databases rather than serving static files.

Poking around https://metacpan.org/pod/DBI I notice support was added by some point in 1997, possibly as early as 1996. Python's DB-API spec had it no later than 2001.

The other thing to remember is that it wasn’t uncommon to have drivers emulate this behavior on databases which didn’t have protocol level support for it. That didn’t help performance but it did accomplish the goal of making sure that data wasn’t confused with code.
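PDO still exposes exactly this choice today. A hedged configuration sketch (the DSN and credentials are placeholders; requires a running MySQL server, so this is illustrative only):

```php
<?php
// With emulation on (the MySQL driver's default), PDO quotes bound
// values client-side and sends plain SQL; with it off, a protocol-level
// prepare is sent to the server. Either way, data stays data.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret', [
    PDO::ATTR_EMULATE_PREPARES => false,  // request server-side prepares
]);
```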

To be clear, I’m basing my comments on having used PHP professionally starting with PHP 3 for many projects, including some household name companies. I trained a fair number of people, some of my earliest open source work was in PHP, etc. so I don’t hate the language but I definitely think there are cautionary lessons to learn about the value of defaults and how languages are taught. As we’ve seen with C, telling people to be more diligent is less effective than making the default safe behavior you have to opt out of rather than the reverse.


> which has taken decades to stamp out

You optimist. I think we're still trying to stamp it out :-)


It’s definitely not gone but I feel like it’s shifted to a defensive posture & not getting many new adherents. Part of that is probably better hardware & languages making the performance and developer productivity arguments less persuasive.

It's amazing how much the Internet has evolved: tons of open source software have been developed, web framework authors have contributed to their communities in many ways, and cloud computing is widespread.

WordPress is still using non-parameterized queries; I assume they have been battle-tested over the years. With PHP being 26 years old, we shouldn't be surprised that legacy software still exists.


No one had to maintain it though, they could have moved to GitHub or GitLab at any time, or use one of the many open-source self-hostable solutions like Gitea, Phabricator (it's PHP), GitLab, ...

That still requires someone to do the migration. Don't forget that aside from VCS, this system also integrates with their bugtracker, wiki, etc.

Anecdata about the username guessing - some 10 years ago while investigating a second order sql injection incident I saw exactly the same pattern.

Attackers got the passwords but not the usernames. Turns out the stored input had a character limit and the project stored fields as 'username' and 'pass'. Since 'pass' was shorter the attackers were able to squeeze it in a short query, but couldn't get usernames.

Looking at https://main.php.net/login.php it's 'user' and 'pw' in the form, assuming the db fields match.


Some unpaid volunteers "maintain" an extremely outdated tech stack that hosts the most used programming language that literally runs the entire eCommerce world, moving unimaginable amounts of money every day. Totally insane. We should rethink programming languages from an economical standpoint.

It's hard, if not impossible, to name a well-resourced tech giant with the best-paid, non-volunteer talent that was never breached or hacked, with or without an extremely modern tech stack.

Totally not insane if you ask me. Security incidents can happen to anyone. The incident response matters and I think in this case those unpaid volunteers did an excellent job.


> unpaid volunteers

"at PhpStorm, we are fans of Nikita’s work. We always supported the Open Source, and this felt like a new opportunity – so here we are! Nikita will continue contributing awesome features to PHP, and together we will experiment on what is possible in the realm of language tooling."

~ https://blog.jetbrains.com/phpstorm/2019/01/nikita-popov-joi...


FWIW, the two people who are paid to work on PHP (Dmitry and myself) both only work on the PHP implementation and not other parts of the project (infra, websites etc). The infrastructure is maintained entirely by volunteers.

I am personally not qualified to do infrastructure maintenance, knowing next to nothing on the topic.


You're not wrong, but commercial entities also go bankrupt, stop funding projects or take care of code quality just enough to be able to sell a product and get the money. Clearly unpaid volunteers seem to be a successful alternative to the commercial world when it comes to general code quality and especially longevity.

It's also often the case that those unpaid volunteers have a day job related to the projects they volunteer in, so there is some direct or indirect financial support. It seems reasonable to assume a large part of open-source software and infrastructure is maintained by people having vested interests in it.


As I've said before: at the moment, the IT world builds the highest towers on very soft sand. The younger generation doesn't even remember that in the 90s you mostly had to pay even for Linux (you had to buy CDs, because the Internet was too slow and expensive).

Though you only paid for the service of sending a CD to you - you were allowed to freely share those with anyone you met.

From SuSE (Germany), you bought the books and the CD was a gift. That was a legal thing: they couldn't sell the CD while at the same time disclaiming any warranty.

It is never a problem until it is.

And the idea that there is some mischief- or accident-proof alternative is, I'd argue, equally nutty.


You'll find this exact circumstance for pretty much 80% of all software dependencies out there.

Kinda scary how bad master.php.net was.

- md5 passwords more or less

- “...running very old code on a very old operating system...”

- no parameterized queries


I'm assuming PHP is a victim of its own success here: widely used systems face incredible pushback against breaking existing functionality when "everything is working". In this case, fixing the password security would've required breaking everyone's ability to commit to the PHP codebase, so it was not done.

This is pretty much... all large scale enterprise IT: Everyone's good with security improvements until it interrupts a business unit, and then someone up the chain decides it isn't worth the disruption to deal with a hypothetical security risk.


PHP is a victim of a lack of core contributors and a reputation as a toxic environment if you get involved.

We think of Python as severely under-resourced (2 FTE devs on the language IIRC?) but PHP has less. PHP's ethos means there's no FAANG sponsorship; companies like Jetbrains do sponsor core developers, but the developers of the de facto debugger, unit-testing framework, and package manager/repository all fundraise for their own work. It's nice in an 00s-era hacker-ethos way, but for a language so widely used it's deeply under-resourced.

And that's one reason why the infrastructure is old and creaking.

As recently as 2020 there was pushback by php-internals' members to using GitHub because it wasn't open/their own infrastructure. This incident has been a catalyst for change, and there is much more change needed.


Any environment that regards marketing as proscriptive instead of subscriptive is already insecure anyway :/

> fixing the password security would've required breaking everyone's ability to commit to the PHP codebase

The parameterized queries could have been fixed, though, right? And you can progressively update people's passwords to a different algorithm as they log back in, and warn them that in six months their password will need to be reset if they haven't logged in.
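That progressive migration is straightforward to sketch. A hedged example, assuming hypothetical `user`/`pass` columns (as the form field names elsewhere in the thread suggest) and a legacy unsalted md5 hash; the `login` helper is invented for illustration:

```php
<?php
// Verify against whichever hash format is stored, then transparently
// upgrade legacy md5 entries while we briefly have the plaintext.
function login(PDO $pdo, string $user, string $password): bool {
    $stmt = $pdo->prepare('SELECT pass FROM users WHERE user = ?');
    $stmt->execute([$user]);
    $stored = $stmt->fetchColumn();
    if ($stored === false) return false;

    // A 32-char hex value is a legacy md5 hash; anything else is assumed
    // to be a modern password_hash() value.
    $ok = strlen($stored) === 32
        ? hash_equals($stored, md5($password))      // legacy path
        : password_verify($password, $stored);      // modern path
    if (!$ok) return false;

    // Successful login with a legacy or outdated hash: rehash now.
    if (strlen($stored) === 32 || password_needs_rehash($stored, PASSWORD_DEFAULT)) {
        $upd = $pdo->prepare('UPDATE users SET pass = ? WHERE user = ?');
        $upd->execute([password_hash($password, PASSWORD_DEFAULT), $user]);
    }
    return true;
}
```

Accounts that never log back in keep their md5 hashes, which is exactly why the six-month forced reset (or deactivation) is the other half of the plan.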


And if someone hasn’t logged in for 6 months, their account should probably be deactivated anyway.

But, yeah, the comment above is right about enterprise IT. I’ve sat in on meetings where 2FA was ruled out as being too unusable by the marketing team, who said quarterly password rotations were a “good compromise”. They just want to be seen to be doing something about security, and pick fixes that don’t involve much cost to anybody, but also tend to provide little value.


> fixing the password security would've required breaking everyone's ability to commit to the PHP codebase

Can’t you have a scheme that is like bcrypt(md5(password)) and since you already have md5(password) in the database you can migrate to bcrypt?
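Mechanically, yes. A minimal sketch of that wrap-and-verify scheme, which upgrades every stored hash immediately without waiting for users to log in:

```php
<?php
// One-shot migration: wrap the existing md5 digest in bcrypt.
$legacy  = md5('correct horse');                    // what's in the DB today
$wrapped = password_hash($legacy, PASSWORD_BCRYPT); // store this instead

// At login, md5 the input first, then verify against the wrapped hash:
$ok = password_verify(md5('correct horse'), $wrapped);
var_dump($ok);  // bool(true)
```

The md5 layer adds nothing cryptographically, but since bcrypt now does the slow work, the old fast-to-crack digests are no longer sitting in the database.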


Not if you need to support HTTP Digest authentication.

Well, HTTP digest auth is stupid. It looks more secure because the password is not sent in cleartext over the wire, but it forces the server to keep the password (or an equivalent unsalted MD5 of username:realm:password) so it can compute the digest, so it's still bad.

Kerberos or even basic auth (because the password can be checked on the server against a real hash algorithm) would've made this impossible.


"The master.php.net system, which is used for authentication and various management tasks, was running very old code on a very old operating system / PHP version"

Yeah, that's not great.


Sounds like the golden age of PHP.

I’ve heard the term dogfooding before but this is ridiculous

They dogfooded, past tense. But nobody wants to work with old PHP code - which I guess is part of the dogfooding in this case? idk

> Running old code for an old service that no one remembers or uses any longer...

which is why every two years, I switch the servers completely to fresh new machines, installing everything anew. Let those old things that we installed for who-knows-why die off.


People (well, me too) these days script that too, so the good thing is you don't forget a service or configuration, the bad thing is you never forget a service or configuration.

This is I think a problem in pretty much all of IT; people install a system and forget about it. What 'people' (sysadmins?) SHOULD do is create a maintenance schedule / contract at the same time, and make sure that it is transferred to the next guy in case of moving on in a job / position.

Doing what I can, I make a point of having a calendar reminder to update my software's dependencies every month. Doing it frequently reduces the amount of work needed and keeps things up to date.


Welcome to the real world. It's almost the same as in traditional maintenance, where you have to change filters in HVAC systems or swap mechanical parts every x hours of running time, and so on. Funnily enough, there's also a COBOL thread on the first page of HN. That code runs for years and years... and here I read "make everything new every 2 years." Big projects aren't even done in 2 years. I work in industrial automation, and I'm increasingly happy when a new thing has no computer at all. So much work, so short a life.

It's a big problem for large corporate sites with lots of sub-brands. A million little sites get launched and then no one ever cleans them up. It would be fine except that they're ticking time bombs of WP ownage etc. It's my firm belief that any site which isn't an on-going concern should be changed to a static site in a cloud bucket just to defuse all these time bombs.

This is why I document everything during installation, and automate security updates if the system doesn't have any special needs requiring package freeze.

Moreover, we template system installation further, so I can install the same server with the same config, with the latest packages with just three commands.


I don't have much faith in the security of php.net. A while ago qa.php.net exposed the environment variables of thousands of users who had compiled php and run the test suite.

They guess it could be a user database leak, making it possible to use https with password auth for pushing. But the attacker didn't know the usernames.

Can't that also suggest that the user database is fine (otherwise they would have guessed correct usernames in the first place), but when a user was found they could find their password in some other leak?


As we don't really know, we went with the worst-case assumption. If it's wrong, then at least it was a nice excuse to update some old infrastructure :)

I came to say the same, though I suspect they may have gotten the passwords by watching them being typed. These people go to conferences or whatever and could have leaked the password, but the username probably wouldn’t need to be typed.

At least to me, this seems like the most obvious vector for a leaked password, but not username.


Something that would have prevented this from the beginning: PGP-signed commits. Important projects should use them, no matter how annoying the extra setup is.

In fact it turned out to take 10 minutes when I first did it, and I am very glad that I did.

> running very old code on a very old operating system / PHP version

> the new system supports TLS 1.2, which means you should no longer see TLS version warnings

> The implementation has been moved towards using parameterized queries, to be more confident that SQL injections cannot occur.

> plain md5 hash

That's not fantastic. I get that people often don't touch working legacy systems for fear of breaking them, but this sounds rather avoidable. Does the PHP project go through any audits or risk assessments? It's rather surprising for, as others have mentioned, the backbone of the majority of ecommerce websites.


>That's not fantastic. I get that people often don't touch working legacy systems for fear of breaking them, but this sounds rather avoidable. Does the PHP project go through any audits or risk assessments? It's rather surprising for, as others have mentioned, the backbone of the majority of ecommerce websites.

PHP only has 2 full time engineers that work on the core. Everyone else is a volunteer, what do you expect? Companies to actually contribute towards employing more active contributors? HAH


Let's be honest: using a years-old server, weak unsalted hashes, and un-parameterized SQL on a critical security service... is very on-brand for PHP.

It's kinda sad that they're moving onto a proprietary platform though.

Why not consider Sourcehut, or even GitLab, which are both hosted and they don't need to deal with handling their own infra?


    It is notable that the attacker only makes a few guesses at usernames, and
    successfully authenticates once the correct username has been found. While
    we don't have any specific evidence for this, a possible explanation is that
    the user database of master.php.net has been leaked, although it is unclear
    why the attacker would need to guess usernames in that case.

From the logs, the hacker seems to be guessing the password too. This is not consistent with a database leak. It is consistent with teamwork, where a separate reconnaissance job was done beforehand and the hacker had access to profiles of the developers with separate lists of known usernames and passwords.


Seems like the first breakdown of actions mentions a false name which makes me think that if Git GPG sign / verify was used this wouldn't be a problem?

From what I understand they assume a user database with md5-hashed passwords may have leaked.

Now md5 is outdated, bad and was never made for passwords, but still, there's no easy way to simply invert md5. So for this to have happened, the password must've been bruteforced. Which is only practically possible if it was a weak password.

Which makes me think maybe that's not what happened and maybe the real culprit is credential stuffing. If someone had a password weak enough that bruteforcing is plausible, then maybe that person also used the password for another service.


> Which is only practically possible if it was a weak password.

Back in 2016, hashcat could achieve 200 billion hash crack attempts per second on commodity hardware: https://gist.github.com/epixoip/a83d38f412b4737e99bbef804a27...

That can crack all alphanumeric 8-character passwords in 18 minutes, and all 9-character alphanumeric+symbol passwords in just 10 days!
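The 8-character figure checks out from the numbers above (the rate is the one claimed for the 2016 rig):

```php
<?php
// Sanity check: time to exhaust all 8-char [a-zA-Z0-9] passwords
// at 200 billion MD5 hashes per second.
$rate  = 200e9;        // hashes per second
$space = pow(62, 8);   // ~2.18e14 candidate passwords
printf("%.0f minutes\n", $space / $rate / 60);  // prints "18 minutes"
```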

Then consider that there are precomputed rainbow tables that have had far more computer power thrown at them, and suddenly MD5 starts looking more like a "light obfuscation" than a true cryptographic hash...


>Then consider that there are precomputed rainbow tables that have had far more computer power thrown at them, and suddenly MD5 starts looking more like a "light obfuscation" than a true cryptographic hash...

...because that's not what a cryptographic hash means. SHA256, a "secure" hash is vulnerable to the same things you mentioned. The term you're looking for is a password hash, or a key stretching function, which is intentionally slow to be resistant to brute-forcing. As for rainbow tables, that's solved at the application level with salts.


Yep. I think current PHP best practice is to use standard library password_hash() and password_verify() which takes care of salting:

https://www.php.net/manual/en/function.password-hash.php
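A minimal use of those two built-ins: password_hash() generates a random salt and embeds it in the returned string, so no separate salt column is needed.

```php
<?php
// PASSWORD_DEFAULT is currently bcrypt; the hash string encodes the
// algorithm, cost, and salt, so password_verify() needs nothing else.
$hash = password_hash('s3cret', PASSWORD_DEFAULT);
var_dump(password_verify('s3cret', $hash));  // bool(true)
var_dump(password_verify('wrong',  $hash));  // bool(false)
```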


Just a question: is there anybody who compiled the affected source?

Yes, we all did. The commit containing the backdoor was reverted with a new commit. See c730aa26bd and 2b0f239b on `php/php-src` repo on GitHub.

tl;dr

   git.php.net supported pushing changes not only via SSH, but also via HTTPS

I don't think that's a terribly good tl;dr. Allowing git pushes over HTTPS is not necessarily a problem per se. The likely real culprit is mentioned later in the blog post:

    The master.php.net system, which is used for authentication and various management tasks, was running very old code on a very old operating system / PHP version.

Why do these systems bother with passwords at all? Given you can always reset with an email, just have login always be the reset flow.

https://en.wikipedia.org/wiki/Greylisting_(email)

Are you ok to sometimes wait up to an hour to log in?


Modern email systems don't seem to use greylisting. I used it many years ago but it seems to have gone out of fashion nowadays. I now use Fastmail, which doesn't seem to employ greylisting (or at least I've never seen it become an issue). I suppose spammers have gotten wise to it and devised ways to get around it, so it's no longer effective and thus no longer widely used.

I'm running my email on a private, small volume domain. Greylisting is still super popular and common. You won't experience greylisting on popular domains... because they're popular.

I set up greylisting on our MTA and eventually had to remove it. While I was okay with waiting a while to receive a message other people were incredibly frustrated with it. It was particularly troublesome when a service would send them a code in email, it didn't arrive, and then they would ask for another one. The first code would finally arrive, but it was marked as invalid because the second code that they had asked for was now the valid one.

It just wasn't worth the trouble and headaches.


These days, I mostly read my email on a device. I still log into stuff on my computer. Having to bring up another website, logging in (something I don't do on the tablet / the mobile phone), waiting for an email and then maybe be allowed in is a productivity killer for me.

Do not reinvent the wheel. Passwords work. Just store them somewhat sanely with an acceptable degree of security.


Some websites do support this sort of flow as an alternative authentication method.

So every time I want to log in I need to open an email and follow the link, instead of being able to use my password manager to log in?

Sounds like progress to me.


So you don’t use two factor?


