Hacker News new | past | comments | ask | show | jobs | submit login

> ven assuming that "modern" PHP managed to come up with better ways to deal with all of this, I assume that these obsolete functions and operators still linger for backward compatibility? If so how do you avoid them?

I think the "path of least resistance" is important: developers are time-constrained, understanding-constrained, lazy (if they're virtuous), etc. There's a big incentive to do whatever is easiest/quickest.

When I last used PHP, about 5 years ago, there were OOP APIs cropping up to replace many of the standard global functions; namespaces had just been introduced; closures had become useful; frameworks like Symfony (and Drupal 8) were becoming established, rather than the old "plugin" approach of throwing around arbitrary code and hoping for the best; dependencies were being managed by composer; files could be autoloaded from sensible locations; testing frameworks like PHPUnit and PHPSpec had become best practice; etc.

Yet all of those things were opt-in and verbose. The path of least resistance was still:

        Hello <?php echo $_GET['name']; ?>
(For non-PHP programmers, this is appending a GET parameter straight into the page, which is an XSS vulnerability). Doing things "properly" took a great deal of effort and discipline.

Compare this to something like Java: it favours class-based OOP so much that even "hello world" needs a class. The path of least resistance is to do things "right" (from Java's perspective). Haskell's path of least resistance is simultaneously easier ("hello world" is just `main = putStrLn "hello world"`) and harder (`main` uses the `IO` type, whose API enforces certain conventions).

Deprecation warnings, linters, etc. can help with this; but PHP's only real strength is its installed base of code and developers; changing the language too much would throw away this advantage (akin to being a new language, see Python 3 and Perl 6); not changing it enough prevents the more serious and/or systemic issues from being dealt with.

I wish the language designers and users luck, but I'm really hoping to never use it again ;)

The $_GET/$_POST pattern is rare in modern applications (anything built in the last 5 years) and is rightly recognized as an antipattern. Similarly many of the older obsolete functions have been either removed or fallen out of fashion.

You are absolutely right that the path to this has been fixing the path of least resistance. Laravel and other frameworks have made it easy to get variables using their preferred methods (and automatically sanitizing input). Learning resources (tutorials, docs) utilize the OO functions and many of the older functions were deprecated and have been removed for years. If you look at one of the more infamous functions, mysql_real_escape_string (https://www.php.net/manual/en/function.mysql-real-escape-str...) that was removed entirely in PHP 7. If you look at the more modern mysqli, they've chosen to have both escape_string and real_escape string, but one is an alias to the other. Similarly, sane defaults have become the norm, especially since most developers are using containers or VMs/vagrant to program. The one major outstanding issue is the api naming is still inconsistent.

On a sidenote, the PHP docs are great with pages showing examples and notes talking about potential issues. When I switched to python I couldn't believe how bad the docs were in comparison.

To understand that, you have to take a closer look at its origins. For me, this talk by Rasmus Lerdorf (creator of PHP) was a real eye-opener: https://www.youtube.com/watch?v=SvEGwtgLtjA He mentions that what we now refer to as the "PHP language" was originally intended to be only the templating system, and you were supposed to write your business logic in a "real language" such as C. However, as PHP got more popular, bits and pieces were added until the "templating system" became a full-fledged language. Sounds like this happened more by chance than by design. And because you can't throw all of that out, it shows until today. You can introduce new features, but doing that in a way that stays (more or less) backwards compatible is difficult and forces compromises - see type declarations, which are actually just type hinting and of very limited use. So sure, PHP is wildly popular because of its "first mover" status, so it will probably stay with us for the foreseeable future, and working in it may have become more pleasant over the years thanks to better tooling, but it will unfortunately never be as pleasant as in other, more well-designed languages.

That's kinda the problem with any domain-specific language: it either requires a full language for support, or it becomes one (and generally not a very good one). Every language is an opinion about what things should be easy and what things are allowed to be hard.

I've got nothing against DSLs and template languages and such, but most of the ones I encounter make me sigh and say, "I know you thought it would be easier but it's just one more thing I have to figure out how to debug, without any of my usual debugging tools."

> what we now refer to as the "PHP language" was originally intended to be only the templating system, and you were supposed to write your business logic in a "real language" such as C

I did get a little perverse amusement at writing HTML files containing nothing other than single `<?php` element; where that PHP's job was to generate HTML; and it did so via a templating language (e.g. Twig or Smarty)...

You can't remove those old APIs without changing culture, and culture is really hard to change.

One thing that shocked me during my brief foray into PHP was how heavily popular PHP apps like WordPress rely on the filesystem. Want to move a site? Copy the database, and copy a bunch of files. It seems like there's a cultural expectation that applications run on a single server in your closet. Of course you don't have to write code this way... but people still do.

WordPress is NOT an example of modern PHP practices though. The amount of legacy insanity in WordPress is enough to make any sane dev cry...

Why was it shocking? PHP (originally) stood for "Personal Home Page" after all...

comparing a web page with a potential XSS vs a java 'hello world' (assuming console output) is a bit disingenuous, no?

<?php echo "hello world";

There - no XSS, no HTML, no class requirements.

I think you're confused. The PHP snippet was to demonstrate my point that:

- The path of least resistance in PHP is often directly at odds with the "proper" way of doing things (including many things listed in this article).

- This is important because PHP's intended domain is Web development, so this can easily cause security flaws.

- Any attempt to "fix" this example will inevitably make it longer, more complicated, harder to remember, etc.

Your PHP "hello world" does not demonstrate those problems, since it has no XSS or HTML, so it is a poor example.

After this demonstration, I abstracted to the more general concepts of "path of least resistance" and "'proper' ways to do things", irrespective of language or domain.

To clarify my meaning for these phrases, and to demonstrate how language designers can use the former to push the latter, I gave two examples: Java pushing its preferred approach of class-based OOP, by disallowing raw, top-level statements; and Haskell pushing its preferred approach of monadic I/O, by enforcing the type of `main` and restricting the available APIs. Notice that neither of those examples make the "proper" way easier (that can be very difficult, in general); instead they eliminate anything that's easier (like top-level statements), such that the "proper" way is the easiest thing that's left. In other words, Java's requirement that even "hello world" be wrapped in a class is a good thing for Java programmers, since allowing top-level statements like `System.out.println("hello world");` would undermine the principles of the language. This general argument is not specific to PHP, and certainly not a direct comparison to the XSS snippet.

After defining and demonstrating these general concepts, I then returned to the specific case of PHP, to point out how backwards-compatibility with its legacy of easy, insecure approaches undermines the attempts to improve the situation. In other words, the "proper" way to write PHP (at least, back when I used it) is to use OOP, namespaces, type hints, escaping of user input, etc. Yet the design of the PHP language discourages all of those, by providing easier alternatives which are "improper" (like in my XSS example); and removing those alternatives (in the same way that Java forbids top-level statements) would break almost all existing PHP projects and require the majority of PHP developers to change their habits; and doing so would eliminate PHP's main selling point (installed base and developer mindshare).

I hope that clarifies why the comparison is not disingenuous (i.e. because I'm not making the comparison that you claim).

If I were to make a comparison of that vulnerable PHP code against something else, it would need to be against a language which is primarily designed for Web development (to avoid being disingenuous), and it should be designed to make the "proper" approach the easiest. The Ur/Web language fits these criteria nicely, and (from a quick skim of the tutorial at http://www.expdev.net/urtutorial/step1.html ) the equivalent to that PHP would be:

    fun greet data = return <xml>
        Hello {[data.name]}
Not only will this will perform the correct escaping, it will also be a type error if another page tries linking to this one without giving a `name` parameter.

funny, because it's what React.js allows to do nowadays

The difference is React actually handles escaping properly. It won't just dump out raw HTML by default, so you won't have an XSS issue.

PHP, on the other hand, requires manual escaping with htmlentities() ... It is very, very error prone.

You are comparing a framework to a language though. Escaping in PHP is usually handled by the templating component, you don't go around writing htmlentities() everywhere.

If you're dealing with decade old code that uses no framework, you certainly do. PHP is, by default, a primitive templating language...

PHP comes with a Web framework built in (that's what things like `$_GET`, `$_POST`, `htmlentities`, etc. are). It is also a templating language, that's why we need to write `<?` at the start (to open a PHP tag).

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact