Hacker News new | past | comments | ask | show | jobs | submit login
Google results for PHP tutorials contain SQL injection vulnerabilities (waritschlager.de)
556 points by phil294 87 days ago | hide | past | favorite | 324 comments

Pretty much. The best way to insert supply chain exploits is to embed them in a stack exchange answer to a beginner's question.

This isn't new, we've always had programmers who programmed by "recipe" rather than first principles, and DRY paints that as a feature, but it underlies a lot of pain and cost over the years.

To give some context, I inherited some kernel code when I worked in the Systems Group at Sun Microsystems in the 80's that was written by a mathematician who had become a programmer because the money was in programming, not applied math. They had cut and pasted code they didn't understand in order to achieve the result they wanted out of the code they were "writing." When I inherited it I read through it and found a couple of dozen ways the code would panic the kernel[1]. Once fixing those obvious issues, it became clear that the original owner of the code didn't really understand what computation did. They had an idea, and mathematically they could show that it was correct, but literally no ability to express that algorithmically.

This is not a "new" problem but it is an important one that managers of software engineers need to watch for.

[1] At the time the only difference between "kernel" programmers and "application" programmers was that kernel programmers recognized that unsafe code crashed the whole system, not just the application. So they tended to be cultivated from paranoid programmers.

In PHP's case, Stack Exchange is not necessary to get SQL injected tutorials. The official docs for _years_, if not decades, included them. The docs for how you were supposed to do SQL were just full of the antipattern of building queries by string formatting and concatenation. I wouldn't be surprised if some dark corner of the docs still had those available.

> The official docs for _years_, if not decades, included them. The docs for how you were supposed to do SQL were just full of the antipattern of building queries by string formatting and concatenation.

PHP directly exposed the libmysqlclient C library. Any language that provides the ability to send a raw SQL query (hint: almost all of them) has documentation you can copy and paste to introduce an injection vulnerability.

You'll find injection vulnerable examples in the MySQL docs themselves: https://dev.mysql.com/doc/c-api/8.0/en/mysql-real-query.html

> You'll find injection vulnerable examples in the MySQL docs themselves: https://dev.mysql.com/doc/c-api/8.0/en/mysql-real-query.html

I can't find those examples, is there something in the mysql docs that I can copy and paste and be instantly vulnerable?

MySQL docs shouldn’t worry about injection.

Why downvote ? Why database needs to worry about executable wrong input ? That’s input ( App ) level problem. Database will do what it is asked to do.

Because people learn from and refer to the documentation. Examples therein should be presented in a manner that demonstrates safe execution.

So why should Perl, Go, PHP, or Rust?

Because those language are usually used at application level, and application does need to worry about wrong input. Databases doesn’t need to worry about if some correct security level input ask to delete the database , it should delete it, not to question the input , but at application level , that is not acceptable. How much simpler can this get ?

Because not everything that uses MySQL is a web server?

because it's serious if it's a web server?

In defense of PHP, this is probably due to simplicity. PHP got many people started with server side web development, and docs were supposed to be "simple". The thinking back then probably was "if we make the example more complicated, we will loose people", which was probably even true for the target audience.

> In defense of PHP, this is probably due to simplicity

I don't consider this to be a good defence; in fact, I'd argue that makes PHP itself insecure.

I like to think of secure coding in terms of the 'path of least resistance' for a lazy/busy/inexperienced developer. If doing things securely makes life harder, things will be done insecurely.

We don't need to make the secure approach easier in absolute terms; we can just make the insecure approaches more painful. In this case: have database functions require arguments of an 'SQL' type, rather than strings; make it easy to write literal SQL values; make it easy to parameterise SQL values; make it as hard as possible to convert a string value to an SQL value, e.g. bury it in some deep namespace hierarchy, with a long and scary-sounding name, require a config value to be enabled (or even make it a compiler flag!), etc.

This way, the docs (plus stack overflow, blog posts, etc.) don't have to choose between showing a secure approach or showing the simplest approach; since they are the same!

> make it easy to parameterise SQL values

It usually is easy to parameterize things like values used in WHERE clauses. It's often much harder to work with dynamic query conditions (e.g. optional filters on a particular column). I don't believe I've seen an approach that does this in a way that can provably provide any sort of safety guarantee.

Do you have an example? I can only imagine three scenarios:

- SQL logic. We should be able to write this as a literal query, parameterised as needed, e.g. (made up syntax)

        SQL"SELECT foo FROM tbl WHERE @MyParam IS NULL OR bar = @MyParam"

 - PHP logic. This is just ordinary control flow, like anything else, e.g.

      ($myParam === null)
        ? SQL"SELECT foo FROM tbl"
        : addParam("MyParam", $myParam, SQL"SELECT foo FROM tbl WHERE bar = @MyParam")

 - A mixture of SQL logic and PHP logic. This seems inherently unsafe to me, so it's not surprising that safety guarantees can't be proven. My point is that such things should be be made difficult ("artificially", if needed), such that nobody would choose to go down that route when another option is available.

I'm not quite sure what you're asking for an example of. But suppose you have a table tbl(foo, bar, baz). Depending on user input, you may want to query on any combination of foo, bar, and baz. With this and larger number of columns, it becomes impractical to have conditions for every combination.

One approach would be to construct a list of conditions as well as a list of parameters to be substituted. Shown below without any particular language syntax, but hopefully comprehensible.

conditions = ("foo=?", "bar>?") parameters = (fooValue, barValue)

Then when building the SQL, you join the conditions together with AND and substitute in the parameters. This works in the sense that you are still prevented from injection. But it's rather messy. I suppose perhaps you can actually do what I'm discussing with some ORMs in a reasonably clean way. But my point is that most SQL interfaces make it easy to parameterize a single set of fixed values, but hard to do so for table and column names. Arguably this is a feature not a bug since you probably want to avoid such parameterization anyway. But having a safer way to do so would be nice.

You are aware that PHP was made popular roughly 20 years ago, right? In any case, I agree with your theoretical approach, however, I think you are ignoring the reality of "coders" out there. Security is still a niech topic in CS. Most people still want to get their work done, not caring about security at all.

> Most people still want to get their work done, not caring about security at all.

That's exactly why tools and languages should be designed for security, so that their users don't have to care.

I also disagree about 'not caring about security at all': security is opposite to functionality, since it prevents things rather than enabling them. If developers truly didn't care about security, we would see far more use of 'eval' as a way to plug systems together. Instead, we see a huge amount of effort spent on defining data and interchange formats, parsing parameters, branching on their values, etc.

For example, any time an API/URL provides options like '?sortField=age&sortDirection=DESC', that indicates developers who do care about security. In contrast, we can make a far more flexible API by accepting arbitrary code instead, like '?postProcess=(x) => x.sortBy((elem) => elem.age).reverse'; and this is much easier to implement, since we could just send it to 'eval()'.

>I don't consider this to be a good defence; in fact, I'd argue that makes PHP itself insecure.

Agree with you completely on that. PHP, is in fact, insecure by design.

I don't see this as a defense, I am seeing it as a way to reach to more programmers, and basic security practices be damned in the process.

I have to wonder: what did these people have to gain if PHP got popular (which it did)? That's an egotistical way of popularizing your language.

Now do C.

I know right. That's why I'm not doing C anymore, for 10+ years now.

Was Dennis Ritchie egotistical for popularizing C? He would have known all the security flaws in it, right?

Very disagreed with your analogy. By the time those PHP tutorials were written, SQL injection attacks were already known.

So not sure what you're trying to push here but I refuse to participate.

Still to this day you have free-after-use bugs in the Linux kernel, about 20% of all known bugs in the kernel. How is this not comparable?

It is comparable, that's why I'm moving to Rust. Hence I said I don't want to participate simply because I am not defending C either. It played its role and it's time for it to be phased out.

Which MySQL library did Ritchie wrap back in 1972?

My bad, which version of Linux should Ritchie have been auditing in 1972?

I see what you mean, but at the same time, back then just getting a working PHP environment up and running was complicated enough - PHP, Apache, MySQL, and you had to get them all working together.

If it was about accessibility, they should have made an easy installer and even offered cheap hosting themselves I think.

As for SQL injection, were prepared statements even a thing back then? Either way they should never have allowed and normalized string concatenation to build up SQL queries.

"just getting a working PHP environment up and running was complicated enough"

Hah, that was one of the biggest strength of PHP stack - it was not complicated; on you MS Windows machine it was enough to install some wammp/xammp, etc. PHP/MySQL/Apache bundle, open editor, put in the first line <? and start coding.

On production, typically some shared hosting (cheap! Another said stack advantage) this was already installed, so it was sufficient to FTP files over there and be done (one more advantage).

There was no other comparable stack in terms on simplicity and being able to do something quickly. I believe there is none today, only PHP stack matured, so there are frameworks, etc.

Yes, there were security concerns, but still much less comparing to its server-side predecessor CGI scripts (better known today as AWS lambdas or "serverless").

Configuring the "plugins" and everything to get PHP working on Apache on Linux can be complicated and annoying if you're not already familiar with the process.

There have existed multiple installer projects for PHP, Apache & MySQL on Windows for two decades.

Mysqli driver was released with PHP 5.0 in 2004, it has prepared statements.

> As for SQL injection, were prepared statements even a thing back then?


yes, prepared statements have been a thing since there were relational databases.

but also, (server side) prepared statements are not required in order to use SQL with bound parameters. the binding can occur just as easily on the client side, and this is in fact quite common. the point is that the programmer is not manually deciding whether or not to escape a parameter on a query-by-query basis, the process is automated.

> As for SQL injection, were prepared statements even a thing back then?

Yes. I was writing prepared statements in Perl before PHP3 was released.

When I started like this, I always followed up with a google query for securing what I did. Not the best way to learn, but as you say, it got you started.

The fact that it got newbies started doesn't mean it was built for newbies. I think you're making too many assumptions here.

"Else you are not getting the authentic php4 experience!" /s

PHP has been in a poor shape for many, many years. It started shaping up in last rather few years, and there is a large backlog to tackle, colossal if you include all the numerous tutorials and Q&As from getting copy-pasted since 2000.

The language never was the problem, not since PHP 5.1 at least (which introduced PDO) and that is 16 years ago.

The problem always was the ecosystem that took decades to update and the fact that Google's search is algorithm-ranked and not supposed to be curated by humans, which would have kicked out at least the most horribly insecure stuff.

PDO was slower, the interface - critically - tons of examples still either made prepared statements a side note or actively encouraged string concatenation, and it preserved the flaws of the previous interfaces like ignoring errors and warnings (silent is no longer the default in PHP 8).

I no longer consider myself part of the PHP community (started around 1999), in part due to the low priority reliability and security had. It was exhausting having to vet code so frequently because even experienced developers forgot all of the rakes in the grass.

I disagree. As someone who has used PDO a lot it still makes it unnecessarily hard to use parametrized queries compared to the libraries available in most other languages, even C (at least for PostgreSQL). The pgsql library for PHP is also pretty good, better than PDO.

You can do parametrized queries in two lines of code with PDO. I wouldn't classify that as "unnecessarily hard", sure you could split in half to one line of code, for that you need to write your own wrapper function.

But, you need to write the hard thing once. In a 10 line wrapper function and then call that function from everywhere.


Problem solved.

I guess every PHP developer writes a bunch of these wrapper functions for common sql tasks before he starts his work.

Or installs Doctrine DBAL which handles it all for you.

At the cost of some performance, but the ease-of-use with a highly-tested and relied-upon library normally outweighs that.

Yup, make it work (securely!) before making it fast; you can't diagnose performance if your code isn't working as it should in the first place (and SQL injection safety is a non-functional requirement; you can't consider code working if there's that weakness). In practice, bad database design or access (like the n+1 problem) will weigh heavier than the overhead added by a library like Doctrine. Or, it should, if Doctrine adds THAT much overhead it's a problem.

This is the most PHP comment in this thread yet; "yeah, I know this very basic API is inadequate and broken, but you can just write a wrapper, everyone is writing the same wrappers anyway!"

Is PHP supposed to be a high-level language or what? Hell, I'd consider it a very flimsy excuse even in C (in most contexts anyway).

Already answered in this thread. There is nothing hard or wrong about the API. It is just lots of misunderstandings and commentators not looking things up before commenting.

  $stmt = $pdo->prepare('INSERT INTO user (email) VALUES(?)');

I mean, PDO wasn't great (in contrast to perl's DBI which was pretty good and offered parameterized queries even if the underlying interface didn't), so it made sense to prefer mysql_query. But mysql_query didn't do parameters, and mysqli did, but only it you ran a new enough version of MySQL.

I wouldn't call it a language failure per se, but a problem with the libraries that shipped with the language. That distinction may not make a difference.

Bad examples that stuck around don't help either, of course.

Long after PDO was a thing, the official docs still included the old insecure jank.

Is your contention that Google is to blame for indexing them?

> The docs for how you were supposed to do SQL were just full of the antipattern of building queries by string formatting and concatenation

This is still to this day the recommended way to construct a "WHERE foo IN (a,b,c,...)" query in PHP. It's insane that there's no way to pass an array of values into a parameterised query for that use case.

If there's anything more annoying than the lingering bad examples of PHP code online, it's people lying about what PHP "recommends", like you do.

Common database drivers like PGSQL, MySQL, SQLite etc. don't accept arrays of values for parameterized queries. This is at C level, in their own client libraries, and their own communication protocols.

This means there's nothing specific to PHP about this problem. Many higher level libraries, including for PHP, abstract over this problem and do offer arrays by binding per query.

So where are statements like yours coming from? Probably just being eager to say something bad about PHP without checking your facts too much.

Maybe I'm misunderstanding you, but it looks like it is a PHP specific problem. Python does not have it.

Of course at C level parameterised queries ("prepared statements") do not accept arrays. It would not make sense for them to do.

This is because the placeholders can be substituted by values of different types.

In typed programming languages, arrays have elements of the same type.

For example in SQLite, you are supposed to call `sqlite3_bind_int()`, `sqlite3_bind_text()` [1], once for each query parameter.

In languages like PHP and Python, where arrays can carry values of different types, their wrappers around the SQLite C functions can do this function calling for each value in the array. In Python, that is easy, the default, and explained in the very beginning of the official standard library's sqlite documentation [2]:

It states at the very beginning that query construction by string construction is unsafe and must be avoided. It immediately provides an example of how to safely call a parameterised query with an array of values, using `execute()` and `executemany()`.

PHP's standard library simply does not seem to have such an `execute()` function that accepts an array [3], nor do the official docs seem to contain any prose that could explain how to use the library safely [4]. The only way you can find out is by reading user-contributed comments on some specific functions in the function reference.

So Python's standard library provides safe functions, and immediately instructs the user how to use them. PHP's does not. Unclear to me how one can conclude that this isn't a PHP specific problem.

[1]: https://www.sqlite.org/c3ref/bind_blob.html

[2]: https://docs.python.org/3/library/sqlite3.html

[3]: https://www.php.net/manual/en/sqlite3.prepare.php

[4]: https://www.php.net/manual/en/class.sqlite3.php

You're misunderstanding You cannot do something like the following:

   "WHERE status in (...?)" and then ->execute($status_array)
But you can pass an array of parameters just fine (to individually bound input placeholders). It depends on which API you're using but it's of the format:

    ->bind_param($types, ...$params); or ->execute($params);

"Maybe I'm misunderstanding you, but it looks like it is a PHP specific problem. Python does not have it."

You can concatenate a sql query just fine in Python anyway you want. Adding a sql injection is just as easy in Python, PHP, or lisp. Thus this is a choice. Nothing to do with a language. Language bashing is gross and spreads lies. And yes, you can bind an array of params in PHP.

Actually if you use .NET EF Core 2 FromSql or ExecuteSqlCommand, you get parametrized queries from string interpolation for free: https://docs.microsoft.com/en-us/ef/core/what-is-new/ef-core...

> Maybe I'm misunderstanding you, but it looks like it is a PHP specific problem. Python does not have it.

I don't know how carefully you read what I said, if you misunderstood that this is a native library limitation (C level) of the actual database clients, and that most of the popular PHP libraries can bind arrays.

> This means there's nothing specific to PHP about this problem.

I don't particularly care if languages that I am not using have the same problem. It's up to people who use those languages to make those criticisms.

> Many higher level libraries, including for PHP, abstract over this problem and do offer arrays by binding per query.

Feel free to link an in-built PHP library that supports this feature, that would be far more useful than just obliquely suggesting that one exists.

PDO explicitly does not support the functionality being discussed [1]:

> For example, you cannot bind multiple values to a single parameter in the IN() clause of an SQL statement.

You're required to roll your own implementation, something that thankfully isn't particularly difficult, but unfortunately also seems to be enough of a barrier that a lot of programmers don't bother.

[1]: https://www.php.net/manual/en/pdo.prepare.php

I'm pretty sure PDO doesn't support escaping WHERE/IN(...) because MySQL prepared statements don't support it. In theory, PDO could support it in emulation mode, but supporting something like that only in emulation would be an antipattern.

Similarly, PDO doesn't support dynamic table names. You're expected to roll your own precisely because it demands more situationally specific escaping than regular parameters would, e.g. testing a list of allowed tables.

> would be an antipattern

I just want to point out the above is identical to saying "is bad" without explaining why it's bad.

We can say it doesn't align with PDO's goal of being barebones library that doesn't add non-native features on top. But that's not what it does. It both omits many native features on the various databases it supports, and has non-native features like hydrating objects, for some reason.

I'd say if a PHP library is at the same level as the C library that backs it, that's a failure of design goals. C is intentionally low-level (by modern standards) and PHP is supposed to be high-level and consumable by people with much less clue than C programmers.

It's unfortunate that you pretty much need a DBAL (like Doctrine DBAL) on top of a DBAL (PDO) to get some of the missing features. Like, say, escaping identifiers.

It is hard to design good API:s that can last for decades. Just look at Java, language that is backed by large international corporations but the Java standard library still sucks. PHP core is a much smaller community driven project.

Key to great APIs is battle tested APIs in real world projects, finetuned over years of experience. PHP has that in the much larger general PHP community and to access it you use composer.

We, the PHP community, must use all resources to compete with other languages, it is unrealistic that the core PHP team can implement a great API, for example we can look at the filter API, it works but it is not great. PHP core also has longer release cycles.

We need to push PHP developers to use composer more, maybe the PHP docs should state that.

To be fair prepared statements happen at the RDBMS. PDO is just a library to interface with that. It can't compile a query execution plan without knowing the table and columns involved. So dynamic tables or columns as variables doesn't make sense so you need to handle that outside the prepared statement as it doesn't belong there.

Just out of curiosity, do you have a preferred mysql library in nodejs? I wrote my own based on node-mysql that does server-side prepared statements, but it does them by compiling PREPARE and SET calls, not through a lower-level language. It works for my needs, but it's not really high-performance.

Are you just talking about prepared statements? Mysqli was released in 2004.


I see what you're saying, but maybe "array" is the wrong term. Indeed you can't say `WHERE foo IN (?)` and pass more than one parameter (or an array parameter) for the single `?`. That's a limitation of the databases, not the programming languages. But Postgres does have arrays, and you can pass them as parameters. In fact that's how you solve this problem. An equivalent to `foo IN (...)` is `foo = ANY (array[...])`. So parameterized that would be `foo = ANY (?)`. Instead of a bunch of parameters, you have one parameter of array type. The parens here signify a subquery, not a list of single-attribute tuples. In fact using ANY is more expressive, since you can say `ANY (array[])` (but you might need to add a cast), but you can't say IN ()` (which is a syntax error).

Anyway your point is true that lots of languages' client libraries and ORMs implement sql "parameters" by string substitution. That's still better than having the programmer do it himself, but not as good as it could be.

> Common database drivers like PGSQL, MySQL, SQLite etc. don't accept arrays of values for parameterized queries.

One workaround to that is passing the array as a string with some separator, deconstructing it into a temp table and then using that table as a array when it is part of a stored procedure.

Even better, you can pass it in a JSON array, and use the built-in JSON functionality to iterate through it. I started doing this in SQLite, and it made some queries a lot cleaner.

postgres does. PQexecParams, which executes a sql statement with parameters, accepts oids that are array types.

Probably good idea to use a framework or a database library in PHP. Laravel has Model::whereIn('foo', [a, b, c, ...])

I'm shocked no one else gave this answer earlier in the thread. If you're using PDO directly in 2021, you're absolutely doing it wrong. You don't need to use all of Laravel, or even all of Eloquent for that matter. If you don't want to depend on a framework or use an ORM, you can use "illuminate/database" (https://packagist.org/packages/illuminate/database) for a secure wrapper around PDO. No need to reinvent the wheel.

> If you're using PDO directly in 2021, you're absolutely doing it wrong.

This is somewhat the point. If using the language's standard libraries is "absolutely doing it wrong", that's an indictment of the language.

>This is somewhat the point. If using the language's standard libraries is "absolutely doing it wrong", that's an indictment of the language.

Exactly, all languages have footguns but some have a lot more than others. You don't hear for example Java developers bitching about JDBC to anywhere close to the extent PHP developers bitch about the various common approaches to database connections.

> If using the language's standard libraries is "absolutely doing it wrong"

You are being deliberately obtuse. Other comments in this thread offer correct examples of using PDO to avoid SQL injection. I didn’t mean it was impossible to write safe database code using the standard library—obviously, PHP is a Turing-complete language, it can be done!—I just meant it’s awkward, and verbose, and developers are unlikely to do it consistently throughout an application. Hence this type of concern is best abstracted into a library.

To your point about “indicting a language,” most languages have footguns like this. The worst you can say about PHP is that the documentation should do more to discourage new users from working with PDO directly. (And I mean the official documentation—the language maintainers can’t be held responsible for the kind of unofficial tutorials the article complains about.) But regardless of what the official docs say, most PHP development today is done using frameworks like Laravel, Symfony, and Zend framework that do not suffer from SQL injection issues.

> It's insane that there's no way to pass an array of values into a parameterised query for that use case.

Maybe I'm misunderstanding you, but assuming $params is an array in the following code, isn't this passing an array into a parameterized query for that use case? (Edited to note this is literally an example from the PHP documentation, and not one of the squiffy comments.)

    $place_holders = implode(',', array_fill(0, count($params), '?'));
    $sth = $dbh->prepare("SELECT id, name FROM contacts WHERE id IN ($place_holders)");
In Python, using MySQLdb, I believe this would be something like

    place_holders = ','.join(['%s'] * len(params))
    cursor.execute("DELETE FROM foo.bar WHERE baz IN (%s)" % place_holders,
Which, while more succinct, seems to be functionally exactly the same thing. I don't see what PHP is doing that's "worse" here offhand.

One could argue, "Yes, but a Python programmer would use SQLAlchemy," which is probably true, but then you need to let the PHP programmer use Doctrine or Eloquent.

> isn't this passing an array into a parameterized query for that use case?

Yes, I definitely should have been more specific there - what I'm referring to is passing it as one parameter, instead of potentially dozens or hundreds. There's a lot of ways to do this safely, but none of them are elegant. In the example presented here I believe it's the case that you can't do the ->execute($params) and bind some parameters explicitly, so if you had something like " AND status = ? AND due_date < ?" at the end of your query you have to chuck those variables into the same nondescript array.

I prefer the looped bindParam() method for this reason, but that has its own challenges. Firstly, it requires some boilerplate (not a big deal, but no-one likes writing boilerplate), and more pressingly it still has the issue where it actually is each individual element of the array being parameterised, and spams the ever-loving crap out of any debug outputs.

Obviously all of these issues are way less concerning than SQL injection vulnerabilities, but life would be so much easier if you could just do $sth->bindParam(1, $params); on a single question mark, and have that show up logically in things like debugDumpParams(). Even if you had to use special syntax to indicate when a parameter is expected to be an array, that would be a huge improvement.

I'm sure there's technical reasons why this is more difficult to implement than it would initially seem, but I've seen enough string-concatenated queries on StackOverflow from people who just give up on getting the parameterisation to play nicely that I believe it's worth the effort to make doing things right as frictionless as possible.

How often are you passing dozens or hundreds of parameters to a single sql statement? Maybe there is a better way to structure things.

How often am I doing it? Not very. But when there's a WHERE foo IN(a,b,c,...) query that has an arbitrary list as input? Could be any number of parameters in there (although I think most SQL drivers start complaining in the early quadruple digits).

Better is to concatenate question marks and then pass the params as an array.

Yes, something like this

  $user_ids = [1, 6, 46, 3, 17];
  $count = count($user_ids);
  $in = '(' . implode(', ', array_fill(0, $count, '?')) . ')';
  $sql = "SELECT user_id, email FROM user WHERE user_id IN {$in}";
  $stmt = $pdo->prepare($sql);

I realize this is just an example but if you're in control of all the values then there's nothing unsafe about concatenating them into a query. The problem comes with values that are submitted via a web form, API, or some other external, untrusted source.

Yep. Fortunately a lot of the use cases of IN() are with controlled values - often IDs obtained from a previous query - so string concatenation is safe (and a lot less hassle than the alternatives).

Unfortunately that gets people into the habit of using string concatenation, which is not a great habit to have.

This is true but it’s still better to default to the Safe Thing, as long as there isn’t a good reason not to. How long until those values you know you control carelessly get turned into values you don’t control somewhere along the way?

I used to mess with other devs by injecting bits of code from other sources (when blink in html worked it was one of my favorite ones). The correct way is to bind your params and do not trust that the data you got from some other system is 'OK'. What may be fine in one system could be an escape code in another.

I speculate that the reason this is such an issue is because the interface at the ODBC level is basically security wise broken. It works 'OK' for getting/putting the data but it has 2 modes of execution. One of those paths is not great for security, the other has a usage issue. 'Binding' can be a real pain as it takes at least 1 call per variable parameter. Then managing the buffers correctly. So just building up the strings is an easy way to skip a lot of steps. So many take it. But that path leads to security vulins.

Not sure if I follow you completely, but in my example $user_ids could come from an external source because I’m only concatenating question marks (?), then bind $user_ids with execute and that is safe. What I also always do before passing the $user_ids to execute if it comes from an external source is to validate that everyone of them is an integer with filter_var.

Can't you at least build a "IN (?, ?, ?)" string, if you're going to build a string dynamically?

Not only is it the case that you can do that, it's the case that you should do that. Which leads to the question of why it's not in-built functionality. Making secure code harder to write than insecure code is a great way to ensure that lots of people write insecure code.

You end up writing code with 16 ? parameters and filling the empty ones with -1 or something ;)

No, you use array_fill() and implode() to generate exactly as many placeholders as you need.

What do you mean? You can Google php parameterize query and get atleast 2 different methods of doing this. You shouldn't ever touch string concat in any language when doing queries

> You can Google php parameterize query and get atleast 2 different methods of doing this

There's no way of doing this with a single parameter. You need to parameterise every single individual item in the IN clause to do it that way, which is a horrific solution when it's of a completely unknown length.

Still better than string concatenation in many cases, but that the language has no in-built way of doing it is one of the many reasons PHP code is so often vulnerable to injection attacks. There's so much friction to writing secure code.

That's what the docs should show because that's how it works. The docs should give you the streamlined barebones implementation. It's trivial to write your own function to use parameterized queries and add in all the type checking etc you need. It's only a few lines of code.

> It's trivial to write your own function to use parameterized queries and add in all the type checking etc you need. It's only a few lines of code.

It's fairly trivial to do this, but now you're potentially adding thousands of parameters per query in circumstances where the contents of the IN() are variable in count. This is not ideal for a number of reasons.

Additionally, a language should be designed such that the easiest possible way to do something is at least moderately secure. If you need to attach some boilerplate code on top of the standard libraries every single time you use them for it to be safe, then there is no reason for that boilerplate to not be in the standard libraries.

My first job while in school was doing web development with a LAMP stack. I had zero PHP experience, so it was 100% learning on the job and my learning resources were basically the official docs, a PHP book (can't remember which) I got at a book store, and stackoverflow.

PHP has a very forgiving design; it makes it very easy to get any trash code up and running. It really is great for newbies to get their hands dirty. I look back fondly on that first job, but boy did I have to unlearn a lot of bad lessons from those days.

To be fair to PHP, a lot of other languages had bad examples around as well. Pretty much every language has a way to do string interpolation on untrusted input and pass it to to a database .

Java and C# 1.0 examples (mostly) did not have anything like this, because they shipped with a database layer (JDBC/ADO.net) and not just a raw driver. PHP instead spent many years fucking around with hacks like addslashes() and etc before addressing the root issue.

Java ones were pretty easy for me to find...

  String insertQuery = "insert into student values('" + studentNo + "','" + studentName +"','"+ studentAddress + "','"+studentAge+"')"; 
  int result = statement.executeUpdate(insertQuery); 

Result #5 on Google for: jdbc insert program example

And this one, #1 for "jdbc example variable where" on Google:

  String query = "select LastModified from CacheTable where " + " URL.equals(url)";

> Pretty much every language has a way to do string interpolation on untrusted input and pass it to to a database

Otherwise they would not be Turing complete.. but defaults/easiest route does matter very much in this case.

And the default docs and tutorials. If you look at Ruby on Rails. The way every tutorial will show makes it impossible to get an sql injection. Then as you gain more skill you will eventually find the function that lets you run a string query but no guide or tutorial shows you this so it is less likely you will use it without understanding.

Ruby and Rails tutorial is not a Ruby tutorial. The ancestor comment talked about language tutorials, not framework tutorials.

JDBC does have an option to pass parameters without doing string interpolation and I would not consider it a framework.

I don't mean going through hoops. String interpolation is easy in most languages. See https://news.ycombinator.com/item?id=27958651 for example.

> I wouldn't be surprised if some dark corner of the docs still had those available.

It's not exactly a dark corner, but at least there's red boxes all over the place.

Back in the 5.x early days it had disclaimers but clearly not enough to discourage people from keeping unsafe code in place.

> The best way to insert supply chain exploits is to embed them in a stack exchange answer to a beginner's question.

I'd love to see a concrete example of this happening in this way!

The rest of your story just describes 'smart, but bad jr. programmer' and doesn't really discuss the exploit issue.

> The best way to insert supply chain exploits is to embed them in a stack exchange answer to a beginner's question.

Do you have any actual evidence for that?

As Hanlon says, "never ascribe to malice that which is adequately explained by incompetence."

Incompetence explains this one fully for me.

It doesn't take a malicious actor to make a honeypot, only to exploit it. If someone naively posts a widely used solution on a help site, they've done the job for the exploiter, who only needs to know what low hanging vulnerability fruit awaits them.

"Programming from first principles" doesn't solve security problems, in fact security problems in this case come from being ignorant of best practices and industry experiences. You would _want_ programmers to use established techniques to avoid this problem. Of course developers should always understand what their code is doing though.

> DRY paints that as a feature

That's not what DRY should be. Good developers should understand at least a couple of levels of abstraction underneath what they're writing in order to produce sensible code.

The idea of abstraction is that you only have to spend the afternoon/day/week (depending on the complexity that's being abstracted over) learning how everything comes together once, as opposed to spending that time to grok a slightly different version of the same complex system every time you read/write something new.

Yeah; anyone who thinks DRY means copy the answer on SO is completely missing the point.

DRY should not introduce vulnerabilities. It avoids them by

1. reducing complexity and cognitive overhead

2. allowing you to fix your code in one place and propagate your fix throughout the codebase

Ironic how in today's market the applied math dude could easily transition into some higher paying role like ML Engineer or Data Scientist. Not knocking those roles by any means, but the tech world seems ravenous for algorithms.

From my experience, algorithms are easy and the software engineering is hard.

What’s more, many, not all, but many of these algorithm scientists look down on their programmer counterparts. It’s these folks who end up making the algorithm successful for the company.

Thought the best way was to develop a basic and easy library. Then upload a malicious binary that doesn’t match the source.

There was a really good post on how to do it and evade detection.

I believe all programmers will resort to copying code without too much of a review at some point. Strict first principles would mean not to rely on mental work of predecessors, which is entirely unrealistic today. Maybe that could work in the 80s, but the amount of software layers today is astonishing.

I develop bare metal code for special µC, but I would never imagine to build even a basic OS. This is work you spend your lifetime in. Of course if I could just copy the things from established OS things might look differently.

That said, I don't think the first steps in SQL should guard against SQL injection. That is a topic for later and only hides the main learning target. You can understand SQL perfectly by first principles and still cause your first program to allow for such injection. But that should be a different lesson at first. Being able to identify it as a danger also relies on the experience of others.

> This isn't new, we've always had programmers who programmed by "recipe" rather than first principles, and DRY paints that as a feature, but it underlies a lot of pain and cost over the years.

Something tells me that exploiters love the programmers who attempt to build user authentication systems from first principles.

> This isn't new, we've always had programmers who programmed by "recipe" rather than first principles, and DRY paints that as a feature

No, it doesn't. Programming by recipe rather than building the recipe into a reusable abstraction is the exact opposite of DRY.

The problem seems to me to be if you hire newb developers and don't have any mentor-ship or code review. Any non-newb dev will know that you ALWAYS have to sanitize strings for SQL. file names or whatever. Most SQL ORMs or libraries will let the developer run raw SQL so you better have some competent person writing teh code or at least review it.

Certain people just aren’t “real programmers”. I worked with a guy who would cut and paste some example and change a couple things, and respond with “I got this working”

Meanwhile I would said okay, what does it actually do, where’s you copy it from, what did you change and why...

At this point he would just get mad at me. I'm sorry I don't want people cutting and paste code they don't understand and sticking it in our codebase.

And the issue is the fucking search engine, I can't understand why when you search for JS,html,css documentation google sends me to outdated websites like w3c schools. I always have to foce a search on MDN.

A newb can also copy paste bad Python code, mess up some ORM clause and delete all your database.

But in this case oh PHP and MySQL even the worst dev shop uses a framework or library, so this article is probably affecting almost nobody that matters.

>> I can't understand why when you search for JS,html,css documentation google sends me to outdated websites like w3c schools

Probably the same reason they still send people to sites like Expert Exchange

> Any non-newb dev will know that you ALWAYS have to sanitize strings for SQL. file names or whatever.

This is the “No true Scotsman” fallacy of programming. There are many people with plenty of experience who haven’t yet learned that lesson memorably and at least an order of magnitude more who know it but don’t exhaustively trace every data flow through the system and assume something already handled validation or escaping, being correct all but a fatal few times. With unsafe defaults an attacker only has to find a mistake once — you have to find them all.

There is no unsafe defaults, all ORMs allow you to run raw SQL so same "newb dev" can do same stupid thing if it finds a SQL tutorial when googling "how to do X".

You need to teach developers to always escape strings, even if we remove old and bad tutorials you still need to teach the devs about this issues, otherwise they will do the mistake with file names, or with parameters to shell commands. It might mean having to read a book and not reeling on Google and soon on AI to teach you to code or SQL.

There’s a difference between what you encourage and what’s technically possible. PHP code so commonly has bugs of this class because there’s a quarter century of tradition, examples, and tutorials normalizing the idea of taking request variables and passing them directly to other code. Contrast with something like the Django tutorials & docs where the examples pervasively use the ORM’s escaping & validation and the extension points describing how to use custom SQL tell you to use placeholders and emphasize why it’s important.

That might not seem like a lot at first thought but it lowers the bug frequency enormously in the code I’ve looked at because you only need extreme caution in rare cases rather than every view. That means that when someone is busy, having a bad day, etc. they either have no problem or it’s a safe crash rather than an exploitable hole.

That is not the reality. Even the most stupid and lazy person will find a framework to use.

This are just old pages that bad search engine surface. IMO if you focus the actual lessons here are:

- developers are lazy, you need to fix that, there is no magic language that solves the issue though some fanboys will say that their favorite language is more idiot friendly.

- search engine and soon the AIs are stupid, let's try to encourage books or other quality materials. Recently I found a collegue that did not know that in JS the "addEventListener" function exists and you can use it to add more then 1 listener at a time, this person probably can put on his CV 5+ yearts JS experience and a few frameworks. Mayb e if we stop focusing on the "my language is cooler" we could find the actual problems.

Back in my starting years I was reading books to learn, there when you get to the SQL chapter you were explained all about SQL injection and related bugs and how to use prepared statements. With PHP you can block this dangerous functions (PHP is flexible and many stuff like "exec" is blocked in most hosting places, but the problem with newb developers remain, if we can agree about this real problem(IMO_ we can maybe address it. Sure, downranking bad tutorials would be a part of the solution, also Google should probably stop their shit where they put the solution directly on the search page and not forces you to actualy visit SO and see comments, limitations and alternatives, shame Google, you make software industry worse with your greed...

The best way to insert supply chain exploits is to embed them in a stack exchange answer to a beginner's question.

None of these answers seem to come from SE so this might be harder than you might assume.

It's not. Note that these threads are a few years old, but in recent research (stay tuned!) it has if anything gotten worse.



It may have but this research doesn't show that. It appears to be in questions, rather than answers, when I can find a page that hasn't been removed. Beside most of these not actually being there, the google result is for answers which additionally have been ranked by Google. The thing you link is not meaningfully comparable to the Google result, nor is it representative of 'if you search for something on SE, how often does it tell you to put SQLI in your code'.

In 1980s, where did one go to find code to copy&paste?

copy & paste ... ha I wish. We had to type it bit by bit from the back of a magazine back in my day. Do you remember this? [0]

[0] https://arstechnica.com/staff/2018/11/first-encounter-comput...

This is exactly how I got into coding. A friend had the computer, and the 2 of us would hunt&peck the code in. We were maybe 12 years old. The DATA lines were the tricky spots. Everything else used words we could understand, and it just made sense after doing it enough. The DATA was just jibberish. We found it easiest if one typed while the other called out the data. Faster and fewer mistakes.

Same. I used to buy Amiga Format magazine for the game reviews then found the code for creating my own game was way more fun

Back in the late 80s and early 90s many people learned to code from a book.

You would just study the code samples found in those books.

Or the existing code on your machine or local documentation. I learned quite a bit of QBASIC from reading nibbles.bas and gorillas.bas and the editor's own help was extensive.

Dr. Dobb's Journal of Computer Calisthenics & Orthodontia

Running Light without Overbyte

You would use a reference source, which may be paper documentation provided by the tool vendor (e.g. API documentation with usage samples), online documentation provided by the tool vendor, sample source code provided by the tool vendor (including illustrative sample applications for reference was quite common), or third party reference books. Since documentation was much more necessary than nowadays, it was more thorough and generally of a higher quality than now, though of course not always perfect.

Mailing lists and USENET mostly, with FTP listing of open source code here and there, followed by transcribing from printed material

By using the brain as clipboard between book, eyes and keyboard.

I freelance fixing and maintaining legacy web apps, almost always PHP.

Anecdotally I see SQL injection vulnerabilities in about half the code I look at. It’s one type of problem among many other problems and vulnerabilities in code written by amateurs and often copy/pasted.

PHP programmers can find lots of resources online. Some of those are terrible, either very old or written by amateurs excited to show how they got something to work.

I have seen the same kind of thing with Java and Python, but the popularity of PHP means there’s a lot of junk info and examples online.

PHP has supported safe SQL and safe HTML for decades, but the programmer has to understand the problem and the solution.

I run a small IT company and the Windows sysadmin stock answer to nearly all problems:

  C:\Windows\System32> sfc /scannow
... followed by "reinstall your operating system". OK so no harm done apart from rather a lot of downtime, assuming you can put it back together again. The number of times I see "disable your AV" still is frightening.

I have a browser plugin that I discovered thanks to this parish called uBlacklist which you can use to try and clean up your search results by banning known bad sites from your results. social.microsoft.whatever was first ... 8)

I also note an awful lot of Linux related link farms and "blogs" with ads and cloned content from other sources have surfaced over the last few years. WordPress is another quagmire. I could go on but basically, search is very close to completely screwed (but not quite.)

Disable your AV is a perfectly cromulent suggestion. It is a root kit that operates at the lowest level of your operating system, and any issue with it and it will affect every layer above it.

Now, if disabling works you should set reasonable exclusions and enable the product again.

Disabling your AV is never a good starter for 10 and is often proffered as the canonical fix for a problem. I shudder to think how many people have been debagged and radished (I'll take your cromulent and raise you really odd) as a result of following "sage" advice.

I read the logs and set exclusions until the damn thing works. I have briefly disabled the whole AV/firewall/browser plugin thing sometimes to double check but that is quite rare. When I smile my teeth make a "ping" sound and briefly flash white.

So I install a rootkit… to save me from rootkits?

Yes you do. You install something with all privileges on your system that claims to keep the baddies out.

Hopefully you choose wisely on what to install on your system. Hopefully you even know what is "wise" to install on your system.

If you find out what is wise to use on your system, please let us know.

Microsoft Defender is pretty legit

Now Now...

The Sysadmin stock answer to nearly all problems is

    shutdown /r /t 0
If that fails, then

    sfc /scannow

Blog spam is the bane of the n00b programmer. Even if you’re not totally new and are merely picking up a new language, it turns tutorial hell into eternal Hell.

Raspberry Pi tips is another quagmire of replicated garbage.

I've been mentoring a couple of junior programmers for a couple of years and I have seen the kind of junk tutorials and online misinformation they find. Some of it is useful because it shows so many bad ideas and implementations -- like studying a plane crash to find out what went wrong.

I wrote an article about this back in 2007, regarding Javascript examples in an O'Reilly book -- a source I used to recommend because of the quality of their writing and editing (I no longer have that opinion).


It's not just an issue with blogspam, plenty of "bookspam" as well. A few years ago a friend of my then-girlfriend was learning C for some project. The book she had was so badly written even I could hardly follow it, and I can already program in C. It's no surprise this part of her project failed.

I started programming on MSX-BASIC (kind of like C64), and when we finally got a PC in 2000 or so I got a book titled "Learn C++ in 10 minutes". It was so bad hat I was turned off from programming for a few years, as I thought I just didn't have what it takes (it also didn't help that the tooling and "getting started" was much harder back then, especially on Windows; if I had known you could just download e.g. Python instead of mucking about with this pirated Visual Studio I probably would have had an easier time – but I didn't know about that. It wasn't until I started playing with FreeBSD a few years later that I got back in to programming).

also csharp, python, JavaScript

Yeah, but what are your other options?

* procmon, if you're lucky you'll catch WTF is going on somewhere deep in the registry

* Hoping Microsoft still has the answer in a KB article somewhere (hope you didn't need any Server 2008/2012 stuff that was on UserVoice, it's gone now)

* WinDBG if you're that good

Which brings us back to cargo culted answers like sfc /scannow

I wouldn't compeltely discount social.microsoft, very very occasionally it's had a tiny tidbit of information in between the people incorrecting each other.

I used to fix Windows computers for a living; this was 10 years ago and I don't really know what changed in Windows 8/10 as I never used it, but I imagine it's roughly similar to XP and 7.

With some knowledge and experience it's possible to fix a lot of problems. Actually, a lot of problems people chuck up to "Micro$ucks bad" are just hardware problems. If someone comes in with "I get random BSODs" then there's a good chance it's just faulty a faulty RAM module, disk, or something like that. The first step for random issues should always be to run memtest and a disk check tool (I don't recall the name of the tool I used for that, but there are some subtleties involved in testing this well, and I don't know the status of SSDs as this was kind of before they became common). Checking hardware is easy, checking software isn't.

Software problems can be a bit trickier to solve, depending on what the issue is. They're very hard to debug remotely over the internet: but there's a lot more you can do than "sfc /scannow" if you're sitting in front of the computer.

You really don't need WinDBG in most cases.

Most problems can be solved if you want to put massive amounts of time in it. The issue you state, without realizing you've stated it, is that 'knowledge and experience' is in demand and expensive. So I have the option of investing a few hundred bucks of someone's time into fixing the issue, or running 'sfc /scannow' and surprisingly often fixing the problem.

It's not that time-consuming if you know a little bit what you're doing, you can go a long way with just 30 minutes; and this saves time/money too as 1) 1) hardware problems are correctly identified and fixed instead of lingering for ages (and reducing productivity), and 2) no need to reïnstall everything, which is time-consuming as well.

> PHP has supported [...] safe HTML for decades, but the programmer has to understand the problem and the solution.

That's not good enough for a language advertising as "Hypertext Preprocessor" though. PHP's distinguishing feature is that's kicked off from SGMLish processing instructions in otherwise static HTML, and it has all context available for perfect injection-free HTML-aware templating. Eg escaping quotes when it's outputting into attributes, escaping "]]>" when outputting into CDATA sections, or with the help of a real markup processor, suppressing/escaping <script> elements or onclick or other event handler attributes where advised through a grammar such as an SGML DTD. But it doesn't because it's just such a hack job of a language, by the developer's own admission.

Further evidence of this is the fact that `<?= $foo ?>`, and the long-form `<?php echo $foo ?>`, don’t offer a way to easily HTML-encode the output; instead you have to use `htmlentities()`. Whereas ASP.NET has had `<%: foo %>` to encode output for almost 15 years now, and Razor defaults to encoding: they make it harder to render unrecoded output.

Actually only 12 years since that syntax came out with ASP.NET version 4.0. ASP.NET went a long time without it (and classic ASP before that).

And, like with razor, you can use plenty of libraries with PHP that will encode by default.

Razor is stock though, but there is still no in-box way using PHP's own syntax to auto-encode output.

Comparing a language to a framework is a bit wonky isn't it? Laravel (PHP framework) for instance has {{ $foo }}

PHP started life as a template engine for CGI applications written in C. And some pretty major projects like WordPress use PHP as a "template language".

There are various constructs in the language rarely seen in PHP code that make this easier, such as <?=, but also also "if (..):" which can be ended with "endif", and "foreach (...):" which can be ended with "endforeach".

It's not a hard feature to add. PHP devs want to move away from this "PHP as a template language" (I think they tried to remove the <?= a few years back); that's all fine, but fact of the matter is that people ARE using it as a template language and will continue to do so in the foreseeable future. Not supporting that with something as simple as "automatic escape special HTML characters" is extremely disappointing, and would actually prevent a lot of problems.

"I think they tried to remove the <?= a few years back)"

Not quite. They deprecated `<?` but not `<?=`, see here: https://wiki.php.net/rfc/deprecate_php_short_tags

Another PHP-RFC removed `<%`, `<%=`, and `<script language="php">` (I'll admit I didn't know about that one): https://wiki.php.net/rfc/remove_alternative_php_tags - but as with before, this specifically retained `<?=`.

Ah yes, it was just the "<?" tag and not "<?="; I misremembered.

The ASP tags were always a bit of a misfeature; I don't think I've ever seen it used even once. <script language="php"> is just weird because it's intended for client-side scripts :-/

PHP is simultaneously a framework and a language, though. It features a very simple framework, though, and has been supplanted by others, including those that resort to reimplementing their own templating system, which defeats the point of using PHP in the first place as that was its main goal: to be a templating system for Personal Home Pages.

I won’t disagree that PHP has its flaws. A lot of them are legacy problems to support old code. Every language and tool with a big installed base has this problem — just look at the legacy crap in Windows.

It’s fairly easy to write clean and safe PHP. Any number of libraries and frameworks exist to do safe SQL queries and escape HTML. The problem is a lot of programmers don’t even know the vulnerability, not that it’s hard to fix.

I could bitch and moan about PHP or make a good living fixing bad code. Complaining won’t make that legacy code better or magically rewrite it.

Fair enough, but in the case of PHP, the target that XSS attacks are after are not the PHP sites themselves most of the time, but weaponizing those for c&c attacks on third-party sites. Thus merely using PHP, with its well-known combination of copy/paste culture, popularity among newbs, and poor security practices opens site owners up to liability claims (if nothing else, such as gross negligence with PII). And PHP's defense is weak, with not even an attempt to bring its built-in web templating into something that could remotely be called state of the art, considering that eg SGML is 35 years old.

Not saying this to diss PHPers; in fact, I like the PHP community for their get stuff done mentality, and I think they deserve better. If I were contracting for PHP, though, I'd make sure to negotiate strong liability disclaimers.

Are you aware of an actual case of an owner of a web site getting sued because their site was used to attack other sites, without their knowledge? This could (and has) happened with other tools besides PHP. Is Microsoft liable because Windows is used as a launchpad for attacks?

IANAL, much less a judge, but I think there's a plausible legal theory for suing isn't there?

Same here. PHP is also often picked by beginners (including me 15 years ago) and you can see that. However, I have a lot of fun fixing these kind of issues and improving the code. It feels like archology/restauration sometimes and it makes me happy to keep them running securely. Also, it pays really well usually.

From my anecdotal data, a whole lot of tutorials are written by 'learn Java in 21 day' stage developers. People are excited, want to put their name out there and start churning out tutorials on concepts only yesterday they had no clue about. Similar situation with many online courses too.

> PHP has supported safe SQL and safe HTML for decades, but the programmer has to understand the problem and the solution.

The ecosystem has a ton of exposed wires in builtins and libraries.

When the function's name is literally `mysql_real_escape_string` ... what does that tell you?

While you're not wrong, per se, this has a bit of that "never give PHP credit for getting better when it's still possible to do bad things with it" vibe to it. I mean, it's fairly well-established PHP did some pretty boneheaded things in its history and one could argue they didn't get serious about cleaning those up until rather late in PHP 5's life cycle. (Some would say not until PHP 7.)

In the case of your example, what it tells me is that they had a "mysql_escape_string()" that they needed to remove but had to deprecate first to avoid breaking existing code, however bad it might be, and so replaced it with "mysql_real_escape_string()" -- which itself hasn't been in PHP for over 5 years, since that whole MySQL driver was deprecated. There's still a "mysqli_real_escape_string()", but that name is likely a quirk of history, as there's no matching "mysqli_escape_string()" for people who would like to use the supported driver but continue screwing up the charset.

(Edit: another comment reminded me of something that I knew once but had forgotten. The MySQL C API has the "escape_string" and "real_escape_string" functions in it which do precisely the same things the old PHP functions did. So this actually tells us even less about PHP the language, although it may tell us something more about MySQL.)

> "never give PHP credit for getting better when it's still possible to do bad things with it" vibe to it

I was trying to go for the "PHP has tons and tons of terrible shoddy baggage" vibe.


What language anyone uses that's older than a couple of years doesn't have terrible shoddy baggage in at least someone's opinion? I have the same opinion about node.js/npm, and Java. My opinion doesn't make anyone stop using those languages.

Stroustrup quipped "There are only two kinds of languages: the ones people complain about and the ones nobody uses." PHP is the first kind. Like every language and tool before it that came with a low barrier to entry it led to a proliferation of bad code. My friends who work in ML/data science make the same complaints about Python -- it's easy to get something to work but the code quality -- ugh. And in a few years lots of that code will face the "upgrade and break it or keep it and cross our fingers" point that so much legacy PHP is at already.

That's true. PHP just started so, so, so far down, it's had more to overcome than most.

Not sure if you're just trolling the low-hanging fruit or not but I'll assume not.

When PHP came out in 1997 the other available products for putting web sites together, at least for smaller organizations, were:

- ASP (classic, not .Net)

- ColdFusion

- Perl

The first two were proprietary packages that required a license for the software and a license for the operating system (Windows). I got into PHP when a customer wanted to migrate away from Windows/ASP because of licensing fees -- they took the leap with open source, which was a big gamble at the time. The CTO had read "The Cathedral and the Bazaar" and swallowed the kool-aid. We still had to use SQL Server though, that company was committed to it across all of their applications, so I got to use PHP + ODBC for a while. Fun.

Perl had a fairly big base of CGI scripts but in most respects seemed worse than ASP, CF, PHP because Perl had a steep barrier to entry. PHP was an easy choice for shops looking to get off of ASP -- which Microsoft was making noises about discontinuing -- and ColdFusion, which several of my customers back then used, but complained about the cost (Adobe now owns CF).

So it was PHP. Then along came WordPress and the PHP world exploded. As you point out the language has had a hard time keeping up with the demands placed on it (Rasmus certainly didn't imagine Facebook-scale sites back then), and the evolving security threats (lots of web sites were purely internal back then, not exposed on the public internet, and the script kiddie hackers were still in nursery school in Kiev). Hosting providers sprung up to offer turn-key PHP/MySQL hosting, with the proviso that the site owner and developers did not control the PHP configuration.

Since 1997 a lot has changed and it's easy to point to problems in PHP and say "That could have been done a lot better." And that's true, but no one had that crystal ball back in the mid-90s. The push was to get something on the web. Planning for future maintainability has never been an aspect of software development we can boast about and the PHP code out there today is no different, there's just a lot of it.

For my part I push my customers to upgrade to the latest version and to do a security analysis and vulnerability test so we can find and fix the most egregious problems. Even this level of upgrading can get expensive and risky. I wish no one was still running PHP 5.4 in production in 2021 but wishing won't change that it's still fairly common, and companies using that code are only going to call someone like me after they've had a serious problem.

Nah, writing CGIs in C, that was my first handling of FORM submits.

> PHP just started so, so, so far down, it's had more to overcome than most.

This is probably fair. :) I think PHP tried to combine Python's "batteries included" approach with Perl's "more than one way to do it" style, but did it in a pretty disorganized way that created lots of Catch-22 issues later -- when you get that popular, it makes backward-incompatible changes fraught with peril, even if you're addressing obviously craptacular past mistakes.

I think PHP has become pretty solid in version 7+ on, although my feelings about using it remain mixed. I've joked in the past that it's stopped being a cargo cult version of Perl and is now a cargo cult version of Java.

> The ecosystem has a ton of exposed wires in builtins and libraries.

PHP is a light wrapper around C libraries.

> When the function's name is literally `mysql_real_escape_string` ... what does that tell you?

That it comes from the MySQL directly:


That doesn't surprise me.

MySQL is the PHP of databases.

Exactly. Free, well-supported, useful, widely-deployed, used by lots of developers.

It makes me wonder if there’s a mysql_fake_escape_string or mysql_doesnt_actually_escape_string function. And why those functions would even exist in a language.

It exists because "mysql_escape_string() does not take a connection argument and does not respect the current charset setting."

I thought the security of escaping is dependent on not having mismatched charsets? In which case, not respecting charset settings seems potentially not actually escaping?

Seems like a strange function to have, although I could be foggy on my charsets.

It shows the culture in PHP. They would rather keep a function around that doesn't work properly just so existing code still works instead of making everyone test that the new function works.

It shows a trade-off between arbitrarily breaking code in production or not. Lots of PHP sites are hosted on services that don’t give the programmer control over the PHP version. If the hosting provider upgrades and breaks a bunch of sites that’s a problem every bit as serious (to the site’s owner) as unescaped HTML opening up XSS attacks.

This is not sustainable or a desirable thing to keep. By accepting this state of things, no security fix with breaking changes can ever be implemented.

With all respect your comment is both arrogant and unrealistic. Exactly how do we not accept this state of things? No one claims it's desirable, it just is: bad code is out there, and it's not easy to fix.

What would you tell a small business that relies on clunky 10-year-old code to run their business? To rewrite it in a more modern language at huge expense and risk (given that a majority of rewrite projects fail)? Can you guarantee the new thing won't be just as obsolete and vulnerable and buggy in ten years?

These kinds of problems -- poorly-written and vulnerable code, amateur programmers, lack of professionalism, maintaining back-compatibility with an installed base -- are not specific to PHP. They afflict the entire software industry, and always have. Who could have seen into the future back in 2000 (when I first got exposed to PHP) that a new site would get probed by an army of bots within five minutes of going live? Or that it would be even harder today than back then to find and hire experienced programmers?

PHP has had many security fixes implemented since the early releases, but how can anyone force users of an open-source language to upgrade for their own good? Or pay someone to ferret out and fix vulnerabilities they have never got hit by?

Even brand new code has this problem. Look at all of the cryptocurrency code written in the last few years. We read about hacks and thefts and vulnerabilities every day, and that was written by supposedly smart people with access to modern languages and with knowledge of the contemporary security issues. And it still gets hacked. If we knew how to write perfect code that would still be perfect into the future I'm sure we would do that but until then we'll have to live with what we have. So far it has been sustainable, just less than optimal, if by optimal we mean what we can imagine rather than what we, as programmers, actually deliver.

Yes, yes, the poor companies. But do you ever consider the poor customers/users that put private information in the companies' databases? Or that the price they pay assumes the companies do not let their software rot for 10 years?

Then there's the typical logical fallacy of taking a trivial problem of escaping SQL and conflating it with something more complicated, and comparing to eternal perfection.. yawn.

>companies do not let their software rot for 10 years?

I think that one of the big mistakes made in the last 20 years is that every company needs its own custom software and that software is like an asset that you buy once and not a constant cost source.

The vast majority of businesses have no need for custom software and should be using 3rd party services. Then those 3rd parties have the income to dedicate to keeping the software secure.

Its honestly terrible how many local businesses have their own complex software built on some ancient version of a frame work which is sitting on an ancient server box in their office. Its a ticking time bomb no one wants to think about. Prolonging the explosion is not the solution.

Agreed. I often tell customers to use an off-the-shelf solution and get on with their real business. Custom software development is expensive, risky, and incurs long-term maintenance costs. I outright refuse to take on custom e-commerce sites or accounting or CRM systems at this point.

About half the time the customer will find someone else who will happily bid on writing custom code despite my suggestion. That’s one reason the legacy code problem just gets bigger every year, and a lot of it shouldn’t have been written in the first place.

Look at the major data breaches over the past decade -- TransUnion, Experian, multiple US government sites, etc. and point to one that was caused by a PHP SQL injection attack. This kind of thing can happen to anything accessible on the public internet.

Do you know how old the software your bank uses is? Pretty much every government agency and utility you rely on? What price do you pay for that? A lot of that code has been rotting longer than any PHP web site.

There's no logical fallacy. I wrote multiple times that escaping SQL is essentially trivial in PHP, and has been not only easy but the recommended best practice. The problem is lots of inexperienced programmers don't know the problem to begin with. They would write vulnerable code in any language. I had to work on a Rails site a few years ago that was vulnerable to XSS and SQL injection, even though Rails by default protects against those things. Someone had gone around all of that because they didn't understand the problem in the first place. I don't know that any language can protect us from that.

Again, frantic hand-waving and pointing fingers, filled to the brim with logical fallacies. I can only imagine what kind of work culture exists in your company that you keep repeating the same tired, generic excuses that I've heard thousands of times before, thinking that they're not fallacies.

The fact that you and many others in this industry think these arguments are in any way rational or defensible puts our industry to shame.

I freelance supporting legacy software. I wrote that already. There’s no culture in my company, just me.

There’s a difference between an explanation and an excuse, and between counterexamples and “hand waving.” I’m sure it makes you feel superior to dismiss opinions and comments with vague references to logical fallacies or indefensible arguments, but just hauling out big words doesn’t make you right, or even make any sense.

I can’t fix everything wrong with software development. I’ve been doing it for 40 years and we just keep making the same mistakes. My small contribution is fixing broken code one customer at a time, at least leaving the campsite cleaner than I found it. I don’t lose a lot of sleep over our collective failure to write perfect software.

Of course this is a lot of work but it means that unless PHP takes security seriously, no one will take PHP seriously and the language will die off / be relegated to dirt cheap contractor work.

No serious org is going to use a product where you have to remember that the sql escape function doesn't work and you have to use the one that says real sql escape.

This is a canard, really. The PDO library, which is a core PHP module, has SQL injection mitigation built-in (with escaped parameter substitution). It was introduced with PHP 5 in 2004. The popular PHP frameworks such as Laravel and CodeIgniter also protect against SQL injection and XSS by default.

The MySQL escape functions are named the way they are because that's what they are called in the MySQL API, which PHP exposes pretty much verbatim. I don't see a lot of people using that interface in new PHP code (because Laravel and PDO), but it comes up on older code.

Again the problem is not obscure function names or that PHP makes it possible to shoot yourself in the foot. The problem is a whole lot of inexperienced programmers (and quite a few who should know better) not understanding the problem in the first place. If you don't know what SQL injection is or how it happens or how code can make it possible you aren't going to know how to protect against it. PHP does do it for you if you use PDO (more than 15 years old at this point), or any of the numerous other safe RDBMS libraries. This is like complaining that Honda makes shitty cars because some people put glass packs and spoilers on a Civic -- people use languages and tools wrong out of ignorance and inexperience.

I think it's clear that PHP has been taken seriously for some time, even if largely because of WordPress. It's not going to die off or get relegated to the language ghetto because it has some (obvious, well-known) flaws that serious programmers have known how to live with for literally decades. Regardless of what you think or see on Upwork, PHP contractors are not cheap. No one who can and will work on legacy code is cheap because most programmers won't even do that work if they can help it. Supporting legacy software, which includes improving and securing and upgrading it, is maybe the most lucrative and secure niche for programmers sitting there in plain sight.

That command doesn't even exist anymore and hasn't since 2013

That's not completely true. Over its evolution, PHP has removed some functions completely to provide better and more secure functionality, such as mysql_* in PHP 7.

As another commenter points out, this is actually a quirk of the underlying MySQL C library, which has (or had) both functions.

PHP (and old languages in general) are full of ASCII-only English-centric assumptions. I think both functions are now considered deprecated since we have even more variations like mysqli_real_escape_string (or just use PDO with bound params).

At this point there's a ton of CI tools to check for injection and dangerous patterns, and serious companies have been using them for years/decades now, ranging from local options to online tools like Scrutinizer or Sonarqube. I'd wager even PHPCs would catch the copy/pasted ones.

To me the language or online examples is no excuse for SQL injections for a long time now.

It’s no excuse for professional developers working at “serious” companies. It is an excuse for the legions of amateur developers just trying to get something to work.

If Google and Amazon can’t find and hire enough developers imagine what that supply/demand and cost problem means for small companies. I have clients who have been trying to hire a f/t or p/t programmer for years. They can’t pay $100/hr for a simple web site. So they hire amateurs trying to get that experience needed to get a real job at a serious company.

Yes, they leave a trail of crappy code full of vulnerabilities and bugs. The only way to blame that on PHP is to criticize its low bar to entry, which is a good thing for beginners.

> Some of those are terrible, either very old or written by amateurs excited to show how they got something to work.

And the current bootcamp trend will only amplify that. Lambda has "instructors" that are students only 4 months ahead of the students they are teaching...

Some time ago I found this video series on YouTube about how to build a PHP application from scratch. Ten hours worth of XSS, CSRF, SQL injection, badly coded authentication, you name it. When confronted, the instructor said that he did not want to confuse the beginners with all of that security stuff. I just thought "ok" and moved on.

Now I went back to this guy's YouTube channel and saw that half a year later he finally did upload a bonus episode on how to mitigate SQL injections. One person in the comment section actually thanked him for the much needed video because their site was getting hacked. It is pretty hilarious to see this unfold but I do feel bad for the ~10k people who watched his videos.

XSS and SQL injection should be impossible by design, by using proper libraries which treat HTML/SQL as structured formats, and use this structure to properly embed text as text, rather than allowing user input to be interpreted as surrounding HTML/SQL constructs. (I suppose parameterized queries are "good enough" even though they treat SQL as a string rather than an AST, because the SQL engine hopefully interpolates the text/numbers/etc. after parsing the query into an AST.)

This gets at the heart of why the problem is so widespread. Beginners are the last people you can expect to figure out how to install and use the “right” library. Watching someone learn programming (from actual nothing) is very insightful - they often don’t stop and think because they have no intuition. Rather they flail in the dark until they land on something that almost works and use that as the kernel of their solution. PHP used to have very poor flail-performance and you still see it in things that are trickier to index and refute like videos.

Maybe the solution here is to just have some kind of legal penalty to losing user data due to incompetence. The problem here is that self taught programmers are going out to the real world and writing code that gets used to process sensitive info without any senior developer guiding them or reviewing.

If there was a penalty to the business, they would stop getting the bottom of the barrel programmer to work on their own. Yes it would make it a little harder to enter the market but any large business could still hire juniors and review their code properly.

In most other industries, you are responsible for your work. Usually you even need a formal certification first.

Conceptually how is this different than someone building a staircase it their house with tools, lumber, and no interest in accessibility and building codes?

This analogy still works. The staircase is not public, its in your house. Which would map to running on your local computer or local network.

As soon as you turn your house in to a public venue (put your code in use for the public) you now have to worry about accessibility and safety. If that stair case collapses because of your dodgy building, you are liable. But you are free to fall off your own staircase in your own house.

So people are free to run whatever they want on their computer. But once you start taking user data, you now have legal responsibility. User data is hazardous waste that needs ultimate care.

> In most other industries, you are responsible for your work. Usually you even need a formal certification first.

That would go against the "Everyone can Code" trend and be perceived as gatekeeping.

It's handled by the GDPR. Companies are forced to report a leak to the authorities and the max. penalties are very high.

I still feel bad about the image of people towards PHP, but given the amount of bad tutorials and packages that are outdated out there I understand that most enterprise / startup wants to avoid having to code in PHP again.

For the past couple of years I have been working with laravel in a small company, and I really enjoyed it. The environment that it provides honestly is amazing. Documentation is super easy to read, laracast is amazing to bootstrap your knowledge in couple of weeks, and community is huge that you can find almost anything already built by them.

However its hard to find any big companies here that uses PHP, jobs popping up is mostly python, java and c#, thus sadly I have to leave php and learn java / python for the new big tech job (also for my own future). Its not that java / python community is bad, but I'll surely miss the laravel ecosystem.

I’m a tech lead for Vanilla Forums (recently acquired by Higher Logic). We have a few PHP positions open (remote or onsite). The core product is open source https://github.com/vanilla/vanilla If your looking to stick with PHP you can check the open positions or reach out to my contact info in my profile. https://www.higherlogic.com/about/company/jobs/

> However its hard to find any big companies here that uses PHP

There's one pretty big one....Faceledgar? Peoplebook? Something like that.

Dont rhey have their ownn PHP Frankenstein?

Wordpress. I would google it. It is pretty big. Runs %60 of the internet. (made that part up, but I bet I am close).

It's literally called hack. Quite self-aware ;)

Though, recently PHP got quite a few features similar to hack, like type annotations.

Sure - fork something, call it something else, pretend it is new. There are a ton of dollar signs in the FB source code.

It's nothing like vanilla PHP.

Yeah, 1, and not many jobs opening here in AU lol

If you are going from php to python just remove the dollar signs $ and update how you treat strings. Those languages are cousins. Honestly, within 2 weeks you wont look back. However, Python2 vs Python3 makes php look like a bunch of geniuses and you will miss php.net. The php8 team is really awesome and the language is pushing forward. But if you need to be 'hip' yeah learn python a 30 year old language from 1991 :)

I don't blame google for this. I blame crappy tutorials that either gloss over important details or are written by people who don't know them in the first place. If Google could develop a search algorithm that selected results for quality code, they'd have an entirely separate product they could sell, perhaps as part of an "AI Cloud" assisted programming environment.

I'm pretty sure all these crappy tutorials only appear in the top results because of shady SEO tricks. Do you really think these are "authority" sites which Google claims to rank highest? It's a sad fact that Google is still unable to stop black-hat SEO, so they are at least partly to blame.

Programming in PHP using GitHub's Copilot must be exciting :)

In a few years we might regard it the same way as MS Office's Clippy:

"It looks like you're trying to create a web app front end! Do you need assistance with A) Implementing a dark pattern or B) Avoiding the use of dark patterns?"

It makes one wonder: was copilot development started before or after MS's acquisition?

so true

I feel that it actually understand what I'm doing. Is very wicked how sometimes copilot nails a line or an entire block totally in my context and writing style.

Scary stuff.

This already was a problem ten, fifteen years ago, and I see it hasn't changed. IIRC I wrote a blog post about it back when (I get spurts of inspiration, then some time later I delete the whole thing because I'm embarrassed about stuff I write, including HN comments).

There was a big "grassroots" push some years ago about pusing W3Schools docs out of the top Google results in favor of MDN; the same should be done with bad PHP code / examples. Because in practice, 90% of code is copy / pasted and adjusted.

There's just no big player behind PHP though, a party that wants to professionalize the language and more importantly its community. If there were, they would push for more authoritative tutorials and documentation. As it stands, the PHP docs are fine but are lacking information about SQL injection, and it's 10+ year old comments to that documentation that is often more valuable than the docs itself.

PHP is still one of the top languages out there but it has so much more potential.

The clean code may still have other issues. For example [1] give away if a certain email address is registered with the site or not.

[1] https://phppot.com/php/user-registration-in-php-with-login-f...

IMO trying to stop user enumeration is a lost cause. People with many email addresses and no password manager still need a way to figure out which email/username they used on your site. If they can't enumerate themselves, you end up with bad retention or lots of support tickets.

Protecting PII is more important than your user retention.

Too bad, roughly 0% of websites attempt to stop user enumeration. The ones that try usually fail because they don't do it consistently, but hey the ~0% of their user base that is composed of HNers appreciate the effort. Also, Santa Claus isn't real. When it rains, it pours.

Everyone on HN is obsessed with that "vulnerability" without considering that: a) almost all sites that try to avoid it fail anyway when you try to sign up - "that email is already registered" and b) it's really user-hostile for a tiny increase in privacy so it's perfectly reasonable to not implement it on most sites.

If I did want to close that hole, is there anything more clever than just having a username that isn't an email address, and allow more than one account to use the same email address?

Just return the usual "please confirm registration by clicking a link in the email" - in the e-mail you just notify the email owner that someone tried to create a new account while another account was already registered for this e-mail. This does not disclose anything to anyone other than the owner of the e-mail.

I don't think people in general want to remember a username for every site.

This. Risk assess whether user enumeration is actually bad, the go fix it if you need to.

I work on a major mobile application. We have rigorous interviews and pay well. We use a much trendier stack than PHP. Still, the number of my colleagues who can fathom that a request to our public API might come from elsewhere than our first-party client is… maybe one in a hundred.

Don’t think this is just a PHP problem. All across the industry, people think of the OWASP Top 10 as some hyper-nerd shit that they don’t have to care about, and are indignant that you’d even mention it in design review.

Should tutorials be expected to teach people sufficient security? Should code snippets be production-ready to put on the Internet for the general public? I tend to think not. Is this just me?

So here's a partial list of issues you'd need to deal with:

- sanitizing input

- Escaping output

- SQL injection

- HTML injection




- Clickjacking

- DDoS and other resource exhaustion attacks

- Various timing attacks (eg password hashing)

- How to store passwords

- Depending on language, buffer overflows

That's... a lot. You can take this even further: you should assume you're going to get compromised at some point. What are you going to do to detect a breach? Or an active attempt to find a breach? What's your strategy for handling a breach?

Here's an analogy: we can tell you how to treat Poison Ivy without having to add a disclaimer that you're not qualified to be an attending dermatologist.

You're making the perfect into the enemy of the good.

Should every tutorial be an entire 200-page course on all web security practices? Of course not.

But should every tutorial that inserts a user-provided value into a SQL statement ensure it's escaped? Of course. There's essentially literally never a situation where you shouldn't do that. It's not just to be "production-ready" -- it's so basic as just to make sure the query will still even parse if the user includes a quote character in their input.

So why on earth are you defending this mistake?

I'd go further and say it's easier to teach prepared queries than string building (or at least equally difficult).

That might not be true for other security issues but I think tutorial writers should be fine saying "here is a bit of magic that fixes a problem called csrf. We won't cover that in this course but leave this in here"

>- sanitizing input

Oh, that unicorn that everybody takes about, yet nobody does anything with it

So, how do you sanitize input?

you save in database escaped strings?

allow only "english" letters?

>- HTML injection

>- XSS

those two belong to

>- Escaping output

don't they?

>- Various timing attacks (eg password hashing)

>- How to store passwords

just use state of art auth/login handling libs?

>- DDoS and other resource exhaustion attacks

I don't think I'd add it there, isn't it handled by firewall / infrastructure than app directly?

> I don't think I'd add it there, isn't it handled by firewall / infrastructure than app directly?

Although most DDoS attacks happen on the layers 3, 4, and 6 of the OSI model, your application still has to be hardened against resource exhaustion and other DDoS attacks.

For example, if you have a REST endpoint that starts a complex query which might return a large result given some specific query parameters (e.g. your limit parameter is not bound, so I can set limit=1000000), running 10000 requests against it from different hosts (malicious or not) may bring down your database server.

You're right, thanks for reminding

Things should be secure by default. With a basic query builder you can't ever have a SQL injection bug, not even by mistake, not even if someone maliciously tell you to do something wrong. This should be the default for tutorials.

The tutorials don't even need to talk about security: just tech secure by default APIs. If the programmer later needs more flexibility (which should be understood as an advanced topic), they ought to learn that you must sanitize user input in raw queries.

I agree with you. Back when I was a little kid learning to code on the internet by myself, tutorials that I could easily understand and quickly use to build things were important. Because without those, I would have probably given up.

Those stupidly bad PHP tutorials that show you how to concatenate a URL parameter to an SQL string were easy to understand, and taught me about everything from writing PHP/SQL, setting up a MySQL server, networking/opening ports, designing database tables, designing an application, and much more. The motivation I got from learning so much in so little time led me to keep learning new things, including learning and understanding why all that code I wrote when I was younger was so bad.

If those shitty tutorials get copy and pasted into production codebases, that's the company's fault for hiring a lazy/bad developer, and for not catching the vulnerability.

Yes, tutorials should be expected to teach people about security if it is within their scope. But the problem with the tutorials from the article is that the authors seem to be genuinely unaware of the security issues. That may also be the reason why they are trying to do the impossible, that is teaching web authentication to beginners in a 10 minute blog post format. And then they often get positive feedback because people like simple but wrong answers to difficult problems.

It shouldn't take anti pattern shortcuts that open up vulnerabilities, especially without adding a disclaimer.

25 years of plentiful SQL injections and XSS, and 25 years of posters rushing to PHP’s defense with variants of “you could do this in any language. All languages are potentially insecure.”

At a certain point it turns from tragedy to farce.

Amazon literally _banned_ php for a reason.

Don't get me wrong, PHP taught me a ton about programming and was a very important language, but yeah...

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact