Apologies for posting something so out of date; it was a request on an Ask HN and I goofed up. By way of compensation, please consider these instead:
Examples of such user input include GET variables, POST variables, cookies, email, web services (yes, even your most trusted partner site can be hacked), etc. If you don't control it, sanitize it.
That means using the appropriate parameter binding functionality of your database/abstraction layer when inserting data. Anything that displays user input also needs to sanitize it to prevent malicious client-side code (XSS).
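A minimal sketch of both halves in PHP, assuming the pdo_sqlite driver is available. The `comments` table and the sample input are made up for illustration:

```php
<?php
// Sketch: bind user input as a parameter instead of splicing it into SQL,
// then escape it for HTML only at the point of display.
// Assumes pdo_sqlite is available; the 'comments' table is hypothetical.

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)');

// Untrusted input, e.g. from $_POST['body'] (hard-coded here for illustration).
$userInput = "<script>alert('xss')</script>'; DROP TABLE comments; --";

// Parameter binding: the value travels separately from the SQL text,
// so the quote and the DROP TABLE are stored as plain data.
$insert = $pdo->prepare('INSERT INTO comments (body) VALUES (:body)');
$insert->execute([':body' => $userInput]);

// Reading it back returns the raw value, and the table still exists.
$select = $pdo->prepare('SELECT body FROM comments WHERE id = :id');
$select->execute([':id' => 1]);
$stored = $select->fetchColumn();

// Escape for the HTML context when displaying it.
$html = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');
echo '<p>' . $html . '</p>', PHP_EOL;
```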
As noted in this article, I recommend having a global include file that all other files can include. This lets you define, in one place, the absolute paths to the various directories your application uses throughout the codebase. If you ever need to move things, you only need to change the config values, not your entire codebase.
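Something like the following, where the file name `config.php` and the directory constants are hypothetical. `__DIR__` (PHP 5.3+) resolves to the directory of the file it appears in, so the paths don't depend on which script includes it or on the current working directory:

```php
<?php
// config.php: a hypothetical global include that every other script loads first.
// __DIR__ is the directory containing this file, so these paths stay correct
// regardless of which script includes it.

define('APP_ROOT', __DIR__);
define('LIB_DIR', APP_ROOT . DIRECTORY_SEPARATOR . 'lib');
define('TEMPLATE_DIR', APP_ROOT . DIRECTORY_SEPARATOR . 'templates');
define('UPLOAD_DIR', APP_ROOT . DIRECTORY_SEPARATOR . 'uploads');

// Elsewhere, e.g. in public/index.php (hypothetical layout):
//   require_once __DIR__ . '/../config.php';
//   require_once LIB_DIR . '/db.php';
```

Moving the uploads directory then means editing one `define()` line.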
How to work with databases is often a touchy subject. On one hand you have data abstraction, which hides much of the low-level logic of interacting with a particular database and saves the programmer time. Indeed, if you switch to another database that your abstraction layer supports, migration will be easier. Unfortunately, once you start writing SQL specific to one database, the "easy switch" advantage becomes moot, as you'll have to rewrite much of your SQL anyway.
On the other hand, there are those who write SQL specific to the database at hand to get the best possible performance. Also consider that with data abstraction, every database call has to go through the extra logic branches the abstraction layer provides. If you run a very large number of queries constantly, you may start to notice that overhead in terms of speed. It's best to weigh these issues when designing a database-driven application.
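A rough sketch of where the abstraction stops helping, assuming PDO with the pdo_sqlite driver; the `items` table and the `latestItemsSql()` helper are hypothetical. PDO gives you one API across drivers, but row-limiting syntax still differs (`TOP` on SQL Server, `LIMIT` on SQLite/MySQL/PostgreSQL), so database-specific SQL creeps back in as a branch:

```php
<?php
// Sketch: PDO unifies the query API, not the SQL dialect.
// Assumes pdo_sqlite is available; 'items' and latestItemsSql() are hypothetical.

function latestItemsSql(PDO $pdo, int $limit): string
{
    // Row-limiting syntax differs by database, so we branch on the driver.
    switch ($pdo->getAttribute(PDO::ATTR_DRIVER_NAME)) {
        case 'sqlsrv':
        case 'dblib':
            return "SELECT TOP $limit name FROM items ORDER BY id DESC";
        default: // SQLite, MySQL and PostgreSQL all accept LIMIT
            return "SELECT name FROM items ORDER BY id DESC LIMIT $limit";
    }
}

// Switching databases changes the DSN (plus credentials),
// e.g. 'mysql:host=localhost;dbname=app'.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)');

$insert = $pdo->prepare('INSERT INTO items (name) VALUES (?)');
foreach (['alpha', 'beta', 'gamma'] as $name) {
    $insert->execute([$name]);
}

$names = $pdo->query(latestItemsSql($pdo, 2))->fetchAll(PDO::FETCH_COLUMN);
echo implode(', ', $names), PHP_EOL;
```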
With regards to localization, I recommend looking at PHP's gettext extension:
With this approach, you can provide links or a dropdown that set a session variable, and all strings can be localized according to that variable's value. Obviously you will have to use the gettext functions wherever translatable strings appear, but it's well worth it: providing a newly translated version of your site becomes a matter of just doing the translations, with minor tweaks here and there.
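A minimal sketch, assuming compiled catalogs at `locale/<locale>/LC_MESSAGES/messages.mo`. The session key `lang`, the locale whitelist, the text domain `messages`, and the `t()` wrapper are all hypothetical; the wrapper only exists so the sketch still runs where the gettext extension isn't installed. With the extension loaded you can call `gettext()` or its alias `_()` directly:

```php
<?php
// Sketch: pick a locale from a session value and route strings through gettext.
// Hypothetical: session key 'lang', SUPPORTED_LOCALES, domain 'messages',
// catalogs under ./locale/<locale>/LC_MESSAGES/messages.mo.

const SUPPORTED_LOCALES = ['en_US', 'de_DE', 'fr_FR'];

// Whitelist the requested locale so arbitrary input never reaches setlocale().
function chooseLocale(array $session, string $default = 'en_US'): string
{
    $requested = $session['lang'] ?? $default;
    return in_array($requested, SUPPORTED_LOCALES, true) ? $requested : $default;
}

function initGettext(string $locale, string $localeDir): void
{
    putenv('LC_ALL=' . $locale);
    setlocale(LC_ALL, $locale . '.UTF-8', $locale);
    if (extension_loaded('gettext')) {
        bindtextdomain('messages', $localeDir);
        bind_textdomain_codeset('messages', 'UTF-8');
        textdomain('messages');
    }
}

// gettext() returns the original string when no translation exists;
// the fallback branch does the same when the extension is missing.
function t(string $message): string
{
    return extension_loaded('gettext') ? gettext($message) : $message;
}

// In a real request this would be $_SESSION, set by a language link or dropdown.
$session = ['lang' => 'de_DE'];
initGettext(chooseLocale($session), __DIR__ . '/locale');
echo t('Welcome to our site!'), PHP_EOL;
```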
I really don't like the word "sanitize". Or "filter", "clean", or whatever people call it.
It implies that you can somehow run a variable through a filter and scrub off all the evilness. But that is not how to deal with the problem. What needs to be done is to encode strings properly whenever you embed them in a context that interprets them by its own syntax rules. For example, encode entities when inserting into HTML, or URL-encode when inserting into a URL.
The bottom line is: this has nothing to do with user input. It has to do with how you output data.
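A small illustration of one raw value encoded for two different output contexts; the string and the example.com URL are made up:

```php
<?php
// Sketch: the same raw string, encoded according to where it is embedded.

$raw = 'Tom & Jerry <3 "cartoons"';

// HTML attribute/text context: encode entities (ENT_QUOTES also covers ').
$inHtml = '<a title="' . htmlspecialchars($raw, ENT_QUOTES, 'UTF-8') . '">link</a>';

// URL query-parameter context: percent-encode (RFC 3986 style, spaces as %20).
$inUrl = 'https://example.com/search?q=' . rawurlencode($raw);

echo $inHtml, PHP_EOL, $inUrl, PHP_EOL;
```

Neither encoding would be correct in the other context, which is why the choice belongs at output time.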
To prevent SQL injection, though, you do need to run the variable through a filter (mysql_real_escape_string, parameter binding, etc.).
The other issue is that if I encode entities when inserting into HTML, I have to do so on every page view. For simple strings that's not a big deal, but with a lot of content it gets resource-intensive pretty fast on larger-scale sites with lots of people hitting a page at once.
In this case I'd rather do the encoding once, which handles the majority of cases.
Encoding is dirt cheap. You probably pay more just for a TCP stack that isn't perfectly tuned for your workload.
And you can't pre-encode a string without baking in bad assumptions that you know how it may ever be used. You're going to regret having your data store polluted with HTML-specific encoding as soon as you build an API or integrate a third-party tool (e.g., analytics or sales support), because they will need either raw values or some different encoding than HTML text (SGML entities).
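To illustrate (a sketch with a made-up record): keep the raw value in storage and encode per consumer at output time. Pre-encoding for HTML leaks entities into anything else that reads the data, such as a JSON API:

```php
<?php
// Sketch: "store raw, encode on output". The record and its field are hypothetical.

$record = ['company' => 'AT&T <Labs>'];   // stored exactly as entered

// HTML page: entity-encode at render time.
$forHtml = htmlspecialchars($record['company'], ENT_QUOTES, 'UTF-8');

// JSON API / third-party tool: json_encode does its own escaping.
$forApi = json_encode($record);

// If the value had been HTML-encoded before storage, API consumers get entities.
$polluted = json_encode(['company' => htmlspecialchars('AT&T <Labs>', ENT_QUOTES, 'UTF-8')]);

echo $forHtml, PHP_EOL, $forApi, PHP_EOL, $polluted, PHP_EOL;
```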
I'm sorry to be so negative, but I disagree with a lot of those points.
> short_open_tag: Always use the long PHP tags: <?php echo "hello world"; ?> Do not use the echo shortcut <?=.
It's just a minor point, but why are we advocating the longer version which is also more difficult to read? It's not like there are typically any other XML PIs in a PHP file, so there is really no potential for confusion here.
> Use Value Objects (VO)
It may be a widely used J2EE pattern, but that in itself is no justification for using it. I love VOs in JavaScript, they're handy and light-weight, they're the core of what's awesome about JSON. But PHP has perfectly good associative arrays that can do the same job, without all the declarative overhead of VOs. If we're not going to implement behavior inside these objects, really what is the point?
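For comparison, a sketch of the same hypothetical user record as an associative array and as a behavior-less value object. Both carry exactly the same data and serialize to the same JSON; the class just adds a declaration to maintain:

```php
<?php
// Sketch: associative array vs. a Java-style value object. Field names are hypothetical.

// 1) Plain associative array: no declarations needed.
$userArray = ['id' => 42, 'name' => 'Alice', 'email' => 'alice@example.com'];

// 2) Value object: same data, plus a class with no behavior of its own.
final class UserVO
{
    public $id;
    public $name;
    public $email;

    public function __construct($id, $name, $email)
    {
        $this->id = $id;
        $this->name = $name;
        $this->email = $email;
    }
}

$userVO = new UserVO(42, 'Alice', 'alice@example.com');

// Public properties are serialized just like array keys.
echo json_encode($userArray), PHP_EOL;
echo json_encode($userVO), PHP_EOL;
```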
> Use Data Access Objects (DAO)
And once again with the J2EE worship. Why?
> Generate code
This has got to be the most unnecessary of all. Sure, if you're going to use lots of VOs and DAOs, you'll probably need code generation, but that's just circular logic. This is also thoroughly beside the point, and it's a performance nightmare. What happened to efficient, terse code? A CGI call in PHP is not made against a static class environment: all this stuff has to be initialized for every single request. At a fundamental level, this betrays a failure to understand the crucial differences between the PHP runtime and, say, a servlet container. It is essential that programmers make use of the individual strengths those systems have, instead of blindly translating Java code and design patterns into whatever environment they happen to work with.
That said, there are many valid design points that benefit almost any code in any programming language. But in the cases above, PHP best practices simply should not be derived from a wish to mimic Java (probably stemming from a misguided attempt to fake the appearance of sophistication and academic validity).
I noticed a somewhat weird thing: most of these guides mention "use namespaces".
As far as I remember, having no namespaces wasn't much of a problem 5 years ago. People used static class methods, which actually gave you good ol' double-colon syntax. Given the atrocity the PHP team put into the language and labeled "namespaces", I'd still resort to static methods if I were still doing PHP. Backslashes? Really?
Cool thing! PHP is really great if you want to get things done quickly and easily, but it also tends to get a little messy! This article is also good for people (like me) who use PHP in a purely functional way (i.e., no classes). You others may now boo :)
Note that while some of the practices hinted at in the article are good, the article is over 5 years old, and not all of the facts stated and practices preached in it are valid today; PHP has gone through dramatic changes since 2005. Just to mention one thing that stood out: "In PHP there are no database-independent functions for database access apart from ODBC (which nobody uses on Linux)." PHP has long had three abstraction layers available apart from ODBC (PDO, dbx and DBA).
http://blog.macronimous.com/php-best-practices-and-worst-mis...
http://fwebde.com/php/best-practices/
http://www.devtheweb.net/blog/2010/03/02/common-php-best-pra...