Hacker News new | past | comments | ask | show | jobs | submit login
Making PHP Safer (box.com)
49 points by jromma on Jan 31, 2014 | hide | past | favorite | 52 comments



Type hinting for scalar types has been on and off the TODO list for PHP for a while now. PHP already has type hinting for arrays and objects, but not for scalar types like int, float, and string. It's been on the roadmap forever but nobody seems to know when the feature will actually land in a stable release.

That's because type hinting for built-in scalar types is more difficult than it sounds in a dynamic language like PHP. There's been a lot of debate about what to do, for example, when you pass '123' (string) to a function that requires an integer, or even worse, when you pass 123 (integer) to a function that requires a float. Do you throw an error, as strongly typed languages do, or do you quietly convert that integer to the equivalent float, as PHP has always done? Or do you try to convert first and only throw an error when you can't? Above all, how do you do this without giving up all the benefits of being a dynamic language in the first place? Note that this problem has nothing to do with the '01' == 1 issue. Things can get messy even with reasonably straightforward conversions like '123' = 123 = 123.0.

The problem is compounded by the fact that even built-in libraries don't know how to make use of built-in types. For example, if you use PDO to select a row from MySQL where the 'id' field is an integer, and you write something like ($row->id === 42), you get false because all the fields are returned as strings. Meanwhile, the PostgreSQL driver returns boolean values as 't' and 'f' (string), not true and false (bool). This is retarded. I'm a big fan of the improvements since PHP 5.3, but both the language and the standard library still have a long way to go.


First, you're right - the core library and API does have a long way to go. Imo, we need a new major version (6) so we can break backwards compatibility for some of this craziness.

But as for the annotations, the most compelling argument I've seen is to make it string (===). If you supply "123" and the method signature requires an int, it should throw an exception. If you don't want that, don't include the typehint.

That also gives the option for something like a "~number" typehint if necessary, but I don't care about that. I want strict type checking, anything less than that does not solve the core problem.


I disagree. Strict type hinting would be ridiculous in PHP and lead to a lot of unnecessary and unsafe casting.

nikic goes through all the scalar type hint options here:

http://nikic.github.io/2012/03/06/Scalar-type-hinting-is-har...

Strict weak type hinting seems like the most realistic and pragmatic choice except I would add a cast to make sure the type is actually an `int` in the function.


I've read that article, and like most of the commentators I'd like the option for strict type hinting. It's a choice I make, it's not forced on me. Having strong type hints removes any doubt from the code. I know exactly what's being passed around and what it represents, and I can be comfortable in knowing what that means. I don't need the engine to do anything magic[1] in the background.

While the idea of "strong weak type hinting" is interesting, and would certainly be useful for some code (especially libraries), it's not what I want from type hinting. At least 70% of the bugs I encounter in the wild are due to unexpected types. SWTH doesn't fix that - if I pass in a string and it looks like a number, the function will blindly accept it. That's not a behavior I want, that's something I want caught before I even think about deploying.

I guess it depends on what you expect type hinting to provide you, but it looks like I'm not alone in wanting guaranteed safety (Facebook and Box seem to agree). Only strict typehinting fixes that.

But, if the community decides that SWTH is worth pursuing I'd like to choose a syntax structure that allows both to coexist. "foo(int $int)" can be strongly-typed, while "foo(~int $int)" can be strongly weakly typed.

[1]: http://www.php.net/manual/en/security.magicquotes.php

Edit: forgot link


Weak typing isn't magic, it's just weak typing. Weak typing for a web language makes a lot of sense.

> At least 70% of the bugs I encounter in the wild are due to unexpected types. SWTH doesn't fix that - if I pass in a string and it looks like a number, the function will blindly accept it.

I'm trying to think of an example where that wouldn't be what you want, and I can't! All the input you get from users is going to strings, most input you get from databases is going to be strings, text files are going to be strings. And since casting in PHP is so permissive, forcing strict typing is like having no typing at all.

The fact that a scalar variable has a specific binary representation is an implementation detail and PHP mostly doesn't care. The fact that a number is represented by 8 bytes two's-complement or 5 bytes ASCII really shouldn't matter. PHP plays a bit too loose with those rules but SWTH would fix a lot of that. I don't why it's necessary to force the caller to provide the correct internal representation for a value when it can be converted to the right representation without data loss or confusion. The function still gets the value it expects.

The only amendment I would add to SWTH, as I said already, is having it cast to the hinted type before the function call. With that, I can't see what argument there really is against it.


>PHP plays a bit too loose with those rules

I don't think you or I disagree on this. The language does not handle strings gracefully.[1][2] The more magic casting and conversions I can remove from my codebase, the happier I'll be.

I'm sure there are some individuals that would misuse strong typing, but it would help a whole lot of people avoid some really, really dumb bugs. This isn't so much about knowing what's going on internally (binary representations), this is about being able to catch bugs early and often. It also helps in refactoring and improves the quality of linters.

But yeah, if SWTH casts if a parameter matches the definition of a type (e.g. 1.5 is not an int but 1.0 is) then SWTH is fine. There'd probably be a little bit of a performance penalty, but as long as the devs implement strict casting rules (according to Nikita's favorite proposal) then it solves my problems. As a sidenote, Nikita actually agreed to the casting change after posting that article[3].

[1]: http://phpsadness.com/sad/47 [2]: http://stackoverflow.com/a/1995196 [3]: http://www.jejik.com/articles/2012/03/a_php_type_hinting_alt...)


> This isn't so much about knowing what's going on internally (binary representations)

Strong typing is all about the binary representation. SWTH just cares about the values. I don't see the advantage in caring at the call site whether or not this integer is a string of numbers or a 64bit value.

But I'd probably be as ok with strong typing as you are with Strong-weak typing if PHP's casting rules weren't so permissive. If you have a function that takes an integer but you have a string, it would be logical to cast that string to an integer. But PHP casts are very permissive, and any value can be casted from a string to an int. So casting negates the benefit of type hinting.


It would be nice if you had an option, like

   function foo(weak int $i)
for weak type hinting, and

   function foo(int $i)
for strict. The arguments for weak type hinting make sense but sometimes you don't want any ambiguity.


Or something like "numeric" that allows both int and float, as well as any string that would pass the test of is_numeric(). It would be just like class inheritance: int and float would be treated as subtypes of "numeric".


Yeah, as long as there's a way to specify a strict type then I'd be happy. Libraries and other code that's intended to be used by 3rd parties could probably benefit from weak type hinting, but internally I'd really like the confidence that comes from strict hinting.


> if you use PDO to select a row from MySQL where the 'id' field is an integer, and you write something like ($row->id === 42), you get false because all the fields are returned as strings.

this is not universally true. if you're using the mysqlnd drivers and disable prepares emulation, you get the correct types back.

    // Turn OFF emulated prepared statements
    $dbh->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);


This seriously should be the default behavior.

To anyone who is wondering what the heck the above is about, PDO/MySQL doesn't actually prepare and execute a statement when you use the prepare & execute methods. Instead, it automatically escapes all the parameters, interpolates them into the querystring, and sends the query to the database server.

This behavior is the default because, well, somebody might use an old version of MySQL (3.23 or 4.0) that doesn't support prepared statements. Never mind the vast majority of MySQL 4.1+ users who are sacrificing performance because of this outdated default.


If the function requires an integer, it is an error to pass a non-integer. Everywhere outside PHP, including in dynamic languages, this is not a hard decision


Java has trouble with this as well:

``` Integer foo = 5; int bar = 5; System.out.println(foo == bar); //true ```


A good starting point would be to implement type-hinting for a generic 'scalar', i.e.;

    mymethod(scalar $foo)
This simple type-hint would already solve a lot of problems where non-scalar values cause havoc (least of all, automatically casting of arrays to 'Array' as string and other cases, like trying to use an array as key in an array).

I've seen a proposal to use a generic 'numeric' type-hint. I'm not sure this is a very good idea, really. If it handles values in the same way as PHP's is_numeric(), this will lead to confusing situations; "0xA", after all is a numeric value. To allow values like this, it's better to specify 'scalar' as type-hint and handle conversion/casting manually.

I personally don't like the fuzzy '~int' (int-'ish') type-hint, it simply doesn't look very clean (looks like some kind of a RegEx). Besides, how should a method pick the best way to cast a non-integer value? (Again, is "0xA" a typo by the user, or does it represent a hex?).

The only situation where automatic casting/fuzzy matching may be suitable, is for Object that implement a specific interface (e.g. ArrayAccess interface).

By lack of real polymorphism in PHP, I'd opt for a more explicit approach, while keeyping flexibility by allowing multiple types to be specified, similar to the PhpDoc notation. Something like;

1. The 'classic' approach

    // Accepts any type
    myClassicMethod($foo)

2. Explicitly 'classic' approach

    // Explicitly accepts any type
    myClassicMethod(mixed $foo)

    // Effectively, this results in the same behavior as 1.,
    // however, it clearly shows that 'mixed' values are accepted
    // by-design and not because of omitance
3. Strict-ish approach

    // Only accepts scalar values
    myClassicMethod(scalar $foo)

4. The 'polymorphic' approach

    // Accepts 'int' or 'string', but NOT 'bool', 'float' an non-scalar types
    myFuzzyMethod(int|string $foo)
    {
        // Further type-casting and checking
    }

5. The 'strict' approach

    // Only accept 'int' values, NO 'int-ish' values
    myStrictMethod(int $foo)
    
    // The code *calling* the method is responsible for converting
    // arguments to the right type, using any method nescessary 
    // for the situation
    
    echo myStrictMethod(intval($foo));

    // or, maybe a simple cast is sufficient;
    echo myStrictMethod((int)$foo);
    
    
In all approaches above, handling invalid arguments (out-of-range etc.) should be the responsibility of the method.

Having said that, unless the method's actual purpose is to convert/cast types (e.g. convertToBool(int|string $value)), the last approach (strict) offers the clearest separation of responsibility (again, IMO);

In many cases, a method cannot be held responsible for picking the best approach to convert values, simply because it does not know where an argument originates, what it should tollerate and how conversion should be handled.

Take for example, a web-form containing a 'currency' input. To offer a user-friendly experience, in many cases it is preferable to be 'forgiving' when handling 'faulty', but (sometimes) trivial user input, like;

- User includes currency-symbol in the field (€ 123,45)

- User uses incorrect decimal-separator for the website's locale (123.45 in stead of 123,45)

Conversions like this are part of the 'business logic' of the application and cannot be reliably performed automatically by the type-hinted method without causing unexpected results.

Type hinting should just do that; 'hint' what is expected and 'throw' if this expectation is not met.

I would really oppose to too many 'fuzzy' matches. Automatic conversion between types (the infamous PHP type-jugling) is a major reason for many headaches while developing with PHP. Disguising them as 'type-hints' if only fooling ourselves.

I would like an option to define custom type-casting handlers to have more control over the way PHP will 'juggle' values and to handle specific situations.


Having this enabled in development and testing environments sounds like a good idea to me, any thoughts? Will definitely try it out locally asap.


That is what we do at Box.


I hate annotations in PHP. They feel very tacked-on. I'd much rather see actual enforced typing in the language.

This is a step in the right direction, but it doesn't go far enough, in my opinion.


In a very literal sense, everything in PHP is tacked on. It's a conglomeration of crap. You could replace "crap" with something more politically correct in that sentence, but it could not possibly be more literally correct.

Annotations. Namespaces. There is nothing in PHP that was not an afterthought. While it could technically be considered "unfair" to criticize the language in that respect, the total failure to push it in any rational direction in the last decade or two says nothing about the language, but speaks volumes about the people in charge of it.

PHP is a horrible, horrible failure. Not a failure in the sense that it doesn't work; it does. It does what it was and is intended to. But it does that in such an absolutely shitty way, and with such a grotesque mountain of horrid workarounds and terrible hacks that are sometimes called features, that there is no excuse for the way this language has evolved.

It works. And it's used all over the place. But that doesn't make it good. If the road to hell is paved with good intentions, then PHP is the express route.


Thank goodness there are people in this world who care more about shipping than about perfection.

Thank goodness there are people who have thicker skin than I, and release FOSS to the world despite comments like yours.


And if others listened to people like you, we would all still live on trees.


While I agree with nearly every assertion you make, I'm still hazy on what 'good' is. Surely the fact that so many people find PHP the most productive language they could've worked with, and produce mountains and mountains of working code that is at least mostly reliable - don't those qualities alone make it in some sense good?

Then again I think some pop music is good, so I'm starting from a point where I don't necessarily think being popular makes something mediocre.

That said, I really hate being forced to write in PHP because of its complete inelegance, so much so that I have left all my legacy PHP in a burning garbage heap and ran screaming to Scala, where type safety is a given. But as I'm being forced to move back to PHP for a project because of a client's demands I'm trying to look at the silver lining of programming with it. Sometimes it takes some squinting but it is possible to appreciate what PHP has accomplished.


There can be elegance found in writing well-formed code of any language. It may be tougher in PHP, but it's certainly possible.

To use a car analogy, I have more respect for the guy running an old Mustang he tunes up and runs a 11 second quarter-mile than the jackass who drops 200 large on an exotic that pulls a 10.


> the total failure to push it in any rational direction in the last decade or two says nothing about the language, but speaks volumes about the people in charge of it.

This claim has no basis and the rest of your rant is a tautology. There's been no attempt to improve the language but all the improvements are afterthoughts?!? You can't have it both ways. I'm disappointed this post is grayed out and your post isn't.


PHP doesn't have annotations.

The framework you are using may be emulating them, though.


Facebook is doing something similar, but instead of annotations they are breaking syntax compatibility (for the better, I think). This talk is unfortunately the only reference i can find: http://www.infoq.com/presentations/php-history relevant part starts around minute 22.


PHP doesn't have annotations.

The framework you are using may be emulating them, though. By parsing docblocks. Hence why it feels like that.


That's because PHP is a tacky language. :)


Yes I dislike the use of 'annotations'.

Would it not be better to catch the recoverable error that "function foo (integer $a)" gives, then use your error handler to check that $a is an integer?


This does not satisfy the requirement of low performance implications. Augmented Types works by wrapping zend_compile and adding opcodes to to the compiled PHP to check the types of the arguments. This way Augmented Types can be run once at compile time (yes, PHP is compiled) and ensure type safety of all subsequent invocations.


This is sort of a test/development only feature -- so making use of annotations makes sense. Running annotated code on a vanilla PHP install (without this extension) will work perfectly fine. You only get the extra checks with the extension enabled.


Good luck with "actual typing" - I don't think a language that believes "0" > "01" ('numerical strings' FTL) is really suited to it.


Reading this, wishing Facebook's Hack lang was OSS so I could really have a chance to sink my teeth in and compare/contrast these tools.

Edit: Clarity


Facebook's Hack lang is open source, though not publicly documented. This commit points out a lot of the implementation and test files: https://github.com/facebook/hhvm/commit/db9577b1409b74293fd9...


Fascinating! I had no clue.



I sure had no idea Box uses PHP.


Does anyone have a rough census of what top websites use? I've had this same reaction, but more than once with PHP: I know now that at least Box, Wikipedia, and Facebook use PHP.


Facebook doesn't use PHP the way other companies do. They ship & run binaries via HipHop/HipHop Virtual Machine (HHVM) in production. Other companies which use Facebook as an excuse (justification) to use PHP in production ship standard PHP scripts into Production and think they are doing the same thing Facebook does.


True, although didn't that happen only after many years of them using regular PHP in production? IIRC, HipHop dev only really got going in the 2010s.


True, although that was after four failed attempts to move away from PHP and realizing due to incumbent inertia they had to find an alternative solution. Enter HipHop. I am sure that today if they had to start over, PHP would not even be considered ;-) My main point is that in "today's times" using Facebook as a excuse (reason) for opting for PHP is not a wise move.


And didn't the MediaWiki people say they'd prefer to not use PHP, but it's hard to change?

Pointing out that some large system uses a specific technology just means that it wasn't impossible to scale and use that technology. You can write a book with a quill pen; it doesn't necessarily make it the best tech to use in a modern publishing workflow. But if that's all you know how to use, don't let it stop you from writing.


Mailchimp also does.


Etsy


Wufoo


Neither did I, I'd like to know more about their setup.


me neither


I'd really like to know why they chose to do this instead of trying to make spltypes and type hinting more robust. Annotations seem like an ugly way of doing this.


One of the authors here. We didnt want to change the AST in a way that would be backwards incompatible with vanilla PHP. There is some luxury in being able to turn off this extension when needed. We also have a large enough existing codebase that switching to SPL types everywhere was a non-starter. Our goal here was partly to be able to use PHP's native types and still get some good coverage.

Phpdoc may not seem ideal, and it's not, but it has afforded some significant flexibility


Thank you for your response, that's good reasoning and I definitely see it works well for your use case. It just seems as if multiple companies are coming up with solutions to work around deficiencies in PHP which is causing fragmentation in the language. That's not going to stop me from using this extension in a new project I'm starting though. Are there any areas of this extension you'd like to see improved since I find the best way for me to understand an extension internally is to work on it.


The reasons are right in the article: they didn't want to fork PHP and they didn't want a run-time performance hit.

It seems like an interesting solution to me.


Sounds very promising, will definitely give this a spin. Hopefully something like this will make its way into PHP by default.




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: