

Critiquing Facebook's new PHP spec - pbiggar
http://blog.circleci.com/critiquing-facebooks-new-php-spec/

======
jerf
"One other thing they specified is that array cursors are internal.... This
would manifest if a new PHP implementation wanted to use a different
threading model: would two threads looping through the same array use the same
cursor? Sounds pretty racy."

Is it even worth it to leave room for a threaded PHP? At this point that would
_de facto_ be another language anyhow... the changes required to add threading
to a language like PHP [1] _20 years_ after it was born are too insane to even
think about. It would be so many years before it was stable enough to begin
building any sort of library infrastructure that would make it into any sort
of common use that the language would probably be on the way out by the time
this gauntlet could be run. See Perl's experience with trying to add
threading.

It would certainly require another specification document anyhow.

I'm of the opinion that languages around the 10 year mark (which PHP is
obviously well past) really ought to focus on becoming the best X they can be,
rather than chasing the tail lights of constantly-moving best practice. Seems
like you're better off just making PHP be the best PHP it can be rather than
trying to add threading into the mix. PHP simply isn't going to be able to
compete with existing languages that built some sort of solid threading story
in from day 1, and, frankly, it shouldn't try. (And most of _those_ languages
simply can't compete with PHP on its home turf, after all.)

[1]: Mutable-state imperative dynamic. Basically, the answer to the question
"What's the hardest sort of common language to add threading to after the
fact?"

~~~
damncabbage

      I'm of the opinion that languages around the 10 year mark 
      (which PHP is obviously well past) really ought to focus on 
      becoming the best X they can be, rather than chasing the 
      tail lights of constantly-moving best practice.
    

Would that include not adding:

    
    
      * Anonymous functions and closures
      * Namespaces
      * Late static binding
      * Dynamic dispatching to static methods
      * Rackup/SimpleHTTPServer-like web server
      * "finally" in exception-handling
      * Being able to call foo()[0] instead of the clumsy
        assign-then-dereference two-step.
    

What does "the best PHP it can be" mean? Anonymous functions (for example) are
arguably way outside the style of the PHP from ten years ago, but they're
tremendously useful.

(I used to use PHP. I dislike it as a language, but I have nothing against
them continuing to improve it like they have over the last nine years.)

PS: PHP already supports threading via a PECL module.
[http://php.net/manual/en/class.thread.php](http://php.net/manual/en/class.thread.php)

~~~
jerf
"Would that include not adding:"

A whole bunch of completely standard features to add to a mutable-state
dynamic scripting language, almost each and every one of which had been
successfully added to a mutable-state dynamic scripting language before PHP
did it? Of course not.

(I'm not sure about that last one, though I suspect one could find a version
of Perl early into its reference experimentations that would have an
equivalent problem. Even today, I'm not sure if a function that returns a list
(not list ref) can be directly dereferenced into via [] notation. Or at least
it's klunky. And let me clarify that I do mean _added_ to an existing
language, not programmed in from the beginning, which is a very different
problem.)

I'd also add generators into that list.

Adding pervasive threading is a different story. Adding threading to a
mutable-state dynamic scripting language has a long and sordid history... even
when it is nominally successful (as in Python) it is still not very useful,
and at times it has been simply a failure (like Perl).

As another example, though, I would suggest the principle would be firmly
against trying to add Hindley-Milner typing to the language, or making some
big move in an immutable direction. That's not PHP... that's not "mutable-
state dynamic scripting language". I'd also suggest against burning any time
in trying to make PHP not a "scripting" language anymore, on the grounds that
Hack is the right approach; create a companion language that may integrate
well, may even still be a "dialect" of PHP, but is not "PHP" anymore, and can
do the non-scripting work without actually trying to bodge that into PHP
proper.

"PS: PHP already supports threading via a PECL module."

Yeah. Perl "supports" threading too... for sufficiently small definitions of
"supports". Unsurprisingly, when I googled "PHP thread" (without quotes in the
search), once you get past what are for me the first three links which are for
the documentation itself, the remainder of the results consist of people
asking and/or explaining why you can't trust it or use it. Compare with the
Google search for "perl thread". (Pretty much the same except with Perl, even
the documentation warns you away from using it.)

PHP is a mutable-state dynamic scripting language. It can continue importing
anything it likes from that realm as all the mutable-state dynamic scripting
languages continue to converge on the same basic core of features, which I
have in the past called CLispScript but really there's any number of things
you could reasonably call it. Good threading support is not in that core set
of features. The history of other extremely similar languages extremely
strongly suggests it would be little more than a staggeringly enormous waste
of time.

But of course, if PHP would like to ignore that history, go nuts. I see no
reason to believe that its internals and API are so especially cleanly
designed that threading will be a breeze to add on, but hey, go prove me
wrong. (That might sound like a sarcastic snipe at PHP at first. And I won't
lie, I don't like PHP. But the truth is simply that adding threading on 20
years later is _insanely difficult_ if you haven't been planning for it all
along, and that's pretty much regardless of the underlying language. If you
have been planning for it it's merely very difficult. So very, very many
things will fundamentally depend on the implicit lock that you get by
everything being single-threaded that you don't even realize it until you try
to add the threading on and realize just how _thoroughly_ the assumption has
been baked into the VM, the runtime, every library, every framework, every
binding... everything.)

For another "see also", look at Javascript, another fairly similar language to
PHP in the grand land of programming languages (mutable-state dynamic
scripting language). There is a _reason_ we have "Web Workers" and we don't
have "Javascript Threading", and it isn't all "browser".

Incidentally, since HN has somehow decided, for I don't even know what reason,
that my first post is unworthy, do note that it's a serious engineering
question. Spending design capital on "keeping things safe for future
threading" is not free, and if there really isn't any chance that PHP is going
to be threaded in the future, you're better off claiming the bounty of staying
single-threaded in the spec than complexifying it with things that will never
be used, because being able to _guarantee_ single-threadedness is actually a
big win in its own way... if you've got it, _use_ it. And, again, the actual
_history_ of adding threading to this sort of language is not very promising
at all.

(Edit: A bit more searching finds a few people suggesting that PHP threading
may be usable, though it made the transition to that quite recently. It still
doesn't sound like it's something I'd touch after all the other times I've
been burned in other environments with similar level of promises and
experiences. YMMV.)

~~~
aaronem
> Even today, I'm not sure if a function that returns a list (not list ref)
> can be directly dereferenced into via [] notation.

Unless it's changed very recently, you cannot; you have to wrap it in an
arrayref, e.g.

    
    
        [returnsAList()]->[0]

~~~
jondum_bau
All lists can be sliced with the [] notation.

The above example would call returnsAList, put the result in a new array
reference, then dereference it. Probably not what you want.

To get the first item from the sub with [] notation, it would be
(returnsAList)[0]. For example:

    perl -e 'sub returnsAList { return 0..5 }; $v = (returnsAList)[3]; warn $v'

------
arenaninja
After reading your critique, I'm now very confused about where this spec
stands. Is Zend bound by it (I don't think so)? For example, you mention the
absence of RAII. I don't think that Zend's PHP will lack RAII (because they
don't break BC), but new implementations following this (proposed?) spec are
free to disallow it.

~~~
pbiggar
No, but I believe they're excited by it (according to Facebook's post at
least).

I'd expect Zend to keep supporting RAII, but I doubt new implementations will.
It's a real drag on building better GCs, and that can be as much as half the
performance of a program.

~~~
wvenable
I think the language would really need more features, like using blocks from
C#, to be effective without RAII. The finally clause was only just added in
the last release.

The spec, as it stands, is flawed. Any removal of RAII should be left to a
major version (e.g. PHP7). Since this spec is meant to represent the _current_
state of PHP, it's completely incorrect in this area.

~~~
pbiggar
I disagree. They underspecified the language, so it still technically
represents the current state of the Zend implementation of PHP.

(I take no position on RAII being useful or not, I don't write PHP for a
living).

~~~
wvenable
It's in direct conflict with the PHP manual:

[http://php.net/manual/en/language.oop5.decon.php#language.oo...](http://php.net/manual/en/language.oop5.decon.php#language.oop5.decon.destructor)

"PHP 5 introduces a destructor concept similar to that of other object-
oriented languages, such as C++. The destructor method will be called as soon
as there are no other references to a particular object, or in any order
during the shutdown sequence."

It's _very_ common to rely on this behavior to do cleanup as PHP, like C++,
did not have a finally clause (PHP only gained one recently).

~~~
pbiggar
I think we're talking at cross-purposes. I'm just saying that because it's
underspecified, the Zend engine can be both accurate to what's in the PHP
manual, and to what's in the spec.

Anyway, it's a slightly pedantic point, so I'm probably not contributing much
to the conversation here :)

~~~
wvenable
I understand your point. The spec as written means the Zend Engine itself is
accurate to the spec. The problem is the spec isn't accurate to the language,
as it exists, in the wild. Existing correct and valid PHP code executed to this
spec will behave incorrectly. Therefore, it's not really a PHP spec.

PHP Code > PHP Spec > PHP Engine

------
thedufer
> Rather than trying to specify the exact algorithm for everything (which is
> what the JS spec typically does, for example), they chose to describe the
> Zend model (or close enough) and say "it has to appear to work like this".

To be fair, the JS spec may say those things but no modern JS engine actually
does it that way. Despite the wording, "must appear to..." is the common
interpretation of that spec. It is nice that this one makes that explicit,
though.

~~~
gsnedders
On the other hand, there's been a fair bit of work (esp. for ES6, and to a
lesser extent ES5) to minimize the number of things the specification states
contrary to implementations.

------
wvenable
Under GC:

> I read this as saying that when a variable dies, you must immediately clear
> it up. I suspect that this will make the GC a little less flexible than it
> has to be.

I think the key reason for this is that objects have deterministic destruction
in PHP which allows for the RAII pattern. This is in contrast with garbage
collection in Java or C# where finalizers aren't called immediately (or
sometimes at all) and the RAII pattern is impossible.

I suspect you could leave memory lying around longer as long as you destructed
objects and resources immediately, but there probably isn't any advantage to
that. But changing the destruction characteristics would break code that
depends on it.

~~~
pbiggar
That part of the spec refers to variables (which the spec calls VSlots), not
objects (which the spec calls HStores).

Objects have different lifetimes, and the spec actually allows the destructors
to be run anytime between the object being dead and the program ending. So it
looks like there won't be any RAII here.

~~~
wvenable
No RAII would be a big change in the behavior of PHP; you should add that
information to your critique. I have code that relies on RAII for error
recovery and rollback. Any PHP implementation that doesn't do RAII is going to
subtly break a lot of code.

It doesn't make sense to force very specific memory lifetimes for variables
and then also not do it for objects.

~~~
masklinn
The problem being it's not RAII but a side-effect of reference-counting[0]
(Python had the same issue, it's been a pain for alternative implementations
and a big reason for `with` being added to the language)

[0] it doesn't work when there's a cycle for instance
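
As a rough illustration of the `with` approach mentioned above (a sketch, not taken from the thread; `transaction` and `run` are hypothetical names), cleanup can be tied to leaving a block rather than to whenever the collector happens to reclaim an object:

```python
from contextlib import contextmanager


@contextmanager
def transaction(log):
    """Hypothetical transaction: commit on success, roll back on error."""
    log.append("begin")
    try:
        yield
        log.append("commit")
    except Exception:
        # Runs as soon as the block exits with an error, regardless of
        # how the runtime manages memory.
        log.append("rollback")
        raise


def run(fail):
    """Do some work inside a transaction and return what happened."""
    log = []
    try:
        with transaction(log):
            log.append("work")
            if fail:
                raise ValueError("boom")
    except ValueError:
        pass
    return log


print(run(fail=False))
print(run(fail=True))
```

The rollback here does not depend on reference counts at all, which is the property that makes this style portable across implementations with different garbage collectors.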

~~~
wvenable
PHP developers specifically took into consideration RAII when designing the
language. This is why the finally clause for exceptions was not originally
part of the language -- it was unnecessary for cleanup if you have RAII.
Cycles don't really affect RAII in a meaningful way.

~~~
masklinn
> PHP developers specifically took into consideration RAII when designing the
> language.

No part of that phrase matches reality.

> Cycles don't really affect RAII in a meaningful way.

Which is irrelevant because once again what PHP has _is not RAII_ it's
reference counting, same as CPython and Objective-C. And because it's
refcounting it _does_ have an impact: objects involved in or linked from a
cycle are not immediately cleaned up once the cycle goes out of scope because
all objects still have a refcount of at least 1.

Cycles which, by the way, PHP did not bother breaking until 5.3:
[http://php.net/manual/en/features.gc.collecting-cycles.php](http://php.net/manual/en/features.gc.collecting-cycles.php)
so these objects would live until process death.
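
The same effect can be sketched in CPython, which the comment above names as another refcounted runtime. This is an illustration under assumptions, not PHP code: `Resource` and `demo` are made-up names, the behaviour shown is specific to CPython, and the automatic cycle collector is switched off so the timing is deterministic.

```python
import gc


class Resource:
    """Records its name in a shared log when it is destroyed."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.other = None

    def __del__(self):
        self.log.append(self.name)


def demo():
    log = []
    gc.disable()  # keep the cycle collector from running on its own
    try:
        solo = Resource("solo", log)
        del solo  # refcount hits zero: destructor runs right here
        after_solo = list(log)

        a = Resource("a", log)
        b = Resource("b", log)
        a.other = b
        b.other = a  # a and b now keep each other alive
        del a, b  # refcounts stay at 1, so no destructor runs
        after_cycle = list(log)

        gc.collect()  # the cycle collector finally reclaims both
        after_collect = sorted(log)
    finally:
        gc.enable()
    return after_solo, after_cycle, after_collect


print(demo())
```

The lone object is destroyed at the `del`, while the two objects in the cycle are only destroyed once the cycle collector runs.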

~~~
wvenable
> No part of that phrase matches reality.

It's true. When exception handling was added to PHP, RAII was absolutely
factored into the design. This is why it didn't get a _finally_ clause to
start with -- it's unnecessary.

> Which is irrelevant because once again what PHP has is not RAII it's
> reference counting

[http://www.hackcraft.net/raii/](http://www.hackcraft.net/raii/)

[http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initial...](http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization)

If you have reference counting or stack allocated objects then you can do
RAII. You seem to be confused as to what these terms mean -- they are not
contradictory, they are complementary. Ref counting allows RAII.

------
timdierks
I can see the appeal of allowing behavior variant from Zend in areas where
there is a big benefit, and I like the approach of defining these areas as
"implementation-dependent" rather than specifying one or the other.

However, it seems to me that some of these are in areas where PHP code would
need to know the runtime behavior of the platform they're on; the alternative
is just to avoid all such areas ("there be dragons here") or to give up on
portability and be implementation-dependent, in which case it seems to me that
there's not much value in having a PHP standard (other than as a base document
on which to build the Zend or HHVM specifications).

Why not have the specification allow runtimes to expose their choices of
behavior into the runtime to allow code to determine what their platform does?
Of course, one can probably write tests for these behaviors and build a
library, but it seems like it would be better to just delineate the
alternative behaviors and put them into the ABI.

~~~
coldtea
> _Why not have the specification allow runtimes to expose their choices of
> behavior into the runtime to allow code to determine what their platform
> does?_

Because the idea is that it should be transparent to the code. If you start
doing that you get into mess like the "feature sniffing" BS JS does in
browsers, IFDEFs etc...

So, which of these specifically seem to you to be "in areas where code would
need to know the runtime behavior"?

~~~
timdierks
I don't know the details of PHP, but it seems to me that all the areas that
are referred to as "implementation-dependent" are code-observable: e.g. the
deferred array copying decision discussed in the critique: if your code
depends on a behavior, you'll get different results on the other platform. So
if you want to be portable, you need to sniff the behavior and take a
different route on the alternate platform, or you need to avoid the quicksand
in the first place.

~~~
coldtea
> _are code-observable: e.g. the deferred array copying decision discussed in
> the critique: if your code depends on a behavior, you'll get different
> results on the other platform._

I don't see how the "deferred array copying" would ever leak to the program
you write. Copy-on-write and such is a classic example of an implementation
detail that the higher layer is not concerned about.

~~~
timdierks
Read the post; apparently Zend doesn't make deep array copies in some
circumstances.

The quote from the spec is "Unlike conforming deferred string copy mechanisms
discussed in §§ that must produce the same observable behavior as eager string
copying, deferred array copy mechanisms are allowed in some cases to exhibit
observably different behavior than eager array copying."

Note the phrasing "observably different behavior".

~~~
pbiggar
It is definitely observable.

------
ars
I don't understand what you wrote about "deferred array copying".

When I ran your code I got "PHP Notice: Undefined offset: 1"

If I removed that line then I got "2 3" which seemed wrong to me since those
two variables should have been aliases to the same thing. But copying the
values instead of the reference seems slower, not faster.

------
dfilaretti
Good to finally see a PHP spec! You might also be interested in a formal
semantics for PHP which has been presented today at the ECOOP'14 conference in
Uppsala, Sweden. Details can be found at www.phpsemantics.org. As I wrote
before, I wish we had this spec a couple of years before - our life would have
been so much easier! :)

------
Kiro
> describe the difference between
>
>     $a = new Point(1, 3)
>
> and
>
>     $a =& new Point (1, 3)
>
> Answer: I forget! I think it's that the next assignment to "$a" will do
> something odd, but honestly I don't remember the subtleties.

So what is the difference exactly?

------
georgelund
I don't think your comment on Overflows - "Zend is by definition a correct
implementation" - is necessarily right - elsewhere we're definitely looking
at Zend having to change their implementation, right? Great article, BTW.

~~~
TazeTSchnitzel
No, the spec leaves implementing it open. Something I'm trying to change as
the behaviour in the spec is wrong.

------
benth
Glad the conditional/ternary operator's atypical associativity is documented
now...

------
webkike
When I read the new PHP spec, I threw up in my mouth after I saw that
empty("0") was a special case of empty that returned TRUE. I know that's how
PHP normally works, but that doesn't really make my mouth taste any better.

~~~
wvenable
The thing you have to understand about PHP, is it's meant to easily work in a
world where every value is in a string. You get strings from the browser, you
get strings from the database, and strings from the file system.

It was expected that these strings contain numbers and that you should be able
to rationally use them as numbers without conversion. So "10" > "5" returns
true in PHP whereas that would be false in most other languages. The
underlying representation of the value (sequence of characters, two's
complement binary, or IEEE 754 double) means nothing in PHP; they are all
supposed to be equivalent. So "12", 12, and 12.0 are the same and "0" and 0
are the same.

The consequence of this design ripples through to every aspect of the
language. And it ripples down to the implementation of empty. empty(0) is
true. So therefore empty("0") is also true. The fact that one is an integer
and the other is a string is not a distinction that exists in PHP. It's super
weird and unexpected but it's not illogical given the rules of the language.
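
For contrast, here is how the same comparison looks in a language that keeps strings and numbers apart. This is a Python sketch; `as_number` and `loose_gt` are hypothetical helpers that only roughly approximate the numeric-string idea described above and are not PHP's actual comparison rules.

```python
def as_number(value):
    """Return value as an int or float if it looks numeric, else None."""
    for convert in (int, float):
        try:
            return convert(value)
        except (TypeError, ValueError):
            pass
    return None


def loose_gt(a, b):
    """Compare numerically when both sides look numeric (rough approximation)."""
    na, nb = as_number(a), as_number(b)
    if na is not None and nb is not None:
        return na > nb
    return a > b  # otherwise fall back to the language's own ordering


print("10" > "5")            # plain string comparison is character by character
print(loose_gt("10", "5"))   # numeric-looking strings compared as numbers
print(as_number("12") == as_number(12) == as_number("12.0"))
```

In plain Python the first comparison is false because "1" sorts before "5", while the helper treats both operands as numbers and gives the answer the comment above describes.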

~~~
chriswarbo
> You get strings from the browser

Really? I thought browsers sent HTTP requests.

> you get strings from the database

Really? I thought databases returned tables, ie. ordered collections of rows
with individually-typed columns.

> and strings from the file system.

Really? I thought filesystems returned streams of bytes.

Just because lots of values _can be represented by_ strings, doesn't mean they
_are_ strings. "X is a string" is the cause of:

* XSS vulnerabilities: "HTML is a string" and "user input is a string"; why not concatenate them?

* SQL(/shell/eval/etc.) injection vulnerabilities: "SQL queries are strings" and "user input is a string"; why not concatenate them?

* Multilingual issues: "byte streams are strings"

* Malformed requests/responses (ie. 'message not understood', invalid markup, etc.): "requests/responses are strings"

I know you're stating the rules of PHP, rather than a personal opinion, but I
think it's important to reiterate this point whenever "x is a string" comes
up. Strings are an _implementation detail_ which should be abstracted over.
After all, strings themselves are just an abstraction over bytes/words.

[http://c2.com/cgi/wiki?StringlyTyped](http://c2.com/cgi/wiki?StringlyTyped)

~~~
wvenable
Many of the reasons for PHP being stringly typed are historical. For example,
extremely old database drivers used to stringify everything so you didn't get
individually-typed columns. In an HTTP request, specifically an encoded GET or
POST, all values from the browser are strings. A file system is a stream of
bytes, but many common web file formats stringify all values (JSON, XML, INI,
etc).

> Strings are an implementation detail which should be abstracted over.

That's exactly what PHP does -- PHP doesn't care what representation a value
has. But that is also then the source of its strangeness.

~~~
chriswarbo
> > Strings are an implementation detail which should be abstracted over.

> That's exactly what PHP does -- PHP doesn't care what representation a value
> has. But that is also then the source of its strangeness.

I don't know if that's a valid use of the term "abstract", but it's certainly
not what I meant. Rather than passing around low-level values for as long as
possible, and only interpreting their contents when forced to (eg. when "+"
forces them to be treated numerically); instead, I meant interpreting values
as soon as possible so that they're treated as the appropriate type.

> An HTTP request, specifically an encoded GET or POST, all values from the
> browser are strings.

Again, they're only _represented_ as strings. When I use an "order" parameter
in my application, I want it to be an Order. I don't want it to be NULL. I
don't want it to be a string which may-or-may-not be the numeric ID of an
Order which may-or-may-not be stored in my database. If it's not an Order (eg.
if the "readOrder" function I declared as the handler for this parameter
returned [] instead of [$the_order]), the request is malformed so a HTTP 400
response should have been sent. My application should never even start if the
data it needs isn't available.
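
A minimal sketch of that idea, in Python rather than PHP: the raw parameter is turned into an `Order` at the boundary, and anything else becomes a 400 before application code runs. `Order`, `BadRequest`, `ORDERS`, `read_order` and `handle` are all hypothetical names, and the dictionary stands in for a real database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: int


class BadRequest(Exception):
    status = 400


# Stand-in for a database lookup (assumption for the example).
ORDERS = {7: Order(7)}


def read_order(params):
    """Turn the raw "order" parameter into an Order or reject the request."""
    try:
        order_id = int(params.get("order"))
    except (TypeError, ValueError):
        raise BadRequest("order must be a numeric ID") from None
    order = ORDERS.get(order_id)
    if order is None:
        raise BadRequest("unknown order")
    return order


def handle(params):
    """Application code only ever sees a real Order, never a raw string."""
    try:
        order = read_order(params)
    except BadRequest as err:
        return err.status, str(err)
    return 200, f"order {order.order_id}"


print(handle({"order": "7"}))
print(handle({"order": "abc"}))
print(handle({}))
```

The handler body never has to ask whether the value is a string, a NULL, or a valid ID, because malformed input was rejected before it got there.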

Of course, that's exactly what "dependency injection" is in the OOP world;
it's also a major use-case for the Reader monad in the Functional Programming
world.

~~~
wvenable
That is sort of the opposite of abstracting over the type but I get what you
mean. I totally agree that interpreting values as soon as possible into the
appropriate types is superior. I'm a big proponent of fail early. PHP, as with
many scripting languages, is designed for short-term programmer convenience
over long-term programmer convenience. Given its history and the original
tasks it was designed for, that makes sense.

But PHP is moving in the direction of interpreting types quickly and failing
early. Specifically, scalar type hinting seems almost a certainty once there
is an agreement on the semantics.

------
brian_mingus
Critiquing the critique of Facebook's new PHP spec:

You used affect when you should have used effect. They probably don't teach
that in the CS PhD program:)

Of course my post may be subject to Muphry's Law. But still. If you're going
to spend so much time establishing your credibility, do you really want to
ruin it by demonstrating you don't have a basic understanding of English
grammar?

[http://en.wikipedia.org/wiki/Muphry's_law](http://en.wikipedia.org/wiki/Muphry's_law)

~~~
arenaninja
I'm not sure I would place any significance on something I have seen many
native speakers screw up consistently. If you're not familiar with the author,
you could take the time to read the blog post he linked to, where he talked
about HPHP and phc.

~~~
brian_mingus
If the affect/effect error was in isolation it wouldn't have been a big deal.
When combined with a rant of how much of an expert he was the contrast led to
quite a bit of cognitive dissonance.

~~~
kansface
He never claims to be an expert on English grammar.

~~~
Dewie
Any grammar mistake negates any intellectual merit on the Internet...

