
Sighs

I say the following as someone who tripped over PHP around 2008 and, because it worked for most of the problem areas I threw at it, never tried to learn anything else.

The following is an honest appraisal from roughly ten years of gently smashing my head against a soft wall. ("I don't have a headache, but I do wonder how much I've dislocated...")

Here's the thing. PHP is fast. Really fast. I custom-built a minimal copy with no extensions built in, cutting the startup+shutdown time to nearly nil:

  $ time php -r ''
  real    0m0.014s
  user    0m0.006s
  sys     0m0.007s
THAT ^ is on a 32-bit single-core Pentium M!

The distro-shipped copy of PHP on my Core i3 box is _literally slower_!!

  $ time php -r ''
  real    0m0.017s
  user    0m0.013s
  sys     0m0.003s
(The two benchmark times are consistent.)

The Pentium M box is the machine I'm using the most at the moment (a ThinkPad T43), and my standard workflow basically revolves around repeatedly tinkering and hitting ^S ad infinitum, as that reruns my code (inotifywait is an amazing thing). Since I'm launching the PHP CLI a hundred times an hour, fast iteration times are strongly preferable.

And get this.

I have a standard debug file ("d.php") I include by default in all my code, which gives me a d() function that parses the source of what I'm editing to extract variable names. For example, `d($x);` produces something like "test:13: $x: [ {stream#42} ]", for an array containing a stream descriptor.
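(For the curious, a heavily stripped-down sketch of the idea - my real d.php uses the custom tokenizer mentioned below; this naive stand-in just leans on debug_backtrace() and a regex:)

  <?php
  // d.php, naive sketch: print "file:line: $name: value" for each argument.
  // (The real thing tokenizes the source; this one just regexes the call site.)
  function d(...$args) {
      $frame = debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS)[0];
      $src   = file($frame['file'])[$frame['line'] - 1];
      preg_match('/d\s*\((.*)\)/s', $src, $m);                 // grab the argument text
      $names = array_map('trim', explode(',', $m[1] ?? ''));   // breaks on nested commas; it's a sketch
      foreach ($args as $i => $val) {
          printf("%s:%d: %s: %s\n",
              basename($frame['file'], '.php'), $frame['line'],
              $names[$i] ?? '?', trim(print_r($val, true)));
      }
  }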

When I hit ^S with this standard-for-me file included at the top, PHP has to

- launch itself

- tokenize ("oh an included file, let's open and tokenize that too") and then compile

- start interpreting

- hit the first d() call

- my custom tokenizer kicks in (I am aware of token_get_all(); it doesn't work properly) and parses the entire source file

- d() prints whatever

- the code I'm working on executes and does whatever else, maybe it does a few more d() calls

- the code deliberately crashes itself at the point where I want it to stop (via another call, z(), because "die" throws the stack away before PHP's shutdown functions run, so you can't get line-number info from die statements - a rough sketch of the idea follows this list)

- PHP shuts down.
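(A quick aside on that z() call - a minimal sketch of the idea, assuming a register_shutdown_function() hook; the whole point is to record the call site before exiting, because the shutdown handler can't recover it afterwards:)

  <?php
  // z(): stop here, but remember where "here" was so the shutdown handler can report it.
  function z() {
      $frame = debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS)[0];
      $GLOBALS['z_site'] = $frame['file'] . ':' . $frame['line'];
      exit(1);   // same effect as die(), except the call site was saved first
  }

  register_shutdown_function(function () {
      if (isset($GLOBALS['z_site'])) {
          fwrite(STDERR, "stopped at {$GLOBALS['z_site']}\n");
      }
  });
Anyway - here's the measurement: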

"0.02user 0.00system 0:00.03elapsed"

30 milliseconds, to do ALL OF THE ABOVE...

...On my 11-year-old Pentium M laptop (featuring a 5400RPM HDD!).

On this machine, Go takes over a second to build "Hello world". I (laughs) haven't even (ahaha) _tried_ Rust.

PHP has, sadly, literally spoilt me.

I am honestly - honestly - not looking forward to the day when I have to tackle something that requires a consistently long build cycle. To me, at this point, that's anything over half a second. (I've only ever poked at programming as a hobby, but I wonder if this is why I'm still sane.)

I get uncomfortable when my code takes >200ms to compile and start running, for example. I hate having to repeatedly wait for my program to get to the point I'm iterating on. And I'm not in my comfort zone (and am very easily distracted) when my script has to do a time-consuming sequence of steps to get to the point where I'm tinkering with it. (For example, if I need to parse something from a network, I usually dump the response to a file, jump to the top of my script, write the parsing code there using the file contents as a reference, then move the fragment into the right place.)

To clarify, this doesn't make me anxious, per se; I just get fidgety and am very likely to wind up reading HN or something. It completely throws off my concentration. (Right now, for example, I'm in a slightly noisy environment with a lot of activity, and even this is less distracting than slow build times are.)

So, when it comes to fast iteration, PHP kind of wins. __It's not magically faster than every other language on the planet in terms of VM overhead__, however; it just happens to have fast warm-up time. But, in the sad state of language development nowadays, most "cool" and "hip" languages have questionably long warmup time, and PHP is better than all of them - Rust, Go, whatever - for me, because it makes it easier for me to concentrate than other languages do.

So. That's what I like about PHP, and where I get the impression that maybe I should keep using it.

Here's where things go wrong.

Every time I try and do something in PHP, I have an unfortunate tendency to leave a trail of bugreports and/or "O.o?!"s on StackOverflow in my wake.

I tried playing with SQLite3 in PHP a few months ago. I knew all about input sanitization, and I was using The PHP™, Itself©, so I immediately looked up how to use SQLite3 prepared statements; what I was making wasn't going to be Web-facing, but I most definitely wasn't going to do it the lazy way regardless.

After some hesitation, I decided not to use PDO, and to use PHP's SQLite3 extension instead, based on the conclusion that since the SQLite3 extension was surely simpler, it would probably be a bit faster.
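(For context, the extension's prepared-statement API looks roughly like this; table and column names are made up:)

  <?php
  // SQLite3 extension, prepared-statement basics (hypothetical table/columns).
  $db = new SQLite3('test.db');
  $db->exec('CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, name TEXT)');

  $stmt = $db->prepare('INSERT INTO items (name) VALUES (:name)');
  $stmt->bindValue(':name', 'widget', SQLITE3_TEXT);
  $result = $stmt->execute();   // one row inserted... in theory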

All went well. Until my single-object transactions started returning multiple rows. Duplicate rows, in fact, which were even showing up when I manually inspected my database via `sqlite3 ... .dump` at the commandline. Wat.

After carefully inspecting the code and then creatively asking Google for help I finally stumbled across the PHP bug that reported the exact behavior I was experiencing... in 2013. https://bugs.php.net/bug.php?id=64531

Of course, I only discovered this after first asking about it on StackOverflow. https://stackoverflow.com/questions/36617708/odd-behavior-wi...

That was fun.

Then there was the time I tried to do socket programming and my script entered an infinite loop. That one took me quite a long time to figure out, so I ended up documenting what was missing from the manual for the benefit of others here: https://stackoverflow.com/questions/39410622/detecting-peer-...

The TL;DR here is that socket_recv() is Speshul™ and returns snowflake-flavored error codes that are basically broken.
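For reference, this is the kind of checking you'd expect to be able to write ($sock being an already-connected socket, Linux error constants assumed); the ways socket_recv() deviates from it are what the linked question walks through:

  <?php
  // What a "sane" recv check looks like ($sock: an already-connected socket resource).
  $n = socket_recv($sock, $buf, 8192, 0);
  if ($n === 0) {
      // orderly shutdown by the peer
  } elseif ($n === false) {
      $err = socket_last_error($sock);
      if ($err === SOCKET_EWOULDBLOCK || $err === SOCKET_EINTR) {
          // no data yet / interrupted - not a real failure, try again later
      } else {
          fwrite(STDERR, "recv failed: " . socket_strerror($err) . "\n");
      }
  } else {
      // $n bytes of data are now in $buf
  }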

PHP's network I/O is a disaster though.

Unfortunately I don't have something I can cite for this example, but I vaguely recall working on something some years ago and realizing that the socket_* functions provide no way to sanely trap all possible errors that can happen with a socket, while the stream_* functions generally do. Amusingly, the stream_* functions provide no way to intercept all the obscure/esoteric errors that can happen when creating a socket, while the socket_* functions do. And then I discovered the socket_import_stream() function... but realized that the moment I converted between the two worlds I would lose a bunch of additional error-reporting I needed that one side provided and the other didn't. I unfortunately don't remember exactly what the details were, but I do recall going round in circles and then giving up because I realized it was an unsolvable problem.

Quite a bit more recently, I was doing some poking around with running child processes using proc_open(). (Yeah. I'm insane. :D)

This turned out to be amusing.

The first thing that happened that indicated something was going wrong was... nothing. I was staring at a frozen terminal. But something had abruptly started using 100% CPU as soon as my script started, and then the CPU usage dropped back to <10% (Chromium. Pentium M. Enough said) when I hit ^C on PHP, so...

strace time. strace showed that something was going horribly wrong and that I was getting a trillion EPIPEs a second. Oh, my code was trying to write to the wrong file descriptor at the wrong moment. Cool, mental modeling glitch, easily fixe--wait. WHY DID I HAVE TO USE STRACE TO FIGURE THAT OUT.

[Some Google Later...]

...Ah, it's because PHP is ___physically incapable___ of reporting EPIPEs and EINTRs. stream_select() has no awareness that these errors can happen, and it will just keep retrying infinitely.

More information: https://bugs.php.net/bug.php?id=39598 (reported 2006!!!!!!), ReactPHP issue: https://github.com/reactphp/react/issues/243 (reported 2013)

Capability Unlocked: "Headsmash on keyboard"!

If only socket_import_stream() accepted arbitrary streams - as far as I can tell it only takes streams that wrap actual sockets, so proc_open()'s pipes are out - I could pull streams from anywhere into the socket_* functions and do I/O on them there; as fiddly and fragile as the socket library is, it actually reports errors correctly!!!
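(The working case, for reference:)

  <?php
  // socket_import_stream() on a stream that really does wrap a socket.
  $stream = stream_socket_client('tcp://example.com:80', $errno, $errstr, 5);
  $sock   = socket_import_stream($stream);   // from here on, the socket_* functions apply
  socket_set_option($sock, SOL_SOCKET, SO_KEEPALIVE, 1);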

Anyway. Back to proc_open(), which returns a set of stream-flavored file descriptors corresponding to the child process's stdin/stdout/stderr.

My script was essentially the PHP-ified equivalent of "cat file | process". Of course, it took about 100 lines to express this: opening the file... reading a bit of the file... stream_select()ing... writing a bit of the file to stdin... reading stdout... you get the idea.
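Condensed to its skeleton (hypothetical filename, `sort` standing in for the real child, and with zero error handling), the loop looked roughly like this:

  <?php
  // "cat file | process" in PHP, heavily condensed.
  $spec = [0 => ['pipe', 'r'], 1 => ['pipe', 'w'], 2 => ['pipe', 'w']];
  $proc = proc_open('sort', $spec, $pipes);   // 'sort' stands in for the real child
  $in   = fopen('file.txt', 'rb');

  stream_set_blocking($pipes[0], false);
  stream_set_blocking($pipes[1], false);

  while (!feof($in) || !feof($pipes[1])) {
      $read   = [$pipes[1]];
      $write  = feof($in) ? [] : [$pipes[0]];
      $except = null;
      if (stream_select($read, $write, $except, null) === false) {
          break;   // (in practice: see the EINTR/EPIPE saga above)
      }
      if ($write) {                             // child's stdin is writable
          fwrite($pipes[0], fread($in, 8192));
          if (feof($in)) fclose($pipes[0]);     // signal EOF to the child
      }
      if ($read) {                              // child's stdout has data
          echo fread($pipes[1], 8192);
      }
  }
  proc_close($proc);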

Except "the idea" isn't what actually happened. What happened was that the script did the "read a bit of the file..." bit a few hundred times and then ran out of memory. I carefully examined my code and the select loop was, most definitely, written correctly. ALL OF EVERYONE'S WAT?

It took asking in ##php for help to figure it out. This is already long so I'll let this nice bugreport continue this conversation: https://bugs.php.net/bug.php?id=75584

The bugreport is very new (2-3 days old, haha!) so it has no replies yet. Probably won't for some time. You know, I could literally place a legitimate wager on whether the next substantive reply in that thread is "ah, I see, let's see about fixing this" or someone describing how many circles they went around for however many weeks before finding this bug that perfectly explains the situation they're in.

Anyway.

Other fun things I've run into include the fact that the CURL extension's error codes are platform-specific (https://stackoverflow.com/questions/41579771/looking-for-cor...) and that there. is. literally. no. way. to seek beyond the 4GB point in a file on a 32-bit system, because PHP represents all values to userspace as signed integers. What's a long long uint?! (https://stackoverflow.com/questions/44354740/alternative-to-...)

EDIT: One last thing. Nearly forgot this! I was doing some tinkering with PHP a while back and decided to be cute and display extended-validation info in some code I was writing, so if a site had an EV cert it would show the company name on the console.

This went horribly wrong, as I discovered that CURL's TLS info-extraction functions are horribly, horribly broken: https://bugs.php.net/bug.php?id=71929
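(Roughly what I was attempting - CURLOPT_CERTINFO plus curl_getinfo() is supposed to hand you the peer's certificate chain:)

  <?php
  // Pull the peer certificate chain out of a CURL handle.
  $ch = curl_init('https://example.com/');
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
  curl_setopt($ch, CURLOPT_CERTINFO, true);        // ask libcurl to capture certificate details
  curl_exec($ch);
  $certs = curl_getinfo($ch, CURLINFO_CERTINFO);   // per the bug above, this is where things fall apart
  var_dump($certs);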

This is unfixable.

I stop nao.

But I've demonstrated that PHP

- has bugs writing to databases

- cannot sanely perform network I/O

- has sanity-checking issues dealing with TLS

I think I have said enough.

I can has new REAL programming language... that's _faster_ than PHP? :D

Oh, two more things, as a footnote -

One, the reason why token_get_all() doesn't actually work is that I wanted to be able to arbitrarily malform my d() calls - "d" string on one line, opening parens three lines down with a bunch of newlines and tabs in between, pile of arguments shoved in the middle, etc - and I wanted the reported line number to be for the opening parenthesis. token_get_all() doesn't report line numbers for certain tokens - including parens.
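(A quick way to see it - single-character tokens come back from token_get_all() as bare strings, with no line number attached:)

  <?php
  $tokens = token_get_all("<?php d(\n    \$x\n);\n");
  foreach ($tokens as $t) {
      if (is_array($t)) {
          printf("line %d: %s %s\n", $t[2], token_name($t[0]), trim($t[1]));
      } else {
          printf("line ?: %s\n", $t);   // '(' , ')' and ';' land here - bare strings, no line
      }
  }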

Two, a couple days' headscratching helped me figure out PHP's build system. The following is not documented anywhere as far as I know. If you go into ext/ inside the PHP source folder, running 'find | grep config0.m4' will show you all the extensions that _must_ be compiled in. Currently this is just four things (libxml, streams, I forget the others). Every other extension ('find | grep config.m4') can optionally be built as a module by disabling it at PHP build time (via --disable-all, or manually disabling the extension) and later going into the extension directory and doing the "phpize; configure; make install" dance.




> I can has new REAL programming language... that's _faster_ than PHP? :D

Have you tried node.js 8 or 9?

There's also duktape, but I don't know what APIs it has.

    $ time nodejs test.js  # second run
    hello world

    real    0m0.066s
    user    0m0.058s
    sys     0m0.009s
    $ time duk test.js  # second run
    hello world

    real    0m0.002s
    user    0m0.002s
    sys     0m0.000s
edit: duktape has a very limited API built in, and it seems you need to add any extra functions in C. It doesn't seem to have file access. Node.js still feels very fast to me, though.


The problem with super-minimal environments like Duktape, PicoC, Lua, mruby, TinyPy, etc is precisely what you describe: there's no batteries-included behavior.

I am seriously considering acquiescing to the performance impact of Go build times (yup, it's slow, who woulda thunk) because of the sane-batteries-included nature of the language.


How fast is a hello world in node.js for you?

I forgot to mention my formerly favorite language: Python. It's _much_ faster than Node.js at startup and at relatively basic things. Node wins on almost everything I make, though.

    $ time python test.py  # second run
    hello world

    real    0m0.009s
    user    0m0.009s
    sys     0m0.000s


Best times from a few "do nothing" runs of Node:

  $ time node -e ''
  real    0m0.098s
  user    0m0.084s
  sys     0m0.010s
Of course on my desktop it's a

  $ time node -e ''
  real    0m0.002s
  user    0m0.000s
  sys     0m0.000s
tiny bit different.

Python on my laptop is just slow enough that I really notice it:

  $ time python -c ''
  real    0m0.032s
  user    0m0.022s
  sys     0m0.009s
Of course PHP is all 0.014, 0.010, 0.012, etc.

Incidentally, this T43 is a backup machine I'm using while my desktop is on indefinite loan to a family member after their laptop broke. This will be fixed eventually; I'm not sure how.

But I've discovered that this old machine is a remarkably good performance catalyst: something that runs blindingly fast on this machine will run really, really well on a faster box. And if I write stupid or inefficient code on this older laptop, I notice I'm doing it wrong sooner, because it takes so much less to make this machine fall over from inefficiency.

If only I could tell the above to the Chromium team, though... sooo many Chrome issues... (and Firefox is unfortunately slower on old hardware than ever before! >.<)


It seems almost all of the time node.js takes to run hello world is spent loading its own modules. You should try something like this:

    // watch.js - re-require the target file on every keypress and time just that
    var file = process.argv[2]
    console.log('press enter to run '+file)
    process.stdin.on('data', function(){
        var [s1,ns1] = process.hrtime()
        // clear the module cache so require() actually re-executes the file
        for(var k in require.cache) {
            delete require.cache[k]
        }
        require(file)
        var [s2,ns2] = process.hrtime()
        var s = (s2-s1)+(ns2-ns1)*1e-9   // [seconds, nanoseconds] -> seconds
        console.log('time: ' + s.toFixed(9))
    })
Result: less than 1ms on first run

    $ node watch.js ./test.js
    press enter to run ./test.js

    hello world
    time: 0.000952656

    hello world
    time: 0.000284171
edit: script now deletes all cache, not only the passed script (result is the same if the script doesn't require others)


Hmm, very interesting. Something to definitely keep in mind, thanks.


Python is faster on my laptop.

    $ time php -r ''

    real    0m0.064s
    user    0m0.055s
    sys     0m0.007s

    $ time python -c ''

    real    0m0.021s
    user    0m0.011s
    sys     0m0.005s
These startup metrics are also pretty arbitrary, as on web servers code loading sits behind your app server anyway (uwsgi, php-fpm, whatever).


It's possible "php -nr ''" might be faster; -n stops loading php.ini, which is what points to all of the modules PHP ships with.

> These startup metrics are also pretty arbitrary, as on web servers code loading sits behind your app server anyway (uwsgi, php-fpm, whatever).

This is true; but only about 1-5% of everything _I_ do (as a hobbyist) faces a webserver, and the things that do are simple enough that I can write said webserver in bash and run it from socat.



