How we spent two days making Perl faster (booking.com)
193 points by elnatnal on May 22, 2015 | 50 comments

Nice. One of the more interesting things about perl's internals is that it does spend a lot of time with data structures that 'pre-compute' or partially compute the desired result as an optimization. It is my belief that this has been the "secret sauce" of perl performance without a VM for a long time.

After spending some time with Python, I think one of the secrets to Perl's great performance is that nothing ever seems to get "stuck" in Perl like the way certain code patterns can be written in Python or written a different way to make use of some C optimization "behind the scenes". With Perl, it seems like it always just tries to use a C optimization whenever it's supposed to and you don't have to think about it too much.

The other thing is that Perl's built in data structure primitives have somehow hit a magical balance between performance, ease of use and power. Once you get used to the syntax, arbitrarily composing complex data structures is super-easy and they usually perform extremely well.

I think what's surprising is how snappy the language feels in aggregate, even if it benchmarks fairly slow. From execution to runtime, it just feels snappy and I think that may give it a false sense of performance while modern Java and C++ etc. really are faster, they have a clunky feel and lots of the user-facing things seem kind of creaky.

One other problem is that it's really easy to write really slow code in other languages, but even if you only do things mildly "the Perl way" it'll still run reasonably quick.

To this day, I'm still amazed at how many little Perl prototype projects I've written that outperformed their compiled rewrite cousins for months until lots of time was spent specifically tuning that rewritten code.

edit sorry for words I'm beer-posting tonight

Perl... performance?


Perl is solidly in the lowest tier of serious language performance, and that result is quite robust no matter how you query that site, and correlates to my personal experiences as well.

Perl isn't alone... it's joined by many of its 1990s-style dynamic scripting language buddies with very similar performance. And of course many fine programs are written in them nevertheless, so I don't mean this as a criticism... in fact if you read this as a "criticism" that needs to be "rebutted" in some sense you're entirely missing my point. I really shouldn't have to "qualify" this to fend off the obvious replies, but I've been around enough to know I do. But, it's true that a serious professional should be aware that Perl is in the slowest executing set of languages, and it is very possible for the difference to be significant for you in many commonly-occurring tasks.

Having developed in Perl for many years, on a project that required fairly high performance, I learned that the key to getting good performance out of Perl code is to use it idiomatically.

Perl has a lot of built-in functions, keywords, and operators. It's also a bytecode-compiled language (yeah, Perl is compiled when you run it, the compiler is just so fast you don't notice it.) But a lot of those built-ins don't compile to the bytecode equivalent of machine code for a VM like you might expect; they compile to a single instruction that's got an optimized implementation written in C. When you write idiomatic code, you tend to use built-ins in a way that allows those optimized instructions to be used, which gives you near-C level performance.

For example, if you want to loop over every member of a list, perform an operation on each member, and produce a new list based on the results of those operations, you could use a 'for' loop for that. But if you do, then the Perl compiler needs to produce bytecode that mirrors the generic for loop and the full code in the body of the loop, pretty much as you wrote it. The compiler can't make any assumptions about the intended behavior. If you instead use the 'map' keyword, the compiler knows that you're transforming one list into another by running a bit of code on each member of the list, and it'll produce much more optimized code to do that.
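
A rough way to see the same effect, sketched in Python rather than Perl purely so it is easy to run (the function names are made up for illustration, and the timings will vary by machine and interpreter version, so measure rather than trust the comments):

```python
import timeit


def upper_with_loop(words):
    # "C style": indexing, the method call and the append are each
    # dispatched by the interpreter, one instruction at a time.
    result = []
    for i in range(len(words)):
        result.append(words[i].upper())
    return result


def upper_with_map(words):
    # Idiomatic: in CPython both map() and str.upper are implemented in C,
    # so the per-element work happens outside the interpreter loop.
    return list(map(str.upper, words))


if __name__ == "__main__":
    words = ["perl", "python", "ruby"] * 10_000
    # Both spellings must produce the same list.
    assert upper_with_loop(words) == upper_with_map(words)
    for fn in (upper_with_loop, upper_with_map):
        seconds = timeit.timeit(lambda: fn(words), number=50)
        print(f"{fn.__name__}: {seconds:.3f}s")
```

Both functions return the same list; the only difference is how much of the per-element work the interpreter has to dispatch itself.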

It's kind of ironic. You can write 'C' style code in Perl but you won't get C-like performance that way, you'll get Perl-like performance. But if you write 'Perl' style code instead, you'll get C-like performance.

Perl is not bytecode based. It compiles to a data structure, essentially an AST. I've never seen anything close to C-like performance from Perl, not in anything computational.

I was being kind of loose in my technical description. Perl is similar to true bytecode-based languages in that it gets compiled into a lower-level representation that is not native machine code, and then that representation gets executed by a program that behaves like a cpu whose native machine code is the symbols used in the representation.

But you're right, Perl's implementation of this strategy isn't like classic Pascal or modern Java's implementations. But that doesn't alter my point; writing idiomatic Perl gives you a smaller AST with symbols that execute optimized code.
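
To make "a smaller AST with symbols that execute optimized code" concrete, here is a toy tree-walking interpreter in Python. It is only a sketch of the general strategy, not Perl's actual op tree; every name in it (Op, run_add, sum_list and so on) is invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Op:
    """One node of the toy op tree: a name, the routine that runs it, and its children."""
    name: str
    run: Callable[["Op", dict], object]
    kids: List["Op"]
    value: object = None


def run_var(op, env):
    # Look a variable up in the environment.
    return env[op.value]


def run_add(op, env):
    # Generic binary add: evaluates two child nodes, then adds the results.
    left, right = op.kids
    return left.run(left, env) + right.run(right, env)


def run_sum_list(op, env):
    # One "fat" op: the whole loop lives inside a single native routine.
    child = op.kids[0]
    return sum(child.run(child, env))


def var(name):
    return Op("var", run_var, [], name)


def add(a, b):
    return Op("add", run_add, [a, b])


def sum_list(a):
    return Op("sum_list", run_sum_list, [a])


def execute(op, env):
    return op.run(op, env)


def count_ops(op):
    # Number of nodes the interpreter has to visit.
    return 1 + sum(count_ops(k) for k in op.kids)


env = {"a": 1, "b": 2, "c": 3, "d": 4, "items": [1, 2, 3, 4]}
spelled_out = add(add(add(var("a"), var("b")), var("c")), var("d"))
idiomatic = sum_list(var("items"))
```

Both trees evaluate to the same total, but the spelled-out version makes the interpreter walk seven nodes while the idiomatic one walks two and leaves the loop to a single routine.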

You're also right about Perl not having C-like performance for anything computational, because there you're dealing with Perl's data types, which are powerful and flexible but not efficient, as the article describes. My point was more about looping, set operations, string manipulation, conditional statements, and data structures. Perl has both efficient and inefficient ways to handle all of those, and it pays to learn to use the efficient expressions.

As my boss, who has built our nearly 20-year old company in Perl, says: "If you wanted fast, why did you choose Perl?" I think that the comment you were responding to implicitly meant "Perl performance vs other dynamic languages," not vs something like C. I didn't downvote you, but I assume that's probably why you've been downvoted.

I see no evidence that Perl is particularly faster than the other dynamic languages, either. Ruby was often slower for a good long time, but seems to have caught up to rough parity. CPython generally has a slight (but in practice irrelevant) advantage over it.

You know, the picture begins to come into focus for me. I suspect that, due to the prevalence of the saying "Languages don't have performance, only implementations do" and people constantly slagging on benchmarks without thought, simply reflexively, and probably helped by a healthy dose of both fanboyism and instinctive revulsion to fanboyism, programmers are apparently nearly completely incapable of having a reasonable discussion about which languages are faster than others, at what tasks, under what circumstances.

Well, that's a pity, because there is such a discussion to be had, and a lot of people are getting hurt because of their inability to get good information on this topic, and consequently making perfectly-avoidable bad decisions. Absolute statements are hard to come by, sure, but there are engineeringly-useful guidelines that you can discover if you examine the problem space.

There is a ton of dogma there, one of the reasons why I studiously avoid any 'which language is better' arguments if I can. The only thing you tend to learn is how the camps are laid out, without any objective data about applicability or strengths and weaknesses changing hands.

I try to imagine a discussion between carpenters about chisels, saws or hammers and each having a favourite brand that they'd defend to the death in order to avoid an objective discussion about what the good and bad properties of each tool are.

I don't know about carpenters, but my cousin who is a mechanic rails on and on about how this particular tool is better than the other, and only idiots would risk their lives lifting a car with that automotive lift.

Where there is choice people will complain about the choices of others.

This is exactly why I wonder why some companies write things such as 'must be passionated about Java' in their job postings.

After all, programming languages are just tools. What really matters is the applications you create with them, and that people will find your application useful and use it with pleasure.

I tend to be worried when I see somebody describe themselves as "passionate" about a programming language. Good programmers, I have found, are passionate about programming, and all too often those who are passionate about a given tool are really much closer to zealots.

If someone has no preferences about tools, I doubt he has any experience at all.

There are preferences and there are religions.

How would you draw the line in the case of, say, text editors?

As soon as you start to try to win people over to 'your camp' it's a religion.

Up to that point it is just your preference.

What do you consider a "serious language"? In its class of languages Perl is one of the fastest. Anyway, this is about speeding up Perl and not "Why use Perl instead of <compiled language of choice>".

"What do you consider a "serious language"?"

Ironically, you're being prickly in exactly the wrong direction. Perl is totally a serious language. I do most of my work in it, in fact. I qualify it with "serious language" because when I say it's one of the slowest serious languages, it's no fair to counter that by pointing out somebody's hobby implementation of Lispcheme that's a hundred times slower.

"In its class of languages Perl is one of the fastest."

Well... I suppose... since essentially all original C-interpreter implementations of the 1990s-style dynamic scripting languages have performance within about 20% of each other, that's not false... but the marketing-style implication that Perl is the fastest certainly is false. (Also LuaJIT is way faster, but does "compromise" relative to some Perl features to get there.) Drop the "C-interpreter" clause and Perl's in real trouble... Javascript isn't the C-speed language it sometimes gets presented as but JIT'ed JS really will handily beat Perl, and PyPy is... weirdly complicated from a performance perspective.

The point is still that talking about Perl as a performant language on almost any measure is... very weird. It's not. It's possible to name several serious languages which are as slow, but not one that is clearly slower.

It feels rare to find someone who can be honest about the shortcomings of their day-job programming language. Kudos to you!

Seriously. I've run into static from fanboys at pretty much every job because of it. People getting prickly because of criticisms about Java, because...well, they write Java. (I rarely say they know Java, but, different problem.) Seeing some frank discussion, as is going on in this thread, is really nice.

Perl's (C-based) regular expressions make it faster than many compiled languages for lots of real world tasks.

Indeed, a "slow" scripting language gluing together fast C functions is a very productive strategy. Tcl/Tk is another example. Or NumPy.
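
A small sketch of that glue pattern, again in Python: a few lines of script drive the regex engine, which in CPython is written in C. The log format and the function name are invented for the example.

```python
import re

# Hypothetical log format, made up for this example: "<date> <LEVEL> <message>".
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\w+) (.*)$")


def count_levels(lines):
    """Count log lines per level; the matching runs inside the C regex engine."""
    counts = {}
    for line in lines:
        m = LOG_LINE.match(line)
        if m:  # lines that do not fit the format are skipped
            level = m.group(2)
            counts[level] = counts.get(level, 0) + 1
    return counts
```

The script-level loop only does bookkeeping; the character-by-character scanning, which is where a hand-written version would spend most of its time, stays in compiled code.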

PHP as well which in many areas is just glue over C in design.

Context is everything right? perl is consistently fast enough for the tasks it is well suited for. And within its space it is quite performant.

Last year, my entry for level0 of stripe3 was a tiny perl script with no tricks (http://pastebin.com/XLZmmPX7).

Using perl, you cannot win speed benchmarks, but a C program significantly faster than the "normal" perl program needs much, much more development time.

I have already encountered a C project that was rewritten in perl because it was a maintenance nightmare. The client was very surprised by the performance increase.

I do not know what your "commonly-occurring" tasks are, but in my experience, perl performance was never the bottleneck.

You shouldn't be comparing Perl and the rest with C, but Perl with other dynamic languages that have modern implementations like Lua, Lisp, or JavaScript.

While general "computational" tasks are bound to be slower in Perl, it holds its own in text processing. I've had better performance from Perl than awk, grep, and even sed in some of my own tests (not to mention that the programs are orders of magnitude easier to write).

It even seems from those benchmarks you posted Perl did quite well in the reverse complement and regex DNA tests, as I would have expected.

> Perl... performance?

You have to remember this idea comes from the late 90s when Java was slow as a dog, and Perl solidly cleaned the floor with other scripting languages like Python, Tcl, (was Ruby even known then?).

Perl even outpaced C++ for I/O related stuff due to the way Perl buffered input (or maybe the way C++ streams were so badly implemented). Of course a clean slate implementation in C++ that avoided...well C++...and used mostly C style code would have cleaned the floor with Perl...but that was the state of C++ in the 90s.

Yeah...Perl had its heyday.

Nice to see that they made good usage of my pictures that I have to draw manually with postscript.

I use them also daily to come up with similar optimizations, but on a grander scheme, to support proper type optimizations (which might reach the php/lua/javascript performance range then).

But to be fair: Bodyless NV's and cached class pointers for methods are the biggest win in 5.22. OP_SIGNATURE didn't make it.

Very neat! Somehow booking.com is one of those bastions of perl and seems to attract some top talent that way. It would be cool to read / hear more about whether it's more or less of a pain to hire people, to find libraries to interface with newer databases and tools, etc etc.


The results for MRI and CPython are similar. I hate to sound harsh, but what's the point of incrementally optimizing perl, MRI, or CPython when the result is still going to be an order of magnitude slower than SBCL, LuaJit, V8, or CogVM? In fact, I suspect enough person hours have already gone into optimizing these C runtimes that had they known better when they started, they could have already had something like SBCL by now.

Performance matters. By simply changing your language to one with a modern implementation, you can save considerable money on hardware. The argument that Perl/Ruby/Python are so much more expressive/powerful/whatever and therefore are worth the added cost is much less compelling when there are other, comparably dynamic languages like JavaScript with implementations that are much, much faster.

>but what's the point of incrementally optimizing perl, MRI, or CPython when the result is still going to be an order of magnitude slower than SBCL, LuaJit, V8, or CogVM?

because you can't write Perl or Python for SBCL, LuaJit, V8 or CogVM.

My point is that the effort incrementally improving these C runtimes would be better spent producing modern implementations for them with just-in-time or ahead-of-time native code compilation, like other dynamic languages already enjoy. In light of V8 or SBCL, tweaking Perl's C runtime just seems like a premature optimization.

I don't know about Perl, but in the case of Python and Ruby, I think there are C extensions that would have to be replaced with a JIT. So PyPy has a 5-6x speed improvement, but can't be deployed in many places.

This hits us hard. I would love to deploy almost everything we do on pypy, but legacy code dictates the use of C modules that don't have pure python implementations that come close to the necessary functionality.

Compatibility with the massive ecosystem of existing code is something everybody who says "just rewrite the runtime with X" seems to forget.

Yeah, I can see that. Perl6 has gone through some rather painful VM stories trying to get to production. I'm an admitted Perl fan, so yeah, I'd love it if Perl had an ultra-fast native-speed run-time. But oddly, speed has rarely been something that's been called out as a major problem in the same way that it has been with CL, Lua, Javascript, Python etc.

It may also be that Perl 5 has always more or less had one implementation and has all kinds of messy parsing rules, so nobody bothered to complain since we all knew it was a disaster trying to sort that mess out.

>> By simply changing your language to one with a modern implementation, you can save considerable money on hardware.

Well, maybe they can save even more money on developers by not switching languages. After all, if raw performance was all that matters people would probably still be programming in assembly.

It's not necessary to write macros in C if you want to avoid runtime function calls. Functions can be inlined. There exist keywords to request inlining, and most compilers have extensions to force it. But in practice, optimizers will inline small functions like this if they have access to the definition.

This is off topic, but I really wish CPython 3.x could make more speed improvements like these. We cannot bet everything on PyPy.

Ruby 3.0 has a grand plan for JIT, removal of GIL and everything. Perl 6 is catching up, PHP 7 doubled its speed and will release this year.

> Ruby 3.0 has a grand plan for JIT, removal of GIL and everything.

Source? AFAICT, the Ruby community has in the last couple years just started talking seriously about ideas about what Ruby 3 might look like, but there is no "grand plan".


It's from RubyKaigi 2014

video https://www.youtube.com/watch?v=zt56zjNf84Q

You are right, it's about what Ruby 3 might look like

Great writeup! I've never done perl before but it's still a great read!

Nice write up

Somehow this headline reminded me of the time the GIMP developers accidentally integrated GEGL in a few weeks.


"What was planned as a one week visit turned into 3 weeks of GEGL porting madness. At the time this article is written, about 90% of the GIMP application’s core are ported to GEGL..."

That was in 2012 and it's still not done. Almost, but not yet.

That GIMP refactoring looks like the CADT model[1] while these (accepted) Perl patches look more like the work of talented professionals.

[1] http://www.jwz.org/doc/cadt.html

I hadn't seen that before. Good point. The purpose of a rewrite is usually claimed to be that progress can't be made with the code as-is. So one would expect that the rewrite would quickly show some progress beyond the old version.

Not necessarily. If you are near the top of Mont Blanc, there's no way to go up far in small steps. If you are confident Mount Everest exists, you can decide to go walk there, but you certainly won't see progress for quite some time.

There are equivalents in software, for example the removal of the GIL in Python (never done successfully in the sense that the majority of users started using it) or, equivalently, the effective GIL in OS kernels (done successfully a couple of times, AFAIK, even though it theoretically requires inspection of all driver source code).

A JIT-ted Python similarly may eventually be the better choice, but the road getting there may be long and harsh.

On the other hand, these patches are going to be in perl5 version 22, which is just getting its RC cycle started and should be out within a few weeks.

I guess the similarity to me had nothing to do with timing, but that when someone actually sat down to look at the problem the solution turned out to be relatively easy.
