The conclusion I take away from this is that good performance isn't necessarily more useful than predictable performance. For many language implementations, the performance characteristics are quite fragile, and you'd better be benchmarking after every change or you won't notice a regression.
Even setting up a benchmark is tricky because the same binary can have dramatically different performance depending on the environment. At the machine level, caching effects can vary widely from one processor to another. But compilers and runtimes can easily make things even more unpredictable, requiring things like warmup time to even hope to get an idea of what's going on.
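To make that concrete, here's a minimal sketch of a warmup-aware timing loop (Scala, since that's the language in the article under discussion; work() is a made-up stand-in for whatever you actually measure):

    object Bench {
      // Hypothetical workload standing in for the real benchmark body.
      def work(): Long = (1L to 1000000L).sum

      // Average milliseconds per call over `reps` calls.
      def timeIt(reps: Int): Double = {
        val start = System.nanoTime()
        var sink = 0L
        var i = 0
        while (i < reps) { sink += work(); i += 1 }
        val ms = (System.nanoTime() - start) / 1e6
        if (sink == 42) println(sink) // use the result so the JIT can't dead-code the loop
        ms / reps
      }

      def main(args: Array[String]): Unit = {
        timeIt(200)                          // warmup: let the JIT compile the hot path
        println(f"${timeIt(200)}%.4f ms/op") // only the post-warmup number means anything
      }
    }

Even this is naive (no forking, no GC isolation, no statistics), which is rather the point.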
I think Go starts out a bit closer to achieving predictable performance; any ahead-of-time compiler has an advantage here. Other than the use of garbage collection, the whole language seems to be designed with predictable performance in mind over ease of use, and you can avoid GC where it matters. Also, they recently replaced segmented stacks with contiguous stacks that are copied when they grow, which both improves performance and makes it more predictable (while a goroutine is running).
I remember reading a thread once about Kenton trying to optimize some protobuf deserialization routines. After much profiling, the outcome was basically "I can't believe the compiler is doing this. It means it is basically impossible to predict the performance characteristics of source code." At this point, a number of old compiler guys jumped in (we have a number of folks on the C++ standards committee at Google, and most of the team who wrote HotSpot), and said "Yeah. The best way to make sure your program runs fast is to get it into a compiler team's benchmark."
Chrome/V8 has this problem as well - if you talk to really skilled web developers, they have a lot of performance "tricks" in their heads, and the pitfall is that this knowledge decays much more rapidly than people think, so what was common knowledge in 2008 (or even 2013) is now no longer true. One major problem we faced with Google Search, and specifically Instant, was that it was optimized for the performance characteristics of the browsers of 2008; pretty much none of those rules apply to modern Chrome and Safari on mobile networks, and so performance is ridiculously bad on mobile.
Go achieves predictable performance only because it's relatively new. It's true that the team has tried very hard to keep things simple and predictable, but the problem is that the hardware that Go runs on keeps changing as well. As a language it was designed to take advantage of multicore chips that are just coming into use now. What if in 5 years we're all using quantum computers, or memristors, or flash memory, or the memory hierarchy no longer applies? What if everything is peer-to-peer over mobile devices?
> It's true that the team has tried very hard to keep things simple and predictable, but the problem is that the hardware that Go runs on keeps changing as well. As a language it was designed to take advantage of multicore chips that are just coming into use now.
This has already happened; what many people don't realize is that we're arguably past the multicore era. Desktop-class CPUs aren't adding more cores as quickly as we had predicted, because people aren't using them (although note that this is different in mobile). What they are adding is wider and wider SIMD units, and more and more SIMD instructions. To get maximum performance on the CPUs of today, as well as those of the future, using SIMD effectively is every bit as important as using multiple cores effectively, if not more so.
In my opinion, programming languages have been fairly slow at responding to this.
Have CPU SIMD units been very successful? In many cases where you can use SIMD, you can also just go and use the GPU (the ultimate SIMD machine) instead and get a much better speedup. Larrabee (or whatever it's called now) hasn't seen very much use either.
Why aren't the video codecs GPU accelerated by default these days? Is there something about decoding video that prevents it from mapping well to the GPU?
Intel is only on their second generation of OpenCL-capable GPUs, so there are still tons of machines out there that have no GPGPU capability and don't have even a CPU-based OpenCL implementation installed (except on OS X). APIs giving access to fixed-function transcoding engines on GPUs are also not as standard and universal as they should be, and those fixed function engines also have poorer output quality than the good software implementations.
Video decoding is very branchy and not easily decomposable; e.g. each frame depends on decoding the previous frame. Codecs are, however, relatively easy to implement in custom hardware (part of the design goals of the standards committees), so in most cases GPU-accelerated decoding does not actually use the standard shaders/pipeline but a separate section of silicon.
Well, Erlang is probably the only language that kept multi-core systems in mind, and the implementation is pretty nice. It is kind of funny how we crash the entire application process when a thread crashes in it. Also worth mentioning are the GPGPUs with 1000+ cores on them. We should support those cards and systems better in our mainstream languages.
> good performance isn't necessarily more useful than predictable performance
That may be true, but on modern machines it's pretty much a pipe dream, as even the hardware nowadays (let alone modern OSs) is unpredictable. So by the time you get to the compiler it's a little late to get predictability. I think we're better off foregoing predictability altogether -- except when writing real time applications (and I'm using realtime in its original sense) -- and getting the best performance we can. JITs currently seem to be the best hope for top performance, especially per unit of manual programmer effort expended on optimization.
(note: this comes mostly from a background in scientific/numerical programming in C)
You are certainly right if one takes "predictable" to mean something close to "deterministic"; on the other hand I think there are numerous almost-deterministic characteristics that can be evaluated without resorting to plain-ol' empiricism (particularly involving likelihood of hits in L1/L2/L3 and likelihood of being able to speculatively execute particular branches of code).
So I would agree that my a priori predictions may be a factor of 2-3x off; OTOH I often succeed in predicting that a particular tight loop will never have to leave L1, or that a low-entropy condition in the loop will be made essentially irrelevant by speculative execution.
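A sketch of the kind of prediction I mean (sizes assume a typical ~32 KB L1 data cache):

    // Same number of additions either way; only the working-set size differs.
    def sum(a: Array[Long], passes: Int): Long = {
      var total = 0L
      var p = 0
      while (p < passes) {
        var i = 0
        while (i < a.length) { total += a(i); i += 1 }
        p += 1
      }
      total
    }

    val small = new Array[Long](2 * 1024)        // 16 KB: stays resident in L1
    val big   = new Array[Long](8 * 1024 * 1024) // 64 MB: main-memory bound

    // sum(small, 4096) and sum(big, 1) perform the same number of adds,
    // but the second is typically several times slower per element.

I can't tell you the exact cycle counts a priori, but I can confidently predict which loop stays cheap.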
With that proviso, I share your belief that JITs seem to be the best hope for top performance (and I am a heavy user of LuaJIT).
BTW, for anyone interested in an overview of the non-determinism underlying modern hardware architectures, I recommend watching this great talk[1] -- A Crash Course in Modern Hardware -- by Cliff Click, one of the world's top JIT experts.
If you use CUDA + a GPU, you know for sure, since you are basically responsible for scheduling all memory movements yourself. On the other hand, even CUDA is doing some optimizations behind the scenes that can drastically affect the performance of your code in ways that are not so obvious.
A lot of the HPC work has moved over to GPUs; it is amazing what one can do with almost complete control over the memory hierarchy.
Memory and caching is a big part of nondeterminism, but it's not just about memory. GPUs employ a far more primitive execution strategy than modern server CPUs with their branch prediction, ILP etc. Also, GPUs, while terrific for parallel workloads, are terrible for concurrent workloads, which require very intricate branching.
Well, you can branch as much as you want on the GPU, as long as your branches are coherent among the threads sharing the same control unit :) I wouldn't say this is "primitive", just a very different way of thinking about computation that has been very very very effective for a lot of use cases.
GPUs are obviously not the solution for concurrent or irregular workloads (yet), but many are surprised how much mileage one can get out of Python + CUDA for scientific workloads. The only point I was making is that there is a world where the hardware is much more deterministic (even if most of us can't go there).
There is no good takeaway on this benchmark, honestly. If you are doing any sort of mathematically focused programming and you are not delving into the math yourself, expect odd results from compiler optimizations. Heaven help you if you start using libraries that are focused on this sort of thing.
I'm left with a feeling something else must be going on here in addition to the 'fast mod' optimization. It looks like the optimization would only be used on the first 2 or 4 calls to isEvenlyDistributed() which doesn't strike me as enough to cut runtime in half, nor does it seem to make sense that you'd see roughly the same performance difference for all values of lim (I'd expect the performance to get closer as lim grows). Am I missing something?
Maybe you underestimate how many numbers are weeded out just by checking divisibility by 3, 4 and 5? The first check regarding 2 will always return true, as `val` is incremented by 2. The check for divisibility by 3 will cause an abort for two out of three calls. The check for divisibility by 4 will cause an abort for half of the remaining calls (as `val` is incremented). And I guess the check for 5 will weed out an additional 4 out of 5. Thus just after checking 3, 4 and 5, only 1/3 * 1/2 * 1/5 of the `val`s will continue down the recursion. So the majority of calls terminate in the highly optimized static-division code.
Now you can still argue that the remaining calls that go deep into the recursion could account for a lot of time. I don't believe this is the case though, as each level of the recursion should terminate a constant fraction of the remaining calls, so the drop-off should be exponential.
Another interesting observation about this phenomenon is that running the recursion backwards (beginning with the large divisors) might greatly decrease the runtime, as a more significant fraction of the `val`s would be falsified by larger divisors in the first calls of the recursion.
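For reference, this is roughly the shape of the code being discussed (my reconstruction, not the article's exact source):

    import scala.annotation.tailrec

    // Does `v` divide evenly by every integer in 2..lim?
    @tailrec
    def isEvenlyDivisible(v: Long, by: Long, lim: Long): Boolean =
      if (by > lim) true
      else if (v % by != 0) false // most candidates abort within the first few checks
      else isEvenlyDivisible(v, by + 1, lim)

    // Smallest number evenly divisible by everything up to lim; `v` advances
    // by 2, which is why the divisibility-by-2 check always passes.
    def smallestMultiple(lim: Long): Long = {
      var v = 2L
      while (!isEvenlyDivisible(v, 2, lim)) v += 2
      v
    }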
I see what you're saying, and that was the type of thing I was getting at. If I understood the article correctly, the claim was that it was faster because the first 2-4 checks could be optimized by avoiding the idiv. But even if you optimize those checks to zero time, that is a small number of checks in comparison to the number of checks needed as lim increases (even at lim = 20, only 10%-20% of the checks would be optimized). It doesn't follow that the run time would be cut in half based on the article's conclusions alone.
With 64-bit operands, IMUL+SAR is between 10 and 25 times faster than IDIV on current x86 hardware. (div speed is very dependent on the values being divided; e.g. on the Sandy Bridge arch, latency is between 40 and 103 cycles.)
With these deltas, the first few checks are essentially free. A large share of the candidate numbers will be eliminated early and never reach the expensive idiv tests.
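A sketch of the distinction the JIT exploits (my reading of the article, assuming HotSpot sees the small divisors as constants after inlining/unrolling):

    // Divisor is a compile-time constant: the JIT can strength-reduce the
    // modulo to a multiply and shift (IMUL+SAR), with no idiv at all.
    def divisibleBy3(v: Long): Boolean = v % 3 == 0

    // Divisor only known at runtime: the JIT has to emit an actual idiv.
    def divisibleBy(v: Long, d: Long): Boolean = v % d == 0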
While it felt a bit chaotically written (maybe due to my lack of in-depth JVM knowledge), it was a very interesting article.
Is there a way to detect when these cases happen? If there was, the compiler could choose to ignore the @tailrec directive if it wouldn't give any meaningful speed increase.
The purpose of `@tailrec` is not necessarily for performance, but more for correctness, otherwise you can end up with recursive functions that blow up the stack pretty fast on large enough inputs. Plus this benchmark only does at most 20 iterations per loop - which means nothing if you want to measure the cost of method calls.
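A minimal illustration of the correctness point:

    import scala.annotation.tailrec

    // Not a tail call: the addition happens after the recursive call returns,
    // so every call needs a stack frame and large inputs overflow the stack.
    def sumNaive(n: Long): Long =
      if (n == 0) 0 else n + sumNaive(n - 1)

    // Tail call: compiled into a loop. @tailrec makes the compiler reject
    // the method if a change ever silently breaks that property.
    @tailrec
    def sumAcc(n: Long, acc: Long = 0): Long =
      if (n == 0) acc else sumAcc(n - 1, acc + n)

    // sumNaive(1000000L) -> StackOverflowError
    // sumAcc(1000000L)   -> 500000500000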
hmm that's a very good point, I didn't consider that. We'd need additional benchmarking to know what the actual impact is and at what point the @tailrec version becomes more desirable.
If that's all you got out of this article then you missed the point. It's clearly targeted as a criticism of "kneel before my benchmark" posts; highlighting the fiddly technical details that make runtime scores nearly meaningless on their own. It's analogous to medical journals having to analyze all the confounders of their study in order to attempt to prove that what their graphs show is causation instead of correlation.
I'm going to comment on the political aspect and leave the technical JVM-centric bit alone.
In this example, did we learn what is faster, Java or Scala? Nope. But we learned a lot digging for explanations why the results are different, which will hopefully result in having better cohesion between languages and the underlying platform.
Performance is often FUD when it comes to languages. Clojure, Scala, Haskell, and even Common Lisp are plenty fast enough for most purposes. Hell, Python and Ruby are fast enough for most applications. Besides, runtime performance has more to do with how the code is written than anything else. You can write very fast C++, if you spend a lot of time tweaking it and hire experienced and extremely expensive ($300k/year and up) programmers, but typical C++ isn't any faster than well-written code in other languages. (I've seen C++ projects fail for performance reasons related to maintainability issues that arguably wouldn't exist if Haskell were used.) However, invoking performance is a great (if unreliable, given who it brings on to the field) way to scare decision-making business people (toddlers with guns) into taking your side on an issue they know nothing about.
This is a place where it'd be better if programmers were a little more politically savvy. (Bringing The Business into a technical dispute is not politically savvy. It ruins everything, in the long run. Never invite executives, also known as toddlers with guns, to anything. As many a Chinese noble learned about opening The Wall and letting the Mongols in to fight one's battles, it's impossible to get them out after it's done.) Let's say you have a team of 5 programmers who want to write something in Python, which is (for most purposes) fast enough. One of them stands up and says, "oh no, we can't do this in Python because if we end up running this on 100,000 boxes it will be too expensive, so we can only use C++" (premature optimization). If he were more politically savvy, he'd build the thing in Python and then, if the software were to run on 100,000 boxes, rewrite performance-critical pieces in C++, and justify a bonus for himself by pointing to the 20,000 CPUs that were just deprovisioned. And a year later, a month before bonuses are disbursed, he can rewrite another performance-critical component. This is good for him (he actually gets recognized, instead of being that annoying guy who bludgeoned a team into writing C++ and taking 4x as long to deliver an MVP) and for the business (only performance-critical components, with price tags large enough for him to care, get rewritten).
Oh, and if you think that part is constructive and boring, and you came here for a language holy-war, I do think both Java and Scala suck as programming languages, because they both allow me to write stupid programs with performance problems.
I assume this is sarcasm, but I actually worry about the fact that Those of Us Who Care About Languages are too divided over minutiae like Haskell's syntactic whitespace and Clojure's parentheses, and that may be why The Business comes in, mushroom stamps us and says, "I'm sick of your shit, programmers. Everything has to be in Java."
(Actually, if you've watched Orange is the New Black, you know that The Business has taken to calling software engineers "inmate", but that's another discussion.)
I like Haskell and I like Clojure for very different reasons, and they are very different languages, but the day-to-day real-world differences between them are small compared to the very real risk of The Business overhearing our flamewar and saying, "fuck you guys, Java all the way, now lick my SCRUM or it's minus-5 story points for you."
One of the things that worries me about Clojure is that, while I'd argue that it's the best (for a definition of "best" that includes short-term business viability; this enables me to exclude obscure niche languages that may be better on paper, but that are just too numerous for me to know anything about) dynamically-typed language-- a great language in its own right, but sitting on top of the JVM and having access to those libraries-- there's been a mind-share split between Clojure and Scala. And while Scala/Java and Clojure/Java interop aren't bad, Clojure/Scala interop is a mess. On top of this, while Odersky's brilliant, I think Scala has taken in a little too much of the Java culture for its own good. Scala's a fine language to write in, but large Scala codebases are generally things that I'd rather not risk my sanity and career by being anywhere near. If Scala wanted to be "Haskell with Java libraries" it would have been a different and harder fight... but then again, it might not have taken off at all without the "slightly better Java" crowd, so maybe the way things happened was the only possibility.
The mind-share split between Clojure and Scala scares me, because it generates a very real risk that the intellectual energy that I'd like to see benefitting both languages, or at least consistently benefitting one of them, might fall back down the tree onto Java. The real risk, to me, is "Clojure vs. Scala: Divided We Fall". But I don't know exactly what to do about it.
> The mind-share split between Clojure and Scala scares me.
I think Scala is going to win the corporate mind-share. I just accepted an offer at a large corporation that is writing all of its new back end services in Scala.
The lead architect is familiar with Scala and Clojure, and is trying to introduce the latter into the codebase (he successfully introduced Scala). There is a use case for an "Immutable Database" for one of the services, and it would make sense to use Datomic/Clojure for it.
The hesitation to use Clojure seems to come from some developers, and not the management team. I think it stems from the prospects of having to maintain a large codebase that is without first class static analysis.
"I think Scala is going to win the corporate mind-share. - very unlikely. Java 8 will (continue to) win corporate mindshare. Scala added the missing elements to java, but Java 8 adds them into the base line so the need for scala goes away. Corporations have too much invested in java and there's not enough differentiation between scala and java 8 to make the cost of that transition likely. I predict that we'll see a steady decline in Scala as Java 8 gains traction in the enterprise development world.
I was referring to the Clojure vs. Scala competition as a Java replacement, but if you think Java 8 is a Clojure/Scala killer, you're mistaken.
First things first, Java 8's lambda implementation is shoddy at best compared to first-class function support in Scala/Clojure. The entire idea of having to explicitly convert a collection to a stream to access map/filter functions is suboptimal. You don't just add lambdas to a language and automatically expect them to lift its collection libraries to the level of Scala's and Clojure's. The collection lambda operations are the real benefit of using a language with first-class function support.
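For example (the Java 8 side is from memory, so treat it as illustrative):

    val xs = List(1, 2, 3, 4, 5)

    // Scala: map/filter live directly on the collection and give a List back.
    val doubledEvens = xs.filter(_ % 2 == 0).map(_ * 2) // List(4, 8)

    // Java 8 needs an explicit round-trip through Stream:
    //   List<Integer> doubledEvens = xs.stream()
    //       .filter(x -> x % 2 == 0)
    //       .map(x -> x * 2)
    //       .collect(Collectors.toList());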
The other inconvenient truth is that you will still be writing Java code, in all of its boilerplate glory. Still no Scala-like type inference, no Clojure-esque homoiconicity, no ability to reduce everything to a value like in Clojure/Scala, just plain old "it'll work" Java.
Java 8 did not add all the missing elements. It did not add even most of them, nor the most important. Sure, it added some, but I doubt it is 5% of what Scala has to offer, and I doubt they are the things that people primarily choose Scala for. Also, even if we compare the features that Java 8 added, e.g. lambdas - they are nowhere near as nice to use as in Scala, and they don't play as nicely with the rest of the language as they do in Scala. IMHO Java 8 will only help Scala in becoming more popular, by improving Scala's performance and Java interop.
Right, and there are some really big things in Scala which have already gotten quite a lot of attention from enterprises: Play Framework, Apache Spark, Akka. Funny, I know some people using those things from... Java, even though it is very cumbersome to do so.
> If he were more politically savvy, he'd build the thing in Python and then, if the software were to run on 100,000 boxes, rewrite performance-critical pieces in C++, and justify a bonus for himself by pointing to the 20,000 CPUs that were just deprovisioned. And a year later, a month before bonuses are disbursed, he can rewrite another performance-critical component. This is good for him (he actually gets recognized, instead of being that annoying guy who bludgeoned a team into writing C++ and taking 4x as long to deliver an MVP) and for the business (only performance-critical components, with price tags large enough for him to care, get rewritten).
Call it political savvy if you want, but anybody that's actually been trained as an engineer (I mean formal engineering, not just computer science) knows that "good enough at low cost" beats "ideal at any cost" ninety-nine times out of a hundred.
Engineers shouldn't need political motivation for considering all aspects of a problem, instead of just the technical ones. It's our job.
> I actually worry about the fact that Those of Us Who Care About Languages are too divided over minutiae like Haskell's syntactic whitespace and Clojure's parentheses, and that may be why The Business comes in, mushroom stamps us and says, "I'm sick of your shit, programmers. Everything has to be in Java."
While Clojure is my favorite application programming language, and I certainly do care about programming languages, I end up picking Java again and again and not because of stupid Scala/Haskell/Clojure bickering. The fact that Go is gaining momentum among the SV early adopter crowd shows that Java-style languages have a great appeal, probably due to their simplicity and familiarity (I've been writing Java for many years, and dabbling in Go in the past year or so, and I need a magnifying glass to tell the difference between the two languages). Attributing (or faulting, as you do) the choice of Java/C#/Go to "The Business" is both unjustifiably condescending, and quite ignorant of software engineering in the industry.
"While Clojure is my favorite application programming language, and I certainly do care about programming languages, I end up picking Java again and again and not because of stupid Scala/Haskell/Clojure bickering."
Why do you pick Java again and again?
And I think a lot of the SV early adopter crowd happily chases after the latest "fad" particularly if it is pushed by a company they like and respect. That same crowd is very anti-Java despite the similarities.
There is interesting work being done in Go at the moment. There is also a lot more interesting work being done in Java but to the early adopter crowd Java is old and boring.
Not the OP, but I'm speaking as someone who's been a language early adopter over and over (I wrote one of the top Haskell tutorials on the net, wrote an Arc interpreter in Javascript, and also know Ocaml, Scheme, Common Lisp, Erlang, Dylan, Go, etc.) I'm now trying to figure out what language I'll use for my startup, and leaning towards Java.
Why? Because it's old and boring. Most of the terribly bad yet seductive ideas - JSPs, JSF, J2EE, RMI, Jini, ORMs, XML config files - got tried a decade ago. As a result, the new stuff that's coming out - Guice/Dagger, Guava, Java 8, Hadoop, Apache Spark, Quasar, Android - is the result of people trying to solve real problems, and tends to work a lot better. And it's still possible to find a library for basically everything. There's just a lot less bullshit in Java these days now that it's no longer the hot new thing. In hype cycle terms, it's reached the plateau of productivity.
One of the GitHub founders once said "Your tech should be boring. Make your product interesting." In my early-adopter experience, I've found that one major problem is that everybody focuses on how cool the technology is and all the neat abstractions they can do with the language, and that means that they aren't focused on how they're going to make users' lives better. (This is what killed Common Lisp, IMHO: it's just so much fun to extend the language and play with the technology that everybody in the community spent their time extending the language, which gave us awesome devtools and pretty shitty products.) The advantage of using boring tech is that you attract developers who are smart in the "What can we do with technology?" way rather than "What can we learn about technology?" way.
Couldn't you just use Scala as a better Java in that case? Or do you just generally like wasting lots of vertical space on braces & monstrously verbose and sparsely distributed code chock full of boilerplate?
Also, this isn't about 'technology': both languages run on the JVM and are capable of utilizing it equally. Rather, this is about information density of source code; i.e., the signal:noise ratio. And, if you hire experienced programmers then no learning will be necessary and they can do what they want to do 'with technology' much more quickly with more powerful tools in their hands.
> Couldn't you just use Scala as a better Java in that case? Or do you just generally like wasting lots of vertical space on braces & monstrously verbose and sparsely distributed code chock full of boilerplate?
You must not have used a modern IDE. IDEs such as IntelliJ can fold a lot of boilerplate [1]. Besides that, modern Java IDEs offer so much functionality for refactoring, code generation, etc., that it's often hard to be more productive in other languages (I have used C++, Python and Haskell for years, but I am more productive in Java after using it for work the last 1.5 years).
Scala simply doesn't have the same level of support in IDEs as Java does.
No, I've used plenty of modern IDEs. In fact I use IntelliJ for writing Scala and most of the features you list are implemented for it too. But, as others pointed out, even if your IDE can generate ungodly amounts of code for you, you will eventually have to read and maintain that monstrosity (i.e., it's not truly 'hidden').
You don't, really. Bugfixing by whole-program code inspection is incredibly inefficient and quickly becomes impractical with large codebases.
I've found it's far better to hunt down bugs using a combination of debuggers, stacktraces, divide-and-conquer, log statements, assertions, and unit tests. I've gotten bugfix rates as high as 5 bugs/day in a fairly complex library (an HTML5 parser in C) using these techniques.
I think you're missing the fact that you can use all of those same techniques in languages without all the noise. You can also use a plain old editor with such languages (which can be handy if debugging/writing code on a headless server).
And, some languages make writing most of that sort of bug nigh impossible to begin with.
You're basing your business case on Java having some vertical space wasted on braces instead of availability of programmers and ease of inter-op with existing libraries? No words.
Scala can consume any Java library with ease. The programmers I am interested in hiring can program in any language but choose to use the most powerful ones because they are vastly more productive in such than the hordes of 'freely available' mediocrity out there.
Scala's reduction of code size is way overrated, IMO. We're not talking Clojure level succinctness, and you'll be very lucky to get a 30-50% reduction (as compared to, say, Clojure's 5x-10x). In addition, that reduction is -- as you state yourself -- mostly of boilerplate code, which is "cheap" and doesn't tend to have lots of bugs. OTOH, it comes at a cost of very high language complexity, interop, and maintainability. So you pay hard currency for reduction of boilerplate (a mere annoyance).
I would certainly consider Kotlin, though. Kotlin reduces the boilerplate but doesn't add much complexity, and doesn't hurt interop in the least. So you pay little for the (modest) benefits. This seems like a much better deal to me.
My experience directly contradicts your claim. I have consistently seen code size reduced by 40-70% by switching from Java to Scala (and that's without doing anything fancy).
If you get only 30-50% code size reduction, then you don't write real Scala, but some kind of Java with Scala-like syntax. Stay with Kotlin then, Scala is no good for you.
Let me clarify: we recently implemented a Spark-Cassandra connector in Scala, based on the previous similar work done for Hadoop-Cassandra in Java, and the reduction in size was about 3x, up to 5x in concurrent code; however it is still hard to measure properly, because the new driver offers more functionality than the original. Of course YMMV, but writing in Scala is not just getting rid of meaningless boilerplate.
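For a trivial illustration of where part of the reduction comes from (a generic example, not code from our connector):

    // One line buys the constructor, getters, equals, hashCode, toString and copy:
    case class Point(x: Int, y: Int)

    val p = Point(1, 2)
    val q = p.copy(y = 3) // Point(1,3), with structural equality for free

    // The equivalent hand-written Java value class is typically 30-50 lines,
    // and the rest of the savings come from closures, pattern matching, etc.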
It is because I belong to that tiny group of developers who do care about programming languages that I even mention it. But I care a lot more about software in general, and if you look at the industry at large, absolutely no one (for all intents and purposes) cares about any language other than Java, C, C++, JavaScript and C# (plus some Matlab and maybe R, I guess). So I do have some strong opinions, but I put them in perspective. Until IBM, Oracle, Microsoft, SAP and a couple of large defense contractors -- all the true big players in software -- put their weight behind one or two "new" languages, all arguments over PL merits are kind of moot. Frankly, I don't blame the big guys for not switching languages just yet. None of the challenger languages have shown such incredible advantages -- in correctness, performance or time to market -- to justify such a costly switch (certainly no world-changing benefits as C and Java offered). In fact, some technical leads in prominent SV companies have told me that their main reason for picking a non-mainstream language wasn't technical: it was to appease their novelty seeking developers. I think there are two languages that might make a real difference -- Clojure and Haskell -- and they are, unfortunately, still far from conclusively proving their case. If Rust makes it to the embedded space, it could prove a real contender, too. In general, most new languages overpromise and underdeliver.
A) Appeasing novelty-seeking developers is not a negative. If a company using a particular language can attract developers who are more productive in a novel language than the developers they might find for a "mainline" language, that is certainly a win.
B) You left out at least 3 major languages that are hugely represented in large enterprises: Visual Basic (especially in the Access/Excel variety), SQL (in its standard and non-standard dialects) and COBOL. Those languages also offered huge advantages that any language trying to gain widespread appeal might want to study. That said, I certainly am not taking any job that is majority VB, SQL, or COBOL coding, and none of the best developers I know (selection bias could be an issue) will either.
> If a company using a particular language can attract developers that are more productive in a novel language than those other developers they might find in a "mainline" language. That is certainly a win.
I don't think it is. Those CTOs/lead developers told me that the "novel language" teams are just as productive as other teams, and, in fact, tend to be less disciplined (and I'm talking top tier, millions-of-customers, technologically advanced, SV companies here). The only reason they allow those new languages is to attract young developers who get their kicks out of a new PL. If anything, this shows that some developers are immature, or that tech companies aren't doing stuff that's challenging and interesting enough on its own that developers need occupy themselves with the novelty of the language rather than with the novelty of the problem.
IBM, Oracle, Microsoft and SAP is not the end of all IT. Don't underestimate the long-tail. Anyways, there are already quite a few big names using Scala. So this is not "absolutely no one". And you forgot about Google and Python in your list.
Thought about it. I've heard a lot about Scala's learning curve, but I don't think it'd be a huge problem in my case since I already know Haskell. I am a little worried about finding Scala programmers, but there's a decent-sized community around the language. And it can use Java libraries fine, and you can write native Scala code for Apache Spark.
The deciding factor against it, for me, is that Java/anything interop on the JVM is quite easy, but Scala/anything interop is hard. That makes it relatively easy to create a mixed-language Java/Jython system or Java/Clojure system, or even Java/Jython/Clojure, but very hard to do a Scala/Jython system. And even with its advanced features, I doubt that Scala beats Python for quick & dirty prototyping. I've got a bunch of past experience in multi-language scripting + compiled core systems, and I know the benefits of doing a system at scale like that. The value proposition of Scala (and to some extent Go and Haskell) is that you get one language that is both fast and concise, but in my experience you want to separate that out into scripting and core languages, because the styles of programming themselves are very different.
Yea, I second what froaway said. 'Scripting' in Scala is quite easy and works like you would hope it would. Integration w/ the shell is quite nice and it's as concise as Python, Ruby et al and yet also typesafe. Win-win.
Also, I wouldn't worry too much about finding Scala programmers-- in my opinion, what you want to find rather is good programmers. And, good programmers can pick up Scala (or any other language) in short order.
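To give a flavor of the scripting side, here's a throwaway script of the kind I mean (hypothetical example, using the plain scala script runner):

    #!/usr/bin/env scala
    // wordcount.scala: print the 10 most frequent words in a file.
    // Run as: ./wordcount.scala somefile.txt
    val words = io.Source.fromFile(args(0)).getLines().flatMap(_.split("\\s+"))
    words.toSeq.groupBy(identity).mapValues(_.size).toSeq
      .sortBy(-_._2).take(10)
      .foreach { case (w, n) => println(s"$n\t$w") }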
I use it from quick and dirty scripts and ad-hoc data-processing (the stuff you would usually use Perl/bash/...) over exploratory programming, database work (SQL) and architecture sketches (GUI tools) to full-blown (web) applications (JavaEE, Python, Ruby, ...) and browser scripting (usually JavaScript).
Well, first because of the JVM (excellent performance, great dynamic linking, great runtime monitoring/profiling/management), and among the JVM languages Java is simple, fast, well supported with a huge user-base, and very suitable for large-team development (other people's code is, for the most part, easy to understand and maintain). I consider Java's downsides (verbosity -- pretty much on par with Go) minor annoyances at most. I find advanced features offered by other languages (like more elaborate type systems) not worth their cost in complexity and maintainability (and in that respect Clojure is different: it doesn't add complexity, and it tackles really big problems rather than minor annoyances).
> The mind-share split between Clojure and Scala scares me ...
I'm not sure what else you expect; Clojure is not statically typed, and so, it's totally uninteresting to someone looking for a statically typed functional language.
I'm vaguely aware of work such as http://typedclojure.org/, but I already have a statically typed by-default non-lisp language.
> You can write very fast C++, if you spend a lot tweaking it and hire experience and extremely expensive ($300k/year and up) programmers, but typical C++ isn't any faster than well-written code in other languages.
I disagree and so does Andrei Alexandrescu:
"The going word at Facebook is that 'reasonably written C++ code just runs fast,' which underscores the enormous effort spent at optimizing PHP and Java code. Paradoxically, C++ code is more difficult to write than in other languages, but efficient code is a lot easier [to write in C++ than in other languages]." – Herb Sutter at //build/, quoting Andrei Alexandrescu
Here's the thing about performance. Sometimes you care and sometimes you don't care. When you don't care- you don't care. You can write it in Python and even though it runs 10000 times slower, you still don't care. When you do care a factor of 10000 for Python vs. C++ or a factor of 3-5 (or more) of JVM languages vs. C++ can mean running 5 million servers instead of 1 million servers, or 10 frames per second vs. 60 frames per second, and the success or failure of your business. This is why pretty much all the big players who care about performance use C++ (from games to web at scale). The maintainability of large C++ projects is pretty much field proven and C++ is also evolving and while not quite fixing some of the causes for grief (because it maintains backwards compatibility) it offers new ways of doing things that are safer, more maintainable, and just as fast.
The only thing you said that I can slightly identify with is that you need good people in order to build things in C++ (and no, they don't cost $300k/year. I wish.) This is not a con, this is a pro. You want that regardless of language and good people will be expensive. Yes, you can get cheap people to write bad code in any language.
EDIT: Another data point. I worked on a huge Python project where performance did turn out to be an issue and it was virtually impossible to find a "critical" part to apply C++ to. It was just slow throughout, it was built over a huge base of meta-programming and Python specific magic. There was simply no single piece you could point to that if written in C++ would make it go significantly faster. My point there is that while it's certainly possible in a well designed system to mix languages while applying fast languages to performance critical portions it's not always possible after the fact. Language choice is an engineering decision and there's no single answer but you have to be very careful with the attitude of just throwing something together in the mistaken hope that it can be fixed later.
EDIT2: I could say more to defend C++'s "honour" but it doesn't need me as a champion... The choice of programming languages though is important and we need a way to eliminate some of the FUD. Part of that is through sharing real world successes and failures. Naturally there is some cognitive dissonance happening, that is if I chose language X, therefore I'm smart, therefore language X is the best, therefore other people who chose language Y don't have a clue. Where this turns from religion to data is when we can say share data about the project N years later that is somehow comparable to other projects and people can try to gather some insight from that data. The nice thing about performance is that it has an objective component to it, that is if we look at a certain problem we can get some numbers that we can compare. It's quantifiable. Factors such as development time, maintainability etc. are less quantifiable. Developer salaries, while quantifiable, are also hard to compare but are definitely a factor in making language decisions.
I think you're exaggerating the performance differences. Reasonably written Python is not 10000 times slower than reasonably written C++, nor is reasonably written Java 3-5x slower. And while I agree that creating a super-optimized tight loop of a program in C++, C or (better) assembly might be much easier than in higher-level JITted languages, this does not scale to huge and complex codebases. At some point you'll be struggling with overall project complexity, and architecture / high-level design can affect performance more than those low-level bits. A higher-level language will be more amenable to refactoring and fixing high-level performance problems than a low-level language like C or a mid-level language like C++. So then, you might be much better off choosing a fast high-level language like Haskell or Scala or even C#/Java over C++.
Not every Python program is slower than C++ by a factor of 10000. Some are. Some Python can be reasonably fast as long as all the heavy lifting is done in native code, e.g. numpy or scipy. Let's say it would typically be more in the x250 range for algorithmic code that uses native Python types. You get to the 10k range when you create your own types and do meta-programming and try to do algorithms over that. YMMV.
The point I was making though was that if you don't care then you don't care. I wrote some Python for a friend who wanted to scrape and process financial information from various web sites. I'd be an idiot to do that in C++. None of us cared how long it took to run (as long as it wasn't weeks). Having access to various Python libraries made this task a breeze and maintainability wasn't much of a concern either.
Andrei wouldn't have made that statement about C++ vs. Java if the difference was in the noise. I think x3-5 is a reasonable rough number to put on it but while I can point to benchmarks I can't point to a comprehensive study that shows "reasonably" written across different domains. You're welcome to take those numbers with a grain of salt and do your own investigation. Another data point there is that C++ is the dominant language in Google Code Jam and TopCoder SRMs where writing fast and correct code quickly is a competitive advantage.
EDIT: I find a lot of people will underestimate the performance advantage of native languages vs. JIT or interpreted environments. The x10000 is something I've seen in a real world system. Another thing to consider is that in a native language you can drop to assembler to optimize performance critical sections. You have 100% absolute control over your hardware. Anyone know how this: https://code.google.com/p/h264j/ compares to the original implementation?
"The point I was making though was that if you don't care then you don't care."
I disagree with this. You always care, to some degree. If I write a one time, 100-LOC script to process a 1GB of data, I don't care if it takes 1 second or 1 hour. But I would care if it took 1 week or 1 month. That's why Java performance is good enough for 99% of programs I write, Python is also good for many, but if I completely didn't care for performance and wrote sloppy code in Python (or Java) I'd get into 10000x performance penalty region and this would be unacceptable in almost all cases.
"I think x3-5 is a reasonable rough number to put on it but while I can point to benchmarks I can't point to a comprehensive study that shows "reasonably" written across different domains" YMMV. The micro-benchmarks in Great Language shootout disagree with this. Most of them are within 2x range and the one outstanding is actually a benchmark of particular regular expression engine. Ok, we should not believe microbenchmarks, so what about real, optimized applications? Compare performance of Netty vs nginx. Or Tomcat vs Apache. This is a tie. Or Hypertable vs HBase (yeah, despite huge expectations and marketing, even in Hypertable own benchmarks, Hbase comes only... 50% - 2x slower). Or Jake vs Quake2 with Jake2 again not even 50% worse (actually better in some cases).
"C++ is the dominant language in Google Code Jam"
This only supports what I already wrote - when writing a very small piece of code you have full control over, like for a competition, it is much easier to achieve high-performance code in asm/C/C++ than in Java/C#/Scala etc. In a competition like that, even a 20% overhead is not acceptable and I'd also use C or C++. But you can't extrapolate that to large-scale programming, where the benefits of using a high-level language matter much more.
"Another thing to consider is that in a native language you can drop to assembler to optimize performance critical sections" I can do that in Java or C# as well.
Sure, there's some transition region between "don't care" and "care" but I find a lot of people tend to be strongly in one or the other (sometimes wrongly).
Keep in mind though that the JVM is a moving target. It keeps getting better (on one hand) and on some platforms it's worse (e.g. Android, though the upcoming new version looks promising). It's possible the gap is smaller now than what I remember seeing in the past.
At any rate, if x2 is a number that feels right for the stuff you're doing I can't argue with that. You need to choose the language that works for you. Maybe you choose Java because of the libraries. Maybe you have more experience writing in Java and you're a lot more productive. Maybe there is better tooling. Maybe it's just more in line with how you think.
A web server does a lot of file I/O and a lot of network I/O. The performance of a web server is more about how efficiently you can juggle those given highly concurrent loads and what mechanisms are used at the native layer to interface to those systems. It's not so much about the "raw" power of the language. In your game engine example a lot of the heavy lifting is done in OpenGL which is native code. I'm also not intimately familiar with the details there. At some point it's also about how much effort went into optimizing things and whether or not something else was traded off.
To contrast that, Google Code Jam tasks are typically algorithmic and they stress the "raw" power aspect of the language. That said it's certainly not representative of real-world product development.
If you are thinking about the Android runtime, Dalvik and now ART, those aren't JVM (I think they compile to a different bytecode). There probably is a port of JVM but I didn't get the impression you were talking about that.
Dalvik does indeed use a different VM/bytecode. Your Java Android app is compiled to JVM bytecode, which is then translated into Dalvik bytecode. The point is/was that the program you write in Java may run faster or slower depending on your target platform, and that if your target platform is Android you are taking an additional penalty vs. e.g. using the NDK...
(EDIT: Yeah, you're right, the name JVM should only be used to refer to the specific type of VM that runs specific bytecode and not to other VMs. As Android shows Java does not have to run on the JVM. Thanks for the correction.)
"In 2012, academic benchmarks confirmed the factor of 3 between HotSpot and Dalvik on the same Android board, also noting that Dalvik code was not smaller than Hotspot" (from Wikipedia)
Do you think maybe that Herb and Andrei might have a (completely understandable and probably unavoidable given human psychology) bias? Besides, appeals to authority tend to be frowned upon.
Now, I don't disagree that C and C++ are probably generally the "fastest languages"[0] at the current time. For most programs that's completely irrelevant -- what's more relevant is that the most dangerous security issues of the age almost exclusively come from these languages[1]. This is why, e.g. Rust matters, but also why languages without undefined behavior in general are a big deal. Undefined behavior is an abomination which has caused untold damage (even more than the billion-dollar mistake of "null").
[0] Somewhat of an absurd term, but let's just say that these languages encourage a style which leads to efficient machine code.
[1] Lack of escaping (particularly SQL) is probably a close second.
EDIT: Btw, if I were really performance-bound, I think I'd actually use a higher-level language to create a domain-centric DSL to describe my solution and then use that to generate a lower-level program (in C/C++, assembler or whatever) which solved my problem. People have been doing that kind of thing for a few years now. See e.g. https://hackage.haskell.org/package/atom
It was not meant as an appeal to authority. It was meant as a reference to the experience of a large software company that writes code in both languages. That said, I'll agree they have a bias; everyone has one. Full disclosure: I like C++. I also like Python, which is my preferred language for specific scenarios.
I also agree to some extent that languages themselves aren't fast or slow (though to some extent they are). In theory equivalent code in two languages should result in machine code that is just as efficient and it is sometimes a fluke or some implementation choices that make it not so.
I can talk a little bit to the security question. One thing we need to consider is that we're dealing with systems, not strictly with bits written in some language. Various injection attacks are almost universally carried out on platforms that are not C/C++, and their root cause is at the interface between systems (e.g. your web back end's interface to your SQL database). Even in systems that run in various isolated sandboxes there is always the possibility of penetrating that sandbox or of other holes in the interfaces to the rest of the system. Honestly, the undefined behaviour bits in C/C++ aren't something that I've seen matter in practice. When all system code in the world is written in C you're bound to be able to point to some mistake causing security issues, but we have no way of comparing that to anything (at least none that I can think of). There's always some way someone can screw things up. That said, there's no harm in trying to do better - if we can get safer languages, where it's harder to make mistakes, and keep all the benefits of the less safe language, it's a win-win. No downside.
Fair enough. I apologize for the accusation -- it's just that you never know.
For the record, I also like C++. Especially in its modern incarnation C++11. It's a vast improvement over plain C (if you care about generic programming, as I do). I just think we need an even better language to take its place -- hopefully Rust can deliver, in time.
Re: Injection attacks: Most of the "holes" in e.g. VM emulation code also come from C/C++ legacy -- at least that's been my (admittedly anecdotal) experience.
> Honestly, the undefined behaviour bits in C/C++ isn't something that I've seen matter in practice.
This happens all of the time, especially as optimizers get even better at exploiting the fact that undefined behavior "cannot happen" (per the semantics of the language.)
Ultimately: Yeah, definitely agreed on striving for better and safer languages as long as they deliver what we need, even if those are different languages for different needs -- as long as we each get better and safer and faster code in the end! :)
I hate to ruin it for you, but usually Python is only a few hundred times slower than C++, typical Java is maybe 20% slower, and well-written Java is about as fast.
This is not an argument I can win, that is constructive or necessarily meaningful, but here's what happened when someone tried to translate C++ to an equivalent Java program:
http://codegolf.stackexchange.com/a/26536
Java is not 20% slower or just as fast. But if it is, please submit your Java solutions to the benchmark game for us to see.
Python is a few hundred times slower as long as you restrict yourself to native types (e.g. strings, integers, dictionaries, lists). Once you start defining your own classes and do more sophisticated things you get a lot slower. At any rate, my point was not to pick some specific number; my point was that sometimes you just don't care at all about how fast it runs. The x10000 was from a real world system but this can vary - a lot.
It's not hard to write single-threaded C/C++ code that's significantly faster than Java (although it will become harder once Java gets value types, that will address Java's biggest performance pitfall, namely cache-misses), and the simpler the code the better the chances for success (as complex code requires architecture employing virtual methods, and virtual method invocation in Java is faster than in C++ due to JIT inlining). But writing multithreaded code in C++ that's faster than Java is a lot harder, so much so that the C++ code will probably be significantly slower. The reason is that general-purpose lock-free data structures are a lot easier to write when you have a good GC (plus, the JDK includes state-of-the-art concurrent data structure implementations).
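To illustrate the GC point, here's a minimal Treiber-stack sketch. Because the GC guarantees a popped node is never recycled while another thread can still see it, a plain CAS loop is safe; a C++ version has to solve the ABA problem with hazard pointers, epoch reclamation, or similar:

    import java.util.concurrent.atomic.AtomicReference

    class LockFreeStack[A] {
      private case class Node(value: A, next: Node)
      private val head = new AtomicReference[Node](null)

      @annotation.tailrec
      final def push(a: A): Unit = {
        val h = head.get
        if (!head.compareAndSet(h, Node(a, h))) push(a) // lost the race: retry
      }

      @annotation.tailrec
      final def pop(): Option[A] = {
        val h = head.get
        if (h == null) None
        else if (head.compareAndSet(h, h.next)) Some(h.value)
        else pop() // another thread changed head: retry
      }
    }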
About optimization: If you really want to, the JVM can be surprisingly unrestrictive. You'd be amazed at the slightly mad things for example HFTs do with it. Advanced techniques include object reuse so you never activate the GC and unsafe calling of native code. Then there's the weird things you can do with the JVM directly. I can't find anything right now, but I recall weird annotations and multiline JVM arguments more wizardlike than anything. Finally there's alternative JVMs for those who want that.
In regards to weird things you can do with the JVM: You can actually allocate memory outside of the garbage collector from inside pure Java code. (Unsafe.allocateMemory, specifically, or you can use ByteBuffers, but I'm not sure which one is faster or if a ByteBuffer still interacts with the GC.) This can allow you to deal with "objects" which are entirely outside of the GC, and can actually be a decent chunk faster than using normal objects. I did a microtest recently (which can mean absolutely nothing in practice) where allocating two normal objects and doing something with them was about 30% slower than dealing with memory outside the GC.
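A sketch of the ByteBuffer variant (the "struct" layout here is my own toy example):

    import java.nio.ByteBuffer

    // 1 MB allocated outside the Java heap: the GC never scans or moves the
    // contents (only the small ByteBuffer wrapper object lives on the heap).
    val buf = ByteBuffer.allocateDirect(1 << 20)

    // Treat slot i as an ad-hoc struct of two ints: x at offset 0, y at offset 4.
    def putPoint(i: Int, x: Int, y: Int): Unit = {
      buf.putInt(i * 8, x)
      buf.putInt(i * 8 + 4, y)
    }
    def getX(i: Int): Int = buf.getInt(i * 8)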
As for the other things you mentioned, those are actually pretty routine. At least in some places. If you're working on a game, then you're going to be reusing objects and calling native functions as a normal thing :)
I don't think unsafe calling is standard at all. It's non-standardized as well. Normally the JVM would copy objects between the JVM and the native process. With unsafe, they are operating on the same bits of memory.
Aaah, that's what you meant by unsafe. I thought you meant that using a native code call was technically unsafe. Thanks for the clarification. (and that actually sounds pretty awesome for game development, I'll have to look into it)
That is somewhat misleading, since the optimized versions use different algorithms. There is also an optimized Python version that runs 30 times faster.
It is a combination of better algorithms and closer-to-the-metal language that gives the 5000x speedup described in the thread.
You can get arbitrary large speedups by just choosing a worse initial algorithm and then improve it.
At least I linked some numbers. Your statement "usually Python is only a few hundred times slower than C++" is so vague it is completely useless. The Python code was sped up by replacing idiomatic Python with non-idiomatic code (not your "usual" Python) and also replacing a large part of it with calls to optimized code written in C. If your claim had been "Python is only a few hundred times slower if you outsource most of your work to C libraries" I would have had no objection. But that's not what you wrote.
Great post, I agree with almost all of what you said except the C++ part. I think the difference is that allocations in C++ tend to be a lot more obvious, since almost all of them are pretty explicit and object lifetime has to be managed. I think with JITs and so on, a lot of higher-level languages can be almost as fast calculation-wise, but they encourage a style that's just a lot slower, because a lot of slow operations don't look slow.
"I think the difference is that allocations in C++ tend to be a lot more obvious since almost all of them are pretty explicit." Once you start using standard library classes, even the basic ones like std::string or std::vector, they are not. There are lots of allocations/deallocations happening below, and they tend to be slower than in Java.
What you're saying may be true for Python, but not Java. Java achieves a higher memory allocation/deallocation throughput than C++ -- the problems associated with Java GCs have to do with worst-case latency (which is solved by some commercial GCs).
Regarding your latter thoughts: You're right in many ways that whilst bikeshedding about a 'better' language, we end up stuck with the old language that no-one really wants.
The issue is, people are never going to agree. Your post implies that Clojure is more worthwhile than Scala. Personally, Scala's type system makes it far more valuable to me than Clojure, so I lean the other way. Trying to get a consensus, even just in the world of JVM-compatible languages, is intractable.
People individually need to choose their languages, and choose them carefully. If you don't have control over the language you work in, consider changing job.
> Performance is often FUD when it comes to languages.
Not in my experience. All other things remaining constant, the D Language forums (written in D) are significantly faster for me compared to almost any other forums I visit (including HN, any PhpBB stuff, Stackoverflow, etc.).
I also think that the statement, often made online, that web application performance is limited by network performance and not CPU performance, is absolute BS. My previous statement is an example of that.
There are so many things that could account for performance differences in forum software, pointing to the language as the cause without knowing them all is less than useless, it's borderline misleading.
Here's just a few of the things that can affect performance between different forum software written in the same language: Database schema, database software, database server resources, number of forum users, number of forum posts, average forum posts a day, number of forums viewers that are not users, webserver(s) resources, whether the webserver(s)/database server share resources with each other or other services, extra features supported by one forum and not the other, the network connection between the webservers and database server, the network connection between you and the webserver(s).
Upvoted. However, you provide great arguments why blindly assuming that performance does not matter and just "throw more cores at it" is not a viable solution.
You're comparing a custom forum written in a pet low-level language to PhpBB and concluding that the language is the reason for the performance difference? Clearly you have a lot to learn about web application performance.
One of the reasons, not the sole reason. The point I was trying to make is that "just ignore performance and use an expressive language" is not always a sound strategy - even for web applications. And yes, my example is probably a poor one.
Perhaps he's saying all current languages suck in that respect, and that the future of programming is languages that don't let you write unoptimizable code.