I worked in Cray's compiler department for seven years. If we couldn't dramatically parallelize someone's code, we couldn't sell a vector supercomputer. Period.
Automatic parallelization is very possible. The problem is that it tends to be less efficient. A decent developer can often do a better job than the compiler by performing manual code restructuring. The compiler cannot always determine which changes are safe without pragmas to guide it. With that said, our top compiler devs did some amazing work adding automatic parallelization to some awful code.
We inevitably sold our supercomputers because we had application experts who would manually restructure the most mission-critical code to fit cache lines and fill the vectors. Most other problems would perform quite adequately with the automatically-generated code.
What this article lacks is a description of why Erlang is better suited to writing parallel code than all the other natively parallel languages like Unified Parallel C, Fortran 2008, Chapel, Go, etc. There are so many choices and many have been around for a long, long time.
I completely agree. As someone who works on a parallel functional language, it's very hard to sell a parallel language that isn't as fast as parallel Fortran or hand-tuned C code that uses pthreads and the fastest parallel implementation of BLAS and other libraries.
The people who really care about performance are using those. The ones who don't are honestly mostly still writing code that has large constant factors of _sequential_ performance available as low-hanging fruit. Sure, they'd take free performance, but the rewrite/porting/debug costs (even in automatic parallel compilers for the same language) are at least as high as just firing up a profiler.
We believe so! Our project leader is focused on how we can parallelize general-purpose applications easily. With more and more people writing in static functional languages that have relatively poor parallel scalability without massive program transformations (Haskell, OCaml, F#, etc.), we think there is an opportunity there.
That said, we're at a "go big or go home" point. We either need to ramp up significantly from the 1.5 grad students + 2 undergrads / year or wrap things up. Getting to a point where we can be used in general-purpose projects requires a lot of work, none of which results in papers.
If I had to guess, the most probable impact is what you would expect from PL research - integration of lessons learned in other systems down the road:
- CilkPlus has a nice first pass at a work stealing algorithm, but we showed how to do it without static tuning by the programmer with lower overheads, to boot.
- Vectorization to take advantage of wider vector hardware requires transformation of data structures (e.g., array of structs to struct of arrays). We showed how to do that automatically and reason about changes in program performance.
- We have boatloads of papers - at both workshops and conferences - on what has to be done to the compiler and runtime to run efficiently on NUMA multicore systems. Right now, most fp systems do not run into these problems because they cannot scale past their own parallel overheads. Once past that bottleneck, the next one will be memory traffic, at least in our experience.
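For readers who haven't run into the array-of-structs to struct-of-arrays change from the vectorization bullet, here is a minimal sketch of it done by hand in Python. The `Particle` type and `to_soa` helper are hypothetical names made up for illustration; this only shows the layout change, not the automatic transformation or the performance reasoning described above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Particle:
    # Hypothetical record type, used only for illustration.
    x: float
    y: float
    mass: float


def to_soa(particles: List[Particle]) -> Dict[str, List[float]]:
    """Turn an array of structs into a struct of arrays (one list per field)."""
    return {
        "x": [p.x for p in particles],
        "y": [p.y for p in particles],
        "mass": [p.mass for p in particles],
    }


aos = [Particle(0.0, 1.0, 2.0), Particle(3.0, 4.0, 5.0)]
soa = to_soa(aos)

# Each field now lives in its own sequence. In a language with unboxed
# arrays, that makes every field contiguous in memory, which is the
# access pattern wide vector units want.
total_mass = sum(soa["mass"])
```

Python lists are boxed, so this shows the shape of the transformation rather than any real speedup.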
I don't say any of that to fling mud at other systems; we started our project after them all, at the start of the multicore era (2006), with the goal of investigating these specific issues without carrying along the baggage of a pre-existing sequential implementation.
There's also still a lot more to learn. I personally don't buy that determinism and total chaos are the only two points in the design space of program reasoning. There have to be some interesting midpoints (e.g., histories that are linearizable) that are worth investigating.
As complexity increases, what starts to matter is not the speedup from parallelization but fault tolerance.
Debugging a non-concurrent program can be difficult. Now throw in threads, shared memory, and pointers, and it quickly becomes a nightmare. The system could be fast, but if it crashes every week, is it useful? Often the answer is yes. But in some cases the answer is no.
There is no free lunch. Shared-nothing architecture doesn't come for free. You pay a toll in _sequential_ performance. It might or might not matter to you.
> Automatic parallelization is very possible.
For many numerical algorithms, and at small function scope, I can see that. But the problem (and what Joe was pointing out) is that applications and algorithms have to be designed to be concurrent from the start.
A compiler will not refactor your code from accessing a single database and acquiring a lock for 100k clients into using some eventually consistent or event-sourced data store. That is something that has to be built from the ground up.
Same thing with fault tolerance: it has to be built in from the ground up. Adding it later is not easy.
Isn't it really fair to say that it's designed for both? The way it uses immutable state and something-similar-to-s-expressions to express data make it very straightforward (or even transparent) to distribute work between multiple processes and separate computers, in addition to how it makes it practical and simple to break work into small chunks that can be interleaved easily within the same thread. It's really designed for doing both very well, wouldn't you say?
Not at all. Erlang isn't useful for modern parallel computing as we know it, which is usually done as some kind of SIMD program; say MapReduce or GPGPU using something like CUDA. The benefit doesn't just come from operating on data all at once, but these systems (or the programmer) also do a lot of work to optimize the I/O and cache characteristics of the computation.
Actor architectures are only useful for task parallelism which no one really knows how to get much out of; definitely not the close-to-linear performance benefits we can get from data parallelism. Task parallelism is much better for when you have to do multiple things at once (more efficient concurrency), not for when you want to make a sequential task faster.
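Since the two terms keep getting mixed up in this thread, here is a small Python sketch of the distinction as I read it. The `square` function and the choice of jobs are made up for illustration. CPython threads won't actually speed up CPU-bound work like this, so treat it as a picture of the program structure only.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n: int) -> int:
    return n * n


data = list(range(8))

with ThreadPoolExecutor(max_workers=4) as pool:
    # Data parallelism: the same operation applied to every element.
    squares = list(pool.map(square, data))

    # Task parallelism: unrelated jobs that happen to run at the same time.
    total_job = pool.submit(sum, data)
    largest_job = pool.submit(max, data)
    total, largest = total_job.result(), largest_job.result()
```

The data-parallel half scales with the size of `data`; the task-parallel half only has as much parallelism as there are distinct jobs to run.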
SIMD is a specialized form of parallelism. It is not the only definition of the term.
It should also be clear that task parallelism (or concurrency from your perspective) has not had the benefit of billions of engineer-hours focused on improving its performance. It is within recent memory that if you wanted 20+ CPUs at your disposal, you'd have to build a cluster with explicit job management, topologically-optimized communications, and a fair amount of physical redundancy.
As many of the applications requiring low-end clusters tended to involve random numbers or floating point calculations, we also had the annoyance of minor discrepancies such as clock drift affecting the final output. This would present, for example, in a proportional percentage of video frames with conspicuously different coloration.
Task parallelism was something we used to work on 20 years ago, when we thought it was the solution to scaling. But then we found that the supercomputer people were right all along: the only thing that really scales very well is data parallelism. So the focus in the last 5-10 years has been finding data parallel solutions to the problems we care about (say deep neural network training), and then mapping them to either a distributed pipeline (MapReduce) or GPU solution.
> It is within recent memory that if you wanted 20+ CPUs at your disposal, you'd have to build a cluster with explicit job management, topologically-optimized communications, and a fair amount of physical redundancy.
You are still thinking about concurrency, not parallelism. Yes, the cluster people had to think this way; they were interested in performance for processing many jobs. No, the HPC people who needed performance never thought like this; they were only interested in the performance of one job.
> As many of the applications requiring low-end clusters tended to involve random numbers or floating point calculations, we also had the annoyance of minor discrepancies such as clock drift affecting the final output.
Part of the problem, I think, is that we've been confused for a long time. Our PHBs saw problems (say massive video frame processing) and saw solutions that were completely inappropriate for them (cluster computing). It's only recently that we've realized there are often other, better options (like running MapReduce on that cluster).
I think in this case they're relatively interchangeable terms. Rather than a SIMD vectorization of a task, you are applying a MIMD solution to various parts of a task.
You can typically get more of an immediate boost with SIMD on current hardware (especially if you can effectively cast it to GPGPUs), but MIMD is more easily applied. Almost any application can be refactored to spawn lightweight threads for many calculations without any explicit knowledge of the π-calculus.
To your point and for a well-understood example, make -j doesn't always result in faster compilations. It may if you have the ability to leverage imbalances in CPU and storage hierarchies, but you can also kill your performance with context switches (including disk seeking).
> but MIMD is more easily applied. Almost any application can be refactored to spawn lightweight threads for many calculations without any explicit knowledge of the π-calculus.
MIMD hasn't been shown to scale, and it's not locking that is the problem, but I/O and memory.
> To your point and for a well-understood example, make -j doesn't always result in faster compilations. It may if you have the ability to leverage imbalances in CPU and storage hierarchies, but you can also kill your performance with context switches (including disk seeking).
When Martin Odersky began pushing actors as a solution to Scala and multi-core, this was my immediate thought: the Scala compiler is slow, can this make it go faster? It was pretty obvious after some thought that the answer was no. But then we have no way yet of casting compilation as a data-parallel task (a point in favor of task parallelism, but it doesn't help us much).
"Once we have the breakdown, parallelization can fall out and correctness is easy."
Joe is saying this too. And he's saying that because Erlang is a concurrent language, parallelism (he's thinking MIMD not SIMD) is easy. He says:
> Now Erlang is (in case you missed it) a concurrent language, so Erlang programs should in principle go a lot faster when run on parallel computers, the only thing that stops this is if the Erlang programs have sequential bottlenecks.
I don't think he - nor the Go chaps - conflate concurrency and parallelism.
My main issue here is that when people hear "parallelism" and "a lot faster" they automatically think "scaling." But just hardware threading doesn't get us anywhere near that goal, even if we write our C-style multi-threaded code by hand very carefully.
The PL community is still not having honest up-to-date conversations about parallelism; they are about 20 years behind other fields.
Well, look at how people like Google's Jeff Dean, originally a PL person, became a systems person to basically attack parallelism problems head on. That is, look at the problems that NEED parallel computing; don't think of parallelism as a transparent benefit that is nice to have if it happens, and if it doesn't, it's not the end of the world.
Once you accept that parallelism is needed, you realize that it is much more complex than just dividing things up onto multiple cores. Locking, which is really a concurrency problem, is never the big problem; the problem becomes all about pumping data to the right place at the right time.
Fortran (especially ancient, wheezy Fortran) lent itself to supervised automatic parallelization because of its lack of dynamic arrays. It was "easy" to vectorize code at compile time when you had so much information about the runtime expectations.
We can do some of this now in most languages with hot-spot profiling, basic block analysis, selective inlining, and other innovations. However, you really can't beat low-level languages that explicitly "hint" at their execution paths.
By the same token, Cray's applications were...so...slow if you were foolish enough to run them on the expensive hardware and not the FEPs.
If the code isn't efficient, you'll run into Amdahl's Law much more quickly. In fact, I think your comment aligns with what Joe was saying: automated parallelization is not going to happen. You will have to go through and find all your contention points, just like your application experts did.
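For anyone who hasn't seen the formula, here is a quick Python sketch of Amdahl's Law. The function name is mine, and the 24-core count is just an example; the parallel fractions are illustrative values, not measurements of any real program.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of the run time parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Illustrative parallel fractions only.
for p in (0.50, 0.90, 0.99):
    print(f"{p:.0%} parallel on 24 cores: {amdahl_speedup(p, 24):.1f}x")
```

Even with 90% of the run time parallelized, 24 cores give a bit over 7x, which is why the remaining sequential contention points matter so much.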
I completely agree with your last sentence. For those of us who have dived in a way, the advantages become clear, but TFA was really just preaching to the choir.
> At this point in time, sequential programs started getting slower, year on year, and parallel programs started getting faster.
The first part of this statement is plain wrong. Single thread performance has improved a lot due to better CPU architecture. Look at http://www.cpubenchmark.net/singleThread.html and compare CPUs with the same clock rate, say 2.5 GHz: an April 2012 Intel Core i7-3770T scores 1971 points while a July 2008 Intel Core2 Duo T9400 scores 1005 points. This is almost double the score in less than four years. Of course, one factor is the larger cache that the quad core has, but this refutes Armstrong's point that the multicore age is bad for single thread performance even more.
For exposure to a more balanced point of view, I would highly recommend Martin Thompson's blog mechanical-sympathy.blogspot.com. It is a good starting point on how far single threaded programs can be pushed and where multi-threading can even be detrimental.
Also, I think that fault tolerance is where Erlang really shines. More than a decade after OTP, projects like Akka and Hystrix are finally venturing in the right direction.
Your point is absolutely correct, but your example could be better IMO. It's not fair to compare a desktop chip with 45W TDP with a laptop chip rated at 35W. Not to mention that the newer i7 actually goes up to 3.7GHz turbo (vs. 2.5GHz constant for the C2D) for single threaded loads, so the clock rate is not really comparable in that benchmark (even though base clocks are the same).
A better example would be C2D E8600 @ 3.33GHz and i5 3470S @ 2.90GHz (3.6GHz turbo). They are both 65W desktop parts, and the single threaded clock speed is similar. You can see that the C2D gets 1,376 in the single threaded benchmark, while the i5 gets 1,874. The difference is not as drastic (the C2D launched at a significantly higher price point as an enthusiast level chip, while the i5 is a budget chip) but definitely still significant. There are probably even better comparisons but I didn't spend too much time picking out comparable CPUs from different generations.
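To put the two comparisons side by side, here is the arithmetic as a short Python sketch. The scores are the ones quoted in this thread; the `gain` helper is just a name I made up.

```python
def gain(new_score: float, old_score: float) -> float:
    """Ratio of the newer chip's single-thread score to the older chip's."""
    return new_score / old_score


# Scores as quoted in this thread.
laptop_vs_desktop = gain(1971, 1005)  # i7-3770T vs Core2 Duo T9400
like_for_like = gain(1874, 1376)      # i5 3470S vs C2D E8600

print(f"i7-3770T / T9400: {laptop_vs_desktop:.2f}x")
print(f"i5 3470S / E8600: {like_for_like:.2f}x")
```

That works out to roughly 1.96x for the first pair and 1.36x for the second, so the like-for-like gain is smaller but still clearly there.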
True, good point. I tried to find something with a similar nominal clock speed and forgot about Turbo. But then, Turbo is a good example why single threaded performance is getting better even in the age of multicores.
Sequential programs are getting slower relative to both parallel programs and the theoretical capacity of the systems. Also, much of the market is shifting to chips optimized for power consumption which are, in fact, slower.
So while obviously the most literal and absolute interpretation of the statement "sequential programs are getting slower" is nonsensical, I think there's a very valid point being made.
Obviously, sequential programs haven't been getting slower on new chips. However, the acceleration rate of single thread performance has been slowing, so even with all our tricks we're getting double in four years when it used to be every 18 months just by doubling the number of transistors on an IC.
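To see how big that slowdown is, here is the same comparison expressed as a yearly growth rate, as a small Python sketch. The `annual_growth` helper is my own name, and the two doubling periods are the ones stated above.

```python
def annual_growth(doubling_period_years: float) -> float:
    """Yearly multiplier implied by doubling once per `doubling_period_years`."""
    return 2.0 ** (1.0 / doubling_period_years)


four_years = annual_growth(4.0)       # doubling every four years
eighteen_months = annual_growth(1.5)  # doubling every 18 months

print(f"{four_years - 1:.0%} vs {eighteen_months - 1:.0%} per year")
```

Doubling every four years is roughly 19% per year, while doubling every 18 months is roughly 59% per year.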
AMD recently sacrificed some single-threaded performance in order to achieve an increased core count. Whether or not that was a good move or representative of the industry as a whole is open to debate, but it does occasionally happen.
I don't think your argument holds up all that well. You said that the improved architecture of the individual cores improved single-threaded performance (despite, I assume, the decreased clock rates), so I think Dr. Armstrong's point that the shift to multicore made sequential programming less profitable holds at least to an extent. If the CPU manufacturers had used the same architecture with just one core and higher clocks, single-threaded apps would run faster still; instead, multi-threaded apps benefit more from the improvements in the CPU.
The part of Armstrong's argument I was (explicitly) referring to was not about relative gains of multi-threading but about presumed absolute losses of single-thread performance. My argument against this is not refuted by relative gains of multi-threading.
Of course you realize even bigger gains on many common workloads using parallelism, but this part of his argument doesn't need the first part, which was wrong.
That depends on which CPUs you are looking at; he says "clock rates started sinking" from about 2004. At any rate, pinning the clock rate doesn't make any sense in the comparison, even if you think it's for his benefit.
The single thread performance story may be different if you take 2004 era Xeon/Opteron chips and follow the single-thread performance as they go to the 8-12 core chips later on.
Erlang solved a problem really well over 20 years ago; it's by far the sanest language I have used for concurrent programming (I haven't tried Go or Dart yet), and I owe a lot of what I know to the very smart people building Erlang.
However, it has barely evolved in the last 10 years. Will 2013 be the year of the structs? (I doubt it.) Every new release comes with some nice-sounding benchmark about how much faster your programs will run in parallel, and there is never a mention of what's actually important to programmers: a vibrant ecosystem and community, language improvements so that it doesn't feel like you are programming in the 80's, and better constructs for reusing and packaging code in a sane way.
It's fairly trivial in most languages to get the concurrency you need; I think Erlang is solving the wrong problem in 2013.
The syntax is a fairly minor point, and although it's ridiculous that there isn't first class dictionary support in 2013, there are parts of it I really like.
But Ericsson doesn't know how to manage an open source ecosystem, and I don't think they particularly want to. They only started using open source control a few years ago, there is still no open bug tracker, half the standard library is in terrible shape, and there is no good support for 3rd party library integration.
A few years ago I wrote a UI to the documentation that most of the community seemed to massively prefer (http://erldocs.com). Every year I asked the OTP manager how to get it merged, but instead they wrote their own (imo) subpar version, while changing the documentation format every release, which broke mine without warning.
But as with everything, there are trade-offs. Even the Go team uses shared memory protected by locks in some cases (see the Go standard library) because it's faster or easier that way.
Every good idea (sharing memory between threads is dangerous and therefore should be avoided) taken to an extreme becomes cargo cult programming.
Yes, it is dangerous, but at the same time there are plenty of successful projects that do it because there are programmers that can contain that complexity despite the somewhat popular view that this approach dooms you to failure.
An important side effect of not permitting sharing memory is that you can always recover sanely.
An Erlang process has its own heap, so when it blows up, its state just goes away, leaving your remaining program's state untouched. With Go, there is no way to recover sanely; even if your goroutines are designed to copy memory, Go itself has a single heap.
Now, this is a very odd design decision for a language that claims it's designed for reliability. Perhaps Go's authors think it's better just for the entire program to die if a single goroutine falls over; well, that's one way, but it's a crude one. Erlang's design is simply better.
I wonder if Go can ever adopt per-goroutine heaps, or whether it's too late at this stage. I was happy to see that Rust has chosen to follow Erlang's design by having per-task heaps, even if all the pointer mechanics (three types of pointers, ownership transfer, reference lifecycles and so forth) result in some fairly intrusive and gnarly syntax.
> Go allows you to share memory between goroutines (i.e. concurrent code).
Go will share memory, by default, and special attention must be taken preventing or avoiding it. It's not an allowance.
> In fact, the Go team explicitly tells you not to do that
And yet they have refused to implement a correct model, even though they have no problem imposing their view when it fits them (and having the interpreter get special status in breaking them, see generics).
> Not really. If you use channels to communicate between goroutines, then the concurrency model is that of sequential processes
Except since Go has support for neither immutable structures nor unique pointers, the objects passed through the channel can be mutable and keep being used by the sender. Go will not help you avoid this.
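The same aliasing hazard is easy to show in any language that passes references through a queue. Here is a minimal single-threaded Python sketch, where `queue.Queue` is only a stand-in for a Go channel and the `message` contents are made up; it is an analogy, not Go's actual semantics.

```python
import copy
import queue

channel = queue.Queue()  # stand-in for a Go channel (assumption for illustration)

message = {"items": [1, 2, 3]}
channel.put(message)          # "send" the object
message["items"].append(4)    # the sender keeps mutating it after the send

received = channel.get()
# Only a reference was passed, so the receiver holds the very same object
# and observes every mutation the sender makes, before or after the receive.
aliased = received is message

# Sending a deep copy instead gives the receiver its own state.
channel.put(copy.deepcopy(message))
message["items"].append(5)
isolated = channel.get()
```

Nothing in the queue itself stops the first pattern; the copy is a convention the sender has to remember to follow.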
> That is, the default concurrency model militated by Go is not shared memory, but that of CSP. It's disingenuous to affix Go with the same kind of concurrency model used in C.
It's not. Go passes mutable objects over its channels and all goroutines share memory; you get the exact same model by using queues in C.
> What's your point? Purity for purity's sake?
That the Go team has no issue breaking the rules they impose on others, so that point is irrelevant.
> Except since Go has support for neither immutable structures nor unique pointers, the objects passed through the channel can be mutable and keep being used by the sender. Go will not help you avoid this.
I never claimed otherwise. But I do think you underestimate the utility of idioms.
> you get the exact same model by using queues in C.
No, you don't. C doesn't have lightweight threads, which means it can't support a useful CSP model of concurrency.
Just because Go allows shared memory doesn't mean its main concurrency model isn't CSP. Go doesn't force CSP on you, but that is nevertheless the primary concurrency model of the language.
>> (and having the interpreter get special status in breaking them, see generics)
> What's your point? Purity for purity's sake?
I assume the point is that the Go developers are saying "purity for thee, but not for me" (if you think generics are impure or unnecessary), or "generics for me, but not for thee", which is just annoying.
The decision to use generics or not has nothing to do with purity. It's about trade offs. If a decent balance can be struck with special privileged functions built into the language, I don't see how that is intrinsically bad.
It's not a decision about whether or not to use generics, it's a decision about whether or not to make it possible to use generics.
The developers have decided that they should make it possible for themselves to use generics, but impossible for you to use generics. Do you really not see why that's annoying? That's not a matter of tradeoffs, because they've already made generics possible---for themselves. At best they might be thinking something like "only WE are capable of grokking when generics are appropriate; everyone else would abuse them", which is rather arrogant, no? Lots of people have been saying that Go is "missing" generics. The official response seems to be "no it isn't, you don't really need them." The unofficial response (evidenced by the use of generics in the compiler) is apparently "you're right, Go is missing generics, so we'll include them, but just for ourselves".
The context of this discussion started when people were complaining about the language designers being able to build in special functions that are type parametric. The implication was: if the language designers can do it, why won't they allow me to do it?
i.e., purity for purity's sake. It ignores legitimate trade offs between allowing a few special functions and building an entire generics system into the language.
I still don't know what you mean by "purity". Saying "i.e., purity" doesn't help, since I already knew that you think purity has something to do with generics being definable by people other than the language designers.
"If the language designers find it useful to occasionally use type-parametric functions, why won't they recognize that other people might also find it useful, too, and for other functions?" doesn't strike me as a demand for purity in any sense. Consistency, maybe; recognition that the designers are giving themselves special treatment, sure. What's pure about the desired state of affairs, or impure about the present?
Honestly, sometimes it seems as if someone who wants to defend Go against a criticism immediately claims that the critic is just obsessed with purity. Why else would you criticize Go?
Is it that they couldn't or haven't? It's obviously capable of compiling code, but with the state of compilers today (impressive backends, etc.) does it make sense to have a three-stage bootstrap (minimal C → mini-go → full-go → optimized-full-go) at this point?
In fact, the Go team explicitly tells you not to do that: "do not communicate by sharing memory; instead, share memory by communicating."
Sadly, that doesn't mean what you think it means. It really means: don't organize your IPC around shared state. The juxtaposition in the second half is not directly related to the first half (except poetically), though it does do a good job of completing their picture of CSP. Also note: it is explicitly talking about shared (sadly mutable) state.
You can always opt not to share memory, but there's no facility to prove or enforce it. It's not dire; with practice you can send values, or never mutate referenced objects. It is very natural for the most part. I've done it, but not in Go.
Finally, it's not doom. Even C can do threading after all, and somehow these things don't blow up too much. But there's value in eliminating the pitfall entirely. When people criticize Go for being imperfect, they're not saying it's not going to work, they're contrasting with a more effective solution or lamenting that some design decisions weakened the effort.
How would you get per-actor heaps that cannot be violated by other actors? That is critical to Erlang's ability to recover from processes dying. I spent a lot of time doing Java and can't think how you could (you could in the JVM if you had language constructs for it, but then we are back to a new language).
There's a reason Stackless Python's actors aren't just a library on top of Python.
But those hacks wouldn't provide the same guarantees that language-level changes provide. Sure, you can try not to impact other thread's heaps, but nothing is stopping me, which means a simple programming error has the potential to impact multiple threads. As a result, you can't just "reboot" that thread (a critical piece of what makes Erlang interesting), because you have no guarantees its errors didn't impact other threads. You also have no guarantees that the underlying libraries aren't mucking up all of your carefully crafted memory management.
It's like the kernel protecting memory so applications can't overwrite each other. Sure, applications could just write to their own memory, but nobody actually trusts that model. Instead, they want something below that level enforcing good behavior.
1. Obviously, virtual memory adds a wrinkle to this that kind of forces kernel protection, but even if we had literally unlimited RAM, we would still implement kernel protections on memory.
Erlang itself is, after all, implemented in C and ASM.
But what you can't (practically) implement yourself in other languages is all the professional care and maturity that have gone into the whole package over its long history. AFAICT, Erlang/OTP is much more than just a library.
Ultra-lightweight actors, a VM scheduler tuned for scheduling massive numbers of concurrent ops, etc. If you tried doing this with a Java library running Java code, you couldn't get anywhere near their level of concurrency.
Obviously, you can do the same things in Java, as people have demonstrated with alternative languages that target the JVM.
It's the expressiveness at the language level that is really the "magic". For example, doing the equivalent of OO is not intuitive in Erlang, but completely possible (actually easy, but it looks...wrong) whereas it's supported by every Java tool. By the same token, pattern-matched message passing, lightweight green threads, and hot code deployment are primary concepts in Erlang.
Same old hype. Erlang is good I guess, and I've used it in production a couple of times. But it's just a language that solves 3 problems but creates another 30. Just like C++11, Dart, Go, etc.
This kind of belligerent rhetoric (we're solving the right problems, everyone else is dumb) is the kind of drivel that gives momentum to language zealots that think language X is better than language Y.
I've contributed to Google Go in the early phases and I was naïve and really believed that Go was the "next big thing." But it turned out to be yet another general-purpose language with some things that were really interesting (goroutines, garbage collection, etc.) but some things that were just same-old same-old. Now, I'm editing a book about Dart and I've since lost my enthusiasm for new languages; I can already see that Dart solves some problems but often creates new ones.
And in a lot of ways Erlang sucks, too. The syntax is outdated and stupid (Prolog lol), it has weird type coercion, memory management isn't handled that well (and many more). Of course, since Facebook uses it, people think it's a magic bullet (Erlang is to Facebook like Python is to Google).
The article also forces readers to attack a straw man. Often times, algorithms simply cannot be parallelized. The Fibonacci sequence is a popular example (unless you use something called a prefix sum -- but that's a special case). So in many ways, the rhetorical question posed by the article -- "but will your servers with 24 cores be 24 times faster?" -- is just silly.
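For the curious, here is roughly what the prefix-sum trick for Fibonacci looks like, as a Python sketch. It uses the standard 2x2 matrix formulation; `matmul`, `STEP`, and `fib_prefix` are names I made up. `itertools.accumulate` runs sequentially here, so this only shows the formulation: because matrix multiplication is associative, the same scan could in principle be computed by a parallel prefix algorithm.

```python
from itertools import accumulate


def matmul(m, n):
    """Multiply two 2x2 integer matrices given as nested tuples."""
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))


# STEP raised to the k-th power is ((F(k+1), F(k)), (F(k), F(k-1))).
STEP = ((1, 1), (1, 0))


def fib_prefix(count: int) -> list:
    """Return F(1)..F(count) as a prefix scan of STEP under matmul."""
    powers = accumulate([STEP] * count, matmul)
    return [m[0][1] for m in powers]


first_ten = fib_prefix(10)
```

Whether that counts as "parallelizing Fibonacci" or as swapping in a different algorithm is exactly the special-case caveat above.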
How is it any more or less stupid than curly brackets?
Show me another production ready language that has the same level of pattern matching as Erlang.
> Often times, algorithms simply cannot be parallelized.
Who cares? How many people here have implemented individual algorithms and delivered them as units of execution? Sure, middleware companies maybe sell a cool implementation of A* or some patented sort algorithm.
You can think of parallelization at the system level. Can you handle client connections concurrently (and in parallel)? If yes, that covers a large chunk of the usage domain for platforms these days.
> memory management isn't handled that well (and many more).
> So... how would you have handled it, out of curiosity.
Imo separate heaps is the first big mistake. Even implementations like Erjang (Erlang on the JVM using the Kilim microthreading library -- which I've also contributed to) improve on the copy-from-heap mechanism prevalent in vanilla Erlang. Not only that, but Erlang's memory allocator isn't that well-suited for multi-threaded allocations, which also means that Erlang doesn't (can't?) take advantage of tcmalloc, umem, hoard, etc.
Well, technically there is a difference between separate heaps and no shared memory. Data in Erlang is immutable, so processes could use a combined heap, but they still wouldn't have read-write shared memory.
However, in my opinion separate heaps was absolutely 100% the correct design decision. The main benefit is not having to worry about the effects of a long-running, stop-the-world garbage collection, which can have catastrophic effects on user interaction, server response times, request queue lengths, etc. An additional benefit is that the language implementers can use "dumb" algorithms for garbage collection and avoid a large class of difficult-to-track bugs.
Robert Virding talked about these issues at some length at this year's Erlang Factory; hopefully the video will be posted soon.
We used Erlang several years ago. The code base has ~100k lines of code so it should be representative. We abandoned it later and switched to C++ because of performance (mostly in mnesia) and quality issues (some drivers in OTP). We didn't expect too much from performance considering it is functional (which seldom does in-place updates), but it was still below expectations.
It is understandable though. Just think about how many resources have been put into development of the Erlang VM and the runtime/libraries (OTP), and compare that with the JVM/JDK. There is just no magic in software development. When talking about high concurrency and performance, the essential things are data layout, cache locality, CPU scheduling, etc. for your business scenario, not the language.
If zlib could be rewritten in Erlang to be lock-free, why not just rewrite it in C to be lock-free instead of porting it? AFAIK Erlang isn't some magical language that allows traditionally-locked data structures to become lock-free.
Well, zlib is fairly trivial and probably not a good example due to overheads.
However, for an example such as a torrent server this would make much more sense.
That being said, Erlang is basically a scripting language for building fault-tolerant and parallel applications.
Using C, you might be able to get parallel, but it'll be a lot of work to make it distributed and fault tolerant.
The underlying data structures have little to nothing to do with what's being said in the article.
I've looked at Erlang before, and I would certainly agree that it's far simpler to write a concurrent application in Erlang than it would be in C.
I'm just taking issue with the bit at the end, where they're bragging about removing a serial bottleneck by rewriting zlib in Erlang in order to remove a lock. Rewriting it in Erlang really doesn't have anything at all to do with switching to a lock-free data structure.
I'm confused about where the lock that TFA is talking about is in zlib. We've used zlib in a multithreaded environment for years and haven't had any issues with it, and as far as I can see, there isn't any mutex or semaphore usage in the source code (http://zlib.net/) for the library.
It sounds like this is a zlib usage issue more than anything else.
One certainly could. But to obtain the same properties you get with Erlang, you'd have to reimplement some features of the Erlang VM: lightweight threads, message passing, etc. Erlang is opinionated. It imposes a very specific model of concurrency. If you buy into that model, the language/VM gives it to you for free. If you don't, then you pretty much can't use Erlang.
By contrast, C is a very low-level language that can do anything. You can implement any model of concurrency in C. But you'll be doing all the plumbing yourself, or using a library that does the same. Erlang's model is not the only possible way to be lock-free, and you can pursue other options in C, if you want to.
OK, so you rewrite zlib. Then rewrite ImageMagick, then rewrite glibc, then rewrite others, etc., etc.
Yes, you can do it. But after a while it is like plugging holes in a piece of Swiss cheese. That is what Joe was saying: you start with code that doesn't run well concurrently because concurrency was added later. It is better sometimes to start from scratch with a language that makes concurrency the default and the sequential sections the exception.
> AFAIK Erlang isn't some magical language that allows traditionally-locked data structures to become lock-free.
There is some magic in how it has separate heaps and how it maps schedulers n:m (n CPUs, say 2 to 24, to m processes, say 2 to 300k), how it provides concurrent garbage collection without stopping the world, how it provides hot code reloading if that is what you need.
No, it won't make coffee for you, and it might not work well for a lot of tasks, but it just happens to be the right tool for the right job lately as reliable concurrent back-ends become more important (as opposed to, say, single-threaded desktop applications).
Yes, but standard C has no way to tell the compiler a variable is immutable (immutability != constness), so unless you go non-standard (and very verbose at it) Erlang is still a better tool for the job.
There's one big problem Erlang couldn't solve that I live with to this day:
Unlike other general-purpose languages (like, say, C++ or C#), it doesn't allow me to grasp what's happening after staring at it for 30 seconds. This is the same problem I have with Lisp.
Maybe I'm just dyslexic, but these rhetorical pieces for one language or another that say it's concurrent (which it is), fast (obviously), more C than C, will bring the dead to life, create unicorns and other wonderful, fantastic things that I'm sure are all true, just don't seem to be capable of passing into my grey matter.
You know another thing all these amazing super power languages haven't been able to do that even a crappy, broken, in many ways outright wrong, carcinogenic etc... etc... language like even PHP has allowed me to do? Ship in 48 hours.
Before I get flamed: I already tried that with Nitrogen (http://nitrogenproject.com). It didn't end well, but maybe it will work for someone already familiar with Erlang.
It's like you've written the Mahabharata; it's a masterpiece and it's one of the greatest epics of all time. Unfortunately, it's written in Sanskrit.
> Unlike other general-purpose languages (like, say, C++ or C#), it doesn't allow me to grasp what's happening after staring at it for 30 seconds. This is the same problem I have with Lisp.
I had the same problem with Lisp (Scheme, to be more specific) and I thought that it'd be impossible to reason about run-times and such. That is, until I learned the language and the standard libraries. I've never looked at Erlang, but I'd bet it's the same issue.
A C++ programmer can look at C# code and figure out what it's doing because they have similar syntax and vocabulary. Just because Erlang isn't immediately accessible to you, it doesn't mean it isn't any good for shipping in 48 hours.
Perhaps if you spend 24 hours sharpening your axe, you'll chop that tree down in another 4 hours instead of using the full 48.
I used hyperbole of course, but in fairness, I've spent on the order of several months getting into Lisp. Maybe it's because I didn't work with it exclusively (I've been told here on HN that you need to be immersed in it completely and continuously) that I still didn't become OK in any semblance of the term.
Until that changes, I don't see how it will help me.
Basically, I would say that what you are suffering from is a kind of mental block syndrome: you think in a procedural/imperative paradigm. All your listed languages operate in that paradigm. It's a very transferable paradigm, as it so happens. I can come up to (some approximation of) speed in an imperative language in under 2 weeks. In order to ship Lisp(Prolog, Haskell...), you have to break out of that paradigm. I am not condemning you, mind. It is what it is. Rewiring your head is hard, and often doesn't have direct results.
I can, however, ship with Common Lisp, because I've spent on the order of 5 years learning it and writing it most evenings. I am learning Clojure and am preparing to ship an (excruciatingly minor) product with that after only maybe two months of dabbling. This is possible because I've bent my head around into Lisp shapes.
It's also been said that some people have the shape of Lisp in their head, and when they learn Lisp, their heads fit it by nature, and other people don't have that innate meshing. I certainly found Lisp to mesh with my head well.
Oh yes. It can be hard to get started with Common Lisp, just in terms of getting an environment working. I have a tutorial site to help with that (plug plug plug): http://articulate-lisp.com/
That's why I never regretted trying to learn Common Lisp some time ago, even though I didn't ship anything into production, and why I really do enjoy doing the same thing with Erlang right now, i.e. trying to understand it and getting as comfortable with it as I can get (and preferably this time maybe putting something out there).
Both these experiences helped me see programming differently, a change of "paradigm" as you very well put it, so now even when I get back to Python or PHP I feel like I'm a better programmer. Plus, there's something to be said for the fact that always trying to learn new and interesting stuff, and not only focusing on "shipping code", is what keeps one's passion at higher levels. After almost 10 years in this trade I've found that passion for what you're doing is a very valuable and at the same time very volatile resource.
I couldn't break that thinking in college and struggled every day of my software engineering course where we implemented a Pascal compiler in Scheme. It wasn't until 10 years later that I started to get a handle on functional programming, due in part to some trickle-down from my Emacs config files. :)
>>Just because Erlang isn't immediately accessible to you, it doesn't mean it isn't any good for shipping in 48 hours.
No, that is not the problem. The problem is total disregard for what most people consider valuable to them. And if they don't get it, framing that as their stupidity rather than admitting the fact that the syntax is a little strange to wrap your head around (which is true).
>>Perhaps if you spend 24 hours sharpening your axe, you'll chop that tree down in another 4 hours instead of using the full 48.
In all fairness, sharpening your Erlang axe might take 24 months, not 24 hours.
> In all fairness, sharpening your Erlang axe might take 24 months, not 24 hours.
In all fairness it won't. Erlang is not a difficult language to learn, and honestly, it doesn't take that much effort for the syntax to become familiar.
The semantics of Erlang are different from that of C-like languages, and therefore I think it's good that it has different syntax. You could give it C-like syntax, but that might be just as confusing, if not more so, since it wouldn't mean the same thing it did in C.
Now, I have never read or written PHP. Here's the thing. PHP fills a niche. That niche is being super-productive early on.
There are other goals, like the code being readable, easy to reason about, fast, and so on.
And there are languages that fill those niches. When people say "crappy language", more often than not they mean crappy for their needs (or sometimes their expertise). There is no need to fight about it; it's like arguing about who prefers which color.
The reason you do Lisp is not to do coding in it (given the current state of affairs), but to reshape your mind by thinking in terms previously unseen because we could not see the forest for the trees.
And that ability to see something from different viewpoints arms you with weaponry to solve challenging problems. There is at least one physicist who agrees with me on that :)
Personally I call PHP crappy because it could easily have been much better at its main goal without sacrificing anything from its other goals, but for some reason it wasn't, and now we have to live with the stupid early decisions for compatibility reasons (e.g. there's no good reason for the mess of strFunc, str_func, string_func, funcstr, needle/haystack, haystack/needle, etc. When the main goal is being super-productive early on, being able to use the most basic functions without constantly looking up their order of parameters would be a great help).
I think Erlang/OTP probably has a steeper learning curve than other frameworks, but I guess a lot of that is down to history. As another poster said, Erlang solved this problem 20 years ago, whereas more modern languages are typically based on C syntax, so they share a lot in common.
Back to your point though, I think once you understand the language it is actually a lot simpler to understand what is going on. Modules are usually very self-contained, and you don't get the layers upon layers of indirection you see in other frameworks (I'm looking at you, Rails). I think the functional style of programming also helps to keep things simple; it doesn't make sense to have a 20-line function in Erlang.
That's alright. Right tool for the right job. Don't try to use a drill if a hammer is more appropriate, it won't work. If you know C#, Python, etc, you will still need them. If it helps you ship, heck, even COBOL works.
zlib is fine as long as you don't give it a non-thread-safe memory allocator - see http://www.gzip.org/zlib/zlib_faq.html#faq21. As far as I can tell, it either means that the summary was imprecise and the slowdown was in the image processing code and not zlib, or that they chose to rewrite (and debug) a big chunk of code rather than read the zlib documentation.
Ignoring that point, this seems like a poor point for comparison as it's a trivially parallelized task because zlib operates on streams and shouldn't have any thread contention. There's very little information in the description but unless there are key details missing, this doesn't sound like a problem where Erlang has much interesting to add. The most interesting aspect would be the relative measures for implementation complexity and debugging.
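To make the "operates on streams, no contention" point concrete, here is a rough sketch using Python's standard `zlib` binding rather than the C API discussed above (that swap is my assumption for brevity; the helper names are hypothetical). Each stream gets its own compressor object, so the workers share no state and nothing needs a lock.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_stream(chunks):
    """Compress one logical stream with its own private compressor object."""
    comp = zlib.compressobj()
    out = [comp.compress(chunk) for chunk in chunks]
    out.append(comp.flush())  # emit whatever the compressor still buffers
    return b"".join(out)

def compress_all(streams, workers=4):
    """Compress independent streams concurrently; no compressor is shared."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compress_stream, streams))
```

The same shape applies in C: one `z_stream` per thread, plus a thread-safe allocator as the FAQ entry linked above requires.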
1. Erlang has locks and semaphores: receive is a lock, actors are semaphores. Erlang chose a one-semaphore/one-lock-per-process model.
2. Erlang scales better not because of being lock-free (see above), but because it uses async more easily than other languages.
3. Async prevents deadlocks, not Erlang being lock-free (see above).
The lack of understanding is amazingly widespread. I often have to explain to people that when they look at their CPU utilization and it is at 10% it means "you are throwing money away", not "you are efficient".
That's not really true though, or at least not on all workloads. Much as you are not "throwing money away" by not pegging your car engine in the red zone 100% of the time, you're not throwing money away by not being at 100% CPU all the time. There are other metrics, values and issues to take into account: a pegged CPU on an unresponsive computer is useless for a desktop; a CPU that is pegged because the machine is swapping like mad, and so can't serve requests, is useless for a server; and so is a server at 100% CPU with no load on it, which will just keel over when people start trying to actually interact with it.
It sounds like you missed the point here. If an eight-core server is at 10% utilization, it effectively has a single processor nearly pegged, and the process doing it is thus CPU bound (and maybe serving responses at a high latency) while you have other cores sitting idle. Conserving CPU resources and running under capacity is wise, but has nothing at all to do with this comment.
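The arithmetic behind that, as a tiny sketch. It assumes the reported figure is an average across all cores and that all of the work lands on one core; the function name is made up for illustration.

```python
def single_core_load(aggregate_percent, cores):
    """Load on one core if all of the reported work ran on that core.

    Assumes aggregate_percent is averaged over all cores, so one fully
    pegged core on an 8-core box shows up as 100 / 8 = 12.5%.
    """
    return min(aggregate_percent * cores, 100.0)

# 10% aggregate on 8 cores: one core at 80%, the other seven idle.
load = single_core_load(10, 8)
```

So "10% busy" on eight cores can mean one thread is close to its ceiling, which is the case the parent comment is describing.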
> why is that not a contradiction? because an erlang program isn't "sequential" to start with?
Yes. The point is that in a well-coded Erlang program only the bottlenecks should be sequential (and the bulk should be concurrent). The tool's goal would be (I haven't seen the presentation, so I'm throwing ideas at the wall) to show which dependencies lead to sequences in the system that reduce overall concurrency, leaving it to the developer to fix those parts if possible.
It doesn't try to automatically parallelize a sequential program, and it does not start from fully sequential programs in the first place.
(not saying I agree with Joe's assertions, they're quite inflammatory and at a very fundamental level lack solid evidence. I have to say I prefer his milder tone to this new "rha rha" one, though this one may yield more visibility for the language I fear the drama)
They're different statements. Taking a sequential program and automatically parallelizing it is a very hard problem. What this tool does, as I read it, is simply find sequential parts of code, and it's up to the devs to figure out how to parallelize said code.
they aren't trying to automatically parallelise the program, they are just using the tool as a diagnostic aid to show where the sequential bottlenecks are, and where it would be most productive to rewrite that segment in a more parallel manner. think of it as a next-generation profiler.
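One standard way to see why hunting down the sequential bottlenecks matters is Amdahl's law. Nobody in the thread cites it; it is brought in here purely as an illustration, and the fractions below are made-up inputs, not measurements of any real program.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the runtime parallelizes.

    parallel_fraction: share of single-core runtime that can be spread
    across cores (0.0 to 1.0); the remainder stays sequential.
    """
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# Hypothetical inputs: with 10% of the runtime stuck in sequential code,
# 24 cores give roughly a 7.3x speedup rather than 24x.
bound = amdahl_speedup(0.9, 24)
```

That is the sense in which a tool that points at the sequential 10% is more useful than more cores.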
Have you even used zlib in C++? The largest ecommerce site out there uses zlib in a multithreaded C++ application (24 cores, 100s of threads, 1000s requests/sec/server) and it works just fine! Bet you Erlang can't come within a tenth of the performance of C++...
Yes, because it also does other computation. The point was to illustrate that zlib can be used in a concurrent computing setting with high performance. The blog writer had claimed that zlib doesn't work in a multithreaded setting.
This completely misses the fact that many network services are not compute bound, and multi-tenancy (as we get from "the cloud") lets "legacy" code make very efficient use of CPU resources, even a larger number of relatively slow cores.
In conventional blocking languages, you can get a start on parallelizing your programs this way:
- break program into function calls that match the steps that can happen in parallel
- wrap the function calls in messages passed over the network
+ i.e. process(thing) -> post(thing)/poll_for_things()
- split the sender and receiver into different processes
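The steps above can be sketched roughly as follows. To keep it self-contained and runnable, in-process queues and threads stand in for the network and for separate OS processes, so this shows the shape of the refactoring and is not a distributed implementation. `process`, `receiver` and `run` are hypothetical names.

```python
import queue
import threading

def process(thing):
    """Hypothetical unit of work; stands in for the original function call."""
    return thing * 2

def receiver(inbox, outbox):
    """The poll_for_things() side: handle messages until a None sentinel."""
    while True:
        thing = inbox.get()
        if thing is None:
            break
        outbox.put(process(thing))

def run(things, workers=2):
    """The sender side: post(thing) for each item, then collect the results."""
    inbox, outbox = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=receiver, args=(inbox, outbox))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for thing in things:  # post(thing) instead of calling process(thing)
        inbox.put(thing)
    for _ in threads:     # one sentinel per receiver so each one exits
        inbox.put(None)
    for t in threads:
        t.join()
    # Completion order depends on scheduling, so sort for a stable result.
    return sorted(outbox.get() for _ in things)
```

Swapping the queues for sockets or a message broker, and the threads for processes, is where the real work (and the different set of tradeoffs mentioned below) comes in.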
OF COURSE there are big advantages to using a language (Erlang) or a heavyweight framework (map/reduce) designed for concurrency. Rolling your own process-centric concurrency is a different set of tradeoffs, not a panacea. But it's worth considering for some problems.
To say that Erlang fails to deliver what most programmers need misses the point. If you have a mainstream problem, use a mainstream language!
I've spent many years developing and reviewing products in the telecoms realm, and have found that failing to realize when something like Erlang brings life-saving concepts to your project may well make the difference between delivering on time and disappearing into a black hole of endless complexity. It's not for everyone, but when it fits, boy does it help!
Joe Armstrong: "The problem that the rest of the world is solving is how to parallelise legacy code."
Donald Knuth: "During the past 50 years, I’ve written well over a thousand programs, many of which have substantial size. I can’t think of even five of those programs that would have been enhanced noticeably by parallelism or multithreading."
All true. To add: we are in a local optimum where we have a lot of fast non-parallel solutions, and the parallel way needs >10x parallelism. We'll see if we ever get to ubiquitous 100-way parallelism with no need for backward compatibility.
People have been thinking this, that it's vastly better to design for concurrency upfront, for literally decades. And every single time there has been a big sea change in processor technology it's always been the next generation which will see things like VLIW or Erlang and so forth come to the fore while what I will call "iterative advancements" and "patched solutions" turn out to have too many weaknesses to be competitive. In reality the reverse has happened, and new specialized languages and instruction sets have been relegated to niches.
It'll be the same over the next 20 years as well.
I predict that we'll see a lot of technological leaps which will serve as much to maintain the ability to run "old code" in new and interesting ways as to enable a brave new world of purpose-built languages.
In the next few decades we'll see advances in micro-chip fabrication and design as well as memory and storage technology (such as memristors) which will result in even handheld battery powered devices having vastly more processing power than high-end workstations do today.
Is that an environment in which one seeks to trade programmer effort and training in order to squeeze out the maximum possible efficiency from hugely abundant resources? Seems unlikely to me, to be honest.
Indeed, it seems like the trend of relying on even bloatier languages (like Java) will continue. Do you think anyone is going to seriously consider rewriting the code for a self-service point-of-sale terminal in Erlang in order to improve performance? That's not the long pole, it never has been, and it's becoming a shorter and shorter pole over time.
In the future we'll be drowning in processor cycles. The most important factor will very much not be figuring out how to use them most efficiently, it'll be figuring out how to maximize the value of programmer time and figuring out how to use any amount of cycles to provide value to customers effectively.
(I think that advancements in core, fundamental language design and efficiency will happen and take hold in the industry, but mostly via back-door means and blue sky research, rather than being forced into it through some impending limitation due to architecture.)
Can I rephrase slightly (and take the odd strawman liberty)
The "mainstream" has been relying on incremental improvements for decades, and in doing so avoided rewriting legacy code until last possible moment
Some people have taken concurrency upfront and anecdotally seen cost / performance benefits plus more modern codebases and have anecdotally enjoyed competitive advantages in areas where concurrency makes a difference
We will never see the average user interface bother with concurrency and legacy rewrites because the competitive advantages are low.
There are likely to be areas where the concurrency advantage is great enough - if you like erlang look for those niches
It's like designing a race car, or a fighter jet. Sure, they are amazing things. But are people ever going to commute to work in anything resembling a Bugatti Veyron or an F-22? Of course not. Neither maximum automotive performance nor air combat effectiveness are the sorts of things that are normally necessary to optimize for in daily life. Some time in the far future we're going to have both the tools to write amazingly efficient programs and to do so with a minimal amount of fuss from the programmer's perspective, but it'll be a long time getting there. And in the meantime there are going to be plenty of cycles of figuring out how to produce performance gains with the least disruption to existing ways of doing things.
Puuuuhhhleeeeaaaseeee can I commute to work in a Bugatti Veyron??
Please please please :-)
Edit: sorry, unable to resist. However, I am on Joe Armstrong's side - I would far rather make a decent living doing fun Erlang work than be in a Java shop making the next generation of POS.
Added to that, I think not using Erlang or some STM-based concurrency language must be an informed decision - if the CTO of a big bank says "we have tried two pilot projects rewriting the ATM network in Erlang and the projected costs do not add up", fine. If he says "I have two hundred Java coders, we aren't moving", I don't think that's valid.
Of course, most people would like to be working on race cars, or spacecraft, or fighter jets, but that just isn't an option for everybody. And it's not as though there's no in-between. The choice isn't just between some soul-sucking blub-job in the enterprise trenches or using Erlang; there are lots of languages, lots of development patterns, lots of products.
I would agree, but with the proviso that the spectrum between soul-sucking and cool-space-tech is not a nice linear graph - in my experience it's step gradients: some companies are entirely on one level, and then they have to make a real effort to climb to the next (e.g. from manual deploys to CI).
It's actually a consultancy opportunity (I hope :-0)
This blog post shows everything that is wrong with languages like Lisp and Erlang. This is total disregard for what the rest of the world considers valuable to them.
The problem with these languages remains unchanged. The syntax is so strange and esoteric that learning and doing even basic things with them will likely require months of learning and practice. This lone fact will make it impractical for 99% of all programmers in the world.
No serious company, until it's absolutely unavoidable (and the situation gets completely unworkable without it), will ever use a language like Erlang or Lisp. Because everyone knows the number of skilled people in the market who know Erlang is close to zero. And those who can work for you are going to be crazy expensive. Not to mention the nightmare of maintaining code in this kind of language for years. There is no friendly documentation or easy way for an ordinary programmer to learn these languages. And nowhere near the level of reusable solutions is available for these languages as for mainstream C-based languages.
In short, using these languages attracts massive maintenance nightmares.
The concurrency/parallelisation problem today is very similar to what memory management was in the 80's and 90's. Programmers hate to do it themselves. These are the sort of things that the underlying technologies (compilers/VMs) are supposed to do for us.
I bet most of these super power languages will watch other pragmatic languages like Perl/Python/Ruby/Php etc eat their lunch over the next decade or so when they figure out more pragmatic means of achieving these goals.
>> I bet most of these super power languages will watch other pragmatic languages like Perl/Python/Ruby/Php etc eat their lunch over the next decade or so when they figure out more pragmatic means of achieving these goals.
You know, Lisp's syntax is weird, but it is exactly this that makes it so flexible. It's easy to manipulate code as data, because the syntax is very regular. Try to do that with C's syntax...
So, unless someone knows how to solve this in an easy way, I'd say that the lots of parentheses are actually a pragmatic decision (i.e. you want easy macros... so you have to use this uncommon syntax).
If popularity is the goal, then maybe those languages were not pragmatic. However, it seems the designers of such powerful languages (e.g. Lisp, Erlang, Haskell) were looking to solve other problems where popularity is really not a concern.
To write DSLs? You cannot use JSON to create new syntax for your language. The whole idea of DSLs is to extend the language for the problem at hand. How would you do that with JSON? Say... how would you write something like CLOS, for instance, using the alternative mentioned by you?
Maybe your option is good enough for a lot of use cases. But what about when it is not good enough? Then you're stuck and there's nothing you can do except wait for the language designers to release a new version of your language with, hopefully, the changes you need.
>To write DSLs? You cannot use JSON to create new syntax for your language. The whole idea of DSLs is to extend the language for the problem at hand.
I'm not that sold on DSLs. I have a language (the base language) that people know, has certain semantics, etc.
Now I suddenly go on and add a new mini-language on top of it, with my ad-hoc semantics for the "problem domain"? Why multiply the languages used, so that someone would have to reason and understand both, instead of just one?
I could just use the functionality of the base language, AND its syntax/semantics, to model the problem. I.e. with objects in an OO design, with functions in a procedural design, with data and first-class functions in a functional design, etc.
I don't really like all those Ruby DSLs, for example, like for testing, where you have to learn each one ON TOP of knowing the core language.
>This blog post shows everything that is wrong with languages like Lisp and Erlang. This is total disregard for what the rest of the world considers valuable to them.
The problem with these languages remains unchanged. The syntax is so strange and esoteric that learning and doing even basic things with them will likely require months of learning and practice.
I can't speak for Erlang, but Lisp, really? Strange syntax? It's about as straightforward as it gets. And you can learn it in a day, a week tops.
>No serious company, until it's absolutely unavoidable (and the situation gets completely unworkable without it), will ever use a language like Erlang or Lisp.
Lots of serious companies used both. Lisp was widely used in academia and in places like the JPL. AutoCAD worked with Lisp. Heck, even more obscure languages like OCaml are widely used in the financial domain. And today, lots of startups use Clojure. This very site (HN) is made in a Lisp.
Sure, using LISP has some drawbacks and is avoided by the mainstream enterprises today, because of lack of developers and commercial support (compared to C, Java, .NET etc).
>I bet most of these super power languages will watch other pragmatic languages like Perl/Python/Ruby/Php etc eat their lunch over the next decade or so when they figure out more pragmatic means of achieving these goals.
I'm sorry to break it to you, but Perl, Python, PHP and even Ruby have already peaked. They are not going anywhere (upwards) in the next decade or so.
This is total disregard for what the rest of the world considers valuable to them.
Yes and no. Where it's true, sure, arrogant jerks.
But the Lisp weenies have realized they know something the rest of the world doesn't. They think it's important, and it is in a sense. If only you knew that... you'd agree. (Ok, try not to take that too seriously.)
Unlike the impossibly abstract Lisp Truth™, this Erlang bit is centered in a very concrete fact that affects all of us. If you don't care about it today, it will affect you tomorrow all the same. You might as well argue that you never much cared for oxygen and who cares if the atmosphere is slowly turning to methane?
Joe wasn't saying you have to write Erlang, he was saying you need to write concurrent programs. If another language eats his lunch, it will probably do it in the same way that would be done in Erlang. There are alternatives, but the actor model is by far the most programmer-friendly that I have ever seen.
You sound like a typical "manager". I have heard this comment millions of times from management people. These so-called pragmatic languages are good for building "applications", which are just a bunch of API calls bound together, and most of the so-called programmers are building "applications"; they have no idea how to build real "systems" which are distributed, robust, etc. Try to build "systems" in your so-called pragmatic language and you will find out what I mean.
>> All software is building on top of something. Ever heard of system calls?
Yup, heard about them. My point is the focus of these "applications" is meshing API calls rather than algorithms and data structures.
>> Maybe you, being such a great programmer, can show us how to build a "real system" without ever using an API of anything ever.
I ain't a great programmer at all, far far away from that. As far as a "real system" is concerned, what about IBM Watson?
>> Something like 99.99% of the world does that. Erlang is not even in the list of the top 10 languages in the world today.
I wasn't talking about Erlang specifically. I was pointing to the so-called niche languages that you mentioned, which very few people use.
Anyway, there is no point discussing this because we have different points of view. Mine is "what should be done to further progress computer science - the progress is too slow", yours being "let's build applications and earn some bucks".