BEAM languages, Hindley–Milner type systems, and new technologies (medium.com)
202 points by pyotrgalois on Aug 8, 2015 | 86 comments



Nice article; for an Elixir fan, it provides a nice little snippet on something I've been having issues with but hadn't really put my finger on until I saw it:

"[...] and I really dislike that Elixir tries to hide immutability. That does make it slightly easier for beginners, but it’s a leaky abstraction. The immutability eventually bleeds through and then you have to think about it."

I don't think it necessarily tries to hide it (at all), but it does have some instances where something feels like a mutable structure. Those can be, at least for me, a bit confusing to reason about if you're expecting things to both be and look immutable.

I suppose now that I know exactly what's weird, I should just go dig through the code and figure it out. Problem solved?

... One other thing, because I see this in the comments already: BEAM isn't the tool for every job, but for some jobs it is the only tool that does them well. Is the JVM faster at general tasks? Hell yes, but that's not the point; it's not even why BEAM is around.

It's about:

* Small concurrent workloads. Really long-running, CPU-intensive tasks aren't going to be good.

* Low latency. Not just low, but with a very, very small standard deviation. Your application's performance will be consistent.

* Fault tolerance.

The list goes on, and here's a nice summary of it (both bad and good):

http://blog.troutwine.us/2013/07/10/choose_erlang.html

There are times when I choose the JVM, and there are times when I choose BEAM or MRI. I just try to choose the right tool for the job, but some tools make some jobs very difficult.

cough ruby cough concurrency cough

Edit: One thing for people not familiar with BEAM, a "process" is not a Unix process, from the Elixir documentation:

"Processes in Elixir are extremely lightweight in terms of memory and CPU (unlike threads in many other programming languages). Because of this, it is not uncommon to have tens or even hundreds of thousands of processes running simultaneously."


Then again, there are NIF libraries with threads for those tasks which are long running and require computational performance. Last I checked, all the really fast math libraries were written in C/Fortran/C++, not Java.


It's surprising that BEAM's support for operations and management is so rarely mentioned. To me this is the key selling point for using BEAM vs. the JVM or something else.

Being able to open a remote console and do system introspection/tracing/profiling/debugging is a huge advantage when running in production. And all languages running on top of BEAM of course get this for free.

In my experience, running JVM in production with tools like JProfiler/VisualVM/jconsole, etc. does not come close to the BEAM when trying to understand what is happening in the system.


> In my experience, running JVM in production with tools like JProfiler/VisualVM/jconsole, etc. does not come close to the BEAM when trying to understand what is happening in the system.

Then you haven't tried Java Flight Recorder/Mission Control or the new javosize. BEAM doesn't come close... :)


I hadn't heard of javosize before, looks interesting, thanks. Being able to update code and data on a live system is very useful and I haven't seen that for the JVM before (BEAM of course handles that :) ).


Well, there are a lot of JVM tools to inject code into a running application. Take a look at Byteman (which is much more mature than javosize, but mostly targets injecting traces for live debugging purposes).


"It’s not going to be too much longer before we declaratively describe our systems as well as our code. I am looking forward to that."

Amen! Been doing that to the extent possible for a while and it is terrific!


It's been done before to varying degrees. The original work in automatic programming took input from CASE tools and autogenerated a lot of COBOL. Sun's DASL language from the ACE project was a domain-specific language for specifying a type of web application: around 9-10 kloc of it autogenerated 100+ kloc of XML, client code, server code, etc. There were lots of them in the 4GL category for database manipulation, with WINDEV/WEBDEV being more general-purpose yet still requiring coding in a BASIC-like language.

So, it's not far-fetched. It will likely be a series of DSLs like the above, or iMatix's model-driven development approach. These would specify the system at a high level with precise requirements and constraints. Then, planning software with heuristics would produce the code. Similar systems for integration. Several people's worth of work, or 10-20 tools, becomes one person with one set of tools. I doubt we'll replace the person or the need for some programming tools.


I definitely agree that we'll always need people who think like programmers. We can develop tools in the vein of those you mention to significantly enhance the productivity of those people though. I haven't seen many tools like that that help in distributed systems or that allow one to easily visualize and understand an entire system.


That's hard. Good news for you is that there's plenty of ongoing work on toolkits and methods for doing that. All alpha quality for now. We'll just have to wait.


Here's a tool that you can play with to see how well Elixir scales with an embarrassingly parallel task (matrix multiplication) when throwing more CPU cores at it: https://github.com/a115/exmatrix


Yes, Elixir does well here, but I still prefer Lisp syntax. I would like to see a comparison of LFE and Joxa; Joxa seems more like Clojure. This presentation is a good one, but I'd like to see a nuts-and-bolts comparison with side-by-side code:

http://www.slideshare.net/BrianTroutwine1/erlang-lfe-elixir-...


BEAM is a very nice VM (albeit rather slow compared to HotSpot or V8), but I don't understand why every mention of BEAM has to spread misconceptions about the JVM:

> In many systems, Java included, the Garbage Collector (GC) must examine the entire heap in order to collect all the garbage. There are optimizations to this, like using Generations in a Generational GC, but those optimizations are still just optimizations for walking the entire heap. BEAM takes a different approach, leveraging the actor model on which it is based: If a process hasn’t been run, it doesn’t need to be collected. If a process has run, but ended before the next GC run, it doesn’t need to be collected

Well, how does BEAM know which process ran (so that its garbage should be collected)? Bookkeeping, of course, and that is also "just an optimization". Similarly, if a JVM object hasn't been touched since the last collection -- it doesn't need to be examined.

> If, in the end, the process does need to be collected, only that single process needs to be stopped while collection occurs

And new HotSpot GCs rarely stop threads at all for more than a few milliseconds (well, depending on the generation; it's complicated), collecting garbage concurrently with the running application, and other JVMs have GCs that never ever stop any thread for more than 20us or so (that's microseconds).

While BEAM's design helps it achieve good(ish) results while staying simple, the fact is that the effort that's gone into HotSpot gets it better results for even more general programs (collecting concurrent, shared data structures, like ETS, too).

I've said it before and I'll say it again: Erlang is a brilliant, top-notch language, which deserves a top-notch VM, and the resources Erlang/BEAM currently have behind them are far too few for such a great language. Erlang's place is on the JVM. JVMs are used for many, many more soft-realtime (and hard-realtime) systems than BEAM, and yield much better performance.

An implementation of Erlang on the JVM (Erjang), done mostly by one person, was able to beat Erlang on BEAM in quite a few benchmarks, and that was without the new GCs, the new (or much improved) work-stealing scheduler, and the new groundbreaking JIT (which works extremely well for dynamically-typed languages[1]).

OpenJDK could free Erlang programs from having to write performance-sensitive code in C (so many Erlang projects are actually mixed Erlang-C projects). While Erlang can be very proud of how much it's been able to achieve with so little, instead of fighting the JVM (or, rather, JVMs), it should embrace it. Everyone would benefit.

[1]: https://twitter.com/chrisgseaton/status/586527623163023362 , https://twitter.com/chrisgseaton/status/619885182104043520


The thing I don't get about Erlang and BEAM is the idea that having lots of little processes means that your program will scale brilliantly to run in parallel.

Programming Erlang (authored by the creator of Erlang) says without any qualification at all that "Concurrent programs are made from small independent processes. Because of this, we can easily scale the system by increasing the number of processes and adding more CPUs."

When I read that I was expecting it to be followed by "ha ha... not really, because of algorithmic sequential dependencies and Amdahl's Law of course!" but it isn't!

You can have an infinite number of processes but if the dataflow graph they form doesn't have any parallelism then Erlang and BEAM aren't likely to be able to work any magic to make them so. Even if it did have parallelism it is only going to have so much and you certainly won't be able to arbitrarily scale it beyond that by increasing the number of processes.
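For what it's worth, Amdahl's Law makes that ceiling easy to compute: if a fraction p of the work is parallelizable, n cores give at most 1 / ((1 - p) + p/n) speedup. A quick back-of-the-envelope sketch (illustrative Java, nothing Erlang-specific):

```java
// Illustrative only: Amdahl's Law speedup bound.
public class Amdahl {
    // p = parallelizable fraction of the work, n = number of cores
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    public static void main(String[] args) {
        // Even with 95% of the work parallelizable, no number of
        // cores can beat 1/0.05 = 20x: the serial 5% dominates.
        System.out.printf("p=0.95, n=8:    %.2fx%n", speedup(0.95, 8));
        System.out.printf("p=0.95, n=1000: %.2fx%n", speedup(0.95, 1000));
    }
}
```

So adding processes only helps up to the limit set by the serial fraction, which is exactly the qualification missing from the book's claim.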

What's more, the typical advice about mutable shared state in Erlang is to encapsulate it safely in a process, which seems to me a recipe for further serialisation and so a crazy thing to promote!


Everything you are saying is technically correct. The issue is that Erlang is trying to solve a different problem than you are describing. It sounds like you are hoping to perform some large but single task and are disappointed that Erlang can't defeat the Amdahl limitations inherent in your task. That's not Erlang's goal.

Erlang's goal is to take problems that are embarrassingly parallel in theory and make them embarrassingly parallel in practice. Serving a billion independent http requests in a distributed, parallel manner can technically be done in Java or C or assembly. But, it's very hard to do well and very easy to screw up in painful, confusing, life-wasting ways. Erlang makes it much easier to do well and much harder to screw up.


> Erlang makes it much easier to do well and much harder to screw up.

That's a feature of the language, not the VM (compare with Clojure, which does a similar thing on the JVM). You could still do all that on a higher-quality VM (simply because the effort put into it is orders of magnitude more than into BEAM; not because OpenJDK's people are smarter or anything).


> That's a feature of the language, not the VM

Lies.

One way to look at the Erlang SMP VM is basically as a load balancer. Erlang automatically migrates processes between cores for maximum concurrent efficiency. It's basically coordinating N "tiny" Erlang VMs across all your cores and knows how to, ideally, optimally place your workload. You can even constrain the behavior to a per-core and per-scheduler level with VM options—not language options. See the +S and +SP and +SDcpu and +SDPcpu and +SDio and +sct options at http://www.erlang.org/doc/man/erl.html

There are tradeoffs between being the best language for a task and being the fastest language for a task. The more work you can move into the VM, the less you have to do as an application programmer, but potentially the slower your program may go, since the VM has to discover or introspect your actions instead of being told explicitly.

Our bottlenecks these days are programmer time and programmer thought correctness. Generating more work for programmers by making them write lower level code isn't the way forward even if the more work is slightly faster.

All that being said, everything has a price. Obviously never do numeric computing work in regular Python. Grab you some numpy or GPU frameworks. In the same vein, never do massively concurrent programming without Erlang or without a highly optimized event loop (but with an event loop you're limited back to one core, and on modern 48+ core systems, that's kinda pathetic).


> Erlang automatically migrates processes between cores for maximum concurrent efficiency. It's basically coordinating N "tiny" Erlang VMs across all your cores and knows how to, ideally, optimally place your workload. You can even constrain the behavior to a per-core and per-scheduler level with VM options—not language options. See the +S and +SP and +SDcpu and +SDPcpu and +SDio and +sct options at http://www.erlang.org/doc/man/erl.html

That's amazing, except that that's the work of Erlang's work-stealing scheduler, and as it happens, the JDK currently has the best work-stealing scheduler around.

> Our bottlenecks these days are programmer time and programmer thought correctness. Generating more work for programmers by making them write lower level code isn't the way forward even if the more work is slightly faster.

Why more work? I am for Erlang. Keep using Erlang. Just run it on the JVM. It's the same work with better results.


If I could get the JVM's JIT & serial GC performance combined with the BEAM's trivial-cost threads & thread-segregated GC, it would be sweet indeed.


> serial GC performance

HotSpot hardly ever uses a serial GC anymore. It's now parallel or parallel and concurrent.

> thread-segregated GC

You don't really want that if a shared-heap GC can buy you better performance because it's more mature and saves you all the copying.

> BEAM's trivial-cost threads

You can have that on the JVM.


That would be a nice combination.


> but it isn't!

Chapter 26 of Programming Erlang, 2nd Edition, Programming Multicore CPUs, quite explicitly notes the problem of avoiding sequential bottlenecks, and even devotes an entire exercise to parallelizing a sequential program.


> brilliantly to run in parallel.

I just want to clarify that it's concurrent, not parallel.

Erlang doesn't promise parallel. You can get parallel from concurrent but not the other way around; once again, Erlang only enables concurrency, and you may get parallelism because of that concurrency.


I can keep quoting from the same book:

> Here’s the good news: your Erlang program might run n times faster on an n-core processor—without any changes to the program.

Sounds hopeful, and they've qualified it with "might", which is good.

> But you have to follow a simple set of rules. If you want your application to run faster on a multicore CPU, you’ll have to make sure that it has lots of processes, that the processes don’t interfere with each other, and that you have no sequential bottlenecks in your program.

Oh right, so as long as I have no dataflow dependencies it'll scale easily - but that's true for any language. The problems we have are when there are dependencies - and Erlang doesn't have a good solution for that in my opinion.

> Even if your program started as a gigantic sequential program, several simple changes to the program will parallelize it.

Several simple changes can parallelize an arbitrary sequential program? That's amazingly strong and obviously incorrect.

Also, "you can get parallel from concurrent but not the other way around": that's not true! Vector instruction sets allow parallelism but not concurrency.


That's not true for any language. It depends heavily on how that language is interpreted or compiled to machine code. A number of languages support threads but have, e.g., GILs. We have to judge each on a case-by-case basis. Both the Erlang language and BEAM were designed specifically to support this. Here are some details for you in a JVM comparison:

http://ds.cs.ut.ee/courses/course-files/To303nis%20Pool%20.p...


Unfortunately, that comparison lacks a lot of pertinent information on the JVM, like new GCs, new schedulers, new, better JITs and various lightweight-thread implementations (it does mention Quasar, but doesn't understand that it works just like BEAM). BEAM also instruments your code, and in BEAM you can also accidentally block an entire kernel thread by calling a library not written in Erlang: just as you would if you were running Erlang on the JVM.

Like I said in another comment, BEAM's Erlang specificity does not mean that it's the best VM for Erlang; all data points to HotSpot (today) being a much better Erlang VM. It just means that a reasonable VM for Erlang could be developed with relatively little effort.


I think you're a little too excited about JVMs to see the point of the comment. The parent comment included a statement that code in any language can scale without dataflow dependencies. I said it actually depends on the implementation of that language, as some don't realize that potential or even defeat it entirely. I gave an example with the GIL, followed by examples in BEAM and the JVM where implementation mattered.

As far as JVM vs. BEAM goes, it's actually orthogonal to my comment, as it would only support that implementation decisions matter for scalability on top of a language's inherent traits. You've been arguing that yourself, except on the other side.


Per your last comment - that is what he was saying. Vector instruction sets allow parallel, which does not imply concurrency.


Ah yes you're right sorry. I was thinking without reading carefully that he was saying you can't have parallelism without concurrency.


> The thing I don't get about Erlang and BEAM is the idea that having lots of little processes means that your program will scale brilliantly to run in parallel.

The point is scaling. Think in terms of request rate. If you know you can have millions of processes per machine and they run well in parallel, then you can handle requests with processes and stop worrying.


Sure, but you can have those millions of processes on the JVM, too, and scale better because of better access to shared data than ETS.


Erlang generally encourages shared-nothing architectures. Of course, in some cases you want regions of shared memory or some other concurrent global resource, hence ETS. I see nothing wrong with ETS, it's well optimized for the Erlang term format in particular and gives you serializable updates.

Scalability and large actor counts aren't the definitive features of Erlang, though. It's supervision trees, the distribution protocol, the OTP framework, the primacy of tuples and lists as your main and highly flexible data types, a great pattern matching engine for binary formats and regular Erlang terms alike, module-level hotswapping, a crash-only programming model, the ability to have external programs benefit from Erlang semantics via external nodes and ports, so on and so forth.

Yes, not all of this is thanks to the VM in and of itself. A lot of it is runtime and language features.

But it's already there in a cohesive whole. There is absolutely no reason to switch to the JVM when the EVM is a beast of its own.


> Erlang generally encourages shared-nothing architectures.

This is the language feature that `pron` keeps mentioning. Nothing about the VM is especially better for this than the JVM for instance.

> I see nothing wrong with ETS, it's well optimized for the Erlang term format in particular and gives you serializable updates.

There isn't anything wrong with it (as a complete neophyte to ETS and the EVM generally), the question is how much better it could be if it was on one of the several first rate JVMs that get so much more resources poured into them. Sharing data concurrently is precisely what the JVM is good at (especially at very large data set size). So in the cases where you need to use something like ETS, there is a lot of potential for improvement on a JVM vs EVM.

> But it's already there in a cohesive whole. There is absolutely no reason to switch to the JVM when the EVM is a beast of its own.

I don't want to speak for `pron` but I suspect what he is getting at is, the combination of the Erlang full story on the JVM would be a phenomenal bit of tech and it would be much easier (and more likely) for the Erlang bits to get ported to the JVM than it would be to bring the EVM up to the standard of any of the best JVMs.


It would be a phenomenal bit of tech anywhere. We had to go with Erlang for one piece of our product at Plum precisely because there is no analog story in Haskell to the Erlang full story. I would have loved to be able to build that specific piece in Haskell but it made little sense when considering what Erlang/OTP provides.

It's nice to say that the JVM has more resources and is better at XYZ while the BEAM VM is only better at ABC, and that therefore the Erlang full story should be on the JVM to reap the benefits of both; however, I think that would be unhealthy for Erlang. Different VMs present specialized focuses, and I think the areas that BEAM is lacking in can be tackled and brought up to parity instead of homogenizing the VM field and adding to the kitchen sink that the JVM already is.


> I think the areas that BEAM is lacking in can be tackled and brought up to parity

Probably. But why not spend that effort on the language and libraries?

> instead of homogenizing the VM-field

That makes as much sense as saying your language shouldn't run on the kitchen-sink Linux so as not to homogenize the OS field.


> Probably. But why not spend that effort on the language and libraries?

Because the best Erlang has to offer isn't really in the language. Some is in the libraries. Most is in its VM. I would pick Haskell or ATS over Erlang, unless I need those few unique features that Erlang/OTP really got right.

> That makes as much sense as saying your language shouldn't run on the kitchen-sink Linux so as not to homogenize the OS field.

No, your analogy is moving the goalposts. No one ever said someone couldn't implement Erlang the language for the JVM. As many people have pointed out to you, it isn't the language that makes Erlang; it's BEAM + OTP. We're not talking about moving the language around, we're talking about gutting the VM, and my statement still holds: exalting the JVM to be the one true VM for Erlang (and, by implication, for any language you don't understand well that needs a VM) is a very bad idea and pretty silly.

Diversity is good. BEAM's VM is good. The JVM is good. Even the CLR is pretty amazing (F# beats the pants off of Scala). There's no reason at all to think that Erlang would be better off on the JVM; however, borrowing successful ideas from other awesome and successful technologies? I think that's a swell path to walk. Beware of the kitchen sink, though, is my only warning.

Also, I don't always think OS' are the best place to run your application. There are many arguments for using something like Erlang on Xen or HaLVM if the design requirements can justify it but arguing about your other critical statement should be a different thread.


Good points. Monoculture hurts us. BEAM people could be putting more effort into copying improvements from competitors and academia. That would get BEAM performance way up there. Right now, they seem to focus more on other things, with BEAM performing well enough for its users. With the diversity benefit on the side.


You would be surprised, actually. There's quite a bit of work going into BEAMJIT, an LLVM-based JIT for Erlang that far exceeds HiPE. I won't argue with you, though, that the VM could use improvements gleaned from the last ten to fifteen years of implementation and research.


To me, it really just seems like a difference in labor and time. BEAM doesn't have nearly as much investment of brainpower going into it. It's also been mainstream for much less time. There's less corporate R&D working on it. So, we'll see things progress more slowly, and it will be behind in various ways until some of this changes.

True for most stuff out there that's not the No. 1 or No. 2 choice for mainstream developers. Yet results such as HipHop and PyPy show vast improvement can be made when even one company puts a lot of effort into something. A combination of academics applying to Erlang/BEAM the best in FP compilation and the best in VM architecture might bring similarly dramatic improvements. A good precedent is how I saw the Racket Scheme team knock out significant weaknesses in their compiler practically as they were posted on forums. I'm not saying it always happens, but that sort of thing in the Erlang space would be interesting.


Just to clarify, when I say Erlang should run on the JVM, I obviously mean Erlang + OTP.


> There is absolutely no reason to switch to the JVM when the EVM is a beast of its own.

I think there is, if you want to concentrate your limited resources on the language and its phenomenal libraries while letting an enormous team working on the world’s second-largest open-source project take care of the VM for you, while at the same time giving you better performance and a wider reach. There are many more organizations that would adopt Erlang if it were on the JVM.


Completely ignoring the massive resources it would take to make the switch in the first place.

Erjang doesn't cut it. It's an incomplete research project that works on the basis of bytecode translation. Further, the disadvantages with regards to global GC are clearly listed. You say that it'll only keep improving, but that's essentially taking a leap of faith that the JVM developers will eventually get to parity with a feature you already have.

"Limited resources" is a red herring and FUD, plain and simple. Nor is "wider reach" guaranteed in the slightest. Wider reach is not intrinsically a good thing, either. Organizations for whom Erlang is out of reach simply because it doesn't use the JVM are absolutely petty and there is no loss from them not using it, IMO.


> Completely ignoring the massive resources it would take to make the switch in the first place.

If I'm suggesting it, I obviously believe that the cost/benefit is worthwhile. I don't think the effort required is massive.

> You say that it'll only keep improving

I say that it has already improved enough.

> Organizations for whom Erlang is out of reach simply because it doesn't use the JVM are absolutely petty

Not petty, but rational. Those organizations already have millions of lines of code, and lots of knowledge and experience on the JVM, and the reasons for choosing Erlang aren't compelling enough given the adoption cost. But if you lower those costs...

All I'm suggesting is simply lowering the adoption costs of an up-and-coming, rather niche tech with a small ecosystem (which would also, I'm convinced, considerably improve the tech).


> [...] but you can have those millions of processes on the JVM, too [...]

In theory, yes. In practice, it's very difficult. A JVM thread corresponds to a thread of the host OS. Spawning and keeping those is very slow and expensive compared to Erlang's processes. You could have a pool of pre-spawned workers, but suddenly you can't spawn a worker for each connection and hope it will all work; you need to manage the pool. You also could try to implement green threads in JVM, as they are in Erlang, but you would need an entirely new compiler to insert yield points at appropriate places, or else you would get exactly the same problems with green threads as everywhere else.

On the JVM you just don't spawn a thread for each and every activity, because you would choke your system. The whole point of developing Erlang was to allow exactly this programming style in a manner safe against processing congestion.
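To make the pool-management point concrete, here's a minimal Java sketch (names illustrative): ten thousand tasks get funneled through four OS threads, and sizing that pool is now the programmer's problem rather than the runtime's.

```java
import java.util.concurrent.*;

// Minimal sketch: each java.lang.Thread is an OS thread, so instead of
// one thread per task we queue 10,000 tasks onto a fixed pool of 4.
// The pool size is a tuning knob we now have to manage ourselves.
public class PoolDemo {
    // Run nTasks trivial tasks on nThreads threads; return tasks completed.
    static int run(int nTasks, int nThreads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        CountDownLatch done = new CountDownLatch(nTasks);
        for (int i = 0; i < nTasks; i++) {
            pool.submit(done::countDown);   // cheap stand-in for real work
        }
        done.await();                       // block until every task has run
        pool.shutdown();
        return nTasks;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(10_000, 4) + " tasks on 4 threads");
    }
}
```

Compare with Erlang, where spawning one process per connection is the idiomatic style and the scheduler does the multiplexing for you.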


> you need to manage the pool. You also could try to implement green threads in JVM, as they are in Erlang, but you would need an entirely new compiler to insert yield points at appropriate places.

I am talking about a new compiler. An Erlang compiler. The JVM is a virtual machine. Erlang is a language. We've already proven you can run Erlang on the JVM quite well, and that was before many pertinent improvements to the JVM and its ecosystem.


JVM still lacks some serious functions, namely, links and monitors. I'm not that sure you can emulate those on JVM reliably without implementing them as fundamental operations.

And then, there's still the problem of interoperability. Even when you write your code in Erlang@JVM, it still needs to talk to Java code, which doesn't have yield points.


> JVM still lacks some serious functions, namely, links and monitors. I'm not that sure you can emulate those on JVM reliably without implementing them as fundamental operations.

You don't emulate them; you implement them, just as BEAM does. Having them baked into the runtime serves no purpose. The JVM operates at a much lower level than BEAM: just as BEAM is implemented in C, it could be implemented in Java, except that the really hard parts (JIT and GC) are already taken care of. Think of Java as C + JIT + GC.

> it still needs to talk to Java code, which doesn't have yield points.

That's not a problem. First, Erlang code talks to C code, which doesn't have yield points, either. Second, the JVM doesn't need to rely on yield points as much as BEAM does, because it is much more kernel-thread-friendly than BEAM.


How does Akka stack up in regards to providing these attributes on the JVM?


Akka actors are multiplexed on real JVM threads. So they implement a cooperative model of threading (i.e. you'd better not block for long inside the body of an actor)
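A toy sketch of that multiplexing (all names invented here, not Akka's actual API): each "actor" is just a mailbox drained on a shared pool, which is why a handler that blocks stalls one of the few real threads underneath.

```java
import java.util.Queue;
import java.util.concurrent.*;

// Toy model (invented names, NOT Akka's API): an "actor" is a mailbox
// whose messages are handled on a shared thread pool. With only a few
// real threads underneath, a handler that blocks starves other actors.
public class ToyActor {
    private final Queue<String> mailbox = new ConcurrentLinkedQueue<>();
    private final ExecutorService pool;
    final CopyOnWriteArrayList<String> handled = new CopyOnWriteArrayList<>();

    ToyActor(ExecutorService pool) { this.pool = pool; }

    void send(String msg) {
        mailbox.add(msg);
        pool.submit(() -> {                 // handler runs on a pooled thread
            String m = mailbox.poll();
            if (m != null) handled.add(m);  // "process" the message
        });
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2); // 2 real threads
        ToyActor actor = new ToyActor(pool);
        actor.send("hello");
        actor.send("world");
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(actor.handled.size() + " messages handled");
    }
}
```

BEAM sidesteps the blocking hazard by preempting processes at reduction counts; a cooperative pool like this can't.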


Unless Akka provides a compiler, it doesn't allow much of the style Erlang was developed for.


I'm guessing you're the Quasar guy or am I off? Either way, I think you have benchmarked the lightweight thread approaches on JVM's. How much simultaneous concurrency can the JVM methods manage right now for say serving web requests? And how much does Erlang's best do on same machine?

I think that's an interesting and useful comparison point to start with to test your claim. This is also something I figured Java side would greatly improve on.


No, Erlang does not mean that anything you write will scale. Your solution has to be broken into parallelizable pieces. But the scalability of your solution is only as good as the mechanisms within the language and runtime that allow developers to efficiently create a scalable system. BEAM implements several nice features:

Data is immutable, so we don't have to worry about keeping data coherent between... anything, whether it's two processes or two nodes. New data can be constructed with references to old data without fear that the old data will be modified. So "mutation" is really just new data with a reference to the old, unchanged data. This greatly lowers the churn in creating new data. It also means everything can just pass along (process to process or node to node) what it has, without fear that it will be out of date.
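A tiny sketch of that "new data referencing old data" idea, in Java for illustration (Erlang/Elixir do this for you; here the sharing is made explicit):

```java
// Illustrative immutable cons list: "adding" builds a new head cell
// that points at the old list. The old list is shared, never copied
// or modified, which is what keeps updates cheap and hand-offs safe.
public class Cons {
    final int head;
    final Cons tail;   // reference to the old data, never mutated

    Cons(int head, Cons tail) { this.head = head; this.tail = tail; }

    static Cons push(Cons list, int value) { return new Cons(value, list); }

    public static void main(String[] args) {
        Cons old = push(push(null, 1), 2);   // list [2, 1]
        Cons neu = push(old, 3);             // list [3, 2, 1]
        System.out.println(neu.tail == old); // true: structure is shared
        System.out.println(old.head);        // 2: old list unchanged
    }
}
```

Because nothing can mutate `old` after the fact, it can be handed to another process (or node) with no copying-for-safety and no coherence protocol.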

Everything is defined in modules. Modules define what we would think of in OOP as namespaces, structures, classes/types, and class functions. Importantly, they only define functionality; modules do not have state. Therefore, functions accept some set of inputs, create new data from the inputs (no mutation), and return some output. This makes it very easy to reason about what the code is doing if you keep the modules well defined and reasonably sized. This code can be shared around easily, too: it's got no state and is immutable.

Processes are an abstraction. You can think of them as threads, but they're really just a stack and a little bookkeeping. A BEAM VM will normally run a number of real threads equal to the number of CPUs in the machine. Each real thread will then exclusively pick a process, load the bookkeeping, point itself at the stack, and execute bytecode for a period of time. When done, it will record the changes in the bookkeeping and move to the next process. This is very lightweight, so literally millions can run on a single computer. Because they are self-contained, they're easy to clean up. Processes also expose a standard set of interfaces for communication, a pub/sub system. Again, immutable messages are sent back and forth, so it doesn't matter if it's the same node or not.

Finally, everything is abstracted to the notion of nodes within a cluster. By default, anything you execute runs on the local node, but you can specify otherwise. I can execute a module call on another machine or spawn a new process on another machine. It just means a little more information in the call, but it's the exact same concept programmatically. Also, it's possible to group processes into named services. You can call a named service and it will know which processes to contact. It's a very low barrier to entry to parallelize your code if you just write it that way.
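A sketch of the "same call, plus a node name" idea. In a real cluster the target would be another machine (e.g. a hypothetical :"b@host"); here the snippet targets its own node so it runs standalone:

```elixir
parent = self()
target = Node.self()  # in a cluster this could be, say, :"b@host" (made up)

# Spawning on another machine is the same concept, with the node in the call:
Node.spawn(target, fn -> send(parent, {:ran_on, Node.self()}) end)

where =
  receive do
    {:ran_on, node} -> node
  after
    1_000 -> :timeout
  end

IO.inspect(where)  # :nonode@nohost when not running distributed
```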

When you start thinking in terms of how to structure your code for BEAM, you inherently get easy access to scalability.


> Data is immutable...

But you don't need that at the VM level. Clojure does that on the JVM. Having that at the VM level makes a simple GC work reasonably well, but HotSpot has world-class GCs that perform better, even without the assumption of immutability.

> Everything is defined in modules.

Again, that's a language-level feature.

> Processes are an abstraction

You can get that on the JVM, too.

> Finally, everything is abstracted to the notion of nodes within a cluster.

That's the runtime library's concern. Not the VM's.

> When you start thinking in terms of how to structure your code for BEAM, you inherently get easy access to scalability.

All of that is great, but implementing those features at the language/library level and harnessing HotSpot's power would give you that same easy access to even greater scalability.


Hey man, some people (like me) just don't like to work on the JVM, despite its advantages and superior features like the GC. Just accept it.

Having worked with Java, Scala, JRuby, and Clojure, something is always clumsy, be it interfacing with Java cruft, slow startup times of the VM, Maven & co... While there are workarounds for these problems, it's just annoying for me. I get what you say, but nevertheless, JVM (and .NET) based things are nothing I'd use (unless I'm forced to).


> Hey man, some people (like me) just don't like to work on the JVM, despite it's advantages and superior features like the GC. Just accept it.

I accept it, but the fact of the matter is that -- like it or not -- there are at least two orders of magnitude more people who use the JVM than BEAM. You're comparing the world's most popular runtime with a runtime that's not even in the top-ten.

Some of your complaints stem from exactly that difference -- the JVM is designed to operate at much higher workloads than BEAM, and people use that -- hence Clojure's slow startup etc. (the JVM itself starts up in < 80ms, BTW). But, again, your observations don't change the fact that if Erlang stays on BEAM it will forever be a niche language.


Becoming ultra popular isn't an advantage at all. Look how much crap gets produced with JS or PHP: tons of crappy libraries swallow the few good ones, people write throwaway code like there is no tomorrow, and real (tm) developers have to maintain this mess. And yes, I have seen code in my day job written in Java or Scala that was nearly impossible to even understand. Much code.

Again, it's obvious that the JVM is dramatically more in use than BEAM, but I like BEAM so far. Up to now people only stumbled upon Erlang when they actually needed it; now with Elixir & co a few others are discovering that BEAM might be exactly what they need. This shows in the ecosystem, and the #elixir community is by far the nicest I have met so far.

Just personal experience, yours might differ (obviously). Just to reiterate: becoming too popular nearly always results in garbage for everyone. Just look how much stuff gets crammed into JS nowadays.


The point of Erlang isn't beating micro benchmarks. Everyone and his dog knows that other techs have better raw performance.

The ease of scaling across machines, fault tolerance, and low latency variation are the more typical selling points. Besides that, god forbid Erlang become just-another-JVM-language; I embrace competition.


> The ease of scaling across machines

What does that have to do with the VM implementation?

> fault tolerance

True, that is a good selling point -- in theory. Indeed, BEAM's process isolation is better than the JVM's on paper. In practice, so many Erlang systems have so much C in them (because Erlang isn't fast enough for the data plane), that they can still bring down the entire VM (not as if there aren't other ways of doing that even without native code), or they interfere with one another in other ways because of BEAM's poor support for shared concurrent data structures.

> low latency variation

Nothing that can't be achieved on the JVM. Much of the low-latency Erlang enjoys is because relatively little data is kept on the Erlang heap anyway, and whatever significant amount of data is kept on the Erlang heap, it's in non-GCed ETS. If that's your way of achieving low latency variation, Erlang can do better on HotSpot.

> Besides that, god prevent erlang to become just-another-JVM-language, I embrace competition.

If your goal is not to have the best language environment you can but to show the world you have impressive results for the effort you've put in, then that's a whole other discussion.

And if all you want is competition, you can have Erlang on BEAM and the JVM. Why tie the language to one VM? Many JVM languages compile to JavaScript, too (Clojure, Kotlin, Scala, Fantom, and probably more).


Technically the distribution protocol is a property of the runtime system, but it does cooperate with the VM for details like term serialization.

True, that is a selling point, except that so many Erlang systems have so much C in them (because Erlang isn't fast enough for the data plane), that they can still bring down the entire VM (not as if there aren't other ways of doing that even without native code).

This is only the case for NIFs and (linked-in) port drivers, i.e. the facilities that are dynamically linked into the runtime. Regular ports which rely on a byte-oriented interface and are controlled by an Erlang process are safe, as are external nodes (typically, but not necessarily C nodes) which use the erl_interface libraries for marshalling/unmarshalling into and from Erlang terms, and thus can be treated from the programmer's perspective like they're regular Erlang VM nodes.


> True, that is a good selling point -- in theory. Indeed BEAM's process isolation is better than the JVM's on paper. In practice, so many Erlang systems have so much C in them (because Erlang isn't fast enough for the data plane), that they can still bring down the entire VM (not as if there aren't other ways of doing that even without native code), or they interfere with one another in other ways because of BEAM's poor support for shared concurrent data structures.

From my limited knowledge of Erlang, both of your points seem to be off the mark:

- Erlang systems do have a lot of C code, but one purpose of BEAM was to manage those individual pieces and keep the C components isolated, so that they can crash without causing system-wide issues.

- The actor model does not require the programmer to use concurrent data structures to be effective. Actually, that's the whole point of it...


> The actor model does not require concurrent data structures to be effective. Actually that's the whole point of it...

That's complicated. Actors don't require concurrent data structures for the bits that don't require concurrent data structures, but they do for the bits that do :) That's why Erlang has ETS. That's why you still need a database.


No, it's not. The proper way to design systems is to break them into pieces, implement each one the best way for that piece, and connect them together. So, if one can't do DBs with actors, do it with another tool or model (e.g. Eiffel/Java w/ SCOOP). Use Erlang/BEAM for what it's good at. There are also tools such as ZeroMQ that make the integration fast & easy. You get the best of both worlds.


All the more reason to use Erlang/JVM alongside other JVM languages. Polyglotism is one of the JVM's greatest strengths, and interlanguage interoperability on the JVM is the best you can get.

And Erlang/JVM can be even better than Erlang/BEAM at everything Erlang/BEAM is good at. There is nothing in BEAM that makes it more appropriate for running Erlang than HotSpot (that may have been true in the past, but that's no longer the case). BEAM's Erlang specialty simply means its development required relatively little effort to run Erlang reasonably well. It doesn't mean the JVM can't run Erlang better, and at this point in time we have every reason to believe it can run Erlang much better than BEAM.


"Polyglotism is one of the JVM's greatest strengths, and interlanguage interoperability on the JVM is the best you can get."

It's not the best you can get but polyglot support is a strength.

"There is nothing in BEAM that makes it more appropriate for running Erlang than HotSpot (that may have been true in the past, but that's no longer the case)."

Still got it beat on latency and security risk per dollar spent: zero for Erlang w/ 5 digits for RT-JVM. If these aren't an issue, then Erlang on JVM may indeed be a superior option. I can imagine many use cases where it would be.


> zero for Erlang w/ 5 digits for RT-JVM

RT-JVMs guarantee scheduling latencies of 2us. Erlang doesn't come close. In practice, stock HotSpot has better latency than Erlang. Erlang's "guarantees" are only in effect when 1) no native code is used, and 2) no global effects are used (some process registrations, binary heap, etc.).


When a NIF causes a crash it does take down the whole VM. I think the way to isolate NIFs is have them on their own nodes.


Kinda hilarious, since we've had completely isolated ports for a loooong time. For non-data-intensive tasks, don't underestimate how fast speaking erl_interface over a Unix pipe can be. Plus, free isolation, free fault tolerance, and free supervision restarting without any networking required.

http://erlang.org/doc/reference_manual/ports.html (15.1)

Here's one of my old examples (no guarantees to its current effectiveness or correctness): https://github.com/mattsta/libgeoip-erlang/blob/master/c_src... — then the whole thing is opened and run from Erlang like https://github.com/mattsta/libgeoip-erlang/blob/70b58ef5ef8a...

It's just cleaner to stay outside direct VM linkage as much as possible.
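A minimal port sketch (assuming a Unix `cat` binary is on the PATH): the external program runs as a separate OS process, so a crash there can't take the VM down the way a NIF crash can.

```elixir
# `cat` just echoes its stdin back, making it a handy external "service".
port = Port.open({:spawn, "cat"}, [:binary])
send(port, {self(), {:command, "hello\n"}})

reply =
  receive do
    {^port, {:data, data}} -> data
  after
    1_000 -> :timeout
  end

IO.inspect(reply)  # "hello\n"
Port.close(port)
```

If `cat` dies, the owning Erlang process just gets an exit signal and a supervisor can restart it; the VM never shared an address space with it.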


Do you know of a chart that compares all of these dimensions across languages/VMs?


What you're saying is a VM with tons of R&D, tons of corporate investment, and a focus on speed was faster than a new one that was Ericsson's side project focused on stuff other than speed? Little surprise. Meanwhile, BEAM and its language have been doing exactly what they're designed for with enough success that it's mainstreamed naturally. Which Java didn't.

Truth be told, most of the crowd using BEAM doesn't care if it's a bit slower than Java. They just want easy scaling, distribution, and fault-tolerance. A different code-base than Java's is a plus in terms of increasing implementation diversity and avoiding the bullseye currently on Java.


> avoiding the bullseye currently on Java.

That bullseye exists only in the minds of some HNers. Here is a very (very!) partial list of companies running primarily or largely on the JVM: Google, Twitter, Netflix, LinkedIn, Box, IBM, SAP, Amazon, eBay.

> They just want easy scaling, distribution, and fault-tolerance.

... So they write chunks of their code in C. That would be completely unnecessary if they'd just run Erlang on the JVM.


Oh no, it exists in the form of CVE's and actual compromises. Java compromises were coming into my news feed at a higher rate than Windows despite all its native code. Got bad enough that Krebs on Security simply recommended taking Java off one's machine unless they absolutely need it. Nice that you named many top tech firms with smart, well-funded NOC's and security teams as the counter-example. Harder for individuals and smaller firms than simply installing what works and hardly anyone is attacking. ;)

On the second point: does Erlang on the JVM currently give the same scaling, real-time properties, easy distributed apps, and availability as BEAM? And without one of the commercial VMs you seem to be assuming (but not stating), e.g. for real-time? If so, you might have a strong argument on that end. You just need to demonstrate each with examples of real-world apps running in Erlang on each. More people would accept your claim if you demonstrated it.


> Java compromises

99% of those are Java browser plugin compromises and have nothing to do with server-side Java.

> Erlang on the JVM currently gives same scaling, real-time properties, easy distributed apps, and availability as BEAM?

As currently there is no "Erlang on the JVM" (other than a little-maintained experiment), the answer is no. If you're asking whether an Erlang implementation on the current JVM would do that, the answer is also no; it wouldn't be the same, but much better (otherwise I wouldn't have suggested it).

> And without one of the commercial VM's you seem to be assuming (but not stating) for eg real-time?

Yes, of course. Azul would be much, much, much better. Stock HotSpot would just be much better.

> More people would accept your claim if you demonstrated it.

Obviously, demonstrating it would require some effort, and as the Erlang ecosystem is so small, there is little reason for the JVM ecosystem to prove it (growing the ecosystem by another 0.1% doesn't justify the effort, no matter how small). In the meantime, you can look at Erjang, bearing in mind that it doesn't use the new JIT, new GCs (although you can try), or the new scheduler.

OTOH, if Erlang wanted to expand its reach by orders of magnitude, someone in that community should give it a try. Obviously, the fact that the world's largest, most technically-savvy companies rely on the JVM means that it's a good choice. If the Erlang community doesn't want to try -- hey, it's their loss... If they want to convince themselves they're making the right choice, that's fine by me, too. But if they want to do that by believing (and perpetuating) false notions regarding HotSpot, I'm just correcting their errors.


Exactly. Erlang + BEAM vs Erlang + Java is all speculation at this point. I already told you I think HotSpot would perform better than current BEAM minus determinism. Latest updates might or might not help on that. Meanwhile, they can use what they have with diversity benefit or ride along with Oracle's I.P. with associated benefits and risks. Either is a valid choice with me favoring diversity and trying new designs.


How many end users have an Erlang runtime installed that is invoked with untrusted code by way of a browser plugin? If there are no potential targets, don't be surprised about a lack of published Erlang VM exploits.

I'd guess that the number of JVM CVEs is in the same ball park as the other sandboxing platforms, Web browsers (JavaScript) and Flash.

In 2015 it's good advice to uninstall JVM (and Flash!) browser plugins, since they provide negligible value with current browsers. But generalizing that to the server side, where all code that runs is trusted, is dubious.


It's a benefit I call security by economics. The bad guys focus limited energy to produce attacks with maximum ROI. Makes them aim at most popular stuff. Simply choosing less mainstream, yet high quality, tech avoids many attacks as a side effect. Erlang is currently benefiting from this. So, I list it as a side benefit over Java.

It's not a high assurance system designed from ground up for security. It's a commercial system designed for availability. It will have plenty of flaws for malware writers to find. Meanwhile, they ignore it and smash Java instead. Gotta be a weight off Erlang crowd's mind.

Truth be told, I'd be getting my codebase in secure shape during such a time. Would look higher quality when attacks appear.


As a point of disclosure, and also to improve your credibility, you are the author of Quasar (actors and erlang-style processes on the JVM), are you not?

I'm just learning Elixir (and therefore erlang/BEAM somewhat) and one thing that's cool to me is that a piece of code that's taking too long to execute can be paused by the VM while it switches to another thing, which keeps the latency down. I think, like, each process has some number of "ticks" or something before it switches away.

Can erlang on the JVM do that?

Edit: Also, the other thing that majorly attracts me to Elixir/Erlang is OTP (applications, genservers, supervision trees with restart strategies, etc). Are there any plans to port those libraries/philosophy into Quasar?
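What you're describing is BEAM's reduction counting: each process gets a budget of reductions (roughly, function calls) before the scheduler preempts it. You can peek at the counter from Elixir (a sketch; the exact numbers will vary by VM version and workload):

```elixir
# Do some work, then ask the VM how many reductions this process
# has been charged for so far.
Enum.each(1..100_000, fn _ -> :ok end)

{:reductions, n} = Process.info(self(), :reductions)
IO.puts("used #{n} reductions so far")
```

It's that per-process accounting that lets the scheduler pause a long-running chunk of code and keep latency for everyone else low.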


Yes, I am Quasar's main author, but note that I'm not advocating Quasar. I'm advocating Erlang, only on the JVM. And yes, Quasar has all those features, too, but an Erlang implementation on the JVM would use the Erlang implementation, not Quasar's Java implementation.

> Can erlang on the JVM do that?

Of course it can! Just like BEAM does it. (In fact, Quasar used to do that, too. We took out that feature because Quasar also gives you access to kernel threads, and processes that take too long can just be moved to kernel threads, which do this kind of preemption better, anyway. But an Erlang implementation on the JVM can behave just as Erlang does on BEAM.)


What JVM stops threads for ~20 micros?


http://www.azulsystems.com/products/zing/whatisit

$8,000 per machine, though.

The G1 collector that will be made default in Java 9 might make some applications effectively pauseless too on some workloads.


This is a recurring theme. The Java proponents' alternative to the FREE BEAM VM is $8,000 per machine or some other high priced stuff. I could afford an extra IT person just by switching to BEAM on several machines!? Sign me up!


The pricey alternative is if you want the absolute best. The free alternative is already (with high probability, based on Erjang and more recent improvements) much better than BEAM. Not to mention the costs you save by not writing C code and having an immense selection of high-quality libraries to choose from.


Do you have proof that Erjang is better than Erlang/BEAM with real-world benchmarks? As far as libraries go, I'd agree with you that there are more. The quality part varies per library.


I know Azul, it doesn't guarantee pauses of 20 micros; <10 millis maybe. G1 isn't even close.


Here is the claim they make:

> Proven to deliver consistent latencies in the 10s of microseconds

http://www.azulsystems.com/products/zing-performance-data

-----------

For workloads where the heap is at 100GB they still claim latency under 20ms at the 99.999th percentile.

http://www.azulsystems.com/sites/default/files//images/Azul_...

-----------

Now, I would be interested in an independent benchmark.


Indeed, they have good marketing :). Don't get me wrong, I think Azul and its C4 GC is very nice. However, beyond some absolute best case on smallish heaps, I do not see it hitting guaranteed 20 micros on large server heaps. Also, Azul sacrifices some throughput in favor of minimizing GC latency, which is fine as most things need tradeoffs but should be mentioned (G1 also has lower throughput than parallel due to heavier write barriers).


The Azul guys told me they treat any pause > 20us (or maybe 40us, I don't remember exactly) as a bug.



