Ref link: http://blog.erlang.org/retired-pitfalls-22/
Calling 50s to .4s a “nice improvement” is an awesome understatement!
I’ve heard that the BEAM VM bytecode is not well documented (at least in comparison to JVM bytecode), which is one reason that Elixir was transpiled to Erlang first; is this true?
So saying that Elixir is transpiled to Erlang is incorrect: it compiles to Erlang AST.
> In OTP 22 we have completely re-implemented the lower levels of the Erlang compiler.
This means some performance improvements for the bit syntax, which in Erlang is used for strings [EDIT: in some of the newer, external code; there are still many places where charlists are used]. Some string BIFs got faster. This should translate directly to the performance of Elixir's string handling functions.
> OTP 22 comes with a new experimental socket API. The idea behind this API is to have a stable intermediary API that users can use to create features that are not part of the higher-level gen APIs.
Other than the ease of extending the API there are some performance improvements, too.
> PR1952 contributed by Kjell Winblad from Uppsala University makes it possible to do updates in parallel on ets tables of the type ordered_set. This has greatly increased the scalability of such ets tables that are the base for many applications, for instance, pg2 and the default ssl session cache.
More concurrency, better performance.
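To make the quoted change concrete, here is a minimal sketch of several processes writing to one `ordered_set` table at the same time. The module name `ets_parallel_demo`, the table name, and the writer and key counts are made up for illustration; only the `ets` calls are OTP API. It shows the usage pattern, not a benchmark.

```erlang
-module(ets_parallel_demo).
-export([run/0]).

%% Hypothetical demo: 4 writer processes each insert 1000 distinct keys
%% into one shared ordered_set table. Returns the final row count.
run() ->
    %% public: other processes may write; write_concurrency asks the
    %% runtime to allow parallel updates where it can.
    Tab = ets:new(demo_tab, [ordered_set, public, {write_concurrency, true}]),
    Parent = self(),
    Writers = [spawn_link(fun() ->
                   [ets:insert(Tab, {{W, N}, N}) || N <- lists:seq(1, 1000)],
                   Parent ! {done, self()}
               end) || W <- lists:seq(1, 4)],
    %% Wait until every writer has reported back.
    [receive {done, Pid} -> ok end || Pid <- Writers],
    Size = ets:info(Tab, size),
    ets:delete(Tab),
    Size.
```

Calling `ets_parallel_demo:run()` should return 4000, since every writer uses its own key prefix.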
> In OTP 21.3 the culmination of many optimizations in the ssl application was released. For certain use-cases, the overhead of using TLS has been significantly reduced. [...] The bytes per second that the Erlang distribution over TLS is able to send has been increased from 17K to about 80K, so more than 4 times as much data as before.
Everything relying on ssl got faster.
> On OTP 22 the logging facility for ssl has been greatly improved and there is now basic server support for TLSv1.3.
Not sure about that, don't know the details of TLS. More/better logging is always welcome.
> In order to deal with the head of line blocking caused by sending very large messages over Erlang Distribution, we have added fragmentation of distribution messages in OTP 22. This means that large messages will now be split up into smaller fragments allowing smaller messages to be sent without being blocked for a long time.
You can send lots of data over to other Erlang nodes without chunking it yourself; other messages will still get through in the meantime. Great if you have such messages in your app.
> Three new modules, counters, atomics, and persistent_term, were added in OTP 21.2. These modules make it possible for the user to access low-level primitives of the runtime to make some spectacular performance improvements.
> For instance, the cover tool was recently re-written to use counters and persistent_term [...] now it uses counters and the overhead of running cover has decreased by up to 80%.
> A fun (and possibly useful) use case for atomics is to create a shared mutable bit-vector. So, now we can spawn 100 processes and play flip that bit with each other
More tools for performance optimization are always welcome. The shared bit vector is an impressive example, but I'm not sure what else could be done with the modules; I haven't read their docs yet.
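For anyone curious what the shared bit vector could look like, here is a rough sketch built on `atomics`. The module `bitvec_demo` and its functions `new/1`, `flip/2`, `read/2` are hypothetical helpers, not the blog's code; only the `atomics:*` calls are OTP API. Flipping uses a compare-and-exchange retry loop so that concurrent flips of the same word are not lost.

```erlang
-module(bitvec_demo).
-export([new/1, flip/2, read/2, run/0]).

%% Allocate enough unsigned 64-bit words to hold Bits bits (all zero).
new(Bits) ->
    Words = (Bits + 63) div 64,
    atomics:new(Words, [{signed, false}]).

%% Flip one bit. Bit indices are 0-based; atomics indices are 1-based.
flip(Ref, Bit) ->
    Ix = Bit div 64 + 1,
    Mask = 1 bsl (Bit rem 64),
    flip_loop(Ref, Ix, Mask, atomics:get(Ref, Ix)).

%% Retry until our update wins; compare_exchange returns the actual
%% value when another process changed the word in between.
flip_loop(Ref, Ix, Mask, Old) ->
    case atomics:compare_exchange(Ref, Ix, Old, Old bxor Mask) of
        ok -> ok;
        Actual -> flip_loop(Ref, Ix, Mask, Actual)
    end.

%% Read one bit as 0 or 1.
read(Ref, Bit) ->
    Ix = Bit div 64 + 1,
    (atomics:get(Ref, Ix) bsr (Bit rem 64)) band 1.

%% 100 processes each flip bit 3 once (even count, so it ends at 0);
%% the caller then flips bit 70 once (ends at 1).
run() ->
    Ref = new(128),
    Pids = [spawn_monitor(fun() -> flip(Ref, 3) end) || _ <- lists:seq(1, 100)],
    [receive {'DOWN', MRef, process, Pid, _} -> ok end || {Pid, MRef} <- Pids],
    ok = flip(Ref, 70),
    {read(Ref, 3), read(Ref, 70)}.
```

`bitvec_demo:run()` should return `{0, 1}`. The atomics reference can be handed to any process on the node, which is what makes the vector shared.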
> In OTP 21.3, the version when all functions and modules were introduced was added to the documentation.
Great. It's implemented for many other langs' docs and is useful. All the docs were regenerated and the info about the version which introduced them was added to all the functions all over the place.
> In OTP 22 a new documentation top section called Internal Documentation has been added to the erts and compiler applications. The sections contain the internal documentation that previously only has been available on github so that it is easier to access.
Better docs on internals are always a good thing. They could potentially save some work for implementers of other OTP languages, including Elixir.
> Each major OTP release wouldn’t be complete without a set of memory allocator improvements and OTP 22 is no exception. The ones with the most potential to impact your applications are PR2046 and PR1854. Both of these optimizations should allow systems to better utilize memory carriers in high memory situations allowing your systems to handle more load.
Good. (not sure what else I could say here...)
In general, it looks like this release is focused on the optimization of many parts of the platform. The string handling improvements are especially welcome and will affect many Erlang-based applications out there (i.e. including other OTP languages, if I understand correctly). That's good. Erlang being "slow" was always its weakest point, and it came up in nearly every discussion about Erlang. Great to see this is being actively addressed on many fronts. I'll definitely update my apps to use 22.
Congrats and thanks to all contributors!
In Elixir, maybe, but in Erlang many strings are still charlists. Especially important is the fact that the Erlang stdlib’s Leex (lexer) and Yecc (parser) modules operate in terms of charlists, and both the Erlang and Elixir compilers use those modules. So compilation won’t be getting any faster from this optimization.
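For readers less familiar with the distinction, a small sketch of the two string representations in Erlang follows. The module name `string_repr_demo` is made up; the functions called are standard OTP ones. A double-quoted literal is a charlist (a plain list of code points), while `<<"...">>` is a binary, and only the latter goes through the bit syntax.

```erlang
-module(string_repr_demo).
-export([run/0]).

run() ->
    Charlist = "hello",        % a list of code points: [104,101,108,108,111]
    Binary = <<"hello">>,      % a binary holding the same characters
    {is_list(Charlist),
     is_binary(Binary),
     %% Converting the charlist yields the same binary.
     unicode:characters_to_binary(Charlist) =:= Binary,
     %% The string module accepts both and returns the same kind it was given.
     string:uppercase(Charlist),
     string:uppercase(Binary)}.
```

`string_repr_demo:run()` should return `{true, true, true, "HELLO", <<"HELLO">>}`.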
> In OTP 22, HiPE (the native code compiler) is not fully functional. The reasons for this are:
> There are new BEAM instructions for binary matching that the HiPE native code compiler does not support.
> The new optimizations in the Erlang compiler create new combination of instructions that HiPE currently does not handle correctly.
> If erlc is invoked with the +native option, and if any of the new binary matching instructions are used, the compiler will issue a warning and produce a BEAM file without native code.
Source: http://erlang.org/download/otp_src_22.0.readme (search for OTP-15596)
The BEAM team are not the maintainers of HiPE and don't have the expertise to update it. Meanwhile, HiPE does not have a full-time, dedicated team behind it. I won't be surprised if HiPE gets split out from the default install in a few years if it keeps falling behind.
> HiPE and execution of HiPE compiled code only have limited support by the OTP team at Ericsson. The OTP team only does limited maintenance of HiPE and does not actively develop HiPE. HiPE is mainly supported by the HiPE team at Uppsala University.
I wish BEAM could match, or at least come close to, the performance of the JVM for numerical computation.
If you want to leverage highly performant numerical code, I suggest linking against a native C library from BEAM using NIFs via http://erlang.org/doc/tutorial/nif.html or connecting a C program to the Erlang network via http://erlang.org/doc/tutorial/cnode.html
If you really want to, you can compile things with HiPE, which in my tests made mathematical code about 10x faster, though it sounds pretty unsupported these days. Maybe just put messages on a queue and process them in Go or Rust if you must have the performance; there's no need to make the BEAM handle such things directly. It's always good to measure where your bottlenecks are first.
You have to take immutability and multi-processing into account. Each process has a separate heap as a design assumption, so if you want to pass data between them, you must copy. Data is immutable as another design assumption, and this means that if you want to change something in a bigger data structure, you also must copy.
Theoretically, the compiler is free to optimize the latter: if it notices that some code can safely be optimized in a mutating way, it may do so. But someone needs to write that optimization, and since this is not a priority for BEAM, it is unlikely that anyone will spend time implementing it while there are more pressing matters. And not only must it be written, it must also be proven not to break anything.
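A tiny illustration of the immutability point, as seen from the language level (the module name `immut_demo` is made up): "changing" one element of a tuple gives you a new tuple, and the original is still there untouched. Whether the runtime copies or reuses memory underneath is up to the implementation.

```erlang
-module(immut_demo).
-export([run/0]).

run() ->
    T1 = {1, 2, 3},
    %% setelement/3 does not modify T1; it returns a new tuple.
    T2 = setelement(2, T1, changed),
    {T1, T2}.
```

`immut_demo:run()` should return `{{1,2,3},{1,changed,3}}`.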
Big binaries aren’t copied; they’re passed by reference to a shared ref-counted heap (where each process holds one reference to the binary on the shared heap, and the process’s heap can hold N references to the far reference pointer.)
In theory, more types could be made to do that.
In fact, I believe that handles passed back from NIFs actually reuse this infrastructure, presenting themselves as binaries such that sending the handle to another process is really just creating a new smart pointer in the new process-heap, holding the same raw pointer that the original NIF handle held.
And you can do a lot with that. Erlang’s new-ish `counters` module, where you get a handle to a mutable array of 64-bit integers, is essentially just a built-in NIF that passes around a mutable NIF handle pointing to a buffer of 64-bit integers, exactly as above.
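As a sketch of what using such a handle looks like (the module name `counters_demo` and the process and iteration counts are made up; the `counters:*` calls are OTP API): the reference is captured by ten processes, and each one bumps the same counter in place.

```erlang
-module(counters_demo).
-export([run/0]).

run() ->
    %% One counter (a 64-bit signed integer), initially 0.
    C = counters:new(1, []),
    %% Ten processes share the handle C and each add 1 a thousand times.
    Pids = [spawn_monitor(fun() ->
                [counters:add(C, 1, 1) || _ <- lists:seq(1, 1000)]
            end) || _ <- lists:seq(1, 10)],
    %% Wait for every worker to exit before reading.
    [receive {'DOWN', M, process, P, _} -> ok end || {P, M} <- Pids],
    counters:get(C, 1).
```

`counters_demo:run()` should return 10000. No message passing or copying of the counter is involved; the processes mutate the same memory through the handle.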
There’s nothing stopping the Erlang runtime (or your own NIF code) from adding all sorts of other operations against native, mutable types, and exposing them as these sorts of abstract data structure handles. You could, for example, have a `matrix` module for doing matrix math; or a `data_frame` module that can participate in every operation a Pandas.DataFrame can (with all such operations implemented as native code); or a module exposing SHM buffers; or even a module exposing GPU driver primitives (e.g. vertex/shader/texture buffers).
Everything that the bytecode would do to such ADTs would be slow-ish, but the point would be to load stuff into them in your glue code, and then in your hot loop, just bump already-loaded ADTs together in fast, native ways.
I have no idea how they prioritize the new features to be built, but IMHO safely optimizing code in a mutating way is the reason why Haskell is so fast despite the heavy abstraction primitives it offers.
Also, what is the fun in writing perf code in another (native) language if the host language is (or can be) capable by itself?
I used the reference to Haskell just to make the point that "Immutable Abstraction can be backed by safe mutable implementations".