Erlang/OTP 20.0 (erlang.org)
391 points by okket on June 21, 2017 | 99 comments



Some things I like from the new release:

* Dirty schedulers: This allows easy integration of blocking C-based libraries. So for example, you can wrap something like RocksDB and make it available to the rest of the VM more easily. Or libcurl and others.

* DTLS : This lets it talk to WebRTC clients

* Erlang literals are no longer copied when sending messages: This is kind of a sneaky one. By default (with the exception of large binaries), the Erlang VM copies data when it sends messages. Now module literals (constants, strings, etc.) are another thing that is not copied. There is a hack to dynamically compile configuration values or other tables of constants as a module at runtime. So if you use that hack, you'd get a nice performance boost.

* code_change, terminate and handle_info callbacks are now optional in the OTP behaviors. This is very nice. I always wondered why I had to write all that boilerplate code.
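
To make the optional-callbacks point concrete, here is a minimal sketch of a gen_server that defines only init/1, handle_call/3 and handle_cast/2. On OTP 20 or later it should compile without warnings about missing behaviour callbacks. The module name `counter` and its small API are made up for illustration.

```erlang
-module(counter).
-behaviour(gen_server).

%% Public API (hypothetical, for illustration only)
-export([start_link/0, increment/0, value/0]).
%% Only the callbacks this server needs; handle_info/2, terminate/2 and
%% code_change/3 are left out on purpose.
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, 0, []).

increment() ->
    gen_server:cast(?MODULE, increment).

value() ->
    gen_server:call(?MODULE, value).

%% The state is just the current count.
init(Count) ->
    {ok, Count}.

handle_call(value, _From, Count) ->
    {reply, Count, Count}.

handle_cast(increment, Count) ->
    {noreply, Count + 1}.
```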

Also here is a detailed list of changes:

http://erlang.org/download/otp_src_20.0-rc2.readme
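
The "compile constants as a module" hack mentioned in the list above can be sketched roughly as follows. This is only an illustration: the module name `const_mod`, the function `compile_config/2` and the generated accessor `value/0` are assumptions, and it needs the compiler application to be available at runtime.

```erlang
-module(const_mod).
-export([compile_config/2]).

%% Build a module Mod with a single function value/0 that returns Term
%% as a literal, then load it into the running VM.
%% Term must be representable as source code (no pids, ports or funs).
compile_config(Mod, Term) when is_atom(Mod) ->
    Forms = [{attribute, 1, module, Mod},
             {attribute, 1, export, [{value, 0}]},
             {function, 1, value, 0,
              [{clause, 1, [], [], [erl_parse:abstract(Term)]}]}],
    {ok, Mod, Bin} = compile:forms(Forms),
    %% Drop any old version so repeated reloads do not fail with not_purged.
    code:purge(Mod),
    {module, Mod} = code:load_binary(Mod, atom_to_list(Mod) ++ ".erl", Bin),
    ok.
```

After `const_mod:compile_config(demo_conf, [{port, 8080}])`, any process can call `demo_conf:value()` to read the table.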


> * Dirty schedulers: This allows easy integration of blocking C-based libraries. So for example can wrap something like RocksDb and make it available to the rest of the VM easier. Or libcurl and others.

Amazing. I'm new to erlang, but within the last hour just read comments by a seasoned programmer on how erlang's shiny scheduler has an achilles heel when it comes to blocking C-based libraries. I suppose then that this truly is a significant advancement.

Mention of the scheduler was in a comparison between erlang and node.js. Article here: https://notamonadtutorial.com/interview-with-jesper-louis-an...


> how erlang's shiny scheduler has an achilles heel when it comes to blocking C-based libraries.

It's an improvement, but it was already possible to write C-based libraries. Drivers, ports, and NIFs were there before, but for long-running computations you had to do your own thread and queue setup. This just avoids that bit because the VM writers did it for the user, so to speak (and probably did it the right way).

Also, it is really something advanced and not what most Erlang programmers would end up doing anyway. Once you start writing C code and loading it directly into the VM, the same caveats as before apply: some fault tolerance and safety guarantees go out of the window.


Yeah, the only thing I've used it for thus far is hashing passwords e.g. https://github.com/riverrun/argon2_elixir


Exactly. Half the point of the beam is not having that type of code impact the scheduler.

Case in point, you can write a faster JSON parser in C and use it (jiffy)...but it's not desirable to pollute your BEAM for the minor performance gain.


* DTLS : This lets it talk to WebRTC clients

Could you elaborate more on this? I don't know much about Erlang/OTP's support for WebRTC. Does adding DTLS make it possible for Erlang/OTP to talk to WebRTC clients because WebRTC requires DTLS? Or does this mean Erlang/OTP now has a complete stack and API for WebRTC communication out of the box?


DTLS is TLS (SSL) but for datagrams (UDP) instead of the traditional streams (TCP).

WebRTC uses DTLS as one of its base protocols. The other one is SRTP but even its setup requires DTLS.

I am not sure how complete the rest of the support is, but having DTLS is a major step ahead.


Thanks for the clarification. I was aware of the DTLS requirement for WebRTC, but was not sure about support for the whole WebRTC stack in Erlang/OTP. Now thinking about it, maybe an erlang/elixir/whatever-on-BEAM implementation of a STUN/TURN server would make sense...


I thought that the STUN/TURN stuff was just for discovery and used fairly conventional protocols...Isn't the DTLS for actual media transmission - i.e. directly between clients (or via some kind of Gateway?)


STUN is for signaling, but the hole-punching business is not 100% guaranteed to work, so you need TURN as the backup relay server. DTLS is needed for both p2p traffic and the traffic relayed through TURN, so your TURN server does need DTLS for its job.


STUN/TURN are just NAT traversal, they are building blocks in WebRTC signaling.


ETS CAS is a huge one, too!


Can you explain what that is for the uninitiated?


ETS is a fast [1] in-memory K/V store including a set type with constant time put/get in the standard library of Erlang. New in OTP 20 is an atomic CAS (compare-and-swap) operation.

[1] http://www.erlang-factory.com/static/upload/media/1459269312...
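
The CAS operation mentioned here is presumably ets:select_replace/2, which was added in OTP 20. A minimal sketch of a compare-and-swap built on it follows; the module name `ets_cas` and the helper `cas/4` are made up for illustration. It assumes Key and Old contain no match-spec atoms such as '_' or '$1', which would be treated as pattern variables.

```erlang
-module(ets_cas).
-export([cas/4, demo/0]).

%% Atomically replace {Key, Old} with {Key, New}.
%% Returns true if the swap happened, false if the stored object
%% was no longer {Key, Old}.
cas(Tab, Key, Old, New) ->
    MatchSpec = [{{Key, Old}, [], [{const, {Key, New}}]}],
    ets:select_replace(Tab, MatchSpec) =:= 1.

demo() ->
    Tab = ets:new(cas_demo, [set, public]),
    true = ets:insert(Tab, {counter, 1}),
    true = cas(Tab, counter, 1, 2),    %% value was 1, so it is swapped
    false = cas(Tab, counter, 1, 3),   %% value is now 2, so nothing happens
    [{counter, 2}] = ets:lookup(Tab, counter),
    true = ets:delete(Tab),
    ok.
```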


I thought dirty NIFs were available since like 17 or 18. I know I wrote some code that used them a couple of years ago.


They were marked as experimental. Now they are marked as stable


> * Dirty schedulers: This allows easy integration of blocking C-based libraries. So for example can wrap something like RocksDb and make it available to the rest of the VM easier. Or libcurl and others.

Silly question, but shouldn't Erlang have some good & highly concurrent HTTP libraries?


Dirty schedulers are about functions implemented in C, called NIFs (Native Implemented Functions), a sort of FFI you'd find in most languages.

Because of the preemptive nature of Erlang, doing lengthy work in those functions can destabilise the system (the "magic value" is said to be around 1ms). That is because C functions can't be preempted in the middle of execution like Erlang ones can.

Using dirty schedulers lifts this time limitation, but gives a higher constant overhead when calling a function on a dirty scheduler, since it means switching OS thread. This tradeoff, however, is perfectly acceptable for a lot of cases.

And yes, Erlang has good & highly concurrent HTTP libraries implemented in Erlang.


A question regarding dirty schedulers: does it still mean that when called C code crashes during execution, entire Erlang VM will crash as well?


Yes. That's not changed. All NIFs (regular or dirty ones) are executed directly in the context of the VM. A safe option would be a port: a regular program that you communicate with through stdin/stdout. Ports allow representing such a program (running in a separate OS process) as something equivalent to a native Erlang process.

There's also work on supporting writing NIFs in Rust, which gives some degree of additional safety. The relevant project would be: https://github.com/hansihe/rustler
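
As a rough sketch of the port approach described above: the external program runs in its own OS process, so a crash there does not take the VM down. The example assumes a Unix-like system with `cat` on the PATH standing in for a real C program; the module name `port_demo` is made up.

```erlang
-module(port_demo).
-export([echo/1]).

%% Send Data to an external `cat` process and return what it writes back.
%% For small payloads the reply normally arrives as a single message.
echo(Data) when is_binary(Data) ->
    case os:find_executable("cat") of
        false ->
            {error, cat_not_found};
        Cat ->
            Port = open_port({spawn_executable, Cat}, [binary, stream]),
            true = port_command(Port, Data),
            receive
                {Port, {data, Reply}} ->
                    true = port_close(Port),
                    {ok, Reply}
            after 5000 ->
                true = port_close(Port),
                {error, timeout}
            end
    end.
```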


Yes. For such cases, you'll want to instead use a C-Node or a Port program to interface with the C code.


Oh, it does; there are already a good number of decent ones. I just used libcurl as an example. And as someone suggested, one reason to wrap libcurl could be that it supports a lot of the corner cases and protocols.


I wrapped libcurl as a port https://github.com/puzza007/katipo. It would be interesting to see how much faster it could be made as a NIF.


Yes, you can see some HTTP client benchmarks here.

Disclaimer: I wrote Buoy (https://github.com/lpgauth/buoy)


libcurl supports some other protocols, too. Maybe Erlang lacks good and highly concurrent Gopher libraries. :-)


if you want to get started but don't know where, this is a good place to start:

http://spawnedshelter.com/

there's an online course by Simon Thompson from the University of Kent that started 2 days ago, you may be able to join:

https://www.futurelearn.com/courses/functional-programming-e...

if you like the ideas but want to try something different there are alternative languages that run on the Erlang VM:

* Elixir with a ruby-like syntax: https://elixir-lang.org/

* LFE (Lisp Flavoured Erlang): http://lfe.io/

* Efene with a python-like syntax: http://efene.org/

and one in development but already looking really interesting: Alpaca, an ML inspired language: https://github.com/alpaca-lang/alpaca


Thanks! I came here to ask the "How do I Erlang?" question. Erlang has been on my radar for a long time but I've never taken the time to dig in. 85% of my code output is obscure protocol stacks in Python+Twisted, the other 15% is real-time embedded C over an RTOS. Even if I can't use Erlang in production, I'm sure I'd find food for thought in the concepts.


http://learnyousomeerlang.com/

That book is a good place to start.

> Even if I can't use Erlang in production, I'm sure I'd find food for thought in the concepts.

Oh boy, are you in for a treat. The concept of having no loop construct other than recursion, and the recursive thinking that comes with it, was worth it even for the little time I dabbled in Erlang.


As an erlang newcomer, the non-loop seems to be one of the most intriguing concepts, IMO.


You still have list comprehensions, map, filter, fold and foreach using higher order functions
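
A small sketch of what that looks like in practice, next to a hand-written recursive "loop" for comparison. The module and function names are made up for illustration.

```erlang
-module(no_loops).
-export([sum/1, demo/0]).

%% A "loop" written as tail recursion with an accumulator.
sum(List) -> sum(List, 0).

sum([], Acc) -> Acc;
sum([H | T], Acc) -> sum(T, Acc + H).

demo() ->
    Xs = [1, 2, 3, 4, 5],
    Squares = [X * X || X <- Xs],                            %% list comprehension
    Evens   = lists:filter(fun(X) -> X rem 2 =:= 0 end, Xs),
    Doubled = lists:map(fun(X) -> X * 2 end, Xs),
    Total   = lists:foldl(fun(X, Acc) -> X + Acc end, 0, Xs),
    ok      = lists:foreach(fun(X) -> io:format("~p~n", [X]) end, Xs),
    {Squares, Evens, Doubled, Total, sum(Xs)}.
```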


Sounds excellent. Procedural imperative programming is the curse of computer science. Program counters be damned! Data-flow FTW.

I'm sure there's some tee-shirt art in that rant somewhere.....


You run close to the metal, eh? Then you might find this project interesting to hack on:

http://nerves-project.org/

FWIW, the class for it was the first class to sell out for the upcoming https://elixirconf.com/.

Here's a great presentation by Garth Hitchens on Nerves https://www.youtube.com/watch?v=O39ipRsXv3Y (he uses it in production at his company, http://www.rosepoint.com/, which builds sort of vertical-market marine navigation hardware). He mentions some performance stats as well.


Yes, thanks. Good link. I do live on the hardware/software frontier and always have.

I'm a big fan of Micropython, Python on bare metal. Noob question: What is the chance of an Erlang VM on bare metal?


Pretty decent chance: https://www.grisp.org/


wow, first I've heard about this. How does this differ from the Nerves project, which also boots directly into the Erlang VM?


Nerves boots into the ErlangVM, but runs on a buildroot Linux. This makes it possible to run on a wide variety of hardware, but with the associated baggage.

GRiSP uses RTEMS[0] as its base which should make its performance more predictable.

[0] https://www.rtems.org/


The tutorials and core library documentation are really very clean.. You should be able to go pretty far just by installing a package and starting through the tutorials, although OP also has some links to more docs too


There's a lot of overlap with Spawned Shelter, and I haven't touched it in a few months, but https://gist.github.com/macintux/6349828 is a list of free Erlang resources I try to maintain.


I was not aware of Spawned Shelter or your list of resources. Wow! I still need to check the first, but as soon as I clicked on your link I had to write you a thank-you message. Awesome job with the list.


Please don't forget about Luerl: Lua has GREAT support inside the BEAM VM ecosystem https://github.com/rvirding/luerl


yep, there's also an erlang implementation of prolog https://github.com/rvirding/erlog

The only difference from the others is that they implement existing languages following the spec, which means that some things don't map directly to the VM semantics and pay a little overhead. For example, Lua is a mutable language and the BEAM is immutable, so mutability is implemented by passing an immutable environment around and replacing it with an updated version.

This doesn't mean you can't use them, but they are used more as extension languages and for scripting than for complete projects.


The original Lua was also designed to be used as an extension language, embedded to give the base system scripting super-powers! The beauty of the Luerl implementation is that it is really a full implementation of the Lua language, not something inspired by Lua or some kind of handicapped version of the language. I agree, of course, with your statement about a little overhead, but it is real, true Lua.


Luerl lacks tail call optimization, coroutines, and "proper handling of __metatable". Without those three things Lua is just a simpler JavaScript and not that interesting.

(Lua also has a first-class C API, but we'll let that slide.)


What do you mean by proper handling of __metatable? I'm not aware of the lack of tail call optimization that you mention; please share your sources. As for coroutines, Luerl is designed to use Erlang processes instead of coroutines. That's the beauty of its implementation: you have millions of independent Lua VMs running on top of independent BEAM processes.

Coroutines are replaced by the concurrency of multiple Lua processes, running in parallel if you have more than one physical core on your machine, all in the same battle-tested Erlang VM.

(Luerl is an implementation of that first-class C API, but for the BEAM Erlang/OTP VM)


I'm not exactly sure what's meant about __metatable, either. I merely quoted directly from the project website:

  https://github.com/rvirding/luerl

I assume they don't fully implement some of the metatable semantics, such as __gc on tables or operator overloading. They don't mention lacking coroutines, but the interfaces are missing from their list of supported interfaces.

Coroutines aren't just about green threading or async I/O. I often write tree walkers similar to this:

  local function walk(t)
    local function walk_next(node)
      if node then
        coroutine.yield(node, "preorder")
        walk_next(node.left)
        walk_next(node.right)
        coroutine.yield(node, "postorder")
      end
    end

    return coroutine.wrap(function ()
      walk_next(t.root)
    end)
  end

  for node, order in walk(tree) do
    ...
  end

Basically, coroutines allow you to easily reverse consumer/producer roles such that both consumer and producer can be implemented in the most natural style. That's hugely helpful when dealing with complex data structures and complex code flow, and Lua is somewhat unique in providing stackful coroutines. (Scheme is perhaps the only other language providing something at least as powerful--call/cc is even more powerful. MoarVM has stackful coroutines, but they're hidden behind Perl 6's narrow gather construct. Though, FWIW, the gather interface works well for the particular example I gave above.)

I understand that Erlang is all about message passing, which can provide similar semantics. But when programming in Lua, stackful coroutines are a huge asset. They don't require copying of messages and are basically free--creating a new coroutine has about the same cost as creating a function closure in terms of memory, and invocation costs are the same as a pcall, which means there's basically no significant performance impact. More importantly, because they're stackful you can generally call unrelated module and library routines without that code needing to be aware that they're running in a coroutine, or that a function they've been passed might yield the coroutine. (The only niggle is if library code passes a function, which itself might yield, into another coroutine. In that case the function might yield to the wrong resume point. But IME this is very rare, specifically because coroutines often obviate the need to rely on callbacks as an interface for result production. Aside: Does Erlang support passing functions--and closures--to another Erlang process?)


First, thanks for the quality of your response.

Yes, I noticed the quote from the project's GitHub repo. Just guessing, I thought it was referring to a lack of the getmetatable and setmetatable functions, but I see them in the debug module. I'm no Lua expert, so I don't really know how this lack of __metatable handling affects the implementation.

But it's not related to the garbage collector or to any kind of missing feature that would make the Luerl project unusable.

About coroutines: there are no coroutines in Luerl, but that's on purpose, and I know it may seem counterintuitive coming from a solid Lua background.

I can't agree more with your statement that coroutines aren't just about green threading or async I/O. I agree with you on this completely... BUT

In the BEAM ecosystem you kind of want to use processes instead. The VM is built for handling independent, isolated processes that are very small and, like you said, basically free at creation time and also at context switching.

The main difference between processes and coroutines is that, on a multiprocessor machine, an OTP release on the BEAM literally runs several processes in parallel. Coroutines, on the other hand, run only one at a time on a single core, and the running coroutine only suspends its execution when it explicitly requests to be suspended.
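
For comparison with the Lua tree walker earlier in the thread, here is a rough sketch of the same producer/consumer inversion done with a process and messages instead of a coroutine. It is only an illustration: the module name `walker`, the tree shape `{Value, Left, Right}` with `nil` for empty subtrees, and the handshake messages are all assumptions. It also shows that a closure can be handed to another process, since the spawned fun captures Tree and Consumer.

```erlang
-module(walker).
-export([walk/1, next/1, demo/0]).

%% Start a producer process that walks the tree and hands out one item
%% at a time. The fun passed to spawn_link/1 is a closure.
walk(Tree) ->
    Consumer = self(),
    spawn_link(fun() ->
                       walk_next(Tree, Consumer),
                       Consumer ! {self(), done}
               end).

walk_next(nil, _Consumer) ->
    ok;
walk_next({Value, Left, Right}, Consumer) ->
    emit(Consumer, {Value, preorder}),
    walk_next(Left, Consumer),
    walk_next(Right, Consumer),
    emit(Consumer, {Value, postorder}).

%% Send one item, then block until the consumer asks for more.
%% This plays the role of coroutine.yield.
emit(Consumer, Item) ->
    Consumer ! {self(), Item},
    receive {Consumer, continue} -> ok end.

%% Fetch the next item from the walker, or done when it has finished.
next(Walker) ->
    receive
        {Walker, done} -> done;
        {Walker, Item} -> Walker ! {self(), continue}, {ok, Item}
    end.

demo() ->
    Tree = {2, {1, nil, nil}, {3, nil, nil}},
    collect(walk(Tree), []).

collect(Walker, Acc) ->
    case next(Walker) of
        done -> lists:reverse(Acc);
        {ok, Item} -> collect(Walker, [Item | Acc])
    end.
```

Unlike the coroutine version, every item is copied between process heaps, and a consumer that stops early leaves the walker blocked until the consumer exits (the link then takes the walker down with it).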


> there's a online course by Simon Thompson from the University of Kent that started 2 days ago, you may be able to join:

First time doing a MOOC - is this considered a good quality one? I find his ordering of course material not always logical, and he omits useful info in places (discovered through Googling)


My favourite feature - shell history finally in core! That's been a possibly irrational but personally annoying omission for years, and required messy hacks to get around - not a good beginner experience for what I consider to be pretty basic functionality.

It's enabled by an envar as described by the hero who cleaned up the hack and ported it here: https://github.com/ferd/erlang-history


Bit of history on the feature.

The hack was an old thing I had written at a hackathon over half a decade ago. It had little cost to regulars who had installed it at least once before. It relied on DETS and had transient issues with corruption of history files, but nothing major enough for anyone to take notice or get mad about it.

I myself ended up working on other projects once that one was good enough (including the search function with ctrl+r in the shell -- that wasn't there at first, and then things like rebar3 and other side projects), and nobody really ever took over or reimplemented a better shell history around it, nor took the time to port it to OTP.

I had also asked for help in the past for a better implementation that didn't rely on DETS, but not a lot of people were interested or knew how to proceed any better than I did. Eventually I just got tired and reimplemented it with the disk_log module and some log fragmentation and rotation to avoid dropping the whole file every time a limit was reached. It took a few weeks of review to get it in and so far nobody has complained about it.

I think the overall lesson is that the moment you have a quick hack to do the job, regulars get used to them, but forget about the beginner's experience where what you have to do is apply all the hacks at once and dear lord does it ever look janky then.


Thanks very much for your work - really makes a big difference to me and i'm sure many, many others. Beer's on me if you ever make it to SEA!


I wish I had but more than one upvote to give!


yes, that's a great addition.

for the lazy:

Add `export ERL_AFLAGS="-kernel shell_history enabled"` to your profile and get shell history out of the box with #Erlang/OTP 20

(from https://twitter.com/mononcqc/status/877544929496629248)


Oh that's fantastic! I just announced this to my team and they were all very happy to hear that.


Also the docs got a facelift: http://erlang.org/doc/

Now that we are talking Erlang:

What's missing in Erlang that would be valuable to you?

What are the biggest pain points right now on the Erlang ecosystem that makes it harder for you to try it/adopt it?


> What are the biggest pain points right now on the Erlang ecosystem that makes it harder for you to try it/adopt it?

Personally, Erlang solves stuff that is very niche and I don't have any projects in those areas.

Elixir web frameworks really help in terms of practicing the language.

Erlang had Cowboy, etc., but it wasn't close to the other MVC frameworks. So practicing Erlang to get real experience in it was hard.

Maybe there should be a page of examples of other problems Erlang can solve, so I can decide which problem I'd like to start a project on? Because so far when I think of Erlang it's just concurrency, back-end protocols, etc. But I'm sure there are other problems it can address that I don't know about and that I would like to start a trivial project on.


> Erlang solve stuff that is very niche

Well, it's true that Erlang supports some strange, esoteric (at least, esoteric for people outside of telecom) protocols.

But it supports all the "normal" stuff also, as you write:

> Erlang had cowboy, etc.. but it wasn't close to the other MVC language.

Cowboy is actually a pretty decent framework for web development. It's not a Rails or Django equivalent; it's similar to something like a stripped-down CherryPy, which may be regarded as minimalistic, but it's perfectly functional. And fast. And WebSocket handling is pure bliss.

So, Cowboy doesn't support MVC out of the box, but you can implement this pattern on top of Cowboy in ~30 loc.

> So practicing Erlang to get real experiences in it was hard.

Erlang is different. From its Prolog roots to its concurrency primitives to immutability and tail-call elimination - it's built to be different.

It's not exactly surprising that you can't use it as freely as you'd like immediately after learning it. It's expected. The only thing I can tell you now is that the problem disappears with time. You need to practice Erlang fundamentals for a bit and slowly advance to more complex topics.

In my case, it took me a year to become proficient with Erlang: the main language was a bit of a hassle, but its std library and OTP and ecosystem in general took even more effort to grok. OTOH, after a year of learning and practicing, I was able to write a proof-of-concept web app in a day, including learning Cowboy. The "proof-of-concept" was fast and solid enough that it reached production stage and ran there for 3 years until finally replaced by some other tool.

So I'd say it's worth the effort, but of course YMMV :)


For me the biggest pain point was the lack of great libraries for doing basic stuff like HTTP requests (client side). I was not able to reach the performance of Java/JVM with the Erlang implementation and I could not figure out what I was doing wrong. OTP and the patterns associated with it are still a bit opaque after reading countless manuals, howtos and books.


I wish there was better discovery in Erlang sources. I'd really love to see something like Mozilla DXR for Erlang code. That way I can see what's going on in more mature Erlang sources. Grep and GitHub leave much to be desired searching through Erlang code.

It would also be nice to have a way to infer--without actually running the app--the supervision tree from the source and then graph it.


> It would also be nice to have a way to infer--without actually running the app--the supervision tree from the source and then graph it.

That'd be effectively impossible for the general case, since it's trivial to supervise/launch processes whose primary module isn't known at compile time.

I definitely agree that understanding the supervision structure is both valuable and non-trivial.


> Also the docs got a facelift

You can finally zoom into them on mobile without the sidebar covering half the text :D


They forgot to fix it for Firefox. I guess they're using only Chrome.


didn't notice that the submitted link is to the docs :P


Something that most people seem to have missed

> The non SMP Erlang VM is deprecated and not built by default

In 2017, most languages are still fighting to get a multicore implementation. Erlang is ditching their single core one.


This is probably a better page to look at. I don't actively follow Erlang, but as an Elixir developer I am very interested in it.

http://www.erlang.org/news/114


If you're an Elixir dev, it pays to both keep an eye on Erlang and learn its standard library (the parts not implemented in Elixir). There's a lot of stuff already in there that you don't have to reinvent when doing Elixir. And of course, every OTP release directly affects Elixir.


Nice:

When a gen_server crashes, the stacktrace for the client will be printed to facilitate debugging.



Yes :) I am in Chicago, I would be happy to do remote work.


I just want to echo this: I work in Elixir on side projects at least, and HelloSign sounds like a cool place to work! But no way I'm moving to California.


Let's note that Elixir is already compatible with it. Phoenix/Elixir, I feel, is the future of web apps.


I used to think the same but I'm not sure about it now. We have plenty of managed backends now and they usually support Node, Python and a few other languages [1]. BEAM ones are never among them. Furthermore, the deployment story now is all about Docker, which is at odds with the most advanced features of BEAM (hot reload, but I guess few people use it).

So, it might be the future for fully self managed web apps, but developers are moving in another direction now. That damages the mindshare.

[1] Basically the owners of those backends have a voice in which languages succeed and which not. They'll never support more than a handful of languages, because of the maintenance costs.


Just an FYI Wings3D http://www.wings3d.com/ is built with Erlang. It is a Symbolics style 3d Modelling application using the Winged Edge data structure.


Is there something like why's poignant guide to Ruby for Erlang? http://poignant.guide/book/


As a rubyist, I found Elixir was a great entrypoint - from there you can learn OTP behaviors like GenServer, and slowly start to learn Erlang syntax from using stuff like its data structures or ETS.

Haven't found anything as great as why's guide yet though. Maybe it's just waiting to be written by someone :)


There's "Learn You Some Erlang for great good!" (http://learnyousomeerlang.com/), which is a great (the best?) resource for learning Erlang.


I found it too densely informative. After the 2nd or 3rd attempt I ditched it and just jumped into programming. Personal preference possibly. I think erlang's difficulty is exaggerated. Just jump in - the immediate little wins will lubricate the process.


"Densely informative" is the right way for a textbook to be, I think. You're not meant to read LYSE or books like it straight through; rather, each individual paragraph is dense enough with facts that after every one, you should stop and tinker with a toy example or two at the REPL to collect your thoughts and make sure you understand what's going on.

If you imagine someone teaching Erlang with LYSE serving as the actual textbook, they'd do what is done with most textbooks: split each chapter into rather small segments, and have you do exercises after each one to "interactively" absorb the knowledge.


what kind of guides would you like to see? we could help, but it's not easy to write docs like why did :)

any other style that would work for you?


I have tried on multiple occasions to learn and use Erlang for some different projects.

My process has always been stopped by a few specific things, and I end up going back to Common Lisp, which tends to be much easier to get to work.

My first problem is making sense of how to actually run a server application. Manually compiling source files and running functions is one thing, but trying to actually set up an application with all the monitors and stuff is really annoyingly hard. Even following step-by-step instructions resulted in errors that I didn't understand as soon as I tried to do anything outside what the tutorial showed.

The second issue that I have never managed to properly solve is how to use libraries. In particular, I have wanted to use the client libraries for RabbitMQ and CouchDB (both written in Erlang, so you'd expect it to be simple). The instructions for installing the libraries usually don't match the thing you actually download.

I've also tried to use the EDTS in Emacs, which is pretty nice, but as soon as I try to use a library, it can never find the hrl files.

I really want to use Erlang, but the difficulty of actually getting started with a real project as opposed to simple tutorial stuff has been extremely frustrating.


Try out elixir. It runs on the Erlang VM but is easy to get up and running and has a great community. It's much easier to add external libraries and uses its own package manager.


Thanks for the suggestions (you, and the other two people mentioning the same thing). I have just started looking at Elixir, and I'm hoping that will work better.

Am I just to assume that Erlang deployment really is as complicated and poorly documented as I thought, since no one suggested any documentation, but three people suggested I switch to Elixir?


In my experience, Erlang doesn't have a standard way of doing a deployment. There are 3rd party apps like Rebar and others, but they are both not available by default (after a fresh install) and seriously underdocumented. The result of this, IMHO, is that everyone builds their own build and deployment system, using shell scripting, Make or Erlang itself.

Elixir includes Mix - a standard task runner - and Hex as part of the distribution. It also provides a central pkg repository (https://hex.pm/packages). It's much, much easier to start a project and pull all your dependencies.

I guess you could use Hex and Mix with pure Erlang projects, but I haven't tried this.


rebar3 will become the standard build tool for erlang, it's already in the erlang organization (https://github.com/erlang/rebar3), most people nowadays use either rebar3 or erlang.mk (https://erlang.mk/), both work really well, at least for me.


Have you tried erlang.mk with relx? Setting up the project takes literally no effort, building & running the server app is as easy as "make run".


Second vote for the other comment. Elixir is basically developer friendly Erlang without a performance penalty.


For someone who likes Lisps: Elixir has Lisp-style hygienic macros and a very Lispy AST.


"People who like Lisps" are actually the target audience for LFE (Lisp Flavored Erlang at http://docs.lfe.io/current/index.html). It doesn't have all the Elixir ecosystem goodness, but it's a genuine Lisp, not merely Lisp-like :)


Erlang is awesome. The only problem preventing me from using it where I want to is terrible file I/O performance, especially on writes. I tried to Google a solution, but it seems that there isn't anything generally accepted at the moment.


something specific? did you try raw file handles? http://erlang.org/doc/man/file.html#open-2
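
For reference, a minimal sketch of what opening a file in raw mode with write buffering can look like. The module name `append_log` and the batching approach are assumptions for illustration, and nothing here promises that it solves the throughput problem described above.

```erlang
-module(append_log).
-export([write_events/2]).

%% Append a batch of already-encoded events (binaries or iolists) to Path.
write_events(Path, Events) ->
    %% raw: talk to the file directly instead of through a separate
    %% Erlang process; delayed_write: buffer small writes before they
    %% are handed to the OS.
    {ok, Fd} = file:open(Path, [append, raw, binary, delayed_write]),
    try
        %% One write call with an iolist instead of one call per event.
        ok = file:write(Fd, Events)
    after
        ok = file:close(Fd)
    end.
```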


Yes, and delayed_write as well.

I tried to port a (very simplistic, but fast) market data append-only database from Scala to Erlang. In Scala, I have no performance issues, but the code is unnecessarily complex for my taste.

In Scala, I am getting around 100000 events per second, with great additional optimization margins (memory mapped files are great). In Erlang, it barely works with a few hundred events per second.

Right now I have googled that disk_log in Erlang is fast enough, but it uses Erlang's own internal binary format (there is an option to plug custom codecs, but it is wonderfully under-documented, to say the least).

This looks strange to me, because Erlang is well-optimized regarding the network IO. What's the difference for file IO?


100K events per second is easy to do; you can base yourself on fast_disk_log (https://github.com/lpgauth/fast_disk_log)


> Erlang is well-optimized regarding the network IO. What's the difference for file IO?

IIRC, async network IO is handled by the VM's scheduler threads each just calling a non-blocking select() on the fdset of sockets it holds port()s for every once in a while. This is because TCP/IP implementations are pretty much guaranteed to expose a non-blocking select() for sockets, and do in every OS Erlang is written for.

Disk IO, on the other hand, is done by throwing the calls over to special async IO threads, which have to send responses back to the scheduler that wants them using (I think zero-copy) IPC. This is done because not all OSes have the equivalent of non-blocking select() for file handles (i.e. what you get from using fcntl(2) + read(2) in Linux.) So Erlang's disk IO BIFs fundamentally require context-switches and/or NUMA messaging.

You get to avoid this if you use your own NIFs—which have the lovely property of running in the calling scheduler by default, with the ability to decide on each call whether the workload for this particular call is small enough to perform synchronously, or whether it should be scheduled over to a dirty scheduler, blocking the Erlang process and yielding the scheduler. In other words, this is like the disk IO BIFs in the worst case, and can be a lot faster in the best case. It's a lot like NT's "overlapped IO" primitives, actually.

You could also use higher-level BIFs. There is a reason Mnesia is implemented in terms of special BIFs (DETS) rather than DETS being a functional layer implemented using the disk IO BIFs. Mind you, DETS probably doesn't have the API you're looking for, but there are a number of NIF libraries that plug low-level "storage engines" into Erlang as NIFs that provide equivalent convenience:

https://github.com/cloudant/nifile

https://github.com/basho/eleveldb

https://github.com/gburd/lmdb

And there's also always the option to do what Erlang-on-Xen does: forego anything that "devolves into" disk IO entirely, implementing the file module in terms of network IO—the p9 protocol in their case. This is actually likely the lowest-overhead move you can make if you're going to be running your Erlang node on a VM that would just be talking to a virtual disk mounted from a SAN anyway. Instead of the OS mounting the disk from the SAN and Erlang talking to the OS, you can just have Erlang talk to the SAN directly using e.g. iSCSI.


If you are writing sequentially to a local disk, perhaps it would make sense to create a tiny shim process that listens on a socket and writes to a file. You could do that in C, or perhaps even as a socat invocation.


Do note that this should have about equivalent performance to Erlang's disk IO BIFs, as this is essentially what they're already doing—the async IO threads are the "tiny shim process." (The difference mostly comes down to the VM having an efficient internal IPC protocol to them, and the runtime being able to control their CPU core affinity.)

Note also that if you turn ERTS's async IO threads off (set the number of them to 0), you should get improved throughput on disk IO tasks, as you're then forcing the scheduler threads to do the disk IO themselves. Of course, this trades off against latency, because the fallback here is blocking calls.

Sadly, the async-thread-pool architecture has meant Erlang has had no reason to implement anything like Linux's kernel AIO support. (I wonder if they'd accept a patch that specialized the efile driver for Linux, the way it uses overlapped IO on win32...)


Interesting idea! Thank you for the suggestion.


I have noticed that as well. I have tried to optimize the scheduler and all but I could not get the performance up to a reasonable level. The worst part was that I could not figure out why.


Somewhat irrelevant, but I noticed that the copyright line at the bottom still says "Copyright © 1999-2016 Ericsson AB". Should it be updated to 2017?


It seems that the source build is broken. At least, a straightforward attempt to build OTP from the GitHub sources with the latest clang/lld failed.


My favourite DSL!



