Am I missing some context here? Why is there a 2021 in the headline?
Edit: Covid and the partial lockdown have been going on for far too long. I read the headline and thought we were already in 2022; I had to check the calendar just to make sure we are still in 2021.
"Let’s say we want to send information to another Ractor, but don’t want to block for it to finish processing it. What happens if the receiving Ractor is too slow to process the data?"
I would prefer an API where the channels can be "buffered", so that a channel can be configured to hold at maximum N items, and trying to add to a full channel would block the producer.
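Ruby doesn't ship such a bounded channel today: Ractor#send appends to an unbounded inbox without blocking, while Ractor.yield/take form an unbuffered rendezvous. A rough sketch of what an N-item broker could look like on top of Ractors, assuming Ruby 3.0's Ractor.select with yield_value: (the bounded_channel helper and the numbers are made up for illustration):

```ruby
# Hypothetical sketch of an N-item buffered channel built from Ractors.
# Producers publish with Ractor.yield; the broker only drains them while it
# has room, so a full buffer blocks producers naturally.
def bounded_channel(producers, max: 10)
  Ractor.new(producers, max) do |producers, max|
    queue = []
    loop do
      if queue.empty?
        # nothing to hand out yet: wait for any producer to yield an item
        _, item = Ractor.select(*producers)
        queue << item
      elsif queue.size >= max
        # full: stop draining producers, only offer items to consumers
        Ractor.yield(queue.shift)
      else
        # wait for whichever happens first: a producer yields or a consumer takes
        who, obj = Ractor.select(*producers, yield_value: queue.first)
        who == :yield ? queue.shift : queue << obj
      end
    end
  end
end

# Illustrative use: the producer can only run at most 5 items ahead of the consumer.
producer = Ractor.new { 100.times { |i| Ractor.yield(i) } }
channel  = bounded_channel([producer], max: 5)
10.times { p channel.take }
```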
I would also suggest completely splitting the consuming/producing from other tasks; that way you don't have to worry about blocking them.
Does anybody have any practical use cases for this? I am wondering how this would be preferable to something like Sidekiq in a large application with multiple application nodes.
Completely different use cases. Sidekiq is best for decoupling workers from producers that are in different interpreter contexts and possibly different machines.
Ractor is for parallel execution within the same interpreter context. Much lighter weight, but also with no persistence.
EDIT: Basically consider Ractor for places where you might want parallel execution, want less data sharing than with Thread, but don't want the overhead of a process-external job system.
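For a sense of what that looks like in practice, here is a minimal illustrative example of running CPU-bound work on several cores with Ractors (the workload and worker count are arbitrary):

```ruby
# Each Ractor has its own heap and can run on its own core, so this
# CPU-bound work runs in parallel (unlike Threads under MRI's GVL).
workers = 4.times.map do |i|
  Ractor.new(i) do |n|
    (1..5_000_000).reduce(:+) * (n + 1)
  end
end

p workers.map(&:take)  # take blocks until each Ractor returns its result
```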
Why couldn't parallel execution in a single process be combined with persistence? Store the job data, then continue execution of the job, and at the end clean up or mark it completed. One gets both persistence and an easy, single-process deployment.
Nothing stops you from doing that, but you're then in a relatively small niche of processing messages that you want persistence for, but where persistence is not important enough to put it in a more resilient external queuing system.
There are absolutely apps that fit in that niche. E.g. somewhat time-consuming processing that fits on a single machine and that can still be re-run from scratch if a queue is lost can be a use case for it.
I've built crawlers where some of the queues fit that model, for example: things we'd re-crawl 2-3 times a day, so worst case, if a queue was lost, some things wouldn't update for a few hours.
You are right that an external queuing system might be useful in some cases. I was thinking of simple background jobs like sending emails while processing an HTTP request. Once persistence is available, why would this be any less resilient than an external queue? IIRC, this is the standard way of dealing with background jobs in Erlang/Elixir projects. Ractor should bring a similar capability to the Ruby ecosystem.
Depends what you mean by "persistence". If you store it to a replicated database, then sure, it's just as resilient, but that is an external queue.
If you store it to a replicated filesystem, then sure, it's just as resilient, but almost nobody runs a setup like that (I have; I ran GlusterFS in production without downtime for a decade - it was a great piece of software, but for this kind of usage pattern it'd kill performance and just stuffing it in a database would be operationally easier and much faster).
But if you just store it to a local disk, then if the individual server fails, your queue fails; that is why it would usually be less resilient than a properly set up external queue based on a replicated storage mechanism. Note that you certainly can set up external queues that are just as brittle, or even more so. But the more serious queueing systems at least have options that allow you to make the queue resilient.
Note that all of the above takes away the advantage of "just" having a single-process deployment unless you already have that infrastructure set up. I'd much rather set up a database + replica than GlusterFS, for example (GlusterFS is great, but it's a pain to manage compared to e.g. just Postgres or a Redis cluster to get resilience for Sidekiq).
> Ractor should bring similar capability to Ruby ecosystem.
Ractor is 100% orthogonal to that. I first did what you describe in Ruby in 2005 with processes. It took a few dozen lines of code + sqlite (run one thread to stuff new jobs into sqlite; fork as many times as you need workers, and use a pipe to tell them which job to pick up, or serialize the whole job).
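A hedged sketch of that 2005-style approach, using the sqlite3 gem; the schema, payload format, and worker count here are invented for illustration:

```ruby
require "sqlite3"

# One process persists jobs to SQLite, forks N workers, and uses a pipe per
# worker to tell it which job id to pick up.
db = SQLite3::Database.new("jobs.db")
db.execute(<<~SQL)
  CREATE TABLE IF NOT EXISTS jobs (
    id INTEGER PRIMARY KEY, payload TEXT, done INTEGER DEFAULT 0
  )
SQL

pipes = 3.times.map do
  reader, writer = IO.pipe
  fork do
    writer.close
    worker_db = SQLite3::Database.new("jobs.db")  # each worker opens its own handle
    reader.each_line do |line|
      id = line.to_i
      payload = worker_db.get_first_value("SELECT payload FROM jobs WHERE id = ?", id)
      # ... perform the actual work for `payload` here ...
      worker_db.execute("UPDATE jobs SET done = 1 WHERE id = ?", id)
    end
  end
  reader.close
  writer
end

# Producer side: persist the job first, then hand the id to a worker.
db.execute("INSERT INTO jobs (payload) VALUES (?)", ["send_email:42"])
pipes.sample.puts(db.last_insert_row_id)
```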
What Ractor brings is more fine-grained isolation (more sharing than a separate process, less than Thread), and running Ruby code on multiple CPU cores without resorting to fork(). That's all it's trying to do.
You could implement what I described above slightly cleaner with Ractor and wrap it up into a gem, but fork vs. Ractor is unlikely to make a big difference for that kind of scenario, because the overhead of the sharing is dwarfed by the disk I/O. This may matter more for other Ruby implementations than MRI.
I’m using it in a stream-identification library I’m writing (à la ‘file’/‘libmagic’) where I want to parallelize file IO against a frozen/shareable bank of in-memory type representations (known file extensions, magic byte sequences, etc).
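Roughly, that pattern could look like the sketch below: Ractor.make_shareable deep-freezes the signature bank so every Ractor can read it. The signature table and the 8-byte read are made up for illustration.

```ruby
# A frozen, shareable table of magic-byte signatures consulted from one
# Ractor per file.
MAGIC = Ractor.make_shareable({
  "\x89PNG".b    => "image/png",
  "%PDF".b       => "application/pdf",
  "PK\x03\x04".b => "application/zip",
})

checks = ARGV.map do |path|
  Ractor.new(path) do |p|
    head = File.binread(p, 8) || "".b
    type = MAGIC.find { |magic, _| head.start_with?(magic) }&.last || "unknown"
    [p, type]
  end
end

checks.each { |r| p r.take }
```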
Since it mentions that Ractors are inspired by the actor model, I'm surprised that it doesn't make any comparisons to how various actor model implementations in other languages handle backpressure.