Show HN: Kameo – Fault-tolerant async actors built on Tokio (github.com/tqwewe)
105 points by tqwewe 36 days ago | 58 comments
Hi HN,

I’m excited to share Kameo, a lightweight Rust library that helps you build fault-tolerant, distributed, and asynchronous actors. If you're working on distributed systems, microservices, or real-time applications, Kameo offers a simple yet powerful API for handling concurrency, panic recovery, and remote messaging between nodes.

Key Features:

- Async Rust: Each actor runs as a separate Tokio task, making concurrency management simple.

- Remote Messaging: Seamlessly send messages to actors across different nodes.

- Supervision and Fault Tolerance: Create self-healing systems with actor hierarchies.

- Backpressure Support: Supports bounded and unbounded mpsc messaging.
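Roughly, a minimal local actor looks like this. This is a hedged sketch based on the README: `Counter` and `Inc` are illustrative names, and the exact trait and `Context` signatures may differ between versions.

    use kameo::Actor;
    use kameo::message::{Context, Message};

    // The actor itself holds its state.
    #[derive(Actor)]
    struct Counter {
        count: i64,
    }

    // Messages are plain structs, each with its own Message implementation.
    struct Inc {
        amount: i64,
    }

    impl Message<Inc> for Counter {
        type Reply = i64;

        async fn handle(&mut self, msg: Inc, _ctx: Context<'_, Self, Self::Reply>) -> Self::Reply {
            self.count += msg.amount;
            self.count
        }
    }

    #[tokio::main]
    async fn main() {
        // Each actor runs as its own tokio task.
        let counter = kameo::spawn(Counter { count: 0 });
        let count = counter.ask(Inc { amount: 10 }).send().await.unwrap();
        println!("count = {count}");
    }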

I built Kameo because I wanted a more intuitive, scalable solution for distributed Rust applications. I’d love feedback from the HN community and contributions from anyone interested in Rust and actor-based systems.

Check out the project on GitHub: https://github.com/tqwewe/kameo

Looking forward to hearing your thoughts!




Looks very cool. Is there any documentation on how it works for communication over a network? I see the remote/swarm section but is there an overview somewhere?


Thanks! It's definitely missing; I'll need to add that, perhaps to the kameo book.

It's using libp2p under the hood with a Kademlia distributed hash table for actor registrations.


I've added an in-depth section to the kameo book about distributed actors if you'd like to read more. https://docs.page/tqwewe/kameo/distributed-actors


Wicked cool. Thanks!


:'(


Hi - any documentation regarding actor registration? Is there a conventional way to inform a remote actor about a new actor? Would this be sent in a message? How does actor.register('name') work? Maybe it could be a useful addition to the documentation. Thanks.


Hi, I'll probably need to add better documentation on the internals of how remote actors work. There aren't really any special features for informing other actors when one is registered currently, but you could of course do this yourself via messaging.

actor.register('name') works by using a Kademlia DHT behind the scenes. This is implemented thanks to libp2p, which handles all the complications of peer-to-peer connections.
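For illustration, a hedged sketch of the register-then-lookup flow, reusing the `MyActor`/`Inc` types from the repo's remote example. `lookup` and `ask` match that example; `register` being exposed on the ActorRef like this is an assumption based on the parent comment's actor.register('name').

    // On the node that owns the actor: spawn it and register it in the
    // Kademlia DHT under a name.
    let actor_ref = kameo::spawn(MyActor::default());
    actor_ref.register("my_actor").await?;

    // On any other node in the swarm: look the name up in the DHT and
    // message the actor as if it were local.
    let remote_ref = RemoteActorRef::<MyActor>::lookup("my_actor")
        .await?
        .expect("actor not registered yet");
    let count = remote_ref.ask(&Inc { amount: 10 }).send().await?;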


I've added an in-depth section to the kameo book about actor registration and lookup, including how it works: https://docs.page/tqwewe/kameo/distributed-actors/registerin...


What I find myself wondering (maybe based on a superficial understanding) is how this is fundamentally different from, or better than, gRPC, and whether it could be used as an implementation of gRPC.


In my experience, setting up gRPC often involves a lot of boilerplate, particularly with code generation when using libraries like Tonic. While gRPC is great for well-defined, schema-driven communication, one of the big advantages of distributed actors in Kameo is that you can communicate with any actor on any node without the need to define a schema upfront.

With Kameo, an actor running on another node is simply accessed through a RemoteActorRef, and you can message it the same way you would interact with a local actor. This flexibility allows you to avoid the overhead of schema management while still achieving seamless communication across nodes, making the system more dynamic and less rigid compared to gRPC.


What's the reason for using async for an actor framework?

They run in separate tasks/threads anyway and they are CPU-bound. So why would it be necessary to make them async?


In the case of tokio, multiple actors can run on a single thread. Tokio uses a worker pool of threads equal to the number of cores on your system, so a newly spawned actor runs amongst the other actors. This lets us perform I/O operations in an actor, such as an HTTP request, and progress other actors whilst waiting for a response.

Kameo does have a `spawn_in_thread` function for CPU bound actors if needed.
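For contrast, a hedged sketch of the two spawn paths. `HttpFetcher` and `Hasher` are placeholder actor types, and the exact module path and signature of `spawn_in_thread` are assumptions; only its existence is stated above.

    // I/O-bound actor: runs as a tokio task, sharing a worker thread with
    // other actors and yielding while it awaits network responses.
    let fetcher = kameo::spawn(HttpFetcher::default());

    // CPU-bound actor: the `spawn_in_thread` entry point mentioned above
    // gives it a dedicated thread so heavy computation doesn't stall the
    // tokio workers (path/signature assumed to mirror `spawn`).
    let hasher = kameo::spawn_in_thread(Hasher::default());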


> multiple actors can run on a single thread.

Right. It's not a very widespread use case, to be honest. You'd find that most would be N actors for M threads (where N <= M; an actor in itself is never shared among multiple threads [so `Send` and not `Sync`, in theory] - an inner message handler _could_ have parallel processing, but that's up to the user).

I think you should assume in Kameo that every Actor's message handler is going to be CPU-bound. For example, it means that your internal message dispatch and Actor management should be on a separate loop from the User's `async fn handle`. I don't know if it's already the case, but it's an important consideration for your design.

Nice library, BTW, I think it checks all the marks and I like your design. I've tried most of them but could not find one that I liked and/or that would not have a fatal design flaw (like async_traits, ...) :)

PS: The multi-threaded tokio runtime should be the default. Nobody wants a single-threaded actor runtime. It should be in capital letters in the readme.


> Right. It's not a very widespread use case, to be honest. You'd find that most would be N actors for M threads (where N <= M

What makes you think that? Having a large number of actors per thread is by far the most important use case. The Actor model is commonly used in communication systems where there are hundreds of thousands of actors per machine (often one for every single user). In this context, Actors are typically extremely lightweight and not CPU-bound. Instead, they mostly focus on network I/O and are often idle, waiting for messages to arrive or be sent.


I think you misread:

- 2 actors on 1 thread = OK

- 1 actor on 2 threads = you are probably doing it wrong.

As for the rest, whether or not they are used in communication systems and whether or not they are CPU-bound, assume that they are and run the handle on a separate loop from the main message dispatching. Otherwise you _will_ delay messaging if handles don't await.
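For illustration, a generic tokio sketch of what I mean - not kameo's actual internals; `Signal`, `UserMsg`, and `MyActor` are placeholders. The dispatcher owns the mailbox and forwards user messages to a separate handler loop, so control signals are still observed while a slow handler runs.

    use tokio::sync::mpsc;

    struct UserMsg(u64);
    enum Signal { Message(UserMsg), LinkDied(u64), Stop }

    struct MyActor;
    impl MyActor {
        async fn handle(&mut self, UserMsg(n): UserMsg) {
            // stand-in for a CPU-heavy handler that never awaits
            let _ = (0..n).sum::<u64>();
        }
    }

    // Dispatcher loop: never runs user code directly, so Stop / LinkDied
    // are acted on even while a handler is busy.
    async fn dispatch(mut mailbox: mpsc::Receiver<Signal>, to_handler: mpsc::Sender<UserMsg>) {
        while let Some(signal) = mailbox.recv().await {
            match signal {
                Signal::Stop => break,
                Signal::LinkDied(id) => eprintln!("link {id} died"),
                // A bounded channel applies backpressure here instead of
                // blocking signal handling.
                Signal::Message(msg) => { let _ = to_handler.send(msg).await; }
            }
        }
    }

    // Handler loop: runs the user's handle sequentially, preserving
    // per-actor message ordering.
    async fn run_handler(mut rx: mpsc::Receiver<UserMsg>, mut actor: MyActor) {
        while let Some(msg) = rx.recv().await {
            actor.handle(msg).await;
        }
    }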


Can someone provide some context on what an "actor" is here? It's the first time I've come across the term used like this.


43 years of actors: a taxonomy of actor models and their key properties (2016)

From the abstract: The Actor Model is a message passing concurrency model that was originally proposed by Hewitt et al. in 1973. It is now 43 years later and since then researchers have explored a plethora of variations on this model. This paper presents a history of the Actor Model throughout those years. The goal of this paper is not to provide an exhaustive overview of every actor system in existence but rather to give an overview of some of the exemplar languages and libraries that influenced the design and rationale of other actor systems throughout those years. This paper therefore shows that most actor systems can be roughly classified into four families, namely: Classic Actors, Active Objects, Processes and Communicating Event-Loops.

-> http://soft.vub.ac.be/Publications/2016/vub-soft-tr-16-11.pd...



Is this actually distributed? I see no evidence that this can be used for even IPC with built-in features.


Check the examples folder.


https://github.com/tqwewe/kameo/blob/main/examples/remote.rs

    // Bootstrap the actor swarm
    if is_host {
        ActorSwarm::bootstrap()?
            .listen_on("/ip4/0.0.0.0/udp/8020/quic-v1".parse()?)
            .await?;
    } else {
        ActorSwarm::bootstrap()?.dial(
            DialOpts::unknown_peer_id()
                .address("/ip4/0.0.0.0/udp/8020/quic-v1".parse()?)
                .build(),
        );
    }


    let remote_actor_ref = RemoteActorRef::<MyActor>::lookup("my_actor").await?;
    match remote_actor_ref {
        Some(remote_actor_ref) => {
            let count = remote_actor_ref.ask(&Inc { amount: 10 }).send().await?;
            println!("Incremented! Count is {count}");
        }
        ...


Thanks! It's not in the front page material.


This is resolved now :) Added better material around distributed actors.


Yeah I definitely need to add some more documentation on this feature!


Curious, could it be used in wasm? I think it would be really cool if it could be used in browsers.


Would be absolutely awesome, I agree! But sadly I don't think tokio really runs in wasm just yet. I see it being possible some day, though.


This looks really nice! Curious if it's running in production anywhere.


I agree, really nice syntax.

There's a limitation mentioned in the docs:

  While messages are processed sequentially within a single actor, Kameo allows for concurrent processing across multiple actors.
which is justified via

  This [sequential processing] model also ensures that messages are processed in the order they are received, which can be critical for maintaining consistency and correctness in certain applications.
I agree with this, and it gives the library a well-defined use.

Docs and examples are well made.


This limitation is common to most implementations of the actor model. In fact, I think a lot of people would consider it a feature, not a limitation, because it allows you to reason about your concurrent behavior in a more straightforward way.


Thank you for the lovely feedback! Happy to hear this. Will continue improving documentation, adding more examples to code docs, etc.


> There's a limitation

It’s a feature.


Not yet, however I hope to answer with yes soon. I'm using kameo heavily in a startup I'm building (oddselite.app). Hopefully it will be released shortly so this can be a yes. But as of now, it's still quite a new library and the API has gone through many breaking changes to get to where it's at now.


Likely a dumb question, but can this allow me to build trivial concurrent and parallel local apps? Being able to parallelize the load across all the cores is important to me.


Under the hood it uses the tokio runtime. So as long as you enable the `rt-multi-thread` feature flag in tokio and use `#[tokio::main]`, then yes! Actors can run on multiple threads. By default tokio uses worker threads equal to the number of cores on your machine.

In kameo, all actors run in a `tokio::spawn` task.
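Concretely, something like this. The tokio feature flags and `#[tokio::main]` behavior are standard tokio; `WorkerA` and `WorkerB` are placeholder actor types for illustration.

    // Cargo.toml:
    //   tokio = { version = "1", features = ["rt-multi-thread", "macros"] }

    // #[tokio::main] defaults to the multi-threaded runtime with one worker
    // thread per CPU core; spawned actors are scheduled across those workers.
    #[tokio::main]
    async fn main() {
        let a = kameo::spawn(WorkerA::default());
        let b = kameo::spawn(WorkerB::default());
        // both actors can now make progress in parallel on different cores
    }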


Looks good, it would be great to see more examples in the docs.


Thanks! Definitely agree with you, I'll create an issue for this


Looks really nice.

But sometimes when I see projects like this in other languages, I think, are you sure you don't want to use Erlang or something else on the BEAM runtime and just call Rust or C via their NIFs?

I used Erlang about a decade ago, and even then it was so robust, easy to use, and mature. Granted you have to offload anything performance-sensitive to native functions but the interface was straightforward.

In the Erlang community back then there were always legends about how WhatsApp had only 10 people and 40 servers to serve 1 billion customers. Probably an exaggeration, but I could totally see it being true. That's how well thought out and robust it was.

Having said all that, I don't mean to diminish your accomplishment here. This is very cool!


I think a lot of the issues BEAM was trying to solve were solved by processors getting bigger and gaining more cores.

BEAM's benefit 10-20 years ago was that inter-node communication was essentially the same as communicating in the same process. Meaning I could talk to an actor on a different machine the same way as if it were in the same process.

These days people just spin up more cores on one machine. Getting good performance out of multi-node Erlang is a challenge and only really works if you can host all the servers on one rack to simulate a multi-core machine. The built-in distributed part of Erlang doesn't work so well in a modern VPS/AWS setup, although some try.


“Just spin up more cores on one machine” has a pretty low scale ceiling, don’t you think? What, 96 cores? Maybe a few more on ARM? What do you do when you need thousands or tens of thousands of cores?

Well, what I do is think of functions as services, and there are different ways to get that, but BEAM / OTP are surely among them.


> What do you do when you need thousands or tens of thousands of cores?

I think most software won't need to scale that far. Did you run into any systems like that built on top of BEAM?


I’m just saying, Erlang was built for telephony at scale, not for building some REST website. “You probably won’t need more than one big host for any given request” isn’t really a winning argument for scaled systems.


Correct me if I'm wrong, but I believe "scale" in the original context meant developing a system with strong fault tolerance properties, so that if a node went down due to, e.g., a hardware failure, the system as a whole would keep working normally.

So, did you run into any systems that needed to scale to tens of thousands of cores for a reason inherent to the problem they were solving, and was built on top of BEAM?


Massive context switching and type checking on Erlang is inferior.


It's a nice point. I am a fan of the beam runtime, and it has been an influence on the design decisions of kameo. However I don't see myself switching to another language from Rust anytime soon, especially with the amazing advancements with wasm and such.

Although Elixir is a nice language, I struggle to enjoy writing code in a language lacking static types.


Fair response.


What are the advantages and disadvantages vs using Actix or Ractor?


PSA: Actix (not Actix-web) is fairly inactive - one of the maintainers informally said not to use it for any new projects during one of this year's RustConf chats.



Actix was built initially using its own runtime and has gone through many iterations, including a runtime change to tokio, over its lifetime. In the past, building asynchronous actors with actix has been a huge pain and felt like a big afterthought.

Ractor is nice, and I've used it in the past. A couple things differ between kameo and ractor:

- In ractor, messages must be defined in a single enum – in kameo, they can be separate structs each with their own `Message` implementation. This means messages can be implemented for multiple actors which can be quite useful.

- In ractor, the actor itself is not the state, meaning you must typically define two types per actor – in kameo, the actor itself is the state, which in my opinion simplifies things. As someone mentioned in a comment here, it was a bit of a turn-off for me using ractor in the past and I didn't fully agree with this design decision.

- Ractor requires the `#[async_trait]` macro – kameo does not.

There may be other obvious differences but I'm not super familiar with ractor besides these points
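To illustrate the first point, a hedged sketch of one message type handled by two different actors. `Ping`, `Cache`, and `Logger` are illustrative names, and the exact trait and `Context` signatures are approximate.

    use kameo::Actor;
    use kameo::message::{Context, Message};

    // One message type...
    struct Ping;

    #[derive(Actor)]
    struct Cache { hits: u64 }

    #[derive(Actor)]
    struct Logger { lines: u64 }

    // ...implemented for two different actors, rather than being a variant
    // of a single per-actor message enum.
    impl Message<Ping> for Cache {
        type Reply = u64;
        async fn handle(&mut self, _: Ping, _ctx: Context<'_, Self, Self::Reply>) -> Self::Reply {
            self.hits += 1;
            self.hits
        }
    }

    impl Message<Ping> for Logger {
        type Reply = u64;
        async fn handle(&mut self, _: Ping, _ctx: Context<'_, Self, Self::Reply>) -> Self::Reply {
            self.lines += 1;
            self.lines
        }
    }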


While the state is indeed a separate struct in ractor, there's actually a good reason for this. It's because the state is constructed by the actor, and it's guaranteed that construction of the state is managed by the startup flow and is panic safe.

Imagine opening a socket: if you have a mutable self, the caller who is going to spawn that actor needs to open the socket themselves and risk the failure there, instead of the actor who would eventually be responsible for said socket. The motivation for this is outlined in our docs. Essentially the actor is responsible for creating its state and any risks associated with that.
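Roughly, the shape is something like this - a hedged sketch from memory rather than copied from the docs, so trait items and signatures are approximate, and `Listener`/`ListenerState` are illustrative. The fallible socket open lives inside `pre_start`, so a failure there is a supervised startup failure rather than a problem for the spawning caller.

    use async_trait::async_trait;
    use ractor::{Actor, ActorProcessingErr, ActorRef};
    use tokio::net::TcpListener;

    struct Listener;

    struct ListenerState {
        socket: TcpListener,
    }

    #[async_trait]
    impl Actor for Listener {
        type Msg = ();
        type State = ListenerState;
        type Arguments = String; // address to bind

        // Runs inside the actor's startup flow; an Err here fails startup
        // in a supervised way instead of panicking in the caller.
        async fn pre_start(
            &self,
            _myself: ActorRef<Self::Msg>,
            addr: String,
        ) -> Result<Self::State, ActorProcessingErr> {
            let socket = TcpListener::bind(&addr).await?;
            Ok(ListenerState { socket })
        }

        async fn handle(
            &self,
            _myself: ActorRef<Self::Msg>,
            _message: Self::Msg,
            _state: &mut Self::State,
        ) -> Result<(), ActorProcessingErr> {
            Ok(())
        }
    }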

- For the async trait point, we actually do support native async traits without the boxing magic macro. It's a feature you can disable if you wish, but it impacts factories, since you can't then box traits with native future returns https://github.com/slawlor/ractor/pull/202

(For transparency I'm the author of ractor)


Thanks for the reply! The example you gave does make sense regarding the state being constructed in the actor's startup method.

I wasn't aware async_trait wasn't needed, that's nice to see.

Also congrats on it being used in such a big company, that's awesome! I have a lot of respect for ractor and appreciate your response.


Thanks! I'm happy to see actors getting some solid use in the industry to provide better thread-management safety and remove a lot of concurrency headaches.

Question for you: I was poking around in the codebase, and how do you handle your Signal priorities? Like if a link died and there are 1000 messages in the queue already, or if it's a bounded mailbox, would the link-died (or even stop) messages be delayed by that much?

Have you looked into prioritization of those messages such that it's not globally FIFO?


Great question. I did some digging into the source code of BEAM to help answer whether signals should have special priority, and the conclusion (with the help of someone else from the Elixir community) was that signals have no special priority over regular messages in BEAM. So I decided to take this same approach, where a regular message is just a `Signal::Message(M)` variant, and everything sent to the mailbox is a signal.

So gracefully shutting down an actor with `actor_ref.stop_gracefully().await` will process all pending messages before stopping. But the actor itself can be forcefully stopped with `actor_ref.kill()`
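In code, the distinction reads roughly like this (method names as given above; error handling and the surrounding setup are omitted):

    // Everything in the mailbox is a Signal; a user message is just the
    // Signal::Message(M) variant, queued FIFO with everything else.

    // Graceful: lets the actor drain pending messages before stopping.
    actor_ref.stop_gracefully().await;

    // Forceful: stops the actor without draining the mailbox.
    actor_ref.kill();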


Also ractor is used in production at Meta. (Which I can finally say publicly lol)

Here's the RustConf presentation for anyone interested https://slawlor.github.io/ractor/assets/rustconf2024_present...


I actually went through this exact exercise recently, but this library didn't show up in my searches for a good Rust actor framework, so take that with a grain of salt. It looks very similar to the interface provided by actix at first blush; I'm not sure how supervision works. My take is that most of these frameworks tend to arrive at the same(ish) solution, so pick the one that has the best API. I liked ractor, although not having &mut self eventually wore me down. I swapped a small side project to use Stakker instead, and while at first the macros intimidated me, the implementation really impressed me in terms of performance and API characteristics. It really feels like there's just enough there and no more.


The actix crate is deprecated. I looked on their site and repo and couldn't find an official announcement of deprecation but here is a link to what the lead said when I reached out with questions a few months ago: https://discord.com/channels/771444961383153695/771447523956...

EDIT: Tangent, but if anyone has experience making deterministic actor-model systems that can be run under a property test, I'd love to know more. It would make an amazing blog post, even if it would have a very narrow audience.


"in Tokio" would be even more non-ambiguous.




Yeah how about something much older. I vote for Lode Runner.



