> ...10k lines of code. This is 100x less code than the ~1M lines Twitter
I wish I hadn't seen this comparison; it's not fair at all. Everyone in their right mind understands that the number of features is much smaller, and that's why you have 10k lines.
Add large-scale distributed live video support on top of that, and you won't get anywhere close to 10k lines. That's just one of many, many examples. I really wish you'd compare Mastodon to Twitter 0.1 rather than doing false advertising.
> 100M bots posting 3,500 times per second... to demonstrate its scale
I'm wondering why 100M bots post only 3,500 times per second. Is it 3,500 per second for each bot? It seems not, since HTTPS termination would consume most of the resources in that case. So I'm afraid it's just not enough.
When I worked at Statuspage, we supported 50-100k requests per second, because that's how it works: you get spikes, and traffic that is not evenly distributed. TBH, if it's only 3,500 per second total, then I have to admit it is not enough.
We're comparing just to the original consumer product, which is about the same as Mastodon is today. That's why we said "original consumer product" and not "Twitter's current consumer product".
Mastodon actually has more features than the original Twitter consumer product like hashtag follows, global timelines, and more sophisticated filtering/muting capabilities.
Some people argue it's not so expensive to build a scalable Twitter with modern tools, which is why we also included the comparison against Threads. That's a very recent data point showing how ridiculously expensive it is to build applications like this, and they didn't even start from scratch as Instagram/Meta already had infrastructure powering similar products.
I work in gaming, so I cannot speak to your specific experiences. Entity Component Systems are extremely performant, really good science, and shipping in middleware like Unity. However, in my experience, to ship an ECS game you have to have already made your whole game the normal way first, so that everything is specified well enough that you can correctly create an ECS implementation. In practice, this means ECS is used to make first-person shooters, which have decades of well-specified traditions and behavior, and V2s of simulators, as with Cities: Skylines 2 and Two Point Campus.
So, while this is not meant to diminish the achievements of what you have built at all, it is more intellectually honest to say that "any high-performance framework is most suitable for projects that are exact clones of pre-existing, mature things with battle-hardened specifications and end-user behavior." While this might cover some greenfield projects, including the best-capitalized ones that may matter to you, it does diminish the appeal of a framework for the vast majority of success stories from small and poorly capitalized teams. Those small, poor teams are driven by innovation and serendipity, and hence rarely copy a pre-existing thing. And even when they do try to become well-capitalized this way, they have almost always already worked on the thing they are copying (i.e., already shipped version 1.0 for years).
Yes, this is along the lines of what I'm suspicious of too, as an also-gamedev that has done some ECS.
It's too easy to study an existing system and, given the resources, create a perfect demo for how to dramatically improve it in certain ways. You can go on to ship the demo, but the demo wasn't made by the same kind of organization, with the same kinds of goals, as the original system builders. The extreme example of this is, of course, the demoscene and its hardware-bending tricks that achieve the impossible through a significant modification to the design of a "production" equivalent.
So it's better by performance metrics, better by codesize, but unknown on other metrics. Like, "do I know how to start building new things with this?"
Asking “what are you optimizing for/building towards” and getting different answers for different products isn’t proof that either is inferior, just that they may be better at different things.
Your comment makes it seem like their post is misleading, when it isn’t it just might be that what they do best isn’t useful to you (or possibly anyone).
I can’t go into specifics due to NDAs, but I’ve been a developer on a game that was designed and programmed with the EnTT library from the ground up. [1]
I don’t know if I’d suggest ECS for every team as you need the right tooling, culture, and leadership to pull it off. That said, I think the paradigm has unrecognized benefits when it comes to greenfield gameplay programming.
Or, your data structures become a way for sub-teams to communicate and share state without stepping on one another’s toes. A group handling pathfinding and a different one handling scenario/mission logic can both pay attention to the position data without needing to be highly coupled. It becomes easy to “just add a system” or “an extra component” to build on existing functionality.
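The "shared component store" idea described above can be sketched as a toy (all names here, like `pathfinding_system`, are hypothetical; real engines use packed arrays and archetypes, not Python dicts):

```python
# Component stores: entity id -> component data. Each sub-team owns its own
# store, but both systems below read the shared position data.
positions = {1: (0.0, 0.0), 2: (3.0, 4.0)}
waypoints = {1: (10.0, 10.0)}               # owned by the pathfinding team
mission_zones = {"extraction": (3.0, 4.0)}  # owned by the mission team

def pathfinding_system(positions, waypoints):
    """Move each entity with a waypoint one unit step toward it."""
    for eid, (wx, wy) in waypoints.items():
        x, y = positions[eid]
        dx, dy = wx - x, wy - y
        dist = (dx * dx + dy * dy) ** 0.5 or 1.0
        positions[eid] = (x + dx / dist, y + dy / dist)

def mission_system(positions, mission_zones):
    """Return entities standing in a mission zone (exact match for brevity)."""
    return [eid for eid, pos in positions.items()
            if pos in mission_zones.values()]

# The two systems never call each other; they only share the data layout.
pathfinding_system(positions, waypoints)
in_zone = mission_system(positions, mission_zones)
```

Neither system knows the other exists; adding a third system that also reads `positions` requires no coordination with these two.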
The Fred Brooks quote about “showing me your tables” comes to mind.
I only hobby in game dev. I have read some basics of ECS and IMO it seems a lot more intuitive than the “normal approach”. I think this is a case of using familiar tools we are comfortable with, not necessarily better. Unity’s ECS implementation is a hot mess though. I am looking forward to seeing what Bevy delivers as it matures.
I agree that copying an existing product will be easier and is usually cheaper and more performant because you can leverage your competitor’s R&D and lessons learned. I presume this is why some tech companies’ product lines are full of clones.
I worked on a previous iteration of Threads at Instagram, and I don't think the 25 person-years number you're estimating spent on the project was spent primarily on backend engineering. They concurrently launched native iOS and Android apps, and I think you're also factoring in the IC count from the linked article to mean full-time ICs dedicated exclusively to Threads, which is probably not how it worked. (Although TBH I'm not sure how you arrived at 25-person-years from the article, which said the project started with "dozens" of engineers and peaked at 50, and was 7 months long in total — a simple average of 24-50 engineers for 7 months, even if you assume they're all full time, would be about 21 person-years).
When I was at IG, my team was 20-something ICs full-time on our project, bursted to maybe double that as necessary in part-time help from ICs in the wider org. We had a total of three backend engineers, of which I was one, as well as a backend intern for a few months, although we bursted the backend team for a couple of months to four for some ML help.
Your project sounds pretty cool! But I don't think the comparison to Threads productivity is quite right. The majority of IG engineering is focused on building polished native clients, not generally on backend infrastructure. It's true that Meta already has a lot of the backend infra built — if Red Planet Labs can come close to what you get at Meta, that's pretty amazing. But I don't think the numbers you're quoting are apples-to-apples, or mean quite what you think they do. I don't know if this version of Threads operated exactly like the one I worked on (a Snapchat competitor), but I'd be pretty surprised if there was a product team at IG that was majority backend.
(Edit: I also think it's worth keeping in mind the experience levels of who works on these projects — at IG it's usually a smaller number of E6/E5s guiding a larger number of E4/E3s. Person-years are not all equivalent! If you spent ten years building the Red Planet Labs infrastructure based on your time at Twitter, nine person-months of your team's time building a product on your infra might not be the same as nine person-months of someone else's.)
Anyway — I don't mean to downplay your product, and really, if it's anything like the backend productivity I experienced at Meta, that would be pretty groundbreaking. Curious to see what you launch :)
Built some log databases and back end frameworks myself with some of the same concepts.
I applaud the creativity in rethinking how back ends should work. Please now do frontends next! :)
"But it's not a fully functioning 2023 Twitter!!!" I think some people miss the point. This is not about hey we built a Twitter clone. This is about a POC for a novel app architecture.
We need to be constantly examining and re-examining our thoughts about the best way to deal with distributed systems, scale, developer workflow. Even inventing new ones.
With things like twitter, the ui is not the hard part. Things like moderation are the secret sauce. All the corner cases and support for devopsy stuff likely account for a lot. Routing to specific instances for celebrities and such.
I originally read it as "it's easy to clone Twitter."
My response is: it's easy enough to build a micro-blogging platform/service. It's all the other shit, like moderation, regulatory/legal compliance, making a profit, keeping advertisers happy, etc., that's hard.
We see this type of post regularly. Something like, "How I built a better <pick your app> clone by myself in a month." Well, no, usually it's just a bare skeleton with the least amount of functionality. Not only that, the software is the least of the functionality. The organizational structure around the app is what matters most to keep it going. It's an attention seeking ploy and the whole thing usually disappears real quick.
If you want background, you can go to the blog in his bio and read the first post, which mentioned he worked at Twitter starting in 2011. He mentioned elsewhere that his comparison is early Twitter, as opposed to current Twitter.
The background makes this a bit more interesting, because you can imagine how those early days impacted the arc of his work.
None of those things get eliminated by decentralization, they get distributed to whatever the point of control / ownership is.
Mastodon still requires security, compliance and moderation. And those requirements are going to keep getting more challenging by the year. It'll end up being another reason nobody will want to host content in a decentralized manner, the burden will become obnoxious.
An organization trying to maintain an ISO certification will have drastically different policies and controls than a small shop, or even a hobbyist group.
Everyone (theoretically) would be complying with statutes in the broadest sense, but jurisdiction, regulations, industry best practices, reporting requirements, and appetite for risk are going to differ from organization to organization. It's not one-size-fits-all.
So some of these things are eliminated because the people hosting those are not put under the same kind of scrutiny as say, a Twitter.
The scrutiny from investors, legal and compliance exists because the risks are real, and they don't go away just because there isn't a Serious Business involved. Once someone operating one of these is publicly damaged, the risk will be better understood and marked up accordingly.
They sure are, which brings us back to the point: how well did this implementation of Mastodon address these risks with the reduced code count?
During the first wave of the Twitter exodus, several people in my professional circle asked if they should be hosting professional, field-specific Mastodon servers, etc.
My answer, born of moderating a modestly sized forum, was "Absolutely not under any circumstances."
The "100x less code" claim always reminds me of a hypothetical Half-Life 2 game engine.
Then your code would be:
StartHalfLife2LikeGame()
and you'd replace millions of LOC with one line. If there is a perfect match between the framework and your app, there is no code. The more your app diverges from the ideal app the framework was written for, the more code you have.
Yes, all applications that are aligned with the framework (message spreading from one user to many others) - not that I want to put down their effort in any way.
But this is the core [0]: a simple event-location platform my wife founded easily had >200k LOC, while the core (edit/search/detail page for an event location) had <10k LOC.
[0] If building Twitter I would perhaps try their setup, although I have been bitten by "magic" JVM frameworks in the past; e.g., we used one commercially and the license cost went up from 80k to 800k YoY, on top of scaling problems we could do nothing about because the framework was a black box.
Twitter has many features, but not all of them are necessary. For example, there is no need to implement popups blocking the page and demanding a user to register, or detailed telemetry collecting user data from 20 different providers.
> Add large-scale distributed live video support on top of that, and you won't get anywhere close to 10k lines.
But Twitter isn't, and never was, about live video support: this is pure feature creep, and that's how you get headcount inflation and a company that can be run for 17 years without making a profit (AKA a terrible business).
> When I worked at Statuspage, we supported 50-100k requests per second
Having served 150kqps in the past as part of a very small team (3 back-end eng.), this isn't necessarily as big of a deal as you make it sound: it mostly depends on your workload and whether or not you need consistency (or even persistence at all) in your data.
In practice, building a scalable system is hard mostly because it's hard to get management to forget the vanity ideas that go against your (their, actually) system's scalability.
It takes more self-control and effort to reduce the number of features to the ones that matter. Twitter having more features is a liability, not a benefit.
> Add large-scale distributed live video support on top of that,
Why? For the love of all that is good and efficient, why? Why not have a separate platform for that? Or link to a different federated video service? Why does every platform need to do all the things?
Indeed. Add a single JavaScript dependency… you will get the banana, the gorilla holding the banana, the tree holding the gorilla, and the whole jungle.
Not to mention the phrase "x times less than" doesn't really make sense the way it's often used. For it to make sense you have to reinterpret it to mean something that it doesn't based on being the opposite of "x times more than" (which is also often misused).
I do C++ backend work in a non-web industry and this entire post is Greek to me. Even though this is targeted at developers, you need a better pitch. I get "we did this 100x faster" but the obvious followup question is "how" but then the answer seems to be a ton of flow diagrams with way too many nodes that tell me approximately nothing and some handwaving about something called P-States that are basically defined to be entirely nebulous because they are any kind of data structure.
I'm not saying there's nothing here, but I am adjacent to your core audience and I have no idea whether there is after reading your post. I think you are strongly assuming a shared basis where everybody has worked on the same kind of large scale web app before; I would find it much more useful to have an overview of, "This what you would usually do, here are the problems with it, here is what we do instead" with side by side code comparison of Rama vs what a newbie is likely to hack together with single instance postgres.
In a typical architecture, the DB stores data, and the backend calls the DB to make updates and compile views.
Here, the "views" are defined formally (the P-states), and incrementally, automatically updated when the underlying data changes.
Example problem:
Get a list of accounts that follow account 1306
"Classic architecture":
- Naive approach: search through all accounts' follow lists for "1306". Super slow; scales terribly with the number of accounts.
- Normal approach: create a "followed by" table and update it whenever an account follows / unfollows / is deleted / is blocked.
Normal sounds good, but add 10x features, or 1000x users, and it gets trickier. You need to make a new table for each feature, and add conditions to the update calls, and they start overlapping... Or you have to split the database up so it scales, but then you have to pay attention to consistency, and watch which order stuff gets updated in.
Their solution is separating the "true" data tables from the "view" tables, formally defining the relationship between the two, and creating the "view" tables magically behind the scenes.
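A toy version of that separation (all names hypothetical): an append-only event log is the source of truth, and the "followed by" view is maintained incrementally as each event arrives, rather than being recomputed:

```python
from collections import defaultdict

event_log = []                      # source of truth: the "true" data
followed_by = defaultdict(set)      # derived "view" index

def apply_event(event):
    """Incrementally maintain the view; never recompute it from scratch."""
    kind, follower, followee = event
    if kind == "follow":
        followed_by[followee].add(follower)
    elif kind == "unfollow":
        followed_by[followee].discard(follower)

def append(event):
    event_log.append(event)
    apply_event(event)              # view stays in sync with the log

append(("follow", 42, 1306))
append(("follow", 99, 1306))
append(("unfollow", 42, 1306))
# followed_by[1306] is now {99}; replaying event_log would give the same answer,
# which is what makes the view rebuildable when its definition changes.
```

The point of the formal split is that the view can always be regenerated from the log, so adding a tenth feature means adding a tenth `apply_event`-style function, not hand-editing update calls scattered across the codebase.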
I read their post and honestly it’s not really that much different than just materialized views in a regular database plus async jobs to do the long running tasks.
It’s a ridiculous amount of fluff to describe that. Not to mention it’s proprietary, only supports the JVM, and doesn’t integrate with the tons of tooling designed around RDBMSes unless you stream everything to them, defeating the purpose.
What really irks me is that they go on and on bragging about the low LoC count and literally show nothing complete. They should’ve held off on this post and released it simultaneously with the code.
We are very open in the post that the core concepts are not new:
Individually, none of these concepts are new. I’m sure you’ve seen them all before. You may be tempted to dismiss Rama’s programming model as just a combination of event sourcing and materialized views. But what Rama does is integrate and generalize these concepts to such an extent that you can build entire backends end-to-end without any of the impedance mismatches or complexity that characterize and overwhelm existing systems.
Indexes as arbitrary data structures that you shape to perfectly meet your use cases, a powerful computation API that's like a "distributed programming language", and everything being so integrated make a world of difference.
I understand the desire to see all the code, and that's coming in two weeks. That said, the code in the post isn't trivial as it's showing almost the complete implementations of two major parts of Mastodon: the social graph and timeline fanout.
Next week you'll be able to play with Rama when we release a build of it, and the documentation will help with that.
> But what Rama does is integrate and generalize these concepts to such an extent that you can build entire backends end-to-end without any of the impedance mismatches or complexity
Every time I hear this the reality turns out to be that building anything with this tech is like building something on top of SAP.
But I’m also just allergic to any post that says ‘look how amazing’ in general, so I’m a bit prejudiced.
After reading through the post a bit more, I’m inclined to believe it’s not hot air, but I think most of the innovation here is in the management layer, not the ease of application development.
Just looking at the first example tells me that there’s a million ways someone that doesn’t know what they’re doing can mess this up.
If the author of the platform implements some service on their own platform it’s always going to seem simple.
The difference is that the materialized-view logic lives naturally in the application code; there's no step where they go out of the DB to do computations and then reinsert.
Once SQL materialized views aren't enough, you might do this by replicating your database into Kafka, implementing logic in Flink or something, and reinserting into the same DB/Elasticsearch/etc. Very common architecture. (Writ small, could also use a queue processor like RabbitMQ.)
Their approach is to instead--apparently--make all of these first-class elements of the same ecosystem, not by "putting it all in the database", but by putting the database into the application code. Which seems wild, but colocates data, transformation, and view.
Seems like it would open up a lot of cans of worms, but if you solve those, sounds great.
You can do all of this with https://materialize.com, and you don’t need to write it in Java. Just connect it to a Postgres instance and start creating materialised views using SQL. These views then auto update. So much so, that you can create a view for the top 10 of something, and let it sit there as the list updates. Otherwise just use normal select statements from your views using any Postgres client.
IIUIC, the most significant difference from a materialized view is that the Rama infrastructure recomputes only the changed data by tracking the relationships between fields, while a traditional materialized view recomputes the whole table?
Incremental view maintenance is the database equivalent of "recompute only the changed data by checking the relationship between fields."
Oracle has decent support for incrementally updated materialized views, and Redshift has some too. Materialize.com is an entire Snowflake-like platform built around incrementally maintained materialized views.
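The difference can be shown in a few lines (a toy sketch, not any particular database's mechanism): a full refresh rescans all the underlying rows, while incremental maintenance applies only the delta:

```python
# Underlying data: (user, post) like events.
likes = [("alice", "post1"), ("bob", "post1"), ("alice", "post2")]

def recompute(likes):
    """Traditional materialized-view refresh: scan everything, O(n)."""
    counts = {}
    for _, post in likes:
        counts[post] = counts.get(post, 0) + 1
    return counts

counts = recompute(likes)           # initial build of the "view"

def apply_like(counts, like):
    """Incremental maintenance: O(1) work per change."""
    _, post = like
    counts[post] = counts.get(post, 0) + 1

apply_like(counts, ("carol", "post2"))
# The incrementally maintained view matches a full recompute:
assert counts == recompute(likes + [("carol", "post2")])
```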
> I read their post and honestly it’s not really that much different than just materialized views in a regular database plus async jobs to do the long running tasks.
How about you go and implement a Mastodon server to their level of feature parity, and tell us how much effort and how many lines of code it takes?
I really don't appreciate this kind of fluffy, insubstantial, overly dismissive non-content on HN.
This is all armchair for me, but I think they have containers and sharding built in as well, which is the other half of the puzzle when it comes to scaling.
Yes, but there are plenty of NewSQL that support views and offer all of that too. Yugabyte, Cockroach, TiDB and that’s just off the top of my head and open source. If we count proprietary then you have Fauna, Cloud Spanner and more I’m sure.
I’m getting Noria[1] / Materialize / Readyset vibes from this, or perhaps even Samsa[2] ones. (Incidentally, I’d appreciate it if anyone could elaborate on the differences between the two.) Explicit inspiration? Parallel evolution?
So... at a high level, early React for data? In other words, letting a framework manage update dependency graph tracking, and then cascading updates through its graph in an optimized manner to enhance performance?
Obviously, with tons of implementation difficulties and details, and not actual graph structures, but as a top level analogy.
Not at all, especially because React doesn’t do much dependency tracking on its own and is built for predictable UI updates and not performance.
To be honest any parallel with frontend here is meaningless, reactivity and all the concepts at play have existed long before JS and browsers came along, it’s easier to explain from first principles.
I think that’s probably not the case for many new developers that don’t have any exposure to anything not React. Of course ‘react for data’ is entirely misleading, but it may give a decent idea if you don’t have an hour to spend on an explanation.
Nathan Marz created Apache Storm, coauthored the book "Big Data", and founded an early real-time infrastructure team at Twitter. It's likely the 'curse of knowledge' of working on this specific problem for so long is responsible for the unique and/or unfamiliar style of communication here.
> Whereas Twitter stores home timelines in a dedicated in-memory database, in Rama they’re stored in-memory in the same processes executing the ETL for timeline fanout. So instead of having to do network operations, serialization, and deserialization, the reads and writes to home timelines in our implementation are literally just in-memory operations on a hash map. This is dramatically simpler and more efficient than operating a separate in-memory database. The timelines themselves are stored like this:
> To minimize memory usage and GC pressure, we use a ring buffer and Java primitives to represent each home timeline. The buffer contains pairs of author ID and status ID. The author ID is stored along with the status ID since it is static information that will never change, and materializing it means that information doesn’t need to be looked up at query time. The home timeline stores the most recent 600 statuses, so the buffer size is 1,200 to accommodate each author ID and status ID pair. The size is fixed since storing full timelines would require a prohibitive amount of memory (the number of statuses times the average number of followers).
> Each user utilizes about 10kb of memory to represent their home timeline. For a Twitter-scale deployment of 500M users, that requires about 4.7TB of memory total around the cluster, which is easily achievable.
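The quoted layout can be sketched as a toy in Python (the real implementation uses flat Java primitive arrays inside the Rama module; `HomeTimeline` and its methods are made-up names for illustration):

```python
TIMELINE_SIZE = 600                 # most recent statuses kept per user
BUF_LEN = TIMELINE_SIZE * 2         # (author_id, status_id) pairs, flattened

class HomeTimeline:
    def __init__(self):
        self.buf = [0] * BUF_LEN    # flat ring buffer standing in for long[]
        self.next = 0               # total pairs ever written

    def add(self, author_id, status_id):
        i = (self.next % TIMELINE_SIZE) * 2
        self.buf[i] = author_id     # author stored alongside the status id,
        self.buf[i + 1] = status_id # so no lookup is needed at query time
        self.next += 1

    def latest(self, n):
        """Most recent n (author_id, status_id) pairs, newest first."""
        count = min(n, self.next, TIMELINE_SIZE)
        out = []
        for k in range(count):
            i = ((self.next - 1 - k) % TIMELINE_SIZE) * 2
            out.append((self.buf[i], self.buf[i + 1]))
        return out

tl = HomeTimeline()
for s in range(700):                # 700 writes: the oldest 100 are overwritten
    tl.add(author_id=7, status_id=s)
```

The fixed buffer is what caps memory at roughly 10KB per user regardless of how active the accounts they follow are.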
Isn't this where the most difficult (and expensive) part is, and Rama has little to do with it? It appears the other parts also don't have to be Rama.
We're storing those in-memory within the Rama modules materializing the home timelines. And the query topologies that refresh home timelines for lost partitions is colocated with that. This is dramatically simpler than operating a separate in-memory database, and Rama has everything to do with that.
> So instead of having to do network operations, serialization, and deserialization, the reads and writes to home timelines in our implementation are literally just in-memory operations on a hash map. This is dramatically simpler and more efficient than operating a separate in-memory database.
He's a developer and curious about the subject. Since it's a blog post, not a scientific paper, the fact that he did not understand could be a communication failure. I think he's being helpful.
OP did not specify what their industry actually is. I've been doing "web work" for 17 years and I'm sharing their concern: where's the TL;DR for this? If this somehow can make me 100x as productive, how about starting with a "hello world" example that shows me how is it different from pip install django, etc?
Tweets per second doesn't seem like the right way to measure "Twitter scale" to me.
Updates per second to end users who follow the 7K tweets per second seems more realistic, it's the timelines and notifications that hurt, not the top of ingest tweets per second prior to the fan out... and then of course it's whether you can do that continuously so as not to back up on it.
That's why we're saying "at 403 fanout". The bottleneck of Mastodon/Twitter is timeline writes, which is posts/second multiplied by the average number of followers per post. So our instance is doing 1.4M timeline writes / second.
Another important metric is "time to deliver to follower timelines", which is tricky due to how much variance there can be from second to second given the extremely unbalanced social graph. When someone with 20M followers posts, that multiplies the number of needed timeline writes by 15x. We went into depth in our post on how we handled that to provide fairness by preventing these big users from hogging all the resources at once.
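Under the numbers stated above, the back-of-the-envelope arithmetic works out like this:

```python
# Steady-state fanout load from the figures in the thread.
posts_per_sec = 3500
avg_followers = 403
timeline_writes_per_sec = posts_per_sec * avg_followers
# -> 1,410,500, i.e. the ~1.4M timeline writes/second quoted above

# A single post by a 20M-follower account, if delivered within about a
# second, is roughly 14-15x the steady-state write load on its own:
celebrity_followers = 20_000_000
spike_ratio = celebrity_followers / timeline_writes_per_sec
```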
I heard somewhere that one of the particular challenges of Twitter's scale is not the average fanout, but the outliers where millions or tens of millions of users follow a single account. Does your simulation take that into account?
Congrats on the (kinda) launch. I was curious to see what you guys were up to. The blog post is pretty detailed, and with good insights. Reducing modern app development complexity to mixing data structures sounds like a good abstraction. I'm sure you thought really hard about the building blocks of Rama and you know your problems better than most of the hn crowd.
Now, the really hard part becomes selling. If companies start using your product to get ahead, that will be the real proof, otherwise its "just" tech that is good on paper.
On a side note, did you guys get any inspiration from Clojure? I see lots of interesting projects popping up from Clojure people...
Does a possibility of a Python API figure anywhere in your roadmap? Or am I missing the point here? Seems like it would be a smart choice to have one on the statically-typed side, and one on dynamically-typed side?
We're not releasing one as we don't have the bandwidth right now to maintain and document another API. That said, making a Clojure wrapper around the Java API should be pretty easy.
I've seen many people describe frameworks like this: first you have the slow, back-end, event-driven master database that you don't query live against, then you've got eventual-consistency flows into the various data warehouses, data stores, and partitioned, sharded databases in query-friendly layouts that you actually read live from... and I never see it clearly explained: how do you show a change back to the user immediately after they made it? How do you say "eventual consistency is fine for the other views, but this view of this bit of info needs to be updated now"?
This write-up is very detailed but I couldn't find that explanation.
You write the update directly to the cache closest to the user and into the eventually consistent queue.
We did this at reddit. When you make a comment the HTML is rendered and put straight into the cache, and the raw text is put into the queue to go into the database. Same with votes. I suspect they do this client side now, which is now the closest cache to the user, but back then it was the server cache.
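A minimal sketch of that pattern (names hypothetical; the real system used memcache and a message queue rather than in-process dicts):

```python
from collections import deque

cache = {}                 # the cache closest to the user
db = {}                    # eventually-consistent backing store
write_queue = deque()      # async path to the database

def post_comment(comment_id, raw_text):
    cache[comment_id] = f"<p>{raw_text}</p>"     # rendered, visible immediately
    write_queue.append((comment_id, raw_text))   # raw text; DB catches up later

def drain_queue():
    """Simulates the async worker consuming the queue."""
    while write_queue:
        comment_id, raw_text = write_queue.popleft()
        db[comment_id] = raw_text

post_comment("c1", "hello")
# At this point the cache already serves "c1" even though db is still empty.
drain_queue()
```

The user reads their own write from the cache, while consistency with the database is only eventual.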
In Nathan Marz's (the article author) book, Big Data, he describes this and calls it the Speed Layer. I haven't fully finished the article yet, but the components it's describing seem to be equivalent to what he calls the Batch Layer and the Serving Layer in his book.
But I'm kind of getting the impression this works without any speed layer and is expected to be fast enough as-is.
Rama codifies and integrates the concepts I described in my book, with the high-level model being: indexes = function(data) and query = function(indexes). These correspond to "depots" (data), "ETLs" (functions), "PStates" (indexes), and "queries" (functions).
Rama is not batch-based. That is, PStates are not materialized by recomputing from scratch. They're incrementally updated either with stream or microbatch processing. But PStates can be recomputed from the source data on depots if needed.
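As a toy illustration of that two-function model (names hypothetical, and written as a batch recompute purely for brevity, whereas Rama updates PStates incrementally):

```python
# Raw data, as it might sit on a "depot": (event, follower, followee).
data = [("follow", "a", "b"), ("follow", "c", "b"), ("follow", "a", "d")]

def build_indexes(data):
    """indexes = function(data): here, follower counts per account."""
    counts = {}
    for _kind, _follower, followee in data:
        counts[followee] = counts.get(followee, 0) + 1
    return counts

def query_follower_count(indexes, account):
    """query = function(indexes): reads never touch the raw data."""
    return indexes.get(account, 0)

indexes = build_indexes(data)
```

The raw data is never queried directly; all reads go through indexes that are a pure function of it, which is what makes the indexes disposable and rebuildable.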
Forgive me if I’m misunderstanding things, but this seems quite similar to what Materialize and ReadySet do, but like “as a library”, because Rama doesn’t use a “separate” layer for the storage stuff. Is that correct-ish?
Or, maybe their pitch is that the streaming bits are so fast, you can just await the downstream commit of some write to a depot and it'll be as fast as a normal SQL UPDATE.
It’s fast until it’s not. Making a post and then hitting reload and not seeing it can be very jarring for the user. Definitely something to think about.
What do you mean? Every post I do shows up instantly.
Reloading the page from scratch can be slow due to Soapbox doing a lot of stuff asynchronously from scratch (Soapbox is the open-source Mastodon interface that we're using to serve the frontend). https://soapbox.pub/
Yes. Depot appends by default don't return success until colocated streaming topologies have completed processing the data. So this is one way to coordinate the frontend with changes on the backend.
Within an ETL, when the computations you do on PStates are colocated with them, you always read your own writes.
That's part of designing Rama applications. Acking is only coordinated with colocated stream topologies – stream topologies consuming that depot from another module don't add any latency.
Internally Rama does a lot of dynamic auto-batching for both depot appends and stream ETLs to amortize the cost of things like replication. So additional colocated stream topologies don't necessarily add much cost (though that depends on how complex the topology is, of course).
I have to say in my ~12 years as an active Redditor I can’t recall a time where I saw any real state issues, even with rapidly changing votes, etc. Bravo!? Now that we’re beyond the days of molten servers, I have to say its overall reliability in the face of massive spiky traffic is quite a feat.
I imagine you get some UUID back from your write, and effectively "block" until you see it committed to the event stream. The intent of such a system is certainly for the read-after-write latency to be not much longer than a traditional RDBMS. (This is roughly what the RDBMS is doing under the hood anyway.) Probably you can isolate latency-critical paths so they don't get stuck behind big stream processing jobs.
The advantage of the overall architecture is that nearly all application functionality (for something like a social network) can tolerate much higher latency than an RDBMS, so you really want to have architectural building blocks that let you actually use this headroom.
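A minimal sketch of the pattern described above, with invented names throughout (this is not Rama's API): the write returns an ID, and the caller blocks until the stream processor marks that ID committed.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical "block until committed" pattern. Names are illustrative only.
public class BlockingAppend {
    // Pending acks keyed by the UUID handed back at append time.
    private final Map<UUID, CompletableFuture<Void>> pending = new ConcurrentHashMap<>();

    // Client side: append returns a UUID the caller can block on.
    public UUID append(String event) {
        UUID id = UUID.randomUUID();
        pending.put(id, new CompletableFuture<>());
        // ...hand (id, event) to the stream processor here...
        return id;
    }

    // Called by the stream processor once the event is durably committed.
    public void markCommitted(UUID id) {
        CompletableFuture<Void> f = pending.remove(id);
        if (f != null) f.complete(null);
    }

    // Blocks until the write is visible, or times out.
    public boolean awaitCommit(UUID id, long timeoutMs) {
        CompletableFuture<Void> f = pending.get(id);
        if (f == null) return true; // already committed and cleaned up
        try {
            f.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | InterruptedException | ExecutionException e) {
            return false;
        }
    }
}
```

Latency-critical writes can use their own instance (or queue) so they aren't stuck waiting behind big stream processing jobs.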
This is usually what I do. Don't even want to wait for an HTTP roundtrip for some of these, e.g. "liking" a post should fill in the heart icon or whatever instantly.
One famous example of this going too far: the Mac Mail app used to play a whoosh sound when your email was actually sent. They changed it to whoosh instantly no matter what. Given how often an email might fail to send or get delayed, this meant an actually useful indication of "great, your thing was sent, you can close your laptop now" was rendered useless.
Messaging apps often have a checkmark to indicate the message actually went to the server, and maybe another checkmark to indicate it was received on the other end. Maybe HN needs an icon indicating that your vote went through.
Yeah, it's easy enough that I was able to do it in the web inspector in a minute (artificial 1s network delay added): https://s11.gifyu.com/images/ScPMI.gif
When you step back and consider the incredible amount of manpower and resources that have been put into these applications, it's amazing how buggy these applications are. To put it simply, they're buggy because the underlying infrastructure and techniques used to build them are so complex that the implementation is beyond the realm of human understanding.
The way applications are built, and have been built since before I was born, is by combining potentially dozens of narrow tools: databases, computation systems, caches, monitoring tools, etc. There has never been a cohesive model capable of expressing arbitrary backends end-to-end, and every application built has to be twisted to fit onto the existing narrow pieces.
Rama is a lot more than just "event sourcing" and "materialized views". Those are two concepts at its foundation, but the real breakthrough is being that cohesive model capable of expressing diverse backends in their entirety. It took me more than five years of dedicated research to discover this model, and it was extremely difficult.
Yes, I 100% agree with you. I would like something like this to succeed, and agree the problem is real.
But what are the tradeoffs? Nothing comes with a 100x benefit and no tradeoffs.
(side note: I worked on Google Code for a short while in 2008, concurrent with Github's founding ... I think Github moved a lot faster in a large part because they weren't dealing with distributed systems at first -- they had a Rails app, a database, and RAID disks, and grew it from there. We had BigTable and perf reviews :-P )
Eventual consistency is probably one?
Can I specify that comment editing is correct and ACID, while likes/upvotes are eventually consistent? (No is a fine answer, these problems are hard)
I read through much of the doc, and don't see a mention of the word "consistency" at all, which seems like an oversight for something that is unifying what would be in a database with computation.
Rama is a much broader platform than a database, so the consistency semantics you get depend on how you use it. When using Rama, you're not mutating indexes directly like you do with a database, but adding source data that then gets materialized into any number of indexes.
You get read-after-write consistency for any PStates in a streaming ETL colocated with the depot you appended to. This is if you do the depot append with "full acking", which coordinates its response with the completion of colocated streaming ETLs. If you append at a lower level of acking, then you get eventual consistency on those PStates with the benefit of lower-latency appends.
Microbatching is always eventually consistent, as processing is asynchronous and uncoordinated with depot appends. Microbatching has higher throughput than streaming and simpler fault-tolerance semantics.
You'll be able to read a lot more about this when we release the docs next week.
I think many people are going to have problems programming with this consistency model, as they will with any model that's different from a single machine's. But that's basically "physics", so it's inevitable :)
But it seems like great work within the constraints -- look forward to learning more
I have indeed wondered why none of the cloud platforms have built more forward-looking tech like this -- instead it's copies of AWS and so forth
Hilariously, I went to edit the above comment, and HN was overloaded. Then it served me three or four 500's, AND it served me stale data in between
I was pissed off that I would have to type my comment again, but actually it did save it, and refreshing worked.
From what I understand Hacker News is architected more in-memory, on one big box ... Perhaps similar to the event sourcing model
(not knocking hacker news -- it's generally a very fast site, MUCH better than Reddit. Just that scaling beyond a single machine is difficult and full of gotchas )
StackOverflow used to run on a single (very beefy) machine also for a long time — databases make efficient use of vertical scaling, horizontal scaling is much harder.
Of course, specialist systems can often do much better.
One strategy (somewhat common in lambda architectures) is to query both the long-term store and the in-flight operations, and blend the results. The in-flight stuff is both small and already in memory so it's pretty often trivially fast, even if blending the data is relatively complex.
That does limit you to operations/queries you can describe in this dual format, but pretty often that's fine. Or if you can relax read-after-write you can just ignore the in-flight stuff and read from the main store and then there are no (added) limitations.
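A rough sketch of that blending strategy (plain Java, names invented), using a favorite count as the example query: the batch-built view supplies the bulk of the answer, and the small in-memory in-flight buffer supplies the recent delta.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a lambda-architecture read path: blend the long-term (batch) view
// with in-flight writes that haven't been folded in yet.
public class BlendedQuery {
    private final Map<Long, Long> longTermFavCounts = new HashMap<>(); // batch-built view
    private final Map<Long, Long> inFlightFavCounts = new HashMap<>(); // recent, in memory

    public void loadBatchView(long statusId, long count) {
        longTermFavCounts.put(statusId, count);
    }

    public void recordInFlightFav(long statusId) {
        inFlightFavCounts.merge(statusId, 1L, Long::sum);
    }

    // The blend: cheap because the in-flight side is small and already in memory.
    public long favCount(long statusId) {
        return longTermFavCounts.getOrDefault(statusId, 0L)
             + inFlightFavCounts.getOrDefault(statusId, 0L);
    }
}
```

A relaxed read-after-write variant would simply skip the in-flight term.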
You have the option to track the latest update time and, during the minute immediately following this update, direct all reads to come from the leader. Additionally, you could oversee the replication lag among followers and block queries on any follower that lags more than a minute behind the leader.
For the client, it's feasible to retain the timestamp of its most recent write. In this way, the system can ensure that the replica responsible for any reads related to that user incorporates updates at minimum up to that recorded timestamp. If a replica isn't adequately current, the read can either be managed by another replica or the query can wait until the replica catches up. The timestamp might take the form of a logical timestamp, signifying the order of writes (e.g., log sequence number), or it could be based on the actual system clock, where synchronized clocks become vital.
When your replicas are spread across multiple datacenters—whether for user proximity or enhanced availability—there's an added layer of complexity. Requests requiring the leader's involvement must be directed to the datacenter housing the leader.
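The routing decision above can be sketched in plain Java (illustrative names; a real system would use log sequence numbers from its replication stream): the client remembers the LSN of its last write, and a replica may serve the read only if it has caught up to that point.

```java
// Sketch of read-your-writes routing via a per-client last-write timestamp.
public class ReadRouting {
    private long leaderLsn = 0;  // latest write applied on the leader
    private long replicaLsn = 0; // latest write applied on a follower

    // A write advances the leader's log and returns the LSN for the client to remember.
    public long write() {
        return ++leaderLsn;
    }

    // Replication catches the follower up to some point in the log.
    public void replicateUpTo(long lsn) {
        replicaLsn = Math.max(replicaLsn, lsn);
    }

    // Serve from the replica only if it has the client's last write;
    // otherwise fall back to the leader (or wait for the replica to catch up).
    public String chooseServer(long clientLastWriteLsn) {
        return replicaLsn >= clientLastWriteLsn ? "replica" : "leader";
    }
}
```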
It’s a massive ask, even if the platform were 100x better, for all developers to give up every programming language and database they’ve ever used to depend on a startup’s closed-source platform for all functionality.
It’s hard enough trusting that Google’s or Amazon’s cloud offerings won’t change.
It seems that’s what they’re proposing right? What am I missing?
We're actually not asking anyone to give up anything. First off, it has a simple integration API (which you'll be able to see the details of next week) that allows it to seamlessly integrate with any other backend tool (databases, monitoring systems, queues, etc.). So Rama can be incrementally introduced into any existing architecture.
Second, Rama has a pure Java API and is not a bespoke language. So no new language needs to be learned.
So Rama-powered apps need to be written in Java? Or will any JVM language work?
And the Rama core will remain closed-source? That part seems like the toughest sell of all, at a time when the vast majority of developer tooling and backends are open source or at the very least source-available.
Rama sounds interesting to me for my 'next big project', but I'd not even consider building it on top of a closed core. I think this is a pretty common sentiment in these circles.
I understand building an OSS business is not easy either. But perhaps there is some middle of the road that you can walk?
- A contractual obligation to open source all (now current) code a couple of years in the future?
- Or an almost-OSS license that makes life difficult for competing cloud providers, like https://www.mongodb.com/licensing/server-side-public-license... ?
Yep, not being OSS-licensed is a nope for us for core dependencies. I am aware many here like to sign away freedom (even future viability) for more productivity now (using some newfangled cloud thingy which might/will be gone in a few months to underpin your entire product), but we are not interested in that.
> Rama can be incrementally introduced into any existing architecture
Big if true (and if the opposite, of incrementally removing it also works). There have been similar platform efforts in past, such as https://news.ycombinator.com/item?id=20985429 . For that one, the "massive ask to give up every programming language and database they’ve ever used to depend on a startups closed source platform" seems like the biggest hindrance to adoption.
I can imagine this being really useful from the ground up. Because it looks like it wants to be the source of truth, with different views on the data.
It’s hard to imagine it for a complex legacy application without lots of added complexity. It wants to be the unifying programming model for the application. It would seem like running with two RDBMS sources of truth simultaneously.
It’s like the xkcd “there are 12 ways of doing X, let’s create a standard to unify them” now there are 13 ways
Looks amazing and incredibly smart. But I found the LOC and implementation time comparisons to Twitter and Threads very disingenuous. It makes me wonder what other wool will be pulled over our eyes with Rama in future (or important real world details missed / future footguns).
Still super impressive. Reminds me of when I discovered Elixir while building a social-ish music discovery app. Switching the backend from Rails to Elixir felt like putting on clothes that actually fit after wearing old sweats. Rama looks like a similar jump, but another layer up, encompassing system architecture.
Honestly I'm willing to accept the number they gave since the author (Nathan Marz) was one of the lead/founding devs for twitter's streaming compute backend in the past.
Don't forget their entire ads system, data processing/analytics, monitoring, customer support, payments, internationalization. They have replicated at most a tiny bit of Twitter's core infra for sending Tweets. The company itself does a lot more than that.
It’s hard to construct a true randomized controlled trial for software engineering methods. People make many claims about programming paradigms or tools that are hard to validate.
It’s also unclear what we would compare a tool like this to. I doubt you could just say “compare it to Rails”, given how frameworks like Rails are bound to specific data models, as are most realistic applications. You’d have to compare it to some other opinion about how to wire together different data structures.
The performance on the example Mastodon instance is very responsive - almost anywhere I clicked loaded nearly instantly. I created an account and the only thing I found missing was it doesn't implement full text search unless my user was tagged, but that might be a Mastodon specific item.
I think they have thought a lot about typical hard problems, such as having the timeline processing happen along side the pipeline, taking network / storage etc out of the picture. Nice work!
That is indeed an intentional part of Mastodon's design, which we tried to be faithful to as much as possible. We originally implemented search across all statuses and had to reimplement it when we realized Mastodon is a little different.
Well, we didn't start from anything as we implemented this completely from scratch. I believe Mastodon is much more widely used than those so it seemed like a better target for this.
This architecture seems very similar to existing offerings in the "in-memory data grid" category, like Apache Ignite and Hazelcast. I'm more familiar with Ignite (I built a toy Notion backend with it over a few afternoons in 2020).
The way Ignite works is similar overall. You make a cluster of JVM processes, your data is partitioned and replicated across the cluster, and you upload JARs of business logic to the cluster to do things. Your business logic can specify locality so it runs on the same nodes as the relevant data, which ideally makes things a lot faster compared to systems where you need to pull all your data across the wire from a DB. Like Rama, Ignite uses a Java API for everything, including serializing and storing plain ol' Java objects.
Rama's data-structure-oriented PState index seems easier to work with than building indexes yourself on top of Ignite's KV cache. On the other hand, Ignite also offers a SQL layer: you can insert your data into the KV cache however you like, add custom SQL functions, and then accept more flexible SQL queries over your data than the very purpose-built PStates allow, while still being able to do lower-level or more performance-oriented logic with data locality.
Anyways, if you like some of this stuff but want to use an existing, already battle-tested open source project, you can look for these "in-memory data grid", "distributed cache", kind of projects. There's a few more out there that have similar JVM cluster computing models.
Hazelcast has been on my list to explore for a while. Anyone have pointers to a good sample project / deep-dive in the same sort of spirit as the OP here?
Also would love to hear folks’ thoughts on the sort of usecase where this data grid excels.
I'm excited to see the docs for Rama. But I am also a little scared of the comment "I came to suspect a new programming paradigm was needed" from Nathan.
It's not so much that I think the comment is wrong or anything, but rather that it seems so similar to what I have heard in the past from power-lisp (or Clojure in this case) super-smart engineers.
I feel like we have reached a point in software development where "better" paradigms don't necessarily gain much adoption. But if Rama wins in the marketplace, that will be interesting. And I am quite excited to see what a smart tech leader and good team have been able to grind out given a years-long timeframe in this programming platform space . . .
This is why we exposed Rama as a Java API rather than Clojure or our internal language (which is defined with Clojure macros, so it's technically also Clojure). Rama's Java dataflow API is effectively a subset of our internal language, with operations like "partitioners" being implemented using continuations.
Is this meant to be hype to sell your Rama platform/product/framework? That you have spent 10 years building in secret? During that time you have built a datastore and a Kafka competitor and what else? Shouldn't those 10 years be factored into the time it took to develop this technical demo? Is it 100x less code including every LOC in all of Rama?
I mean, I am sure you picked a use case that is well suited to creating a Twitterish architecture implementation. If I went off and wrote a ThinkBeat platform for creating Twitterish systems and then created a Twitterish implementation on top of it, it's real easy to reach low LOCs.
Kind of reminds me of when FoundationDB came out, they really needed to demonstrate lots of different use cases to prove they were the database storage layer to rule them all... Not just one
Great question. There's actually two ways to look at this: what does it look like to run Rama in a unit test environment, and what does it look like to run a small-scale single-node Rama application in production?
For the former, Rama has a class called "InProcessCluster" that works identically to a real cluster. It enables Rama applications to be tested and experimented with end-to-end. There's an example of this in the post and this is what we're releasing next week.
For the latter, Rama can be run on a single node with each daemon and module being a separate process. We made it really easy to launch single-node Rama instances with just a couple commands with the "rama" script that comes with the release. That said, we haven't spent much time yet optimizing small-scale Rama deployments and there's likely things we can do to make it more efficient (e.g. combine the Conductor and Supervisor daemons into a single process).
Follow up question : do you see Rama as being a good fit for applications that /don't/ need Twitter scale? These have simpler requirements, but I feel the integration you propose could still have value there.
Yes, it's a better model for developing backends in general. Our comparison against Mastodon's official implementation demonstrates this, being at least 44% less code.
It's the ability to avoid the impedance mismatches which dominate existing tooling that makes such a difference. With existing databases, including RDBMS's, you have to twist your application to fit their data models. The existence of things like ORMs help, but they add their own layers of complexity.
With Rama, you mold your indexes to exactly match your application's needs. And you're always just working with objects represented however you want, whether appending data to depots, processing data in ETLs, or storing data in PStates.
That computation and storage are integrated and colocated is another way that Rama simplifies application development and deployment.
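To make "mold your indexes to your needs" concrete in plain Java (this is an analogy, not Rama's API): a single structure shaped as statusId -> set of favers answers both the fave-count and has-user-faved queries directly, with no joins and no ORM layer.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustration of an index shaped to match the application's queries:
// statusId -> set of users who favorited it.
public class FaveIndex {
    private final Map<Long, Set<Long>> faversByStatus = new HashMap<>();

    public void fave(long statusId, long userId) {
        faversByStatus.computeIfAbsent(statusId, k -> new HashSet<>()).add(userId);
    }

    public int faveCount(long statusId) {
        return faversByStatus.getOrDefault(statusId, Set.of()).size();
    }

    public boolean hasUserFaved(long statusId, long userId) {
        return faversByStatus.getOrDefault(statusId, Set.of()).contains(userId);
    }
}
```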
Is there a breakdown of effort Twitter spent doing the mastodon-level service (serving a feed of the accounts you are subscribed to) vs everything else like ads, algorithmic feed, moderation, fighting spam, copyright claims, localization, GR, PR, safety, etc?
Is it just me, or does the code in the post feel like they've implemented what should have been a new programming language on top of Java?
Their "variables" have names that you have to keep as Java strings and pass to random functions. If you want composable code, you don't declare a function, you call .macro(). For control flow and loops, you don't use if and for, but a weird abstraction of theirs.
I feel like this code could have been a lot simpler if it was written in a specialized language (or a mainstream language with a specialized transpiler and/or Macro capabilities.)
I'd quote the old adage about every big program containing a slow and buggy implementation of Common Lisp, but considering that this thing is written in Clojure, the authors have probably heard it before.
Internally there actually is a new programming language, implemented using Clojure macros (so it's also Clojure). The Java dataflow API is exposing a subset of that language. We did it this way rather than expose this new language directly because most people don't know Clojure and we don't feel it necessary or desirable to require people to have to learn a new language to benefit from this technology.
Kinda disappointed by the simulation, where are all the viral posts?
I've been digging around for a while and haven't found any posts with more than 20 faves. The accounts I've found with ~1 million followers have little to no engagement. I want to see how a post with a million faves holds up to the promises of "fast constant time".
I'm especially curious about these queries — fave-count and has-user-faved — since a couple years ago Twitter stopped checking has-user-faved when rendering posts more than a month or so old, so I imagine it was expensive at scale.
The load generator generates boosts/favorites for a subset of posts that are randomly picked to be "popular". However, since the rate of posts is so high, even individual posts picked to be "popular" only get ~70 reactions.
Tracking reactions is considerably easier than timeline fanout though, as a favorite does only a small handful of things (updating the set of users who favorited the status and sending a notification), while fanout has to do an operation for every follower (403 operations on average, sometimes up to 22M).
The code getting the favorite count for a status looks like:
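The snippet itself doesn't appear to have survived in this thread. As a rough stand-in only (plain Java, not the actual Rama API), a path-addressed read conceptually walks one level of a nested structure per path element, e.g. [statusId, "favCount"]:

```java
import java.util.Map;

// Stand-in for a path-addressed PState read. The PState is modeled here as
// nested maps: statusId -> field -> value. Not the real Rama API.
public class FavCountQuery {
    public static long favCount(Map<Long, Map<String, Long>> statusStats, long statusId) {
        // One hop per path element; missing entries default to zero.
        return statusStats.getOrDefault(statusId, Map.of())
                          .getOrDefault("favCount", 0L);
    }

    // Hypothetical sample data for illustration.
    public static Map<Long, Map<String, Long>> sampleStats() {
        return Map.of(5L, Map.of("favCount", 70L));
    }
}
```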
The API server doesn't do these queries individually, which would be two roundtrips. It does them together in a query topology along with fetching other needed information (like number of boosts, number of replies, "has boosted", "has muted this status", etc.).
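For a sense of the write amplification the fanout numbers above imply: 3,500 posts/s at 403 average fanout works out to roughly 1.4 million timeline writes per second.

```java
// Back-of-the-envelope arithmetic for the fanout figures quoted in the thread.
public class FanoutMath {
    public static long timelineWritesPerSecond(long postsPerSecond, long avgFanout) {
        return postsPerSecond * avgFanout;
    }
}
```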
I would argue that this is not "a Mastodon instance", since it is not running Mastodon - other than that, very very neat work! I'm excited for that "Source Code" link to be live :)
Yeah, I think this is just an ActivityPub server that supports the Mastodon extensions, right? I think we should embrace the fact that the federated world can be diverse, rather than just call everything "Mastodon"
Mastodon has its own API. It offers a very limited ActivityPub API too, but its own API is very different.
And it's a very slim ActivityPub implementation. For example, I don't think you can do basic things like get an individual post via ActivityPub. This should be simple JSON-LD to fetch, but it's just a 404. https://www.w3.org/TR/activitypub/#retrieving-objects
It does have a bunch of stuff that isn't federated though, such as Like counts/collections. And of course it only implements the server-to-server (S2S) part of AP, not the client-to-server (C2S) part.
We call it a "Mastodon instance" because we implemented the entire Mastodon API (https://docs.joinmastodon.org/api/). This is in addition to also implementing the ActivityPub API which Mastodon also implements for federation.
"Originally, Twitter was one, monolithic application built with Ruby on Rails. But now, it's divided into about two hundred self-contained services that talk to each other. Each runs atop the JVM, with most written in Scala and some in Java and Clojure"[1]
So is Twitter not a Twitter instance? Like if it looks, walks and toots like a Mastodon, is it not a Mastodon instance?
That doesn't match the way people use the term though. Pleroma and Akkoma implement the Mastodon API but wouldn't be called Mastodon instances since they aren't running Mastodon.
> We call it a "Mastodon instance" because we implemented the entire Mastodon API…
Except "Mastodon instance" means an instance of Mastodon, which is open source. Whether or not it was intended to be deceptive (I'd think a group of smart people would know better), this personally left a bad taste in my mouth.
>We spent nine person-months building our scalable Mastodon instance
They federated this brand new code in 9 months, and bluesky still hasn't released anything regarding federation. Don't keep your hopes up, it would kill their business model to let anyone run part of the network. People-driven networks are just not compatible with commercially driven ones, name one successful example.
I think it's smart from a legal perspective, because the team members seem to partially be coming from companies acquired by Twitter.
So I guess, if you say "it's a Mastodon-clone", you cannot be accused of taking proprietary ideas from Twitter (this is just a guess, they know better).
But technically very interesting and refreshing to see. I really like their approach. It feels they are innovative.
From a legal perspective, it's against Mastodon's trademark policy: "Only use the Mastodon marks to accurately identify those goods or services that are built using the Mastodon software." https://joinmastodon.org/trademark
Something I'm immediately thinking about with this is change management and inertia at the early stages of a new, underdefined project. Less code is great; the big question is how such a system compares to the usual hack-and-slash method of getting a v1 up and running as you search for PMF, from the perspectives of ops, cost, data migrations, rapid deployments, and so on.

Presumably, the idea here is to start from the beginning with Rama, skipping over the usual "monolith fetches from RDBMS" happy paths even for your basic prototype, so you don't slip into a situation like Twitter did, where that grew slowly into an unscalable monstrosity requiring a rewrite. So an article focused on the "easy" part that's required in the beginning of rapid change, as much as it's not as important as the "simple" part that shines later at scale, seems useful.
The basic operation Rama provides for evolving an application over time is "module update". This lets you update the code for an existing module, including adding new depots, PStates, and topologies.
For context, nathanmarz created what is now Apache Storm, which is used for stream processing at some of the world's largest companies, so he knows a thing or two about scale.
Considering the length and amount of detail in this blog post, I understand why they would need another week to get the code ready (assuming there will be more docs)
Measuring words and LOC is not, imho, a great way to share what you’re doing. In fact, I’d love a much shorter set of documentation right now to understand this better. Long docs will probably make it less likely to be read.
I have often thought along similar lines, that the effort involved in building software seems to indicate a level of abstraction that is missing. The general theme of the comments seems roughly what you should expect from a very bold, paradigm shifting proposal. Good luck with your efforts and don't let this discourage you!
I will make one minor suggestion that I hope is constructive. I found the post difficult to read, largely because you rapid fire introduce a bunch of completely new concepts and propose a solution to many problems at once. You make a passing comparison to "just event sourcing and materialized views", although this was the easiest way for me to understand what you are doing. Starting from event sourcing and materialized views puts the reader on a ground they already understand, and moving on from there to why rama is better/what it adds on top, would be an easier transition.
Thanks for the feedback. The post is meant to give a taste of Rama by showing what it can do for building a full application end-to-end. Next week, we'll be releasing a set of guides, tutorials, and documentation which introduces the concepts in a much gentler way. We'll also be releasing a build of Rama that you can download to try Rama out yourself, and an open-source repository of example code from the documentation.
i've always had this question: how realistic is it, given a standard spec and interoperable protocols, to replace the toxic apps of big international tech companies that provide """services""" with instances of implementations maintained by municipalities or local tech businesses and talent, with 100x fewer employees and less money?
what policies should be in place to achieve that? what would be the challenges? would it be better/healthier? is someone researching things like a transition to sustainable digital services? (sustainable in terms of local labor, privacy, economy, accountability, etc...)
i mean, if you think about these as public services and not as businesses, profit is secondary, and the first priority is just to make the thing better and better for the users. no need for spying, no advertisement, no need for a rich piece of shit somewhere getting a piece of the money paid in your city for every taxi ride or food delivery, or to give up privacy to a soulless/faceless entity just because you want to say something publicly or keep in touch with people. there is no disruption on their part; it's just an old thing put on the internet. they are just in the middle of everyone's life, sucking everything they can. is the actual state of affairs "efficient"?
there must be fed-up engineers and tech people everywhere given the sad state of the IT industry.
The EU already has regulations in place regarding open banking with its Payment Services Directive. I'd imagine a similar framework could be applied to big social tech.
I don’t really see the point of the comparison. They should show something you could only make with Rama or show how much faster it is to iterate with Rama.
Saying this is 100 or even a million times cheaper is like saying taking a picture of the Sistine Chapel and printing out copies is a trillion times cheaper than making it originally.
Many of us on this site could make a number of products very efficiently and cheaply given a static and fixed set of requirements as well as an existing implementation for reference.
That being said it was a very detailed post, so kudos for that, but it’s far too vague to be actionable. Why not just release the code and post simultaneously instead of just bragging about how little code was required?
I think the marketing idea of this is amazing : I would probably never even consider learning and reading about such a framework if I heard of it straight up.
But if you are really releasing a usable open source implementation of something performant that actually federates properly, that is a huge selling point that buys you a ton of respect up front.
Noticeably missing are any details about concurrency control and replication or recovery protocols. A Twitter clone is one thing but any sort of application needing ACID Transactions is a whole other beast.
All data on Rama is replicated automatically with a configurable "replication factor". Data written to Rama is not made visible until it's successfully replicated. The documentation we're releasing next week includes a page going into detail in how this works.
The real reason why we can't easily replicate Twitter/Facebook/Google is that we don't have the distributed storage/caching/logging/data processing/serving/job scheduling/... infrastructures that they have built internally, which are designed to provide some level of guaranteed SLAs for the desired scale, performance, reliability and flexibility, not that it is hard to replicate application logic like posting to timelines. That's also why Threads was built by a small team rather quickly: they already have battle-tested infras that can scale.
Any attempt to build a simplified version of the ecosystem will face the same fundamental distributed system tradeoffs like consistency/reliability/flexibility/... For example, one of the simplifications may be mixing storage/serving/ETL workloads on the same node. And the consequence is that without certain level of performance isolation, it could impact the serving latency during expensive ETL workload.
For Rama to be adopted successfully, I think it is important to identify areas where it has the most strengths, and low LOCs might not be the only thing that matters. For example, demonstrating why it is much better/easier than setting up Kafka/Spark and a database and build a Twitter clone on top of that while providing similar/better performance/reliability/extensibility/maintainability/... is a much stronger argument.
> The instance has 100M bots posting 3,500 times per second at 403 average fanout to demonstrate its scale.
Mastodon has to send messages to each instance with a recipient. That server can then fan out to all its subscribers. The way this point is worded makes me think all the bots are on just a single instance, meaning all the fanout can be dealt with internally without having to do any server-to-server traffic at all.
That is a fair comparison to Twitter, which is single instance. But it sounds like a much reduced ambition versus the task Mastodon has to do.
I would hazard the guess that Twitter's "show tweets to other people" is 1/40th of the functionality piled into Twitter; some other large functions would be things like "track ad sales" or "improve engagement" or "allow random law enforcement organizations to engage in whatever access is needed for any particular part of the world". Each of those is going to be a huge pile of code, and all of it working together is going to N! your complexity.
I would not want to speak for raverbashing but I feel the same way: I actually can't tell if the bug is with soapbox or with your instance but clicking on the first link from your post practically locks up my browser due to every single Toot getting swapped out "at twitter scale"
If one clicks quickly enough to jump to an actual post, it seems relatively static, so it's hard to tell if the bots are deleting and recreating their posts or what. In true Xitter clone fashion, trying to view the Posts & replies from any one user is "sign in"-walled.
Anyway, all of this is not to detract from your framework announcement as much as to have you consider that it's perfectly fine to label that instance as a load test, that's a fine thing, but calling it a legitimate instance seems to be a potential source of confusion
We did notice on less powerful machines the browser getting overwhelmed with the rate of new content (even though we're only streaming 10/s instead of the full 3.5k/s actually happening on the backend). I don't know if the poor performance in this context is due to Soapbox, the browser, or just the hardware.
To get a better feeling of Rama's performance on your hardware, I suggest registering an account which will allow you to poke around the whole platform. It takes just a couple seconds to register and we don't send any emails.
Are there any plans for exposing a Clojure API? Given that it's implemented in Clojure, seems like it would be a natural fit. Interop with Java is nice but can be cumbersome compared to the more natural calling conventions and idioms (threading macros instead of `..` builder patterns, etc).
I appreciate the inversion/melding of the data model and compute. I'm curious to know your perspective on two parts:
How would multitenancy fit into Rama? Using your Mastodon example, providing "hosted Mastodon instances as a service", where you _also_ allow for data governance, per-customer encryption at rest, user IDP support, etc. Is it multiple single-tenant Rama deployments running independent customers? Multitenant Rama clusters with shared depots, where each PState includes a "tenant id"? Something else?
Second, what's the product/business angle on customer confidence, technical novelty, and your business core competency? A dated example, but I'm thinking of somewhere like Basho with Riak: super cool tool, takes some mental adjustment to "get", challenges selling hosting vs software vs pro services.
For now it's going to be on-prem, so each user will just have their own cluster. Things like E2E encryption are pretty easy to implement on top of Rama's existing primitives (there was a good question about this on the rama-user group yesterday https://groups.google.com/u/1/g/rama-user/c/jj-ILcoMjtk).
We'll likely have a fully managed cloud version in the future.
Riak was good technology at the time, but it was really hard to distinguish from other K/V databases, and it didn't really move the needle on core business metrics (like development cost). Rama dramatically changes the economics of building large-scale software. It will take a while for many to grasp that, as is obvious from many of the comments here, but I expect that as more and more users have massive success with Rama, that understanding will come.
The big question is whether we really want more centralised cloud applications of this sort in the AI future, with its potentially exacerbated risks of data breaches, identity theft, spam, etc.
Having a "Rama" for local-first, truly distributed (Solid Project style), self-sovereign-identity based apps would be differentially better probably.
It sounds like the authors spent 10 years building a Twitter factory, and now they can produce a Twitter faster than Twitter could produce it "by hand".
As someone who has worked in both startups and Enterprise IT for over 30 years (including large Java based systems) I see a use case for Rama in large companies who have a lot of difficulty glueing many different systems together to achieve scalability. So I think that Red Planet Labs could get several contracts in the 100k USD and over range in large Enterprises. This is for enterprises who have the problem of integrating many systems to achieve scalability and who are already large Java shops.
However, I do not see Rama's initial market being startups, since they just want the simplest way possible to build UI + backend and want to iterate super fast with tech that their developers already know in the initial stages.
Why not? Incredible claims should have equal amount of scrutiny. I am glad that HN has a default skeptical bias - we don't want to be swept up in frenzies. If anything, HN is still relatively prone to these frenzies from time to time.
> To demonstrate the scale of our instance, we’re also running 100M bot accounts which continuously post statuses (Mastodon’s analogue of a “tweet”), replies, boosts (“retweet”), and favorites. 3,500 statuses are posted per second, the average number of followers for each post is 403, and the largest account has over 22M followers. As a comparison, Twitter serves 7,000 tweets per second at 700 average fanout (according to the numbers I could find).
Is Twitter's 7k tweets per second the average? If so, what's the peak rate, and have you tested your system under that load?
Look, I don't want to defend Twitter, but ignoring 15 years of changes and the whole journey of scaling, and then using the cost of just building a snapshot of the 15-year-old version, is pretty disingenuous.
That's a bit like starting an Oracle clone now and summing up what they spent on developer salaries in the last 40 years. You basically can't not "reduce costs".
And no, the "original consumer product" qualifier is not a real defense: you probably still had tons of people building iterations.
This is a lovely and very detailed showcase of how to combine streaming+ETL+materialized-view+query!
That said: You need better advisors. Your investors and/or the board gave you bad advice on how to publish these accomplishments and talk about them.
I hope your go-to-market strategy works out a little better.
Hyperbole is fine, but at least on hacker news, the audience is a bit careful with regards to grandiose statements.
What might work well on an investor presentation might backfire when you target engineers as audience.
I saw the Twitter post first and the blog next. The premise is compelling, but it's a promise that has been made to the data and software world for decades.
The architecture and the core primitives are something that we agree with a lot. Use cases and business value are a whole different ballgame.
We have invested the past 5 years at InfinyOn building Fluvio, our open-source Rust implementation of core event streaming primitives, which implements this architecture to orchestrate data as efficiently as computationally possible today. I am happy to see this project as an effort in the same direction.
Hmmm, "Rama is programmed entirely with a Java API – no custom languages or DSLs" according to the landing page, but this sure looks like an embedded DSL for dataflow graphs to me - Expr and Ops everywhere. Odd angle to take.
I consider "DSL" as something that's its own language with its own lexer and parser, like SQL. The Rama API is just Java – classes, interfaces, methods, etc. Everything you do in Rama, from defining indexes, performing queries, or writing dataflow ETLs, is done in Java.
Yep the original term DSL was for custom languages, the eventual introduction of using it for these kinds of literate APIs was done later. Using it in the original way unqualified is fine imo.
Whether you like it or not, internal DSLs became a thing with Ruby back in the day. And these days things like Kotlin also lend themselves pretty well to creating internal DSLs. Java is not ideal for this; Kotlin and Ruby have a few syntax features that make it very easy.
The difference is making an effort to expose the features of the library via a syntactically friendly way. Most Kotlin libraries indeed have nice APIs that are effectively mini DSLs.
If you need an example, Kotlin uses a nice internal DSL for HTML where you write things like
div {
    p {
        +"Hello world"
    }
}
This is valid Kotlin that happens to construct a DOM tree. There is no magic going on here; just usage of a few syntax features of the language. div and p here are normal functions that take a receiver block as the last parameter and receive an instance of the element they are creating. The + in front of the string is a function with a signature like this in the implementation of that element:
operator fun String.unaryPlus()
The same code in Java would be a lot messier because of the lack of syntactic support for this. You basically end up with a lot of method chaining, semicolons, and convoluted lambda syntax. The article has a few examples that would look a lot cleaner if you rewrote them in Kotlin.
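To make the comparison concrete, here is a sketch of what roughly the same tree looks like in Java with method chaining. Node, div(), p(), and text() are made-up names for illustration, not any real library:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical Java counterpart of the Kotlin snippet above.
class Node {
    final String tag;   // null for text nodes
    final String text;  // null for element nodes
    final List<Node> children = new ArrayList<>();

    Node(String tag, String text) { this.tag = tag; this.text = text; }

    Node child(Node n) { children.add(n); return this; }

    String render() {
        if (tag == null) return text;
        String inner = children.stream().map(Node::render).collect(Collectors.joining());
        return "<" + tag + ">" + inner + "</" + tag + ">";
    }

    static Node div() { return new Node("div", null); }
    static Node p() { return new Node("p", null); }
    static Node text(String s) { return new Node(null, s); }
}

class HtmlBuilderDemo {
    public static void main(String[] args) {
        // Method-chaining version of: div { p { +"Hello world" } }
        Node tree = Node.div().child(Node.p().child(Node.text("Hello world")));
        System.out.println(tree.render()); // prints <div><p>Hello world</p></div>
    }
}
```

It works, but the nesting of the markup is buried inside call chains instead of being visible in the code's own block structure, which is the point the parent comment is making.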
When someone makes a distinction that you don't immediately appreciate, maybe don't just dismiss it as splitting hairs, as if the world was a simple place.
Based on what I read it's very similar to Kafka Streams + batteries ([semi]automatic workload orchestration, reactive queries, higher-level/slicker/"smarter" API (?))
Could you please compare Rama with Kafka Streams, especially from this point of view: if I tried to reimplement the Rama API on top of Kafka Streams, what fundamental difficulties would I face?
Summarizing, now edited down with some editorializing for clarity:
What is it? Build web-scale reactive backends with an expressive Java dataflow API. Instead of a database, you develop your own custom app-specific indexes, which are reactive, distributed, and durable. It's like event sourcing and materialized views, but integrated in a linearly scalable way.
> I cannot emphasize enough how much interacting with indexes as regular data structures instead of magical “data models” liberates backend programming
> It allows for true incremental reactivity from the backend up through the frontend. ... enable UI frameworks to be fully incremental instead of doing expensive diffs to find out what changed.
Ok, so in my mind I am positioning this against Materialize / differential dataflow, whose key primitive is an efficient streaming incremental join that works across very large relational tables. Materialize makes SQL reactive; Rama gives you a Java dataflow DSL for developing purpose-built reactive database indexes.
How does it work? Four concepts: Depots, ETLs, PStates, Queries
Depots: "distributed, durable, and replicated logs of data." [Event streams?] "like Kafka except integrated" "All data coming into Rama comes in through depot appends."
ETLs: data arrives via depots, and is ETLed to PStates via "a Java dataflow API for coding topologies that is extremely expressive". "Most of the time spent programming Rama is spent making ETLs."
PStates seem like reactive data structures that are also durable/replicated. These are meant to supersede your database and indexes, letting you build custom purpose-built indexes whose nested structures can contain hundreds of millions of elements:
> “partitioned states” are how data is indexed in Rama ... Unlike existing databases, which have rigid indexing models (e.g. “key-value”, “relational”, “column-oriented”, “document”, “graph”, etc.), PStates have a flexible indexing model. In fact, they have an indexing model already familiar to every programmer: data structures. A PState is an arbitrary combination of data structures. ... nested data structures can efficiently contain hundreds of millions of elements. For example, a “map of maps” is equivalent to a “document database”, and a “map of subindexed sorted maps” is equivalent to a “column-oriented database”. Any [composition] is valid – e.g. you can have a “map of lists of subindexed maps of lists of subindexed sets”.
Query: once you develop PStates to aggregate relevant data into a custom index of the right shape, query seems sorta like GraphQL selectors over your custom index:
> Queries in Rama take advantage of the data structure orientation of PStates with a “path-based” API that allows you to concisely fetch and aggregate data from a single partition
> “query topologies” ... real-time distributed querying and aggregation over an arbitrary collection of PStates. These are the analogue of “predefined queries” in traditional databases, except programmed via the same Java API as used to program ETLs and far more capable.
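A rough mental model for the quoted "shapes" is ordinary nested collections. The following is a local, single-process analogy in plain Java; real PStates are partitioned, durable, and subindexed, so this only illustrates the nesting idea:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Local analogy for PState "shapes" using plain Java collections.
class PStateShapes {
    public static void main(String[] args) {
        // "map of maps" ~ document database: userId -> field -> value
        Map<String, Map<String, Object>> profiles = new HashMap<>();
        profiles.put("alice", new HashMap<>(Map.of("name", "Alice", "bio", "hi")));

        // "map of sorted maps" ~ column-oriented: userId -> (timestamp -> statusId)
        Map<String, SortedMap<Long, String>> timelines = new HashMap<>();
        timelines.computeIfAbsent("alice", k -> new TreeMap<>()).put(1000L, "s1");
        timelines.get("alice").put(2000L, "s2");

        // A range read over the sorted level, e.g. statuses since t=1500
        System.out.println(timelines.get("alice").tailMap(1500L)); // prints {2000=s2}
    }
}
```

The promise, as I read it, is that you get to pick whatever nesting your read path needs, with the partitioning and durability handled for you.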
Why choose Java of all languages? Why not something more modern and less verbose like Go or Rust? Just asking, as I have worked enough in Java and then spent a lot of time on GC tuning. Granted, the code was not that great and came from a diverse team with different skill levels, causing all the leaks. But still.
They actually used Clojure, which is an interesting choice.
GC tuning on the JVM is much less of a topic these days than it used to be. The default garbage collector was changed at some point (G1). It has some configuration options but they come with sane defaults that mostly just work fine and adapt to your memory and cpus. You don't spend a lot (or any) time on tuning this typically. I know I haven't even looked at GC params in many years now. Never had to. And we run on modestly sized vms of 1 or 2 GB typically. This was different 10 or so years ago when G1 was still newish and not default. ZGC was introduced with Java 11 (I think), and aimed at very large heaps. It trades off additional overhead for guaranteeing very low latency. That tradeoff is why it is not default. For most users, G1 without any tuning whatsoever should be fine. Generally, if you are stressing your heap, you get more hardware. And if you are not, the GC should be keeping up just fine.
Anyway, like it or not, the JVM has been a work horse for big data for ages. Things like Hadoop, Kafka, Cassandra, Elasticsearch, etc. all run on it and scale fine (and typically without a lot of GC tuning). The only feasible alternative to the jvm used to be things like C++. Lately, Go and Rust are pretty credible in this space as well and both have had to do a bit of catchup in terms of maturity of tooling, libraries, and language features. Things like generics (Go), async (Rust), etc. are still fairly recent additions and both kind of relevant in a project of this type.
In any case, switching languages is hard for teams and these guys have been around for quite some time. When they started, Rust was a lot less mature than it is now and Go was still pretty new as well. Neither was an obvious choice for this stuff at the time.
Yeah about what I expect from a "we rebuilt twitter for cheap" post. There's no point to the comparisons with the Twitter codebase size/cost. It completely distracts from what is probably a perfectly fine project.
That's a fair criticism - this isn't an apples-to-apples comparison. What I find interesting about this is the cost of running the service. Being able to run a twitter-like thing on a hundred or so large aws instances is neat and I'm sure that many folks here dream of that kind of efficiency at their day jobs, but I'm more excited about how this scales down. Can you run a community of a thousand or so posters on a micro or nano instance for a few bucks a month or less? At that scale and cost, donations should easily be able to cover hosting fees and you would surely be able to deputize enough mods to keep things civil (for whatever definition of civil your instance lands on). Ads, monetization, personalization are non-issues (well, not major issues) at that scale.
Semi-related: Their homepage (https://redplanetlabs.com/) has to be one of the best looking websites I’ve seen in a while, buttery smooth as well. I love it.
Another take on Event Sourcing + CQRS. I thought it was great, but after so many years it's still not mainstream. The lack of an integrated platform may contribute, but can't explain everything.
I guess most people can't accept the things that are fundamentally harder in such an architecture than in conventional ones.
Really nice ideas here. The crucial advantage is having the storage+computation run as close as possible, which is a big advantage over a regular DB+app backend.
But I won't ever consider investing in it unless it's some form of open-source. It's too much of a risk to have a closed-source core.
X years from now "We reduced the cost of building _____ at Mastodon-scale by 1000x".
It's certainly interesting, certainly an accomplishment, but it's also the nature of the game. The present eating the past, to be eaten by the future. Rinse. Repeat.
How about a comparison on a consumer-grade VPS, like a 1 vCPU/4GB RAM setup, between your product and Mastodon or Pleroma, for example?
I mean, sure, you can build a Twitter-scale product, but federation means people can do that on their own, and with your tech they don't have to worry about scaling issues.
Actually if you read the article you can see we tested way above Twitter-scale. We can easily run this instance at full Twitter-scale by just paying for more servers.
The point isn't the Mastodon instance, but rather that Rama enabled us to build it at scale in a tiny amount of code and time.
Mastodon and Twitter don't do the same amount of work per post. Mastodon doesn't have a recommendation engine, they don't have an advertising engine, they don't scan every post for CSAM, there's no global search, etc. (Some of these things are good not to have, but they still drastically change the scope.)
Claiming to have enabled significant scaling of a Mastodon/ActivityPub-compatible instance is fine. Claiming to have replicated Twitter on the cheap is, from the post, not accurate.
That's why we're comparing it to the cost of Twitter's original consumer product. As a demonstration of Rama, we scoped this project to the entirety of Mastodon which is roughly equivalent to Twitter's original consumer product (actually, it's probably greater in scope with additional features like hashtag follows and more complex filter/mute capabilities).
All those use cases you listed absolutely can be implemented with Rama, and Rama's extreme cost benefits would apply to those as well.
That's not the title of the article and also not what the article says. I would be really pissed if you editorialized the title of my article like that.
I'm happy to correct it if anyone suggests a better one. The intention is to find a neutral title that accurately reflects what the article itself is saying.
We've learned that when an article's original title generates complaints like https://news.ycombinator.com/item?id=37137317, the thread is likely to get derailed by shallow arguing about the title. It's in both the author's interest and the community's for us to nip that in the bud by (1) putting an accurate and neutral title at the top (preferably using representative language from the article itself), and (2) marking the title complaint offtopic since it no longer applies. These steps nudge the thread toward discussing the article's content rather than merely its title.
Alright, sounds reasonable. I think the problem here is that the author specifically says (in a sibling comment) that the point is not Mastodon and now it's in the title. Maybe they're fine with it though.
I'm no expert and definitely get things wrong - we only skim things and make a first crack at an attempt, and then rely on other people to refine it. If Nathan or someone else wants to suggest a more accurate and neutral title, we can do that - the goal is simply to clear the discussion space for something more interesting than title fever (https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...). But now I'm repeating myself!
In the year of our lord 2023, people are still launching immature products with "we built a clone of a tiny subset of Twitter" as their use case? Come on. Twitter is huge because they have to support a huge number of use cases. Using this proprietary framework won't magically make complex use cases go away.
As we mentioned in the article, Instagram just spent ~25 person-years building Threads which is a barebones clone of Twitter. Not only did we build our instance 30x faster than that, we have way more features like federation, hashtag follows, polls, DMs, global timelines, and more. And Instagram didn't start from scratch as Instagram/Meta already had infrastructure powering similar products.
Going by their claims, they are showing off their generalizable platform, Rama, by building an application on top of it. The application is an example, not the product. For example, someone implements a Todo app on their hot new javascript framework in 10 mins, your objection would be, "But it took you 2 years to make the framework, so actually it took you 2 years plus 10 mins". Why stop there? It also took many years to build the underlying language, networking layers, infrastructure, processors, materials etc etc. You have to draw the line at the point where the application specific code starts and the generalizable platform ends, no?
The JVM took years to write. It took decades to develop the technology necessary to build a modern microcomputer. Before that, millennia to invent written language. And now that those platforms (including Rama) all exist, one can deliver a Mastodon server on top of them in about 9 man-months.
Exactly, why are so many people missing this point. It's not "we built a narrow, tedious framework for knocking off Twitter clones", it's "We built a platform that turns data processing on its head and look in a couple of months you can clone Twitter just imagine what YOU can do with this."
I see parallels though to Datomic, where they turned the database inside out, co-located the app logic and data and indexes, etc. There are a bunch of great videos on YT about Datomic by Rich Hickey & co, worth a watch and I think shine a light on the approach here, too.
I think they didn't do a good job making the point clear for people who just clicked the link without context. It starts off talking a lot about the Mastodon clone and then gradually starts talking about Rama as it goes.
So if one was building a database, and then later building an app against their own database, they should not get to claim how much time the latter took alone? There is enough hyped bullshit in the world to call out, but I would not be so quick to dismiss this one, especially given that the author knows what he's talking about (he used to work at Twitter).
It can be the smallest things.
Just the other day a CS professor was banned from a programming server for interviewing at the NSA 15 years ago. That was pre-snowden.
The bright side is that it's easy to move to a new server.
> How is it possible that we’ve reduced the cost of building scalable applications by multiple orders of magnitude?
> You can begin to understand this by starting with a simple observation: you can describe Mastodon (or Twitter, Reddit, Slack, Gmail, Uber, etc.) in total detail in a matter of hours. It has profiles, follows, timelines, statuses, replies, boosts, hashtags, search, follow suggestions, and so on. It doesn’t take that long to describe all the actions you can take on Mastodon and what those actions do. So the real question you should be asking is: given that software is entirely abstraction and automation, why does it take so long to build something you can describe in hours?
> At its core Rama is a coherent set of abstractions...
This conclusion is alarming to read from a company that's trying to sell a new platform. The vast majority of the work in building Twitter or Reddit is not about building a coherent set of abstractions, it's working with an often incoherent reality, dealing with a myriad of laws that describe, as if your web app were a human clerk at a post office, how to handle PII and credit cards and CSAM filters and audits and copyright claims and on and on...
I'm honestly shocked that the technical implementation of a simplified, coherent platform took a full 9 person-months. That shouldn't be the hard part. What I'd want to know as a prospective customer is how you handle exceptions to your beautiful, idealized architecture, when some foreign country requires that you only store comments posted by their citizens within their borders or something like that.
~~full text search doesn't appear to work... so it's possible they punted on one of the harder parts, which is fast efficient accurate fuzzy search, which moderation and a lot of those other harder things rely on.~~
eta: they say they had it but removed it because apparently it's not something Mastodon supports. So I guess it is a pretty good high-level implementation.
Building Twitter/Mastodon *not at scale* isn't that hard and certainly doesn't take 200 person-years. Building it *at scale* is a completely different story. Remember the fail-whale? That was years of Twitter struggling to scale their product.
That said, as we described in the post our implementation of Mastodon is less code than Mastodon's official implementation. So not only is Rama orders of magnitude more efficient for building applications at scale, it's also much faster for building first versions of an application.
Well, since you use Clojure, you probably know that people often pick Clojure to keep a small codebase. Going from point A to point Z quickly is rarely the goal for startups; going through A... B... C... quickly is the goal. I am still looking through all this, but the thought of having to bet on some Java API and hope and pray it will jump over all unknown hoops, hm.
Comparisons to Twitter are unfair; Twitter is not really a technical gem, or is it? It's pretty impressive to build it with 3 people in 3 months, but it also seems feasible using other tech, given all the blueprints are out there.
Well, as mentioned in the post Instagram literally just built and released their own barebones Twitter clone this year, and it took them 25 person years. They were also able to leverage all their existing infrastructure powering similar products.
So I would not say it's remotely feasible to do this in less than one person-year with any other technology.
So the things that make it difficult are all things you shouldn’t be doing in the first place? Well that certainly helps.
You shouldn’t be handling PII/raw CC’s anyways (assuming FinTech is not your core business)
Secretly scanning your customers private messages against an illegal and immoral hash table from a pseudo-government entity? Are you law enforcement? No? Then fucking stop.
Copyright claims? Fuck ‘em. Only do what you are absolutely, positively, no way-out legally bound to do. No more no less. Require formal, written requests and comply in the maximum amount of time allowed.
Audits? What kind of audit? If they’re non-financial you’re probably doing something wrong.
Corporate squares have ruined the tech scene, and it’s time to resist.
The group involved here may want to be mindful of the Mastodon gGmbH trademarks. Using the Mastodon logo on redplanetlabs.com to pitch a reimplementation of ActivityPub might be seen as infringing.
Any trademark case is going to have to prove that a reasonable person would think this article is from Mastodon gGmbH, or is talking about their product "Mastodon".
The top of the page reads "Red Planet Labs", the title of the article is "How we reduced the cost of building Twitter at Twitter-scale by 100x" and the first line of the article is "We built a Twitter-scale Mastodon instance from scratch in only 10k lines of code."
No reasonable person is going to think that this article has anything to do with the official Mastodon software, so there's no trademark issue here.
Trademark law doesn’t work like that. You don’t get to license your trademark in the same way you get to license copyrighted works.
Otherwise, people could just say you can’t use their trademark in any document that says something negative about them, and then successfully sue the press and angry customers for complaining about them.
You're missing the point. Rama is a generic platform that provides a new baseline for how expensive it is to build applications at scale. There's nothing about Rama specific to social networks. What we're showing is that Rama creates a new era in software engineering where the cost of building applications at scale is radically reduced. With Rama, anyone embarking on a new application today has a radically different economic outlook for the end-to-end cost of developing that application from prototype through large scale.
I think we provided a ton of substance backing up that claim, and we will provide even more next week when we release the build of Rama that anyone can use and its corresponding 100k words of documentation.
- "Depots" are event streams (for event sourced data repositories)
- ETL read one or more streams and project them to indexable read models...
- Which read models are called "PStates" and represent nested combinations of indices like hashtables, b-trees, linked lists, and so on. The point of those being that they hold the data in a fast-to-query shape.
- And you have query engine which splits a query into 1+ index sub-queries and then aggregates.
Am I missing something? This seems like a relatively standard event-sourced / CQRS-like architecture, but streamlined to avoid redundancy and reimplementation of common abstractions.
It would've helped if the terms were less obscure than "depots" and "PStates".
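The loop summarized above (append to a log, project it to a read model, query the read model) can be sketched in a few lines of plain Java. The names loosely mirror Rama's concepts, but this is a toy single-process analogy, not the Rama API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of the event-sourced / materialized-view pattern.
class EventSourcingSketch {
    // "Depot": an append-only log of events.
    static final List<Map<String, Object>> depot = new ArrayList<>();

    // "PState": an index shaped for the read path, here follower count per user.
    static final Map<String, Integer> followerCounts = new HashMap<>();

    // "ETL": consumes the log and updates the index incrementally.
    static void etl(Map<String, Object> event) {
        if ("follow".equals(event.get("type"))) {
            followerCounts.merge((String) event.get("followee"), 1, Integer::sum);
        }
    }

    static void append(Map<String, Object> event) {
        depot.add(event);   // a durable, replicated append would happen here
        etl(event);         // a streaming topology would process the new record
    }

    public static void main(String[] args) {
        append(Map.of("type", "follow", "follower", "alice", "followee", "bob"));
        append(Map.of("type", "follow", "follower", "carol", "followee", "bob"));
        System.out.println(followerCounts.get("bob")); // prints 2
    }
}
```

The claimed novelty is not any one of these pieces but having all of them partitioned, replicated, and colocated in one system instead of stitched together from separate products.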
Individually, none of these concepts are new. I’m sure you’ve seen them all before. You may be tempted to dismiss Rama’s programming model as just a combination of event sourcing and materialized views. But what Rama does is integrate and generalize these concepts to such an extent that you can build entire backends end-to-end without any of the impedance mismatches or complexity that characterize and overwhelm existing systems.
You have the general model correct, but here are a few clarifications:
- PStates are partitioned, durable, replicated indexes that are represented as arbitrary combinations of data structures. A PState can be as simple as an integer per partition, or it can be complex like a map of lists of maps of sets. PStates allow you to shape your indexes to perfectly match your application's use cases.
- I wouldn't call Rama queries an "engine", as it's considerably more straightforward in how it works than something like SQL. The base query API is called "paths", which are an imperative way to concisely reach into one partition of one PState to fetch or aggregate values. There's also "query topologies" which are predefined, on-demand distributed computations that can fetch and aggregate data from many partitions of many PStates.
Thanks, I will read more soon! I'm curious... how do you resolve the "impedance mismatch" between some "canonical" models that business decisions are based upon, which need to be synchronous with the depots (and mutually synchronous with other models sharing fragments of the same data), and the eventually consistent read models, which have a laxer constraint on how up to date they are?
How do you ensure consistency here? How do you organize it in the data flow?
Say I update a user because that user still seems to be there in the query results/indexes, but actually an event deleting this user happened some time ago?
This can also happen, I suppose, if the topologies consuming the depots run queries themselves on a PState in order to determine whether a certain event is valid at all, and how exactly to carry it out.
The impedance mismatches you're used to from using databases are gone because:
- You can finely tune your indexes to be exactly the optimal shape (data structure) for your application. You can see this in our Mastodon implementation, with the big variety of data structures we used across the use cases.
- You're generally just using regular Java objects everywhere: appending to depots, during ETL processing, and stored in indexes.
How you coordinate data creation with view updates is a deeper topic, so I'll just summarize one of the basic mechanisms Rama provides for coordinating this. Depot appends can have an "ack level" that determines the conditions before Rama tells you that depot append has completed. The default level is "full ack" which includes all streaming topologies colocated with that depot fully processing that record. With this level, when the depot append completes you know that all associated indexes (PStates) have been updated.
There's also "append ack", which only waits for the depot append to be replicated on the depot, and "no ack", which is fire and forget. These all have their uses depending on the specific needs of an application.
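The semantics of those three ack levels can be sketched with futures. This is an illustrative simulation only; the enum and the `appendWithAck` name are invented here, not Rama's API:

```java
import java.util.concurrent.CompletableFuture;

// Simulation of the three depot-append ack levels described above.
// NONE       -> fire and forget
// APPEND_ACK -> completes once the append is replicated on the depot
// FULL_ACK   -> completes once colocated streaming topologies have also
//               finished processing (i.e. the PStates are updated)
public class AckLevels {
    enum AckLevel { NONE, APPEND_ACK, FULL_ACK }

    static CompletableFuture<Void> replicated = new CompletableFuture<>();
    static CompletableFuture<Void> topologiesDone = new CompletableFuture<>();

    static CompletableFuture<Void> appendWithAck(AckLevel level) {
        switch (level) {
            case NONE:       return CompletableFuture.completedFuture(null);
            case APPEND_ACK: return replicated;
            default:         return replicated.thenCompose(v -> topologiesDone);
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Void> full = appendWithAck(AckLevel.FULL_ACK);
        replicated.complete(null);
        // Replication alone is not enough for a full ack:
        System.out.println("done after replication? " + full.isDone());   // false
        topologiesDone.complete(null);
        System.out.println("done after topologies?  " + full.isDone());   // true
    }
}
```

So with "full ack", a completed append is exactly the signal that all associated PStates reflect the new record, which is the coordination guarantee discussed above.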
Thanks! So we can see these ACKs as "wait and synchronize" signals, I suppose? But how can we ensure an "all or nothing" between all parties trying to ACK conditions they're mutually dependent on? I.e., transactionality or atomicity?
Systems that promise "free linear scaling" without qualifiers are either withholding their bottlenecks or have not analyzed/realized them yet. Say there is eventual consistency: maybe the "eventuality" becomes so long that the service fails at its purpose. Or the bandwidth of the communication link between key business-logic (mutation-event-generating) services gets exhausted, and so on.
The only systems that scale linearly are stateless systems. Mastodon is not stateless. And even stateless systems hit some bottlenecks eventually, as they exist and run in a scale-variant Universe.
So this claim by itself doesn't immediately impress me; it just turns my red lights on, awaiting further investigation. But we can of course discuss why this claim is made and how it is supported. The article is long, so I've not had the chance to read it entirely yet.
But we have X number of event streams mapped through Y number of ETLs to produce Z number of read-model indices, in a shape that seems to form a highly interlinked DAG, which eventually loops back on itself in terms of message flow. Just the increased cross-chatter as we introduce more features suggests non-linear scaling.
(For example, it can scale the way persistent data structures scale, which is to say "O(1) within target operational bounds" despite technically being log n with a high branch factor.)
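The "log n with a high branch factor behaves like O(1)" point is easy to check with arithmetic. With branching factor 32, as used by typical persistent vectors and hash array mapped tries, the tree depth stays tiny even for very large n (a small sketch; `depth` is just a helper defined here):

```java
// Smallest tree depth d such that branch^d >= n, i.e. ceil(log_branch(n)).
public class TreeDepth {
    static int depth(long n, int branch) {
        int d = 0;
        long capacity = 1;
        while (capacity < n) {
            capacity *= branch;
            d++;
        }
        return d;
    }

    public static void main(String[] args) {
        System.out.println(depth(1_000L, 32));         // 2 levels for a thousand entries
        System.out.println(depth(1_000_000_000L, 32)); // 6 levels for a billion entries
    }
}
```

Going from a thousand entries to a billion only triples the depth, which is why such structures read as effectively constant-time within any realistic operational range.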
What one finds useful from a web application and what the web application actually is are usually two entirely different things.
I work in marketing automation, and I guess I have in one way or another my entire career. The clients who need to use the platform to communicate with their own clients over social networking may never touch our print delivery system, but that doesn't mean that print delivery doesn't exist or isn't important.
If you are unwilling to recreate the totality of the application in terms of functionality, then you are lying if you say that you have recreated it.