>I suspect that the cost of running AT proto servers/relays is prohibitive for smaller players compared to a Mastadon server selectively syndicating with a few peers, but I say this with only a vague understanding of the internals of both of these ecosystems.
This isn't quite right. ATProto has a completely different "shape" so it's hard to make apples-to-apples comparison.
Roughly speaking, you can think of Mastodon as a bunch of little independently hosted copies of Twitter that "email" (loosely speaking) each other to propagate information that isn't on your server. So it's cheap to run a server for a bunch of friends but it's cut off from what's happening in the world. Your identity is tied to your server (that's your webapp), and when you want to follow someone on another server, your server essentially asks that other server to send stuff to yours. This means that by default your view of the network is extremely fragmented — replies, threads, like counts are all desynchronized and partial[1] depending on which server you're looking from and which information is being forwarded to it.
ATProto, on the other hand, is designed with a goal of actually being competitive with centralized services. This means that it's partitioned differently – it's not "many Twitters talking to each other" which is Mastodon's model. Instead, in ATProto, there is a separation of concerns: you have swappable hosting (your hosting is the source of truth for your data like posts, likes, follows, etc) and you have applications (which aggregate data from the former). This might remind you of traditional web: it's like every social media user posts JSON to "their own website" (i.e. hosting) while apps aggregate all that data, similar to how Google Reader might aggregate RSS. As a result, in ATProto, the default behavior is that everyone operates with a shared view of the world — you always see all replies, all comments, all likes are counted, etc. It's not partial by default.
With this difference in mind, "decentralizing" ATProto is sort of multidimensional. In Mastodon, the only primitive is an "instance" — i.e. an entire Twitter-like webapp you can host for your users. But in ATProto, there are multiple decentralized primitives:
- PDS (personal data hosting) is application-agnostic data store. Bluesky's implementation is open source (it uses sqlite database per user). There are also alternative implementations for the same protocol. Bluesky the company does operate the largest ones. However, running a PDS for yourself is extremely cheap (like maybe $1/mo?). It's basically just a structured KV JSON storage organized as a Merkle tree. A bit like Git hosting.
- AppViews are actual "application backends". Bluesky operates the bsky.app appview, i.e. what people know as the Bluesky app. Importantly, in ATProto, there is no reason for everyone to run their own AppView. You can run one (and it costs about $300/mo to run a Bluesky AppView ingesting all data currently on the network in real time if you want to do that). Of course, if you were happy with tradeoffs chosen by Mastodon (partial view of the network, you only see what your servers' users follow), you could run that for a lot cheaper — so that's why I'm saying it's not apples-to-apples. ATProto makes it easy to have an actually cohesive experience on the network but the costs are usually being compared with fragmented experience of Mastodon. ATProto can scale down to Mastodon-like UX (with Mastodon-like costs) but it's just not very appealing when you can have the real thing.
- Relays are things "in between" PDS's and AppViews. Essentially a Relay is just an optimization to avoid many-to-many connections between AppViews and PDS's. A Relay just rebroadcasts updates from all PDS's as a single stream (that AppViews can subscribe to). Running a Relay used to be expensive but it got a lot cheaper since "Sync 1.1" (when a change in protocol allowed Relays to be non-archiving). Now it costs about $30/mo to run a Relay.
So all in all, running PDSs and Relays is cheap. Running full AppViews is more expensive but there's simply no equivalent to that in the Mastodon world because Mastodon is always fragmented[1]. And running a partial AppView (comparable to Mastodon behavior) should be much, much cheaper — but also not very appealing so I don't know anyone who's actually doing that. (It would also require adding a bit of code to filter out the stuff you don't care about.)
[1] Mastodon is adding a workaround for this with on-demand fetching, see https://news.ycombinator.com/item?id=45078133 for my questions about that; in any case, this is limited by what you can do on-demand in a pull-based decentralized system.
> You can run one (and it costs about $300/mo to run a Bluesky AppView ingesting all data currently on the network in real time if you want to do that).
A clarifying question: the blog post [0] I found about zeppelin.social which I think is a full AppView, the author said this:
"The cost to run this is about US $200/mo, primarily due to the 16 terabytes of storage it currrently uses"
Last I heard the amount of storage was just a couple of terabytes so the growth seems to be very fast.
If and when the primary cost is the storage, IMO the crucial question is: what's the expected future cost of running community AppViews?
Because unless storage cost drops as fast as the BlueSky data grows (unlikely?), to me this architecture looks like it will very soon kick out smaller players and leave only BlueSky with enough money to keep the AppView running.
I can’t speak to how fast it grows and what it was, but I mean — if what you want is to keep the entire data of the network (similar to having all tweets on Twitter) ready to be queried then you have to store them. That’s just unavoidable in any technological solution. Alternatively you could hydrate and query posts on-demand from their sources (PDS), and people have done that as an experiment, but you need at least some aggregation to happen somewhere (for reconstructing reply lists or like lists etc). If more collectively-run network caches are available, this becomes more feasible without storing everything yourself.
In any case, if you’re okay with a partial snapshot of the network (eg all posts during some window or even more partial) then you can arbitrarily narrow that down. In Mastodon, having a “full” archive is downright impossible which is why we’re not talking about the same with regards to Mastodon. Whereas ATProto makes it possible, with the cost being the floor of what you’d expect the cost for storing data to be. How could it be better?
> if what you want is to keep the entire data of the network (similar to having all tweets on Twitter) ready to be queried then you have to store them.
They need to be stored, but do they technically have to be stored by just one AppView? I get that it's a 100x easier to implement it like that, but I don't think a distributed search would've been technically impossible (although, granted, necessarily it would have had worse UX).
Choosing this feature and then implementing it like they did was a technical choice. Technical choices have consequences and this, I think, was the one which will prevent BlueSky from reaching any meaningful decentralization.
And saying "you can create an inferior UX with affordable costs" is not a real answer. Any meaningful decentralization IMO can only happen if it's affordable to create feature identical nodes. That can only happen if you refuse to implement features in ways that need centralization to scale.
I’m not sure what choice you’re referring to here. Yes, simply loading data from the database is the most straightforward solution, and that’s what Bluesky itself did for its AppView. That’s kind of the default model in general in web development — and has nothing to do with decentralization. If you were running a centralized Twitter, storing the amount that Twitter stores would cost you exactly the same.
On the contrary, ATProto adds flexibility here. There are community-run projects like https://constellation.microcosm.blue/ that let small application builders avoid that burden. Of course you don’t want to overwhelm those by building a massive app on top. But the point is that ATProto starts with equivalent baseline to what you’d pay running a centralized service, and then gives you room to play with distribution of costs, potentially going all the way down to directly querying PDS’s on-demand or something in between like community-maintained caches or even potential third-party app-agnostic aggregation services. Eg you could imagine AWS, Vercel or Cloudflare building “app platforms” in five years that let you cheaply query shared data.
As for creating “identical” nodes, I think you hit the nail on the head — that’s not what ATProto aims to do. The insight is that it’s not useful or feasible for everyone to run their own copy of Twitter. But that it’s possible for everyone with “proportional interest” to run a “proportionally complete” part, with some of the costs being amortizable and poolable across many users and apps (thanks to shared infrastructure) and always individually replaceable (to avoid lock-in). This is strictly better than centralized.
> [1] Mastodon is adding a workaround for this with on-demand fetching, see https://news.ycombinator.com/item?id=45078133 for my questions about that; in any case, this is limited by what you can do on-demand in a pull-based decentralized system.
I'm not super up-to-date on Mastodon's/ActivityPub's workings, but aren't replies to a post pushed to the original poster's server? So wouldn't followers then be able to pull from that server at any time to get an always-up-to-date view of replies, at least theoretically? (With maybe posts from the last few seconds missing if the network's slow.)
(Asking because I've seen you claim that the architecture is inherently limited to never be able to achieve the "cohesive" experience.)
This only works for direct reply chains, right? It doesn’t provide a realtime view into all existing conversations that are happening on the post.
Imagine if, when you refreshed this HN page, only comment chains you’re already in would refresh timely. Yes, this would “work” to some extent, but it would clearly be a regression.
Additionally, going viral can overload your server due to this architecture. In ATProto this never happpens for self-fosters (of PDS) because the cost is amortized by AppView. (Same as in centralized products where the cost is on the backend.)
Ah gotcha, yes, indirect replies would need to be forwarded or something.
(To be honest, I'm already surprised that Mastodon scaled as far as it did. I will say, if I had seen the state of the web's architecture 20 years ago today, I probably also would have claimed that it was inherently insecure and that there was no way to get it to be secure enough to scale to billions of users, so... I don't know, maybe people will keep finding duct tape solutions to make it work, worse-is-better-style.)
>in what way can smaller players federate with ATProto or how the structure allows federation.
Each of the pieces I've described (PDS, Relay, AppView) implement the protocol specified at https://atproto.com/. Anything that acts as an ATProto PDS can be used as an ATProto PDS, anything that acts as an ATProto Relay can be used as an ATProto Relay, and so on. I'm not sure I understand the question so pardon the tautology.
The structure allows federation by design — a Relay will index any PDS that asks to be indexed; an AppView can choose the Relay it wants to get the data from (or skip a Relay completely and index PDS's directly); anyone can make their own AppView for an existing or a new app. That's how there are multiple AppViews (both for Bluesky app and for other ATProto apps) ingesting data via multiple Relays from many PDS's. There aren't many independent operators of each piece (especially outside of PDS self-hosting) but nothing is privileging Bluesky's infra.
Additionally, Bluesky's reference implementations of each piece are open source. So people run them the same way you would usually run software -- by putting it on a computer and exposing it to the internet. To run a custom PDS, you can either use the Docker container provided by Bluesky (https://github.com/bluesky-social/pds) or implement your own (e.g. https://github.com/blacksky-algorithms/rsky). Ditto for other pieces.
>Reading through it, it just sounds like sharding/scaling for a centralized service that's meant to be owned and provided by a single entity.
You're right in that the goal is to make it on par with centralized services in terms of UX and performance/scaling. However, it is decentralized.
I think what they are asking is: if I run a my own BlueSky AppView, how do I integrate with the bluesky.app so that users signed in on my AppView can interact with users on the the main AppView and vice versa? This is how most of us think about federalisation.
That's already the default behavior. You don't need to do anything special for that to work, it's what ATProto is designed for. Everything is always operating in a shared space by design.
When people "post", their posts go to their PDS's, which means that every AppView ingests data generated by every other AppView by default. There is no way to tell who's using which AppView — in fact, you can log into any AppView and your profile will be there with all your posts.
An appview is really just a service for hydrating content in the skeleton of posts that feeds return back to give to the user.
The only things they do other than feed hydration are track notifications, (optionally) provide a search engine, (optionally) provide a CDN, and (temporarily until E2EE rolls out) handle DMs.
So you can actually do things like the Red Dwarf [1] project which is a bluesky client without an appview. It's slower, you visibly notice request loading/pop-in, there's no notifications, and no search but it works with any other bluesky appview (since appviews are basically a lens into atproto rather than an independent service).
--------
If you wanted to run your own infrastructure, instead you'd probably want to run your own PDS. Running an appview has its benefits of course but the main way you "self host" is to run a PDS. That's fairly trivial and people have run them on all kinds of constrained hardware (including a literal jailbroken microwave if I remember correctly).
The AppView doesn't do that only for Bluesky data. It does it for any Personal Data Stores (user accounts with all their user data) that it knows about.
When you "interact" with users elsewhere, all you do is generate new records on your own PDS. You generate a "like" entry, or a reply, on your own PDS. It's your pds, all your stuff goes there. The AppView sees that and indexes it, attaches that like or that reply in the AppView to the post you're reacting to.
Other people can run AppViews too (that operate over the same or different data). There are just less people doing that than hosting Mastodon instances partly because it's much more expensive to, because some of the benefits of hosting a Mastodon instance can be obtained much cheaper through running a PDS server, and because AppViews doesn't serve the same exact social role that Mastodon instances do. (Mastodon has every instance be a semi-isolated community, so Mastodon instances are often made for the social purpose of running a semi-isolated community. Bluesky users expect a global timeline that's not partitioned by server instances, so it doesn't get many people running AppViews specifically for fostering semi-isolated communities. People on Bluesky who want to foster semi-isolated communities tend to use features like custom timelines to do so instead.)
When you write a post, you save it to your PDS. Think of it like writing a blog. You're done, you hit submit, it shows up on a server somewhere. You can run your own server with your own data, or use someone else's. That's exactly how a PDS works; it is a storage server for your data.
The AppView is a way to index all the PDSes registered across the whole network. If your server is crawlable by the AppView, all your data shows up in the app. This is like if your blog is crawlable by Google, you show up in search results.
When you like a post, you commit a "like" record to your personal server. When the AppView displays likes, it looks at every indexed PDS and shows every like it can find for that post (simplifying a little for clarity). Each one of those likes might live on different servers, some of which is self-hosted.
Because you can run your own PDS, you can commit any data you want to it. You can even commit things that services may find distasteful. However, the AppView may refuse to serve this content to users; this is how content can be removed from the network and how users can be banned. The federation equivalent would be defederation, except it happens to singular accounts rather than entire instances.
If you disagree with the moderation policies run by Bluesky the company, that's when you can look into running an alternative AppView. This is similar to disagreeing with the admin of a particular Mastodon instance and moving to a different instance. Of course, as mentioned running an AppView is much more expensive, but that hasn't stopped folks from trying (I believe Blacksky is trying to run their own AppView that is fully independent of Bluesky).
To use an alternate AppView, you'd simply go to a different website. This website will index PDSes the same way that Bluesky does, but it may index them in a different way and include/exclude different content. The data is still there (nobody can reach into your PDS on your server and delete your data), but the AppView admins choose which content they wish to serve to people using their community, just as Mastodon admins choose who to federate with.
In this sense, it is indeed truly federated. The primitives are simply different; it's more granular than Mastodon.
You can write your own content to your own server and let it get indexed to any number of AppViews; you completely control your personal data and nobody can reach in and delete that data randomly as they don't own it - you do (at the cost of ~$1/month or a Raspberry Pi).
When you use the Bluesky service, you are seeing their view of the network and what they choose to index. You may disagree with this view, just as you may disagree with the admins of Mastodon.social etc. In that case, you can choose to use another AppView (such as deer.social or Blacksky) that adopts different policies. Since account information isn't stored on the AppView and it simply handles indexing and moderation, moving between AppViews is painless and no data needs to be transferred from one server to another - you simply use a different bookmark.
It could be that you disagree with all the current AppView admins. You can host your own, it's just expensive ($300/month, as mentioned). You can also tailor your AppView to index less content, which will of course limit the amount of data you consume and give you a partial view of the network, effectively defederating you from anything you do not wish to index.
This isn't quite right. ATProto has a completely different "shape" so it's hard to make apples-to-apples comparison.
Roughly speaking, you can think of Mastodon as a bunch of little independently hosted copies of Twitter that "email" (loosely speaking) each other to propagate information that isn't on your server. So it's cheap to run a server for a bunch of friends but it's cut off from what's happening in the world. Your identity is tied to your server (that's your webapp), and when you want to follow someone on another server, your server essentially asks that other server to send stuff to yours. This means that by default your view of the network is extremely fragmented — replies, threads, like counts are all desynchronized and partial[1] depending on which server you're looking from and which information is being forwarded to it.
ATProto, on the other hand, is designed with a goal of actually being competitive with centralized services. This means that it's partitioned differently – it's not "many Twitters talking to each other" which is Mastodon's model. Instead, in ATProto, there is a separation of concerns: you have swappable hosting (your hosting is the source of truth for your data like posts, likes, follows, etc) and you have applications (which aggregate data from the former). This might remind you of traditional web: it's like every social media user posts JSON to "their own website" (i.e. hosting) while apps aggregate all that data, similar to how Google Reader might aggregate RSS. As a result, in ATProto, the default behavior is that everyone operates with a shared view of the world — you always see all replies, all comments, all likes are counted, etc. It's not partial by default.
With this difference in mind, "decentralizing" ATProto is sort of multidimensional. In Mastodon, the only primitive is an "instance" — i.e. an entire Twitter-like webapp you can host for your users. But in ATProto, there are multiple decentralized primitives:
- PDS (personal data hosting) is application-agnostic data store. Bluesky's implementation is open source (it uses sqlite database per user). There are also alternative implementations for the same protocol. Bluesky the company does operate the largest ones. However, running a PDS for yourself is extremely cheap (like maybe $1/mo?). It's basically just a structured KV JSON storage organized as a Merkle tree. A bit like Git hosting.
- AppViews are actual "application backends". Bluesky operates the bsky.app appview, i.e. what people know as the Bluesky app. Importantly, in ATProto, there is no reason for everyone to run their own AppView. You can run one (and it costs about $300/mo to run a Bluesky AppView ingesting all data currently on the network in real time if you want to do that). Of course, if you were happy with tradeoffs chosen by Mastodon (partial view of the network, you only see what your servers' users follow), you could run that for a lot cheaper — so that's why I'm saying it's not apples-to-apples. ATProto makes it easy to have an actually cohesive experience on the network but the costs are usually being compared with fragmented experience of Mastodon. ATProto can scale down to Mastodon-like UX (with Mastodon-like costs) but it's just not very appealing when you can have the real thing.
- Relays are things "in between" PDS's and AppViews. Essentially a Relay is just an optimization to avoid many-to-many connections between AppViews and PDS's. A Relay just rebroadcasts updates from all PDS's as a single stream (that AppViews can subscribe to). Running a Relay used to be expensive but it got a lot cheaper since "Sync 1.1" (when a change in protocol allowed Relays to be non-archiving). Now it costs about $30/mo to run a Relay.
So all in all, running PDSs and Relays is cheap. Running full AppViews is more expensive but there's simply no equivalent to that in the Mastodon world because Mastodon is always fragmented[1]. And running a partial AppView (comparable to Mastodon behavior) should be much, much cheaper — but also not very appealing so I don't know anyone who's actually doing that. (It would also require adding a bit of code to filter out the stuff you don't care about.)
[1] Mastodon is adding a workaround for this with on-demand fetching, see https://news.ycombinator.com/item?id=45078133 for my questions about that; in any case, this is limited by what you can do on-demand in a pull-based decentralized system.