*>I suspect that the cost of running AT proto servers/relays is prohibitive for ...

ttiurani · 2025-08-31T09:51:08 1756633868

> You can run one (and it costs about $300/mo to run a Bluesky AppView ingesting all data currently on the network in real time if you want to do that).

A clarifying question: the blog post [0] I found about zeppelin.social which I think is a full AppView, the author said this:

"The cost to run this is about US $200/mo, primarily due to the 16 terabytes of storage it currrently uses"

Last I heard the amount of storage was just a couple of terabytes so the growth seems to be very fast.

If and when the primary cost is the storage, IMO the crucial question is: what's the expected future cost of running community AppViews?

Because unless storage cost drops as fast as the BlueSky data grows (unlikely?), to me this architecture looks like it will very soon kick out smaller players and leave only BlueSky with enough money to keep the AppView running.

[0] https://whtwnd.com/futur.blue/3ls7sbvpsqc2w

danabramov · 2025-08-31T09:59:44 1756634384

I can’t speak to how fast it grows and what it was, but I mean — if what you want is to keep the entire data of the network (similar to having all tweets on Twitter) ready to be queried then you have to store them. That’s just unavoidable in any technological solution. Alternatively you could hydrate and query posts on-demand from their sources (PDS), and people have done that as an experiment, but you need at least some aggregation to happen somewhere (for reconstructing reply lists or like lists etc). If more collectively-run network caches are available, this becomes more feasible without storing everything yourself.

In any case, if you’re okay with a partial snapshot of the network (eg all posts during some window or even more partial) then you can arbitrarily narrow that down. In Mastodon, having a “full” archive is downright impossible which is why we’re not talking about the same with regards to Mastodon. Whereas ATProto makes it possible, with the cost being the floor of what you’d expect the cost for storing data to be. How could it be better?

ttiurani · 2025-08-31T10:14:35 1756635275

> if what you want is to keep the entire data of the network (similar to having all tweets on Twitter) ready to be queried then you have to store them.

They need to be stored, but do they technically have to be stored by just one AppView? I get that it's a 100x easier to implement it like that, but I don't think a distributed search would've been technically impossible (although, granted, necessarily it would have had worse UX).

Choosing this feature and then implementing it like they did was a technical choice. Technical choices have consequences and this, I think, was the one which will prevent BlueSky from reaching any meaningful decentralization.

And saying "you can create an inferior UX with affordable costs" is not a real answer. Any meaningful decentralization IMO can only happen if it's affordable to create feature identical nodes. That can only happen if you refuse to implement features in ways that need centralization to scale.

danabramov · 2025-08-31T14:26:02 1756650362

I’m not sure what choice you’re referring to here. Yes, simply loading data from the database is the most straightforward solution, and that’s what Bluesky itself did for its AppView. That’s kind of the default model in general in web development — and has nothing to do with decentralization. If you were running a centralized Twitter, storing the amount that Twitter stores would cost you exactly the same.

On the contrary, ATProto adds flexibility here. There are community-run projects like https://constellation.microcosm.blue/ that let small application builders avoid that burden. Of course you don’t want to overwhelm those by building a massive app on top. But the point is that ATProto starts with equivalent baseline to what you’d pay running a centralized service, and then gives you room to play with distribution of costs, potentially going all the way down to directly querying PDS’s on-demand or something in between like community-maintained caches or even potential third-party app-agnostic aggregation services. Eg you could imagine AWS, Vercel or Cloudflare building “app platforms” in five years that let you cheaply query shared data.

As for creating “identical” nodes, I think you hit the nail on the head — that’s not what ATProto aims to do. The insight is that it’s not useful or feasible for everyone to run their own copy of Twitter. But that it’s possible for everyone with “proportional interest” to run a “proportionally complete” part, with some of the costs being amortizable and poolable across many users and apps (thanks to shared infrastructure) and always individually replaceable (to avoid lock-in). This is strictly better than centralized.

Vinnl · 2025-08-31T09:17:52 1756631872

> [1] Mastodon is adding a workaround for this with on-demand fetching, see https://news.ycombinator.com/item?id=45078133 for my questions about that; in any case, this is limited by what you can do on-demand in a pull-based decentralized system.

I'm not super up-to-date on Mastodon's/ActivityPub's workings, but aren't replies to a post pushed to the original poster's server? So wouldn't followers then be able to pull from that server at any time to get an always-up-to-date view of replies, at least theoretically? (With maybe posts from the last few seconds missing if the network's slow.)

(Asking because I've seen you claim that the architecture is inherently limited to never be able to achieve the "cohesive" experience.)

danabramov · 2025-08-31T10:02:22 1756634542

This only works for direct reply chains, right? It doesn’t provide a realtime view into all existing conversations that are happening on the post.

Imagine if, when you refreshed this HN page, only comment chains you’re already in would refresh timely. Yes, this would “work” to some extent, but it would clearly be a regression.

Additionally, going viral can overload your server due to this architecture. In ATProto this never happpens for self-fosters (of PDS) because the cost is amortized by AppView. (Same as in centralized products where the cost is on the backend.)

Vinnl · 2025-08-31T14:08:56 1756649336

Ah gotcha, yes, indirect replies would need to be forwarded or something.

(To be honest, I'm already surprised that Mastodon scaled as far as it did. I will say, if I had seen the state of the web's architecture 20 years ago today, I probably also would have claimed that it was inherently insecure and that there was no way to get it to be secure enough to scale to billions of users, so... I don't know, maybe people will keep finding duct tape solutions to make it work, worse-is-better-style.)

izacus · 2025-08-30T21:57:58 1756591078

That's a big post that doesn't really explain in what way can smaller players federate with ATProto or how the structure allows federation.

Reading through it, it just sounds like sharding/scaling for a centralized service that's meant to be owned and provided by a single entity.

danabramov · 2025-08-30T22:54:55 1756594495

>in what way can smaller players federate with ATProto or how the structure allows federation.

Each of the pieces I've described (PDS, Relay, AppView) implement the protocol specified at https://atproto.com/. Anything that acts as an ATProto PDS can be used as an ATProto PDS, anything that acts as an ATProto Relay can be used as an ATProto Relay, and so on. I'm not sure I understand the question so pardon the tautology.

The structure allows federation by design — a Relay will index any PDS that asks to be indexed; an AppView can choose the Relay it wants to get the data from (or skip a Relay completely and index PDS's directly); anyone can make their own AppView for an existing or a new app. That's how there are multiple AppViews (both for Bluesky app and for other ATProto apps) ingesting data via multiple Relays from many PDS's. There aren't many independent operators of each piece (especially outside of PDS self-hosting) but nothing is privileging Bluesky's infra.

Additionally, Bluesky's reference implementations of each piece are open source. So people run them the same way you would usually run software -- by putting it on a computer and exposing it to the internet. To run a custom PDS, you can either use the Docker container provided by Bluesky (https://github.com/bluesky-social/pds) or implement your own (e.g. https://github.com/blacksky-algorithms/rsky). Ditto for other pieces.

>Reading through it, it just sounds like sharding/scaling for a centralized service that's meant to be owned and provided by a single entity.

You're right in that the goal is to make it on par with centralized services in terms of UX and performance/scaling. However, it is decentralized.

The picture at the end of this article might help: https://atproto.com/articles/atproto-for-distsys-engineers

HexDecOctBin · 2025-08-30T23:50:19 1756597819

I think what they are asking is: if I run a my own BlueSky AppView, how do I integrate with the bluesky.app so that users signed in on my AppView can interact with users on the the main AppView and vice versa? This is how most of us think about federalisation.

danabramov · 2025-08-30T23:57:42 1756598262

That's already the default behavior. You don't need to do anything special for that to work, it's what ATProto is designed for. Everything is always operating in a shared space by design.

When people "post", their posts go to their PDS's, which means that every AppView ingests data generated by every other AppView by default. There is no way to tell who's using which AppView — in fact, you can log into any AppView and your profile will be there with all your posts.

OneDeuxTriSeiGo · 2025-08-31T09:45:46 1756633546

An appview is really just a service for hydrating content in the skeleton of posts that feeds return back to give to the user.

The only things they do other than feed hydration are track notifications, (optionally) provide a search engine, (optionally) provide a CDN, and (temporarily until E2EE rolls out) handle DMs.

So you can actually do things like the Red Dwarf [1] project which is a bluesky client without an appview. It's slower, you visibly notice request loading/pop-in, there's no notifications, and no search but it works with any other bluesky appview (since appviews are basically a lens into atproto rather than an independent service).

--------

If you wanted to run your own infrastructure, instead you'd probably want to run your own PDS. Running an appview has its benefits of course but the main way you "self host" is to run a PDS. That's fairly trivial and people have run them on all kinds of constrained hardware (including a literal jailbroken microwave if I remember correctly).

1. https://tangled.sh/@whey.party/red-dwarf

jauntywundrkind · 2025-08-31T02:01:17 1756605677

AppViews do things like track likes, or replies.

The AppView doesn't do that only for Bluesky data. It does it for any Personal Data Stores (user accounts with all their user data) that it knows about.

When you "interact" with users elsewhere, all you do is generate new records on your own PDS. You generate a "like" entry, or a reply, on your own PDS. It's your pds, all your stuff goes there. The AppView sees that and indexes it, attaches that like or that reply in the AppView to the post you're reacting to.

AgentME · 2025-08-30T23:02:29 1756594949

Other people can run AppViews too (that operate over the same or different data). There are just less people doing that than hosting Mastodon instances partly because it's much more expensive to, because some of the benefits of hosting a Mastodon instance can be obtained much cheaper through running a PDS server, and because AppViews doesn't serve the same exact social role that Mastodon instances do. (Mastodon has every instance be a semi-isolated community, so Mastodon instances are often made for the social purpose of running a semi-isolated community. Bluesky users expect a global timeline that's not partitioned by server instances, so it doesn't get many people running AppViews specifically for fostering semi-isolated communities. People on Bluesky who want to foster semi-isolated communities tend to use features like custom timelines to do so instead.)

EnglishMobster · 2025-08-31T04:57:49 1756616269

I think it can be explained a little better.

When you write a post, you save it to your PDS. Think of it like writing a blog. You're done, you hit submit, it shows up on a server somewhere. You can run your own server with your own data, or use someone else's. That's exactly how a PDS works; it is a storage server for your data.

The AppView is a way to index all the PDSes registered across the whole network. If your server is crawlable by the AppView, all your data shows up in the app. This is like if your blog is crawlable by Google, you show up in search results.

When you like a post, you commit a "like" record to your personal server. When the AppView displays likes, it looks at every indexed PDS and shows every like it can find for that post (simplifying a little for clarity). Each one of those likes might live on different servers, some of which is self-hosted.

Because you can run your own PDS, you can commit any data you want to it. You can even commit things that services may find distasteful. However, the AppView may refuse to serve this content to users; this is how content can be removed from the network and how users can be banned. The federation equivalent would be defederation, except it happens to singular accounts rather than entire instances.

If you disagree with the moderation policies run by Bluesky the company, that's when you can look into running an alternative AppView. This is similar to disagreeing with the admin of a particular Mastodon instance and moving to a different instance. Of course, as mentioned running an AppView is much more expensive, but that hasn't stopped folks from trying (I believe Blacksky is trying to run their own AppView that is fully independent of Bluesky).

To use an alternate AppView, you'd simply go to a different website. This website will index PDSes the same way that Bluesky does, but it may index them in a different way and include/exclude different content. The data is still there (nobody can reach into your PDS on your server and delete your data), but the AppView admins choose which content they wish to serve to people using their community, just as Mastodon admins choose who to federate with.

In this sense, it is indeed truly federated. The primitives are simply different; it's more granular than Mastodon.

You can write your own content to your own server and let it get indexed to any number of AppViews; you completely control your personal data and nobody can reach in and delete that data randomly as they don't own it - you do (at the cost of ~$1/month or a Raspberry Pi).

When you use the Bluesky service, you are seeing their view of the network and what they choose to index. You may disagree with this view, just as you may disagree with the admins of Mastodon.social etc. In that case, you can choose to use another AppView (such as deer.social or Blacksky) that adopts different policies. Since account information isn't stored on the AppView and it simply handles indexing and moderation, moving between AppViews is painless and no data needs to be transferred from one server to another - you simply use a different bookmark.

It could be that you disagree with all the current AppView admins. You can host your own, it's just expensive ($300/month, as mentioned). You can also tailor your AppView to index less content, which will of course limit the amount of data you consume and give you a partial view of the network, effectively defederating you from anything you do not wish to index.

But there's nothing stopping you from doing so!