
Having done a decent bit of hacking around ActivityPub, when I read the (thin, which 'pfraze and others have copped to) documentation I immediately went "oh, this is going to be way more scalable than ActivityPub once it's done."

It's not all roses. I'm not sold on lexicons and xrpc, but that's probably because I am up to my eyeballs in JSON Schema and OpenAPI on a daily basis, and my experience and my existing toolkit probably bias me. I think starting with generally-accepted tooling probably would've been a better idea--but, in a vacuum, they're reasonably thought-out, they do address real problems, and I can't shake the feeling that the fine article is spitting mad for the sake of being spitting mad.

While federation isn't there yet, granted, the idea that you can't write code against this is hogwash. There's a crazily thriving ecosystem already from the jump: ~1K folks in the development Discord and a bunch of tooling being added on top of the platform by independent developers right now.

Calm down. Nobody's taking ActivityPub away from people who like elephants.




If they'd proposed (or even just implemented) improvements that at least suggested they'd considered existing options and wanted to try to maximise the ability to do interop (even with proxies), I'd have been more sympathetic. But AT to me seems to be a big ball of Not Invented Here, which makes me worry that either they didn't care to try, or that they chose to make interop worse for a non-technical reason.


During this stage of discovery I'm completely comfortable with ground up rethinks.

I don't feel we have the correct solution and there is no commercial reason to get this thing shipped. Now is the time to explore all the possibilities.

Once we have explored the problem space we should graft the best bits together for a final solution, if needed.

I'm not sure I see the value of standardizing on a single protocol. Multiple protocols can access the same data store. Adopting one protocol doesn't preclude other protocols. I believe developers should adopt all the protocols.


Ground up rethinks that take into account whether or not there's an actual reason to make a change are good. Ground up rethinks that throw things away for the sake of throwing them away, even when what they end up doing would layer cleanly, are not. They're at best lazy, at worst intentional attempts at diluting effort. I'm hoping they've only been lazy.


I'm not disagreeing. To say that there is only one way or to project presumed goals and intentions is too far for me.

I firmly believe that protocols are developed through vigorous rewrites and aren't nearly as important as the data-stores they provide access to. I would like our data-stores to stay stable, and to develop protocols against them as required. Figuring out a method to deal with whatever protocol the hosted data-store has chosen seems correct to me. I just don't see mutual exclusivity. Consider the power of supporting both protocols.


> "oh, this is going to be way more scalable than ActivityPub once it's done."

Can you elaborate on this?


I think this is referring to the content-hashed user posts. Using this model one can pull content from _anywhere_ without having to worry about MITM forgeries etc. This opens up the structure of the network, basically decentralizing it even _more_.

Correct me if I'm wrong on this though.
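
A hand-wavy sketch of what I mean (the shapes and field names here are made up, and real AT content IDs aren't a bare hex sha256, but the principle is the same): any mirror will do, because the bytes either match the hash or they don't.

    import { createHash } from "node:crypto";

    // Hypothetical shape: a reference to a post that carries the expected
    // content hash plus any servers claiming to hold a copy.
    interface PostRef {
      sha256Hex: string;
      mirrors: string[];
    }

    // Fetch from whichever mirror returns bytes matching the hash. Because
    // the data self-validates, the mirror does not need to be trusted.
    async function fetchVerified(ref: PostRef): Promise<Uint8Array> {
      for (const url of ref.mirrors) {
        try {
          const res = await fetch(url);
          if (!res.ok) continue;
          const bytes = new Uint8Array(await res.arrayBuffer());
          const digest = createHash("sha256").update(bytes).digest("hex");
          if (digest === ref.sha256Hex) return bytes; // verified, origin irrelevant
        } catch {
          // unreachable mirror, try the next one
        }
      }
      throw new Error("no mirror returned content matching the expected hash");
    }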


ActivityStreams just requires an object to have a unique URI. ActivityPub says it "should" be a https URI. However, since this URI is expected to be both unique and unchanging (if you put up the same content with a different id, it's a different object), you can choose to use it as the input to a hash function and put the posts in a content-addressable store.
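
As an illustration (my own sketch, nothing the spec defines), a content-addressable store keyed off the ActivityStreams id could be as simple as:

    import { createHash } from "node:crypto";

    // Sketch only: key a local object store by a digest of the ActivityStreams
    // id, relying on the id being unique and unchanging.
    const store = new Map<string, unknown>();

    const keyFor = (objectId: string): string =>
      createHash("sha256").update(objectId).digest("hex");

    function put(object: { id: string }): void {
      store.set(keyFor(object.id), object);
    }

    function get(objectId: string): unknown {
      return store.get(keyFor(objectId));
    }

    put({ id: "https://example.social/users/alice/statuses/1" });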

Mastodon will already check the local server first if you paste the URL of a post from another server in the Mastodon search bar, so it's already sort-of doing this, but only with the local server.

So you can already do that with ActivityPub. If it becomes a need, people will consider it. There's already been more than one discussion about variations on this.

(EDIT: More than an improvement to scaling, though, this would help ensure there's a mechanism for posts to still be reachable by looking up their original URI after a user moves off a - possibly closing - server.)

The Fediverse also uses signatures, though they're not always passed on - fixing that (ensuring the JSON-LD signature is always carried with the post) would be nice.


one important difference is:

> expected

Because the URI is only _expected_ to be immutable, not required, servers consuming these objects need to consider the case where the expectation is broken.

For example, imagine the serving host has a bug and returns the wrong content for a URI. At scale this is guaranteed to happen. Because it can happen, downstream servers need to consider this case and build infrastructure to periodically revalidate content. This then propagates into the entire system. For example, any caching layer also needs to be aware that the content isn't actually immutable.

With content hashes such a thing is just impossible. The data self-validates. If the hash matches, the data is valid, and it doesn't matter where you got it from. Data can be trivially propagated through the network.


The URI is expected to be immutable. The URI can be used as a key. Whether the object itself is immutable depends on the type of object. A hash over the content cannot directly be used that way, but it can e.g. be used to derive an original URI in a way that allows for predictable lookups without necessarily having access to the origin server.

Posts are explicitly not immutable, so they do need to be revalidated, and that's fine.

For a social network immutable content is a bad thing. People want to be able to edit, and delete, for all kinds of legitimate reasons, and while you can't protect yourself against people keeping copies you can at least make the defaults better.


> Posts are explicitly not immutable, so they do need to be revalidated, and that's fine.

OK that's my point. In the AT protocol design the data backing posts is immutable. This makes sync, and especially caching, a lot easier to make correct and robust because you never need to worry about revalidation at any level.

> People want to be able to edit, and delete

Immutable in this context just means the data blocks are immutable. You can still model logically mutable things, and implement edit/delete/whatever. Just like how Git does this.
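
A toy version of that distinction (nothing AT-specific, just the general git-style pattern): the blocks themselves are immutable and content-addressed, and an "edit" or "delete" just writes a new snapshot and moves a head pointer.

    import { createHash } from "node:crypto";

    // Immutable, content-addressed blocks: once written, never changed.
    const blocks = new Map<string, string>();

    function putBlock(snapshot: string): string {
      const hash = createHash("sha256").update(snapshot).digest("hex");
      blocks.set(hash, snapshot);
      return hash;
    }

    // The only mutable state is which snapshot is current.
    let head = putBlock(JSON.stringify({ posts: ["hello world"] }));

    // An edit or delete writes a new immutable snapshot and moves the head.
    function setPosts(posts: string[]): void {
      head = putBlock(JSON.stringify({ posts }));
    }

    setPosts([]);                  // "delete everything"
    console.log(blocks.get(head)); // {"posts":[]}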


But to model mutable things over immutable blocks you need to revalidate which blocks are still valid.

You need to know that the user expects you to now have a different view. That you're not mutating individual blocks but replacing them has little practical value.

It'd be nice to implement a mechanism that made it easier to validate whole collections of ActivityPub objects in one go, but that just requires adding hashes to collections so you don't need to validate individual objects. Nothing in ActivityPub precludes an implementation from adding an optional mechanism for doing that the same way e.g. RemoteStorage does (JSON-LD directories equivalent to the JSON-LD collections in ActivityPub, with ETags at collection level required to change if subordinate objects do).
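
Sketch of what I mean, as an optional extension (none of this is in the ActivityPub spec; the names are mine): derive the collection-level ETag from the member ETags, so a consumer can skip the whole collection when the top-level value hasn't changed.

    import { createHash } from "node:crypto";

    interface Member {
      id: string;
      etag: string;
    }

    // Hypothetical extension: a collection ETag guaranteed to change whenever
    // any member's ETag changes (effectively a one-level Merkle digest).
    function collectionEtag(members: Member[]): string {
      const h = createHash("sha256");
      for (const m of [...members].sort((a, b) => a.id.localeCompare(b.id))) {
        h.update(m.id).update("\0").update(m.etag).update("\0");
      }
      return h.digest("hex");
    }

    // A consumer that cached the previous value only walks the members when
    // the collection-level ETag differs.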


> you need to revalidate which blocks are still valid.

No you don't. Sorry if I'm misunderstanding, but it sounds like maybe you don't have a clear idea of how systems like git work. One of their core advantages is what we're talking about here -- that they make replication so much simpler.

When you pull from a git remote you ask the remote what the root hash is, then you fetch all the chunks reachable from that hash which you don't yet have. If the remote says you need the chunk with hash X, and you have a chunk with hash X, then you have the data. You don't have to worry if it has changed. Once you have all the chunks reachable from the latest head, you have the latest state of the entire repository. That's it.

(I mean simple in the sense of clear/direct/correct, not in the sense of "easy". It's certainly the case that a design based on consuming a stream of change events is a lot less code).
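
A sketch of that pull loop under heavy simplification (each chunk names its parents by hash, and the transport is passed in as plain functions rather than being a real git remote):

    // Simplified model: each chunk names its parents by hash, like a commit.
    interface Chunk {
      hash: string;
      parents: string[];
      payload: string;
    }

    // Walk from the remote's root hash and fetch only the chunks we don't have.
    // Anything we already hold under the same hash is, by definition, unchanged.
    async function pull(
      local: Map<string, Chunk>,
      fetchRootHash: () => Promise<string>,
      fetchChunk: (hash: string) => Promise<Chunk>,
    ): Promise<void> {
      const queue = [await fetchRootHash()];
      while (queue.length > 0) {
        const hash = queue.pop()!;
        if (local.has(hash)) continue;        // have it already: no revalidation
        const chunk = await fetchChunk(hash); // (a real client would verify the hash)
        local.set(hash, chunk);
        queue.push(...chunk.parents);
      }
    }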


> When you pull from a git remote you ask the remote what the root hash is, then you fetch all the chunks reachable from that hash which you don't yet have. If the remote says you need the chunk with hash X, and you have a chunk with hash X, then you have the data. You don't have to worry if it has changed. Once you have all the chunks reachable from the latest head, you have the latest state of the entire repository. That's it.

Yes, I know how Merkle trees work and what they allow you to do. In other words you use the hash to validate which blocks are still valid/applicable. Just as I said, you need to revalidate. In this context (a single user updating a collection that has an authoritative location at any given point in time) it effectively just serves as a shortcut to prune the tree of what you need to consider re-retrieving.

It is also exactly why I pointed at RemoteStorage, which models the same thing with a tree of ETags, rooted in the current state of a given directory, to provide the same shortcut. RemoteStorage does not require them to be hashes from a Merkle tree, as long as they are guaranteed to update if any contained object updates (you could e.g. keep a database of version numbers if you want to, as long as you propagate changes up the tree), but it's easy to model as a Merkle tree. Since RemoteStorage also uses JSON-LD to provide directories of objects, it offers a ready-made model for a minimally invasive way of transparently adding this to an ActivityPub implementation in a backwards-compatible way.

(In fact, I'm toying with the idea of writing an ActivityPub implementation that also supports RemoteStorage, in which case you'd get that entirely for "free").

> (I mean simple in the sense of clear/direct/correct, not in the sense of "easy". It's certainly the case that a design based on consuming a stream of change events is a lot less code).

That is, if anything, poorly fleshed out in ActivityPub. In effect you want to revalidate incoming changes with the origin server unless you have agreed on some (non-standard) authentication method, so really that part could be simplified to a notification that there has been a change. If you layer a Merkle-like hash on top of the collections you could batch those notifications further. If we ever get to the point where scaling ActivityPub becomes hard, a combination of those two would be an easy update to add (just add a new activity type that carries a list of actor URLs and hashes of the highest root to check for updates).
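
To spell out that last parenthetical (purely hypothetical; no such activity type exists and the names are invented):

    // Invented for illustration: one batched "things may have changed" activity.
    // A receiver compares each rootHash against its cached value and only
    // re-fetches the actors whose root has moved.
    interface CollectionDigest {
      type: "CollectionDigest";
      updates: { actor: string; rootHash: string }[];
    }

    const example: CollectionDigest = {
      type: "CollectionDigest",
      updates: [
        { actor: "https://example.social/users/alice", rootHash: "placeholder-hash-1" },
        { actor: "https://example.social/users/bob", rootHash: "placeholder-hash-2" },
      ],
    };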


> Using this model one can pull content from _anywhere_ without having to worry about MITM forgeries etc

Does that make it more difficult to implement the right to be forgotten and to block spam and trolls?


Doesn't it make it easier? A list of hashes which should be blacklisted means servers obeying rulings are never at risk of returning that data (this would also work offensively: poll for forbidden hashes and see who responds).
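
As a sketch, the check on the serving side is a single set lookup (where the denylist comes from is left abstract here):

    // Servers honouring a takedown ruling refuse to serve blocked hashes.
    const blockedHashes = new Set<string>(); // populated from whatever authority applies

    function serveBlock(
      hash: string,
      blocks: Map<string, Uint8Array>,
    ): Uint8Array | null {
      if (blockedHashes.has(hash)) return null; // never at risk of returning that data
      return blocks.get(hash) ?? null;
    }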


...and now you have to track which instance is authorized to block which hash, creating a lot of extra complexity. Plus, we need to trust all instances to really delete stuff.

It makes life really easy for spammers.


[flagged]


Please edit out swipes from your HN comments, as the guidelines ask: https://news.ycombinator.com/newsguidelines.html.

Your comment would be fine without that first bit.


I explained the point twice.



