I think this is referring to the content-hashed user posts. Using this model one...

vidarh · on May 10, 2023

ActivityStreams just requires an object to have a unique URI. ActivityPub says it "should" be a https URI. However, since this URI is expected to be both unique and unchanging (if you put up the same content with a different id, it's a different object), you can choose to use it as the input to a hash function and put the posts in a content-addressable store.

Mastodon will already check the local server first if you paste the URL of a post from another server in the Mastodon search bar, so it's already sort-of doing this, but only with the local server.

So you can already do that with ActivityPub. If it becomes a need, people will consider it. There's already been more than one discussion about variations over this.

(EDIT: More so than an improvement on scaling this would be helpful in ensuring there's a mechanism for posts to still be reachable by looking up their original URI after a user moves off a - possibly closing - server, though)

The Fediverse also uses signatures, though they're not always passed on - fixing that (ensuring the JSON-LD signature is always carried with the post) would be nice.

aboodman · on May 10, 2023

one important difference is:

> expected

Because the URI is only _expected_ to be immutable, not required, servers consuming these objects need to consider the case where the expectation is broken.

For example, imagine the serving host has a bug and returns the wrong content for a URI. At scale this is guaranteed to happen. Because it can happen, downstream servers need to consider this case and build infrastructure to periodically revalidate content. This then propagates into the entire system. For example, any caching layer also needs to be aware that the content isn't actually immutable.

With content hashes such a thing is just impossible. The data self-validates. If the hash matches, the data is valid, and it doesn't matter where you got it from. Data can be trivially propagated through the network.

vidarh · on May 10, 2023

The URI is expected to be immutable. The URI can be used as a key. Whether the object is depends on the type of object. A hash over the content can not directly be used that way, but it can e.g be used to derive an original URI in a way that allows for predictable lookups without necessarily having access to the origin server.

Posts are explicitly not immutable, so they do need to be revalidated, and that's fine.

For a social network immutable content is a bad thing. People want to be able to edit, and delete, for all kinds of legitimate reasons, and while you can't protect yourself against people keeping copies you can at least make the defaults better.

aboodman · on May 10, 2023

> Posts are explicitly not immutable, so they do need to be revalidated, and that's fine.

OK that's my point. In the AT protocol design the data backing posts is immutable. This makes sync, and especially caching a lot easier to make correct and robust because you never need to worry about revalidation at any level.

> People want to be able to edit, and delete

Immutable in this context just means the data blocks are immutable. You can still model logically mutable things, and implement edit/delete/whatever. Just like how Git does this.

vidarh · on May 10, 2023

But to model mutable things over and immutable blocks you need to revalidate which blocks are still valid.

You need to know that the user expects you to now have a different view. That you're not mutating individual blocks but replacing them has little practical value.

It'd be nice to implement a mechanism that made it easier to validate whole collections of ActivityPub objects in one go, but that just requires adding hashes to collections so you don't need to validate individual objects. Nothing in ActivityPub precludes an implementation from adding an optional mechanism for doing that the same way e.g. Remote storage does (JSON-LD directories equivalent to the JSON-LD collections in ActivityPub, with Etags at collection level required to change if subordinate objects do).

aboodman · on May 10, 2023

> you need to revalidate which blocks are still valid.

No you don't. Sorry if I'm misunderstanding, but it sounds like maybe you don't have a clear idea of how systems like git work. One of their core advantages is what we're talking about here -- that they make replication so much simpler.

When you pull from a git remote you ask the remote what the root hash is, then you fetch all the chunks reachable from that hash which you don't yet have. If the remote says you need the chunk with hash X, and you have a chunk with hash X, then you have the data. You don't have to worry if it has changed. Once you have all the chunks reachable from the latest head, you have the latest state of the entire repository. That's it.

(I mean simple in the sense of clear/direct/correct, not in the sense of "easy". It's certainly the case that a design based on consuming a stream of change events is a lot less code).

vidarh · on May 10, 2023

> When you pull from a git remote you ask the remote what the root hash is, then you fetch all the chunks reachable from that hash which you don't yet have. If the remote says you need the chunk with hash X, and you have a chunk with hash X, then you have the data. You don't have to worry if it has changed. Once you have all the chunks reachable from the latest head, you have the latest state of the entire repository. That's it.

Yes, I know how Merkle trees work, what it allows you to do. In other words you use the hash to validate which blocks are still valid/applicable. Just as I said, you need to revalidate. In this context (a single user updating a collection that has an authoritative location at any given point in time) it effectively just serves a shortcut to to prune the tree of what you need to consider re-retrieving in this context.

It is also exactly why I pointed at RemoteStorage, which models the same thing with a tree of etags, rooted in the current state of a given directory to provide the same shortcut. RemoteStorage does not require them to be hashes from a Merkle tree, as long as they are guaranteed to update if any contained object updates (you could e.g. keep a database of version numbers if you want to, as long as you propagate changes up the tree), but it's easy to model as a Merkle tree. Since RemoteStorage also uses JSON-LD as a means to provide directories of objects, it provides a "ready lift" model for a minimally invasive way of transparently adding it to an ActivityPub implementation in a backwards compatible way.

(In fact, I'm toying with the idea of writing an ActivityPub implementation that also supports RemoteStorage, in which case you'd get that entirely for "free").

> (I mean simple in the sense of clear/direct/correct, not in the sense of "easy". It's certainly the case that a design based on consuming a stream of change events is a lot less code).

That is, if anything, poorly fleshed out in ActivityPub. In effect you want to revalidate incoming changes with the origin server unless you have agreed some (non-standard) authentication method, so really that part could be simplified to a notification that there has been a change. If you layer a merkle like hash on top of the collections you could batch those notifications further. If we ever get to the point where scaling ActivityPub becomes hard, then a combination of those two would be an easy update to add (just add a new activity type that carries a list of actor urls and hashes of the highest root to check for updates).

goodpoint · on May 10, 2023

> Using this model one can pull content from _anywhere_ without having to worry about MITM forgeries etc

Does that makes it more difficult to implement the right to be forgotten and block spam and trolls?

XorNot · on May 10, 2023

Doesn't it make it easier? A list of hashes which should be blacklisted means servers obeying rulings are never at risk of returning that data (this would also work offensively: poll for forbidden hashes and see who responds).

goodpoint · on May 10, 2023

...and now you have to track which instance is authorize to block which hash, creating a lot of extra complexity. Plus, we need to trust all instances to really delete stuff.

It makes life really easy for spammers.

XorNot · on May 10, 2023

[flagged]

dang · on May 11, 2023

Please edit out swipes from your HN comments, as the guidelines ask: https://news.ycombinator.com/newsguidelines.html.

Your comment would be fine without that first bit.

goodpoint · on May 11, 2023

I explained the point twice.