Braid: Synchronization for HTTP (braid.news)
196 points by tobr 62 days ago | 47 comments



Reading the RFC, this seems to make HTTP stateful:

"A subscription is different from a GET connection (e.g. a TCP connection, or HTTP/2 stream). If a client requests "Subscribe: keep-alive", then the subscription will be remembered even after the GET connection closes."

It also uses PUT to patch things, but then why not use PATCH? You could say it's non standard, but at this point, everybody uses it. Plus their stuff apparently deviates from the HTTP standard as well by allowing headers after the first line of the payload and introducing the forGET command.

Or am I missing something?

Interesting anyway, especially the way they create a graph of unique IDs to solve the ordering problem. Feels like Git. The flow of patches reminds me of react-redux.
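
For anyone who wants the Git analogy spelled out, here is a minimal Python sketch of how a graph of version IDs can give every peer the same ordering. The version IDs, the `parents` mapping, and the rule of breaking ties by ID are assumptions made for illustration, not something taken from the Braid draft.

```python
import heapq

def linearize(parents):
    """Order version IDs so that every parent comes before its children.

    `parents` maps each version ID to the list of IDs it was based on.
    Concurrent versions are ordered by ID (an assumed tie-break) so that
    every peer holding the same graph computes the same sequence.
    """
    children = {version: [] for version in parents}
    missing = {version: len(set(ps)) for version, ps in parents.items()}
    for version, ps in parents.items():
        for parent in set(ps):
            children[parent].append(version)

    # Versions with no unprocessed parents are ready; a heap keeps them sorted.
    ready = [version for version, count in missing.items() if count == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        version = heapq.heappop(ready)
        order.append(version)
        for child in children[version]:
            missing[child] -= 1
            if missing[child] == 0:
                heapq.heappush(ready, child)
    return order

# Two concurrent edits (b7 and c3) on top of a1, later merged by d9.
history = {"a1": [], "b7": ["a1"], "c3": ["a1"], "d9": ["b7", "c3"]}
print(linearize(history))  # ['a1', 'b7', 'c3', 'd9']
```

The point is only that parent links give a partial order, and any deterministic tie-break turns it into a total one.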

Plus it's great to have more geeky technical things on HN. I miss it, among all the news and startup things.

They also mention a really cool RFC on a JSON patch format: https://datatracker.ietf.org/doc/html/rfc6902
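
To give a feel for what a JSON Patch document looks like, here is a small Python sketch that applies three of the RFC 6902 operations (`add`, `remove`, `replace`). It is a deliberately partial illustration: `move`, `copy`, `test`, and patches that target the whole document are left out, and the sample document is made up.

```python
import copy

def _walk(doc, pointer):
    """Resolve a JSON Pointer (RFC 6901) to (parent container, final token)."""
    tokens = [t.replace("~1", "/").replace("~0", "~") for t in pointer.split("/")[1:]]
    for token in tokens[:-1]:
        doc = doc[int(token)] if isinstance(doc, list) else doc[token]
    return doc, tokens[-1]

def apply_patch(doc, patch):
    """Return a patched copy of `doc`; the input is left untouched."""
    doc = copy.deepcopy(doc)
    for op in patch:
        parent, key = _walk(doc, op["path"])
        kind = op["op"]
        if isinstance(parent, list):
            # "-" means "past the last element" and is only meaningful for add.
            index = len(parent) if key == "-" else int(key)
            if kind == "add":
                parent.insert(index, op["value"])
            elif kind == "replace":
                parent[index] = op["value"]
            elif kind == "remove":
                del parent[index]
            else:
                raise ValueError(f"unsupported op: {kind}")
        else:
            if kind == "replace" and key not in parent:
                raise KeyError(key)  # replace requires an existing member
            if kind in ("add", "replace"):
                parent[key] = op["value"]
            elif kind == "remove":
                del parent[key]
            else:
                raise ValueError(f"unsupported op: {kind}")
    return doc

original = {"title": "Braid", "tags": ["http"]}
patch = [
    {"op": "replace", "path": "/title", "value": "Braid HTTP"},
    {"op": "add", "path": "/tags/-", "value": "sync"},
]
print(apply_patch(original, patch))  # {'title': 'Braid HTTP', 'tags': ['http', 'sync']}
```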

I'm reading an RFC and getting excited about it. Damn, that must be what they call growing old.


I agree, this RFC seems to propose a great deal of different additions/changes to the HTTP protocol without a unifying reason why. But the committee/process is all in the open, through the IETF, so now is the time to voice your concerns!

> Reading the RFC, this seems to make HTTP stateful

The reasoning behind this seems to be that the alternatives right now are either polling or a homegrown websocket based protocol. The authors argue that, rather than continuing to let different implementors build bespoke or otherwise non-standard polling/subscription mechanisms over websockets, HTTP should standardize on a RESTful way to subscribe to a stream of updates.

> It also uses PUT to patch things, but then why not use PATCH? You could say it's non standard, but at this point, everybody uses it. Plus their stuff apparently deviates from the original standard as well by allowing headers after the first line of the payload and introducing the forGET command.

I've personally seen both PUT and PATCH used for updates, so this didn't seem particularly odd to me, and seemed more like a nod to one particular method of updates.

But yes, the overall RFC is proposing to modify HTTP from a state transfer protocol to a state synchronization protocol, so that clients and servers have to hold onto state and send diffs between their internal states, and then implement some reconciliation algorithm (e.g. OTs or CRDTs) to merge state. This seems like it is in response to a myriad of bespoke long-polling or websocket based update mechanisms.

> They also mention a really cool RFC on a JSON patch format: https://datatracker.ietf.org/doc/html/rfc6902

Yup! Super cool right?

> I'm reading an RFC and getting excited about it. Damn, that must be what they call growing old.

Well, you could do that, or you could read yet another article about startup financials, which is all that seems to trend on HN these days ;)


> this RFC seems to propose a great deal of different additions/changes to the HTTP protocol without a unifying reason why

To read more of the unifying why, I suggest checking out the original Braid draft from last July: https://datatracker.ietf.org/doc/html/draft-toomim-braid-00. We had a much longer introduction in that version, but shortened it in braid-http-01 so that we could cut to the meat.

If the why still isn't clear after reading braid-00, please let us know on the mailing list: https://groups.google.com/forum/#!forum/braid-http. And on the other hand, if something in braid-00 helped, we'd also love to hear what, so that we can add that back into the braid-http-01 draft.


I really found these lines in braid-00

> Braid is a proposal for a new version of HTTP that transforms it from a state transfer protocol into a state synchronization protocol. Braid puts the power of Operational Transform and CRDTs onto the web, improving network performance and robustness, and enabling peer-to-peer web applications.

> At the same time, Braid creates an open standard for the dynamic internal state of websites. Programmers can access state uniformly, whether local or on another website. This creates a separation of UI from State, and allows any user to edit or choose their own UI for any website's state.

to be very useful to set the overall motivation. If I could humbly request, I think having them in subsequent drafts would be very nice! Thanks for the reply and thanks for all the good work!


Will do! Thank you very much!


I really like the RFC.

Two questions:

- do you think it would make sense to allow the "Patches" header only in PATCH and not PUT?

- how do you feel about a generic "subscribe" mechanism that is not specific to sync but can just say "I'm interested in this topic, with these params"? Then this could be specialized with the version+parents params to get sync. This way we get a generic HTTP standard for PUB/SUB, which is badly needed, and is a very basic primitive you can build many things on. It would be a shame to have to add one after the fact.


Thank you very much!

Does EventSource meet your need for a generic subscribe mechanism? https://developer.mozilla.org/en-US/docs/Web/API/EventSource

I'm also curious what use-cases you have for a generic pub/sub beyond synchronization. In my experience, 95% of pub/sub implementations are used as a substrate for synchronization.

As for using PATCH vs PUT, this is an open question. I'd love to see more discussion of it on the mailing list: ietf-http-wg@w3.org. There are a number of pros and cons on both sides.


EventSource is only from the server to the client. Pub/sub goes in any direction: client to client, client to server, etc.

Crossbar.io provides that with websocket.

EventSource + the sync RFC would mean everybody would build a non-standard bridge to link a publication, a sync from the server and an event source. And because we only need 2, and they are overlapping, several completely incompatible de facto solutions would emerge.

Pub/sub is useful for any kind of communication that doesn't involve a resource: notifications of events, communication between microservices, etc. Basically anything that doesn't need a history.

Of course, you can always create abstract conceptual resources you sync with to obtain this effect. E.g. the "streaming service is down" event could be a sub to a "service/streaming/heartbeat" event that you sync while ignoring versions and giving no parents. It's just a bit twisted.

It feels more natural to have a pub/sub primitive that goes back and forth in any direction, and build the specific case for sync with that. Even if sync is 90% of the time what you want.


> "I'm interested in this topic, with this params"

Couldn't you just do this with URIs? Like if you "GET /topic/foo" with a Subscribe header you get the foo topic, and "GET /topic/bar" gets the bar topic?
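
As a concrete picture of that idea, here is a tiny Python sketch that builds the raw request. The `Subscribe: keep-alive` header value is the one quoted from the draft upthread; the host name, the `/topic/foo` path, and the helper name are hypothetical, and nothing here says what the server's response would look like.

```python
def subscribe_request(host, path, subscribe="keep-alive"):
    """Build a plain HTTP/1.1 GET request that carries a Subscribe header."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"Subscribe: {subscribe}",
        "",  # blank line that ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

# Each topic is just a different URI; the header is what makes it a subscription.
print(subscribe_request("example.org", "/topic/foo").decode("ascii"))
print(subscribe_request("example.org", "/topic/bar").decode("ascii"))
```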


Yes, the question is whether to make it a standard.


> The reasoning behind this seems to be that the alternatives right now are either polling or a homegrown websocket based protocol.

Right. Same with routed RPC and routed PUB/SUB. I currently use WAMP (the protocol, not the stack) for that, as it is a WebSocket-based IANA standard, but with HTTP/2 and HTTP/3 it would make sense to have an HTTP version of this baked in as well.

In fact, their subscribe thingy should probably be more generic and accept a topic and optional params. Params could then be used to upgrade the use case, like version+parents for sync. Plus clients may want to push events as well.

I don't know how one can contribute to the debate. Would you explain it to me?

> I've personally seen both PUT and PATCH used for updates, so this didn't seem particularly odd to me, and seemed more like a nod to one particular method of updates.

PUT in REST is for sending the full object, PATCH is for partial updates. So both are valid, but I would allow the "Patches" headers for the PATCH method only.
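
To make the distinction concrete, here is a rough Python sketch with a resource modeled as a dict. The shallow key-by-key merge used for PATCH is an assumption chosen for brevity; real PATCH semantics depend on the patch media type that the client sends.

```python
def put(stored, body):
    """PUT: the request body is the complete new representation."""
    return dict(body)

def patch(stored, changes):
    """PATCH: only the listed fields change; everything else is kept."""
    updated = dict(stored)
    updated.update(changes)
    return updated

user = {"name": "Ada", "email": "ada@example.org"}
print(put(user, {"name": "Ada L."}))    # {'name': 'Ada L.'}
print(patch(user, {"name": "Ada L."}))  # {'name': 'Ada L.', 'email': 'ada@example.org'}
```

With PUT the email is gone because the client did not resend it; with PATCH it survives.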


> I don't know how one can contribute to the debate. Would you explain it to me?

The official venue for IETF discussions is the mailing lists. Braid is being discussed in the HTTP Working Group, on this mailing list: ietf-http-wg@w3.org. I suggest drafting an email saying something like:

"Hi, I learned about the Braid spec on Hacker News. I am interested in it for reasons X, Y, and Z. Overall, my thoughts on the spec are W. However, the current draft allows sending patches in PUT, POST, and PATCH, and I would only allow the Patches header in the PATCH method. Does anyone have thoughts on this?"

Thank you! I'd love to have your thoughts contribute to the consensus process.


> this seems to make HTTP stateful

Yes -- depending on how you look at it. HTTP already has cookies to implement stateful sessions. Braid does add subscriptions; but HTTP 1.1 and 2 also have keep-alive at the TCP level: https://en.wikipedia.org/wiki/HTTP_persistent_connection

> It also uses PUT to patch things, but then why not use PATCH?

You can use PATCH, or even POST, too. We haven't settled on a single best method to use, and in fact you'll notice different methods being used in different specs, depending on the personal preferences of the author that wrote that spec. :) I recommend checking out the Range Patch spec for more: https://datatracker.ietf.org/doc/html/draft-toomim-httpbis-r.... You'll see a lot of PATCH in there. :)

Also, please feel free to start a discussion on the mailing list about this: ietf-http-wg@w3.org It's an issue that needs discussion and consensus more generally in HTTP.


> You could say it's non standard

Isn't PATCH standard?

What is non-standard about it?


PATCH became a standard in 2010 (https://tools.ietf.org/html/rfc5789), I just never patched my brain with this new revision and had stale data.


9 year max-age; impressive.


People don't use it and a lot of web frameworks don't have first class support for it. The main benefit is making the request more semantic. Idempotency etc. isn't always that useful on the protocol level so GET & POST is more than sufficient for most purposes.


I wrote 2 implementations for ETag, one using plain text files and one using a database file:

https://cup.github.io/autumn/talk-conditional-request

My implementations are in PHP, but they could easily be adapted to other languages. I was surprised when I found that cURL really doesn't have support for this. Yeah, you can do something like this:

    $ curl -I -H 'If-None-Match: "109-55035a2e5a100"' \
    > speedtest.lax.hivelocity.net
    HTTP/1.1 304 Not Modified
but it's of limited usefulness. What you need is a cache storing all the requests you've made, so that the next time you make a request the cache can be checked. Without the cache it's pointless. Another problem is that some sites only return Last-Modified, not ETag:

    $ curl -I https://en.wikipedia.org/wiki/Main_Page
    last-modified: Sun, 03 Nov 2019 20:12:16 GMT
and some don't return either:

https://www.google.com
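
Here is roughly what such a cache looks like, sketched in Python rather than PHP. This is not the code from the linked write-up: the `ConditionalCache` class, the `fetch(url, headers)` callback, and the fake origin server are all made up so the example runs without a network, and header names are assumed to arrive in the exact spelling shown.

```python
class ConditionalCache:
    """Remember validators per URL and revalidate with conditional requests."""

    def __init__(self, fetch):
        self.fetch = fetch   # fetch(url, headers) -> (status, headers, body)
        self.entries = {}    # url -> {"headers": ..., "body": ...}

    def get(self, url):
        cached = self.entries.get(url)
        request_headers = {}
        if cached:
            if "ETag" in cached["headers"]:
                request_headers["If-None-Match"] = cached["headers"]["ETag"]
            if "Last-Modified" in cached["headers"]:
                request_headers["If-Modified-Since"] = cached["headers"]["Last-Modified"]

        status, headers, body = self.fetch(url, request_headers)
        if status == 304 and cached:
            return cached["body"]  # not modified: reuse the stored body
        if status == 200 and ("ETag" in headers or "Last-Modified" in headers):
            self.entries[url] = {"headers": headers, "body": body}
        else:
            self.entries.pop(url, None)  # no validator: nothing to revalidate with
        return body

calls = []

def fake_origin(url, headers):
    """Stand-in server that always holds version "v1" of the resource."""
    calls.append(dict(headers))
    if headers.get("If-None-Match") == '"v1"':
        return 304, {}, b""
    return 200, {"ETag": '"v1"'}, b"hello"

cache = ConditionalCache(fake_origin)
first = cache.get("http://example.test/page")
second = cache.get("http://example.test/page")
print(first, second, calls[1])  # b'hello' b'hello' {'If-None-Match': '"v1"'}
```

Sites that send neither header simply never get stored, so every request to them is a full fetch.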


Can someone with a mathematical background explain what the practical uses of this protocol are? It talks about time travel and state synchronization vs. state transfer. Is this an effort to create a protocol that e.g. allows for distributed systems over HTTP instead of relying on "custom" implementations such as Paxos? (I'm pretty sure I'm misunderstanding this, please feel free to correct)


Yes, the Braid protocol could let you run a distributed system over HTTP. It lets you put a CRDT or OT behind any HTTP resource, which lets you distribute the resource, with multiple writers, and guarantee consistency after arbitrary edits. Each resource will still have a URL, with one particular hostname, but the actual state can live and be modified simultaneously on multiple hosts.

The first thing you might use this for is collaborative editing. Braid can give you the power of Google Docs at any HTTP URL, without writing additional code.

This also improves performance. Instead of using heuristics to determine when a cache needs to be reloaded (cache-control max-age, last-modified, etags), Braid will automatically push all updates to caches -- guaranteed. That means you never need to force-reload a page, or force-clear a cache. Also, updates are sent as minimal diffs, rather than re-sending the entire resource whenever it changes. This saves a lot of bandwidth and a lot of round-trip latency when loading a page.
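
To illustrate the "minimal diffs" part, here is a small Python sketch that turns two versions of a text into a list of range replacements and applies them. It uses `difflib` from the standard library; the `(start, end, replacement)` patch shape is an assumption for this example and is not the patch format defined in the Braid drafts.

```python
from difflib import SequenceMatcher

def make_patches(old, new):
    """Describe `new` as a list of (start, end, replacement) edits to `old`."""
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

def apply_patches(old, patches):
    """Apply edits from the end of the text backwards so offsets stay valid."""
    for start, end, text in sorted(patches, reverse=True):
        old = old[:start] + text + old[end:]
    return old

old = "The reload button is useful."
new = "The reload button is obsolete."
patches = make_patches(old, new)

assert apply_patches(old, patches) == new
sent = sum(len(text) for _, _, text in patches)
print(f"{sent} characters of replacement text instead of {len(new)}")
```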

As a third practical example, Braid makes it very simple to read and write data from multiple web sites. By collapsing time, Braid implementations let you write code that manipulates state at any URL as easily as a local variable. It doesn't matter whether state is located on your server, or someone else's server, or distributed on everyone's servers and clients. It's all equally easy to interact with.

This also makes it easy to write a new UI for an existing site.

As for Paxos, yes, CRDTs are an alternative to Paxos, and CRDTs can be used in Braid. CRDTs have some performance improvements over Paxos -- Paxos chooses a leader, and whenever the leader is unreachable, a new leader election takes place which requires a couple network round trips before any new edits can be broadcast. CRDTs are always editable, and edits can always be broadcast.
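
As a toy illustration of "always editable", here is one of the simplest CRDTs, a grow-only counter, sketched in Python. The replica names are made up and this is not a merge type from the Braid spec; it only shows that each replica can accept writes on its own and that merging in either order gives the same state.

```python
def increment(state, replica, amount=1):
    """Record a local edit; no coordination with other replicas is needed."""
    updated = dict(state)
    updated[replica] = updated.get(replica, 0) + amount
    return updated

def merge(a, b):
    """Combine two replica states by taking the per-replica maximum."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}

def value(state):
    return sum(state.values())

# Two replicas edit while disconnected from each other.
laptop = increment({}, "laptop")
phone = increment(increment({}, "phone"), "phone")

# Merge order does not matter, and merging twice changes nothing.
assert merge(laptop, phone) == merge(phone, laptop)
assert merge(merge(laptop, phone), phone) == merge(laptop, phone)
print(value(merge(laptop, phone)))  # 3
```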


> Braid can give you the power of Google Docs at any HTTP URL, without writing additional code.

What implements the OT or CRDT logic? The webserver itself?


Yes, the server and client implement the logic. Each URL specifies its "Merge-Type" in a header, which defines the spec for the OT or CRDT that they implement.


Why should this be implemented in HTTP itself, rather than as an application using HTTP as a transport?


I suggest reading the introduction to either of these specs:

- https://datatracker.ietf.org/doc/html/draft-toomim-braid

- https://datatracker.ietf.org/doc/html/draft-toomim-httpbis-b...

People do synchronize at the application level today. But every application invents its own non-standard synchronization method, which is incompatible with every other application. This results in each website only accessing its own state, rather than sharing state with other websites, and the web becomes a bunch of walled gardens.

In order to decentralize the web, we need a standard for the internal state of websites, that makes it easy for websites to re-use the state of other websites. Where the original web allows any website's pages to link to any other site's pages, Braid allows any site's internal state to synchronize with the internal state from any other site.

Now, you might ask why we should implement a standard at the HTTP level, rather than make a standard on top?

Well, it turns out that HTTP and REST are already designed for sharing state-- but they are just limited to state transfer rather than state synchronization. It is very natural to extend it to synchronization -- we can do it with just 5 new headers, 1 new response code, 2 range units, and 1 new registry.

And if you try to build something on top, you'll have to re-implement all the great things that HTTP has invented that we now take for granted: caching, CDNs, idempotency, media types, etc. By putting synchronization into HTTP, we can add these features to the existing web. Caches (like CDNs) can suddenly support dynamic content -- not just static content. If you change a line of code in your Javascript, all clients will update with just a diff, rather than re-requesting the entire file. The reload button becomes obsolete. Existing HTTP network traffic becomes more efficient. Every TEXTAREA can become a collaborative editor.


> In order to decentralize the web, we need a standard for the internal state of websites, that makes it easy for websites to re-use the state of other websites.

This is not coherent.


To say more, state that can be modeled as e.g. a CRDT is always domain bounded, and almost always subject to domain constraints like, at a minimum, authn/authz. Even if you can come up with a domain whose state makes sense to share between website boundaries, making agnostic intermediaries like HTTP servers state-aware enough to perform semantic merges necessarily strips that state of any notion of privacy. This logic doesn't make sense to have at the transport or data layer of a public network.


Your concern is that adding CRDT merge semantics somehow prevents a server from implementing access control? That's not the case.

The Braid spec does not impede access control — that works just like it always has on the web. A client logs into a server. If a client does a GET request, the server decides whether the client can see the result. If a client does a PUT request, the server decides whether to allow it. The only difference is that these GET and PUT requests can now be broken into granular patches with a version history.

And if you want to build a peer-to-peer network, then you will replace the server with a validation function running on each peer, and authentication with a crypto scheme. But we aren't at the point of trying to standardize that stuff yet.


> Your concern is that adding CRDT merge semantics somehow prevents a server from implementing access control?

My concern is that this requirement means the state can't be encrypted.

> if you want to build a peer-to-peer network, then you will replace the server with a validation function running on each peer, and authentication with a crypto scheme

How do you break a GET request of some state blob into "granular patches" if the state is encrypted?


> Why "Braid"?

> 1. It adds versioning—and time travel—to the web, just like the videogame Braid.

That video game was the first thing I thought of when I read the name. Cool that it is one of the actual reasons they chose to name this that.


"Braid is an effort to incorporate new distributed technologies into the existing World-Wide Web. We find consensus on extensions to today's web standards that support distributed web technologies. We work in the IETF's HTTP Working Group. You can join the effort.

The Braid Protocol is a set of extensions to HTTP, which transform it from a state transfer protocol into a state synchronization protocol. When a resource is changed by one client or server, all other clients and servers update. Braid supports Operational Transform and CRDTs at web URLs, enabling peer-to-peer, offline-capable web applications."


> When a resource is changed by one client or server, all other clients and servers update.

This reminds me a little bit of a project I started some years ago: http://liveresource.org/

It wasn't nearly as fancy as Braid. Just a way to get a URL's current content and then listen for changes. Kinda like Firebase, but for the web. Didn't get much traction though.


The team behind matrix.org had one prior to Matrix very similar to this too (albeit longpolling) called Glow. It was basically simple & stupid pubsub on top of HTTP: you could GET an arbitrary url, and whenever anyone PUT stuff within that url tree your GET would return. It worked well enough to build a pretty massive instant messaging platform on top of it, but the lack of schema and lack of intelligent query language got a bit frustrating. Some of the ideas made it into Matrix though.
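
For readers who have not seen that pattern, here is a rough in-process Python sketch of "a GET returns when someone PUTs below that URL". It is not Glow's actual design: the `TreePubSub` class and the callback style are invented for illustration, and a real long-polling server would hold the HTTP response open instead of calling a function.

```python
class TreePubSub:
    """One-shot subscriptions keyed by URL path, matched by path prefix."""

    def __init__(self):
        self.waiting = {}  # subscribed path -> callbacks of pending GETs

    def get(self, path, callback):
        """Park a GET on `path` until something is PUT at or below it."""
        self.waiting.setdefault(path.rstrip("/"), []).append(callback)

    def put(self, path, value):
        """Store is omitted; just wake every GET waiting on an ancestor path."""
        for prefix, callbacks in list(self.waiting.items()):
            if path == prefix or path.startswith(prefix + "/"):
                del self.waiting[prefix]  # long-poll style: answer once, then re-GET
                for callback in callbacks:
                    callback(path, value)

hub = TreePubSub()
seen = []
hub.get("/rooms/hn", lambda path, value: seen.append((path, value)))

hub.put("/rooms/other/messages/1", "ignored")  # different subtree, nobody wakes up
hub.put("/rooms/hn/messages/1", "hello")       # wakes the pending GET
hub.put("/rooms/hn/messages/2", "missed")      # no GET is pending any more
print(seen)  # [('/rooms/hn/messages/1', 'hello')]
```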

Braid looks cool; we've hoped someone would layer OT or CRDT semantics on top of Matrix but it hasn't really happened yet (unless you count Matrix itself as a set of add-only monotonic DAG CRDTs, which I guess it is). Either way, perhaps going in at a lower level like Braid has legs; time will tell :)


This is very interesting! Thank you for posting it. You have clearly thought the problem through, and we'd love to get you more involved in the consensus process.


Would it be correct to say that this is a CRDT protocol on the HTTP level, similar abstraction level to e.g. REST?


Yes. One way to look at this is that HTTP is already very close to a CRDT or OT protocol -- it just needs a few new features.

By adding those features into HTTP, we generalize HTTP and REST from being able to simply transfer state to being able to synchronize it, across arbitrary edits, from multiple writers.

    HTTP: HyperText *Transfer* Protocol
    REST: REpresentational State *Transfer*

    HTSP: HyperText *Synchronization* Protocol
    RESS: REpresentational State *Synchronization*


So stuff like PouchDB, Gunjs will now be trivial?


These are databases that support synchronization. They have to design their own custom protocol, because HTTP (without Braid) does not support synchronization.

We want to add Braid support to PouchDB and Gunjs. Then they can interoperate, with one another, and with the rest of the web. You'll be able to build a distributed app that stores some data in Gunjs, and some in PouchDB, on different servers, on different websites.

The differences between different synchronizing databases are captured in "Merge Types": https://raw.githubusercontent.com/braid-work/braid-spec/mast...

Over time, I imagine that these databases will add support for each other's merge types, and then -- yes -- their abilities will be "trivial", and baked into most URLs of the web.


This looks very interesting indeed. I am currently working on a system that allows hierarchical system modeling and evaluation of so-called "observable process models". I can easily understand leveraging something like this over HTTP to reduce implementation details of model and state transfers.


What's an "observable process model"?


Oh, thank you for sharing! I would love to see an example of where you are dealing with model and state transfers, so that we can make sure that the protocol we are building supports it. Is there anything more you can say about this?


How does it compare with CouchDB Replication protocol - https://docs.couchdb.org/en/stable/replication/protocol.html?


Looks interesting, but where is the proof? Distributed consensus is HARD. Without a formal proof why should I trust this? TLA+ would be great, but I'd be happy with anything that demonstrated formal correctness.


Braid itself is just a neutral protocol— the proof you want applies to the particular CRDT or OT algorithm that you use with it.

For instance, you can use it with ShareDB, or Automerge. Both of these synchronizers are quite robust, and prove correctness with fuzz testing.
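
To show what fuzz testing a synchronizer means in practice, here is a minimal Python sketch. It is not the test harness of ShareDB or Automerge: it fuzzes a toy last-writer-wins register, where each write is an assumed `(clock, peer, value)` tuple, and checks that every delivery order ends in the same state. Passing such a test raises confidence but is not a formal proof.

```python
import random

def merge(a, b):
    """Last-writer-wins register: keep the write with the larger (clock, peer) tag."""
    return max(a, b)

def fuzz(rounds=200, seed=0):
    """Deliver the same writes in random orders and check that replicas agree."""
    rng = random.Random(seed)
    for _ in range(rounds):
        # One write per peer, so (clock, peer) tags are unique within a round.
        writes = [(rng.randint(0, 5), peer, rng.choice("xyz")) for peer in range(4)]
        results = set()
        for _ in range(5):
            rng.shuffle(writes)
            state = writes[0]
            for write in writes[1:]:
                state = merge(state, write)
            results.add(state)
        if len(results) != 1:
            return False  # two delivery orders disagreed
    return True

print(fuzz())  # True
```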

Links: https://github.com/automerge/automerge https://github.com/share/sharedb


What happens when you have OT and CRDT interoperating? They are different algorithms, so wouldn't they see different results?


The trick is that they only need to agree on how multiple simultaneous edits merge.

Each URL specifies a Merge-Type. If the algorithms implement it, they can merge consistently.


It seems like that's saying each client has to implement all the algorithms in use? So, no magic here, but a choice among standardized algorithms?


Almost -- you can actually have different algorithms that still merge the same way. See our interoperability demo here:

https://braid.news/demo/interoperate

This demo shows a CRDT and OT system interoperating. They use different algorithms, but merge (almost) the same way!

(I say "almost" because we aren't using the same sorting function to break ties when two people edit in the same location. But this could be fixed.)
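
A small Python sketch of why that tie-break matters. It is not the code behind the demo: the `(position, peer, text)` insert shape and both tie-break functions are assumptions for illustration. Two peers that sort concurrent inserts the same way converge regardless of arrival order; two peers with different sorting functions do not.

```python
def merge_inserts(base, inserts, tie_break):
    """Apply concurrent inserts (position, peer, text) made against the same base."""
    ordered = sorted(inserts, key=lambda ins: (ins[0], tie_break(ins)))
    out, offset = base, 0
    for position, peer, text in ordered:
        at = position + offset
        out = out[:at] + text + out[at:]
        offset += len(text)  # later inserts shift right by what was added
    return out

def by_peer(insert):
    return insert[1]  # order same-position inserts by peer ID

def by_text(insert):
    return insert[2]  # order same-position inserts by inserted text

# Alice and Bob both insert at index 1 of "ac" at the same time.
alice = (1, "alice", "Y")
bob = (1, "bob", "X")

print(merge_inserts("ac", [alice, bob], by_peer))  # aYXc
print(merge_inserts("ac", [bob, alice], by_peer))  # aYXc
print(merge_inserts("ac", [alice, bob], by_text))  # aXYc
```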

In practice, you can certainly specify a merge-type in terms of an algorithm, by saying "this resource merges in the way that the Automerge algorithm merges." But you can also state it abstractly -- for instance, as we do here: https://braid.news/demo/interact#a-merge-type-defines-how-to...



