WebSub: Open protocol for distributed pub–sub communication on the internet (w3.org)
354 points by dgellow 7 months ago | 107 comments



"Hub deliver content to each subscriber with a POST"

It's quite common on some social networks to have thousands or even millions of subscribers for some pages/communities/accounts.

Is it really wise to build a publish-subscribe delivery system on top of HTTP? This seems like a huge overhead.

Meanwhile, XMPP has been offering similar features (XEP-0060: Publish-Subscribe, https://xmpp.org/extensions/xep-0060.html) for more than 10 years. It's implemented in several servers and can handle huge loads without problems (everything is handled in real time over encrypted TCP sockets across the network).

We have been building social networks on top of XMPP for several years now; you can check out Movim (https://movim.eu) and Salut à Toi (https://salut-a-toi.org/) :)


XMPP is an open TCP connection with its own protocol, like MQTT, Stomp, Redis, NATS, and all the *MQs. WebSub is mostly between HTTP servers, where one server can tell another that it wants a specific HTTP URL to be GET’d on a message. Each message requires a new TCP connection. So it will not handle very many messages - but I don’t think that’s the point.


There's no reason to open a new TCP connection for each HTTP request; multiple requests per connection have been supported since HTTP 1.0, and since 1.1 it's even the default.


...and then there's even multiplexing in HTTP/2 (multiple HTTP requests at the same time on the same connection)
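You can see the keep-alive behavior with any HTTP client that pools connections. A minimal sketch in Python, using the well-known requests library (the feed URL is just a placeholder):

  import requests

  # A Session keeps the underlying TCP connection alive (HTTP/1.1
  # keep-alive), so repeated requests to the same host skip the
  # connection-setup cost after the first one.
  with requests.Session() as session:
      for _ in range(3):
          response = session.get("http://example.com/feed")
          print(response.status_code)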


> Is it really wise to build a publish-subscribe delivery system on top of HTTP?

I'm sure there is some use-case where this makes sense, but I agree with you. Probably most people wanting to do large scale pubsub should just be using XMPP, or possibly something like MQTT.


Azure Event Grid already provides "reliable event delivery at massive scale" using HTTP based pub sub: https://azure.microsoft.com/en-us/services/event-grid/


Why is this even supporting HTTP 1.1? And why HTTP at all and not just HTTPS? Encryption should be the default on all protocols going forward. No exceptions.

And I agree, it should even use something like the Noise protocol instead of HTTPS.

http://noiseprotocol.org/

https://github.com/noiseprotocol/noise_spec/wiki/Noise-prope...


What's the overhead of HTTP vs XMPP? A few headers? Doesn't seem that different.


"a few headers" if 50% of your payload is headers, thats a big deal.

Mind you, XMPP isn't all that efficient either, as it's all based on XML.


The request and response headers for this HN page were 676 and 591 bytes, respectively, and the biggest ones (e.g. Content-Security-Policy, X-XSS-Protection) are specific to web browsers.

For the kind of content sent over WebSub (generally an Atom or RSS feed with one or more long messages), it's much less than 50%.


I invite you to read this page: https://xmpp.org/about/myths.html :)


I didn't say slow, I said inefficient. It's hilariously verbose compared to a decent binary protocol. Failing that, something based on protobufs.

In the embedded world a JSON/XML parser eats a tonne of resources.

One could of course use it over satcomm as it says, but it's hilariously expensive when you are paying by the byte. But compared to the massive JSON goop with embedded pictures that Twitter uses, it's a paragon of speed.


XMPP uses a decentralized architecture where communication is asynchronous. XMPP uses a client-server model where clients do not talk directly to each other.

HTTP, on the other hand, is a simple protocol which is synchronous in nature.


> Is it really wise to build a publish-subscribe delivery system on top of HTTP? This seems to be a huge overhead.

I'm only just learning about WebSub tonight, but it looks like a lean, efficient, and fairly minimal protocol to me. What gives you the impression that there will be huge overhead - could you be more specific?

When new content is published to a topic in WebSub, it's delivered with an HTTP POST that will look something like this:

  POST / HTTP/1.1
  Host: foo.com
  Content-Type: application/x-www-form-urlencoded
  Content-Length: 13
  Link: <https://hub.example.com/>; rel="hub"
  Link: <http://example.com/feed>; rel="self"  

  say=Hi&to=Mom
If content is being published to a topic at high volume, then the HTTP connection for each subscription will remain open persistently, meaning that you pay the cost to establish it only once when receiving the first message. (If there aren't enough messages to take advantage of a persistent connection, then efficiency probably doesn't matter that much for the use-case.)

Furthermore, it looks like these messages can be sent using HTTP/2, if client & service support it (which is something that you'd prioritize for cases where efficiency matters). HTTP/2 is a binary protocol and takes advantage of HPACK header compression (RFC 7541). This means that if the same header appears in multiple requests, it will be transmitted very efficiently. Thus WebSub headers that are likely to be the same for all requests across a connection (like Host, Content-Type, and Link) will be transmitted virtually for free.
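For illustration, here's a sketch of sending such a delivery over HTTP/2, assuming the third-party httpx library (installed with its optional http2 extra; URLs are placeholders):

  import httpx

  # httpx negotiates HTTP/2 when the server supports it; headers that
  # repeat across requests on the same connection are compressed by
  # HPACK, as described above.
  with httpx.Client(http2=True) as client:
      response = client.post(
          "https://subscriber.example.net/callback",
          content=b"say=Hi&to=Mom",
          headers={"Content-Type": "application/x-www-form-urlencoded"},
      )
      print(response.http_version)  # "HTTP/2" if it was negotiated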

Even the vanilla HTTP/1.1 request described above seems reasonable though -- certainly not something that strikes me as a cost or efficiency problem -- and the HTTP/2 framing of the content probably won't be much longer than the content payload itself.

Now let's compare to XMPP PubSub. From looking at XEP-0060, an item published over that protocol looks like the following - based on Example 101 in: https://xmpp.org/extensions/xep-0060.html#publisher-publish

  <message from='pubsub.shakespeare.lit' 
  to='francisco@denmark.lit' id='foo'>
    <event xmlns='http://jabber.org/protocol/pubsub#event'>
      <items node='princely_musings'>
        <item id='ae890ac52d0df67ed7cfdf51b644e901'>
          <entry xmlns='...'>
             say hi to mom ...
          </entry>
        </item>
      </items>
    </event>
  </message>
[Edit: changed from example 99 to 101, and elided the part of the content that was Atom-specific to more fairly compare the framings.]

Based on this naive comparison, I don't see a reason to conclude that WebSub will have more overhead than XMPP PubSub. When implemented over HTTP/2 it may be more efficient.


Here you are comparing publishing an Atom post over XMPP with publishing a simple message. It would look more like this (even if it's not strictly valid):

  <iq type='set'
      to='pubsub.shakespeare.lit'
      id='publish1'>
    <pubsub xmlns='http://jabber.org/protocol/pubsub'>
      <publish node='mom'>
        <item id='bnd81g37d61f49fgn581'>
          <body>say hi to mom</body>
        </item>
      </publish>
    </pubsub>
  </iq>
But indeed, if you start putting a bit more metadata in your first example (publication and edit dates, id, summary, alternate link), you'll quickly reach a similar structure (with the surrounding XML).

That is also the power of PubSub: it gives you the freedom to put whatever you want in it (it can be Atom posts as in your example, but also stock market ticks pushed every 5 seconds, server monitoring logs...). You define your own namespace, write a little parser for it, and use the thing with your XMPP PubSub library :)


This is a proposal for a W3C standard. How many dependencies and upstream configurations do you need to realise XMPP in your data flow?


< "Is it really wise to build a publish-subscribe delivery system on top of HTTP?"

Seems to be working fine for SQS. It all depends on your use-case. For high volume messaging or certain types of messages you might reject WebSub for the same reasons you might reject SQS in favour of AMQP or MQTT etc.


SQS isn't really pub-sub. That'd be more like Kinesis.


HTTP/2 presumably cuts out most of the overhead. Multiplexing sockets, header compression, etc.


Not to mention PubSubHubbub...


This is the successor to PubSubHubbub. It’s pretty much the same API. https://websub.rocks is a great resource for implementors.


From the page linked:

> WebSub was previously known as PubSubHubbub.


XMPP has already been mentioned (thanks edhelas). What I didn't see mentioned was ActivityPub (https://www.w3.org/TR/activitypub/), or some of the advanced messaging features users have come to expect (e.g., store and forward, forms, automated route handlers).

What I am curious about are the following questions:

1) What differentiates WebSub from XMPP?

2) What differentiates WebSub from ActivityPub?

3) How are you handling the N-squared delivery problem, if you are delivering content directly to each subscriber with HTTP POSTs?

4) Does WebSub currently support store and forward? If not, is that on a roadmap for a future version?

5) Same as 4, except for support for forms and form responses? Examples are a builtin Yes or No reply, or a builtin poll vote.

6) Same as 4, for automated message routing.

7) Why not have it be transport-agnostic, instead of mandating HTTP? And why HTTP? The growing trend is towards more decentralization.

8) How does this compare to Sir Tim Berners-Lee's SOLID (https://solid.mit.edu/)?


So I read the comments but I still don't understand what problem this specification solves. Does it enable new use cases? Does it enable new trust models? Since it's only a server protocol, how do end users actually read the content?

Edit: I don't understand the downvote. I use RSS daily to fetch news and I don't have problems with it. It's simple and delivers news to the edge (my mobile phone). I'm not saying WebSub is useless; I would like to understand what a 3-entity model brings to the table compared to simple server-client delivery. What's more, the Subscriber entity cannot be a mobile device on the current network, because mobile internet providers block incoming GET requests. Therefore, to fetch news, it has to be a pull model.

Why not add a simple rationale in the header of the spec, explaining the problem, the existing solutions, and why this new solution? Does it solve a security issue, a scalability issue, or a trust issue?


WebSub is the successor to PubSubHubbub, which was designed to make distribution of RSS items more efficient (i.e. receive items as they are published on the host):

- publisher would send updates, instead of having everyone poll

- publisher would be protected from a thundering herd if some content suddenly becomes popular

- publisher and subscribers wouldn't need to exchange a full "page" of items when only one is needed
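The publisher's side of this is tiny. The W3C spec leaves the exact publisher-to-hub mechanism up to each hub, but the common PubSubHubbub-era convention is a form-encoded ping; a sketch in Python (URLs are placeholders):

  import requests

  # Tell the hub the topic URL has fresh content; the hub then fetches
  # the feed once and fans it out to all subscribers.
  requests.post("https://hub.example.com/", data={
      "hub.mode": "publish",
      "hub.url": "http://example.com/feed",
  })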


This sounds great in principle, but this is 2018... virtually none of the use cases for RSS can be replaced by something which requires the consumer to be online and capable of receiving an HTTP POST (which isn't possible on the web platform at all, so this shouldn't be called WebSub). To the extent that a "thundering herd" ever existed for RSS, it came from large numbers of end users with normal computers behind firewalls constantly checking the feed, which is an environment where this protocol has no hope of ever being applicable.

So far, no one on this entire thread has described a single use case where WebSub actually provides value. Someone mentioned that theoretically a service like Facebook could use this, but then they themselves linked to a page on Quora with an explanation from someone at Facebook of why they tested PubSubHubbub and then gave up on it, which stated that he "think[s] the benefits of adopting PubSubHubbub are less clear" for this very issue (that it is inherently a server-to-server protocol, where there wasn't really a problem in the first place).


Are you saying that webhooks have no purpose in 2018? Because while WebSub was created as an effort to help RSS get more real-time, it's a more generic architecture for publishers who have content and need to notify interested subscribers.


What we are saying is: in the attempt to understand the use cases of this proposal, people in the thread said webhooks were created to replace RSS+GET pull, as an "enhancement".

As we are trying to explain: this proposal doesn't cover all the use cases of RSS+GET pull, so it should not be considered a replacement for RSS nor an enhancement of it.

If WebSub has a purpose, many people have failed to explain a valid one. Is it worth wasting the resources of the W3C on such a proposal?

The "world wide web" is wide, and doesn't only include servers.


> people in the thread said webhooks were created to replace RSS+GET pull

That's the root of our misunderstanding then: PubSubHubbub was never _replacing_ RSS+GET pull; it was supposed to _help_ it, for people who want notifications, on top of the existing system.


The root of the misunderstanding is that there is no rationale in the proposal. How are people supposed to understand any proposal without a rationale?


> distribution of RSS items more efficient

RSS is a 500 kB static text file which is usually updated at most every hour. The whole file fits in server RAM and can already be served to thousands of people from cache with minimal CPU usage, or from content delivery networks. A "full page of items" is easily compressed with gzip.

Surely, the Internet isn't congested because of RSS...


Don't forget about ETags.
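With an ETag, a poll that finds nothing new costs almost nothing. A sketch in Python with the requests library (the feed URL is a placeholder):

  import requests

  # The first fetch returns the full feed plus an ETag; later polls
  # send If-None-Match, and the server answers 304 Not Modified
  # (no body) when nothing has changed.
  first = requests.get("http://example.com/feed")
  etag = first.headers.get("ETag")
  if etag:
      again = requests.get("http://example.com/feed",
                           headers={"If-None-Match": etag})
      print(again.status_code)  # 304 if the feed is unchanged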


> - publisher would send updates, instead of having everyone poll

Publisher still has to send the update to the thundering herd, no?


No, see the diagram (https://www.w3.org/TR/websub/websub-overview.svg): the publisher tells the hub "here's the new content", and the hub is in charge of contacting each and every subscriber to tell them "here's the new content". Only the hub _could_ be overwhelmed, but it works at its own rhythm, so it has more control.

The thundering herd happens when you have an uncontrollable influx of traffic coming in and no way to regulate it.
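A minimal sketch of the hub's fan-out step in Python (the callback list is hypothetical; the Link headers mirror the delivery example shown earlier in this thread):

  import requests

  def distribute(callback_urls, topic_url, body, content_type):
      # Deliver the new content to every registered subscriber. A real
      # hub would also handle failures, retries and rate limiting.
      for callback in callback_urls:
          requests.post(callback, data=body, headers={
              "Content-Type": content_type,
              "Link": f'<https://hub.example.com/>; rel="hub", '
                      f'<{topic_url}>; rel="self"',
          })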


This is ridiculous. Any big enough platform such as Medium or Reuters will already partner with CDNs, cloud firewalls, automatic provisioning, etc. to handle traffic.

For small self-hosted blogs, authors can pay a CDN that will absorb any exceptional load and cost almost nothing the rest of the time. At worst the blog will go down for 24h and be back online when things get calmer.

The idea of creating a hub that will be a central point is the opposite of distributed information.


Funny that Medium then decided to buy a company that was built around WebSub and was founded by one of the authors of the spec: https://techcrunch.com/2016/06/02/super-to-medium/ Seems like they do find some use for it? (As does e.g. Google, which has it as the preferred real-time delivery mechanism for sitemaps.)


> (As does e.g. Google, which has it as the preferred real-time delivery mechanism for sitemaps.)

Again, a totally random fact without a single source to prove it.

So WebSub basically comes from the needs of that one enterprise? Is it worth making it a W3C specification?


WebSub enables real-time web (push instead of pull). Like WebSockets, it's a protocol, not a technology. It's designed to be used in creating new technologies. One such example: https://superfeedr.com/tracker


I'm glad the proposal is useful to actual businesses.

However, it now seems WebSub was created to add a middleman that will a) read all your activities and b) take a transaction fee on every update?


Where did you get that from? Everyone can run a hub, and PubSubHubbub hubs were generally free. Also, no, it won't read your activity. All a hub sees is that a site published an entry and that a feed reader - which can serve millions of users on its own - wants to be notified of that. Hardly confidential information.


In that context a feed reader (like FeedBurner) is a Subscriber, right? But then millions of users still potentially have to pull the data from the feed reader, because they can't implement WebSub, because they have limited connectivity and can only pull content. In the end, how is that different from a blog pushing an RSS file to a CDN?


> But then millions of users still potentially have to pull the data from the feed reader, because they can't implement WebSub

This is worth pointing out directly: yes, of course. You always do have to transmit the data to the actual reader. But those are users of the feed reader; they do not contact the source at all.

The advantage of this scheme is that data is transported exactly as often as necessary and as fast as possible, unlike when data is polled.

I see no relation to a CDN. A CDN would mean the source doesn't have to handle the incoming traffic, or handles less of it, but a CDN neither enables push nor reduces the traffic between the CDN and the feed reader's server.


Ah, I think I see where your problem comes from.

Again, I'm not too deep into WebSub, but I know the predecessor quite well.

This spec solves a specific problem. You have one URL resource many clients are interested in, that URL gets occasional updates, and many clients want to fetch that new content. Think RSS feeds. So far, every client had to poll, meaning look at the file again and again to see whether there is a new update. Think a bit about that and you see how hard that is to do on the client side (comparing content, making sure new stuff is really new and not just reordered, storing the old file, etc.). Normally, those clients are servers. That's really important to understand; without that knowledge WebSub makes no sense.

Push-based protocols solve that problem with a middleman, the hub. Whenever there is an update, the original server sends one single POST to the hub, and the hub then sends that notification to all subscribers. And whoosh, no more polling.

The end result is way less traffic on the lines, a way easier architecture on the feed reader side (it doesn't need to be built on polling infrastructure; instead it just has one webhook open, and when it gets notified of an update it fetches the source once), and less server load on the origin. For blogs, for example, that really is relevant.

This has nothing to do with a CDN.


Excuse me Sir, but I think you didn't understand what I meant.

- With today's technology, you don't need an intermediate online 3rd-party feed reader. Devices can pull data from the sources.

- Not only can devices pull the data, it's often the only way to get data, since mobile operators block any incoming traffic on the HTTP protocol for security reasons.

- People don't need to read 1000 pieces of news in real time.

- With a source and an RSS reader, there is NO HUB at all. How convenient.

- For the relatively static content delivery problem, CDNs already exist; caching has existed for decades and is already part of the base HTTP protocol.

- The only thing that protocol enables so far is middlemen that will try to monopolize and control the news ecosystem, while news delivery is already free (for the end user).

Bonus point: the traffic of text on the internet is ridiculously small compared to video and other data. That argument is not a valid one to justify introducing another 3rd party.


WebSub is not an RSS replacement, but an enhancement. It's not created to be used by the end user directly. Web feed management providers such as FeedBurner[1] pull feeds from the origin no more than a few times per hour, causing a delay for the end user. With WebSub, publishers can update their FeedBurner feeds in real time.

[1] https://en.wikipedia.org/wiki/FeedBurner


What makes you think people want news in real time? By the way, we can also have news in near real time by pulling RSS directly from the sources, like every 2 minutes.


What you are proposing is by definition not real time, it is not scalable (not on the source side, not on the client side, and not traffic-wise), and it is not fast enough for things like getting notified of a new chat message, where this scheme can be used as well. And people do want real time there, and often enough for news as well.


So basically, this is IRC, but the initial authors were removed, and the whole thing is less efficient because it runs over HTTP.

At some point you have to go outside of the HTTP world and look at what already exists rather than reinventing the wheel.

Does it solve news delivery? RSS does that already.

Does it solve instant notification and presence notification? Lower protocols already do that with lower overhead.


I'll say it straight: you really do not know what this spec is talking about, and that is why you are indeed misunderstanding it. And that's the positive interpretation. By now your points amount to nonsense, as they support an unsupportable position. Not one of those points has any relevance to this discussion:

- This would not change

- WebSub has nothing to do with mobile devices

- This is not up to you to decide, and the servers powering online feed readers or platforms like Facebook do have a lot of data sources

- That's not a point? For the type of RSS readers you are thinking about, that would not change.

- A CDN has nothing to do with this.

- No. It is a decentralized/federated protocol, no middle man can do that.

Your bonus point is invalid: the 3rd party is already there (PubSubHubbub is used in production), and this is not primarily about the type of traffic reduction you are thinking about. But yes, the resources needed to do without WebSub what can be done with WebSub at a big scale are enormous.

Again, this is about enabling push architectures for server-to-server communication and a scalable way to achieve real-time notifications. The tiny feed reader application on your smartphone is not directly related to any of this. It would not use this protocol, it can't use this protocol, and it would not stop working because of this protocol.

Edit: I made this sound a bit nicer than initially. It really feels like you want to misunderstand this spec, this annoys me a bit.


Sorry, but the title says "open protocol for distributed pub-sub communication"; the only buzzword missing is "federation", which by the way is not used in the proposal.

The difference between a proposal like ActivityPub and something like WebSub is that ActivityPub solves an actual problem of social network monopoly.

The first products that came up with ActivityPub are free (Mastodon), while the products presented in this thread make money from that protocol. In my opinion, it's sketchy.


The initial protocol used by Mastodon, OStatus, used PubSubHubbub (which is the same thing as WebSub) as its realtime delivery method.

WebSub is a spec from the very same working group that published the ActivityPub spec, and WebSub is actively used by e.g. the IndieWeb movement, which does indeed very much work against social network monopoly, as everyone there hosts their social profiles themselves.


If you subscribe to 1000 RSS feeds, it could take a while to update your feeds. Whereas with pubsub it's like all your feeds are merged and you can get a full update in one transaction. It's way more efficient at scale (i.e. when feeds are actual people, like in social networks).


> If you subscribe to 1000 RSS feeds

I'm subscribed to about 40 RSS feeds on my mobile phone, and I don't even have the time to read everything.

> It's way more efficient at scale (i.e. when feeds are actual people, like in social networks)

Could you show me basic math to prove that? Firms such as Facebook and Reuters already operate at world scale and don't need a new protocol to deliver news.


They actually already do use protocols like this. I have not dived into the differences yet, but so far it just seems to be a more open spec of PubSubHubbub, and that would be very useful.


No they don't. I use an offline mobile feed reader which pulls RSS feeds directly from the sources, and it doesn't require any 3rd parties like Flipboard, Feedly and others.

RSS with a simple pull GET request is actually more OPEN than this proposal.


You're just wrong. Platforms like Facebook already use protocols like this for their content propagation. This has nothing to do with your offline mobile feed reader.

> RSS with a simple pull GET request is actually more OPEN than this proposal.

It is not. Also, WebSub does not change anything for clients just directly fetching the RSS feed.


Please provide a source as proof that "platforms like Facebook already use protocols like this for their content propagation".

> It is not. Also, WebSub does not change anything for clients just directly fetching the RSS feed

Indeed, it simply adds a middleman between the content producer and the final consumer.


> Indeed, it simply adds a middleman between the content producer and the final consumer.

No, it does not, not for the clients you seem to think about. The regular RSS feed does not vanish.

For platforms like Facebook using schemes like this, see https://www.quora.com/Why-doesnt-Facebook-implement-PubSubHu... and https://developers.facebook.com/docs/graph-api/webhooks


In the very same Quora response: "Additionally, the fact that PubSubHubbub requires applications to have a server endpoint makes it difficult for desktop and mobile clients to implement."

That's fine, and you clarified that it would be used between servers. But it limits the "openness" of the concept imho.


It limits the applicability of the concept, sure. But I would not call that openness. It is not like there is a proprietary element or something that hides information from clients, it is just an enhancement that works only in a specific situation.


Fair enough


1. Why not a SUB HTTP request? And a PUB HTTP request. The response URL could be a required header.

2. We have HEAD, can we do service discovery using HEAD?

3. Why not let a topic be a HTTP URL? “PUB /user/john/position HTTP/1.1\r\ndata...”.

4. Subscription expiration as a way to force subscribers to renew, and upon renewal get redirected to other servers, is pretty cool. NATS has a special message (the INFO message) to do the same, but you might be in the middle of an important request-reply session you don't want to abort.

5. The authors could have made this protocol very “non-http-ish” by implementing what amounts to Redis but in HTTP. I’m glad they didn’t. This still feels like HTTP, which is great.


"topics" are indeed HTTP URLs:

  1. Definitions
  Topic. An HTTP (or HTTPS) resource URL.
However, you subscribe to a topic by interacting with a different "hub" URL, passing the "topic" URL as a parameter (`hub.topic`).

Service discovery does appear to support HEAD requests. (See section 4.)

Having a new HTTP verb for subscribing and publishing would seem like unnecessary complexity to me. Rather than ask "why not a new verb", I think a case would need to be made that a new verb is required, that the operation does not cleanly fit into the semantics of existing verbs. The existing verbs are capable of modeling quite a lot.

With the protocol as they've described it, subscribing is just sending an HTTP POST to the hub URL, passing in the topic URL. That's a simple HTTP operation that a lot of clients and programs can be instructed to do easily. Requiring the use of a new HTTP verb will make interoperability difficult without apparent benefit.
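Concretely, a subscription amounts to a request like this; a sketch in Python (hub and callback URLs are placeholders, the hub.* parameter names are from the spec):

  import requests

  # Ask the hub to deliver updates for the topic URL to our callback
  # URL. The hub will then GET the callback with a hub.challenge value
  # to verify that the subscription was really requested.
  requests.post("https://hub.example.com/", data={
      "hub.mode": "subscribe",
      "hub.topic": "http://example.com/feed",
      "hub.callback": "https://subscriber.example.net/callback",
  })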


> Having a new HTTP verb for subscribing and publishing would seem like unnecessary complexity to me.

Complexity for who?

Introducing a secondary "hub" resource here is just accidental complexity. If I want to subscribe to resource A why am I talking to a different resource B? And once you introduce a secondary resource now you need yet another service discovery mechanism to support discovery of these pseudo-resource hubs. (Heaven forbid using an existing service discovery mechanism like RDDL.)

Honestly, stuff like this is so poorly thought out that it's difficult to understand why the W3C stamps approval on this crap. There's no consideration given to alternate protocols like WebSockets or XMPP and there's no attempt to layer on top of existing standards in a meaningful way (hub.secret -- really???). Worst of all, there's no real understanding here of what it means for a resource to change. The entire Content Distribution model is geared towards just one very narrow use case.

It's clear the W3C is all about being "inclusive" and "moving fast", and there's real fear of "overthinking" things -- but seriously, if this is the result, we'd probably be better off with better standards once a decade than this.


> Introducing a secondary "hub" resource here is just accidental complexity. If I want to subscribe to resource A why am I talking to a different resource B?

Think about what the hub has to do. It may have to notify millions of subscribers, deal with any errors, retry, etc. This is a very heavy-duty messaging system that most publishers will not want to run themselves. And yet you want the publisher's domain name to be the well-known resource that ultimately controls things.

Publishers may be blogs hosted on small websites or even things like cars, phones, laptops or home appliances that are not always online or have to work under tight resource constraints.

Publishers may wish to distribute their content through more than one hub. We don't even have to think of avoiding censorship to see why this increases availability.

I think making it possible to split the roles of publisher and distributor is a very good idea. You can still decide to implement both roles on one server.


Best explanation for me so far. The problem I see is that the same resource constraints could apply to the Subscribers (limited connectivity, limited uptime), and the protocol does not address that?


Good question. I don't know if the proposal addresses it directly, but as subscribers have to provide an HTTP endpoint for notifications, I would expect that subscribers would not normally be end user devices but rather gateways similar to SMTP servers or application APIs.


> better standards once a decade

Funnily enough, the first drafts of this protocol (back then, called PubSubHubbub) were written circa 2008, so this specification is about a decade in the making.

At the time it was distributing content between a number of the bigger blogging/publishing platforms of the day, and also notifying search engines so they could update their indexes more quickly.

If anything it seems like the standardization process was too long and missed the boat here (this particular problem is now most often solved by proprietary protocols), rather than being "rushed through".

Can't deny that the world has changed a lot during the lifespan of this idea, though. Cellular-connected computers in our pockets were barely on the radar when this spec was first written. I'm sure some would argue that the burdens of publishing have now shifted on to the reader (probably battery powered, spotty connectivity) whereas in this spec's original universe the burdens were on the publisher (CDNs not yet as widespread, more independent publishing from web hotels, etc).


New HTTP verbs require support in the web server and in all proxies along the way, while using existing ones doesn't, and is just part of the normal application framework.


I've just learned about this protocol tonight, but from skimming the specification, I think it's a well-thought-out, practical, minimal protocol.

WebSub is a protocol for people who want to implement the publish/subscribe pattern over HTTP callbacks (aka webhooks). Using webhooks means that subscribers don't need to have any kind of ongoing connection or session open to receive publishes. Subscribers are passive web servers and merely wait to receive an HTTP POST. No state, no connection, no polling, nothing. The general model of HTTP callbacks is a simple scheme that's easy to implement in any programming language or platform out there, all of which have HTTP clients and servers capable of getting the job done with minimal fuss.
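To make that concrete, here is roughly all a subscriber endpoint amounts to - a sketch using Flask (the route and helper names are made up):

  from flask import Flask, request

  app = Flask(__name__)

  @app.route("/callback", methods=["GET", "POST"])
  def callback():
      if request.method == "GET":
          # Subscription verification: the hub GETs the callback and we
          # echo hub.challenge back to confirm we asked for this.
          return request.args.get("hub.challenge", ""), 200
      # Content delivery: the POST body is the payload (often an Atom
      # or RSS document). Acknowledge quickly, process afterwards.
      handle_update(request.data)
      return "", 204

  def handle_update(body: bytes):
      print(body[:200])  # placeholder for real processing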

I have actually built custom systems that worked using a very similar pattern as this protocol, where clients of a service pass in URLs where they'd like to be notified when an event occurs. Perhaps this is why I find myself nodding along when I read the protocol spec. There wasn't any standard way to model this, so I just invented something on the fly. You also see this pattern implemented in services like AWS SNS's support for HTTP [1], in Google Cloud PubSub, Twilio, etc. Each of these has an entirely custom protocol for PubSub over HTTP callback, and not something that's standard. They all tackle similar issues like preventing attackers from creating unauthorized subscriptions to URLs, but in different ways.

WebSockets doesn't solve the same problem as WebSub. WebSockets require a continuous connection from a client to a service. An application will need to devise its own logic for resuming a session if the connection breaks.

WebSub requires no active connection nor session. WebSub subscriptions could remain functional for months at a time (really indefinitely), with there being no communications whatsoever between messages. The system that initiates the subscription can be different than the one that receives the publishes, which is valuable because it means that messages don't all have to go to a single place. Publishes are sent to the domain name specified in the subscription URL. The subscribed web server could change regularly and everything will work as long as the DNS name keeps pointing to the right place. You could use multiple web servers to handle the subscription, by putting multiple servers in the DNS record, or you could use a load balancer in the same way as other web requests. This means you can scale easily. These are the kinds of benefits you get from building subscriptions on top of HTTP. All of the standard techniques and standard software "just work".

I'm not an expert on XMPP, but I suspect XMPP would also be a bad fit for this use-case, and would also require continuous connectivity from the subscriber (please correct me if I'm wrong.) I think the same is true for MQTT but I'm not an expert on that either.

As a person who has built and used multiple systems following this general abstract pattern, I think this is a good attempt at drafting a standard protocol. My impression reading the spec is that its designers had a good idea what problem they wanted to solve, and what kind of characteristics they wanted the solution to have, and came up with a protocol that succeeded in meeting those requirements.

What's the objection to hub.secret? That facility doesn't seem essential to a minimal version of a protocol like this, but I understand why they included it. It provides a simple way for the subscriber to authenticate that the content they're receiving is legitimately the result of their subscription to the topic, and not e.g. an attacker's subscription, or an attacker system that's trying to impersonate the hub. How would you tackle this issue in a simpler way? (It would not be easy to solve this problem within the protocol using TLS, for example.)

[1] https://docs.aws.amazon.com/sns/latest/dg/SendMessageToHttp....
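For reference, hub.secret works by having the hub send an HMAC of the delivered body in an X-Hub-Signature header ("method=hexdigest"); verifying it on the subscriber side is only a few lines (a Python sketch):

  import hashlib
  import hmac

  def signature_valid(secret: bytes, body: bytes, header: str) -> bool:
      # header has the form "sha256=<hexdigest>"; the spec also allows
      # sha1, sha384 and sha512. Recompute the HMAC over the raw body
      # and compare in constant time.
      algorithm, _, received = header.partition("=")
      if algorithm not in ("sha1", "sha256", "sha384", "sha512"):
          return False
      expected = hmac.new(secret, body, algorithm).hexdigest()
      return hmac.compare_digest(expected, received)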


Thanks for the explanation.

I still don't see the added value compared to a simple pull-news-from-URL model. RSS with GET is already session-less and remains functional for months, AND it also works when clients cannot receive incoming connections (mobile devices).


And PubSubHubbub/WebSub is a plain simple upgrade on top of that model, for when you do not want to rely on polling only, e.g. because you want updates delivered quickly (vs. polling, where most implementations sensibly have logic to adapt their polling speed to the posting rate of a resource).

One of the benefits closed platforms have is that they can deliver posts inside the platform immediately; WebSub brings the option for that to feeds on the open web, without requiring subscribers to poll every <5 minutes and without requiring them to make large changes under the hood, e.g. introducing new non-HTTP protocols which can't be used on all hosting options.

For end devices, other update mechanisms are useful as you say, and systems speaking them could hook onto WebSub hubs to get notifications which they then translate. E.g. your typical Wordpress blog has no chance of offering an XMPP channel, but it can ping a WebSub hub since it's only HTTP.


If you want to "upgrade" the old server-client model (request-response) to a realtime instant bidirectional information flow (dialogue communication), you surely want to look at protocols such as XMPP.


Again, you exclude large parts of the web if you require something like XMPP. It totally makes sense to look into such things where it's viable, and you can bridge them with WebSub, but that was not in scope for PubSubHubbub (and WebSub is intentionally only PubSubHubbub with minor clarifications and cleanup).

Pubsubhubbub was a (relative) success because everyone doing RSS feeds could easily add it with their existing tech stack.


I will maybe sound harsh, but to me it looks like the Web (JavaScript?) kiddos wanted their own half-baked solution. As they only know the Web, they reinvent the wheel on top of the Web.

Anyway, if the proposal is useful to some people, then it won't do harm to have it in the public domain.


Pubsubhubbub has been actively used on the open web for over a decade, from large centralized services to Wordpress instances on random PHP shared hosts; it has been used for at least parts of Wikipedia... It actually works. I'd say that's a success compared to theoretical XMPP-based solutions.


I don't think we need more HTTP methods, unless a real case can be made for the existing ones not being good enough. The HTTP spec strongly discourages it.

But I love the idea of topics being URIs (not just HTTP).


This reminds me a lot of the SUBSCRIBE method [0] in Microsoft's WebDAV extensions (I think this is used for EAS?). You pass a Call-Back header with a URL that gets called with NOTIFY [1].

[0] https://docs.microsoft.com/en-us/previous-versions/office/de... [1] https://docs.microsoft.com/en-us/previous-versions/office/de...


Middleboxes and bad HTTP servers/clients would explode, I guarantee it.


But we have the PATCH method, which is... Non-standard, I think? And it works fine.


It's been a standard for a few years: https://tools.ietf.org/html/rfc5789


Always use HTTPS?


Superfeedr[1] (acquired by Medium in 2016[2]) was probably the biggest company yet that based its entire business model on the WebSub / PubSubHubbub protocol.

[1] https://en.wikipedia.org/wiki/Superfeedr

[2] https://techcrunch.com/2016/06/02/super-to-medium/


Just thinking beyond a replacement for RSS, this could be used for large-scale persistent virtual worlds. A WebXR version of Second Life. glTF as a content transmission format. As the Hub updates Subscribers with a POST request, the Subscribers' callbacks could return local state. The Hub becomes the global source of truth. And there is no need to manage a peer net or continuous WebSocket connections.


This has come a long way from "WebSub was previously known as PubSubHubbub".


The former is a much better name IMHO!


So it's an alternative to something like WAMP? (Web Application Messaging Protocol on top of websockets - https://wamp-proto.org/)


Except you don't have an open connection on which you receive messages. Instead, you register a callback URL. It's more like webhooks, but it's its own thing, and not a setting on a page.


Just bring back Fidonet


How does this compare to ActivityPub used by Mastodon et al?


It's part of the Social Web Working Group at W3C, IIRC. WebSub solves problems around RSS, notably that clients have to poll a lot and smaller sites thus need to sustain all those requests.

WebSub solves that by designating a hub that can handle this instead. I.e., you federate your blog feed out to elsewhere.

ActivityPub fills another niche, mostly everything around human interaction in social networks.

WebSub could be used to feed data into ActivityPub networks.


After reading the responses, I don't yet see why the non-transport bits of WebSub can't be done with ActivityStreams and ActivityPub. That said, I haven't put the kind of time into it that the W3C Social Web Working Group has, nor am I a recognized expert on it, so I'll hold further comment until I can finish and publish to Show HN a related proof-of-concept involving peer-to-peer, topic-based publish/subscribe using ActivityPub, which I've had on the back burner for a while.


See it more as an alternative to RSS. Aggregators subscribe and feed information to their users. The polling step is just switched to pushing.


Not really an alternative, more of an addition to. The most common use case for Pubsubhubbub/WebSub is in combination with RSS/Atom feeds, informing subscribers about updates to those.


I wish there was a pull aspect to it, like NEWNEWS in NNTP. If the subscriber is offline, there doesn't seem to be a way to retrieve missed items except through the publisher.


> The subscriber must be directly network-accessible and is identified by its Subscriber Callback URL

Does this mean the subscriber needs to have a forwarded port open to the internet for this to work? Without IPv6, users behind NAT (and specifically behind CGNAT) wouldn't be able to use it.


I think WebSub is an enhancement similar to XMPP. Both are asynchronous, which means it should be faster than plain synchronous HTTP polling. It's actually a push-based protocol where the server sends a single POST to the hub, and the hub then sends that notification to all subscribers.


So as with webhooks, this will require you to have a permanently reachable server sitting on the internet. Couldn't they at least have defined an alternative transport using websockets or SSE?


WebSub is essentially just the transport; the payload is usually Atom or RSS, but it's not specified. Had they included a websockets version, it would essentially be something almost completely different with the same name.


What I don't get is what the motivation is for running the hub, other than "get bought by Medium?"


Is this compatible with new distributed crypto projects like scuttlebutt or dat, and if not, why not?


WebSub is just the standardization of PubSubHubbub, which has existed (and been used by services like FeedBurner) for longer than those projects.


So essentially there's no difference, right? When I was reading the spec I couldn't tell of any notable differences between the two texts.


It is not, because it predates them both by several years.


This makes me think the Web is dangerously close to becoming obsolete, because it doesn't embrace the new end-to-end encrypted, decentralized paradigms without central web servers, and pins everything on HTTP.


I really don't think that is even close to reality, as the majority of users don't care all that much about e2e encryption (because they don't understand it), and that same user base has no idea what you are talking about with regard to decentralization. I would even venture to say that I'm not in a tiny minority of professionals who do care about e2e encryption and do understand decentralization but think that decentralization will never be much more than a science experiment. Users care about their experience and they want to have unencumbered conversations. The decentralized dreams impede those things.


Have you tried Patchwork, the Scuttlebutt chat/social client? It's zero effort to get started; it's not like you have to compile Linux and run your own DNS server.



