Direct Sockets: Proposal for a future web platform API (github.com/wicg)
65 points by olalonde on Feb 19, 2023 | 53 comments



So the thing is: the original "websocket" was prevented from being a full arbitrary connect() implementation out of security concerns. That is, the web page is running inside the user's network security boundary, and might be able to make connections which appear trusted.

If there's an API for "desktop application" web pages which can make arbitrary connections, what does the security model look like?


The only one I can imagine is still that the server can tell a client "you're allowed to direct socket to these ports on this DNS address specifically", with possibly some rules around DNS addresses similar to cookie sharing on domains, and with a mandatory CORS preflight request to an HTTPS server on the same target domain.

Why?

The server can't be handing out arbitrary permissions, for two major reasons. One is that you just can't be putting up servers that hand out permissions for other networks, for a host of obvious reasons. The second is that while the Internet is sufficiently connected that we are often able to just pretend it's one big happy IP namespace, that's not true. The addresses reserved for local networks create one obvious exception (there's a ton of 192.168.1.1s in the world), but in general what the server thinks another resource's identifier is may not be the identifier from the client's point of view. There's a lot of hacking opportunity in exploiting the gap between a server's concept of network identity and the client's.

DNS isn't enough, because I can set up a DNS subdomain to point at any IP I want. I'd need to pre-flight check the request to ensure a cert establishes at least some minimal level of ownership over the domain, and there's no protocol-generic way to check, so we have to reuse HTTPS.

Now, by the time this is all set up, you might as well just have set up a websocket proxy. You're certainly not using this to build a glorious P2P application or anything. (If you could convince your users to install a new root SSL cert, I can see making this work with some other grease, but without that I think you're stuck being the MitM router for all traffic, which is hardly P2P.)

There is also the YOLO option of just letting browsers open unrestricted sockets and letting the internet pick up the pieces. Which it eventually would. But it would probably result in an even more restrictive setup than we have now.


It shouldn't matter in theory; the perimeter model is dying. The current security paradigm is to treat every device on a network as actively hostile.


Even with connections to localhost? Should a computer no longer trust itself?


This has already happened. A technique called DNS rebinding has made it possible for remote websites to issue connections to your localhost (which should not normally be allowed) for the last 15 years or so. As a result, it is a security vulnerability to run a web server on localhost without checking the Origin header or having the connecting browser prove that it's local by reading a token from disk and using it to authenticate.

(And whether it's a severe vulnerability depends on what the web server provides. In many cases, this has been "RCE on your machine".)

Here's an example from 2018: https://bugs.chromium.org/p/project-zero/issues/detail?id=14...
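
For anyone serving on localhost, the mitigation amounts to something like this; a minimal sketch assuming Node.js and its built-in http module (port and responses are made up):

    // Minimal sketch (Node.js): a localhost server that refuses requests
    // whose Host/Origin headers don't look local. A DNS-rebound page still
    // sends the attacker's hostname in Host, so this check blocks it.
    const http = require('http');

    const LOCAL = /^(localhost|127\.0\.0\.1)(:\d+)?$/;

    http.createServer((req, res) => {
      const host = req.headers['host'] || '';
      const origin = req.headers['origin'] || '';
      const originOk =
        origin === '' || /^https?:\/\/(localhost|127\.0\.0\.1)(:\d+)?$/.test(origin);
      if (!LOCAL.test(host) || !originOk) {
        res.writeHead(403);
        res.end('forbidden');
        return;
      }
      res.writeHead(200);
      res.end('hello from localhost');
    }).listen(8080, '127.0.0.1');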


Yes. A computer trusting itself used to be, and in many cases still is, the primary way viruses work.


Most OSes in use have multi-user security models, these days mostly used to compartmentalize system components and service accounts. Lots of vulnerabilities come from cutting corners here.


The "everything in the network is 100% trusted" model is dying, but it isn't being replaced by "everything in the network is 0% trusted".


> Threat

> Attackers may use the API to by-pass third parties' CORS policies.

> Mitigation

> We could forbid the API from being used for TCP with the well known HTTPS port, whenever the destination host supports CORS.

I'm really curious about the implicit ethics here.

There's this idea that the web should be able to do everything that native apps can, an idea that I'm inclined to agree with. One thing that native apps can of course do is bypass third parties' CORS policies. And there are many legitimate use cases for that, like feed readers for example. Right now, if you want a feed reader as a web app, you need a backend to be able to request feeds (and homepages, for autodiscovery). For a native app, you could do all of that client side.
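
That backend can be tiny, but it has to exist. A rough sketch of the proxy a web-based feed reader needs today (assuming Node 18+ for the global fetch; the endpoint shape is made up):

    // One-endpoint proxy: the browser asks this server for a feed because it
    // can't fetch the feed's origin directly; server-side fetch has no CORS.
    const http = require('http');

    http.createServer(async (req, res) => {
      const target = new URL(req.url, 'http://localhost').searchParams.get('url');
      if (!target) {
        res.writeHead(400);
        return res.end('missing ?url=');
      }
      try {
        const upstream = await fetch(target);
        res.writeHead(upstream.status, { 'content-type': 'application/xml' });
        res.end(await upstream.text());
      } catch (e) {
        res.writeHead(502);
        res.end('upstream fetch failed');
      }
    }).listen(3000);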


I also don't get it. The whole reason for CORS is "Resource Sharing" (e.g. indirectly using resources like cookies belonging to another domain). With direct sockets, no shared resource is being accessed; all the browser does is open a TCP connection (e.g. no cookies are accessed or sent anywhere).

I can understand that TCP connections could be abused by some websites (e.g. using your browser for spamming, accessing unsecured local services, etc.) but this can be solved with a permission style popup just like with the geolocation or webcam APIs.


I'm pretty sure "Resource Sharing" is about resources on a server being loaded from other origins, not about sharing client-side information. This is a client-side restriction. By offering an alternative you ship a workaround for CORS, effectively disabling it.


If that's the case, what does it protect against? Reducing server load? This has always been trivial to bypass by proxying requests through a server. With a direct socket it would be the same except you wouldn't require a proxy server.


Yes, the way you're stating it is fully correct. It is trivial to circumvent, but it would be so in your scenario too - sharing cookies is easy to implement in a proxy server. Still, for a simple website it's a big hurdle, since it forces you to build a proxy server.


You’ve got a point; however, almost every native app out there that constantly interacts with a web API is likely written with web technologies and shipped with Chromium / Node.js.


I wonder if it’d make more sense as a PWA-only feature. The use cases are definitely there for websites, but the security minefield makes the proposal extremely limited. If the user is already going to install the PWA, maybe that opens up room to be less worried about making the user manually type every address accessed, while also reducing the number of weird new permission asks they see from websites.

In general though it’d be really nifty to have this functionality. Chromium already runs most apps; it's just that right now they all ship their own Chromium.


At a given point it might be worthwhile to accept that a native application should be written instead.


Seeing that a big chunk of the use case is P2P: that's difficult to do well without a listening API, SO_REUSEPORT, and/or UPnP (or other port mapping protocols). If P2P is an explicit use case, someone should prototype how that could work, and also make it better/simpler than WebRTC, before putting a new proposal forward for the entire web.

Since this is meant for high trust anyway, perhaps just shipping the more familiar Node APIs for tcp/udp would be better? That would let people do cool things (including P2P) that actually work, and it's pretty battle-tested. Or maybe I'm missing something? It's been a while since I used those.
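
For reference, the kind of familiar, battle-tested Node API being referred to (the built-in net module; host and port below are placeholders):

    // Plain TCP byte stream with Node's built-in net module.
    const net = require('net');

    const sock = net.connect({ host: 'example.com', port: 4242 }, () => {
      sock.write('hello\n');                         // raw bytes out
    });
    sock.on('data', (buf) => console.log('got', buf.length, 'bytes'));
    sock.on('end', () => console.log('peer closed the connection'));
    sock.on('error', (err) => console.error(err.message));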

Another thought: don’t we have too many streaming protocols already? Do we really need SSE, WebSocket, WebTransport, and now raw sockets as well? It’s a lot (though not this project's fault).


> make it better/simpler than WebRTC

This certainly looks simpler than WebRTC, which is absolutely impenetrable.


What do you find impenetrable about WebRTC? In most cases I have found the complexity necessary. When you find yourself in unique situations it’s a protocol that gives you the knobs.

I created https://webrtcforthecurious.com to try and solve the accessibility/education issues around WebRTC


Let me open the first page of the guide for example.

> https://webrtcforthecurious.com/docs/02-signaling/

The moment I open this page, which is ostensibly the first on the way to understanding WebRTC, it slaps me over the head with hundreds of terms I’ve never heard before. More importantly, I don’t understand why I should care about them.

I just want to do Server.serve(5648), and Client.connect(server, 5648), and then pass binary messages back and forth. This is the mental model I already have; it works fine anywhere other than a browser, and even for websockets (mostly, except no UDP), and whatever website explains WebRTC will first have to explain to me why such a simple thing cannot work.


To me the issue is that any real-world deployment of this is going to want some form of encryption, or you are just leaking the poor user's data to a ton of random people. In addition to encrypting the data, this means you are going to need to build in a key sharing mechanism. This is table stakes: anything less is unacceptable, and it's a complexity you don't escape even with a native app. I think you then rapidly discover that just satisfying that one need causes most of the complexity of WebRTC; and to the extent that it isn't documented well enough to show that, that doesn't mean the protocol or even the API sucks and should be replaced: it just means we need better documentation for people who don't care enough to learn how it all works and how to work around the issues.


The naive mental model was something like `WebRTC.send("12.34.56.78", "hello world!"); WebRTC.addEventListener("text", function(e){if(e.ip==="12.34.56.78"){ /* do something */ }})`. Not that I expected it to be that easy (it isn't php), but it turned out to be (ehm) slightly more complicated than that.

Thanks for writing the book.


Exactly. Having worked with plain tcp/udp sockets, and then trying to understand WebRTC is like trying to run a marathon after winning the 100m sprint.


To be able to accept connections, you would need a public IP, configure the router to map a port, and configure the firewall to allow connections. Too many things could go wrong.

With BitTorrent plenty of clients can't accept connections but that does not render them unusable.


You could use STUN, or, if you can make arbitrary connections, talk to the user's router over UPnP.


These things may be a desired use case but as the explainer lays out they were already rejected for being too easy to abuse. This proposal is an attempt to get some basic portion of the functionality that is easier to limit even if there is no way for everyone to agree the rest can be secured too.


This would be so amazing. In order to access most vanilla services like redis, postgres etc. you need to deploy a bridge https://github.com/zquestz/ws-tcp-proxy

CloudFlare/Deno etc. all have these tunnelling workarounds, but all of that would disappear with this protocol. I made a service for writing servers on the web (https://webcode.run -- somewhat abandoned at this point, but it is still running), and a big problem with the approach was the web's inability to make TCP connections.
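
The bridge itself boils down to shovelling bytes between a WebSocket and a TCP connection; a sketch assuming the 'ws' npm package (ws-tcp-proxy differs in the details):

    // WebSocket <-> TCP bridge sketch: the browser speaks WebSocket to us,
    // we speak raw TCP to the backing service (redis here as an example).
    const net = require('net');
    const { WebSocketServer } = require('ws');

    const wss = new WebSocketServer({ port: 8080 });
    wss.on('connection', (ws) => {
      const tcp = net.connect({ host: '127.0.0.1', port: 6379 });
      ws.on('message', (data) => tcp.write(data));   // browser -> service
      tcp.on('data', (buf) => ws.send(buf));         // service -> browser
      ws.on('close', () => tcp.destroy());
      tcp.on('close', () => ws.close());
      tcp.on('error', () => ws.close());
    });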


This would be so awful. Most vanilla services like redis, postgres, etc... would need to deal with the frontend spew directly instead of offloading that to a bridge as an intermediary.


You could still have a reverse proxy to deal with the worst of it. But if the client could already talk the right protocols it would remove a lot of unnecessary complexity.


The bridge is just a proxy; the volume of traffic is the same, but you have now added a regional hop. Exposing your redis and postgres on the public web is not a good idea, but there are tons of internal use cases, and things like postgres actually have a security mechanism, so it's not insecure by default.


You CAN expose your db and cache services as long as they are secured.

The bridge just kicks the can down the road: you still have to secure it. It assumes the internal network is safer, which is kind of a security sham to begin with.

In the long run it would make the individual service software more robust, because security vulnerabilities would get attention and get fixed, benefiting more than just the bridge owner.


This would also allow browsers to act as direct database clients if I am interpreting this right?

That means that your web app could act like "psql" and open a direct TCP socket connection to "postgresql://user:pass@host:5432/db", which would change everything

You'd no longer need a backend just to middleman SQL requests, assuming you have proper RLS implemented.

(Whether or not this is a good idea/applicable to all cases is debatable, but it at least becomes possible if I'm reading right)


Alternatively databases could simply implement HTTP and do this today.

Connecting browsers directly to databases is generally a bad idea for other reasons.


Very cool for projects like Hasura and Supabase which rely on middleman http servers (like PostgREST). But also great for building tools like database inspectors / SQL managers right in the browser!


Kinda curious. How do you propose to keep the db/cache login credentials secure on the client side in JavaScript?

Wouldn't they be accessible to anyone reading the front end source files or plugins installed in the browser context?


Temporary expirable sessions like we have today could be one way. Generate a temporary session (DB creds) for an authenticated user.
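
A hand-wavy sketch of that flow, Express-style; requireAuth and issueTempDbUser are hypothetical stand-ins (shown here as stubs), not a real library:

    // Sketch: mint short-lived, narrowly scoped DB credentials for a
    // logged-in user instead of shipping long-lived creds to the browser.
    const express = require('express');
    const app = express();

    const requireAuth = (req, res, next) => { req.user = { id: 'u1' }; next(); };
    const issueTempDbUser = async ({ userId, ttlSeconds }) => ({
      user: `tmp_${userId}`,
      password: 'generated-and-granted-in-the-db-in-a-real-version',
      expiresAt: new Date(Date.now() + ttlSeconds * 1000).toISOString(),
    });

    app.post('/session/db-credentials', requireAuth, async (req, res) => {
      const creds = await issueTempDbUser({ userId: req.user.id, ttlSeconds: 900 });
      res.json(creds);   // client uses these until they expire
    });

    app.listen(3000);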


Maybe in the browser context it would reduce the security risk.

However, that db backend is still listening for logins; it does not know who the client is.

What happens today, imho, is that you have access to pieces of the data tables, not the whole database at once to run queries at will.

When you fill out an html form or click a button, that runs business logic code which might run sql queries based on a token/id you passed.

That token/id does not have access to the whole database.

Temporary database wide sessions are still a risk in the browser context.


For public data I am inclined to agree with the parent.

You could just pass a single auth token to the database, if it supported that for public data only, and fetch it that way. Kinda like a bearer token, etc...

Therefore having direct access to the database from the client side only for public data.

This would be very beneficial to the web as a whole as a lot of the data is public data.

Then the separation of privilege/access has to happen directly at the database level which is totally possible.

Would be a nice addition to the web to treat public data differently.


At the database table level, each field could have the following properties, which, processed together, would decide the level of access for that piece of data stored.

  data_access: private|public
  data_scope: single|composite
  data_exempt: age|gender|otherpiifield

Somewhat protecting PII by not allowing queries which would infringe PII rules when selecting multiple sensitive fields.
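
One possible reading of that scheme, purely illustrative (the property names and semantics are hypothetical, nothing that exists today):

    // Hypothetical per-field access metadata on a "users" table.
    const usersTableAccess = {
      email:   { data_access: 'private', data_scope: 'single' },
      country: { data_access: 'public',  data_scope: 'composite' },
      // age is public on its own, but must not be selected together with
      // the fields listed in data_exempt (to limit PII re-identification).
      age:     { data_access: 'public',  data_scope: 'composite',
                 data_exempt: ['gender', 'postcode'] },
    };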


I don't understand why one cannot just use a binary websocket as the fairly general case? (https://www.rfc-editor.org/rfc/rfc6455#page-29 binary frame 0x2)


You would need a proxy to convert the WebSocket connection back to a normal byte stream and proxy that to the target.


Yes, but what is the use case? Why can't you modify your backend to accept a WebSocket handshake? After that, it's a normal byte stream.


Because you don't always control the backend used by client apps. The "Use cases" section provides examples of this.


As stated in the readme, it's for legacy services that would otherwise require a WS proxy.


It seems like this would not do anything to address the UDP portion of the proposal.


I appreciate the effort, but I'd rather see libdweb resurrected. It's a shame that Mozilla killed it.

I want to be able to write extensions that can do whatever the hell I want. It's my computer and it's my browser. If necessary, I would enable it in about:config. But there should be an option that allows me to do that.

https://github.com/mozilla/libdweb/issues/109#issuecomment-5...


The moment a legitimate website that has been granted these permissions by its users is compromised, its users become part of a botnet.


Any website can already send GET requests to any IP address (and was able to since the beginning of the web).
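
For example, a page can already trigger a cross-origin GET today; it just can't read the response without the target's CORS approval:

    // A page can already cause a cross-origin GET; it just can't read
    // the response without the target's CORS approval.
    const img = new Image();
    img.src = 'http://192.168.1.1/status';           // the request goes out

    fetch('http://192.168.1.1/status', { mode: 'no-cors' })
      .then((res) => console.log(res.type));         // "opaque": body not readable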


And just that had so many security implications that a boatload of security features and restrictions were implemented: being unable to access information about a remotely loaded resource (like an image) from JS, CORS, etc.

Additionally, things like form elements being able to POST to any address are still something everyone else needs to be aware of and protect against.


While there will certainly be security implications, software is written to enhance capabilities, and security must support software, not the other way around.

There will be new ways to DDoS, but as it's already possible to fill bandwidth on any TCP port, I don't think it'll be drastic.

IMO the main concern behind CORS is requests sent with cookies to another domain. If you can send arbitrary requests from JS but still don't have access to cookies for that domain, it doesn't really add obvious attack lines. Yes, you'll be able to issue arbitrary GET requests and read responses, but I never understood why it is a problem in the first place.


> but I never understood why it is a problem in the first place.

You can trivially find information on why all the security features that fall under the same-origin policy exist and what kind of vulnerabilities they prevent. I will not be repeating all that here. Most of these protections were added because actual exploits were demonstrated or found in the wild. They weren't added willy-nilly.

I'll just leave you with some reasons for not even allowing reading the response of cookie-free cross-origin GET requests: you don't want websites to turn their users into web scrapers, or even use them to brute-force logins on another site that happens to use GET for some API endpoint. Maybe they'll even try some IP addresses that often lie in personal or company intranets. Is the HTTP control page of your cheap printer not password protected? One GET request later, the attacker's website is downloading your recent print jobs.

You're likely not the only one to whom these are not obvious, because some of the possible attacks really aren't obvious. Be glad browser vendors are watching out too.


Secure shell and ipfs are legacy systems now? This article is nonsense.


We've switched the URL from https://github.com/WICG/direct-sockets to the post it points to, which seems to have the most information.

There's also an explanation at https://discourse.wicg.io/t/filling-the-remaining-gap-betwee....



