Why Is This Website Port Scanning Me? (nullsweep.com)
1294 points by BCharlie 13 days ago | 430 comments





It's why Tor Browser restricts access to localhost by default. This problem was predicted and considered by Tor developers back in 2014 - see ticket #10419, "Can requests to 127.0.0.1 be used to fingerprint the browser?" [0] - and it has been fixed since then. Scanning localhost is a dangerous way to fingerprint the user if there are local open ports.

If you are not using Tor Browser and want to fix the security hole without disabling WebSocket completely, running the web browser in a separate network namespace is a workaround - you get a loopback interface that is independent of the main namespace, and you create a NAT interface within the namespace to allow outgoing traffic. A website can also probe other machines, such as the settings page on your router, so for better protection you should block all the private addresses defined by RFC 1918 via netfilter/iptables as well.

For developers who need less restrictive blocking for debugging, you can run multiple Firefox processes in different profiles (firefox -P --new-instance), each in its own network namespace - to make it easy, you can put everything in a shell script and create desktop icons for them. I normally use an ad-blocked and third-party-cookies-blocked profile for web browsing, but a naked Firefox profile for development.

[0] https://trac.torproject.org/projects/tor/ticket/10419


> It's why Tor Browser restricts access to localhost by default. This problem was already predicted and considered by Tor developers back in 2014, see ticket #10419

Sorry to invoke the meme, but Opera did it first[0], in Opera 9.50 (2008). I don't have a good reference to hand, but [1] is a developer complaining about this. [Edit: [2] covers the feature in some detail.]

Opera also blocked access to private IP addresses (so there were three tiers: public IPs, private IPs, localhost; higher tiers could communicate with lower ones, so the block was only unidirectional).

IE10+/EdgeHTML-based-Edge (and I know there was some talk about blocking this in Chromium-based Edge) also blocks it, so that too is prior art to the Tor change.

[0]: https://w3cmemes.tumblr.com/post/62942106027/if-you-can-thin...

[1]: https://stackoverflow.com/questions/1836215/access-to-127-0-...

[2]: https://web.archive.org/web/20140302021701/http://my.opera.c...


To add more about why current browsers don't do this:

One is clearly that you need to communicate the requesting IP deep enough into the network stack to the point where you get the DNS response (if there is one), which means there's a fair bit of work to ensure this is done everywhere;

Another is it's known to break corporate websites (https://internal.bigcorp.com/ on a public IP expecting to be able to access private IPs on their intranet), and smaller browsers are reluctant to lose corporate users by changing long-standing behaviour;

Then there's the WebRTC question: if two users on the same local network start a WebRTC call, do we want all this traffic to go via the first point where the public IP is known? For the majority of users, the answer is no. And as long as that's possible, there's a (very limited) means of local communication. (But if it's limited to only things that respond to the WebRTC handshake, we're already in a much better place!)


In Kazakhstan we have an e-government website. This website allows users to use crypto-tokens to access government services (every citizen can get a digital certificate representing their identity).

This website used to run a Java applet. The applet was signed, so it could use restricted APIs to access the USB device. The website talked to the applet, and the applet talked to the USB device to sign data.

After major web browsers disabled Java applets, they implemented another approach. Now the user must install software which runs a web server on 127.0.0.1. This web server listens on a specific port and uses WebSockets to communicate.

So the government website now uses JavaScript to connect to 127.0.0.1:12345 over a WebSocket, and then uses that connection to interact with the USB device.

So the ability for an external website to connect to 127.0.0.1 is actually essential for this use case.

My guess is that there are plenty of other websites which use a local web server to interact with locally installed software. I know of at least one other: Blizzard's website. It runs a web server in its game launcher, and the website can communicate with it.

PS: they also have to install a custom trusted certificate, because the browser requires wss: from https: pages and there's no easy way to get a legitimate certificate for that kind of use.


Since these use cases already require having software installed on your machine, it seems fine and safer to use a browser extension with native messaging for this:

https://developer.chrome.com/extensions/nativeMessaging

https://wiki.mozilla.org/WebExtensions/Native_Messaging

That bypasses the entire certificate question and lets the website know it's communicating with exactly this app and not something that happens to be listening on the port (and vice versa, too).

... Or, depending on what you're doing, just use a real desktop app, perhaps with an embedded browser.


Yep, that might work. But that would require significantly more work to support all browsers and platforms. Currently it's just a Java application and it works independently of OS or browser.

> So government website now uses JavaScript to connect to 127.0.0.1:12345 using websocket.

It sounds like random other websites (Ebay, etc) would be able to interact with people's USB devices this way too. Maybe without people knowing?


Yes, if this is programmed badly (missing security or a security hole).

The browser connecting to the government website accesses two servers: the original one and the second local one you install yourself on your system. The local server runs natively and therefore can access the USB device. Like all servers it should be programmed such that misuse by hackers is prevented.


That's already a security hole.

The only thing missing is a rogue website abusing it.

There's no guarantee you will never connect to any rogue website that abuses this government mandated backdoor.


When JavaScript establishes a WebSocket connection, the browser sends the page's origin in the Origin header of the handshake. So the local web server can deny connections from unwanted websites.
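A sketch of that server-side check, assuming the local daemon parses the raw WebSocket handshake itself; the allowed origin below is a made-up placeholder, not the real government site:

```python
# Hypothetical allowlist - replace with the real origin(s) the daemon trusts.
ALLOWED_ORIGINS = {"https://egov.example.kz"}

def origin_allowed(handshake: bytes) -> bool:
    """Return True iff the handshake's Origin header is on the allowlist."""
    for line in handshake.split(b"\r\n")[1:]:  # skip the request line
        if not line:
            break  # blank line: end of headers
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"origin":
            return value.strip().decode("latin-1") in ALLOWED_ORIGINS
    return False  # no Origin header at all: not a browser handshake we trust
```

Note this only helps against polite browsers: a native program connecting to the port can send any Origin it likes, so it is access control for web pages, not authentication.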

The app running on localhost using WSS is not why they want[ed?] you to install a custom CA certificate.

https://en.wikipedia.org/wiki/Kazakhstan_man-in-the-middle_a...


No, that's not true. They used a different certificate to MITM connections.

There's no need to roll your own hardware integration for crypto tokens. Browsers have been able to do PKCS#11 client certificates from smart cards for a long time, in case WebAuthN / U2F are too modern for you.

Outside of DNS, the other reasons are about local networks, not 127.0.0.0/8. There's no good reason to permit port-scanning 127.0.0.0/8.

So remove "port scanning" since the only difference between port scanning and "legitimate" connection attempts is rate.

Should websockets be able to connect to the local machine?

In some cases that could be useful (even if it's something the kinds of people visiting this site might be uncomfortable with), e.g. a local daemon that listens for a WebSocket connection and a public web page that's used for managing it.

I don't think it was implemented this way, but think something like Battlefield 3's matchmaking: the user visits the online service and finds a match to join, and when they want to join it passes the relevant information to the game via websocket.

I imagine stuff like this has been implemented in at least a couple of cases that you'd break by just wholesale blocking connections to 127/8.


The browser can ask the user explicitly, like it asks for the microphone. It is not as common for a web page to need access to local network services as it is to record the microphone or request location data. So it's the browser's fault, IMO.

Still need to know where the top-level page is loaded from, though: you do want to allow requests from 127/8 to "localhost".

In Chromium it might not be blocked just because of an oversight (or because there was no consensus), see my other comment (and its parent): https://news.ycombinator.com/item?id=23253264

UWP apps block requests to localhost too, for the same reason.

I feel like uMatrix covers this as well. I've sometimes found myself redirected to some shady Chinese site and seen blocked attempts to localhost or 127.0.0.1 show up in the dashboard.

I always wondered what was going on when I saw localhost show up in the connections listing for a site in uMatrix, TIL.

> If you are not using Tor Browser and want to fix the security hole without disabling WebSocket completely, running the web browser in a separate network namespace is a workaround - you get a loopback interface which is independent from the main namespace, and you create a NAT interface within the network namespace to allow outgoing traffic. It's also a possibility for a website to probe other machines, such as the setting page on your router. For better protection, you should block all the local addresses defined by RFC1918 via netfilter/iptables as well.

As someone less tech savvy but still concerned, are there any guides available on how to do this?


You can look into tools like firejail to make this easier.

+1 for firejail [1]. There's a guide on how to do this for firefox [2] (see the network setup section), but this can be used with other applications as well.

[1] https://firejail.wordpress.com/

[2] https://firejail.wordpress.com/documentation-2/firefox-guide...


Note that the further I went down the sandboxing rabbit-hole, the more questions it raised about whether it's more or actually less secure. The main problem is that in order to work, these tools often use a setuid binary, which actually has more permissions than most users. So in theory if a sandboxed app finds an exploit in the sandboxing program (like firejail) that you're running inside, you could actually be worse off than it breaking out of whatever program you're sandboxing in the first place.

I think in this case though where you're more concerned about the very real problem of websites accessing localhost, it probably outweighs the maybe of a firejail exploit.


> these tools often use a setuid binary, which actually has more permissions than most users.

These tools often drop privileges as soon as the program is executed. In firejail, there's also an option to disable root entirely within a namespace.


Interesting point, I hadn't made it that far down the rabbit hole. I agree that it's not a complete solution, and possible risks of exploiting the sandbox itself should be taken into account on a case-by-case basis.

It's also possible to run a web browser in a docker container which can be interacted with on the host OS. This avoids the permissions issues with solutions like firejail:

https://blog.jessfraz.com/post/docker-containers-on-the-desk...


`docker` implies access to the Docker daemon, which is not an improvement over the setuid binaries anderspitman found distasteful.

https://docs.docker.com/engine/security/security/#docker-dae...


Genuine question, would LXD be any better? I'm not an expert in containerization but I find it really interesting.

There are some blogs that talk about how to do this: https://blog.simos.info/how-to-easily-run-graphics-accelerat...


If it runs in the same X session, no.

If your docker is in fact podman, then rootless might be attainable.

Please don't suggest using Docker to sandbox a GUI app.

That's not a good idea. The attack surface of docker is enormous compared to firejail.

The greater issue is that browsers are allowing code executing from the public Internet scope (scope meaning security domain) network access to the localhost scope or the Intranet scope (RFC1918 addresses.)

If anything, this should require very explicit permission granting from the user. I’d prefer it be something more like an undocumented toggle accessible solely to developer types.


> to the localhost scope or the Intranet scope

That's too little. All access from a different origin should be blocked by default, not only to local nets.


I stand corrected. I think yours is the correct approach.

How shall origin be defined? I can envision the likes of Microsoft which have many, many second-level domains making calls between them.

We can’t allow the site itself to grant access. How would this be managed, other than “please stop and think what a domain name is supposed to be before spraying your product across twelve of them?”


To be clear, evil.com can define sub.evil.com to resolve to 127.0.0.1. You basically can’t look at domains to mean anything much. You have to look at IP addresses.

(Which is in turn made harder by IPv6 public addressing, where you can’t just block the private IP range because you might not be behind a NAT in the first place, instead only behind a firewall. So your address A::B, can route to your Intranet peer A::C, which isn’t public-routable, but is a public address. But there’s nothing, other than the firewall, that says that that’s not a public address. It’s a hard problem!)
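A defender who does decide to look at IP addresses can at least flag the easy cases with Python's ipaddress module; as the comment notes, a globally-routable intranet host behind a firewall is indistinguishable from a public one by address alone. A sketch:

```python
import ipaddress

def looks_internal(addr: str) -> bool:
    """True for loopback, link-local, RFC 1918, and IPv6 unique-local addresses.

    Caveat from the discussion above: an intranet host with a public IPv6
    address (merely firewalled, not privately addressed) returns False here -
    that case cannot be detected from the address alone.
    """
    ip = ipaddress.ip_address(addr)
    return ip.is_loopback or ip.is_link_local or ip.is_private
```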


To add on to your point, even if you allow evil.com to only access evil.com and not any subdomains, your browser is still vulnerable because of short TTLs on DNS resolution.

evil.com can set a short DNS TTL, and after you access it, it can rebind its address to 127.0.0.1. Then subsequent requests to evil.com go to localhost (e.g. fetch("evil.com", ...) on evil.com will go to 127.0.0.1 if the DNS rebound successfully).

Caching a website's IP on first use doesn't help, either, because it breaks long sessions on websites that use DNS rebinding for legitimate purposes (load balancing, failover).

The only real way to fix this is for the local webserver to check the Host header on the HTTP request... or look at IP addresses. But building a global registry of IP addresses is hard, so we're stuck with trusting application developers (and malware writers) who run servers on localhost to use good security practices.
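The Host-header check suggested above is cheap: after a successful rebind, the browser still sends the attacker's hostname (e.g. evil.com) in the Host header, so a localhost-only service can simply refuse anything else. A minimal sketch, assuming the service is only ever meant to be reached as localhost:

```python
def host_is_local(host_header: str) -> bool:
    """Reject DNS-rebinding requests by checking the Host header value."""
    host = host_header.strip().lower()
    # Strip an optional :port suffix; a bracketed IPv6 literal keeps its brackets.
    if host.startswith("["):
        host = host.split("]")[0] + "]"
    else:
        host = host.split(":")[0]
    return host in {"localhost", "127.0.0.1", "[::1]"}
```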


This would prevent most users from visiting that site (since most of the time it will resolve to 127.0.0.1)

Evil.com could be set to resolve normally but redirect you to <plausible random word>.evil.com, which resolves normally once and then performs the attack, leaving evil.com able to keep serving new visitors.

We already have a notion of origin that is used for most of the browser security policies (exact match of domain, protocol, port). Websockets allow servers to enforce this policy by sending an Origin header, but unfortunately observing the error messages/timing still allows you to determine if the port is open at the transport layer even if you can’t establish a connection. Since websockets routinely need to connect to different origins (they can’t be routed exactly like normal requests, though many CDNs/reverse proxies can handle both), browsers would need to remove the information leak themselves by normalizing error messages and timing across failures.

Granting access down seems ok to me. To make it really generic, you would need a way to ask the upper domain whether accessing a sibling is ok.

The process is different enough from cookies to warrant another large discussion about how to do it, with plenty of trial and errors. But the stakes are much lower, as in the worst case the user will get a dialog, instead of a site being broken.


That's what CORS is for, but it appears that there is no CORS for WebSockets.

CORS is not in the hands of the user. I don’t want a CORS policy authorizing access to my intranet or localhost.

CORS is set by the target, so localhost CORS policy is directly in the hands of the user. Intranet CORS policy is set by whoever operates that intranet service.

You’re right; a bit of a brain fail there for me.

Still, doesn’t mitigate attacks against non-HTTP speakers.


It does in the sense that you won’t be able to control the websockets payload, so the target server generally won’t respond. The problem here is that the information leak is happening prior to any data being sent over TCP, so the fact that the server will drop the connection as invalid doesn’t help.

If the user decides to run a service on their intranet or localhost with a wide open CORS policy, isn't that their choice?

Forgoing CORS and making all inter domain requests user opt-in would make the web experience a lot worse, IMO. Making all intranet or localhost requests user opt-in seems less disruptive.


However, TCP sockets can't publish CORS policies.

In the case of scanning, a CORS denial can still reveal information about the user's internal network, as a CORS denial is a different result than a network timeout or a TCP RST.
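The information leak described above exists below HTTP entirely: a plain TCP connect already distinguishes the three outcomes (accept, RST, silence) before any CORS machinery can run. A sketch in Python of the same classification a page can infer from WebSocket error behaviour and timing:

```python
import socket

def probe(host: str, port: int, timeout: float = 0.5) -> str:
    """Classify a port by connect() behaviour alone - no data is ever sent."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        return "open"        # handshake completed: something is listening
    except ConnectionRefusedError:
        return "closed"      # RST came back: host is up, port not listening
    except (socket.timeout, OSError):
        return "filtered"    # silence: firewalled, or no such host
    finally:
        s.close()
```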


WS does have its own way of cross-domain opt-in. I think it uses slightly different headers than CORS for historical reasons but effectively does the same.

That a script is able to gather information about an origin that did not opt it in seems like a serious bug to me.


This is because websockets were created after the same origin policy, so they have always sent the origin header that allows the server to filter connections. CORS was only needed because the browsers were adding the same origin policy to HTTP requests that had historically never had it, so they needed some set of rules (and overrides for them) that didn’t break existing websites.

CORS has nothing to help with here. The site doing the scanning is not able to make connections; rather, it's only able to tell if the port is listening or not. With CORS, the port would still be listening, so they'd get the same info they're getting now.

Yeah, that's the best solution. It should be like microphone or camera access. It should say "this web site is attempting to access a resource on your local system / network."

I don't think you need to overdo it in terms of making the warning red, etc. Just a popup will really discourage people from trying to use this for fingerprinting.

BTW the site says:

"Port scanning is malicious." I don't agree. There are many many things that can look like a port scan but are not malicious, most notably NAT traversal attempts by WebRTC, games, chat apps, and so on.


Right. "it is clearly malicious behavior and may fall on the wrong side of the law."

It's not against the law. It might be _shady_ but it's not illegal. When I'm teaching intro cyber classes, I let folks know port scanning is NOT illegal, just shady. It's like going to a business after hours: it's not illegal to rattle the doors and windows to see if they're locked. The police might have a different take on it, but it's not illegal.


Yeah - if anything this post is evidence of a use case that isn't malicious.

While it can be used to get information to bad things, it itself can be used for good things too.


Exactly, port scans on my public IP address are not an attack, but crossing the boundary to my localhost and private networks is malicious behavior.

Serves you right for browsing the web, you dumb dummy! /s

There are legitimate reasons for port scanning, but I'm not sure most websites out there are using it for noble purposes. I guess browsers could allow it based on explicit permission from the user, just like it's already done for microphone and camera.

Port scanning from a user’s browser is effectively sneaking behind a user’s firewall. The only legitimate reasons I can envision are security research, and this, to me, is such a small edge case that I’m not sure such access is ever warranted.

I’d be all for a user notification that says “fnord.com wants to access 192.168.0.10 on tcp/443, which seems to be a web server on your home/work network. Are you sure you want to allow this?” I’d want to see this for each new access request, such that port scanning would not be a use case that was supported.

Sure, have an about:config toggle to shut this off, with appropriate warnings.


This. It could even have a "remember my choice for this domain/subdomain".

I wonder if there is a browser add-on for that...?

Yea this is very surprising. I run a file server in my local network. There is no access control on it because it’s behind my router’s firewall, but everyone in the LAN can access it.

I find it very surprising that now any random website can access it with no oversight. Why worry about spectre and meltdown when such blatant backdoors are implemented in browsers?!


Does your file server speak HTTP? If so you might have a problem. If not, it sounds like it's inaccessible to this attack, except to discover that it exists.

I'm curious, what would be a good reason to do this? I'm not creative enough to think of anything this enables a site to do that isn't malicious. If I'm running a service on localhost, and that service needs to communicate with the site I'm browsing, surely I could just direct that service to communicate with the site itself.

For instance, if I'm running a local chat application and need it to communicate with the web version, why does the website need to be able to port scan to accomplish this? I can think of other ways to accomplish this that are a lot more secure.


Ubiquiti routers have a fairly magical browser SPA that can run on their domain and talk to local routers. It involves webrtc connections to local addresses.

But I think if same-origin were enforced more strictly, they could have found another way.


Huh, I never looked but always assumed this was proxying through the controller.

It does this most of the time, either through the cloud or direct to the controller. But during setup of the first device on a network it does something direct from the browser to get it connected to the cloud.

How is this different than the admin page for any other router brand? (SPA does not seem relevant to this discussion)

I'm not defending this use case, but one example I can think of is that Spotify runs a local server so that websites you access can control it, e.g. if you are on Billboard looking at top music charts, clicking on a song could start the song in Spotify, and even embed a player in the web browser, without you needing to be signed into Spotify in your browser.

Here's an interesting tangential article about how they get around obstacles with SSL certs for localhost: https://letsencrypt.org/docs/certificates-for-localhost/


This might be banks trying to detect compromised users. Many "tech support" scams aim to get remote desktop access to users' PCs and then have them log in to their bank while the scammers are connected. I could see how looking for remote access software could be a useful heuristic for banks fighting this problem.

Synology uses it to find your unconfigured device on the network for first time setup.

Here's my hypothesis: it's to detect bots.

Your bot is running a redis server locally, it allows local connections, because it's just a bot, boom.

Taking it a bit further, if we have really smart people involved: the timing of the attempted connections/rejections tell you something about the system that you can use to detect bots/scrapers.

Another example of this being used in the past is scanning for Chrome extensions that scrape site content. I believe LinkedIn got hit hard for trying something similar, but they were using extension URLs, not localhost. Some extensions do spin up localhost services, though.


>> There are legitimate reasons for port scanning

Such as?


Without context people might get the wrong end of the stick. There are legitimate reasons to use nmap on your own equipment, sure.

A port scanner running on a webpage without the user's knowledge is never legitimate.

So the question is: what legitimate reason is there for a port scanner running in a web browser with the user's knowledge?


IRC servers detect open proxies that way.

IRC servers don't run in a browser. Instead they scan ports from the outside, which is not a problem. Anything they find is open to the entire internet anyway.

You run a network, and want to run a security audit. You need to know what devices are operating on it, and what services they are offering.

I don't get upset if someone opens and closes a socket to my VPS to see if something's there. My VPS is exposed to the internet. If a socket opens, it should be secure anyways. There's the chance nginx has an unknown zero day, but if I wanted to avoid that, I'd firewall it.

Things are a little less nice if you open a socket and start sending data to see what's there, assuming the server doesn't respond with a banner.


> You run a network

That might be a reason for you to port scan your network.

It is not a reason for your website to port scan my network. Especially since your website running inside my browser is inside my firewall.

> and want to run a security audit

Then you use tools designed to run security audits. You don't open a huge security hole in everybody's browser just so you can use a browser to run a security audit.


But that's a very different use case than having a website you visit port-scan your computer (which I believe is what the user above you is referring to).

There's really no legitimate reason for eBay, or any other website, to portscan your computer. There's nothing there needed for browsing their website.


> I don't get upset if someone opens and closes a socket to my VPS to see if something's there. My VPS is exposed to the internet.

That's not what's happening here.

My laptop is not exposed to the public internet because it's behind a firewall / NAT. This is like going to my house, plugging a device into an Ethernet port on my router, and scanning my internal network from inside my network.

Except instead of them planting a device, all they have to do is get you to navigate to their webpage. They're getting your laptop to do the port scanning for them, and in so doing, they get access to your internal network. The problem isn't port scanning, the problem is NAT busting.
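To make the comment concrete: what the page's JavaScript does with WebSocket connection attempts is the in-browser analogue of this native sketch - loop over ports associated with remote-access tools and record which ones accept a connection. The port-to-name mapping here is an illustrative guess, not any site's actual list:

```python
import socket

# Ports commonly associated with remote-desktop tools (illustrative subset).
SUSPECT_PORTS = {5900: "VNC", 3389: "RDP", 5938: "TeamViewer"}

def scan_localhost(ports: dict = SUSPECT_PORTS, timeout: float = 0.25) -> list:
    """Return the names of suspect services accepting connections on 127.0.0.1."""
    found = []
    for port, name in ports.items():
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(timeout)
        try:
            s.connect(("127.0.0.1", port))
            found.append(name)  # connection accepted: something is listening
        except OSError:
            pass                # refused or timed out: nothing reachable there
        finally:
            s.close()
    return found
```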


What about port scanning your service before you've secured it during development? At some point we have to be able to trust the network we're on. It's ludicrous to expect everything to be configured correctly and securely right from the start especially if you're developing the thing being scanned while it's being scanned. I'd much rather websites not be able to scan my home or office network than have to treat that network like I treat the Internet.

Yes, it's very similar to CORS. They just need to block all localhost requests from non-localhost pages. Maybe carve out an exception for when the dev tools are open.

Agreed, or at least disabling localhost and 192.168.0.0/16 (and whatever the equivalent is in IPv6).

You will usually have a public address with IPv6

The company may be interested in whether they want to grant access to the user to access to their systems. Does the user shoulder any responsibility?

No, absolutely not.

It shouldn't matter what malware is on a client device as long as the client has authenticated; the server/company/ebay should be protecting their API from abuse at the API layer, not the client layer.


I think what you’re saying is the user might be an employee on some internal trusted company network. The employer should have control of that browser (and entire endpoint), otherwise the network should likely not be considered trusted. So, in this case, no, the user shouldn’t have the ability to authorize this; the administrator of that browser should.

Know your network.


Consent.

> Port Scanning is Malicious

Though port scanning can be (and maybe even frequently is) done with malicious intent by looking for misconfigured/bugged servers, I disagree that it's inherently malicious. Port scanning is just about checking to see what services a host is offering you. It's like going to a random shop at a mall and asking what services they provide. Would asking about their services be malicious?

It feels like the reason asking about services is considered malicious is because shops frequently give out info to the public that they shouldn't have. It's like:

client: What services do you provide?

shop owner: Well, I can provide you with a list of all my clients along with their personal information they entrusted to me.

So, is the client being malicious for asking or is the shop owner the one that was in the wrong for mistakenly providing that info to the public?

I feel the only reason we don't blame the shop owner is that even though he's the one who mistakenly discloses private info, sometimes he's just following a script written by a random programmer unassociated with him. Maybe the response was a mistake on the programmer's part, maybe it was a mistake in how the shop owner used the script (a configuration error). In the end, it's simpler to blame the client for asking out-of-the-box questions (after all, most clients just come in to ask if you're giving out flyers/pamphlets, because that's what everybody does), and so they don't feel responsible for the response that results.

I can run a shop that also offers things other than HTTP(S) with open access to the public. It shouldn't be a crime/violation to ask me if I offer them.


I think the dynamics of the Internet have shifted from the early days. Basically, HTTPS on port 443 is pretty much the only service that anyone intends to make publicly available. This is different from 30 years ago, when those same sites had HTTP, FTP, Gopher, a public Telnet server, a public NTP server, etc. and they wanted you to use them. It was very reasonable to look around back then, but nowadays anything that is available publicly is probably an accident.

Exactly! And do we want to continue on that trend? Personally, I don't.

I dislike the growing idea that HTTP is a core part of the internet, rather than just its most popular part. The difference lies in whether we're going to see legislation that dictates proper use of the lower networking layers like TCP/IP by stuff in the upper layers like HTTP. I'd really hate to see something along the lines of "it's illegal to use a TCP port unless it was specified as available to the public in some (possibly JS-rendered) part of an HTTP response."


I don't think it's worth getting caught up on which data framing protocol everyone is using. Everything that Gopher, IRC, FTP, etc. did are perfectly expressible as any other RPC protocol; these things were just RPCs before we invented the term RPC. Now we have protocols that can generically transport any RPC, and so we don't need to think about these things in terms of port numbers or running services.

True, as if the browser is the only tool to access the Internet. Today with the much bigger security awareness it would be thinkable to allow file sharing over Internet or to fail-over to the neighbour's Internet uplink when the own DSL provider has a problem. All these things become increasingly difficult. (Actually Bruce Schneier was once writing on his blog that he has an open Wifi at home)

I don't think port scanning and computer intrusions are comparable. As always, I believe, in both state (like CA 502) and federal law (like CFAA), state of mind is what matters. You have to intend to gain unauthorized access (or, in California, the resources of that computer). A port scan by itself can't do that; on the flip side, randomly accessing URLs can do that, so even though you don't need special "malicious" tooling to hit a URL, you can charged with a felony for (say) dumping lots of private information from a URL you simply type into your browser bar.

Even in California, the resources that you can access and consume from a port scan of a browser visiting your site are essentially the same as you'd get from running Javascript on your page. A legal claim based on those scans seems very far-fetched.

Message board nerds seem totally convinced of the idea that computer crime law tracks the state of the art in offensive computer security, but the two concepts aren't directly connected at all.

I speak both for myself and, I think, for a lot of security researchers, both academic and professional, when I say that I am very, very nervous poking at a website that hasn't given me permission to, say, check whether an input that generated a crazy error is letting me inject SQL, while at the same time I am never scared about port scanning things. There are companies, well-respected companies, that do nothing but port scan everything on the whole Internet.


I remember when finger (and even rsh!) were common.

I think it's a bit more like going into a shop and trying to open all the doors, cupboards and drawers to see which ones are locked ;)

Isn't that the wrong analogy?

In this case, eBay is the shop, and I'm the customer. It's like walking into eBay and when I walk in I have to empty out all of my pockets and open my phone screen to show them that no one is telling me what to shop for (VNC).


No, because of the existence of client-side scripting with javascript, it's actually eBay that's running on your computer acting as the customer toward the shop that's your computer. You're right that the end effect is similar to having to empty out your pockets, but the underlying issue of why they're able to do that is a whole 'nother can of worms.

That's a bad analogy. It's wrong because in a shop you can see which doors, cupboards and drawers are available to the public. Doors that are in reach but that shouldn't be used by the public have signs like "restricted access" or "employees only". You can't do that with the internet. You can't tell whether a port is available to you until you try it.

If you want to continue using that analogy, then you have to consider that everybody is blind and deaf, and checking to see what's locked is the only way to know if something is available.


> That's a bad analogy. It wrong because you can see what doors, cupboards and drawers are available for the public. Doors that are in-reach but that shouldn't be used by the public have signs like "restricted access" or "employees only". You can't do that with the internet. You can't see that a port is not available to you until you try it.

But you can see what ports/doors are available. TCP doors are defined in the RFC and they are numbered 0-65535. Those are the ones available.

Port scanning still is analogous to trying all these doors and seeing which ones are open.

Just because it is a lot of doors to choose from doesn't make it very different. That's why guests ask a host where the bathroom is.

When you visit a website, it's not very cool for that site to check which of all your TCP ports are open. It's none of their business.


Hmm, then how about going to the changing room area and trying every door instead of waiting for the guy to tell you which one to go to?

I made this edit to the post you replied to. You probably missed it:

> If you want to continue using that analogy, then you have to consider that everybody is blind and deaf, and checking to see what's locked is the only way to know if something is available.

About this:

> instead of waiting for the guy to tell you which one to go to?

How does that translate to TCP/IP? What is "the guy" representing? The way I see it, there is no guy.


The guy is you installing Steam to run on port 27036.

Port scanning is a brute force, over-reaching probing technique. A better analogy would be like visiting a shopping mall and trying to open every closed door you see, including the ones that say "authorized personnel only", "private", "do not enter" with an excuse like "I was trying to find out which shop was open".

> including the ones that say "authorized personnel only", "private", "do not enter" with an excuse like "I was trying to find out which shop was open".

I don't think this part of the analogy is accurate. There are no "authorized personnel only" ports

The first half of the analogy is good though.


Any port that isn’t advertised to you explicitly is an overreach. You don’t run through hosts and ports to “find out services to use”. What’s a legitimate use case for that other than peeping?

I used to think the exact same thing about wardriving.

Nobody should catalog wifi access points and their location!

Of course, now this ethical lapse is a business model and apple, google and everyone else does it. literally anyone with a smartphone is doing this to your wifi access point. And they will do the reverse to find out precise location.


Exactly. There are assumptions in every threat model, and violating them isn't a legitimate use case because someone "forgot to protect" their private resource. "Door wasn't locked" isn't an excuse.

Or in the case of being a Comcast customer, you get no say in the matter as they force an open guest network if you use their equipment.

I answered an almost identical reply, here[1].

> including the ones that say "authorized personnel only", "private", "do not enter"

Basically, to answer you separately, an analogy like that doesn't represent TCP accurately. In your analogy, you can

1) see from afar for visual cues indicating whether access is being given to you, and

2) try opening it.

Your argument is that doing 2 is invasive, because they can do 1.

However, in TCP, you can only try to make the connection. There is no see from afar. If I give you an IP address, there's no standard way for you to tell me whether FTP is available, without trying to connect to the port! That's your only choice!

So, yes, "I was trying to find out which [service is available]" is a very valid reason.

> Port scanning is a brute force, over-reaching probing technique.

It certainly is brute-force, and that sucks. I think there's a network service / protocol called Portmapper/rpcbind[2] that lists the services available and port numbers they're on. I only know that NFS uses it, but nothing else. If that were standard, then I'd agree port scanning is over-reaching. As it stands, though, I don't consider it over-reaching when it's the only TCP mechanism to see what's available online.

[1] https://news.ycombinator.com/item?id=23249246

[2] https://en.wikipedia.org/wiki/Portmap


Nope. Ports that you can use are already advertised to you: web links, MX records, registrar records, WSDLs etc. Enumerating all ports that aren’t advertised to you is an overreach.

In a store you can read the sign. Where is the sign saying that the port is not for you?

But when I'm visiting a website, I am not a "store". These ports are on localhost, that's more like asking where is the sign saying that my bedroom door is not for you to try and open. However, I will tell you behind which door the bathroom is.

Any port that is not advertised to you isn’t okay to probe.

A server most definitely should not be looking at what random services a client has available.

Sure, like I said:

> Though port scanning can be (and maybe even frequently is) done with malicious intent

I agree that it's wrong for eBay to be doing this. What I disagree with is specifically the statement "Port Scanning is Malicious".


Surely in the context of a website performing a port scan on a client it is always malicious?

Unless it asks for explicit consent for a security audit or something.


FWIW port scanning is illegal in some countries, such as France.

This raises the question: Is port scanning without consent a violation of the CFAA? Either it is legal, and researchers should face no repercussions for doing so, or it isn't and eBay is non-compliant with CFAA. I recall hearing about someone either being arrested or convicted due to port scanning a courthouse, but it was many years ago and I can't find the case with a cursory Google search.

I have to wonder what value eBay would get from port scanning its customers. Is it part of an attempt to detect bots/attackers? Is malware running on their server trying to determine if the client is likely vulnerable to some propagation method?


When I did networking support, we often had old code + lower-quality equipment that could/would crash if you used off-the-shelf security software that would go out, scan, try all sorts of things, and then generate a report.

I'd say 90% of the time the powers that be at the company had no idea someone was running that software, or that it was still running at their company, and then someone moved a firewall and the system was exposed to more than intended. Then they'd turn it off ... and find another similar tool running somewhere else.

It could be as simple as a test or security system run amok.


> I'd say 90% of the time the powers that be at the company had no idea someone was running that software, or that it was still running at their company, and then someone moved a firewall and the system was exposed to more than intended. Then they'd turn it off ... and find another similar tool running somewhere else.

This demonstrates the absurdity of the CFAA more than anything else. Sorry for sounding like a broken record but the CFAA is not salvageable and MUST be repealed.


This guy got arrested at least:

https://www.securityfocus.com/news/126


Article also states civil claims were dismissed - and criminal charges are unlikely to hold.

I hope they weren't debilitated by the legal fees incurred.

nmap's "Legal Issues" section states that the guy went on to start a successful digital forensics company, after spending years crushed under 6-figure legal fees.

https://nmap.org/book/legal-issues.html

Edit: Link to the company http://www.forensicstrategy.com/ He also has a data recovery company now http://www.myharddrivedied.com/


It's probably part of their fraud detection and mitigation strategy. Combined with other info about your transactions it could help raise a flag about changes.

As for the CFAA, that's seemingly down to how aggressive the prosecutor is feeling about your case. I don't think it should be; there's no real access happening, and unless it's extremely aggressive and degrades network connectivity, it's hard to argue there's any real damage done.


They're almost certainly doing it as part of a heuristic to detect bots. Hence the VNC / RDP ports. I would assume it's quite common for bots to have those ports open so they can be monitored

> Either it is legal, and researchers should face no repercussions for doing so, or it isn't and eBay is non-compliant with CFAA.

Criminal law is usually not this simple, as most criminal laws will take into account the mental state of the person performing the action.


Almost certainly not. Commercial unauthorized port scans are utterly routine. There are well-known companies premised on it.

You can get to the same answer axiomatically from the text and case history of CFAA (a port scan literally can't grant you the access a CFAA claim needs to prove you intended), but that's obviously treacherous for non-experts to do; instead, the empirical demonstration should be conclusive here.

I don't know why this scan is occurring, but fingerprinting is the most obvious guess, and intrusive fingerprinting performed by real companies is usually about ATO prevention, which means they're not going to tell you any more about it (ATO defense is an arms race).


> I have to wonder what value eBay would get from port scanning its customers.

From the article:

> Looking at the list of ports they are scanning, they are looking for VNC services being run on the host, which is the same thing that was reported for bank sites.

> VNC is sometimes run as part of bot nets or viruses as a way to remotely log into a users computer. There are several malware services that leverage VNC for these purposes.


Bypassing a firewall to run a port scan is almost certainly illegal.

That’s what these sites are doing.


I agree with the sentiment, but by visiting the site and running the code, I believe you bypassed the firewall on your own.

Why don't you run this executable? No, this ransomware message has nothing to do with me, it's all you. Why do you do this to yourself?


No, I think it was probably close to a decade ago, but I likely am misremembering some of the details. Could've been a police department, but I'm not sure.

That one you linked is a messed up case. There is a phenomenal podcast that interviews those guys and walks through their engagement. https://darknetdiaries.com/episode/59/


That wasn't port scanning. They actually physically went inside the building.

My guess is bot detection + user fingerprinting.

[flagged]


"Innocent until proven guilty" suggests that everything is legal unless there is a law against it.

Not quite the same thing. A legal system could use presumed guilt (defaulting to assuming an accusation is true) while still having a 'blacklist' approach to which actions are punishable.

https://en.wikipedia.org/wiki/Everything_which_is_not_forbid...


Do you know how many laws there are? Not to mention common law, which is law established by previous court decisions on matters that have never been covered by any statute?

It doesn't make any sense in trying innocent people!

(No, seriously, I know people who believe that.)


Over the years I've seen "hacker" news become more of an echo chamber that instantly downvotes anything against doctrine...

I'll be downvoted for saying this.


> I'll be downvoted for saying this.

Because it's against the site guidelines.

> Please don't comment about the voting on comments. It never does any good, and it makes boring reading.[1]

[1] https://news.ycombinator.com/newsguidelines.html


Yep

Not illegal. Sites like shodan.io would have an issue if it was.

IANAL but this type of websocket port scan seems inherently different from what Shodan does.

Shodan is outside your network's firewall, therefore only able to access services you've exposed to the wider web.

If I understand the article, the websocket scan eBay is doing is trying to connect to local listeners on your laptop, behind your network's firewall and possibly even behind your laptop's firewall.


This is such an obvious consequence of web sockets that I wonder how anyone could have entertained the idea long enough to sober up and write the code. This is worse than letting a web page script have access to the clipboard, record mouse movements, and similar information leaks, because instead of just stealing information, now a web page can actively compromise any host on your network.

I agree this is quite disturbing.

It does not, however, sound like an attacker can establish arbitrary TCP connections (at least using the technique from the article). Instead, the attacker can determine if something is listening on a port because it will take a different amount of time to negotiate/drop a connection to a port when there is a listener than when there is not a listener.

In other words, this sounds like a variant of a timing attack. As such, presumably, this particular avenue of attack can be mitigated by the browser vendor inserting a delay s.t. no information can be gleaned from how long it takes to negotiate/drop a websocket connection.

EDIT: I also wonder if it would be possible to do a similar port scan using the timing of XHR requests to localhost (e.g. http://localhost:[port]).


> It does not, however, sound like an attacker can establish arbitrary TCP connections

Maybe not, but what if the ports you have open actually are HTTP servers for development purposes? In that case wouldn't a website be able to crawl your unreleased work, and/or mess with what you're doing, with requests seemingly "out of nowhere"?


Yep. Just waiting for this "feature" to be added to metasploit.

That's a fallacious argument. The fact that someone is doing something doesn't mean it's automatically legal.

IANAL, but more likely it depends on intent and context. So shodan.io is okay because it’s not explicitly malicious, and they have clear paths to contact them if you suspect abuse. Whereas, if you’re suspected of hacking a website, the fact that you port scanned it a week prior to password spraying it might serve as evidence against you. That is, it seems unlikely anyone would be prosecuted for port scanning alone, but it could be an act that demonstrates intent of a later action.

One time, I port scanned my public IP (of my ISP) from an EC2 box, and I got an email from EC2 saying they received an abuse complaint from the ISP for port scanning activity.


What's Shodan.io's legitimate use? Sounds like the "torrents can be used for legitimate content" type argument, where in reality all but a rounding error of the use is not lawful??

There are plenty of legitimate uses of port scanning, and specifically, a port scanning database like Shodan. For example:

- Monitoring your own network or that of your clients for exposed ports

- Researching Internet topology, or performing aggregate queries like “how many nginx servers are connected to the Internet”

Can you use it maliciously? Yes. But, most of the time, if you have a target it would make more sense to do the port scan yourself. And if you’re just dragnet searching for vulnerabilities, most you find will probably already have been exploited. Sites like shodan are good for the overall health of the web because they force website owners to maintain security posture. If you know that foregoing a wordpress upgrade means you’re one script kiddy with a shodan account away from getting hacked, you’re going to keep your site up to date. This saves you from script kiddies, but also from the more sophisticated hackers who would run a port scan themselves anyway.


>There are plenty of legitimate uses of port scanning, and specifically, a port scanning database like Shodan. //

Any legitimate security service is going to be doing their own scans, surely.

Statistics, yes, but I can't see those stats being especially good. You could probably get equally good nginx data from Netcraft, who IIUC get the data from HTTP response banners on :80 and :443.

I'm not sure I buy the "security posture" line; isn't it circular? Tools to help crack your site are good because they mean you have to have counter-measures to combat tools for cracking your site?

The only legitimate use of port scanning for me has been testing access to my own/clients' computers, I feel. That's not to say I've not used it for illegitimate things ...


If I were a serious baddie, I'd be afraid of using Shodan. Who knows who has what logging on that, and what honeypots may have been seeded into it for just such an occasion? It's not that hard to get that information yourself, from sources you control yourself.

Legitimate usage from researchers and people reading about infrastructure they have the right to do security testing on may be a larger percentage than you think.


Shodan is used by most of the Fortune 100 companies for a variety of use cases. Here are the most common ones:

1. External network monitoring: know what you have connected to the Internet and get notified if anything changes unexpectedly. This has actually gotten significantly more challenging with services deployed to the cloud where your IT department might not even know which IPs to keep track of.

2. 3rd-party risk assessment: understand the security exposure of your partners, vendors, supply chain or other 3rd-parties. For example, let's say you're an insurance company that wants to provide cyber insurance. Shodan data can help you understand what sort of risk you'd be taking on. The data has also been used in M&A as part of due diligence to get a metric on the security of the IT department of the company they're thinking of acquiring.

3. Market intelligence: basically Netcraft on steroids. Shodan doesn't just have web information but also information for many other protocols. This information is used by hedge funds and vendors to understand which products are purchased and deployed to the Internet. The data is skewed due to the nature of public IPs but there are still things you can do.

4. Policy impact: get a measure for how policies at the country-level are impacting Internet connectivity. For example, the OECD used Shodan to get a measure of Internet-connectivity per capita.

5. Fraud detection: is your customer trying to make a purchase from a machine that's been compromised? Or running a VPN/proxy? Shodan is used in transactional fraud detection to flag suspicious payments.


I used to use torrents a lot and always for legitimate data transfers.

Yes, I've used it to download Linux distros, but the point still stands.

Not really, it totally contradicts the made-up point.

Can you explain?

What percentage of torrent traffic do you suppose - or better have stats for - is not copyright infringing? I'd think it's about 0%.

Would certainly be interested if you can prove that's wrong.


The more pointed argument would be that there is no federal law prohibiting port scans.

Doesn't the Curl/For-loop Abuse Act (CFAA) cover it?

> Furthermore, when I installed and ran a VNC server, I didn't detect any difference in site behavior - so why is it looking for it?

Not an eBay employee, but used to work in fraud detection. Two very obvious related guesses from my experience:

1. Fingerprinting a user to help identify account takeover (ATO). Open port signatures are probably a pretty good signal for that kind of thing (and it doesn't seem to be measured in https://panopticlick.eff.org/).

> However it is also a valid tool used by administrators for remote access to machines, or by some end user support software, so the presence of VNC is a poor indicator of malware.

2. In a Bayesian sense, this probably isn't right. I don't know what eBay's traffic looks like but I'm willing to bet that all other things being equal, traffic coming from a machine with an open VNC port is riskier. Fraud detection is a game of probabilities, so the existence of a valid user showing a particular characteristic doesn't mean that the characteristic isn't useful in a fraud model. The example I always give is that when I was doing this (quite some time ago), we could have had a 99% accuracy rate for a simple rule banning IPs from Turkey, Ghana, Nigeria, and Vietnam. It's not because there weren't any valid users from those countries, it's just that the fraudsters where overwhelmingly likely to be using IPs from those countries.
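The Bayesian point can be made concrete with a tiny sketch. All numbers here are invented for illustration: even if most machines with an open VNC port belong to valid users, the signal can still multiply the odds of fraud.

```javascript
// Bayes' rule: P(fraud | signal) = P(signal | fraud) P(fraud) / P(signal).
// All probabilities below are hypothetical inputs, not real eBay figures.
function posteriorFraud(priorFraud, pSignalGivenFraud, pSignalGivenLegit) {
  const pSignal =
    pSignalGivenFraud * priorFraud + pSignalGivenLegit * (1 - priorFraud);
  return (pSignalGivenFraud * priorFraud) / pSignal;
}

// Suppose 1% of sessions are fraudulent, 30% of fraudsters have VNC open,
// but only 2% of legitimate users do:
// posteriorFraud(0.01, 0.30, 0.02) ≈ 0.13 — a 13x lift over the prior,
// even though ~87% of flagged sessions are still legitimate.
```

This is exactly why "some valid users run VNC" doesn't invalidate the signal: the model trades in lift, not certainty.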


Those four are still considered untrustworthy, and I've had to add India, Ukraine, and Brazil to the list of nations I filter entirely.

panopticlick is specifically about browser fingerprints. It doesn't include your IP address for example.

Can you say what were the final false positive rates? Was this part of your research?

Port scanning from a web page, combined with DNS rebinding, can present a really nasty attack, and can affect an entire private network, not just localhost.

Some more info here: https://medium.com/@brannondorsey/attacking-private-networks...

Example code: https://github.com/brannondorsey/dns-rebind-toolkit

A malicious DNS rebind server: https://github.com/brannondorsey/whonow

Disclaimer: I performed some of this research a few years ago. So those resource suggestions are my own, but they feel very relevant here.


First of all, fraud detection seems like a legitimate use case here. And WebSockets has many valid uses.

HOWEVER -- how the hell is localhost port scanning allowed to happen without my permission?!

This feels no different from a website trying to check the existence of named directories on my file system or something.

Does WebSockets not require permission to function at all, or shouldn't it be limited to some kind of CORS-type policy or something to connect without a permissions dialog? Or even if it's allowed to port scan the entire public internet, at least block your local machine and network without explicit permission?


If you find a way to prevent this in Chrome/Edge please let me know.

Edit: https://defuse.ca/in-browser-port-scanning.htm

There doesn't seem to be a way to access anything locally, just test for open ports. I use SSH tunneling a lot and was having a minor freak out.


This use doesn't seem to be covered by eBay's privacy policy https://www.ebay.com/help/policies/member-behaviour-policies...

Lots of chat in the comments about how this is all websockets' fault, but don't forget you can portscan localhost with pure JS as well.

https://portswigger.net/research/exposing-intranets-with-rel...


Ach! That's diabolical.

Timing attacks make it very hard to prevent port/host probing generally, sadly, with the sheer number of things that are observably loaded cross-origin (iframes in that example, but also images, scripts, stylesheets…).

(In the private/loopback IP ranges we should really just make those requests always fail, but I addressed that in another comment as to why that's not trivial.)


Private and loopback space should really be outside the sandbox, or at least in a permission. I'm happy with mycorp.net accessing 10/8 space, but not ebay.

Every time I hear about some shiny new feature being added to a browser, I think...

1) Will I ever actually use this

2) How is this gonna screw me over

WebSockets, WebBluetooth, WebAssembly, Web-You-Can-Access-my-Accelerometer-and-Battery, haven't ever wanted to use those. Ever. For anything. For any reason. (Edit 3: Oh yeah, I forgot! WebRTC!)

Edit: Fantastic. You can't disable it in Firefox. So what, does Firefox need a freaking iptables implementation now? [1]

1 - https://bugzilla.mozilla.org/show_bug.cgi?id=1091016

"The only theoretical reason for the WebSocket pref these days is the possibility to disable it easily in case there is a security issue found in the protocol itself or so."

The protocol itself is the security issue. ALL OF IT.

Edit 2: So I don't have the time to investigate every new fad when it comes out. I originally thought WebSockets were raw sockets, but they aren't. Firefox blocks access to port 22 -- I was hoping all privileged ports, but it seems just those. Opening a WebSocket to netcat dumps out a HTTP request, so it seems unlikely that you'd be able to talk with anything that doesn't talk HTTP and WebSockets. Firefox also seemingly blocks access to 192.168/16 and 10/8.

This makes me less angry. But what STILL make me angry is that I have to sit and research about some stupid thing that I don't want and can't turn off. Sooner or later, some web dev is gonna argue that all sites should be loaded over WebSockets because his bloated javascript stack performs marginally better, and then WebSockets won't be something I can turn off. Websites will just whitepage.

Edit 4: Done researching this now. I went to ebay on Firefox, and wasn't getting websocket scans. But I've got a stack of uBlock and NoScript... maybe that's interfering with it somehow? Opened up a stock config for google-chrome -- that's my browser for "some dumb new web tech that isn't working in Firefox" -- not seeing any scans when I open up inspector and click "WS".

Regardless, his point still stands. You can totally use WebSockets as a port scanner for localhost, assuming the Content Security Policy allows it. Now I gotta go update my nginx configs...
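On the nginx point at the end: the CSP directive that governs where scripts may open WebSocket (and fetch/XHR) connections is connect-src, so a hedged sketch of such a header might look like:

```nginx
# Sketch only: restrict WebSocket/fetch/XHR targets to the page's own
# origin, so third-party scripts served from your site can't open
# ws://127.0.0.1:* connections from visitors' browsers. Extend the
# source list with whatever endpoints the site legitimately needs.
add_header Content-Security-Policy "connect-src 'self'" always;
```

Note this only protects your visitors from scripts your site serves; it does nothing about a first-party script like eBay's, which runs under eBay's own policy.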


> WebSockets ... haven’t ever wanted to use those. Ever. For anything. For any reason.

You’ve never used a web-app chat client?

> WebBluetooth

APIs like these don’t exist for the sake of regular unprivileged web-apps. They exist for the sake of browser extensions (or browser “apps”, or apps within a browser-projector like Electron), specifically in order to be used to add driver-like or service-like capabilities to devices like Chromebooks where the browser is the OS.


> APIs like these don’t exist for the sake of regular unprivileged web-apps. They exist for the sake of browser extensions (or browser “apps”, or apps within a browser-projector like Electron), specifically in order to be used to add driver-like or service-like capabilities to devices like Chromebooks where the browser is the OS.

That's not really true, though: they're part of the Chrome team's belief in not limiting the capabilities of the web as a platform for app development (on the basis of "if you lack one feature the app needs, the entire app ends up native"). This is a large part of Project Fugu: https://developers.google.com/web/updates/capabilities


>You’ve never used a web-app chat client?

BOSH? Awkward, but it works without websockets.

* https://en.wikipedia.org/wiki/BOSH_(protocol)


BOSH is essentially long polling, which is pretty difficult to scale (the worst-case scenario can become 1 connection per message for a single client).

I'm pretty surprised, however, that a nearly 10-year-old standard is being considered as "superfluous" as newer technology like WebBluetooth and WebUSB. What we had before WebSockets wasn't really long polling, it was Flash.


Before websockets webapp chat features would use long polling. Not sure if all current ones require websockets or if they have a fallback mechanism.

Before websockets, webapp chat features—as implemented by your average web-backend programmer—couldn’t pass the C10K challenge.

Yes, fundamentally, on a technical level, there’s not much difference between holding open an HTTP connection in long-polling, vs. holding open a websocket connection.

But the abstractions presented by webserver gateway interfaces (e.g. CGI, prefork process-per-connection language-module embeds), languages/web frameworks (e.g. Ruby on Rails), and platforms (e.g. Heroku) back then, just didn’t support long polling efficiently.

HTTP backends, back then, were all built on an assumption of serialized queuing of HTTP requests—with each web server/web stack/worker thread serving requests one-at-a-time, getting each request out of the way as quickly as possible. There was no concept of IO asynchrony. Web servers were request loops, not event reactors. Libuv didn’t exist yet; nginx didn’t exist yet. The standard web server was Apache, and Apache couldn’t “yield” from a request that was idle.

And, as such, providers like Heroku would queue at the load-balancer, and only proxy a single open request to your web backend at a time, under the presumption that it wouldn’t be able to handle concurrent load. So you had to pay for 2x the CPUs (e.g. Heroku “dynos”) if you wanted to be able to hold 2x the connections open!

Entire third-party businesses (e.g. https://pusher.com/) were built around the hosting of custom web servers that were written in an event-driven architecture, and so were able to host these pools of long-polling connections. But they were freakin’ expensive, because even they didn’t scale very well.

Eventually, it was realized that the just-introduced Node.js could do IO asynchrony pretty well, and people started building explicit “websocket servers” in Node, culminating in the https://socket.io/ library. Back then, you couldn’t just put your regular HTTP load-balancer in front of your websocket backend, because your regular HTTP load-balancer almost certainly didn’t support held-open connections. You needed to host socket.io on a separate host/port, directly open to the internet. (This was one of the driving forces of Node’s adoption: as long as you were putting a Node app directly on the Internet, you may as well just put the rest of your HTTP app in there as well, and make the “websocket backend” into your whole backend.)

Sure, these days, every backend, load-balancer, and NAT middlebox can handle long-polling just fine. But we got there with a decade of struggle, and “legacy”-but-not-really code that used WebSockets because they guaranteed the semantics that long-polling couldn’t.

(I should mention, though, that WebSockets still have some advantages in the modern environment; namely, idle WebSockets are known to be idle at the runtime level, and so, unlike with a long-polling HTTP request, a mobile device can relax its wakeup-timer intervals when the only network connections it’s holding open are idle WebSockets.)


You’ve never used a web-app chat client?

Nope. Not once. And I've been using the web since Mosaic.

I see business web sites offering to chat with me all the time. I ignore them. If I want to chat, I'll let you know.

Apple's business-to-Messages thing works so well, I hope it puts the scammy webchat companies out of business.


I didn't mean the "chat with us now" engagement widgets; I meant, like, Google Hangouts, or Slack, or Twitch chat, or even a pre-Google-Docs Etherpad sidebar chat.

Though, honestly, I prefer the web-chat customer service for my bank/cellphone provider/etc. to calling them on the phone. I don't want to wait an hour on hold with my phone using up both battery and minutes; I want to just leave a window open on my computer and have it ding when they're ready.


> You’ve never used a web-app chat client?

A decade ago people used Comet for this, with PHP and Apache! Every single ongoing Comet connection occupied an Apache process, a significant resource hog, yet people still used it because they had no other choice. These days we have WebSockets, but I bet Comet could be implemented with minimal resource penalty now, thanks to the proliferation of async web server support in modern backend stacks.


You forgot WebUSB – I wish I was joking, but I'm not:

https://developer.mozilla.org/en-US/docs/Web/API/USB


I see WebUSB and I immediately think: "This is something that already exists in ChromeOS and Google wants to standardize it".

Chrome the browser is a stalking horse/test harness for ChromeOS.


I see irony in all this web functionality.

Back in the '90s, if you wanted an Ohm's law calculator, you had to go download a poorly written program from some random website. Network admins started locking down what you could download, run, and install due to security problems. Flash became a hit, and browsers started piling on features so you could run things dynamically without having to download anything.

Fast forward almost 30 years and the browser has become so full featured it is practically a weak OS sandbox that allows you to run just about anything. It was originally being extended to avoid that in the first place, and here we are almost back to square one.


The browser is basically the reinvention of the operating system. Its huge advantage is that it's built on the assumption that the user is trusted and the code isn't. In contrast, most operating systems are designed on the assumption that code is absolutely trusted, but the user isn't. That's why rights management in Windows is concerned with who's allowed to access which file, while rights management in Firefox is concerned with which website is allowed to access the webcam.

The big disadvantage of the browser is of course that there's huge competitive pressure, and most users prefer usability over security. Keeping things secure without asking the user about their intentions every step is a huge challenge (see also Windows UAC, which struggles with the same problem).


> Its huge advantage is that it's built on the assumption that the user is trusted and the code isn't. In contrast most operating systems are designed on the assumption that code is absolutely trusted, but the user isn't.

It's not necessarily an advantage, it's just a different threat model. An OS is protecting against an attacker already having access to the system (whether physically or over network.) The assumption is that the system is working properly and it's the operator that is malicious.

For the browser, the assumption is that the operator is working properly, but the systems they will be accessing are malicious.

The browser security measures are like a guard at the castle gate, allowing or preventing people from entering. The OS security is like locks on the doors inside the castle so that only people with the right keys can get into various protected rooms.

Both are necessary because they're preventing different things (access to the system vs. access once you're already inside the system.)


Yes, because for most users, choosing between security and functionality is not a choice they are prepared to reason about.

I'm no expert, but I am computer savvy. My experience is that more security ALWAYS limits functionality that I want. I get around it all with VMs and snapshots. I have totally wide-open VMs that have no host access and just reset to the snapshot when I shut them down. The irony is that it is not only harder to choose functionality today, when you want it, but also harder to choose security. The choices are made for you. It is infuriating.


Don't these features allow web apps to compete with native mobile apps? I really don't like the App Store or Play Store... so if that is one way to get away from them, that is a good thing.

...we are screwed.

There is media.peerconnection.enabled in about:config. When set to false, WebRTC doesn't work but I'm not sure if there isn't anything left active.

Also, uBlock has an option "Prevent WebRTC from leaking local IP addresses".

WebRTC should be disabled by default or firefox should ask explicitly like with webcam-access.

There have already been reports where sites use your browser as a peer in a P2P-network (without your consent). This can be really problematic depending on where you live.


Browsers no longer leak local IP addresses; there's a new feature that uses mDNS instead of local IPs.

> Prevent WebRTC from leaking IP addresses

A local IP?


FYI about these two websites that demonstrate the various data your browser shares:

https://browserleaks.com/

https://webkay.robinlinus.com/


Yes, oddly enough. It can be used by a website you visit to gain information about your local network which turns out to be incredibly effective for fingerprinting.

WebRTC can also leak your IP when you're hiding behind a VPN.

Any way to stop that?

The best solution is running the VPN client in your router. That way, your machine can't see its ISP-assigned public IP address.

If you're using a workspace VM, you can run the VPN client in a pfSense VM, which is functionally equivalent.

If you're not using a workspace VM, and can't run the VPN client in a router, you can run the VPN client in a pfSense VM. You bridge the WAN interface of the pfSense VM to the host LAN adapter. So then the host can't use it. And you configure the LAN interface of the pfSense VM as host-only. So now the host machine (your workspace) can reach the Internet only through the VPN client in the pfSense VM.

Or you can just make sure that WebRTC is disabled.


Websockets are nice for some things. I hack on Mastodon and it uses WSS for streams and they're very helpful.

But WebBluetooth, ASM, etc are all fairly insane. WebRTC feels like a massive security issue (I've seen a demo of someone using WebRTC to find computers on an internal network at a security conference years ago. Even if that hole is fixed, it's still a hacky solution to video streaming behind NAT).

I agree; most of this stuff needs to have ways to disable it, in the base configuration screen of the browser (not hidden somewhere in about:config).


Why is ASM insane? Are you talking about WASM? That’s got the same security model as JavaScript.

WASM is a great piece of tech, but I can't help thinking it will be abused a lot in the future. For example, right now we can use an ad blocker to block ads and analytics by blocking their JS from loading. Imagine when WASM gains mainstream popularity and ad companies begin to ship their ads and analytics products as libraries to be linked at compile time. How do we block something like that? Sure, the ad blocker can hide the relevant DOM contents, but the code still runs, doing whatever it wants in your browser.

These kind of complaints are based on a misunderstanding of how JS works or how the browser works.

You can do the same exact thing in JS right now. In fact, if anything, JavaScript makes this way easier than WASM. With JS, you can just use something like Rollup or Webpack to put your analytics code in the same bundle.


Yeah, but using Webpack is not how the majority of websites are deployed, so it's probably not worth the effort for ad companies to support it. They will consider this when Webpack/WASM becomes mainstream enough (approaching 50% of the web), which may or may not happen. It probably won't, but the thought always lingers in my mind.

WASM is way less popular than Webpack.

Almost every site with some kind of front-end framework like React will use a bundler of some kind.


It's easy to serve ad code first-party. I would guess that as a percentage of website usage, Webpack, or at least code that uses custom bundling, is over 50%. JS knows all.

Turing strikes again…

A stronger one, actually, since it can't access a bunch of stuff.

Which is where the trouble started.

Anyone who has enough technical knowledge to have a reason to turn these off should be just fine with about:config. We don't want a situation where normal users just randomly go into the config turning things off because they think they are doing something good. That's how we get another group of low-tech "turn off Windows Update" types who just harm themselves as a result of their incompetence.

I think people turned those updates off for good reason... but it wasn’t the users that were incompetent.

Anyone who has enough technical knowledge to have a reason to turn these on should be fine with about:config. The default settings should be secure.

> Even if that hole is fixed ...

Last I checked (> 1 year ago) it was WONTFIX because of some very idiotic (IMO) reasoning. I keep it permanently disabled and have never missed it (media.peerconnection.enabled in Firefox btw).


No doubt like canvas, font, audio, etc. fingerprinting, it will eventually be required to use Google apps like Maps and Messages for Web (to cite two recent discoveries).

"If you're not paying, you're the product" has its own form of economic inflation.


> I hack on Mastodon and it uses WSS for streams and they're very helpful.

I'm not familiar with Mastodon or WSS. Can you describe how using WSS make the end user's experience better? What would be different if web sockets weren't used?


Mastodon is a federated/distributed social networking server that communicates with other servers via a protocol called ActivityPub. The interface feels sorta like Twitter, but it doesn't have to be.

There are other FOSS ActivityPub servers such as Pleroma (written in Elixir), Pixelfed (Instagram-type interface) and PeerTube (distributed video). ActivityPub is a protocol (like e-mail/SMTP) for subscribing and replying to posts. ActivityPub is just used for the backend (how servers communicate with other servers, like SMTP sending e-mail or RSS readers polling RSS).

Mastodon uses WebSockets to stream posts to the client/web browser. People who make mobile apps and desktop clients also tend to use WSS. It falls back to regular HTTP polling in case WSS fails. Pleroma and others also re-implement the Mastodon API, which means you can run different front-end web clients (Pleroma-fe, soapbox-fe, etc.) on top of different backends (Mastodon/Pleroma).

The advantage of WSS is being able to stream new statuses with a socket, rather than polling constantly for new updates.

If you get on a Mastodon/Pleroma/Pixelfed instance (there are hundreds out there or you can setup your own), you can follow me at @djsumdog@hitchhiker.social
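For the curious, Mastodon's streaming API (over WSS) delivers frames as JSON objects with an `event` name and a `payload` field that is itself a JSON-encoded string (double-encoded). A minimal sketch of the client-side parsing, assuming that documented frame shape; `example.social` in the usage comment is a placeholder hostname:

```javascript
// Sketch: parse one frame from Mastodon's streaming API (WSS transport).
// Assumes the documented shape {"event": "...", "payload": "<json string>"}.
function parseStreamFrame(raw) {
  const frame = JSON.parse(raw);
  let payload = frame.payload;
  try {
    payload = JSON.parse(frame.payload); // payload is double-encoded JSON
  } catch (_) {
    // e.g. "delete" events carry a bare status id, not JSON
  }
  return { event: frame.event, payload };
}

// Browser usage would look roughly like (placeholder host):
//   const ws = new WebSocket("wss://example.social/api/v1/streaming?stream=user");
//   ws.onmessage = (msg) => handle(parseStreamFrame(msg.data));
```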


Thank you for the explanation.

I built a simple aggregator to give people a preview of mastodon

https://mastodonia.club


WebAssembly is awesome! It's a substantial performance boost, and will allow the off-loading of standards to open-source communities so browser developers can focus on core browser features rather than having to stretch themselves.

Also, js_of_ocaml does a great job, i.e. you can usually run a CLI application in the browser with zero changes.

The subset of web technologies that seems reasonable to any given speaker often closely matches the subset of web technologies the speaker uses (either for their own code, or in apps they use).

Personally, I'm looking forward to the point that web apps are capable enough to let PWAs do absolutely anything an Android or iOS application could do.


I'm sorry, what's the alternative for (soft-)real-time applications on frontend if not WebSocket? You probably do want to use it.

Server-Sent Events and HTTP? With a modern setup it's going to be sharing an HTTP/2 pipe anyway. It even handles disconnections gracefully/transparently if you're clever about it.

Can anyone expand on why this technique isn’t more common? I’m so sick of seeing folks reinvent HTTP (poorly) on top of WebSockets. I get if extreme low latency is (allegedly) a requirement.
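To make the SSE suggestion concrete: the wire format (`text/event-stream`) is just newline-delimited `event:`/`data:` fields, with a blank line dispatching each event. A simplified parser sketch (a real `EventSource` also handles `id:`, `retry:`, comment lines, and chunk boundaries):

```javascript
// Sketch: minimal parser for the Server-Sent Events wire format.
// Each event is a block of "field: value" lines terminated by a blank line.
function parseSSE(chunk) {
  const events = [];
  for (const block of chunk.split("\n\n")) {
    let event = "message"; // default event type per the spec
    const data = [];
    for (const line of block.split("\n")) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) data.push(line.slice(5).trim());
    }
    if (data.length) events.push({ event, data: data.join("\n") });
  }
  return events;
}
```

On the client you never write this yourself: `new EventSource("/stream")` plus `addEventListener` does it all, and over HTTP/2 the stream multiplexes onto the same connection as everything else, which is the commenter's point.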


>Can anyone expand on why this technique isn’t more common?

Hard answer: WS has been around for longer, and it's had more "marketing" for lack of a better word -- more people know about it and know how to use it. Retooling existing WS-using code to use HTTP/2 pipes would be a considerable effort with little to no perceived benefit to most users and teams.

Speculative answer: most web developers live closer to the application and presentation layers, and there's resistance to learning technologies that involve HTTP connection management e.g. in nginx (not to say that there are none of these, there are just fewer of us). WS was at the right place at the right time with a good high-level interface available to the client, and gained traction because of this.


From what I remember about SSE, no Microsoft browser supported them. It seems like they've finally added support to Edge this year though.


If you're using HTTP/2 this problem goes away.

>Can anyone expand on why this technique isn’t more common?

Because WebSockets is 10 years old and HTTP/2 is 5 years old, and that's not including support in major frameworks for SSE.


Web sockets, but redesigned to only connect to the host shown in the address bar, on port 443.

Doesn’t work if www.example.com is just an S3 bucket, with the actual website at api.example.com.

That sounds like the developer's problem, not the browser's problem.

I mean, I could half-ass my work a lot more often if they'd get rid of these burdensome restrictions on cross-origin requests. But they ain't going to.


Domains are never going to be just one origin. If we could force developers (or, more precisely, ops teams) to do this, then CORS would never have needed to exist in the first place, because there would be no need to allow any crossing of origins in the first place.

But, at every point in the web's evolution (including today), there has always been something that needed to live on a different host or port for some reason or another, usually because it's too leading-edge for load-balancers to understand how to route it correctly.

TCP load balancers that can handle long-lived flows with connection stickiness et al, are a very modern invention in the web’s history; and even they still stumble when it comes to e.g. WebRTC’s UDP packets, or QUIC.


This man right here understands things.

Native applications?

Not everything needs to run in a browser.


But that’s even worse! Native apps have even less (i.e. zero most of the time) sandboxing than the browser.

This wouldn’t be a meaningful security improvement for anyone.


You use a much smaller set of applications than web sites. Moreover, you usually vet your applications and do not run random stuff. Application developers build up trust over time.

Even if I want to use the web as hypertext + some Javascript for interactivity, every stupid web site can pull these shenanigans.


You use a much smaller set of applications than websites because websites exist.

You're imagining a world where the most popular dev environment goes away, and service providers decide to use HTML forms instead of forcing me to download an app every time I want to order a pizza. That world does not exist. The apps aren't going to go away, and your security model can't be, "people just won't install untrustworthy apps."

And put things in perspective here -- we're talking about a security vulnerability that allows port scanning primarily for fingerprinting purposes. A native app can not only port scan, it can literally just make POST requests to those open ports across separate domains. The security risks we're talking about are not even remotely equivalent.

Don't get me wrong, stuff like port-scanning should be fixed in web browsers. But even with these vulnerabilities, the web is still unquestionably the safest consumer-accessible application platform that we have today. Moving applications off of the web and back onto native platforms would be setting security back half a decade.

When someone comes to me and asks how they make their phone more secure and more private, the number one piece of advice I give them, every single time, is "avoid native apps and use websites instead. Don't install Facebook, use the website. Don't install random clicker games, browse them online instead."

The web has been a major asset in my quest to get friends and family not to install a bunch of random malware on their devices. Doubly so when you throw kids and younger users into the equation. I am eternally grateful that the web is advanced enough that people can join a Zoom meeting without installing Zoom on their computer.


A flashlight app asking for your contact list disagrees (don't forget about the FB SDK sitting quietly in the corner)

Maybe that’s true for you, but if the modern web apps didn’t exist, most average people would be downloading dozens or hundreds of miscellaneous apps. “Oh, I want to order a pizza on my computer, time to download the Domino’s app.” And then now you have Domino’s ads running as daemons on your system.

The alternative is to ask the user on a site basis.

That would be amazing, but it would never happen. Mom and Pop would always click "no" out of fear of the unknown, and my awesome feature wouldn't get used! It has to be enabled by default, they don't know what they're missing!

That's literally the opposite of the concern with permission prompts: all the evidence from years of SSL/TLS certificate errors is that users will blindly grant permission.

This is true -- and it may also be the same for miscellaneous permissions. But that doesn't mean it won't be used as an excuse by the feature developers.

Honestly though, this is just me being bitter at web developers.


That’s already how the browser microphone/camera API and notification API work, though. At this point, it’s the expectation/default for new APIs to act this way. The browser makers would just have to be convinced to align the behaviour of these slightly-older APIs to the best-practice UX paradigm for new APIs.

Surely everybody needs My Feature(TM)!

Right. Firefox is open source. Why is nobody adding prompts for Web Sockets, WebGL, Web Assembly and all the new stuff that a security / privacy concerned user has at least mixed feelings about. Prompts are not a good solution for a wider audience, but at least they satisfy the curiosity of power users and offer the possibility to leave the site if you get the feeling it's too dodgy.

Webrtc, plain requests, and side channels involving streams

Native applications.

Native applications have orders of magnitude more access to your system than a website has.

This isn't a loaded question, but how come people (especially the people here) don't know that?

the real question is: Why do people write websites to behave more and more like native apps? In order to prevent a client-side install?

For applications that want special access to my machine, there SHOULD be a barrier to entry or inconvenience like a client-side installation.


Web browsers really are the new operating systems.

Chrome OS was far ahead of its time, both in its general user hostility and in its excessive resource consumption for seemingly mundane tasks. Like when your roommate's YouTube streaming makes your Excel-like app horribly slow, because your new spreadsheet app needs high-speed internet for no obvious reason.

BTW, WebRTC and WebAssembly were great for crypto mining trojans embedded into ads. Before, they had to port their crypto miner to JavaScript by hand. By now, you can just cross-compile it. The progress of technology ... helps only the technology, but not you, the disposable user in front of the screen :p


> Before, they had to port their crypto miner to JavaScript by hand

No, cross-compilers to JS are and were a thing too, and were used by cryptominers.


Yes, but before webrtc they were easy to block.

What do cryptominers need WebRTC for?

To connect to arbitrary IPs and ports to send home the results of the calculations.

Which can be done with HTTP as well.

Yes, but that is A LOT easier to filter and detect, because then it'll be using the XHR API and HTTP protocol, whereas WebRTC allows free-form protocols.

Can you only cross-compile crypto miners? How does it not help anything else?

Most of the other uses are pointless.

I don't need WebRTC for chat, because IRC is still working just fine.

I don't need WebRTC for video calls, because Skype is still working just fine.

I don't need WebRTC and WebAssembly for online gaming, because I have Steam to install games locally. Plus latency and performance of games emscripten-ed to wasm tends to be atrocious.

So the only uses where I have seen the new Web* APIs in my life was obnoxious ads and newspaper websites trying to shove recommendations and notifications into my face.

What would you want to cross-compile that couldn't be done much better by building a proper local app?


Plenty of downvotes, but no productive suggestion of what use-case would be best served with WebRTC ...

WebRTC enables many of these things on the browser. That is the point. So your "I don't need WebRTC because I use native apps" is basically just "I don't need the internet because I can go next door to talk to my neighbors".

Just because something doesn't have value for you does not suggest it does not have value, period.


Almost. I'd say my argument is "I don't care about an online egg retailer shipping from China, because using the local supermarket is better in every way."

And obviously, Skype and IRC use the internet, too. So what is the benefit to me - the user - of "Skype inside web browser" over "Skype separate of web browser"?

I understand that WebRTC makes "Skype inside web browser" possible, but it also makes a lot of less desirable behavior possible. So there is a trade-off between the advantages and the risks of adding WebRTC.

Which benefit does WebRTC generate for the user? I.e. why is it a good thing that web browsers added this?


That's just a matter of browser software. The Web platform should not suffer just because the browser is imperfect. You can write your perfect browser, fairly easily with the libraries available. I will try when I have some time.

Strictly speaking, it is more like the shell of the distributed operating system.

Which is also funny because, if I remember correctly, IE was deeply integrated into Windows Explorer and the desktop was essentially rendered by the browser. I remember even being able to change my wallpaper to a custom HTML page in Windows.

Hey, refID af64e30y, get back to looking at ads while your browser mines cryptocoins! We'll be watching!

> So what, does Firefox need a freaking iptables implementation now?

umatrix is the layer7 firewall you're looking for, it can block websocket connections, cross-domain ones in particular are quite easy.
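For uBlock Origin users, static filters can do something similar; as best I recall its filter syntax (verify against your version's documentation), the `websocket` type option lets you block WebSocket connections wholesale or per-site:

```
! Block all third-party WebSocket connections
*$websocket,third-party

! Or block WebSockets everywhere, then whitelist sites that need them
*$websocket
@@||example.com^$websocket
```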


I've had performance issues with umatrix. I tried blocking all JS by default and explicitly enabling scripts. In theory that should make the browser perform better. I suspect there are a bunch of sites that can't run a function or reach a JS resource and then just go into spin loops eating through resources... either that, or the blocking itself is resource intensive.

> I suspect there are a bunch of sites that can't run a function or reach a JS resource and then just go into spin loops eating through resources

Google Maps.


Blocking has a non-zero cost, yes, especially if complex whitelists need to be evaluated for every request. But at least for me it is still a net win most of the time.

But you can also operate umatrix in a blacklist-based approach, i.e. simply let most requests through except the categories you deem problematic (e.g. cross-site websockets in this case)


>So I don't have the time to investigate every new fad when it comes out

I'm finding this to be my main problem. I have a fairly solid computer repair/troubleshooting background but can't keep up with the layers of crap that are pasted on top of modern software.

An ongoing list of "features" being added to web browsers and how to get rid of them would be hugely helpful, but I've never found a centralized site for this topic, it's all just scattered around various tech sites.


Whatever happened to a control panel to turn off all of the things you Don't Want? All of these protocols should have triple toggle switches, Enabled, Ask, Disabled. If something on the page doesn't load because that protocol is disabled, it logs to the console so you can turn on what you need for that page.

I get that adding lots of user controls makes state management difficult, but there are tried and true ways to do this; users need the browser to work for them, not the other way around.


Firefox is open-source, so you could add that. I would really appreciate being able to turn WebUSB, WebBluetooth, and WebRTC off entirely, as those expose a lot of devices that I need not to be exposed to the Internet.

I'm pretty sure someone at Firefox has decided the settings panel should be 'clean' and 'easy to use' rather than providing maximum control.

I mean, there's a lot in about:config that the settings panel doesn't support.


... and it was around the time when Firefox 3 was introduced.

The ability to selectively enable those features would be ideal, but I disagree that their presence alone is a net negative.

Web browsers are the new cross-platform runtime, better to get used to it.


I realize Hacker News isn't a hivemind, but I see these two assertions frequently, and I find the contradiction striking:

1. Why are web browsers becoming application platforms? Web browsers are document readers, and we should treat them as such.

2. Why does Zoom (or Slack, or insert-thing-here) want me to download a native app? I should be able to do it in my web browser!

I sympathize with both philosophies, but they cannot co-exist.


Nope, this is a strawman argument you're making.

Parent is asking for granular access control over these very advanced and double-edged features, which is a perfectly valid request.

Having permissions is the norm for native apps on mobile, and it's slowly becoming the norm also on desktop, finally.

Browsers are the new OS, they have to implement permissions too without anything enabled by default.


Actually I think there's middle ground or a way to somewhat satisfy both.

Does Zoom, Slack, or insert-thing-here need the technical ability to portscan localhost in the manner of this blog post? Well, Zoom had their vulnerability making web requests to their app bound to localhost, but did they need to be able to do it? They could get the job done without that functionality.

They already have both an app and a web-only experience. (They try to hide the latter by only presenting the option when the former has evidently failed, but it's still there.) So, it has it both ways by your 1 & 2. Almost as if users should be able to choose.


> Opening a WebSocket to netcat dumps out a HTTP request, so it seems unlikely that you'd be able to talk with anything that doesn't talk HTTP and WebSockets.

AFAIK this is only partly true: if the web server does not support the WebSocket protocol, you cannot connect to it [0].

So if I am understanding this correctly, WebSockets only support a small subset of HTTP and it should therefore not be possible to use them to connect to "classic" HTTP servers or to send GET or POST requests to it.

[0] https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_...


But you still have to make sure all programs bound to local ports handle WebSocket requests gracefully.

Binding to 127.0.0.1 is a nicely cross-platform way to do inter-process communication (I've done so in the past to mitigate JVM startup/warmup issues).

I've never written this code defensively, because if you run programs that throw random shit at locally-bound ports, that's your responsibility. The web community has decided it's a good idea to give arbitrary websites that capability. It's true that the 'random shit' may only take the form of WebSocket requests, but this is only a minor comfort.

From my perspective, this needs to be locked down.

edit: On second thought, you have always been able to trigger similar requests by e.g. just setting the src attribute of an image: Opera aside, browsers apparently never implemented proper cross-network protections. So from now on, I'll be extra careful to make sure all my servers can handle unexpected, potentially malicious HTTP requests even when bound to 127.0.0.1.

That said, I still do think this is something that needs fixing on the browser-side.
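One cheap defense for a 127.0.0.1-bound IPC server, while we wait for browsers to lock this down: browsers attach an `Origin` header to WebSocket handshakes and cross-origin requests, while your own cooperating local client normally sends none. A hedged sketch (the function name and allow-list shape are mine, not any library's API):

```javascript
// Sketch: reject browser-initiated requests hitting a localhost-only server.
// Node lowercases incoming header names, so we look up "origin".
// allowList holds any web origins you deliberately serve (often none).
function isTrustedLocalRequest(headers, allowList = []) {
  const origin = headers["origin"];
  if (origin === undefined) return true; // no Origin: likely not a browser
  return allowList.includes(origin);     // browsers must be explicitly opted in
}

// e.g. inside a Node http/ws server's request handler:
//   if (!isTrustedLocalRequest(req.headers)) { res.statusCode = 403; res.end(); return; }
```

This isn't bulletproof (Origin can be absent in some non-browser clients you don't trust either), but it cleanly stops the drive-by WebSocket probing described in the article.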


WebSocket connections are initialized by sending an HTTP request.

> Firefox also seemingly blocks access to 192.168/24 and 10/8.

What about 172.16.0.0/12?
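For reference, the RFC 1918 ranges are 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16 (the /24 in the quote is presumably a slip for /16). A sketch of a classifier covering all three plus loopback, for anyone building this kind of blocking themselves:

```javascript
// Sketch: classify an IPv4 address against the RFC 1918 private ranges
// plus loopback (127.0.0.0/8). IPv6 (::1, fc00::/7) is out of scope here.
function isPrivateIPv4(ip) {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((n) => !Number.isInteger(n) || n < 0 || n > 255)) {
    throw new Error("not a dotted-quad IPv4 address");
  }
  const [a, b] = parts;
  return (
    a === 10 ||                          // 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) ||          // 192.168.0.0/16
    a === 127                            // 127.0.0.0/8 loopback
  );
}
```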


I remember when you could have a link to file:///etc/passwd and you could click on it, and the browser would load it.

It blocks that too on my machine.

> Firefox also seemingly blocks access to 192.168/24 and 10/8

Chrome, OTOH, will happily open a websocket to these IP ranges. Another good reason not to use Chrome.


> Chrome, OTOH, will happily open a websocket to these IP ranges. Another good reason not to use Chrome.

Why is “I can use websockets on a private network” a bad thing?


Uh... Why not block all private networks?

A search seems to suggest Chromecast devices are controlled via websocket, though that information is pretty old.

Probably IoT.

>> “ Edit 4: Done researching this now. I went to ebay on Firefox, and wasn't getting websocket scans.“

...Just in case you missed it and you’re using Linux, from the article...

“I thought it might be because I run Linux, so I created a new Windows VM and sure enough, I saw the port scan occurring in the browser tools from the ebay home page”


We need a simple browser implementation for the masses. Is there any such browser in existence?

There are no simple browsers, thus certainly no simple browser for the masses.

If you are on mobile there is Firefox Klar/Focus.

I'm a big fan of https://surf.suckless.org/ .

With some polish and maintenance it could serve as base for a simple(r) browser for the masses. It is essentially a small wrapper around WebKit, which is far from simple, but it has the benefit of great support for modern web features.


Agreed. Firefox99, latest Firefox but party like it's 1999. Web* disabled. Canvas disabled. LocalStorage disabled. DRM-content disabled. Anything else we don't need?

You've inspired me!

In Firefox:

LocalStorage - about:config dom.storage.enabled

Canvas - block JS via noscript

DRM - Preferences->Disable DRM content checkbox

WebRTC - about:config media.peerconnection.enabled;

As a side note, NoScript has been an eye-opener for how many things are loaded when I open any website! It's been fast, but I've been getting annoyed by guessing what to enable to move past a blank webpage. And WTF is New Relic, and why does half of my internet browsing use it?
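The about:config items above can also be set persistently in a `user.js` file in your Firefox profile directory. A sketch collecting the prefs mentioned in this thread (these pref names are real, but double-check against your Firefox version; there is no single canvas-off pref, so `privacy.resistFingerprinting` is the closest knob, and it changes much more than canvas):

```js
// user.js -- loaded from the Firefox profile directory at startup
user_pref("dom.storage.enabled", false);          // LocalStorage off
user_pref("media.peerconnection.enabled", false); // WebRTC off
user_pref("media.eme.enabled", false);            // DRM (EME) off
user_pref("privacy.resistFingerprinting", true);  // gates canvas reads behind a prompt, among other things
```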


New Relic is an application performance monitoring platform; basically, it gives you performance insight into your web app, so it gained popularity really fast. Really great for identifying performance bottlenecks on the server side. Later on, they launched browser monitoring too, so now every web server that uses New Relic automatically inserts New Relic JS analytics into every HTML endpoint.

Netscape.

Qutebrowser.

DOS is simpler than Linux. Let's go back to that.

I don't think there's anything wrong with any of those things with a sane permissions system attached (strict opt in per site).

The web is now a no-install app delivery platform, it makes sense for these things to exist.


We are in ActiveX 2.0 era. Can we call it GoogleX?
