Bug reporter here: fwiw I just want to credit Tailscale for how they handled the disclosure and remediation.
Deploying a prod fix within 24 hours is pretty great. They’ve been super cooperative and pleasant to deal with, and (as far as I could ascertain) there was no apparent way to get the database IDs for nodes you didn’t have prior knowledge of.
I wonder how many bugs are found this way, rather than actually looking. I found a way to extract private emails from npm a while ago, purely through chance.
I appreciate how Tailscale reacted here. This isn't a great bug, but things happen.
I specifically appreciate:
- There's a specific date when the regression was introduced
- There's an end-date when the regression was fixed
- A thorough and conclusive analysis was done that proves no exploitation occurred
It's good that they fixed it, but this sounds like it would have been difficult to exploit:
>A malicious individual who knew a target node’s database ID could generate and accept a sharing invite for that node without being an admin of the target node’s tailnet...as long as the individual knew the target’s node ID.
...
>The node ID is an integer used in the admin panel’s database, and is not related to the node "StableID" that is visible to Tailscale clients... IDs are not sequential or otherwise easily guessable.
I guess it depends on the search space of the IDs. If they're 64-bit integers, it's very hard to guess a valid ID. If they're only 32-bit integers, then you could have guessed one, but it would be pretty obvious in Tailscale's logs.
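To put rough numbers on the search-space point, here is a minimal sketch. It assumes IDs are drawn uniformly at random, the attacker is after one specific node, and nothing rate-limits the requests; the request rates are hypothetical, not measured figures.

```python
SECONDS_PER_DAY = 86_400


def expected_days_to_guess(id_bits: int, requests_per_second: float) -> float:
    """Average days to hit one specific ID drawn uniformly from 2**id_bits values."""
    # On average an attacker searches half of the space before hitting the target.
    expected_guesses = 2 ** id_bits / 2
    return expected_guesses / requests_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical request rates, chosen only for illustration.
    for bits in (32, 64):
        for rate in (1_000, 1_000_000):
            days = expected_days_to_guess(bits, rate)
            print(f"{bits}-bit ID at {rate:,} req/s: about {days:,.2f} days")
```

Under these assumptions a 32-bit space falls in weeks at a modest request rate, while a 64-bit space stays out of reach by many orders of magnitude.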
Did you spend any time trying to see if you could steal the values from somewhere else? Given the other breaks in Tailscale, I'd bet there's an info leak somewhere. Deriving it from first principles is hard. Stealing it from somewhere should be easier.
Yeah, we're adding people slowly because decentralized authorities like the one that tailnet lock implements can have nasty failure modes, e.g. some bug that prevents any new addition to the tailnet at all and forces manual recovery on each of your devices separately. So, we're putting miles on it with a little care, and making sure folks who sign up are aware of the current limitations and risks.
If you're excited about tailnet lock and want to get on the alpha sooner rather than later, feel free to drop me an email. As Dave mentioned we are slowly crunching through the waitlist to get some miles in, but I'm also happy to take on enthusiastic testers ahead of that!
This seems pretty bad. I was pretty leery of Tailscale having too much control over my network when I tested it out and now seeing this security notification reinforces my decision to use vanilla WireGuard and Nebula for my home and datacenter use cases.
Every time it gets mentioned here I wonder the same thing. E.g., in the discussion of their last vulnerability they said they knew the vulnerability had not been exploited maliciously because they did not detect it on their end when their software phones home to them.
> Why on Earth would I want to use a security product that phones home... Regular WireGuard works perfectly fine for me.
Then you should continue using regular WireGuard. However, modern industry is plagued by an endless war with vulnerabilities, exploits, and malicious insiders. Even with adequate staffing, it's like a dam you're constantly patching to prevent leakage. We have to log everything, often offsite, and often into immutable storage. When the dam does eventually leak, we have to know how much and how it started. The logging is a feature to me.
Tailscale is building a service that doesn't require me to run and maintain a centrally connectable server, one that ties into a single-sign-on solution, one that logs activity, one that's introduced a system in which I don't even have to trust their control plane exclusively (Tailnet Lock). Just the seamless integration with Azure AD has saved maintenance time over NPS+Radius+ADConnect+OpenVPN.
Wireguard is great, I'm using it for all my site-to-site still (and it blows OpenVPN out of the water). But Tailscale has replaced all my client vpns for good reason.
Really? I use the (official?) Android app and it worked for me fine first time, and has for years.
I think maybe you ran into a UI bug where it wouldn't work unless a subnet definition ended in a 0 (i.e., defining 192.168.0.1/32 would cause it to silently not boot up).
I also would like to be able to use my home lan DNS server for home lan hosts.
It might not be a Nebula problem, but to do this I need to use DNS-over-tls in Android, through the Nebula tunnel.
Edit: noticed you were referencing the WIREGUARD phone app. I'll go try harder.
> When the dam does eventually leak, we have to know how much and how it started. The logging is a feature to me.
Logging is a feature. Logging to some random 3rd party is not. Sure, if the 3rd party provides the service itself they need the logs to make it better, but if stuff stays within your own infrastructure it should not phone home. And a VPN service for enterprise certainly shouldn't have its controller hosted outside of the company's own infrastructure.
Well it's not a "random" 3rd party, it's a company you decide to rely on for your security. You pay, they offer you a service. If you don't trust them, that's fine, the product+service isn't for you. But it's not a "random" 3rd party.
>Why on Earth would I want to use a security product that phones home..
I share your opinion, I don't want my security products to phone home, but the question was answered earlier : 'you' want a product that phones home so that the central data collection group can make statements like "vulnerability was not triggered or exploited." -- otherwise the onus is on the data holder to make similar assessments.
In other words, the proprietors of WireGuard cannot make statements regarding their entire userbase. (This may or may not be a good thing.)
WireGuard is a pain in the ass to set up though. Also, their Windows client is no longer supported and has showstopper bugs (tunnels disappear until UI is launched and they are manually reactivated there).
I mean, if they had a proper bug tracker instead of a mailing list, we could see the answer to that right away. But I am referring to https://lore.kernel.org/wireguard/CAHmME9pbY0Cgbj5JbTVvsxpkd... which I believe just made me travel to a remote location to restore connectivity.
> Why on Earth would I want to use a security product that phones home...
Same as outsourcing security: When you want a team of professional security engineers to take care of it.
(In that case, you may want to wish they are ethical, not overworked and actually care about your servers…)
Same as outsourcing security: When shit hits the fan, you have somebody else to point your finger at. No inconvenient questions to ask within your org, potentially having to blame someone you like on a personal level etc.
Worst case, you have to ask the guy who green-lit the outsourcing some inconvenient questions, but if the company you outsourced to is big enough, you can easily take the "nobody ever got fired for buying IBM" escape route.
>>Why on Earth would I want to use a security product that phones home..
that ship sailed a long time ago in enterprise networking. Almost all major vendors now have Cloud Control, and all of them phone home for various things
>How do you connect back to your home from the outside?
>- your ISP IP change all the time
I have a static v6 IP on the WG server (home router, running OPNsense).
Even if I did have a changing IP, I have a programmable DNS server that points to the WG server's IP.
>- You need to open port
One time setup on the server.
>- You need to setup wireguard
Installed just like any other distro package, plus one-time setup to generate the key and import it and the server's public key into systemd-networkd / NetworkManager.
>- You need to manage auth / authz
You generate a key on each device and register the public key with the server. There's literally nothing else to it.
>Lot of things can go wrong and it's a lot of things to setup / maintain properly.
I set it up about two years ago and it's been working unchanged since.
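For readers who haven't done the manual setup described above, here is a rough sketch of what the client side amounts to. It assumes a wg-quick style config file rather than the systemd-networkd / NetworkManager import the commenter used; the function name, keys, addresses, and hostname are placeholders for illustration.

```python
def render_client_config(
    client_private_key: str,
    client_address: str,
    server_public_key: str,
    server_endpoint: str,
    allowed_ips: str = "0.0.0.0/0, ::/0",
    keepalive_seconds: int = 25,
) -> str:
    """Render a wg-quick style client config as text (hypothetical helper)."""
    return "\n".join([
        "[Interface]",
        f"PrivateKey = {client_private_key}",    # generated on the device itself
        f"Address = {client_address}",           # the client's address inside the tunnel
        "",
        "[Peer]",
        f"PublicKey = {server_public_key}",      # copied from the server
        f"Endpoint = {server_endpoint}",         # host:port of the server
        f"AllowedIPs = {allowed_ips}",           # which traffic goes through the tunnel
        f"PersistentKeepalive = {keepalive_seconds}",
        "",
    ])


if __name__ == "__main__":
    # Placeholder values only; real keys come from `wg genkey` / `wg pubkey`.
    print(render_client_config(
        "CLIENT_PRIVATE_KEY", "10.0.0.2/32",
        "SERVER_PUBLIC_KEY", "vpn.example.com:51820",
    ))
```

The server needs the mirror image: a `[Peer]` entry with this device's public key and its tunnel address in `AllowedIPs`.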
>I have a static v6 IP on the WG server (home router, running OPNsense).
Excuse my snark, but not everyone has a static v6 IP. I don't.
>One time setup on the server
Unless you happen to be behind a CGNAT or you're on a mobile network or or or or...
>Installed just like any other distro package, plus one-time setup to generate the key and import it and the server's public key into systemd-networkd / NetworkManager.
And if that won't work you're gonna be stuck debugging the network setup. I certainly do always end up debugging the VPN network stack eventually.
>You generate a key on each device and register the public key with the server. There's literally nothing else to it.
This won't scale to more than like 5 devices without being a major work item if a key was compromised or needs to be rotated (what if it turns out the RNG device was bad on your kernel at the time? Happened to SSH Keys on RPi's).
>I set it up about two years ago and it's been working unchanged since.
>Excuse my snark, but not everyone has a static v6 IP. I don't.
My ISP doesn't have it either. (They've been promising it for 2 years but keep delaying working on it.) I have an HE tunnel.
Also you ignored the sentence after that, which I had written to hopefully preempt this response.
>Unless you happen to be behind a CGNAT or you're on a mobile network or or or or...
Your server needs to have a globally reachable IP, yes.
>This won't scale to more than like 5 devices without being a major work item if a key was compromised or needs to be rotated (what if it turns out the RNG device was bad on your kernel at the time? Happened to SSH Keys on RPi's).
Depends on which key, of course. If a device's key needs to be rotated, you only touch that device and the server. If you rotate the server's key, then yes you touch every device.
>And if that won't work you're gonna be stuck debugging the network setup.
>if a key was compromised or needs to be rotated
>Not everyone is that lucky.
Sure. I know these are things that happen never or rarely (currently, never), so I'm okay with doing them manually when they happen instead of outsourcing. I don't make decisions for other people or claim that what I do is best for other people. You should make decisions for yourself by evaluating the pros and cons for yourself.
> Even if I did have a changing IP, I have a programmable DNS server that points to the WG server's IP.
I believe that WG resolves the domain only once, upon config load. Not much help if the server IP changes after starting the WG client. It's up to the user to realise and stop/start, which is not so convenient for e.g. embedded WG clients in travel routers.
If the client is still at the same address it was last time it contacted the server and the server sends data to the client it will be transparently handled. This means that persistent keepalives should be used on both sides. It also still leaves open the possibility of the connection being lost if IP addresses on both sides change during the period between the keepalives or if there is any kind of network disruption preventing traffic from the server to the client during the period the server IP address changes. I had this become an issue for me a few times when I ran a wireguard server at home.
This won't work if the server is being blocked. My mobile ISP blocks incoming UDP traffic unless the phone establishes the "connection" first. Since change of the server's public IP means that "connection" is broken, all unrelated UDP packets never arrive.
The NetworkManager WG integration re-resolves. systemd-networkd doesn't, though it has an open PR for it, though that PR has been stalled for a long time.
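Where the client does not re-resolve on its own, one workaround is a small watchdog that re-resolves the hostname periodically and pushes the new address with `wg set`. A minimal sketch, assuming wireguard-tools is installed and the script runs with enough privileges; the interface name, key, and hostname in the usage example are placeholders, and the helper names are made up for illustration.

```python
import socket
import subprocess
from typing import Callable, List


def resolve_ip(hostname: str) -> str:
    """Return the first address DNS currently reports for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_UDP)
    return infos[0][4][0]


def build_update_command(interface: str, peer_public_key: str,
                         ip: str, port: int) -> List[str]:
    """Build the `wg set` command that points a peer at a new endpoint."""
    host = f"[{ip}]" if ":" in ip else ip  # IPv6 endpoints go in brackets
    return ["wg", "set", interface, "peer", peer_public_key,
            "endpoint", f"{host}:{port}"]


def check_endpoint(hostname: str, last_ip: str, interface: str,
                   peer_public_key: str, port: int,
                   resolver: Callable[[str], str] = resolve_ip,
                   run: Callable = subprocess.run) -> str:
    """Re-resolve hostname; update the peer endpoint only if the address changed."""
    current_ip = resolver(hostname)
    if current_ip != last_ip:
        run(build_update_command(interface, peer_public_key, current_ip, port),
            check=True)
    return current_ip
```

Called from a timer or cron job with the last known address, e.g. `last_ip = check_endpoint("vpn.example.com", last_ip, "wg0", "SERVER_PUBLIC_KEY", 51820)`. This does not help with the case raised above where the mobile ISP drops unsolicited inbound UDP; the client still has to send first.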
A VPN solution for your home network is a very different thing than what is needed for any 50+ employee org. Bare wireguard is pretty useless in a business environment; this is why Tailscale is growing so fast.
Why would VPS provider need to be trustworthy at all? They just need to be reliable. Having VPN doesn't absolve you from securing hosts within it, like you would in any other network. VPN just simplifies the networking.
And if they're not reliable, VPS providers are much easier to switch than "Tailscale".
I was not claiming Tailscale has to be trustworthy.
Though, thinking about it, it's privileged software running on every node in the virtual network that is controllable from outside (cloud). So yeah, it needs to be way more trustworthy than some static wireguard setup you set up on a VPS.
What scenario are you imagining where your vpn is compromised through Hetzner or Digital Ocean for example? This is a standalone server set up just for you, not sure how tailscale works but does it spin up a segregated server for each customer without a single point of failure?
So they can do analysis on the logs and proactively fix vulnerabilities before they are exploited? I'm sitting here in shock wondering if you don't realize what you typed, or you're implicitly saying you don't want a software company to have useful analytics to fix their product. The second option is understandable, but at least acknowledge the utility of logging.
Also, I'm pretty sure you'll find detailed discussion of why people use Tailscale when there are options like vanilla Wireguard and OpenVPN if you google it or check their homepage. It fits many people's use cases.
>I'm sitting here in shock wondering if you don't realize what you typed
After you're done being shocked, maybe you should read my comment more carefully. I didn't say "Tailscale shouldn't have logging". I didn't even ask "Why does Tailscale have logging?" I asked "Why would I want to use a security product that phones home?"
I don't make decisions for Tailscale. If they want to have logging, analytics or free puppies, they're welcome to. I make decisions for myself, and I've decided that I wouldn't use any security product that phones home.
I'm so glad you 1. read the article 2. precisely understood the vulnerability 3. carefully analyzed the situation 4. came up with a high quality analytical comment
Ding, ding, ding! A lot of security/compliance folks like to pay money for scapegoats. It certainly helps with job security. Worst case scenario? Find another vendor.
> Or because SaaS is easier to use and requires little or no maintenance.
YUP
We recently deployed Tailscale in our organisation; I had looked at Wireguard very closely but decided that the ease of use of Tailscale meant we could get up and running much more quickly and easily and at relatively low cost (small team).
We'll re-evaluate as we move forward & probably need to deploy it to more users, but it was the difference between a few minutes setup to get all users into it versus several hours of figuring out the ins and outs of Wireguard & then getting the team onboarded.
While any security incident is frustrating, they are inevitable, and the only way to really judge an organisation is by its response. Tailscale's response here gives me more confidence, not less.
Just the default Google Workspace one for the moment. Only a small handful of users at the moment & didn't need anything fancy.
We are probably going to be looking at a redo of our whole auth at some point in the next 6-12 months as we plan to grow in size & move a lot more people into Tailscale.
Nebula doesn't really have a control server of this sort, it largely uses a CA to do the node authentication and a coordination server that helps nodes get introduced and NAT bust, more like the DERP server for tailscale.
The Nebula equivalent of this would be the Defined Networking folks, who do run a control server more akin to Tailscale. They say they are moving slow to focus on security, and I haven't heard of vulnerabilities like Tailscale's, but also I think Defined Networking is much, much smaller in terms of users, so it may be a time-will-tell situation.
IMO it's more about agency. With SaaS people think "they had a bug and there's nothing I could have done to prevent it or expedite the fix" but with on-prem software they think "once I discover a bug I can whip my people to have it fixed within an hour". This is not true of course.
Edit: For the record, I think Tailscale (as a company) builds excellent software and I like the idea of them making money and staying alive to keep doing the great work they do. I personally feel uneasy about using their proprietary control server in my own (home, etc.) networks, but I honestly wouldn’t even think twice about it if I was making that decision on behalf of a company.
I had not heard that, and Juan's twitter says he works at the European Space Agency.
On the other hand, they certainly have on a few occasions gone out of their way to help Headscale, like documenting the changes to the V2 control server protocol, and even putting some relevant code in the open source repo to make it easier for Headscale.
Plus they try to keep the client backwards compatible with older control servers, and trying not to break Headscale is something that has been mentioned a few times in pull request reviews.
Isn't it more likely that this would have unfound vulnerabilities in it, and you'd still need to have this open to the internet to get similar benefits to Tailscale proper?
Interestingly, they just updated their ToS [1] to force arbitration [2], and moved from Canada to NY (my lawyers were happy about that second part).
> Interestingly, they just updated their ToS [1] to force arbitration [2], and moved from Canada to NY (my lawyers were happy about that second part).
....I don't see the connection between their change in ToS to mandate arbitration & the bug itself. There's no direct equivalency between the two.
On a personal note, I'm more neutral on arbitrations as a concept: They help facilitate faster legal resolutions in an environment where the time & cost overhead is purely human-derived. It often takes months, if not years, to form an impartial jury, a judge that's available to review the case, and for all relevant evidence to enter under the case's purview.
Also, the website that was cited in your previous comment doesn't provide any justifications for the claims made in their list. It simply states 10 reasons with no expansion into details/reasoning on the last 9 reasons. The first reason alone gives a tautologically-derived justification to its statement.
As noted in other comments [0], part of the reason why people use Tailscale instead of just wireguard or headscale is that they think they have a throat to choke. The arbitration clause has the potential to make things more difficult in that regard.
> As noted in other comments [0], part of the reason why people use Tailscale instead of just wireguard or headscale is that they think they have a throat to choke. The arbitration clause has the potential to make things more difficult in that regard.
>>> This seems pretty bad. I was pretty leery of Tailscale having too much control over my network when I tested it out and now seeing this security notification reinforces my decision to use vanilla WireGuard and Nebula for my home and datacenter use cases.
>> People mostly seem to pay for these products so that they can blame a third party if things go wrong.
> Ding, ding, ding! A lot of security/compliance folks like to pay money for scapegoats. It certainly helps with job security. Worst case scenario? Find another vendor.
-------
The Oracle/Dropbox model of 'here's someone to blame, just pay us for that' is a reasonable business strategy in an environment where the business in question wants as little involvement as possible in the development of a product/service that they rely on: They just want the features that are being advertised, and are willing to pay for its siloed-off development. The actual proceedings of the potential arbitration in the future don't matter as much as the name on the sheet that they can rely on for blame/support.
-------
-------
>> They help facilitate faster legal resolutions
> Faster is not always better.
-------
Faster legal resolutions are better when the alternative is having to wait for months/years on a judgement.
The only scenario wherein arbitration loses out is in a constructed system where:
1) There are always impartial juries with the relevant subject matter knowledge to draw from for a given case
2) There are enough judges that there are effectively no wait times between being assigned a judge & getting a legal case resolved
In such a scenario, the bottleneck would exist in the discovery phase of the legal case, which can be resolved by mandating ALL conversations & interactions between the two entities be immediately submitted within a given time limit, whether directly, indirectly, or as part of a larger group.
> The Oracle/Dropbox model of 'here's someone to blame, just pay us for that' is a reasonable business strategy in an environment where the business in question wants as little involvement as possible in the development of a product/service that they rely on: They just want the features that are being advertised, and are willing to pay for its siloed-off development. The actual proceedings of the potential arbitration in the future don't matter as much as the name on the sheet that they can rely on for blame/support.
Yup, that's why I'm a paying customer of Tailscale (biz tier).
I'm a bit frustrated with this. I say this as a big fan of Tailscale, having promoted them in various places.
Granted, you can't enumerate the IDs, but neglecting permissions when adding a node seems like a really stupid oversight.
May I suggest that Tailscale spend some time to double-check that they are applying access control where applicable throughout all aspects of the application.
I appreciate the feedback on this vulnerability, and will continue to be a happy user - but please check that these oversights don't exist elsewhere.
Nobody in this industry consciously neglects authorisation, especially not startups building security solutions. This sort of stuff sometimes just happens, and the root cause always eventually boils down to "we are humans and it is a human thing to make mistakes". They have displayed fantastic response time and transparency on the issue, had the infrastructure in place to assess the impact, credited the reporter, hang out on HN to answer questions...
If anything, this kind of response builds my confidence and trust in them.
The product is great. Their security thinking though is pretty bad - they've had a sequence of pretty serious security flaws in what is supposed to be a security related product. The google login stuff is great though so not sure if there are good alternatives.
> Their security thinking though is pretty bad - they've had a sequence of pretty serious security flaws
They are targeting 6 OSes plus web, each of which has unique ways to get security wrong. On its own, that's one hell of an attack surface and I'd say the rate of vulnerabilities reflects that.
However, I'd say their security _posture_ is pretty good - from what I've seen, reported issues are typically patched within 48h of initial report (most companies ask for a 90 day window before bug reporters go public with issues, and often let it pass without fixing the issue).
They've had many flaws become widely publicized because they have great writeups of the issue, but I think I've only seen one critical-severity issue - most are low-impact. E.g., in this case you need to make requests until you guess an int64 correctly - which on average would 'only' take a year if you could somehow make roughly 300 billion requests per second without being detected.
> However, I'd say their security _posture_ is pretty good
The clients for Windows, iOS and macOS are closed-source.
As a general rule, a closed-source server is bad, but ultimately it's somewhat tolerable. A closed-source client for a product or service that appears to have an acknowledged security posture is full-on unacceptable. I can't fathom why they would shoot themselves in the foot like this.
Not clear on why they made that call, but those are incidentally the exact set of platforms where they can't rapidly ship security updates (it's dependent on the vagaries of the platform's store).
> Mostly. Tailscale daemon client code is open source. Where the operating system is open source, the daemon and GUI are open source, and where the operating system is closed, the daemon is open source and the GUI is closed source.
You can run just the tailscaled daemons on Windows, it just wouldn't have a GUI, and that's fully open source[1].
The client daemon is open source. Generally you shouldn't be increasing attack surface with the introduction of a GUI (the closed-source part), so gonna have to strongly disagree with you here.
EDIT: Sorry - the original writeup / headline wasn't clear or I missed it - this requires guessing a 64 bit number that appears to have been random vs 32 bit sequential. Not sure if things were rate limited but even if not this makes a practical difference.
Happy to hear there was no evidence of this being exploited.
This is one of those things you really want to fix. Relying on an ID value being hard to guess as a security measure can easily become one of those completely-invisible-but-critically-load-bearing (in)security properties that is too easy to invalidate with new pushes/features/etc.
If these IDs were able to be exfiltrated or leaked or probed, it could have been disastrous. Really fortunate that this was caught now and wasn't in the wild for another year or two.
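To make the "load-bearing ID" point concrete, here is a minimal sketch of an explicit authorization check on the share-invite path. Everything here is hypothetical (types, names, and the rule itself are invented for illustration) and it is not Tailscale's actual code; the point is only that knowing a node ID should never be what grants access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: int   # internal database ID; treated as an identifier, not a secret
    tailnet: str   # the tailnet that owns this node


@dataclass(frozen=True)
class User:
    name: str
    tailnet: str
    is_admin: bool


class NotAuthorized(Exception):
    """Raised when the caller may not share the requested node."""


def create_share_invite(actor: User, node: Node) -> dict:
    """Create a share invite only if actor is an admin of the node's own tailnet."""
    if not (actor.is_admin and actor.tailnet == node.tailnet):
        raise NotAuthorized(f"{actor.name} may not share node {node.node_id}")
    return {"node_id": node.node_id, "created_by": actor.name}
```

With a check like this, an outsider who somehow learns a valid `node_id` still gets `NotAuthorized`, so a later feature that leaks or exposes IDs does not silently become a vulnerability.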
Perhaps the biggest revelation from this is that Firefox doesn't implement rebinding protection, and that CGNAT DNS responses aren't dropped by most resolvers. Host validation is preferable anyway [1] but the second problem still has bad implications.
The bigger problem is, people casually look at the code and find such major vulnerabilities in Tailscale. It doesn’t instill confidence in the rest of the code.
This is a security product, and zero days in the code should be taken more seriously. Perhaps make the product all paid, and use the money to hire people to audit the code?