Privacy implications of the QUIC protocol [pdf] (petsymposium.org)
66 points by ivanblagdan 16 days ago | 20 comments



"The source-address token is a unique, authenticated-encrypted data block provided by the server, which cannot be decrypted by the client. For the purpose of IP address spoofing prevention, it contains the users publicly visible IP address and a timestamp as seen by the server."

Compare ...

http://curvecp.org/security.html

"Does CurveCP provide client address authentication?

No. IP addresses are not secure in any meaningful sense, and CurveCP does not attempt to make them secure. Servers that distinguish between clients must do so on the basis of long-term client public keys, not IP addresses."

Users can create new public keys anytime they want. They could, e.g., create one key for one purpose and another for another purpose. Over the years, I have noticed a common theme in the design of all djb software. Whether it is intentional or not, I do not know. Control rests with the user.


So, I think you've completely misunderstood what's going on here.

What QUIC is doing here is preventing _spoofing_, whereas what DJB is talking about is the futility of using IP addresses as _identity_.

Why does this matter? QUIC servers often have lots of bandwidth available to them and offer 0-RTT, so they can be used for an amplified denial-of-service attack. Bad Guys send packets "from" your IP address to the QUIC server. Without spoofing protection the QUIC server begins bombarding your IP address with stuff. Your network links saturate and you're effectively knocked out even though you did nothing to cause this.

Spoofing protection is provided in TCP (on any vaguely modern OS) but QUIC doesn't use TCP so it must roll its own protection and that's all this is for.
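
To make the anti-spoofing idea concrete, here is a minimal sketch of a server-issued token bound to the client's observed IP address and a timestamp. It is not QUIC's actual token format: the paper describes an authenticated-encrypted block, whereas this sketch only authenticates with an HMAC (the contents stay readable), and the names `SERVER_SECRET`, `MAX_AGE_SECONDS`, `issue_token` and `validate_token` are all made up for illustration.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional

# Assumption: a secret known only to the server (hypothetical value).
SERVER_SECRET = b"hypothetical-server-secret"
# Assumption: an arbitrary validity window chosen for this sketch.
MAX_AGE_SECONDS = 60


def issue_token(client_ip: str, now: Optional[float] = None) -> bytes:
    """Bind a token to the IP address the server observed, plus a timestamp."""
    issued_at = int(time.time() if now is None else now)
    payload = struct.pack("!Q", issued_at) + client_ip.encode("ascii")
    tag = hmac.new(SERVER_SECRET, payload, hashlib.sha256).digest()
    return tag + payload


def validate_token(token: bytes, client_ip: str, now: Optional[float] = None) -> bool:
    """Accept only an unmodified, unexpired token presented from the same IP."""
    if len(token) < 32 + 8:
        return False
    tag, payload = token[:32], token[32:]
    expected = hmac.new(SERVER_SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return False  # forged or modified token
    (issued_at,) = struct.unpack("!Q", payload[:8])
    current = time.time() if now is None else now
    if current - issued_at > MAX_AGE_SECONDS:
        return False  # too old
    # A spoofer never receives the token sent to the victim's address,
    # so it cannot present one that matches the spoofed source IP.
    return payload[8:] == client_ip.encode("ascii")
```

The server keeps no per-client state here; everything it needs to check is carried in the token itself.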

CurveCP has a 1-RTT setup penalty with a "Cookie" to prevent spoofing. If somebody wants to track you with CurveCP they'd be just as able to use this cookie as they would the QUIC "source address token". In both cases you could throw this away to prevent tracking, at a penalty of reducing performance because you'll have to do the startup dance all over again.


The paper discusses how the source address token might be used for tracking, and thus for online advertising services, which is Google's business.

The online tracking and advertising industry and companies selling to that industry such as Google believe IP addresses, when combined with other information, tell something valuable about consumers.

Perhaps users of the "modern web browsers" cannot or will not manually control generation or storage of source address tokens. Hence the need for papers like this one, pleading with the organisations controlling the browsers to change the software. That software is of course written by employees of companies and parents of companies that are paid directly or indirectly from sales of online advertising services.

Probably the folks creating these protocols never thought about the implications of the design on online advertising services. However, from the perspective of people selling internet advertising services, the association/non-association of IP addresses with public keys seems like it might be significant, regardless of its intended purpose to the protocol designers. People buying those online ad services are likely to understand the potential value of IP address information, e.g., they might think it tells them something about geo-location. If so, they might also see the added value in the combination of IP address with a unique identifier.


"If somebody wants to track you with CurveCP they'd be just as able to use this cookie as they would the QUIC "source address token"."

https://curvecp.org/confidentiality.html

"Two minutes after a connection is closed, the server is unable to extract the client's long-term public key from the network packets that were sent to that server, and is unable to verify the client's long-term public key from the network packets."

"The second packet from the client contains a cookie from the server. This cookie is actually a cryptographic box that can be understood only by a "minute key" in the server. Two minutes later the server has discarded this key and is unable to extract any information from the cookie."

By contrast, the paper suggests QUIC source address tokens have no expiration and are retained for a minimum of 11 days by the existing QUIC-compatible browsers.
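
The "minute key" behaviour quoted above can be sketched roughly as follows. This is a toy model, not CurveCP's code: it assumes the server holds a current and a previous key and rotates once a minute, it uses an HMAC tag where CurveCP uses a cryptographic box (so nothing is encrypted here), and `MinuteKeyRing` is a hypothetical name.

```python
import hashlib
import hmac
import os
from typing import Optional


class MinuteKeyRing:
    """Holds the current and previous key; anything older is gone for good."""

    def __init__(self) -> None:
        self.current = os.urandom(32)
        self.previous: Optional[bytes] = None

    def rotate(self) -> None:
        """Assumed to run once per minute; the old 'previous' key is dropped."""
        self.previous = self.current
        self.current = os.urandom(32)

    def seal(self, data: bytes) -> bytes:
        """Produce a cookie that only a holder of the current key can verify."""
        tag = hmac.new(self.current, data, hashlib.sha256).digest()
        return tag + data

    def open(self, cookie: bytes) -> Optional[bytes]:
        """Return the data if a key we still hold verifies it, else None."""
        tag, data = cookie[:32], cookie[32:]
        for key in (self.current, self.previous):
            if key is None:
                continue
            candidate = hmac.new(key, data, hashlib.sha256).digest()
            if hmac.compare_digest(tag, candidate):
                return data
        return None
```

After two rotations the server itself can no longer make sense of an old cookie, which is the property the quoted text describes. A token with no expiry has no equivalent of the second `rotate()`.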

https://curvecp.org/packets.html

"Server Cookie packet details A Cookie packet is a 200-byte packet with the following format:

8 bytes: the ASCII bytes "RL3aNMXK".

16 bytes: the client's extension.

16 bytes: the server's extension.

16 bytes: a server-selected compressed nonce in little-endian form. This compressed nonce is implicitly prefixed by "CurveCPK" to form a 24-byte nonce.

144 bytes: a cryptographic box encrypted and authenticated to the client's short-term public key C' from the server's long-term public key S using this 24-byte nonce. The 128-byte plaintext inside the box has the following contents:

32 bytes: the server's short-term public key S'.

96 bytes: the server's cookie."
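
For anyone who wants to check the arithmetic (8 + 16 + 16 + 16 + 144 = 200), here is a small sketch that splits a Cookie packet into the fields listed above. It only slices bytes; it does not open the cryptographic box, and `CookiePacket` and `parse_cookie_packet` are names invented for this example.

```python
from typing import NamedTuple

COOKIE_MAGIC = b"RL3aNMXK"   # 8-byte packet identifier from the quoted layout
COOKIE_PACKET_LEN = 200
NONCE_PREFIX = b"CurveCPK"   # implicit prefix for the compressed nonce


class CookiePacket(NamedTuple):
    client_extension: bytes  # 16 bytes
    server_extension: bytes  # 16 bytes
    nonce: bytes             # 24 bytes: prefix + 16-byte compressed nonce
    box: bytes               # 144 bytes, still encrypted


def parse_cookie_packet(packet: bytes) -> CookiePacket:
    """Split a 200-byte Cookie packet into its fields without decrypting it."""
    if len(packet) != COOKIE_PACKET_LEN:
        raise ValueError("Cookie packet must be exactly 200 bytes")
    if packet[:8] != COOKIE_MAGIC:
        raise ValueError("not a Cookie packet")
    client_extension = packet[8:24]
    server_extension = packet[24:40]
    compressed_nonce = packet[40:56]
    box = packet[56:200]
    return CookiePacket(
        client_extension,
        server_extension,
        NONCE_PREFIX + compressed_nonce,
        box,
    )
```

Nothing outside the box is specific to the client beyond the extension fields, and the box itself is opaque to anyone without the keys.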

It appears CurveCP's two minute server cookie lacks any easily usable tracking information. There is no client IP address, no long-term client public key, no long-term server public key.

In contrast, the paper suggests QUIC's evergreen source address token containing the client IP address is intended to be reused in subsequent connections and can be easily used for tracking:

"The client caches the source address token and presents it to the server during the setup of a new connection. This allows a server to link the connection where the source-address token is initially issued with each subsequent connections where the same token is presented during the CHLO message. Finally, this enables the server to identify a chain of connections associated with a user."

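The linking described in that quote needs very little machinery on the server side. A toy sketch, with a made-up `LinkingServer` class and plain strings standing in for tokens and connection IDs (real tokens are opaque encrypted blobs):

```python
from collections import defaultdict
from typing import Dict, List, Optional


class LinkingServer:
    """Toy server that groups connections by the token the client presents."""

    def __init__(self) -> None:
        self._chains: Dict[str, List[str]] = defaultdict(list)
        self._next_token = 0

    def handle_connection(self, conn_id: str, presented_token: Optional[str] = None) -> str:
        """Record the connection and return the token the client should cache."""
        if presented_token is None or presented_token not in self._chains:
            # First contact (or the client threw its token away): issue a new one.
            presented_token = f"token-{self._next_token}"
            self._next_token += 1
        self._chains[presented_token].append(conn_id)
        return presented_token

    def chain_for(self, token: str) -> List[str]:
        """All connections the server can attribute to one cached token."""
        return list(self._chains.get(token, []))
```

A client that keeps presenting the same cached token ends up in one chain; a client that discards it starts a fresh chain, at the cost of redoing the slower handshake.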

The network layer can already be used for tracking in multiple ways, including HTTPS sessions, ETags, and cached files with identifiers. When browsers partition the network layer they need to partition connection state as well, which includes QUIC/HTTP3 state. Safari already does this, and it looks to me like Chrome and Firefox are doing it too: https://www.jefftk.com/p/shared-cache-is-going-away
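
As a rough illustration of what partitioning connection state means (this is not how any particular browser implements it), cached tokens can be keyed by the top-level site as well as the server, so a token obtained while visiting one site is never presented while visiting another. `PartitionedTokenStore` and the example hostnames are hypothetical.

```python
from typing import Dict, Optional, Tuple


class PartitionedTokenStore:
    """Client-side token cache keyed by (top-level site, server)."""

    def __init__(self) -> None:
        self._tokens: Dict[Tuple[str, str], bytes] = {}

    def put(self, top_level_site: str, server: str, token: bytes) -> None:
        self._tokens[(top_level_site, server)] = token

    def get(self, top_level_site: str, server: str) -> Optional[bytes]:
        # A token cached under one top-level site is invisible under another.
        return self._tokens.get((top_level_site, server))


store = PartitionedTokenStore()
store.put("news.example", "tracker.example", b"token-A")
same_site = store.get("news.example", "tracker.example")   # b"token-A"
cross_site = store.get("shop.example", "tracker.example")  # None
```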

(Disclosure: I work on ads at Google, speaking only for myself)


Yes, there are other ways to be tracked at the network level, sure. I don't see how that changes the discourse? Beyond the straight technical implications, isn't it concerning that a single company can roll out its own protocol across the server and browser stacks, affecting 7% of web traffic? Would it be more concerning if the same company had a certain interest in improving its tracking and data collection capabilities?

Also, I was expecting to find details around browsers implementing some form of network level partitioning at that link you posted, but failed. Care to spell it out for me?


The issue of using the browser cache to perform timing attacks, which you mentioned in the post above, has been known for two decades: https://sip.cs.princeton.edu/pub/webtiming.pdf

The fix you mentioned (getting rid of shared caching) is discussed in the above article from 2000.

Genuinely surprised this hasn't been fixed earlier.


In 2017 Brave disabled it:

> Back-story: Brave users reported ads getting past our ad-blocking shields in previous Chromium versions, beginning with reports of ads displaying on YouTube.com on October 11, 2016. uBlock Origin users had reported similar bugs. We discovered during testing that disabling QUIC seemed to stop these ads. As a result, we pushed an update to disable QUIC in Brave on January 25, 2017. This update appears to have temporarily abated the incoming bug reports about ads getting past our shields.

...

When we inspected web page traffic via chrome://net-internals, we discovered that QUIC requests were and still are being used for a majority of Google’s ad domains, including domains involved with bidding ...

source: https://brave.com/quic-in-the-wild/


Google software uses a google protocol to deliver google services. News at 11.

Is this the problem where we've shot ourselves in the foot by securing communication to the point that we can no longer block adequately by tampering with traffic?


> Is this the problem where we've shot ourselves in the foot securing communication

Isn't that the goal? Everyone seems to think Google's innovations are gifts to the world. They are and always have been solutions to problems Google faces. The only reasons they're open-sourced or shared with the community are to benefit Google. Anything beyond that is collateral benefit.


Per the Brave article, blocking is perfectly possible and works fine; it was just their blocker implementation not being ready for a different protocol.


My question is why does their ad-blocking code work with one protocol and not another? Is their ad blocker running within Chrome's TCP/TLS stack instead of something higher-level like WebRequest (or maybe Chrome makes it impossible to do that)?


This is about Google's QUIC ("gQUIC") and is based on a version from 2018. It explains in the Related Work section at the end that the IETF QUIC protocol has different properties.

The trade-off discussed, between keeping information for longer to make everything faster and throwing it away frequently to avert tracking, is _everywhere_ already. It's in your HTTP/1.1 web browser's Cookie behaviour, and it's in the TLS implementation's resumption feature.

One piece of good news is that, in the quest to speed things up, putting the public keys into DNS means it's no longer practical to give each client different keys so you can identify them that way (which this paper discusses as a potential attack). Because DNS is cache-based, everybody receiving the same version of the cached data will see the same keys. This isn't a problem for good guys, everything works, but if your goal was to track people against their will it's a problem.


They also state that IETF QUIC proposals don’t seem to cover all aspects like abuse of the source-address token.

To my understanding, the more identifiers you can muster, the more effective you are at identity stitching across data sets, resulting in higher-fidelity profiles. Are we at that point already where we're ok with just waving off another way to track what we do online?

Meanwhile, it seems there's already an implementation out there that covers ~7% of web traffic and is subject to this behavior. It's been implemented single-handedly by a company that, far too often these days, says standards are moving too slowly for it. That company also conveniently has a lot at stake in the tracking game.


Putting the keys in DNS seems really clever. With DoH, are requests bundled in a single session, or is the session stood up and torn down per request? I assume it's the former or this proposal wouldn't have gotten far, but I've never actually bothered to look that far into DoH.


A DNS query wrapped as HTTP actually makes a canonically good example of a safe TLS 1.3 0-RTT transaction so in principle you don't need to keep sessions alive.

In your first (1-RTT) DNS lookup you agree a PSK (a secret key) with the DoH server.

On the next DoH lookup you send only one message, and it goes like this:

Hi, it's me again. (The rest of the message is encrypted using the PSK). Here's a freshness check. I want to ask AAAA? www.google.com and also let's agree a new key for the next time I do this. Thanks, bye.

The DoH server will probably reply like this:

Hi. (The entire rest of the message is encrypted using the PSK). Here's proof I'm still me passing your freshness check. AAAA www.google.com answer is some:ipv6:address and yes, here is a new PSK for next time.

This is the same number of messages back and forth as with traditional UDP DNS albeit the messages are a little bit bigger now, and so it incurs the same latency.

Because this is 0-RTT the DoH server can't always be sure if it has seen your query before (doing this is trivial in a toy system with e.g. one DoH server on a Linux box but hard at scale with a distributed system). So a bad guy could replay the query. But, it's just a DNS query so replaying it doesn't achieve anything useful, and this doesn't help the bad guy learn anything about the query, they don't get to find out what it said or what the answer means.
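
For reference, the "DNS query wrapped as HTTP" part on its own can be sketched like this. It builds a DNS wire-format question and the DoH GET URL for it, using the base64url `dns` parameter that DoH defines. It does not perform the TLS 1.3 handshake or the PSK exchange described above, and the resolver URL `https://dns.example/dns-query` is a placeholder, not a real service.

```python
import base64
import struct

QTYPE_AAAA = 28  # IPv6 address record
QCLASS_IN = 1    # Internet class


def build_dns_query(name: str, qtype: int = QTYPE_AAAA) -> bytes:
    """Encode a single-question DNS query in wire format."""
    # Header: ID=0, flags=0x0100 (recursion desired), one question, no other records.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is length-prefixed; a zero byte terminates the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, QCLASS_IN)


def doh_get_url(resolver_url: str, query: bytes) -> str:
    """Wrap the query as a DoH GET request URL (base64url, padding stripped)."""
    encoded = base64.urlsafe_b64encode(query).rstrip(b"=").decode("ascii")
    return f"{resolver_url}?dns={encoded}"


url = doh_get_url("https://dns.example/dns-query", build_dns_query("www.google.com"))
```

The request is a pure lookup with no side effects, which is why replaying it gains an attacker nothing.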

[ Edited to remove mis-remembered DH for resumption, alas TLS 1.3 resumption PSKs are not forward secret ]


Proposals to put keys in DNS have a long history in IETF protocol development; they were always (?) shot down.

It was also the basis of deploying ubiquitous end2end IPsec on the internet, attempted by FreeSwan.


Regarding QUIC and tracking, has anyone heard about the MASQUE protocol proposal?

https://tools.ietf.org/id/draft-schinazi-masque-01.html

Any thoughts on this?


It's nice in that, thanks to QUIC, it enables higher performance than previous "looks like a standard web server" approaches, but other than that there isn't too much to say. It's just a straw man proposal, so the implementation details aren't there to talk about, and it's not a new idea apart from binding to QUIC as the primary transport.


"Survives a browser restart: no" seems to imply any method which tried to use QUIC alone would be highly unreliable.



