Every Byte of a TLS Connection Explained and Reproduced (ulfheim.net)
1185 points by aberoham 31 days ago | 103 comments



Author here - I was going to publish this today but it leaked out ahead of time. Enjoy!

EDIT: I'm putting a CDN in place.


Kudos! This is great work. I remember a few years back, when I was exploring TLS, there were no such resources, and it took many months of trial and error to get a reasonable understanding. I tried doing a similar thing, but was only able to do it for a couple of packets [0]. Surely such an illustration will go a long way toward helping newbies understand a complicated protocol.

[0] https://serializethoughts.com/2014/07/27/dissecting-tls-clie...


Thanks! This is really a fantastic resource. Something like a TLS connection is one of those things that seems really intimidating but when you see it laid out like this it makes you feel like 'I can understand this!'. Great work.


very nice! first thing I noticed is that at https://github.com/syncsynchalt/illustrated-tls/blob/master/... the content and samples are hardcoded into HTML - might it be nice if this was generated from some kind of JSON file or similar such that the approach you have here could be generalized to support any network protocol someone might want to annotate with descriptions?


Yes, if I do this again I will definitely generalize it into a content generator.

(Isn't that what every site turns into eventually, a custom CMS?)

As it is it's all tcpflow, hexdump, and vim.


Great illustration, I am going to share this with the rest of my team. One thing that I would find interesting is how x509 client certificates fit into the negotiation. I know this is optional so I don't know how it would fit into your flow easily though.


Yes, I thought about explaining client certs and others (the first request I got was to add ALPN to the connection), but there's already so much to talk about in even this simple connection that I thought it would detract from the document as a whole by making it even longer and denser.

I didn't even get a chance to explain the (normal, server-side) x509 certificate signing much because it just kept taking over the document. TLS is complex enough that just explaining the happy path is 400kb of HTML.


I did a presentation at our local software meetup on this topic a month ago, and wow, I could have really used this. It's way clearer and more organized than my muddled slides.. :-)

Very nice work !


Can you do this for DTLS, too? :=-)


This is the third request I've gotten for DTLS. Where are you weirdos coming from? :)


A canadian?

Nice.


This is wonderful!

I may make a version where the bytes used for lengths are highlighted, since it feels like so many bytes are lengths. Look at the SNI extension, which has three 16-bit lengths. I know why they're there, but SNI probably shouldn't be a list. Even if it were a list, an extension that consists solely of a list already has its size given by the extension length, so you shouldn't need two bytes for that. And if we recast SNI into just a type and a string, the string is clearly going to take the rest of the extension length, so it doesn't need a two-byte length either.
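
To make the three nested lengths concrete, here is a minimal Python sketch that builds and then parses a server_name extension with a single host_name entry. The helper names (`build_sni`, `parse_sni`) and the hostname are made up for illustration, the parser assumes exactly one entry in the list, and the layout follows my reading of the SNI definition in RFC 6066.

```python
import struct

def build_sni(hostname: str) -> bytes:
    """Build a server_name extension holding a single host_name entry."""
    name = hostname.encode("ascii")
    # Length #3: the hostname itself (name type 0 = host_name).
    entry = struct.pack("!BH", 0, len(name)) + name
    # Length #2: the server name list.
    name_list = struct.pack("!H", len(entry)) + entry
    # Length #1: the extension body (extension type 0 = server_name).
    return struct.pack("!HH", 0, len(name_list)) + name_list

def parse_sni(ext: bytes) -> str:
    """Parse a single-entry extension, rejecting inconsistent nested lengths."""
    ext_type, ext_len = struct.unpack_from("!HH", ext, 0)
    if ext_type != 0 or ext_len != len(ext) - 4:
        raise ValueError("bad extension header")
    (list_len,) = struct.unpack_from("!H", ext, 4)
    if list_len != ext_len - 2:
        raise ValueError("list length disagrees with extension length")
    name_type, name_len = struct.unpack_from("!BH", ext, 6)
    if name_type != 0 or name_len != list_len - 3:
        raise ValueError("name length disagrees with list length")
    return ext[9:9 + name_len].decode("ascii")

ext = build_sni("example.ulfheim.net")
print(ext[:9].hex(" "))   # the nine bytes of type and length overhead
print(parse_sni(ext))
```

Every one of the three lengths is derivable from the others here, which is why the parser has to check that they agree instead of trusting any single one.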


Agreed, and I think some pain must have gone into those redundant length bytes. The way they've done it makes it very easy and natural to extend any part of any record later, but it gets ridiculous when you have to document them.


Moreover, from a security perspective since we're talking about TLS, "overspecified" and/or "redundant" lengths are just begging to be made inconsistent and a source of vulnerabilities.


Usually if you mess up something there your implementation just doesn't work.


Vulnerabilities often arise from implementation bugs, no?


Yeah, but not every implementation detail is equally likely to lead to a vulnerability. Having said that, Heartbleed was a length issue.


Beautiful.

I recently had to implement two way auth over 1.2 and this would have saved much hair pulling. (and who does TWO way auth over an MPLS connection. Turns out, us).


I feel your pain. Two-way TLS is a funny thing, it's supported by the standards and even most implementations but its actual use is minuscule compared to "normal" one-way TLS, so much so that it's hard to find documentation even acknowledging two-way TLS exists, let alone how to use it.

And don't get me started about the hassles of obtaining signed certificates that are actually usable for client auth...


Would you be interested in some articles about this? A lot of our infrastructure bases trust around self-signed CA chains and mutual TLS authentication in different configurations, mostly involving elastic stacks, vault, nomad, consul - and we're migrating a lot of the cert handling to be fully automated. We're also extending this to a number of our Java applications as well.

I might be able to shed some light there, as this topic isn't actually that hard. It just requires some very careful thinking.


> And don't get me started about the hassles of obtaining signed certificates that are actually usable for client auth...

What sort of clients were you authenticating? The Web PKI needs to be trusted by random people from the whole world, but most mutually authenticated systems have a relatively small number of clients which are known to the server operator out-of-band. So probably the Web PKI is not the right choice. Instead you (the server operator or some neutral facilitator if it's a group of providers operating services for the same clients) should operate a CA for this purpose, not piggyback on the Web PKI.

One reason not to use the Web PKI if you aren't actually part of the public Internet is that we, to put it bluntly, don't give a shit about people who do that. Running a PKI is expensive (not just in dollar terms, it needs a bunch of smart, motivated people who are morally upright or it's worthless), and this one is ours, so it obeys our rules.

If you have your own PKI (or just one CA) you set the rules. Fifty year certificates for 1024-bit RSA? Why not. A current passport photograph baked into every certificate? Sure. Want the issuer to mint the keys and keep a copy? Do as you please. All those things are prohibited in the Web PKI.


Ignoring the Web PKI defaults though is probably a silly idea - e.g. long lived certificates with rubbish hash algorithms, huge certificates, and issuer kept keys are all really bad ideas, in almost any scenario.


Most two-way implementations are generally privately controlled. You can act as your own certification authority and sign both the client and server certificates for a private communication.

Totally curious, it sounds like you were going for a traditionally signed certificate approach. So you had clients that you didn't "own" -- normal Joe Public -- using your service? If so, that is definitely way outside the norm. If not, why didn't you just sign your own certs?


That's what the Noise protocol framework is for.


For client auth, assuming you control the server, would it be easier to just issue your own and install the root on the server?


Have any of you seen such a beautiful explanation for other protocols (TCP, 4-way handshake)?


They're in text form, but I've always loved the clarity of W. Richard Stevens' (RIP) books on TCP/IP.

He did a great job of demonstrating every byte and jitter on the wire and how it related to the underlying BSD TCP/IP stack.


I'm reminded of the STEPS report's use of ASCII art from the RFCs as a DSL for describing TCP:

Writeup/comment: http://www.moserware.com/2008/04/towards-moores-law-software...

Comment thread: https://news.ycombinator.com/item?id=846028

The vpri texts: http://www.vpri.org/writings.php

Appendix E and the section "A Tiny TCP/IP Using Non-deterministic Parsing" of "STEPS Toward The Reinvention of Programming: First Year Progress Report, December 2007":

http://www.vpri.org/pdf/tr2007008_steps.pdf


You can usually use Wireshark to find out the meaning of each byte for internet connections (and images), it works great.


However, Wireshark can mislead you if you don't understand what you're looking at.

It's a bit like having a low-level debugger. Say you're happy with low-level C, and the debugger says that variable 'k', which you know is a uint8_t that loops from 0 to 5, currently has the value 65. You should say to yourself, "Hmm, I bet the compiler used the same place to store variable 'c', which is a single byte from a text string, so this is just the capital letter A, and the compiler has realised it doesn't need to store the value of k even if it's technically in scope..." rather than "OMG my loop variable somehow massively exceeded its expected range, maybe cosmic rays have damaged the RAM".

With TLS, for example, if you give it a whole TLS 1.2 session, Wireshark will say oh, this is TLS 1.2. Fine. But if you show it only a TLS 1.2 connection that failed, Wireshark will say "Oh, this is TLS 1.0". Why? Well, the low-level protocol has been bodged over the years because of crappy middleboxes, so Wireshark doesn't actually know for sure, and rather than say "I don't know yet, I need to see more of the connection" it says TLS 1.0.

This can be a problem because you'll get amateurs saying "Our system can't talk to your server because you only do TLS 1.0" and you say "No. You are wrong" and they say "Look, here's a Wireshark trace" and sure enough Wireshark is telling them it's TLS 1.0 because their system has disconnected early (e.g. because they disabled all the crypto algorithms you allow), and so Wireshark wasn't sure and labels it TLS 1.0 rather than TLS 1.2.

This is going to happen again with TLS 1.3. TLS 1.3 deliberately says "Hi I'm TLS 1.2" (middleboxes again) and so that's what Wireshark will report (until you get a newer version that knows to look inside the supported_versions extension field for the version) and so you can bet that amateurs are going to say "Your service only does TLS 1.2" when actually their connection failed for some reason and they don't understand how to read the Wireshark trace.


Beautiful. A few years ago, we were implementing some TLS handshake stuff at Spotify (https://github.com/spotify/ssh-agent-tls) to be able to use your SSH key as a client-side cert for HTTPS connections.

The first couple drafts took forever to figure out, and I got a bunch of stuff wrong. This guide would've saved a ton of time back then.


There's a lot of practicality I learned from reading the Go source and implementing my own (e.g. always an array of a single value of 0 for compression method).

Now do a version with DTLS (one more message type, couple more fields on existing types, and logic concerning retries). Also, now do a TLS 1.3 one.


Yep, reading the Go source is what inspired me to do this, it's still very compact and readable.

I originally used an AEAD cipher but found it was impossible to demonstrate on the command line (openssl enc refuses to do AEAD because it can't confirm the authentication in a streaming context).

A friend asked me to demonstrate ALPN in this but as I looked into it I found it was distracting, as there's already so much going on and any new feature required digression. Maybe next time!

As for 1.3 my next project was going to be implementing it rather than documenting it. Just a throwaway implementation, nothing you'd want to use.


> As for 1.3 my next project was going to be implementing it rather than documenting it. Just a throwaway implementation, nothing you'd want to use.

Since you read the Go source, you might like [0]. I will say I personally think Go could have done better. I think it's too compact, too hidden, too disorganized, too underdocumented, and too inflexible/non-extensible. I began to pick some of it apart for a DTLS impl I started at [1], but put it on temporary hold yesterday due to other work obligations.

0 - https://github.com/cloudflare/tls-tris

1 - https://github.com/cretz/go-dtls


I've also had https://github.com/h2o/picotls suggested.


Link to the go source: https://golang.org/src/crypto/tls/


I have no need to look at that page at length ... but I am. This is very well made and interesting.


I remember implementing SSL in Java when I worked for Netscape back in the day. Nostalgia rush looking at this. Great stuff.


The irony is: I get "Your connection is not secure" when trying to open the link.


I've uncapped MaxRequestWorkers while I wait for the Cloudflare queue, it should work now (I believe you were seeing an error related to TLS timeout).



http://www.networksorcery.com/enp/Protocol.htm has a bunch of network diagrams for various protocols.

This website lays things out really nicely, would love to have more protocols :)


I love the way you present the data (where clicking on the hex values of the bytes gives their explanation). At work I work on software that communicates via RS-232 serial packets; I should make something that turns logs of the packet data into HTML files with an interface like this!


Good work on the illustration.

Using Diffie-Hellman to generate a shared key with each party's private key and the other party's public key is the part that amazed me most when I was trying to understand the handshake back then.
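
For anyone who wants to see that property in a few lines, here is a toy sketch of classic finite-field Diffie-Hellman in Python. The prime, the base, and the `keypair` helper are illustrative assumptions, not what the page or any real TLS stack uses (the connection on the page uses an elliptic curve exchange), and numbers this small are not secure.

```python
import secrets

# Toy parameters for illustration only: a Mersenne prime and a small base.
# Real TLS uses standardized groups or elliptic curves, never values like these.
P = 2**127 - 1
G = 3

def keypair():
    """Return (private, public) where public = G^private mod P."""
    private = secrets.randbelow(P - 3) + 2
    return private, pow(G, private, P)

client_priv, client_pub = keypair()
server_priv, server_pub = keypair()

# Each side combines its own private key with the other side's public key.
client_shared = pow(server_pub, client_priv, P)
server_shared = pow(client_pub, server_priv, P)

# Both sides arrive at G^(client_priv * server_priv) mod P.
print(client_shared == server_shared)
```

Only the public values cross the wire; an eavesdropper who sees both still has to solve a discrete logarithm to recover the shared value.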


So the record header has 2 bytes for the payload size, and the handshake header has 3 bytes. Am I correct in thinking that the 3 bytes of the handshake header are superfluous? Isn't the payload size limited to 2^16 bytes?


I think this is explained by the optional compression which is handled at the record layer.
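
For comparison, a small Python sketch of how the two headers nest. The function names and sample bytes are made up for illustration. One related point I am fairly confident of from RFC 5246 is that a single handshake message may be fragmented across several records, so the 3-byte length describes the whole message and not one record's payload.

```python
import struct

def parse_record_header(data: bytes):
    """5-byte record header: content type, protocol version, 2-byte length."""
    content_type, major, minor, length = struct.unpack_from("!BBBH", data, 0)
    return content_type, (major, minor), length

def parse_handshake_header(data: bytes):
    """4-byte handshake header: message type, then a 3-byte (24-bit) length."""
    msg_type = data[0]
    length = int.from_bytes(data[1:4], "big")
    return msg_type, length

# Made-up example: a handshake record (type 0x16) carrying the first fragment
# of a 70000-byte Certificate message (type 0x0b). The message does not fit
# in one record, so the record length covers only the first 16384 bytes.
fragment_len = 16384
record = bytes([0x16, 0x03, 0x03]) + struct.pack("!H", fragment_len)
handshake = bytes([0x0B]) + (70000).to_bytes(3, "big")

print(parse_record_header(record))
print(parse_handshake_header(handshake))
```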


There's a typo in the server keys calculation. It should read "server" and not "client" IMO.


> The parties had agreed on a cipher suite using ECDHE, meaning the keypairs will be based on a selected Elliptic Curve, Diffie-Hellman will be used, and the keypairs will be Ephemeral rather than using the public/private key from the certificate.

I think it's important to mention that even with ephemeral cipher suites, the server's ephemeral public key is signed using the server's certificate private key and verified by the client, since otherwise one would be able to MITM the key exchange.


It’s in the Server Key Exchange record (from memory) but I probably didn’t explain it well.


Nice, could have really used this a few years ago when I was making my own HTTP server implementation. Could someone make a PDF of this? Any C++ and C# examples of this?


I might post a PDF later (as the content becomes final), or make a printable view. In the meantime try this:

    # in your javascript console, paste this:
    [].forEach.call(document.querySelectorAll(".record, .calculation"), function(el){el.classList.add("selected")});
    [].forEach.call(document.querySelectorAll(".record, .calculation"), function(el){el.classList.add("annotate")});
    [].forEach.call(document.querySelectorAll("codesample"), function(el){el.classList.add("show")});

Then you can print the page to PDF.


As a crypto ignoramus - Why is the random data from each side necessary? Why can’t things just be encrypted with the PreMasterSecret directly ?


Involving random data gives everybody who gets to pick the random data (so in TLS that's both client and server) a freshness guarantee.

Because the other party needed to know you'd picked this particular random data to make the keys, the messages from them encrypted with those keys couldn't possibly have been pre-recorded / replayed.

In the ephemeral Diffie Hellman modes both parties contribute to the key anyway so this isn't as important, but with old school RSA the random values are the only thing preventing Replay attacks.

TLS 1.3 capable servers also scribble "DOWNGRD" in part of the random field if a client message says it can't do TLS 1.3. If a TLS 1.3 client sees that unusual "random" choice it knows bad guys tampered with the connection (attempted a downgrade attack). If bad guys just change the values, they won't match between client and server and the connection aborts. Older clients think nothing of the unusual random value and carry on as before.
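
A minimal Python sketch of the client-side check, assuming the sentinel values from RFC 8446 section 4.1.3 (the last eight bytes of the server random). The function name `looks_downgraded` is made up, and a real client would only run this check when it offered TLS 1.3 and was answered with an older version.

```python
import os

# Last eight bytes of the ServerHello random, per RFC 8446 section 4.1.3.
DOWNGRADE_TLS12 = b"DOWNGRD\x01"   # server negotiated TLS 1.2
DOWNGRADE_TLS11 = b"DOWNGRD\x00"   # server negotiated TLS 1.1 or below

def looks_downgraded(server_random: bytes) -> bool:
    """Return True if the server random ends with a downgrade sentinel."""
    if len(server_random) != 32:
        raise ValueError("server random must be 32 bytes")
    return server_random[-8:] in (DOWNGRADE_TLS12, DOWNGRADE_TLS11)

# A made-up server random: 24 random bytes followed by the TLS 1.2 sentinel.
marked = os.urandom(24) + DOWNGRADE_TLS12
print(looks_downgraded(marked))
```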


> TLS 1.3 capable servers also scribble "DOWNGRD" in part of the random field if a client message says it can't do TLS 1.3. If a TLS 1.3 client sees that unusual "random" choice it knows bad guys tampered with the connection (attempted a downgrade attack). If bad guys just change the values, they won't match between client and server and the connection aborts. Older clients think nothing of the unusual random value and carry on as before.

I haven't looked at the spec in detail, but does this mean that random generation has to specifically exclude that "sentinel value", lest it accidentally occur?


The probability of a particular seven bytes occurring by chance is less than one in a billion billion billion billion.


256⁷ ≈ 7.2E16, which is much less than billion⁴ = 1E36.

Still, the likelihood of this happening by chance is minuscule.


The actual feature uses an 8-byte value, it's just that the DOWNGRD part (the first 7 bytes) is intuitively easy to follow so why spell it all out in hexadecimal or whatever.

So it's one in 2^64 random connections.

Also the client isn't even checking for a possible downgrade if it got the protocol version it wanted (if I wanted TLS 1.3 and I got TLS 1.3 that is not a downgrade). So if "One in every 18 billion billion connections fails" is unacceptable, upgrade your servers and the problem vanishes.
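
The figures in this subthread are easy to check with a couple of lines of Python (plain integer arithmetic, nothing TLS-specific; the variable names are just for illustration):

```python
seven_bytes = 256 ** 7                 # distinct 7-byte strings, i.e. 2^56
eight_bytes = 2 ** 64                  # distinct 8-byte strings
billion_to_the_4th = (10 ** 9) ** 4    # "a billion billion billion billion"

print(f"{seven_bytes:.2e}")            # about 7.2 x 10^16
print(f"{eight_bytes:.2e}")            # about 1.8 x 10^19
print(seven_bytes < billion_to_the_4th)
```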


You’re right, not sure how I screwed up the math


It's about the nonce / initialization vector. Basically it is used inside the crypto primitives to add some entropy to the encryption itself and prevent a range of attacks. A good introduction to crypto is "Introduction to Modern Cryptography" imho.


This is very nice. I think the green on green and blue on blue highlights are a little hard to see.


Agreed - the color choice is one of the things I never really felt happy with (I'm not a frontend guy).


I love this. Excellent work. It would be cool to see step-by-step niblets of a client/server implementation of each step.


As someone who frequently deals with large scale integration failures, I greatly appreciate this site. Thanks syncsynchalt!


So does this happen for every single request, or is this done once per session?


The two "application data" records at the bottom are what would wrap around a request/response for an established connection - you can see how much is added to make the strings "ping" and "pong" (though some of that is padding to expand it to a multiple of 16 bytes).

And much of the rest of the records might go away completely if the session were resumed from the client's memory (this wasn't demonstrated here).
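
A rough Python sketch of where those extra bytes come from. It assumes a TLS 1.2 CBC cipher suite using AES with an HMAC-SHA1 MAC and minimum-length padding; those parameters and the function name `cbc_record_size` are assumptions made for this example, so treat the result as a lower bound under those assumptions.

```python
BLOCK = 16       # AES block size in bytes
IV_LEN = 16      # explicit per-record IV in TLS 1.2 CBC suites
MAC_LEN = 20     # HMAC-SHA1 output (assumed MAC for this sketch)
HEADER_LEN = 5   # record header: type, version, length

def cbc_record_size(plaintext_len: int) -> int:
    """Smallest on-the-wire size of one record under the assumptions above."""
    body = plaintext_len + MAC_LEN + 1      # +1 for the padding-length byte
    padded = -(-body // BLOCK) * BLOCK      # round up to a whole number of blocks
    return HEADER_LEN + IV_LEN + padded

print(cbc_record_size(len(b"ping")))
```

Under these assumptions the four bytes of "ping" turn into a record more than ten times that size, most of it IV, MAC, and padding.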


Per session.


Unrelated but: based on your understanding of establishing a TLS session with a server, and then traffic through that connection, do you think the new Gmail user interface for the web (desktop) is sufficiently speedy?

The reason I ask in this thread is that this thread treats some of the low-level minimum traffic necessary between clients and servers.


Sufficiently speedy for what?

It's not clear what you're asking. Gmail obviously runs over TLS. It also seems pretty nippy to me, but TLS only has a minor impact on the speed.


>It also seems pretty nippy to me

OK. (my experience since the redesign is the opposite.)


The TLS overhead is pretty negligible for gmail. Even though there are a lot of steps here it is only 2 round trips for the handshake. Gmail will also use TLS 1.3 (this is 1.2) if your browser supports it which cuts that down to one round trip.[0]

[0] https://www.cloudflare.com/learning-resources/tls-1-3/


Gmail has used TLS since day 1. If it's slow for you the first likely culprit is the sheer weight of Javascript and DOM work, followed by making more web requests than necessary.


>the sheer weight of Javascript and DOM work, followed by making more web requests than necessary.

can you (or anyone) put this into quantitative terms? How much are we talking about here? I realize this is a bit off-topic, but the topic is "every byte explained", so I think the people who are interested are in the right place to discuss it.


You can see a lot of what's going on with, e.g., DevTools in Chrome. Right click -> Inspect.

You'll want to go to the performance tab, and then record and reload the page (ctrl-shift-e). For me, about 3/4ths of the time is spent 'scripting' which is 'JS and DOM work'.

It's not terribly enlightening, though, because all the JS is minified and takes some work to understand.


how much time is that for you? How does it compare to another dynamic site such as the one we're on (HN) when you're logged in?


What does TLS have to do with the Gmail redesign? It used TLS before the redesign, didn't it?


It gives a target or minimum for a fast and lightweight secure web application.


This is what happens under the hood of every ping request.


A ping is very different. A ping is (typically) simply an ICMP Echo Request (not TCP, thus no TLS, etc.). The receiving device, if it accepts echo requests and is configured to answer them, responds with an ICMP Echo Reply. Alternatively, some device in the middle (or the device itself) could respond with an ICMP unreachable or some other response, or quite simply drop the ICMP Echo Request entirely and silently.

*Edited an incorrect UDP reference out based on the below comment.


If I'm not mistaken, it's not UDP, it's ICMP, like you said.


Ahh yeah - good call. Totally different protocol. I guess ICMP more closely resembles UDP at the end of the day, but you're absolutely right. I edited out the incorrect UDP reference so that a person reading for the first time will not get misled. Thanks!


Which is also why some poorly configured network devices and firewalls will eat pings - if they, for example, whitelist the TCP and UDP protocols and drop everything else (yes, that's a bad idea).

https://security.stackexchange.com/questions/22711/is-it-a-b...


ping would refer to https://en.wikipedia.org/wiki/Ping_(networking_utility) in this context, which is quite unrelated to TLS.


very fine work. the data flow and annotation are clear and concise. good job


Site's timing out and google cache is giving me a 404...


Cool! Would you be able to add an ASCII view beside it?


Not trying to come off trite, but the RFC [0] has a simple ASCII diagram of the message flow and the structures that follow are fairly easy to read. Granted you have to hop to a couple of other RFCs to understand extensions and maybe even real world impls to understand some changes (e.g. no time in the random block of client hello), but it's worth perusing if you're interested.

0 - https://tools.ietf.org/html/rfc5246#section-7.3


I think they mean a side-by-side ASCII interpretation like that supplied by `hexdump -C` - where any printable character is printed as itself and every other character is printed as a "."

It gives a nice strings-like view of raw data.
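
A small Python sketch of that kind of view, for anyone who wants it on their own captures. The function name `hexdump_c` and the sample bytes are made up, and the column layout only approximates what `hexdump -C` prints.

```python
def hexdump_c(data: bytes) -> str:
    """Format bytes roughly the way `hexdump -C` does."""
    lines = []
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        left = " ".join(f"{b:02x}" for b in chunk[:8])
        right = " ".join(f"{b:02x}" for b in chunk[8:])
        # Printable ASCII is shown as itself, everything else as a dot.
        text = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        lines.append(f"{offset:08x}  {left:<23}  {right:<23}  |{text}|")
    lines.append(f"{len(data):08x}")   # final line: total length
    return "\n".join(lines)

# Made-up sample: five header-like bytes followed by readable text.
print(hexdump_c(b"\x17\x03\x03\x00\x04ping"))
```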


I was afraid that it wouldn't fit and might be overly complex for what was gained (there's not much text in there). However there's a PCAP at https://github.com/syncsynchalt/illustrated-tls/tree/master/... if you'd like to load it up in wireshark or tcpdump.


Explained in detail, also the web page is well-built.

Thank you.


Yes very useful. Thanks for your effort.


HN hug of death? The page is timing out.


tls.ulfheim.net - "This site is blocked due to a security threat."

ulfheim.net - no problem

At least according to my megacorp threat filter. I have never actually seen something blocked before, and it's a shame because the page would be great to share with my team.


Wonder why that would be, tls.ulfheim.net is a CNAME to ulfheim.net, they're just different apache vhosts. Same cert (using SAN).

The only things I can think of:

  - it doesn't like the hostname (tls?)
  - the hostname is new, and has no reputation
  - too much h4cking content


This is one of the major problems with threat feeds.


Kind of ironic that their website is using an invalid security certificate?


The site is currently presenting a Let's Encrypt certificate issued just under 24 hours ago. There have historically always (for reasonable definitions of always) been valid certs for this site, although of course I can't tell from here that they were always properly installed.

Likely explanations for your experience:

1. Your clock is wrong. If your system currently thinks this is Thursday 11 October for example, that's a problem, 'cos this is Friday 12 October.

2. There's some subtle configuration error on their server (seems unlikely as it looks to be just a generic AWS setup) that results in the wrong certificate being presented.

3. Your OS or browser trust store lacks the root CA "DST Root CA X3" operated by IdenTrust. If you didn't deliberately choose to do this, you should investigate as most likely you aren't getting important security updates.

All three causes can often be diagnosed by closely examining the detailed error reported in a browser, e.g. SEC_ERROR_EXPIRED_CERTIFICATE.


https://news.ycombinator.com/item?id=18201211

It might've been invalid while he set up his CDN / reverse proxy?


It looks pretty valid to my browsers (chrome/firefox)?


Thanks, very useful.


Thank you for sharing.


nice (y)


nice work!


Very detailed explanation. Useful even for a refresher on the mechanics. Well done!



