Ask HN: Is there a “ground-up” explanation of PGP/GnuPG?
121 points by kqr on Nov 30, 2016 | 29 comments
Understanding how git works internally "from the ground up" has been incredibly helpful in my everyday work; things like blobs, commit objects, hashes and how they connect to form the git experience as I know it. Where I had been cargo-culting along previously, it all became clear once I understood the fundamental model of what was going on underneath the interface.

I feel like the same thing could apply to PGP/GnuPG. I am cargo-culting my way along, but I think I would feel much, much, much more comfortable if I knew how it worked from the ground up.

I have loose ideas of asymmetric cryptography and trust circles and such, but nothing concrete to hinge my actions upon, so I mostly try different permutations of command line arguments until GPG appears to do what I want it to do.

Is there a "from the ground up" good guide to PGP that allows me to break out of this pattern?

I don't know of any full explanations where they dissect the data. However, you mentioned (in another comment) that you are familiar with RSA already, so assuming a basic code and crypto background, and from what I know on a high level, PGP messages are something like this:

    print(number of recipients, algorithm used, etc.)
    for each recipient:
        print(RSA_encrypt(symmetric_key, recipient.public_key))
    print(AES_encrypt(message + hash(message), symmetric_key))

Typically, if you send an email from a@example.com to b@example.com, the client looks up the public keys of both parties and encrypts the symmetric key to each of them. The sender obviously wrote the message, but they might want to read it back later, so the symmetric key is encrypted with their public key as well as the recipient's.

A random symmetric key is chosen to encrypt the message, since it would be silly to encrypt the whole message for each recipient again and again. And even if there's only one recipient, random key generation plus symmetric key encryption is typically faster than encrypting the whole message with asymmetric crypto (unless the message is just a few bytes, in which case it's fast regardless).

File encryption probably works the same way, except you're typically the sole recipient.
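
The scheme described above can be sketched in a few lines of Python. This is a toy under loud assumptions, not what GnuPG actually does: the RSA keys are textbook RSA (no padding) built from small Mersenne primes, and `keystream_xor` is a hypothetical stand-in for AES made from SHA-256. The names `make_keypair`, `encrypt` and `decrypt` are made up for the illustration. Only the structure matters: one random session key, wrapped once per recipient, and one symmetrically encrypted body.

```python
import hashlib
import secrets

def make_keypair(p, q, e=65537):
    """Toy textbook-RSA key pair from two primes. Far too small for real use."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (needs Python 3.8+)
    return (n, e), (n, d)

def keystream_xor(data, key):
    """Stand-in for AES (an assumption): XOR with a SHA-256 counter keystream."""
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        out.extend(a ^ b for a, b in zip(data[i:i + 32], block))
    return bytes(out)

def encrypt(message, recipient_public_keys):
    session_key = secrets.token_bytes(16)          # random symmetric key
    k = int.from_bytes(session_key, "big")
    # One small RSA operation per recipient, on the session key only.
    wrapped = [pow(k, e, n) for (n, e) in recipient_public_keys]
    # The message (plus its hash) is encrypted exactly once.
    body = keystream_xor(message + hashlib.sha256(message).digest(), session_key)
    return wrapped, body

def decrypt(wrapped_key, body, private_key):
    n, d = private_key
    session_key = pow(wrapped_key, d, n).to_bytes(16, "big")
    plain = keystream_xor(body, session_key)
    message, digest = plain[:-32], plain[-32:]
    if hashlib.sha256(message).digest() != digest:
        raise ValueError("integrity check failed")
    return message

# Sender (alice) encrypts to the recipient and to herself.
alice_pub, alice_priv = make_keypair(2**89 - 1, 2**107 - 1)
bob_pub, bob_priv = make_keypair(2**61 - 1, 2**127 - 1)
wrapped, body = encrypt(b"meet at noon", [alice_pub, bob_pub])
print(decrypt(wrapped[1], body, bob_priv))    # recipient reads it
print(decrypt(wrapped[0], body, alice_priv))  # sender can read it back
```

Adding a recipient adds one wrapped key to the list; the body stays the same size.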

Signatures are made by encrypting a hash of the message with your private key; anyone can decrypt it with your public key and compare the result against their own hash of the message. Since you're the only person with the private key, you are the only person who could have encrypted that hash, and since it is infeasible to find a different text with the same hash, you must have wanted to sign this text. (N.B. With RSA, both keys, public and private, can be used for both encryption and decryption; whatever one key encrypts, only the other key can decrypt.) The hash is used rather than the full message both for speed and because it makes your signature a lot shorter.
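
As a rough sketch of that signing idea, assuming RSA as in the comment above: the key below is a toy textbook-RSA key built from two Mersenne primes, and the SHA-256 digest is truncated so it fits under the tiny modulus. Real OpenPGP signatures pad the full digest and hash extra metadata as well, so treat `sign`, `verify` and `digest_as_int` as hypothetical names for illustration only.

```python
import hashlib

# Toy key: two known Mersenne primes. Far too small for real use.
P, Q, E = 2**89 - 1, 2**107 - 1, 65537
N = P * Q                            # public modulus
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)

def digest_as_int(message):
    # Truncated to 16 bytes only so the value stays below the toy modulus.
    return int.from_bytes(hashlib.sha256(message).digest()[:16], "big")

def sign(message, d=D, n=N):
    """'Encrypt' the hash with the private exponent."""
    return pow(digest_as_int(message), d, n)

def verify(message, signature, e=E, n=N):
    """'Decrypt' with the public exponent and compare against our own hash."""
    return pow(signature, e, n) == digest_as_int(message)

sig = sign(b"I agree to the terms")
print(verify(b"I agree to the terms", sig))   # True
print(verify(b"I agree to the terms!", sig))  # False: the text was altered
```

The signature is one number of roughly the modulus size, no matter how long the message is.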

Did I miss anything, at least from a crypto standpoint (since I don't know details of the file structure)?

It depends a lot on from which angle you want to understand it. There's a difference between "understanding the variety of command line options" vs. "understanding the meaning of the raw data structures". I learned quite a bit by looking up things in the RFC: https://tools.ietf.org/html/rfc4880

I'm hoping that by understanding the meaning of the raw data structures, I can ask much more educated questions when I am faced with a new operation I want to perform using GnuPG. The idea is that instead of asking myself "what happens if I set a new, future expiration date on a revoked key?" with no clear answer, I could just think in terms of (e.g., I have no idea how it really works) "since the expiration date is set as an optional header extension to the key data structure, and the revocation bit is maintained in the mandatory header, the revocation bit takes precedence over any other extension headers, including the new expiry date I accidentally set."

This way, knowing the raw data structures makes it easier for me to figure out which command line arguments I want, if you will.

That's like saying that if I understand the layout of the Ext4 filesystem, I can ask much more educated questions about how to write shell scripts or other kinds of programs to manipulate files in various ways.

I haven't read it, but "PGP: Source Code and Internals", by PGP's author Philip Zimmerman is worth a try. I've read his other book "Official PGP User's Guide" and learned quite a bit.


Apparently you didn't learn the correct number of n's in Zimmermann ;)

(prz is very particular about his n's)

Mea culpa. I could try to blame autocorrect, but the fault is entirely mine.

> Understanding how git works internally "from the ground up" has been incredibly helpful in my everyday work;

I'm a very competent git user and don't know how blobs work (nor care). I haven't hit a problem scenario so far in which I had to dissect a blob.

> I feel like the same thing could apply to PGP/GnuPG.

If you don't know crypto, the internals of GnuPG are a bad way to learn it.

I would recommend reading a book, such as Applied Cryptography by Bruce Schneier.

Someone who reads that should be a much better informed, much more sophisticated user of crypto, whether it be an application like GnuPG or some cryptographic programming library or communication protocol. A developer who reads that book should have the know-how to implement some crypto and spot some crypto-related security flaws.

How GnuPG stores things in various formats is less important than the semantics of those things: what a private key is, what a signature is, and so on. You need to understand what is happening when you, say, verify a signature; just not necessarily at the bit level.

If you have time and are fine with it being a bit dry, you can read RFC4880 [0], the RFC for OpenPGP.

This is something I have done some work on (I wrote a basic implementation in an attempt to understand a while ago [1]), but I don't have a nice writeup.

An OpenPGP file, whether it is a public key or encrypted file, consists of a list of packets. Generally it is a binary file, but an armored file consists of this binary in base64 and then a checksum. You can get these packets with gpg --list-packets <file>
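
For the armored form mentioned above, the checksum is a CRC-24 over the binary data, written as its own base64 line starting with `=`. Here is a minimal sketch using the initial value and polynomial given in RFC 4880; the `armor` helper, its 64-character line width and its lack of armor headers are simplifications I chose for illustration, not GnuPG's code.

```python
import base64

CRC24_INIT = 0xB704CE   # constants from RFC 4880, section 6.1
CRC24_POLY = 0x1864CFB

def crc24(data):
    """CRC-24 as used by OpenPGP ASCII armor."""
    crc = CRC24_INIT
    for byte in data:
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:
                crc ^= CRC24_POLY
    return crc & 0xFFFFFF

def armor(binary, kind="MESSAGE"):
    """Wrap raw packet bytes in a simplified ASCII armor (no header lines)."""
    body = base64.b64encode(binary).decode("ascii")
    lines = [body[i:i + 64] for i in range(0, len(body), 64)]
    checksum = base64.b64encode(crc24(binary).to_bytes(3, "big")).decode("ascii")
    return "\n".join(
        [f"-----BEGIN PGP {kind}-----", ""]
        + lines
        + ["=" + checksum, f"-----END PGP {kind}-----"]
    )

# Arbitrary bytes, just to show the layout; this is not a valid OpenPGP message.
print(armor(bytes(range(100))))
```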

Example output from a signed and encrypted file

  gpg: encrypted with 2048-bit RSA key, ID 09FBFEF359DD186F, created 2016-11-30
        "asdfas <sdfasdfasd@asdfasd.asdf>"
  # off=0 ctb=85 tag=1 hlen=3 plen=268
  :pubkey enc packet: version 3, algo 1, keyid 09FBFEF359DD186F
      data: [2047 bits]
  # off=271 ctb=d2 tag=18 hlen=3 plen=377 new-ctb
  :encrypted data packet:
      length: 377
      mdc_method: 2
  # off=293 ctb=a3 tag=8 hlen=1 plen=0 indeterminate
  :compressed packet: algo=2
  # off=295 ctb=90 tag=4 hlen=2 plen=13
  :onepass_sig packet: keyid 0D3B106118D1EFBE
      version 3, sigclass 0x00, digest 8, pubkey 1, last=1
  # off=310 ctb=ac tag=11 hlen=2 plen=19
  :literal data packet:
      mode b (62), created 1480523012, name="file.txt",
      raw data: 5 bytes
  # off=331 ctb=89 tag=2 hlen=3 plen=284
  :signature packet: algo 1, keyid 0D3B106118D1EFBE
      version 4, created 1480523012, md5len 0, sigclass 0x00
      digest algo 8, begin of digest 05 c4
      hashed subpkt 2 len 4 (sig created 2016-11-30)
      subpkt 16 len 8 (issuer key ID 0D3B106118D1EFBE)
      data: [2046 bits]
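
Each pubkey enc packet contains the key used to encrypt the data, itself encrypted to one recipient's public key. The encrypted data packet holds that symmetrically encrypted data; in this example it unpacks into the compressed, one-pass signature, literal data and signature packets listed after it.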
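
The `ctb`, `tag`, `hlen` and `plen` fields in that listing come straight from the packet header rules in RFC 4880, section 4.2. A minimal sketch of decoding them follows; `parse_packet_header` is a name I made up, and partial body lengths are deliberately left unhandled.

```python
def parse_packet_header(buf, offset=0):
    """Return (tag, header_len, body_len) for the packet starting at offset.

    body_len is None for indeterminate (old format) or partial (new format)
    lengths, which this sketch does not decode further.
    """
    ctb = buf[offset]
    if not ctb & 0x80:
        raise ValueError("bit 7 of the first octet must be set")
    if ctb & 0x40:
        # New format: tag in the low 6 bits, variable-length length field.
        tag = ctb & 0x3F
        first = buf[offset + 1]
        if first < 192:
            return tag, 2, first
        if first < 224:
            return tag, 3, ((first - 192) << 8) + buf[offset + 2] + 192
        if first == 255:
            return tag, 6, int.from_bytes(buf[offset + 2:offset + 6], "big")
        return tag, 2, None  # partial body length
    # Old format: tag in bits 5-2, length type in bits 1-0.
    tag = (ctb >> 2) & 0x0F
    length_type = ctb & 0x03
    if length_type == 3:
        return tag, 1, None  # indeterminate length
    size = 1 << length_type  # 1, 2 or 4 length octets
    body_len = int.from_bytes(buf[offset + 1:offset + 1 + size], "big")
    return tag, 1 + size, body_len

# Header bytes reconstructed from the ctb/hlen/plen values in the listing above.
print(parse_packet_header(bytes([0x85, 0x01, 0x0C])))  # (1, 3, 268)
print(parse_packet_header(bytes([0xD2, 0xC0, 0xB9])))  # (18, 3, 377)
print(parse_packet_header(bytes([0xA3])))              # (8, 1, None)
```

Walking a whole file is then a loop that adds `header_len + body_len` to the offset, as long as no packet has an indeterminate or partial length.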

When I have more time, I may do a more useful writeup on my site, but currently I am too busy.

[0] https://www.ietf.org/rfc/rfc4880.txt

[1] All I could find was my file parsing code; I dumped it at https://github.com/artemist/mupg

If you have some pointers to explanations of Git internals, similar to what you're looking for for PGP/GnuPG, can you provide them? That would be useful and illustrative :)

Scott Chacon's videos are priceless. (He's the co-author of the git book.) They explain exactly how git works internally in the simplest terms possible. Either or both of these:

Introduction to Git - talk by Scott Chacon https://www.youtube.com/watch?v=xbLVvrb2-fY

Introduction to Git with Scott Chacon of GitHub https://www.youtube.com/watch?v=ZDR433b0HJY

The latter is newer but a bit longer.

Don't be fooled by the video names; these are introductions to the internals, not just the interface.

I'm not OP, but I found this to be useful.


Not the OP, but I have found the following useful:

https://git-scm.com/book/en/v2/ scroll to Chapter 10 "Git Internals"

(Direct link: https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Po... but it only shows you the first page among nine.)

The official git page has very good documentation regarding git internals. (The whole book is worth reading) https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Po... gives you a good overview on how git handles things internally.

Git from the Bottom Up is very good. Previous discussion: https://news.ycombinator.com/item?id=10199391

An Advanced Introduction to GnuPG by Neal H. Walfield


Although it doesn't cover PGP/GnuPG, I found "The Architecture of Open Source Applications" to be very interesting and something that deserves to be more widely known.

My work required me to read the ITK and VTK parts. Git and GDB are also very nice.


Ooh, ooh, ooh they're partly by Greg Wilson! I'm a huge fan of him! Shame they don't cover GnuPG. :(

This is a great pair of videos that intuitively explain Diffie-Hellman key exchange and then the RSA algorithm in a way that requires no previous knowledge of cryptography:

"Public key cryptography - Diffie-Hellman Key Exchange" https://www.youtube.com/watch?v=YEBfamv-_do

"Public Key Cryptography: RSA Encryption Algorithm" https://www.youtube.com/watch?v=wXB-V_Keiu8

Good coverage of the fundamentals before attempting a deeper understanding of PGP specifically.

I found The Code Book a helpful read, though it's very much a high-level overview.

If you want to really understand what's going on at low level, one option is to just read the RFC and follow the references.

I was going to recommend Singh's book as well. I have an older edition of it, and some of it is certainly out of date (I think it has a whole chapter on Freenet, which may still be around but these days has been eclipsed by Tor), but the core principles are quite good. It's among the best explanations of asymmetric cryptography that I've read, anyway.

The low level is more than well covered; what isn't discussed enough is the actual use of GPG in different scenarios.

To understand the low level you have to learn enough cryptography. For example, to understand the logic of the RSA algorithm, read:


How RSA works belongs to those snippets of random information that I do possess, but can't reliably link together to get a full picture of the PGP experience.

So what are you missing? There are of course other crypto primitives involved: even when public-key algorithms are used, the whole message is encrypted with a symmetric cipher, and only the key for the message is encrypted with public-key cryptography. An example of a symmetric cipher is AES,


but note that older GPG versions used weaker symmetric ciphers by default.

There's the RFC about the format of the OpenPGP message:


https://www.nostarch.com/pgp.htm for the command line overview from a sysadmin perspective; then look at the PGP/public key crypto section here (or read the whole thing): https://www.crypto101.io/

I wouldn't call it 'ground-up' but you may find this useful: https://www.gnupg.org/gph/en/manual.html
