Why does APT not use HTTPS? (whydoesaptnotusehttps.com)
515 points by rishabhd on Jan 21, 2019 | 420 comments

I guess I can copy over a comment I made when this previously made the rounds:

I have a few problems with this. The short summary of these claims is “APT checks signatures, therefore downloads for APT don’t need to be HTTPS”.

The whole argument relies on the idea that APT is the only client that will ever download content from these hosts. This is however not true. Packages can be manually downloaded from packages.debian.org and they reference the same insecure mirrors. At the very least Debian should make sure that there are a few HTTPS mirrors that they use for the direct download links.

Furthermore Debian also provides ISO downloads over the same HTTP mirrors, which are also not automatically checked. While they can theoretically be checked with PGP signatures it is wishful thinking to assume everyone will do that.

Finally the chapter about CAs and TLS is - sorry - baseless fearmongering. Yeah, there are problems with CAs, but deducing from that that “HTTPS provides little-to-no protection against a targeted attack on your distribution’s mirror network” is, to put it mildly, nonsense. Compromising a CA is not trivial and due to CT it’s almost certain that such an attempt will be uncovered later. The CA ecosystem has improved a lot in recent years, please update your views accordingly.

The other big problem is that people can see what you're downloading. Might not be a big deal but consider:

1. You're in China and you download some VPN software over APT. A seemingly innocuous call to package server is now a clear violation of Chinese law.

2. Even in the US, it can leak all kinds of information about your work habits, what you're working on, etc.

3. If it's running on a server, it could leak what vulnerable software you have installed or what versions of various packages you're running to make exploiting known vulnerabilities easier

Even with HTTPS, it's easy enough to figure out which package is being downloaded based on the download size, as explained in TFA.

Every time this comes up, it's always the same handful of incorrect arguments made in favor of HTTPS.

The cargo-cult mentality that HTTPS == security really does more damage than it does good.

I mean, deducing the package from download size is way harder than just seeing the name in the open. Security is rarely perfect and more like an arms race; making things more difficult is a big deal. This kind of "you can hack Y too" argument doesn't make any sense if it's way harder to hack Y than X.

I would agree with you IF it were actually "way harder", but realistically it's no more difficult than checking the file name or the md5 checksum.

What if keepalive is used and multiple packages are downloaded? Surely that is at least plausible deniability.

Is that even true? In practice rather than downloading a single package you'd download/update a bunch of packages over the same connection, and an attacker would only see the accumulated size, right?

You can see when you run "apt-get install ..." or "apt-get upgrade" that it opens multiple connections to download packages...

And the Debian contributor who wrote TFA says it's possible, and I'm sure he knows a lot more about it than I do.

I'm not sure how APT handles connections, but with a typical browser connections will be reused if requests are made shortly after another.

That doesn't mean it's impossible to determine what packages you downloaded. But it will be more effort to do so.

Less so with HTTP/2 and TLS 1.3; endpoints are also hard to detect with ESNI.

No they aren't. HTTPS fingerprinting is easy. It's been done by lcamtuf years ago and it's available as a layer 7 filter in Linux... TLS adds more information because it prevents proxies and has specific server implementations.

This is addressed in the "privacy" section. Basically, your premise is wrong. TLS does not provide additional privacy for this use case. The short summary is: the size of transmitted data makes inferring what you've downloaded from a public file mirror trivial for a passive observer, even with TLS.
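To make the size argument concrete, here is a minimal sketch of how a passive observer could map an observed transfer size back to package names. The package sizes, the `overhead` parameter, and the function names are all invented for illustration; a real observer would build the table from a mirror's public index.

```python
from collections import defaultdict

# Hypothetical package sizes in bytes (invented for illustration).
PACKAGE_SIZES = {
    "openvpn": 482_112,
    "tor": 1_873_408,
    "nginx": 88_264,
    "vim": 1_280_512,
    "nano": 482_112,  # deliberately collides with openvpn
}


def build_size_index(sizes):
    """Invert a name -> size table into size -> [names]."""
    index = defaultdict(list)
    for name, size in sizes.items():
        index[size].append(name)
    return index


def candidates(observed_bytes, index, overhead=0):
    """Return the packages whose size matches an observed transfer.

    `overhead` is an assumed fixed number of protocol bytes to subtract.
    """
    return sorted(index.get(observed_bytes - overhead, []))


INDEX = build_size_index(PACKAGE_SIZES)
```

A unique size identifies the package outright; only a collision (here `nano` and `openvpn`) leaves the observer guessing.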

Perhaps apt should pad its downloads

Padding doesn't solve the privacy problem without unacceptable tradeoffs (i.e., pad all packages to the same size).

Pad all packages to the nearest in a list of sizes?

How many bits of privacy are you willing to give up here? Debian only has about 48,000 packages. That's almost 16 bits, total, with perfect privacy — all packages enlarged to the same size.

You can select a list of sizes trading off collisions (2 packages with same size => 1 bit of privacy; 4 => 2 bits, etc). But the most you ever get is (nearly) 16. The amount of padding you need to even get 2 bits of privacy (giving up almost 14) on the long tail of large packages is going to be "a lot" and it grows as you want more bits.
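The arithmetic above can be sketched roughly like this. The 48,000 figure comes from the comment; the bucket list and function names are assumptions made up for the example.

```python
import bisect
import math


def total_bits(package_count):
    """Bits needed to single out one package among `package_count`."""
    return math.log2(package_count)


def privacy_bits(bucket_population):
    """Bits an observer cannot resolve when this many packages share one padded size."""
    return math.log2(bucket_population)


def leaked_bits(package_count, bucket_population):
    """Bits the padded size still reveals about which package was fetched."""
    return total_bits(package_count) - privacy_bits(bucket_population)


def padded_size(size, buckets):
    """Round `size` up to the nearest entry of a sorted list of allowed sizes."""
    i = bisect.bisect_left(buckets, size)
    if i == len(buckets):
        raise ValueError("size exceeds the largest bucket")
    return buckets[i]
```

With these definitions, `total_bits(48000)` comes out a little under 16, and a bucket shared by four packages hides 2 of those bits while leaking the rest.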

Some further pros and cons of padding are discussed elsewhere in the thread.

Providing privacy for the packages that, including dependencies, are less than 100MB in size is something that's probably worth doing. The cost of padding an apt-get process to the nearest say, 100MB, is not necessarily infeasible as far as bandwidth goes.

Instead of padding individual files, how about a means to arbitrarily download some number of bytes from an infinite stream? That would appear to be sufficient to prevent file size analysis (but probably not timing attacks).

Exposing something like /dev/random via a symbolic link and allowing the apt-get client to close the stream after the total transfer reaches 100MB would appear to make it harder to infer packages based on the transferred bytes, without being very difficult to roll out.

This has already been discussed in this very discussion.

* https://news.ycombinator.com/item?id=18959470

It's a big step from "just `cat` the traffic to know" to "start comparing file-sizes and hope no sizes match" in terms of privacy.

I'd say you're flat out wrong — there is no big step. We'll have to agree to disagree, I guess.

Oh, another thing I just thought of: It could leak what vulnerable software you have installed / what version you're running to make exploits easier.

Here is why Videolan (VLC) doesn't do it: https://www.beauzee.fr/2017/07/04/videolan-and-https/

There were some topics about it yesterday.

Eg. https://news.ycombinator.com/item?id=18948195

Some arguments of the blog post are also valid here

Also, their claim that HTTPS doesn’t hide the hosts that you are visiting is about to not be true. Encrypted SNI is now part of the TLS 1.3 RFC, so HTTPS will actually hide one’s internet browsing habits quite well. The only holes in privacy left on the web’s stack are in DNS.

Can you point out where encrypted SNI is in the RFC? I've read the RFC, and I don't recall it being in there. I do see that there is an extension published, which I haven't reviewed in depth.

From a brief review, I see two potential issues:

a) the encrypted sni record contains a digest of the public key structure. This digest is transmitted in the clear (as it must be at this phase of the protocol), so a determined attacker could create a database of values for the top N mirror sites.

b) in order to be useful, the private key for the public keys would need to be shared across all servers supporting that hostname. That's not a big deal for a normal deployment, but it's not great for a volunteer mirrors system -- lots of diverse organizations own and operate the individual mirrors and we need to count on all of those to keep it secure. Also, it adds an extra layer of key management, which is an organizational and operational burden.
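Issue (a) could look something like the sketch below. It assumes the cleartext digest is a plain SHA-256 over the published key structure, which may not match the draft exactly; the hostnames and key bytes are placeholders, not real ESNI records.

```python
import hashlib


def record_digest(key_structure: bytes) -> bytes:
    """Assumed digest: SHA-256 over the published public-key structure."""
    return hashlib.sha256(key_structure).digest()


def build_digest_table(published_keys):
    """Map digest -> hostname for every key structure the attacker has collected."""
    return {record_digest(key): host for host, key in published_keys.items()}


def identify(observed_digest, table):
    """Return the hostname for a digest seen on the wire, or None if unknown."""
    return table.get(observed_digest)
```

Since the key structures are public by design, building such a table for the top N mirrors needs nothing more than fetching them.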

Yeah, your parent is wrong about it being in the RFC. ESNI is something that they decided wasn't possible and ruled out of scope for TLS 1.3 but then somebody had a brainwave and Rescorla plus some people at Cloudflare wrote IDs and did live fire testing. The drafts are maybe at the "this is the rough shape of a thing" stage, more than ambitions but not a basis on which to announce specific plans.

It's also pointless without DPRIVE. If people can see all your DNS lookups they can guess exactly what you're up to. That's why that Firefox build did both eSNI and DoH

Yeah it’s not part of the TLS 1.3 spec (RFC8446).

It’s being added, but early stages. Internet draft is here:


For sni to be effective, you need to be hitting hosts that serve more than one domain. Otherwise, you can just reverse map the ip.

Most of my servers don't report the actual service DNS name back on a reverse lookup. It's generally nodeXX.some.fqdn or clusterXX or lbXX

Doesn't really matter either way - there's a bunch of crawlers and scanners out there such that you can pretty much Google any IP and find a list of sites that are hosted on it.

Fun fact, most of those break if you just close the connection when the client doesn't support SNI.

Not quite the ones I was referring to - many services just look at DNS and get the A record for every domain then offer reverse lookup - complete lists of domains are purchasable for all major TLDs. The only defense to this would be to host your content on a subdomain.

DNSlytics, DomainTools, W3Advisor and others offer this.

It's also easy to just scan a whole range (say, top 1m Alexa domains) and log the IP. You can scan 1m sites on a cheap $5 VPS nowadays.

Yes but if I want to specifically look for traffic to the Debian mirrors, I can use DNS to build a list of the IPs and then see if you're connecting to one of them.

Most services report what they are, and even the server often, when you connect. If you connect to an IP and it's serving a website then I don't know why you'd care that reverse-lookup isn't configured correctly, you're not hiding anything?

It's feasible to build a reverse lookup table of all registered domain names.

I bet DoHTTPS/DoTLS has more penetration than ESNI. Neither is widely implemented though, which is the important part.

Yeah, but not being implemented now doesn't mean that they never will be. Migrating/implementing now will mean that the mirrors will automatically support it once the features are in place and the clients and servers can agree on the improved privacy feature sets.

By host, they probably mean the IP address.

Thanks for commenting this! I've seen this website before and it's really unfortunate how much attention it gets.

APT's use of plain text HTTP (even with GPG) is vulnerable to several attacks outlined in this paper: https://isis.poly.edu/~jcappos/papers/cappos_mirror_ccs_08.p....

Yes, this paper is old, but APT is still vulnerable to most of these attacks. I would advise anyone wanting to use APT to do so only with TLS.

The criticisms in that paper either do not apply to Apt as described in TFA or amount to DoS attacks. HTTPS does not and can not solve DoS.

For a mirror based system like apt it is incredibly trivial. The integrity of a package depends on dozens of organizations and their practices.

When I was a 19 year old idiot, I was responsible for a mirror server. As a bad actor, I could easily get access to a valid organizational cert.

> When I was a 19 year old idiot, I was responsible for a mirror server.

Me too! It was even ftp.kr.debian.org! It still is!

Seriously, people, who do you think has root of official Debian mirror servers hosted by universities? University students. Who are 19 years old. This is literally true.

In my country, the ccTLD registry is run by a university. While the professors have done an excellent job, the NIC itself was hacked a few times back, there is no admin UI (call a 19 year old kid and set your nameservers with NATO phonetics), and they still have some non-functional root nameservers.

>Packages can be manually downloaded from packages.debian.org and they reference the same insecure mirrors.

That's a reasonable complaint. I think it would make sense for the individual packages to be signed as well (and checked during install). This way you'd get a warning if you install an untrusted package regardless of the source. I'm not sure why it doesn't work that way.

>Furthermore Debian also provides ISO downloads over the same HTTP mirrors, which are also not automatically checked. While they can theoretically be checked with PGP signatures it is wishful thinking to assume everyone will do that.

I do do that. If you care about such an attack vector why wouldn't you? And if you don't, why should Debian care for you? There are plenty of mirrors for Debian installers; I often get mine from BitTorrent, and trusting a PGP sig makes much more sense than relying on HTTPS for that, IMO.

>Finally the chapter about CAs and TLS is - sorry - baseless fearmongering.

I don't think it's baseless (we have a long list of shady CAs and I'm sure many government agencies can easily generate forged certificates) but it's rather off-topic. Their main argument is that the trust model of HTTPS doesn't make sense for APT; if that's true then whether or not HTTPS is potentially hackable is irrelevant.

> If you care about such an attack vector why wouldn't you? And if you don't, why should Debian care for you?

Security should be as automatic as possible. It should be assumed that any step that requires manual intervention will be skipped by most people.

But the point the page makes is that HTTPS wouldn't be good enough anyway. As such it's not a replacement for checking the PGP signature. I think it's consistent.

If HTTPS could be used to replace PGP signature checks then I'd agree with you but it's not the case. So I go back to my initial point, if you worry about your image being tampered with HTTPS is not enough. If you don't care then you don't care either way.

In a way, not using HTTPS is kind of an implicit disclaimer on Debian's part: "Don't trust what you get from this website". If they feel like they can't guarantee the security of whatever server is hosting the CD images, adding HTTPS might actually be a bad thing, because people who might otherwise have checked the signature may think "well, it's over HTTPS, it's good enough".

>It should be assumed that any step that requires manual intervention will be skipped by most people.


1. If you don't care about security, it still doesn't hurt to have HTTPS. Think of it as "extra" that you get for free.

2. If you care about security, you might still not have the know-how to make sure everything is secure, or the time to get into it while you're trying to get things done.

3. Even if you care about security AND have the know-how, you might still forget. Nobody's perfect. So it's good that the HTTPS is there.

Nothing is for free, https has additional costs over http. In many cases it makes sense to pay those costs but let's not forget about them.

To be fair to the apt developers/maintainers - the security _is_ automatic when using their tool to talk to their repos.

It's not their responsibility to automate security for people using their repos via different tools.

If the solution was just "install certbot on the server and use a free https cert" then perhaps you could make an argument saying maybe they should just do it. But when the problem space includes aggressively using a global (largely volunteer) mirror network and supporting local caching proxies, I can completely understand why they'd say "Nope. Not our problem, not our responsibility to provide a solution. We've got other more productive ways to spend our and our mirror volunteers time and effort".

> I think it would make sense for the individual packages to be signed as well (and checked during install). This way you'd get a warning if you install an untrusted package regardless of the source. I'm not sure why it doesn't work that way.

Actually, debs are not signed in Debian or Ubuntu. The accepted practice in Debian is that only repository metadata is signed.
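Roughly, the trust chain runs from a signed Release file, to the hash of a Packages index, to the hash of each .deb. The sketch below uses a toy one-line-per-file index format rather than the real Packages stanza format, and it assumes the Release signature has already been verified elsewhere; all names are made up for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_chain(release_hashes, index_name, index_bytes, deb_name, deb_bytes):
    """Check a .deb against an index, and the index against trusted Release hashes.

    `release_hashes` maps index name -> SHA-256, taken from a Release file
    whose signature is assumed to be verified already (not shown here).
    The index uses a toy format: one "<sha256> <filename>" pair per line.
    """
    # Step 1: the index must match the hash listed in the signed metadata.
    if sha256_hex(index_bytes) != release_hashes.get(index_name):
        return False

    # Step 2: the package must match the hash listed in the index.
    listed = {}
    for line in index_bytes.decode().splitlines():
        if line.strip():
            digest, filename = line.split(maxsplit=1)
            listed[filename] = digest
    return listed.get(deb_name) == sha256_hex(deb_bytes)
```

Note that nothing in the .deb itself carries a signature here; a package downloaded by hand, outside this chain, has nothing to check against.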

The argument is that the design of Debian packages (as in, the package format) makes it difficult to reproducibly validate a deb, add a signature, strip the signature, and validate it again.

Personally, I'm not sure I buy it, as we don't have problems signing both RPMs and RPM repository metadata. Technically, yes, the RPM file format is structured differently to make this easier ('rpm' is a type of 'cpio' archive), but Debian packages are fundamentally 'ar' archives with tarballs inside, and it isn't hard to do similar things with those. For reproducible builds, Koji (Fedora's build system) and OBS (Open Build Service, openSUSE's build system) are able to strip and add signatures in a binary-predictable way for RPMs.

Fedora goes the extra step of shipping checksums via metalink for the metadata files before they are fetched to ensure they weren't tampered before processing. But even with all that, RPMs _are_ signed so that they can be independently verified with purely the usage of 'rpm(8)'.

> And if you don't, why should Debian care for you?

Those who don’t go out of their way to defend themselves don’t deserve security — seriously, that’s your attitude? Then you don’t deserve to be in any position to make security decisions for other people.

Security isn't on or off. If you think it is, then "you don't deserve [...]". It seems like a reasonable trade-off.

>Compromising a CA is not trivial and due to CT it’s almost certain that such an attempt will be uncovered later. The CA ecosystem has improved a lot in recent years, please update your views accordingly.

This is simply not true. Governments can simply compel your CA to do as they want. Not to mention that "uncovered later" is pretty damn worthless.

That said, I do agree that there should be HTTPS mirrors.

> Not to mention that "uncovered later" is pretty damn worthless.

It is not worthless as a deterrent to the CA. Proof of a fraudulently issued certificate is grounds to permanently distrust the CA. So, yes, they can do it, but hopefully only once.

Can't Debian devs just hook up their own server as a CA? No need to trust anyone outside the Debian universe.

In fact, is the FOSS community still running its own trust network? It used to be a thing at LUGs.

The justifications for why APT does not use HTTPS (by default; it is possible to add an https transport) are just mind-blowing. It is, however, not at all surprising considering how broken Debian's secure package distribution methodology is -- I'm saying this as someone who had to implement workarounds for it for a company that was willing to spend a significant amount of resources on making it work.

Here are some low-level gems:

1. I have installed package X. I want to validate that the files that are listed in a manifest for package X have not changed on a host.

APT answer: Handwave! This is not a valid question. If you are asking this question you already lost.

2. I want to have more than one version of a package in a distribution.

APT answer: Handwave! You don't need it. You can just have multiple distributions! It is because of how we sign things - we sign collections!

3. I want to have a complicated policy where some packages are signed, some are not signed, and some are signed with specific keys.

APT answer: Handwave! You should have all or nothing policy! Nothing policy, actually, because we mostly just sign collections, rather than the individual packages


> 2. I want to have more than one version of a package in a distribution.

> APT answer: Handwave! You don't need it. You can just have multiple distributions! It is because of how we sign things - we sign collections!

This is not true at all. If you need to distribute multiple versions of the same package then all you need to do is provide multiple versions of the same package. Just get your act straight, learn how to package software, create your packages so that you can deploy them simultaneously without breaking downstream software, and you're set.

AFAIK, dpkg doesn't allow having multiple versions of the same package installed at the same time.

This is not apt's limitation, though.

> AFAIK, dpkg doesn't allow having multiple versions of the same package installed at the same time.

That's true, but multiple versions of the same package is not the same as multiple versions of the same software. Debian and Ubuntu have access to multiple minor version releases of GCC and they can all coexist side by side.

Too lazy to get up from my comfy easy chair to boot my debian running NAS but 99.9% sure there's more than one version of python on that thing.

No, those are <somepackage>-<someversion> where <somepackage> is different.

Correct. Multi-version packages are not supported by dpkg. They are supported by rpm in limited circumstances (no file conflicts allowed!). Fedora and openSUSE kernel packages work this way, as an example.

The way Debian works around this is by doing "<name>-<version>" as the package name. This is a valid approach, though it makes package name discovery a bit more difficult at times...

As for multiple versions of packages in the repo, "createrepo"/"createrepo_c" (for RPM repositories) does not care.

And for Debian, "dpkg-scanpackages" can be made to not care too, using the "--multiversion" switch: https://www.mankier.com/1/dpkg-scanpackages#--multiversion

This is somewhat at your peril, as I've observed APT getting confused when it parses metadata from repositories produced by dpkg-scanpackages that allows multiple versions in the repository.
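For what it's worth, a multi-version index is just several stanzas with the same `Package` field. Here is a minimal sketch that groups versions by package name from Packages-style text; the sample index is invented, and the parser ignores continuation lines and everything else a real index contains.

```python
from collections import defaultdict

# Invented sample in the "Field: value" stanza style of a Packages index.
SAMPLE_INDEX = """\
Package: nginx
Version: 1.99.77
Architecture: amd64

Package: nginx
Version: 1.99.78
Architecture: amd64

Package: vim
Version: 8.1
Architecture: amd64
"""


def versions_by_package(index_text):
    """Group the Version fields of all stanzas by their Package field."""
    versions = defaultdict(list)
    for stanza in index_text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        versions[fields["Package"]].append(fields["Version"])
    return dict(versions)
```

Against a repository like that, a specific version can be requested with the usual `apt-get install nginx=1.99.77` syntax.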

However, reprepro does not support this at all, so most deployments with semi-large Debian repositories will not have this option available to them anyway.

> No, those are <somepackage>-<someversion> where <somepackage> is different.

It seems there is a significant semantics gap between how deb packages work and what's supposed to be a package version.

In deb packages, package versions are lexicographically ordered descriptions of a version ID that is used to guide autoupgrades.
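One caveat on "lexicographically": dpkg compares runs of digits numerically, so the ordering is not a plain string comparison. The sketch below is a deliberately simplified illustration of that one idea; it is not dpkg's actual algorithm and ignores epochs, the `~` rule, and Debian revisions.

```python
import re


def split_runs(version):
    """Split a version into alternating digit and non-digit runs."""
    return re.findall(r"\d+|\D+", version)


def compare_versions(a, b):
    """Return a negative, zero, or positive number, like a classic cmp().

    Simplified: digit runs compare as integers, other runs as strings.
    """
    runs_a, runs_b = split_runs(a), split_runs(b)
    for x, y in zip(runs_a, runs_b):
        if x.isdigit() and y.isdigit():
            if int(x) != int(y):
                return int(x) - int(y)
        elif x != y:
            return -1 if x < y else 1
    # All shared runs are equal: the version with more runs sorts later.
    return len(runs_a) - len(runs_b)
```

The difference shows up with something like 1.99.9 versus 1.99.10: as strings the former sorts later, while a numeric-aware comparison puts 1.99.10 last.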

If a packager wishes that multiple minor version releases should be present in a system then he should build his packages to reflect that, which is exactly how Python packages, and especially some libraries, do. For example, python packages are independent at the major version level but GCC packages are independent at the minor version level.

> If a packager wishes that multiple minor version releases should be present in a system then he should build his packages to reflect that, which is exactly how Python packages, and especially some libraries, do. For example, python packages are independent at the major version level but GCC packages are independent at the minor version level.

Not in a system. In a repo. Standard debian tools ( not hacked, not outside the tree, not outside the debian main tree ) do not support this.

If you have a repo with <packagename>-<packageversion> and add a package <packagename>-<packageversion1> where packageversion1 is higher than the package version, the previous version gets deleted from the repo.

Can it be worked around? Sure, you can:

a) have multiple repos. If you want to keep up to a hundred versions, you can just create a hundred repos.

b) you can redefine the meaning of a package name and incorporate the version into the name of the package.

(b) Sounds like a good solution. Except if the org decided to use something like .deb for distribution of artifacts it is probably the org that uses other software. Let's say it uses puppet, which supports Debian package management out of the box, except that now you need to change how puppet uses version numbers because as we started embedding the version name into the package name "nginx-1.99.22" and "nginx-1.99.23" became two different packages not two different versions of the same package.

This goes on and on.

> Not in a system. In a repo. Standard debian tools ( not hacked, not outside the tree, not outside the debian main tree ) do not support this.

That's the semantic gap I've mentioned.

Your statement is patently wrong, as deb packages do enable distributions such as Debian and other Debian-based distros such as Ubuntu to provide multiple major, minor, and even point releases through their official repos to be installed and run side-by-side.

Take, for example, GCC. Debian provides official deb packages for multiple major and minor release versions of GCC, and they are quite able to coexist in the very same system.

All it takes for someone to build deb packages for multiple versions of a software package is to get to know deb, build their packages so that they can coexist, and set their package and version names accordingly.

> Your statement is patently wrong, as deb packages do enable distributions such as Debian and other Debian-based distros such as Ubuntu to provide multiple major, minor, and even point releases through their official repos to be installed and run side-by-side.

Through multiple repos. Not through one repo. That's why testing packages live in a separate repo. That's why security packages live in a separate repo. That's why updates live in a separate repo.

If you are using careful versioning on packaging avoiding the <packagename>-<version> as the convention you are breaking other tools, including the tools that are distributed with Debian. Puppet's package { "nginx": ensure => installed } will not work if you decided to allow nginx to have multiple versions using "nginx-1.99" as a special package name.

> Through multiple repos.

Wrong. The official debian package repository hosts projects which provide multiple major and even minor versions of the same software package to be installed independently and to coexist side-by-side.

Seriously, you should get to know debian and its packaging system before making any assertion about them. You can simply browse debian's package list and search for software packages such as GCC to quickly acknowledge that your assertion is simply wrong.

It is not. You are talking about a hack/workaround/different methodology. That hack cannot be natively integrated with the other software that uses a regular idea of what is a package name and what is a package version (for example, puppet, which ships with Debian). It is a garbage approach just like the APT approach of using a repo manifest for signatures is garbage, and just like the approach of not storing crypto hashes of every file in a .deb inside the .deb is garbage.

I have provided the challenge in another reply:

I have in a repo:

package: nginx version: 1.99.77

I need to add to the repo:

package: nginx version: 1.99.78

Both of the versions must remain in the repo. Packages must be signed. Repo must be signed. The name of the package cannot change and neither can the version number. Both of the packages must be installable using a flag that specifies a version number passed to one of the standard Debian package install tools (the tool must be in "main" collection )

What is a tool that can be used and that is listed at https://wiki.debian.org/DebianRepository/Setup as existing in the "main" part of Debian? For half a point, you can use any tool listed in the wiki.

Everything else is the hand waving.

reprepro has that limitation.

Been there. Done that. It does not work.

I want to have

nginx-1.99.77 nginx-1.99.78 nginx-1.99.78ehjkhk-1

in the same repo. It is not possible.

Unless I'm missing something, this is 100% possible. Now, if you want to carefully control the version constraints of these, welcome to pinning hell, but that's a different problem.

I'm always willing to see how to do it. The repo currently contains


The package name is <thispackage>

I need to add to the repo <thispackage>-<this-version>-<that-patchlevel>.architecture

Packages are signed and the repo will be signed. At the end I should be able to install the package using the flag that selects a specific version, without adding another repo.

This is possible. If you are having issues, it is likely your apt repo tooling, not apt itself.

I use aptly and rely on this everyday.

Rubbish. I have every version of the nosh toolset packages from version 1.22 to 1.39 side by side in a single Debian repository.

A listing of all of these packages side by side, along with information on how it is built that includes the very scripts used to do so (including constructing that listing), is one of the value-added bonuses in the GOPHER version of the repository:

* gopher://jdebp.info:70/1/Repository/debian/dists/stable/

* https://news.ycombinator.com/item?id=14837740

Ok, so just to make it sure I understand:

1) It is not listed in the debian wiki

2) To access the "methods" I need to install a gopher client

That's pretty much a definition of the "handwave! Workaround!"

P.S. I have implemented a workaround. It works. It is just not a solution. I should not need to assign people to re-sign the packages with our own keys and create utilities that would duplicate Debian tools to provide nominal access to regular functionality required by a large organization that has packages with hundreds of dependencies and could have 5-20 versions of the apps in different production/test/validation/qa environments. Joe Random Engineer expects his knowledge of how apt-get works, how dpkg works and how puppet works to be portable.

You clearly do not understand, especially given that gibberish about not being listed in some wiki, and utter irrelevancies about dpkg and puppet, which have no bearing upon the matter of publishing one Debian repository with multiple versions of packages at all.

I'm just not going to laboriously re-type it all into Hacker News when you can just go to the published repository and see it all explicitly laid out right there in the repository itself, with the exact scripts and commands that get run to produce the very repository that you are seeing -- quite the opposite of either handwaving or workarounds.

You could get to it with an FTP or an HTTP client, too. But that would just leave you with the raw files in no particular order, rather than the annotated and organized GOPHER listings. A value-added bonus for the GOPHER version, as I said.

You don't actually have any justification at all for your claim that this is somehow impossible with Debian repositories, given that people like me are doing it with a few simple scripts and even publishing them for the world to see; and clearly neither handwaving nor workarounds for anything are required.

First of all, it is the only thing that is relevant. Repo needs to be functional using tools provided by the distribution that uses that repo format. No one cares that one can download tarballs and compile them -- this is not 1997. In 2019 it is "Enter this command and get this result". Feed this result into the orchestration/management framework. Find a bug? Fix the bug. The rest of the system will continue to function.

Second of all, you still have not provided a link to the doc that someone can read without installing a gopher client. Come on, you said you already have it!

> 1. I have installed package X. I want to validate that the files that are listed in a manifest for package X have not changed on a host.

not going to argue with the rest of your flame (others already did) but the answer to this one is

    apt install debsums
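What that kind of check boils down to can be sketched as follows. The manifest format here (an MD5 hex digest, whitespace, then a path relative to the root) mirrors the per-package md5sums lists dpkg keeps, but the function names are invented and this is not debsums' actual implementation.

```python
import hashlib
from pathlib import Path


def reader_for_root(root):
    """Return a reader that loads a manifest-relative path under `root`."""
    def read(rel_path):
        target = Path(root) / rel_path
        return target.read_bytes() if target.is_file() else None
    return read


def changed_files(manifest_text, read_file):
    """Return manifest paths that are missing or whose MD5 no longer matches.

    `read_file(rel_path)` must return the file's bytes, or None if absent.
    """
    changed = []
    for line in manifest_text.splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        data = read_file(rel_path)
        if data is None or hashlib.md5(data).hexdigest() != expected:
            changed.append(rel_path)
    return changed
```

On a live system the call would be something like `changed_files(manifest, reader_for_root("/"))`. It only helps if the manifest itself is trustworthy, which is the objection raised in the parent comment.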

This is based around the workflow where one first generates the sums. Notice the -g flag. It is a dumb workaround that I should not need to implement.

How is it not trivial if rogue organizations like "Staat der Nederlanden" have default certificates in the browser?

Hint: The Netherlands is a world leader in surveillance of its own citizens and inhabitants.

TLS instead of GPG signatures might be a bad idea, but adding TLS to the transport of signed packages can't make the pipeline less secure.

Unless it misleads people in to thinking they don't have to check signatures because they fetched it over HTTPS which is "secure"

Also, as the article points out - it's not exactly trivial to deploy https across their global mirror network or to make it work with local caching proxies. That's an easy thing if you've got a handful of servers or a few load balancers, but not so easy or practical for their use case.

(Also, remember most of the apt development had already happened way before free SSL certs became a thing. While saying "Why don't they just use certbot/Let's Encrypt" is an easy criticism, give them credit for having actually built a GPG sig secured distributed software delivery system years before Let's Encrypt existed...)

Because they would be caught doing it very quickly. There are so many ways to detect this.

And ultimately none of those ways to detect this are useful against a sufficiently targeted attack with direct access to the signing key.

It's one thing if you sign a bad key for Google.com, publish it in CT logs and then put it up on the public internet - it's quite another if you sign a bad key for midsizecompany.com, keep it out of the CT system and use it only in a targeted attack against non-technical individuals who are unlikely to examine a certificate or use things like Certificate Watch.

With that said, I still believe serving it over HTTPS would be a substantial improvement. Perhaps pin the cert or at least the CA out of the box to prevent such attacks.

CT monitors are primarily for site operators, not end users. When site operators spot a rogue cert issuance they can clarify that and ultimately get the cert revoked.

Precisely my point - it's not something an end user would notice. And if it doesn't even appear in CT logs or for the end user it would likely go completely unnoticed.

There really aren't "so many ways to detect this" - there's about 3: the user examines the certificate, CT logs catch it later and detection in browsers of major changes on the most high profile sites. Anything falling outside of those will almost certainly go unnoticed.

When you (a CA, but also just anybody who finds a new one) log a new certificate or a "pre-certificate" (which is essentially a signed document equivalent to the final certificate but not usable as a certificate) the log gives you a receipt, a Signed Certificate Timestamp, saying it commits to publish a consistent log containing this certificate within a period of time (today 24 hours).

The SCT proves that this particular log saw a signed document with these specific contents, at this specific moment.

Chrome (for a long while now), Safari (announced for early 2019) and Firefox (announced but a bit vague on when) check SCTs for publicly trusted certificates.

The browser can look at the SCT and verify that:

* It was signed by a log this browser trusts

* It matches the contents of the leaf certificate (DNS names, dates, keys, etcetera: Distinguished Encoding means there is only one correct way to write any certificate so there can't be any ambiguity)

* It has an acceptable timestamp (not too old, in some cases not too new)

It can also contemplate the set of SCTs and decide if they meet further criteria e.g. Google requires at least one Google log and at least one non-Google log.

If any of these is wrong, the site doesn't work and an appropriate error message occurs, no user effort is needed or useful here.
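To make the checks above concrete, here is a minimal sketch of that kind of policy in Python. It is an illustration only: the `Sct` record, the field names and the `"google"` operator label are all made up for the example, signature verification is left out entirely, and real browsers have more detailed rules than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sct:
    """Hypothetical, simplified stand-in for a Signed Certificate Timestamp."""
    log_id: str       # identifier of the log that issued the SCT
    operator: str     # who runs that log, e.g. "google"
    timestamp: int    # seconds since the epoch
    cert_digest: str  # digest of the certificate contents the log saw


def scts_acceptable(scts, cert_digest, trusted_logs, not_before, not_after):
    """Apply the three per-SCT checks, then the log-diversity criterion."""
    valid = [
        s for s in scts
        if s.log_id in trusted_logs                  # from a log we trust
        and s.cert_digest == cert_digest             # matches this certificate
        and not_before <= s.timestamp <= not_after   # acceptable timestamp
    ]
    operators = {s.operator for s in valid}
    # Assumed policy from the comment above: one Google log, one non-Google log.
    return "google" in operators and any(op != "google" for op in operators)
```

Note that this sketch simply ignores SCTs that fail a check and then asks whether enough good ones remain, which is an assumption about how the policy is applied.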

We know this works because Google managed to do it to themselves by accident once already, blocking Chrome access to a new Google site for - I think it was several hours - because their dedicated in-house certificate group screwed up and didn't log a new certificate.

There is more to do to defend the system completely:

1. You could compel a log operator to emit an SCT, but then not actually log your certificate. Certificate Transparency can detect this, a browser would need to remember SCTs it has seen and periodically ask a log for a proof which allows it to verify that the log really has included this certificate.

2. You could go further and compel the log to bifurcate, showing some clients a history with your certificate in, and others a parallel history without that certificate. This can only be detected using what is called "gossip" in which observers of the log have a way to discuss their knowledge of the log state and find out systematically if there are inconsistencies.
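For point 1, the "proof" is a Merkle audit path. Below is a sketch of how a client could check one, following the hashing scheme I understand RFC 6962 to use (SHA-256, a `0x00` prefix for leaves and `0x01` for interior nodes). The function names are mine, and `tree_root` and `audit_path` are only there to play the part of the log so the example is self-contained; treat it as an illustration, not a reference implementation.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def _split(n: int) -> int:
    """Largest power of two strictly less than n (n >= 2)."""
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def tree_root(leaves):
    """Log side: Merkle tree hash over a non-empty list of entries."""
    if len(leaves) == 1:
        return leaf_hash(leaves[0])
    k = _split(len(leaves))
    return node_hash(tree_root(leaves[:k]), tree_root(leaves[k:]))


def audit_path(index, leaves):
    """Log side: sibling hashes from the leaf at `index` up to the root."""
    if len(leaves) == 1:
        return []
    k = _split(len(leaves))
    if index < k:
        return audit_path(index, leaves[:k]) + [tree_root(leaves[k:])]
    return audit_path(index - k, leaves[k:]) + [tree_root(leaves[:k])]


def verify_inclusion(data, index, tree_size, path, root):
    """Client side: recompute the root from the entry and its audit path."""
    if index >= tree_size:
        return False
    fn, sn = index, tree_size - 1
    r = leaf_hash(data)
    for p in path:
        if sn == 0:
            return False
        if fn & 1 or fn == sn:
            r = node_hash(p, r)
            if not fn & 1:
                # Skip levels where this node had no right-hand sibling.
                while fn and not fn & 1:
                    fn >>= 1
                    sn >>= 1
        else:
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root
```

A browser that remembered an SCT would later ask the log for such a path and run something like `verify_inclusion` against a signed tree head. Point 2 (a log showing different histories to different clients) is not caught by this alone, which is why gossip is needed.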

Once both these things are in place, there's basically no way around just admitting what you did. Which of course doesn't mean any negative consequences for you, but it does make _deniability_ (much desired by outfits like the NSA and Mossad) hard to achieve if that's something you care about.

A state actor using a rogue cert is MitMing the connection in a very targeted way, and the site operator wouldn't ever see it.

This is the most reasonable reply, and of course it is at the bottom. I wonder if accurate technical commentary will ever make it to the top here.

The arguments are correct. APT does not need HTTPS to be secure. That said, if APT was designed today I'm sure it would use HTTPS. It's now the default thing to do, and Let's Encrypt makes it free and easy.

However Debian, where APT is from, relies on the goodwill of various universities and companies to host their packages for free. I can see that they don't want to make demands on a service they get for free, when HTTPS isn't even necessary for the use case.

Also since APT and Debian were created in the pre-universal HTTPS days, it does things like map something.debian.org to various mirrors owned by different parties. That makes certificate handling complicated.

It does not need HTTPS to be secure, but it would need HTTPS to add privacy, for which the protocol has none at the current time.

As explained on the website, HTTPS would not add meaningful privacy at all, because without significant other changes to the architecture, what you're doing is still downloading files from a very limited set. The size of the files in most cases is unique, so that an onlooker can tell what you downloaded, encrypted or not.

I find this argument not very convincing. Suppose an attacker wants to track people downloading stuff over APT. This is what they would need to do:

In case of HTTP - Step 1: Read the HTTP request payload. Step 2: There is no step 2.

In case of HTTPS - Step 1: Build an index of all possible packages and their sizes. Step 2: Reassemble HTTPS response traffic into individual HTTP responses. Step 3: Look up the response length to the corresponding package. Step 4: In case of identical file sizes, make some sort of model to find out which package it looks to be based on other packages downloaded (?).

Yes, it's still possible to track people's packages all the same. But you have to have a way more determined and prepared attacker - it cannot as easily be done through casual eavesdropping. It's a false equivalence to say it would not add meaningful privacy, as your attacker model changes from casual eavesdroppers to more determined attackers.
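For what it's worth, Steps 1 and 3 can be sketched in a few lines of Python. This assumes Step 2 has already produced a list of per-response lengths, and that any fixed protocol overhead has been measured separately; the function names and the `overhead` parameter are made up for the illustration.

```python
from collections import defaultdict


def build_size_index(packages):
    """Step 1: map each file size to the package files that have that size."""
    index = defaultdict(list)
    for name, size in packages.items():
        index[size].append(name)
    return index


def identify(observed_lengths, index, overhead=0):
    """Step 3: list the candidate packages for each observed response length.

    `overhead` is an assumed constant number of bytes added per response.
    """
    return [sorted(index.get(length - overhead, []))
            for length in observed_lengths]
```

An entry with more than one candidate is the Step 4 case, and an empty entry means the length matched nothing in the index.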

You might not care for that particular distinction, and I agree people should have the choice to use HTTP or FTP for APT when selecting mirrors. Unencrypted APT is plenty secure, but encrypted APT is really a little better. In my opinion, there should not be so much resistance for HTTPS in default configurations (e.g. the Debian project could easily require this for their official mirrors around the world). Let's Encrypt makes this so easy, there's no argument anymore in my opinion.

That's because you are glossing over a lot of packet inspection deeper than the TCP level with a glib "Read the HTTP request".

You might want to think about existing surveillance systems. Analysis of telephone traffic is often done purely on the CDR (the caller, callee, and length of call, in simple terms) without the equivalent of deep packet inspection to read the HTTP request, which would be analysis of the actual audio data themselves. The HTTPS case would likewise need just the total octets transferred over the TCP connection for fingerprinting.

There's a lot of glib handwaving in this discussion about identical sizes, not based upon actual measurements of the Debian archive. I quickly looked at the package cache in one of my Debian machines:

    jdebp% ls -l|awk 'x[$5]++'
    -rw-r--r-- 1 root root      3314 Feb 16  2018 nosh-run-freedesktop-system-bus_1.37_amd64.deb
    -rw-r--r-- 1 root root     35190 Dec 14  2016 redo_1.3_amd64.deb
    -rw-r--r-- 1 root root   1114546 Feb 25  2018 udev_232-25+deb9u2_amd64.deb
    jdebp%

It turns out that in practice size alone almost does uniquely identify the package in this sample. The other file that is 35190 bytes is version 1.2 of the same package, leaving just 2 possible ambiguities out of 847 packages. It seems likely that this holds after encryption as well.
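For anyone who doesn't read awk: `x[$5]++` evaluates to the count seen so far, so a line is printed only when its size (the fifth column of `ls -l`) has already appeared earlier in the listing. A rough Python equivalent for repeating the measurement, where the function names are mine and the cache directory is whatever you pass in (on a stock Debian system that would normally be `/var/cache/apt/archives`):

```python
import os
from collections import Counter


def deb_sizes(directory):
    """Return the sizes in bytes of the .deb files directly inside `directory`."""
    return [entry.stat().st_size
            for entry in os.scandir(directory)
            if entry.is_file() and entry.name.endswith(".deb")]


def ambiguous_sizes(sizes):
    """Return {size: count} for every size shared by more than one file."""
    counts = Counter(sizes)
    return {size: n for size, n in counts.items() if n > 1}
```

Calling `ambiguous_sizes(deb_sizes(path))` on a package cache gives the sizes that do not identify a single file.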

So the remaining question is how much HTTP pipelining ameliorates this, which no-one here has yet actually analysed.

I'm not sure I understand your reply. Are you replying to me as if I said that determined attackers cannot trace back HTTPS traffic to individual APT packages? Because I said no such thing.

I just made the distinction between casual eavesdroppers and determined attackers. Those determined attackers exist and are quite capable, I'm sure. I said as much in my post.

You might also want to look into your use of the word 'glib' here. I find it an uncharitable interpretation of my post to call it 'glib' or 'glib handwaving', to be honest. Makes it seem to me as if I should be defending something I said, but I'm not sure what.

There's really no difference between a "casual" eavesdropper and a "serious" one. What world do you live in where the former camp even exists? No one is casually spying on your apt updates, and anyone who is "seriously" spying on your apt updates can trivially manage to identify them by size. HTTPS really doesn't add anything here.

Of course no-one is spying casually on HTTP APT traffic specifically. Nobody is arguing that strawman - nobody here is "living in that world", give me some credit please.

But people spying casually on HTTP traffic in general do exist. People being able to spy on HTTP traffic in general casually is one of the main reasons we care about HTTPS in the first place. Even though people can do a targeted content length analysis for nearly all the other stuff we read/watch/download online, too. We still care about HTTPS for all of that. And we should probably care for that with APT too, if only a little bit.

TL;DR HTTPS gives you potentially more confidentiality, but it is not guaranteed, as a known vulnerability exists which an advanced attacker can exploit. You should not assume confidentiality when using APT over HTTPS. The severity of this issue in CVSS terms is going to be very low because it is only an information leak.

ISPs, for example, eavesdrop on us all the time, and they do it quite casually. They will modify your unprotected HTTP requests, inject ads, log everything they are able to, and sell the data if they can.

Wouldn't help. If almost all of the files' sizes are unique (I'm pretty sure compressed package sizes aren't even block-aligned), and you know the sizes ahead of time, and you can make test requests for samples of what headers are being sent/received, it's trivial to calculate which combination of packages would result in a given stream length using pipelining. You'd have to add countermeasures like padding or fake data.

You could quantize these with up to 10% padding and cause a very large number of collisions, but that wouldn't be useful without HTTPS. Is the core argument that privacy is not attainable, or that it is not valuable?

I think the core argument first of all is that it is not attainable trivially by "just using HTTPS". So it's a question of costs vs. benefits, where the costs are pretty big (change the whole infrastructure).

This is wrong. A great deal more privacy is attainable by trivially using HTTPS. Privacy in the presence of stream inspection is more difficult, but attainable by padding files to have quantized lengths.

You are entirely incorrect in this instance, for all the reasons laid out in this thread.

Looking at stream sizes is not "advanced deep packet inspection", it's in fact the opposite of that.

Up to 10% padding likely isn't sufficient to provide privacy. Think of the long tail of large packages.

Quantizing up to 10% collapses 43,000 values into 120. It's not perfect but it removes a huge amount of information.

How do you arrive at 120? I guess that makes sense if the biggest package is within 92709x (1.1^120) the size of the smallest. But that doesn't seem like enough range, just eyeballing it. If you have a 1kB package at the low end, I'd be surprised if Debian didn't have a package bigger than 92 MB.
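The arithmetic here is easy to play with. A small sketch, assuming buckets grow geometrically by the padding factor (my reading of "quantize with up to 10% padding", not something spelled out upthread); the function names and the 1024-byte base bucket are arbitrary choices for the example:

```python
import math


def bucket_count(smallest, largest, padding=0.10):
    """Number of geometric buckets needed to span smallest..largest."""
    return math.ceil(math.log(largest / smallest) / math.log(1 + padding))


def padded_size(size, base=1024, padding=0.10):
    """Round `size` up to the next bucket boundary, starting from `base`."""
    bucket = base
    while bucket < size:
        bucket = math.ceil(bucket * (1 + padding))
    return bucket
```

`bucket_count` is the inverse of the 1.1^120 calculation: give it the smallest and largest package sizes and it tells you how many distinct padded sizes you end up with. It says nothing about how evenly packages are spread across those buckets.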

Presenting it as 120 from 43,000 is a bit of an oversimplification, because the average isn't meaningful. The long tail is going to have the worst privacy and the small packages will (probably) have the most.

A scheme like this might be workable but requires being really careful about the security properties you're claiming (i.e., of those 120, probably half are unique, large packages). And obviously, this scheme requires up to 10% additional bandwidth, in the case of the chosen 10% threshold. If buckets change over time, packages moving between buckets may leak a lot of information.

"Step 1: ... etc."

This sounds less like some sort of massively impossible barrier to overcome and more like a Project Euler problem, and one not all that far into the sequence, either.

One of the things you have to overcome if you want to think like a security person is that, yes, there are attackers that will put some effort into attacking you if you are a target of any consequence, certainly effort far exceeding what you just described. I've watched some people at the company I work for have to overcome that handicap myself. Yes, there are attackers that are not just script kiddies and actually, like, have skills and such.

Attackers won't jump through infinite hoops, but getting a foothold on a network somewhere where they'd like more access, seeing that they can watch a new system in your network getting provisioned, and cross-checking that against a list of known vulnerabilities by looking at package sizes would be boringly mundane for them, not something wildly exotic.

There are two reasons you want to compromise a host:

1) You're building a botnet (or, these days, are crypto mining). In that case you're not targeting a specific machine, you just want many of them.

2) You want to exfiltrate information from a host, or sabotage it. In that case you're targeting a specific machine.

I'd argue that in both cases, the proposed attack vector of inferring installed software versions through apt downloads is inferior, or at least more involved. In case 1) you're better off scanning for known vulnerabilities or make use of shodan and the likes. In case 2) you're probably going to probe the server anyways. It might take a little more time than if you just had a complete list of installed packages and their version (given you were somehow able to eavesdrop on the host in the first place), but you'll most likely determine at least what OS is running and what technology stack their internet-facing services are running on after some nmapping.

Or, looking at it from the other perspective, I wouldn't really feel much safer if apt were using https. I'm not against it, but I don't think it's a priority, especially if it needs a lot of coordination between different people, which always turns out to be very time consuming. Just being fast with updating packages seems a better investment of that time.

> Or, looking at it from the other perspective, I wouldn't really feel much safer if apt were using https. I'm not against it, but I don't think it's a priority, especially if it needs a lot of coordination between different people, which always turns out to be very time consuming. Just being fast with updating packages seems a better investment of that time.

This is exactly my position, to be fair. We're all bike-shedding here as far as I'm concerned - including this very website. I think the position that HTTPS doesn't help you is a little bit disingenuous, and the only fair position is that "coordinating this stuff takes time and effort we don't feel is worth the negligible advantages" (as you say, and as this website says) is a more acceptable argument than "the negligible advantages don't exist" (as this website seems to also want to say).

A: Assuming package privacy could somehow be protected (a question I leave to part B), then I agree that there is no improvement in security against capable attackers: they would have to focus on hacking the APT servers, on which the attack surface can be minimized, and monitoring and logging any deviations can be tracked and published for all to inspect.

B: if the packages are concatenated to each other and a random length noise string, we can substantially frustrate nation state / ISP level attackers to the point of forcing them to get this information from the endpoints themselves: Either end user, or the APT server must already have been compromised.

1) End user not yet compromised: in order to capture these, they must attack the APT server.

2) End user already compromised: on each update of all compromised users, information is sent to C&C, so this would produce lots of opportunity for attentive users to discover the implant.

When focussing on the APT, which would give the cleanest record of attack surfaces, the community can put man power on designing minimalist APT servers, and inspecting published deviations in communications can lead to uncovering 0days.

EDIT: changed disagree to agree, as I (incorrectly) thought you were arguing it would not make attacks more expensive, woops!

Any self signed certificate? Then the attacker who is already capable of intercepting packets can just serve whatever and proxy your request. Your argument to the main argument falls short just the same..

Agreed, went a bit overboard there. Removed the part about self-signed certificates.

Steps 1-4 are very easy; they just require some dev work. Thinking an attacker won't bother because it sounds annoying to implement is security by obscurity.

No, it's security against a specific class of attacker that your threat model is considering.

We know there are nation states that build profiles of each user based on their HTTP requests, but we don't know of any that have written custom software specifically to target Debian users.

It would take them half an hour to write that custom software. Under what threat model is "nation-states that can't spend half an hour writing code" a meaningful adversary? Note in particular that they can identify users by previous traffic statistics (and just normal traffic flow statistics that might be kept as a routine log by any network administrator, even); they don't need to write the code in advance. Given the number of bytes transferred, the times, and an archive of the state of the Debian archive (which is publicly available), they can always identify past downloads.

(Would it help if I wrote that code right now and put it on GitHub?)

> It would take them half an hour to write that custom software.

It would take a software engineer half an hour to write that custom software. It would take a government years to amass the political will to target such a small section of the population, and then potentially hundreds of thousands of dollars for a government contractor to offer a solution and implement it.

There's always going to be a big difference in threat level between a piece of software which already exists and a piece of software that could exist. For example, when you're snatched off the streets by the secret police, and they go to investigate what you've been doing in the country, they might be able to request from HQ a list of HTTP addresses fetched from the IP address associated with your apartment, but they're unlikely to be able to request that HQ write some software to go back retrospectively and count bytes of individual connections you made.

> (Would it help if I wrote that code right now and put it on GitHub?)

No, but it would help if you wrote a patch for APT which made it use HTTP range requests to hide the size of the files it downloads. That should only take half an hour, right?

I continue to be confused by this threat model where all government agencies you're worried about are plagued by massive levels of US-style governmental bureaucracy and can't get anything done, yet they're capable of being meaningful threats. (Also where the only entities you're worried about are government entities.)

The entities I'm worried about have bought off-the-shelf surveillance tools which record the HTTP requests associated with each IP/MAC address. This is a minimum viable product for governments and ISPs (not to mention businesses, like hotels and coffee shops), and it is reasonable to think that such a system is deployed on orders of magnitude more networks than a system that tries to infer Debian package downloads from counting bytes of HTTPS traffic.

> Read the HTTP request payload

The thing to note here is that the only reason this seems easy is that there is tooling readily available for such a task. If you didn't have such tooling, you'd find it more difficult to implement than your HTTPS case even _with_ tooling.

The same principle applies to your HTTPS case. Your argument disappears as soon as there is tooling. That tooling only needs to be written once. Perhaps it already has been done and exists in the circles where people want to surveil apt users. One possible reason such tooling isn't widely available is that apt doesn't use HTTPS by default, and one outcome may be that if apt switched to HTTPS the tooling would appear.

I have half a mind to write the tooling and publish it just to eliminate this argument. It really isn't very difficult.

There's a big difference, privacy-wise, between being able to say "X is talking to this debian mirror and thus probably running debian" and "X is downloading exactly these packages from this debian mirror".

The point being made is that the fact that X just talked Y bytes to this Debian mirror is enough to know exactly which packages were downloaded.

The argument that https obfuscates which packages you download is not a good one, and may cause users to unnecessarily worry about the implications (and conversely, that they end up "more safe" if that was not the case). If that type of privacy is desirable you should probably use something like Tor.

When I say `apt-get install foo` and it brings in 47 other packages, the problem gets exponentially more difficult. "He downloaded 1,432,509,104 bytes; what packages were those?" is more or less O(2^n): https://en.wikipedia.org/wiki/Knapsack_problem

If you download exactly one package, it may be easy to deduce which one it was (assuming that the protocol overhead is identical each time, and that changing timestamps and nonces doesn't affect the byte length whatsoever, etc.). If you download more than one at a time, which is common with Debian, then the problem is a whole lot harder.
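To get a feel for the multi-package case, here is a brute-force subset-sum sketch in Python. It assumes the observer knows only the total byte count and a size table, that sizes are positive integers, and that per-file overhead is a known constant; all names are made up for the example. The worst case is exponential in the number of packages, as stated above, though pruning on the remaining byte count cuts the search down.

```python
def combos_matching(total, sizes, overhead_per_file=0):
    """Return every set of package names whose sizes add up to `total`.

    `sizes` maps package name -> size in bytes (assumed positive).
    """
    items = sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)
    results = []

    def search(i, remaining, chosen):
        if remaining == 0 and chosen:
            results.append(sorted(chosen))
            return
        if i == len(items) or remaining < 0:
            return  # ran out of packages, or overshot the total
        name, size = items[i]
        search(i + 1, remaining - size - overhead_per_file, chosen + [name])
        search(i + 1, remaining, chosen)

    search(0, total, [])
    return results
```

If it returns exactly one combination, the total alone gave the download away; several combinations means the observer needs more information (timing, dependency relationships) to narrow it down.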

An HTTPS stream can be side-channel (ie size, time) broken down into black-box HTTP/1 requests quite easily. Remember, even with Connection: Keep-alive, you still have to request every file synchronously after you're done with the previous one.

> Remember, even with Connection: Keep-alive, you still have to request every file synchronously after you're done with the previous one.

Wait, why? I don't know of any reason HTTP clients can't pipeline requests.

They can but they do not because servers are broken (and terribly so) and it brings less than zero gain for big files.

See, debian does not want to manage specific mirror server features if they don't have to. If they were in that position they'd make their own protocol.

Right, that makes sense. So it's not so much a client limitation as a minimum common denominator for their hosting infrastructure.

It's not because a naive implementation of privacy enhanced APT would fail that all implementations would fail at protecting privacy. Concatenating all the packages and a variable length noise string together should go a long way.

It would be more fruitful to discuss the different ways an attacker might deduce what software was installed from a naive implementation: download sizes, download date (i.e. a new update is available for package P, so a substantial fraction of users who were downloading from the server that day were probably installing P), etc...

In theory an onion router might substantially improve the situation if the attacker has a hard time identifying which server the user is talking to, and thus making it hard to identify if the user is even installing anything at all...

sadly I don't trust TOR as long as I can't exclude a specific attack scenario I have always suspected about TOR but never actually known to be present...

> It's not because a naive implementation of privacy enhanced APT would fail that all implementations would fail at protecting privacy. Concatenating all the packages and a variable length noise string together should go a long way.

Sure, but then you are not talking about "just use HTTPS", you're talking about creating your own protocol and requiring all APT package sources to speak that protocol, requiring specialized server software, where currently they can just use whatever HTTP server they want. Switching the whole infrastructure and installed base over to that would be a massive multi-year project, not just a handful of code and configuration changes.

Note I use the phrase "naive implementation" and was never talking about nor clamoring for "just use HTTPS".

The real discussion is not "blindly use HTTPS, or leave it like it is", for me the real interesting question is: can we design a package distribution system that preserves privacy against nation state level actors? can we virtually force those to attack the APT servers themselves? could we use oblivious transfer? could we design a fresh minimalist onion router (as opposed to bloated TOR) for package distribution?

> can we design a package distribution system that preserves privacy against nation state level actors?

That's a great question. Also, can we do it on top of the current Debian infrastructure (HTTP and everything)? Or do we need to change anything?

> Also, can we do it on top of the current Debian infrastructure (HTTP and everything)?

I'm pretty sure that is not possible, because the current infrastructure is just plain old HTTP file servers (anything that can sling bits will do) run by whoever fancies being a part of it.

If you create an onion layer like the GP proposes, I don't see any loss from it running over HTTP servers.

It already exists. It's called tor.

But the question here was why https isn't default in apt, not why tor isn't.

you're replying downstream of my comment containing "for me the real interesting question is:..." where I generalized the question away from the false dichotomy "keep apt as it is, or make https default in apt"

> The real discussion is not "blindly use HTTPS, or leave it like it is", for me the real interesting question is: can we design a package distribution system that preserves privacy against nation state level actors?

That is certainly an interesting theoretical question, but in practice there is also the question of the costs of something that requires you to change and complicate the whole distributed infrastructure vs. the benefits - are there actually real people who need privacy against "nation state level actors" specifically concerning the Linux packages they install?

what about journalists, whistleblowers, strictly vocal critics of the centralized surveillance state [0], lawyers, politicians, ... ?

[0] https://news.ycombinator.com/item?id=16947652

EDIT: just adding, we also don't know what the cost of the most efficient privacy preserving distribution method actually is. only when people investigate and try will we find out.

> variable length noise

Just thinking... could this be achieved by simply adding some extra response headers on the mirror side? That is, the response could contain some headers like

   X-Unnecessary-Header-0: [random-string]
   X-Unnecessary-Header-1: [other-random-string]

This could be enough to insert random-length noise without the need to invent any new protocol. Of course this would only be effective for smaller packages, as I assume that header size is limited, so this would significantly change the perceived download size only for smaller packages.
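A rough sketch of what generating such headers on the mirror side might look like, in Python. The limits (`max_total`, `max_per_header`) are guesses picked for the example; real servers and proxies have their own header size limits, which I haven't checked.

```python
import secrets


def padding_headers(max_total=4096, max_per_header=256):
    """Return (name, value) pairs whose values add up to a random length.

    The total number of value characters is drawn uniformly from
    0..max_total; both limits are assumptions for this sketch.
    """
    budget = secrets.randbelow(max_total + 1)
    headers = []
    i = 0
    while budget > 0:
        n = min(budget, max_per_header)
        # token_hex(k) returns 2*k hex characters, so trim to exactly n.
        value = secrets.token_hex((n + 1) // 2)[:n]
        headers.append((f"X-Unnecessary-Header-{i}", value))
        budget -= n
        i += 1
    return headers
```

With these defaults the noise is at most 4 KiB, which illustrates the limitation already mentioned: it blurs small packages but barely moves the total for large ones.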

that might work, and the larger packages might be sent as a sequence of parts, so that essentially all packages have the same size

other things to keep in mind is the dependency graph, some large packages might be rather unique in total filesize downloaded

The post you replied to outlined a way to determine which packages were downloaded for the HTTPS way, so I don't get what you're trying to say here.

Preventing script kiddies or dickhead ISPs logging your traffic is meaningful privacy.

That is, in fact, the main threat everyone should protect against.

Privacy against script kiddies and dickhead ISP is valuable. Not every scenario is a determined attacker targeting you personally.

This can easily be solved by inflating the wire size of the file but not the received download: transmit junk packets that fail to decode, but still look like encrypted noise to an eavesdropper.

Not everyone uses fast unlimited internet.

Pretty much everyone using Debian flavored Linux does.

That doesn't sound like an argument against it to me. It just means that in addition to HTTPS, they also need to make "significant other changes to the architecture".

...which would be a huge undertaking given the vast installed base and the fact that package sources currently don't run any custom server software at all, which would need to change.

I'm not so sure about that. Assuming all the packages are downloaded over a single connection, you can easily pad the response client-side via HTTP range requests, for example. No special server software required; just a normal HTTP server.
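One possible reading of this, sketched in Python: the client fetches every file in fixed-size chunks using `Range` headers, lets the last chunk overlap so all responses are the same length, and then re-requests a chunk it already has until the request count is a multiple of some quantum. The function name, chunk size and quantum are all assumptions for illustration, and this only builds the header values, it does not talk to a server.

```python
def padded_range_plan(file_size, chunk=65536, quantum=8):
    """Return Range header values that hide the exact size of one file.

    Every range is exactly `chunk` bytes long and the number of ranges is
    a multiple of `quantum`. Files smaller than one chunk are not handled.
    """
    if file_size < chunk:
        raise ValueError("file smaller than one chunk; needs another padding source")
    starts = list(range(0, file_size - chunk + 1, chunk))
    if starts[-1] + chunk < file_size:
        starts.append(file_size - chunk)  # overlapping final chunk
    while len(starts) % quantum:
        starts.append(0)  # dummy re-request; the client discards the bytes
    return [f"bytes={s}-{s + chunk - 1}" for s in starts]
```

An observer would then see file sizes only in steps of `chunk * quantum` bytes, at the cost of extra bandwidth. Timing and the total number of requests in a session would still leak something.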

You're downloading from a third-party volunteer. What protocol can provide privacy given that condition?

There is a huge difference between the third-party knowing what you're doing, and everyone in between knowing what you're doing.

The HTTPS everywhere movements are attempting to make privacy the default rather than the exception, but are of course done knowing that the server will always know what's up. The point is to make it so that only the two parties concerned, the server and the client, comprehend the communication rather than the entire world.

Okay, suppose I want to know what packages you're installing and updating. I have two routes:

1. I gain access to a router near you.

2. I rent a sizable server at the hetzner/ovh/… location that's closest to you and volunteer to run a mirror for the OS you're using.

Both are somewhat uncertain (your traffic might flow via a different route, you might be load-balanced to a different mirror) but the uncertainty seems comparable. Option 2 seems so much easier that I have a real problem seeing the point of even attempting option 1 if all I want is the information option 2 would give. Perhaps someone can explain?

You'd have to take control over my mirror. Making a new mirror will not get you any traffic unless people actively choose to use it. Therefore, your option 1. is to control a part of the public route, and option 2. is to control the mirror of my choice.

However, any argument against using encryption for privacy for APT can equally be applied to any other traffic. Do you trust your public internet route enough to let your traffic run authenticated, but unencrypted? Chats, news, bank statements, software updates?

Even if content cannot be modified, it can still be blocked or made public. There are quite a few nosy governments that would like to know or block certain types of content, software packages included.

Lots of people use the nice hostnames like ftp.us.debian.org. It wouldn't be too hard to get included in that host name if you're determined. I haven't looked into the requirements, but I'm pretty sure it's a) be technically competent enough to run a mirror (it's not hard) b) have a lot of bandwidth. c) be organizationally competent enough to convince Debian that a and b will hold for a long time.

You will instantly spot blocking, because apt reports connection failures and hash failures, and a replay attack on the update list is ineffective.

As for privacy, eh? It is visible you're connecting to a debian mirror and what size of update list you're getting. Barring that, indexing packages by size is trivial.

You want true privacy, you'd have to use Tor or such.
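To make "indexing packages by size is trivial" concrete, here is a minimal sketch of such an index. The file names and sizes are made up for illustration, and a real observer would also have to subtract HTTP and TLS framing overhead from what they see on the wire, which this ignores.

```python
from collections import defaultdict

def build_size_index(packages):
    """Map each exact byte size to the list of package files with that size."""
    index = defaultdict(list)
    for name, size in packages:
        index[size].append(name)
    return index

def candidates_for(index, observed_size):
    """Return the package files whose size matches an observed transfer."""
    return index.get(observed_size, [])

# Hypothetical mirror listing: (filename, size in bytes). Not real data.
listing = [
    ("tor_0.3.5.7-1_amd64.deb", 1_812_404),
    ("openvpn_2.4.7-1_amd64.deb", 490_112),
    ("hello_2.10-1_amd64.deb", 56_132),
    ("hello_2.10-1_i386.deb", 56_132),
]
index = build_size_index(listing)
print(candidates_for(index, 1_812_404))  # a size that only one file has
print(candidates_for(index, 56_132))     # a size shared by two files
```

A size that maps to a single file identifies it outright; a shared size only narrows things down to a short candidate list.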

Option 1 is hard for individuals but easy for state actors.

For example, they might want to know what versions you're running (by looking at what updates you _didn't_ download) so they can target you or lots of people at once.

Since it's known to which host you connect, and the pattern of access is known too, it's a reasonable guess that it would be possible to infer the list of packages that you're downloading from observing the encrypted traffic.

Instead of JUST that third party knowing what (possibly vulnerable) packages you have, you are now letting everyone know what possibly vulnerable packages you have.

> you are now letting everyone know what possibly vulnerable packages you have.

Erm, what?

The number of people who can listen to (much less modify) your traffic is very small. It is basically your ISP (who is supposed to offer you services in good faith, not spy on you) and a number of engineers who maintain the Internet backbone. That's far from "everyone". Some SSL evangelists make it sound like everyone's traffic is permanently broadcast to everyone else in the world, but it is not.

As for "vulnerable packages", the most certain sign, that someone does not install security updates, is lack of traffic between them and update servers. But that's orthogonal to use of encryption.

"everyone" in this context obviously means everyone on the path, and any attackers that have compromised nodes along the path. See the Belgacom hack by 5 eyes...

And those attackers can still tell everything based on the target IP and the payload sizes.

there is no proof that this is true in general, so it is worth trying to find 1) an inefficient way in order to 2) postulate an efficient way...

for example, overlay onion routing, size blurring by appending random length random bits,... with oblivious transfer even the APT-server does not know what you downloaded (but that would require a large amount of information..., nevertheless oblivious transfer might still be a useful tool when used as a primitive, perhaps just to send a list of bootstrap addresses for p2p hosting of the signed files etc...)

Couldn’t this be used to identify possible hosts that are running exploitable software? So you watch a target and keep track of their installed packages. You also monitor for zero day exploits. The instant you have identified a zero day, you also have a list of high probability targets to test it out on or exploit. Privacy is more than just "I know you have blue eyes or brown eyes".

Genuine question: what are you doing with APT that requires privacy?

In the past, cryptographic export was illegal in the US and many other countries; it can still get you put on a list.

In some countries messaging apps like signal and telegram are illegal.

There is no telling what seemingly benign software will be made illegal in the future for political reasons.

Privacy is always a requirement because of these reasons.

Size fingerprinting defeats such "privacy". If you need this, use Tor instead.

Cause pushing the barrier of how easy it is to detect is a bad thing.

This is very asymmetric though. The cost for an attacker is very low. The cost for Debian and its mirror network is very high.

Given that the cost of implementation is high and the protection is minimal the decision to not do so is reasonable.

> The cost for Debian and it's mirror network is very high.

I'm curious how high it actually is. They say it's high, but that could well just be hand-waving. Sure, prior to things like LetsEncrypt those SSL certs would have been a notable financial burden. There's also some extra cost on infrastructure covering the cryptographic workload, but increasingly the processors in servers are capable of handling that without any notable effort.

Certificate cost is trivial. Let's Encrypt makes it free, but with a small change to the host names (countrycode.ftp.debian.org instead of ftp.countrycode.debian.org), all mirrors could have been covered with a single wildcard certificate. Some CAs will let you buy one wildcard cert and issue unlimited duplicate certificates with the same name. So, that would cost some money, but probably not too much.

The real costs are organizational and technical.

Organizing all the different volunteers who are running the mirrors to get certificates installed and updated and configured properly is work. Maybe Let's Encrypt automation helps here.

From a technical perspective, assuming mirrors get any appreciable traffic, adding https adds significantly to the CPU required to provide service. TLS handshaking is pretty expensive, and it adds to the cost of bulk transfer as well.

I get the feeling that a lot of the volunteer mirrors are running on oldish hardware that happens to have a big enough disk and a nice 10G ethernet. I've run a bulk http download service that enabled https, and after that our dual xeon 2690 (v1) systems ran out of CPU instead of out of bandwidth. CPUs newer than 2012 (Sandy Bridge) do better with TLS tasks, but mirrors might not be running a dual CPU system either.

Old hardware will eventually die and need replacing. I run infrastructure for a CDN setup, and we actually _reduced_ the CPU overhead with TLS 1.3 + HTTP/2.

When someone says the cost is high, most people jump to monetary issues. The cost is in the time and effort required to make the changes, and to have those changes synchronised across every single APT mirror.

None of your business.

But! As mentioned above, outside entities being able to monitor exactly which versions of which packages are being installed to which hosts is a significant security risk.

This sort of comment isn't helpful. Switching to HTTPS will require a tremendous amount of work from volunteers. You need to convince me that (1) your use case exists and (2) your use case can be remedied with HTTPS.

Nothing requires privacy. However, the HTTPS movement is about making privacy the default, not the exception.

The HTTPS movement is about ensuring that only Google gets your data. It's not about privacy.

I don't quite follow. Please explain why Google gets the data.

Presumably Google Analytics which can embed on HTTPS pages.

Which is already blocked by 40% of the users via ad-blockers.

Edit: some report even over 70% that block GA.

Using an adblocker is not part of HTTPS.

If I was a particular government, I could block apt requests for the tor package e.g.

We deploy our software packages to our own infrastructure and clients using a private APT repository and basic HTTP auth. Obviously we're running it with apt-transport-https installed for making the latter not completely insecure.

I see no reason to do that for signed packages from the main repositories, however.

For a theoretical example, hiding the fact that I have installed encryption or steganography from my government.

Genuinely: it's 2019: shouldn't privacy be the default?

Yes, but privacy is a whole other thing that is maybe not worth it. With mirrors and so on, getting HTTPS to work properly is not trivial. Sure, it would be nice.

HTTPS is really quite trivial, especially with the advent of letsencrypt. This is especially true for simple package protocols like APT, where a repository is simply a dumb HTTP server coupled with a bunch of shell scripts that update the content.

Assuming that we consider SSH-ing into a server a negligible effort, then adding HTTPS to an APT repository or mirror is also a negligible effort.

As for whether privacy is worth it: Absolutely, especially in this day and age. There is very rarely a cost too high when it comes to privacy, and in this instance, it comes for free.

The problem is, HTTPS is not designed for privacy in any meaningful term.

1) TLS session negotiation leaks all sorts of useful data about both systems, not to mention TCP and IP stack on which it sits. This data is grabbed in 5 minutes with an existing firewall filter. Combined with IP, it shows the exact machine and web browser (incl. Apt version) downloading the file in many cases.

2) It does nothing to prevent time, host and transfer size fingerprinting.

3) Let's Encrypt helps with deployment but you get rotating automated server certificates. It is reasonably easy to obtain a fake Let's Encrypt certificate so without pinning it is worthless for authentication, pinning a rotating certificate is hard too.

Debian does not have resources to handle impostor mirrors.

It's not trivial. If we are talking about Linux boxes serving as servers, Let's Encrypt has a good chance of not working out of the box, especially with older boxes. And then there are other things, like needing an HTTP server for obtaining the cert, rotating it, and distributing it. And you lose the ability to use a proxy, and so on. With HTTPS you are still not protected against them knowing where you went, only against them knowing what you did there.

it would be great to have the ability to have https but for APT in its current form and for what it is used the cost benefit for adding https is not that compelling to me.

Why would it? As the site lays out, HTTPS is strictly inferior to the scheme used by APT.

You can not get more security by adding a less secure mechanism to a better one. It's not additive.

Of course you'd get more security.

Analogy: It's like you're hanging on a rope and also add a safety net. If the rope breaks, you only fall on the safety net instead of the ground.

All software has bugs, so if you add two buggy security solutions, you might only be able to exploit a bug in one, but the other still gives you the safety.

That's not always true though, and it can be really hard to tell if a particular combination is going to make things better or worse. e.g. the "Breach" HTTPS+Compression vulnerability.

To continue your analogy: The rope gets tangled in the safety net, forcing you to jump or proceed up with a loose rope, because you can no longer move the rope..

This is not the physical world.

Digital security strength is measured on orders of magnitude, and two mechanisms providing security with very different orders of magnitude do not add in any practical sense.

If you've seen today's bug yesterday, or carefully looked at the previous CVEs, you would have seen that https would have significantly reduced the probability of exploitability.

> All software has bugs, so if you add two buggy security solutions, you might only be able to exploit a bug in one, but the other still gives you the safety.

I am unconvinced about the last part. More commonly an exploit in either will cause security to fail, so adding more steps just adds more attack surface and leads to less security.

It depends on the threat model, here TM1 and TM2:

TM1: attacker does not possess zerodays for installed software

TM2: attacker possesses specific (perhaps OS, perhaps library, perhaps userland) zerodays, usage of which (including unsuccessful attempts) should be minimized to avoid detection

in TM1: it's ok to use HTTP in the clear, as long as signatures are verified

in TM2: everything should be fetched over encrypted HTTPS, since HTTP would leak information about available attack surface

EDIT: not only would this increase security by not revealing what a user installs (perhaps download some noise as well such that it becomes harder to detect what a user is installing?), it could also improve security by turning the APT servers into honeypots, so that monitoring these can reveal zerodays...

TM3: attacker has a 0-day against a complex https server and they replace packages on some mirrors.

TM4: attacker can impersonate a server using Lets Encrypt certificate and bypass their automated verification, creating a fake mirror or a bunch. (HTTP has same vector.) They can also make DNS fail or reroute.

TM5: attacker has a 0-day against the more complex https client (e.g. curl).

TM6: Attacker fingerprints network connections to given servers by size and os or os + tls fingerprinting data

TM3 and TM5 are specific subcases for TM2, and turn the APT server / client into a honeypot

TM6: we should have an overlay onion router, and I agree that the current complexity is worrying, I'd love to see a minimalist version of TOR (with minimal I don't necessarily mean the code size should be small, but minimal assumptions, and that the safety of the system can be verified from the assumptions)

TM4: I don't understand, Lets Encrypt does not calculate private keys for public keys...

> if APT was designed today I'm sure it would use HTTPS.

Or it would use the much more modern and more secure Noise, like I think QUIC will end up using through nQUIC:



Yeah, no.

nQUIC is maybe an interesting approach for some specialist applications but more likely a dead end.

The idea you can replace QUIC with nQUIC is like when Coiners used to show up telling us we're going to be using Bitcoin to buy a morning newspaper. Remember newspapers?

nQUIC doesn't have a way for Bob to prove to Alice that he's Bob beyond "Fortunately Alice already knew that" which is the assumption in that ACM paper. So that's a non-starter for the web.

nQUIC also doesn't have a 0-RTT mode. Noise proponents can say "That's a good thing, 0-RTT is a terrible idea". Maybe so, but you don't have one and TLS does. If society decides it hates 0-RTT modes because they're a terrible idea, we just don't use the TLS 0-RTT mode and nothing is lost. But if as seems far more likely we end up liking how fast it is, Noise can't match that. Doesn't want to.

Noise is a very applicable framework for some problems, and I can see why you might think APT fits but it doesn't.

> nQUIC is maybe an interesting approach for some specialist applications but more likely a dead end.

Adding on to this, nQUIC (and Noise specifically) is significantly better for use-cases where CAs and traditional PKI don't make sense, e.g., p2p, VPN, TOR, IPFS, etc...

I agree that APT is not one of these cases. Currently APT has a root trust set that is disjoint from the OS's root CA set, but they could easily do HTTPS and just explicitly change the root CA set for those connections.
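As a rough illustration of "explicitly change the root CA set for those connections", here is a sketch using Python's standard `ssl` module rather than anything APT actually does. The function name `repo_tls_context` and the bundle path in the comment are hypothetical. A context created this way starts with no trusted CAs at all, so it only trusts whatever bundle you load into it.

```python
import ssl
from typing import Optional

def repo_tls_context(ca_bundle: Optional[str] = None) -> ssl.SSLContext:
    """Build a client TLS context that ignores the OS root store.

    ssl.SSLContext(PROTOCOL_TLS_CLIENT) enables hostname checking and
    certificate verification by default, but loads no CA certificates.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if ca_bundle is not None:
        # Hypothetical path, e.g. a distro-shipped bundle of mirror CAs.
        ctx.load_verify_locations(cafile=ca_bundle)
    return ctx

ctx = repo_tls_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
print(ctx.cert_store_stats()["x509_ca"])  # number of CA certs currently trusted
```

With no bundle loaded every handshake fails verification, which is the point: trust is opt-in per repository instead of inherited from the several hundred CAs in the system store.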

EDIT: from the nQUIC paper:

> In particular nQUIC is not intended for the traditional Web setting where interoperability and cryptographic agility is essential.

On another note, I think it would be helpful to expand some points for other readers:

> nQUIC also doesn't have a 0-RTT mode. Noise proponents can say "That's a good thing, 0-RTT is a terrible idea".

0-RTT is dangerous because of replay attacks. It pushes low-level implementation details up the stack and requires users to be aware of and actively avoid sending non-idempotent messages in the first packet.
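A toy model of why replays only hurt non-idempotent messages; this is not TLS code, and the `Server` class and message names are invented for the example. The "attacker" here simply delivers a captured first-flight message a second time.

```python
class Server:
    """Toy server that applies every message it receives, including replays."""

    def __init__(self):
        self.balance = 100
        self.mirror = None

    def handle(self, message):
        action, value = message
        if action == "set_mirror":
            # Idempotent: applying it twice leaves the same state.
            self.mirror = value
        elif action == "withdraw":
            # Non-idempotent: every delivered copy has an effect.
            self.balance -= value

def deliver_with_replay(server, message):
    """Deliver a message once, then deliver the attacker's replayed copy."""
    server.handle(message)
    server.handle(message)

server = Server()
deliver_with_replay(server, ("set_mirror", "deb.debian.org"))
deliver_with_replay(server, ("withdraw", 10))
print(server.mirror, server.balance)
```

The replayed `set_mirror` is harmless, while the replayed `withdraw` is applied twice. That is the property the application has to reason about before putting anything in early data.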

> Maybe so, but you don't have one and TLS does. If society decides it hates 0-RTT modes because they're a terrible idea, we just don't use the TLS 0-RTT mode and nothing is lost.

One major point of using Noise protocol is to _simplify_ the encryption and auth layers, remove everything that's not absolutely necessary, and make it hard to fuck up in general. Things like ciphersuite negotiation, x509 certificate parsing and validation, and cryptographic agility have been the source of many many security critical bugs.

From an auditability perspective, Noise wins easily. You can write a compliant Noise implementation in <10k loc, vs. OpenSSL ~400k loc.

> But if as seems far more likely we end up liking how fast it is, Noise can't match that. Doesn't want to.

HTTP is insecure, but faster than HTTPS. Most sites now use HTTPS regardless. 0-RTT is insecure and while it might be OK for browsing HN, removing 0-RTT makes it much harder to fuck up.

>I can see that they don't want to make demands on a service they get for free, when HTTPS isn't even necessary for the use case

You could imagine a situation where https would be optional for APT mirrors. Then the package manager would have a config flag to use any mirror or only https-enabled mirrors (probably enabled by default). This would allow using https without creating any demands on organizations that host those mirrors: if they can, they would enable it, but it would not be required. The https-enabled hosts could also provide plain http for backwards compatibility.
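A minimal sketch of what such a flag could look like on the client side. The `https_only` parameter is hypothetical and is not an existing APT option, and the mirror list is just an example.

```python
from urllib.parse import urlsplit

def usable_mirrors(mirrors, https_only=True):
    """Filter a mirror list according to a hypothetical https_only flag."""
    if not https_only:
        return list(mirrors)
    return [m for m in mirrors if urlsplit(m).scheme == "https"]

# Example mirror list; which mirrors actually offer HTTPS is an assumption here.
mirrors = [
    "http://ftp.us.debian.org/debian/",
    "https://deb.debian.org/debian/",
    "http://ftp.be.debian.org/debian/",
]
print(usable_mirrors(mirrors))                    # only the https entry
print(usable_mirrors(mirrors, https_only=False))  # everything
```

Mirrors that cannot offer HTTPS stay usable for anyone who turns the flag off, so nothing is demanded of the volunteers.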

The argument isn’t correct, what does a user do when the download is damaged by an injection? A re-download results in exactly the same tampered with file.

Yeesh, as someone who has had to troubleshoot more than a few times HTTP payloads that were mangled by shitty ISP software to inject ads or notifications I would love HTTPS as a default to prevent tampering. I get the arguments against it, but I have 100% seen Rogers in Canada fuck up HTTP payloads regardless of MIME types while rolling out "helpful" bandwidth notifications. Signatures will tell you yes this is corrupt but end-to-end encryption means that the person in the middle can't even try.

Likewise, integrity of the download is the primary reason I’ve switched downloads to HTTPS too. The argument that signed downloads are enough fails to address what the user is supposed to do after the integrity check has failed. A redownload can result in the same tampered-with file. This isn’t hypothetical btw, it happens in the real world: I’ve had ISPs in Ireland and South Africa damage downloads due to their injections, and users don’t care if it’s their ISP, they get pissed off at you unfortunately.

APT may not be using HTTPS by default. But APT has supported HTTPS built in since version 1.5. I couldn't find any changelog or NEWS for that, but https://packages.debian.org/sid/apt-transport-https says so.

I myself use the HTTPS mirror provided by Amazon AWS (https://cdn-aws.deb.debian.org/). I do so because my ISP sometimes redirects to its login page when I browse HTTP URLs. Also, it sometimes injects ads (yeah, it's really bad, but it does remind me that I'm being watched).

I watched this blow up on infosec twitter and it made no sense, APT has https or even Tor if you really insist on it. Takes 30 seconds to configure.

Storm in a teacup.

Defaults matter, as that is what a large majority runs with.

Yeah true, but the arguments for tls default ring a bit hollow, to me at least. Someone who really wants the defense-in-depth should probably be switching to onion sources anyway, I was impressed with how quick they were.

As the article says, replay attacks are voided and an adversary could simply work out package downloads from the metadata anyway.

I personally use https out of general paranoia, but understand the arguments for not changing. It's two extra lines in a server setup script.

infosec twitter is crap like any other twitter subculture, full of drama queens and clickbait to increase their fav/rt count. What's even sadder is that they make no money off it.

Indeed, something similar happened just last week. The Hacker News (not to be confused with HN) Twitter account lashed out at VLC for not updating over https, even though VLC uses essentially the same(ish?) code signing as described for APT. A bit of a shit show.

HN also had a big thread participating in the fray...

I'm seriously tempted to start flagging links that point to "bad"/"outrage" bugtracker decisions like this, wide public distribution seems to make things quite a bit worse.

They use 1024bit DSA with SHA1, it is not cryptographically secure! Thus they would really benefit from HTTPS, it would provide another layer of protection against tampering.

Oh and we haven't even addressed that their "secure signing" doesn't also protect first installs that could be insecurely downloaded.

Most Debian mirrors support https. But HTTPS alone does not help you vs fresh connection if it is a rotating certificate like Lets Encrypt that has a dubious authentication chain.

Egypt or Turkey can issue valid fake certificates so you would have to check it if it's not one of those.

What a coincidence. Just earlier this week I was installing yarn in a docker container using their official instructions (https://yarnpkg.com/lang/en/docs/install/#debian-stable) and found out I had to install apt-transport-https for it to work.

Since the image was already apt-get install'ing a bunch of other packages at that point and everything seemed to work, the obvious question that popped in my head was: does this mean none of the other packages I've been downloading used https? That's what led me to this website.

If your personal ISP injected into HTTPS, it'd be broken too. So this is purely a complaint about the particular behavior of your ISP in that it serves HTTPS more faithfully than HTTP.

My corporate ISP hijacks HTTPS (MITM with self-signed CA), but not HTTP. Any system that uses any HTTPS security properties will verify certificates and fail on my work's network.

The argument about poorly behaved ISPs for one particular protocol but not the other cuts both ways — there are different kinds of poorly behaved ISP.

Right, some years ago I was involved in deployment of an update mechanism which (like APT) used signed bundles transferred in the clear. (Originally this was for privacy concerns: our users were more concerned about verifying the content of the "phone home" connections than about hiding their activity from an observer.) Anyway, some fraction of the time it'd fail because of an ISP or corporate injectobox. That stuff all goes over TLS now, not because there's any large benefit but it is very easy to add. We still get a fraction of failures due to injectoboxes, TLS or no TLS.

Moral of the story, I think, is that having a shorter chain of trust is good. In our case, the chain of trust started with a certificate in the original (sometimes OOB) download, the key for which we directly controlled. But for TLS, there are several links in between: the client host's cert store (under the control of OS vendors, hardware vendors a la Superfish, local administrators, etc.), the mess that is the TLS PKI community, your CA, several hundred other CAs, and finally you.

It didn't use to support it not too long ago, so I had to set up my server to explicitly not redirect to HTTPS for one particular location, because people would otherwise need to install apt-transport-https for it.

It's been in there for years

EDIT: Over 12 years to be precise.

  * added apt-transport-https method
Fri, 12 Jan 2007 20:48:07 +0100

it was in a secondary package for a long time though (until apt 1.5 in 2017)

So in order to get the HTTPS transport, you needed to first download the required package over HTTP.

We used FAI[1] to install it into the boot images we used and then ran it that way (other methods), but there still is the verification of the packages you put on those. Short of manually auditing the code and compiling that yourself then there's not much else in the trust chain. It's not really that necessary though, realistically, with the other protection methods. We just did it as it was fun to do and well, we could!

[1] https://fai-project.org/

One reason to prefer HTTPS is that in the event of a vulnerability in the client code, an attacker cannot trigger that vulnerability using a MITM attack if HTTPS is in use. One such vulnerability was recently found in apk-tools: https://nvd.nist.gov/vuln/detail/CVE-2018-1000849

While I agree with your point, a counterpoint is that vulnerabilities in the HTTPS implementation are also possible, and by introducing HTTPS code you are increasing the surface area of possible vulnerabilities.

I don't believe that increases the attack surface. As long as apt supports https repos and redirects:

An attacker who wants to exploit a buggy pre-auth (or improperly cert-validating) client-side ssl implementation, when the connection is http, can just MITM the http connection and redirect to https.

That's a great point, assuming APT follows redirects by default!

I don't know enough about it to know either way. I do know that you need to install a package to get HTTPS support for most connections, but I'm not sure if that package is just "switching" to using HTTPS by default or if it actually adds the ability for APT to read HTTPS endpoints.

How is this making you more vulnerable than not having it?

It’s like saying don’t bother having locks on your doors because they are subject to picking... not a great argument.

The argument is that the package you are downloading is already signed with public key crypto and verified during the update process. Its integrity is "secured". However there could be bugs in that implementation, and bugs can be exploited to (in one of the worst case scenarios) gain remote code execution during a MITM attack.
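For readers unfamiliar with the mechanism: as far as I know, APT verifies a signature on the repository's Release file, which carries hashes of the package indexes, which in turn carry hashes of each .deb. The sketch below only shows the last step, comparing a download against an expected SHA-256, and assumes the expected value already came from a verified index. The function names and sample bytes are illustrative, not APT's code.

```python
import hashlib
import hmac

def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 of a byte string."""
    return hashlib.sha256(data).hexdigest()

def verify_download(data: bytes, expected_sha256: str) -> bool:
    """Return True only if the downloaded bytes match the trusted hash."""
    return hmac.compare_digest(sha256_hex(data), expected_sha256.lower())

# Stand-in for a .deb and for the hash that a signed index would list for it.
package = b"pretend this is a .deb"
expected = sha256_hex(package)

tampered = package + b"<injected by a middlebox>"
print(verify_download(package, expected))   # untouched download
print(verify_download(tampered, expected))  # modified in transit
```

Note that the check runs after the bytes have been received and parsed by the client, which is exactly the window the parent comments are arguing about.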

A "solution" is to protect the endpoint with HTTPS, making MITM attacks impossible. Except that it's also possible that the HTTPS code could suffer from vulnerabilities which can lead to remote code execution. And if I'm being honest, the code which implements HTTPS is much larger and more complicated than the code which is doing the signature checking in APT right now, so by that measure it's actually a downgrade to something "less secure" since it's just adding on more complexity while not improving security much at all.

In reality I believe HTTPS is more heavily scrutinized than the signature verification code in APT is, and therefore could improve security, and there are additional other benefits to HTTPS aside from added security against implementation bugs (like an improvement to secrecy, even if it's small, and better handling by middleboxes which often try to modify HTTP requests but know to not try with HTTPS requests).

I started writing a response but you’ve thoroughly covered all the bases, thanks for expanding and explaining further.

This isn't accurate. If the SSL code incorrectly trusts the wrong server, then you're no better off but also no worse off. If the code has a RCE vulnerability caused by bad parsing logic, then you're worse off than you would be without it.

My counter-counterpoint is that while OpenSSL had (has?) horrible security issues, it's still worth using HTTPS in principle, because a modern internet connected system that has no trustworthy SSL library is never not going to have security problems. Whether it's hardening OpenSSL, shipping BoringSSL, or anything else, systems just have to get this right, and once they do, applications like apt can take advantage of it.

Problem being that such broken system might especially want to peruse package updates...

Yes, but this comparison doesn't favor APT/PGP at all. Using OpenSSL or similar for your HTTPS implementation means you're running code that the entire world already depends on for security, and which your own OS probably also depends on in other scenarios. Using PGP means you have some kind of custom transport implementation that you're responsible for. To the extent that you're solving the same problem, not using HTTPS is much riskier than using it.

> "Furthermore, even over an encrypted connection it is not difficult to figure out which files you are downloading based on the size of the transfer"

Is it really not difficult? I bet if you sorted all the ".deb" packages on a mirror by size a lot of them would have a similar or the same size, so you wouldn't be able to tell them apart based on the size of the dialog.

Furthermore, when I update Debian I usually have to download some updates and N number of packages. I don't know if this is now done with a single keep-alive connection. If it is, then figuring out what combination of data was downloaded gets a lot harder.

Finally, this out of hand dismisses a now trivial attack (just sniff URLs being downloaded with tcpdump) by pointing out that a much harder attack is theoretically possible by a really dedicated attacker.

Now if you use Debian your local admin can see you're downloading Tux racer, but they're very unlikely to be dedicated enough to figure out from downloaded https sizes what package you retrieved.

> I bet if you sorted all the ".deb" packages on a mirror by size a lot of them would have a similar or the same size

Why bet when you can science? :)

  $ rsync -r rsync://ftp.be.debian.org/debian/pool/main/ | grep "\.deb$" > /tmp/debian.txt
  $ wc -l /tmp/debian.txt
  1246733 # total number of deb packages
  $ awk '{ print $2 }' /tmp/debian.txt | sort | uniq -c | sort -rn | head -1
   463 1044 # The most common package size has 463 occurrences
  $ awk '{ print $2 }' /tmp/debian.txt | sort | uniq -c | sort -rn | awk '{ print $1 }' | grep -c "^1$"
  259300 # The number of packages with a unique size

As I found at https://news.ycombinator.com/item?id=18960239 there can be duplication which is irrelevant for the point being discussed, as it is one version of a package duplicating another version of the same package, meaning that the size is still a unique identifier of the package. It is worth checking that.
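One way to run that check, sketched under the assumption that pool files follow the usual `<package>_<version>_<arch>.deb` naming: group files by size and keep only the sizes shared by more than one distinct package name. The sample listing is invented.

```python
from collections import defaultdict

def package_name(filename: str) -> str:
    """Extract <package> from a path ending in <package>_<version>_<arch>.deb."""
    return filename.rsplit("/", 1)[-1].split("_", 1)[0]

def ambiguous_sizes(listing):
    """Return {size: [package names]} for sizes shared by different packages."""
    names_by_size = defaultdict(set)
    for filename, size in listing:
        names_by_size[size].add(package_name(filename))
    return {
        size: sorted(names)
        for size, names in names_by_size.items()
        if len(names) > 1
    }

# Made-up (filename, size) pairs for illustration only.
listing = [
    ("pool/main/h/hello/hello_2.10-1_amd64.deb", 56_132),
    ("pool/main/h/hello/hello_2.10-2_amd64.deb", 56_132),  # same package, new version
    ("pool/main/f/foo/foo_1.0_all.deb", 1044),
    ("pool/main/b/bar/bar_2.0_all.deb", 1044),             # different package
]
print(ambiguous_sizes(listing))
```

Only sizes that survive this filter actually hide which package was fetched; two versions of the same package sharing a size do not.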

> "Furthermore, even over an encrypted connection it is not difficult to figure out which files you are downloading based on the size of the transfer"

>> "Is it really not difficult? I bet if you sorted all the ".deb" packages on a mirror by size a lot of them would have a similar or the same size, so you wouldn't be able to tell them apart based on the size of the dialog."

Human readable sizes: Sure. Byte size info: Not so much. And even if: Things would become very clear to the attacker after one update cycle for each package.

If you really want to mitigate information about downloaded packages you would have to completely revamp apt to randomize package names and sizes, and also randomize read access on mirrors...

> If you really want to mitigate information about downloaded packages you would have to completely revamp apt to randomize package names and sizes, and also randomize read access on mirrors...

There isn't a need to randomize package names, or randomize read access on the mirror, given fetching deb files from a remote HTTP apt repository is a series of GET requests. Randomizing order of these requests can be done completely on the client side.

Package sizes are still problematic. Here's a suggestion: if each deb file was padded to nearest megabyte, and there was a handful of fixed-size files (say, 1MB, 10MB and 100MB), the apt-get client could request a suitably small number of the padding files with each download. This would improve privacy with a minimum of software changes and bandwidth wastage.

If each file were padded to the nearest MiB, the total download size of the packages containing the nosh toolset would increase by almost 3000% from 1.5MiB to 46MiB. No package is greater than 0.5MiB in size.

I am fairly confident that this case is not an outlier. Out of the 847 packages currently in the package cache on one of my machines, 621 are less than 0.5MiB in size.
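The arithmetic can be reproduced with a few lines. The figures above imply 46 packages totalling about 1.5MiB; the sketch below assumes, purely for illustration, that they are all the same size.

```python
MIB = 1024 * 1024

def padded_size(size: int, block: int = MIB) -> int:
    """Round size up to the next multiple of block."""
    return -(-size // block) * block

def overhead_percent(sizes, block: int = MIB) -> float:
    """Percentage increase in total download size after padding."""
    original = sum(sizes)
    padded = sum(padded_size(s, block) for s in sizes)
    return 100.0 * (padded - original) / original

# Assumption: 46 equally sized packages adding up to roughly 1.5 MiB.
sizes = [(3 * MIB // 2) // 46] * 46
print(sum(sizes) / MIB)                 # total before padding, in MiB
print(sum(map(padded_size, sizes)) / MIB)  # total after padding, in MiB
print(round(overhead_percent(sizes)))   # percentage increase
```

Each small package costs a full MiB after padding, so the total lands at 46MiB and the increase comes out just under 3000%.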

You're arguing with a strawman.

He didn't mean these file sizes specifically. It would still apply just the same with different file sizes

i.e. create cutoffs every 50 or 100kbyte

You're abusing the notion of a straw man, which this is not.

I am pointing out the consequences of Shasheene's idea as xe explicitly posited it. Xe is free to think about different sizes in turn, but needs to measure and calculate the consequences of whatever size xe then chooses.

No, it would not apply the same with different sizes. Think! This is engineering, and different block sizes make different levels of trade-off. The lower the block size, for example, the fewer packages end up being the same rounded-up size and the easier it is to identify specific packages.

(Hint: One hasn't thought about this properly until one has at least realized that there is a size that Debian packages are already blocked out to, by dint of their being ar archives.)
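The block-size trade-off can be measured rather than guessed. A small sketch, using invented package sizes, that counts how many packages remain alone in their padded-size bucket (and so stay identifiable by size) for a given block size:

```python
from collections import Counter

def bucket(size_bytes, block_bytes):
    """Padded size: size_bytes rounded up to a multiple of block_bytes."""
    return ((size_bytes + block_bytes - 1) // block_bytes) * block_bytes

def uniquely_identifiable(sizes, block_bytes):
    """Count packages whose padded size is shared with no other package."""
    counts = Counter(bucket(s, block_bytes) for s in sizes)
    return sum(1 for s in sizes if counts[bucket(s, block_bytes)] == 1)

# Hypothetical package sizes in bytes.
sizes = [12_000, 48_000, 51_000, 99_000, 140_000, 480_000]

for block in (1, 50_000, 1_000_000):
    print(block, uniquely_identifiable(sizes, block))
```

Running the same count against a real mirror's package index would show where a given block size stops providing cover.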

No, don't do that. Just use tor instead. It's supported on an out of the box Debian install.

It's still useful to be able to connect to the local mirror without tor (and enjoy the fast transfer speeds), but still mitigate privacy leaks from analysis of the transfer and timings.

Transferring apt packages over tor is unlikely to ever become the default, so it's worth trying to improve the non-tor default.

They could also improve the download client to fix this.

For example, if the download client uses the byte-range HTTP requests to download files in chunks, there is nothing stopping it from randomly requesting some additional bytes from the server. Then the attacker would have a very weak probability estimate of what was actually downloaded.
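One way to read this idea, sketched under the assumption that the mirror honours HTTP `Range` requests: split the file into fixed chunks, mix in a few random decoy ranges, and shuffle the lot. The `range_headers` helper and its parameters are hypothetical:

```python
import random

def range_headers(file_size, chunk_size, decoys, rng=None):
    """Build Range header values covering the file, plus random decoy ranges."""
    rng = rng or random.Random()
    # Real chunks: consecutive, inclusive byte ranges covering the whole file.
    ranges = [(start, min(start + chunk_size, file_size) - 1)
              for start in range(0, file_size, chunk_size)]
    # Decoys: extra ranges at random offsets, each at most chunk_size long.
    for _ in range(decoys):
        start = rng.randrange(file_size)
        length = rng.randrange(1, chunk_size + 1)
        ranges.append((start, min(start + length, file_size) - 1))
    rng.shuffle(ranges)
    return [f"bytes={a}-{b}" for a, b in ranges]

for header in range_headers(file_size=2500, chunk_size=1000, decoys=2):
    print("Range:", header)
```

The decoys cost extra bandwidth, and how much they actually confuse an observer would need to be evaluated separately.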

If you care about security, you update often.

If you update often, the changeset should be small.

It is easy to find a match in a small set.

I'm somewhat surprised that no-one has (yet) linked to this post [1] by Joe Damato of Packagecloud, which digs into attacks against GPG signed APT repos, or the paper which initially published them [2].

The post makes clear in the third paragraph: "The easiest way to prevent the attacks covered below is to always serve your APT repository over TLS; no exceptions."

[1]: https://blog.packagecloud.io/eng/2018/02/21/attacks-against-... [2]: https://isis.poly.edu/~jcappos/papers/cappos_mirror_ccs_08.p...

Thanks for posting these, I was unaware that they even existed.

This argument boils down to two things:

One side: We want performance / caching!

Other side: We want security!

Both sides sometimes argue disingenuously. It's true that caching is harder with layered HTTPS and that performance is worse. It's also true that layering encryption is more secure. (It's what the NSA and CIA do. You smarter than them?)

Personally, I'd default to security because I'm a software dev. If I were a little kid on a shoddy, expensive third world internet connection I'd probably prefer the opposite.

I just wish it were up to me.

I think it's important to reiterate that HTTPS only adds network privacy about what you download. The signing of repos and their packages means they're guaranteed to be a pure copy of what's on the server.

That same mechanism means you can easily make trusted mirrors on untrusted hosting.

No, it also adds security.

If someone steals a signing key then they also need to steal the HTTPS cert. Or control the DNS records and generate a new one or switch to HTTP.

Adding an extra layer of encryption is like adding more characters on a password. Sometimes it saves your bacon, sometimes it was useless and came with performance drawbacks.

If you still disagree with me, that's fine. But I want to hear why you continue to hold this opinion when I worked for 1Password during Cloudbleed.


In a scalar sense, sure. In a binary "do we have enough security" sense, less so. I realise that's a shitty quality of argument and I could have been more explicit but you can always add more security. Why, for example, aren't you demanding reproducible builds and signed source uploads?

Simply put —and this is, I think, where we disagree— the signing of packages is enough. The design of Apt is such that it doesn't matter where you get your package from, it's that it matches an installed signature.

Somebody could steal the key but they would then either need access to the repo or a targeted MitM on you. Network attacks are far from impossible but by the point you're organised to also steal a signing key, hitting somebody with a wrench or a dozen other plans become a much easier vector.

The problem with the binary sense is that people misunderstand risk. For example, the blackout in 2003 was partially caused by the windows worm earlier in the day. Even though the computers that were used to control the power grid weren't running windows, the computers that monitored them were. So a routine alarm system bug ended up cascading into a power outage that lasted over a week in some places, including my home at the time. This was classified for a while.

The people that programmed Windows before 2003 probably didn't consider their jobs with the full national security implications.

Then you take something simple, like Linux on a simple IoT device. Say a smart electrical socket. Many of these devices went without updates for years. Doesn't seem all that bad, right? Just turn off a socket or turn it on? How bad could it be?

At some point someone noticed that they were getting targeted and said: "But why?" The reason is simple. You turn off 100k smart sockets all at once and the change in energy load can blow out certain parts of the grid.

The point isn't that someone will get the key. The point is that we know the network is hostile. We know people lose signing keys. We know people are lazy with updates. From an economics perspective why is non-HTTPS justified? Right? A gig of data downloaded over HTTPS with modern ciphers costs about a penny for most connections in the developed world.

To me, it's worth the cost.

Although I would not class this as even potentially in-line with Blaster or the imminent death of the internet under an IoT Botnet, I see your broader point. The deployment cost approaches zero and it does plug —however small— a possible vector.

I do think it would cause a non-zero amount of pain to deploy though. Local (eg corporate) networks that expect to transparently cache the packages would need to move to an explicit apt proxy or face massive surge bandwidth requirements, slower updates.

That said, if you can justify the cost, there is absolutely nothing stopping you from hosting your own mirror or proxy accessible via HTTPS.

I'm not against this, I just don't see the network as the problem if somebody steals a signing key. I think there are other —albeit harder to attain— fruits like reproducible builds that offer us better enduring security. And that still doesn't account for the actions of upstream.

Except the signing key was also downloaded over plain HTTP so all bets are off.

The pubkey is delivered with the CD, signed with gpg, and listed on a public server.

This is no less secure than trusting the CA list that comes preinstalled with Windows on a PC.

This is a good synthesis here -- downloading and trusting a key over HTTP is folly, but then, so is trusting much of anything that "just works."

If the whole PKI approach is to work, the client has got to get trusting that public key right. In regular practice, that probably means checking it against an HTTPS-delivered version of the same key from an authoritative domain.

(How far down the rabbit hole do we go? Release managers speaking key hashes into instagram videos while holding up the day's New York Times?)

You joke, but we've seen with machine learning you can fake those kinds of "proof" videos too :P

Except that this page argues that you are not increasing security in this case.

This page is wrong, for example they claim there's no privacy increase here but quite clearly there's a huge difference between an attacker being able to tell "you're downloading updates" and an attacker being able to tell "you're downloading version X.Y of Z package" - worse, that information could actually later be used to attack you based on the fact that the attacker now knows what version to find vulnerabilities in, including non-exposed client software for email, documents, browsers, etc.

It's a relatively insignificant security benefit for most, but could prove an important one for those against whom targeted attacks are used.

Except the attacker can tell the package version over HTTPS. Read the footnote in the article.

They're speculating that you can do this using size of the full connection - there's truth to that, but under HTTPS padding will occur causing a round up to the block size of the cipher - meaning that there's a higher chance of overlap in package sizes.

It might actually become quite difficult to do such analysis, especially if multiple requests were made for packages at once in a connection that's kept open. You won't get direct file sizes either, you'd have to perform analysis for overhead and such - in any case it's significantly less trivial than an HTTP request logger.

Even then one is guessing and the other is establishing a fact. That part of the argument didn't sit right with me and I can't imagine somebody in Info Sec or Legal conflating the two.

But it does, because HTTPS can ensure you always get up to date data. They have a solution involving timestamps for HTTP, but that is still clearly less secure than the guarantee from HTTPS.
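For reference, the timestamp approach amounts to refusing repository metadata whose expiry date has passed. A rough sketch, assuming a `Valid-Until:` field in the style used by APT Release files; the sample text, the date format string, and both helpers are illustrative assumptions, not apt's actual code:

```python
from datetime import datetime, timezone

def parse_release_date(value):
    """Parse a date such as 'Sat, 26 Jan 2019 09:31:24 UTC' (assumed format)."""
    parsed = datetime.strptime(value, "%a, %d %b %Y %H:%M:%S UTC")
    return parsed.replace(tzinfo=timezone.utc)

def is_stale(release_text, now=None):
    """Return True if the metadata carries a Valid-Until date that has passed."""
    now = now or datetime.now(timezone.utc)
    for line in release_text.splitlines():
        if line.startswith("Valid-Until:"):
            return now > parse_release_date(line.split(":", 1)[1].strip())
    return False  # no expiry field present, so nothing to check

# Illustrative metadata, not copied from a real mirror.
sample = "Origin: Example\nValid-Until: Sat, 26 Jan 2019 09:31:24 UTC\n"
print(is_stale(sample))
```

An attacker can still replay old metadata until that date, which is the window the comment above is pointing at.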

Not in this case, just read the tl;dr in the article (you don't even need to read the whole article).
