
The cpio trailer problem (2018) - tjalfi
https://entropymine.wordpress.com/2018/05/27/the-cpio-trailer-problem/
======
teddyh
That “TRAILER!!!” string should be added to the _Big List of Naughty Strings_¹
(discussed here previously²).

1\. [https://github.com/minimaxir/big-list-of-naughty-strings](https://github.com/minimaxir/big-list-of-naughty-strings)

2\. [https://news.ycombinator.com/item?id=13406119](https://news.ycombinator.com/item?id=13406119)

------
rwmj
The Linux kernel will keep going after the "TRAILER!!!" entry in the
initramfs. This is useful in the context of initramfs because you can
concatenate them together - a trivial way to add more files to the init
ramdisk or even overwrite existing ones. You can even concatenate
gzip-compressed cpio initramfs files, which causes a similar problem when
finding the uncompressed size: the last 4 bytes only give the size of the last
gzip member (try with gzip -l).
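
A minimal sketch of the gzip size problem, using Python's standard gzip module. The two byte strings are made-up stand-ins for two compressed initramfs images, not real cpio archives:

```python
import gzip
import struct

# Two payloads, each compressed on its own (hypothetical stand-ins for two
# gzip-compressed initramfs images).
first = b"A" * 1000
second = b"B" * 10

# Concatenating gzip members yields a valid multi-member gzip stream.
combined = gzip.compress(first) + gzip.compress(second)

# The stream ends with ISIZE: the uncompressed size (mod 2**32) of the
# *last* member only, stored as a little-endian 32-bit integer.
(isize,) = struct.unpack("<I", combined[-4:])

# Decompressing the whole stream returns the data of both members.
total = len(gzip.decompress(combined))

print("size according to the last 4 bytes:", isize)
print("actual uncompressed size:", total)
```

The trailing field describes only the second payload, so the real total can only be found by walking every member.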

------
avian
> I assume that cpio is very rarely used these days.

From Wikipedia:

> The use of cpio by the RPM Package Manager, in the initramfs program of
> Linux kernel 2.6, and in Apple's Installer (pax) make cpio an important
> archiving tool.

~~~
rwmj
RPM hasn't used true cpio for a while [since 2013] because cpio didn't support
files larger than 4GB. It now uses its own format which it evolved from cpio.

~~~
tyingq
Seems they still use the same logic though: [https://github.com/rpm-software-management/rpm/blob/853c48ba...](https://github.com/rpm-software-management/rpm/blob/853c48ba6468ce1a516621a2fa6d1fc51e4f7410/lib/cpio.c#L49)

~~~
rwmj
That doesn't actually matter though, does it? RPM is no longer using cpio,
it's now using a custom format that happens to have once been related to
cpio. What matters is what RPM does, not what cpio does.

In this case it's not possible to create such an archive using rpmbuild as all
files added to RPMs must start with "/". I just tried it and got:

    
    
      RPM build errors:
          File must begin with "/": TRAILER!!!
    

You cannot hack on your own rpmbuild because you wouldn't be able to make a
signed RPM (as those can only be built through the Koji build system which
doesn't have your hacked rpmbuild), so you couldn't distribute that RPM to
Fedora or Red Hat Enterprise Linux.

In other words this is essentially a non-problem for RPM.

~~~
tyingq
The comparison is just to the filename, not including the path. You may be
right, but that check doesn't prove it.

Edit: Thanks for the reply! Apparently they are including the path.

~~~
rwmj
I can create an RPM containing a filename "TRAILER!!!" just fine:

    
    
      $ rpm -qlp /home/rjones/d/fedora/virt-what/master/x86_64/virt-what-1.20-1.fc33.x86_64.rpm
      /TRAILER!!!
      /usr/lib/.build-id
      /usr/lib/.build-id/59
      /usr/lib/.build-id/59/59c1cb95689b7babd5a6cb1b5bc58f12e20b3a
      /usr/libexec/virt-what-cpuid-helper
      /usr/sbin/virt-what
      /usr/share/doc/virt-what
      /usr/share/doc/virt-what/COPYING
      /usr/share/doc/virt-what/README
      /usr/share/man/man1/virt-what.1.gz
    

Furthermore I installed and uninstalled it with no ill effects, and the file
was created and removed in /. So whatever problem cpio may have with this file
does not affect the RPM-not-quite-cpio format.

------
qalmakka
I never really understood why tar "won" and cpio "lost" the archive wars. Both
do pretty much the same thing, thus I think it was just a matter of personal
preference?

I started using *NIX around 2006 and nobody was using cpio even then; I only
learned of its existence when I first had to extract an RPM.

~~~
raverbashing
Because the cpio utility has awful usability

(it might capture more info than tar maybe? but it's not worth the effort)

~~~
theamk
Second that. There was a time when I was working with initrd images, so I had
to use cpio a lot. I could never get used to its syntax.

And the fact that its manpage was missing, only saying “use info instead”, did
not help its usability at all.

~~~
JdeBP
That was a GNU documentation problem, never applicable to non-GNU versions of
cpio, and not even applicable to the current GNU manual page, which has no
mention of info and definitely exists. It also post-dates by years the period
where tar and cpio were in competition, which was (roughly) the early 1980s.
(The POSIX standard pax aimed at resolving the conflict was in the 1003.1b
draft in 1991.) It thus is an entirely fallacious explanation for what
happened in the "archiver wars".

* [https://news.ycombinator.com/item?id=18978064](https://news.ycombinator.com/item?id=18978064)

* [https://netbsd.gw.com/cgi-bin/man-cgi?cpio](https://netbsd.gw.com/cgi-bin/man-cgi?cpio)

* [http://osr507doc.sco.com/man/html.C/cpio.C.html](http://osr507doc.sco.com/man/html.C/cpio.C.html)

* [https://manpages.debian.org/sid/cpio/cpio.1.en.html](https://manpages.debian.org/sid/cpio/cpio.1.en.html)

------
tyingq
Interesting. I wonder why they didn't just use a value that's not valid for a
filename. TRAILER/// might have worked for Unix.

------
saagarjha
TL;DR a lot of formats have “sentinel” values that are little more than
“nobody sane would ever pick this value” and we end up in a conundrum when the
obvious ambiguities arise.
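
A rough sketch of that ambiguity, assuming the "newc" cpio layout (a 110-byte ASCII header, then the NUL-terminated name, then the data, each padded to 4 bytes). The writer and reader below are simplified illustrations with made-up file names, not the code of any real cpio implementation:

```python
def pad4(n: int) -> int:
    """Number of padding bytes needed to reach a 4-byte boundary."""
    return (-n) % 4

def newc_entry(name: str, data: bytes = b"") -> bytes:
    """Build one simplified newc entry (most metadata fields left at zero)."""
    name_bytes = name.encode() + b"\0"
    # ino, mode, uid, gid, nlink, mtime, filesize, devmajor, devminor,
    # rdevmajor, rdevminor, namesize, check
    fields = [0, 0o100644, 0, 0, 1, 0, len(data), 0, 0, 0, 0, len(name_bytes), 0]
    out = b"070701" + b"".join(b"%08X" % f for f in fields) + name_bytes
    out += b"\0" * pad4(len(out))
    return out + data + b"\0" * pad4(len(data))

def read_names(archive: bytes, stop_at_trailer: bool = True) -> list:
    """List member names; optionally treat the sentinel name as end of archive."""
    names, pos = [], 0
    while pos + 110 <= len(archive):
        header = archive[pos:pos + 110]
        filesize = int(header[54:62], 16)
        namesize = int(header[94:102], 16)
        name = archive[pos + 110:pos + 110 + namesize - 1].decode()
        pos += 110 + namesize
        pos += pad4(pos)
        pos += filesize
        pos += pad4(pos)
        if stop_at_trailer and name == "TRAILER!!!":
            break  # the sentinel and a real file with this name look identical
        names.append(name)
    return names

archive = (
    newc_entry("a.txt", b"first")
    + newc_entry("TRAILER!!!", b"a real file that happens to have this name")
    + newc_entry("b.txt", b"second")
    + newc_entry("TRAILER!!!")  # the actual end-of-archive marker
)

print(read_names(archive))
print(read_names(archive, stop_at_trailer=False))
```

A reader that trusts the name alone stops after `a.txt` and never sees `b.txt`; a reader that keeps going cannot tell which "TRAILER!!!" entry was meant as the end marker.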

~~~
raverbashing
It's not an issue with sentinel values, but an issue with not escaping your
inputs

I don't believe tar "falls for" such a stupid trick.

Confusing metadata with actual data is a classic security blunder

tl;dr: cpio has amateurish levels of input validation/handling

~~~
masklinn
> tl;dr: cpio is an amateur project

TBF cpio dates back to v7 unix, some time before considerations of security
really became a thing.

~~~
raverbashing
The question is not so much why it was like that, the question is why nobody
has fixed this yet

I like the Unix environment, but I can understand why stuff like systemd takes
hold, because the legacy is painful.

And any Unix environment has this weird mix of tools where the most popular
ones got better over time (tar, etc) then you have the weird ones that didn't
evolve so much like cpio

~~~
cyphar
A bit of shameless self-promotion, I wrote a blog post some time ago which
goes through the history of tar[1]. Tar definitely has evolved through the
Unix wars, but I wouldn't say the final result is really that great. While we
do tend to reinvent tools for reinvention's sake, I do think tar really does
deserve a redesign based on modern constraints (rather than the constraints of
tape archives). Zip archives are markedly better designed.

Also, cpio did evolve at least somewhat. In fact POSIX has a separate archive
tool called "pax" (which supports both tar and cpio) which was meant to act as
a truce between the tar and cpio factions of Unix vendors. Nobody uses it, as
far as I know.

[1]: [https://www.cyphar.com/blog/post/20190121-ociv2-images-i-tar](https://www.cyphar.com/blog/post/20190121-ociv2-images-i-tar)

~~~
masklinn
> Zip archives are markedly better designed.

Yes and no. They have advantages and drawbacks:

* zip archives don't require seeking through the entire thing to list archive contents

* zip archives can random-access files cheaply

* zip checksums the files themselves (AFAIK tar only checksums the header)

* zip compresses files individually, meaning different files can use different configuration

* the central directory (CD) means you can trade efficiency for very cheap updates: removing a file is just rewriting the CD without that file, likewise to replace a file just tack the new file at the end, then write a new CD with updated offsets

However

* "streaming" zips can lead to schizophrenia as the central directory record does not necessarily match the streamed contents or ordering

* zip _only_ compresses files individually, meaning there's no cross-file efficiency to be had (some compression formats have a "solid" archive concept which compresses globally as an opt-in), so tar _can_ compress better depending on the mix and the compression algorithm (assuming both archives use the same method)
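
A rough sketch of the per-file versus whole-archive trade-off, using Python's standard zipfile and tarfile modules. The file names and contents are invented for illustration, and the exact sizes depend on the data and compressor settings:

```python
import io
import tarfile
import zipfile

# 200 small, similar files (hypothetical contents).
files = {
    f"file{i:03}.txt": f"record {i}: the quick brown fox\n".encode()
    for i in range(200)
}

# zip: every member is deflated on its own and also listed in the central directory.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for name, data in files.items():
        zf.writestr(name, data)

# tar.gz: members are concatenated first, then one gzip stream covers everything.
tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w:gz") as tf:
    for name, data in files.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))

zip_size = len(zip_buf.getvalue())
tar_size = len(tar_buf.getvalue())
print("zip:", zip_size, "bytes; tar.gz:", tar_size, "bytes")

# The zip listing comes from the central directory; no member data is decompressed.
with zipfile.ZipFile(io.BytesIO(zip_buf.getvalue())) as zf:
    names = zf.namelist()
    one = zf.read("file042.txt")  # random access to a single member
```

For many tiny, similar files the zip's per-member headers dominate, while the tar.gz benefits from compressing across files. In exchange, the zip can list and extract a single member without touching the rest.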

> Also, cpio did evolve at least somewhat. In fact POSIX has a separate
> archive tool called "pax" (which supports both tar and cpio) which was meant
> to act as a truce between the tar and cpio factions of Unix vendors. Nobody
> uses it, as far as I know.

GNU cpio apparently supports tar directly.

~~~
cyphar
I would argue the upsides out-weigh the downsides. Out of two downsides you
mention, one actually applies to tar archives as well (the ordering of files
can be random) -- it's just that there's no index to compare against. If you
added an index to a tar archive it would have the same issue.

The compression point you make is fair, though there is a clear trade-off when
it comes to random-access. Naive whole-archive compression makes efficient
random-accesses impossible because you have to decompress everything before
the data you're trying to access. A sufficiently-clever archive and
compression format may be able to overcome that, but I'd argue that zip made a
reasonable trade-off here.

> AFAIK tar only checksums the header

tar doesn't checksum anything, there isn't even a CRC.

~~~
masklinn
> I would argue the upsides out-weigh the downsides.

I could agree, but that doesn't mean replacing tars by zips doesn't have
drawbacks. Even more so as you can use whatever compression scheme you want
with tars, not so with zips (unless you use them as tars and then… they're
just a worse tar).

> Out of two downsides you mention, one actually applies to tar archives as
> well (the ordering of files can be random) -- it's just that there's no
> index to compare against.

There is no schizophrenia because tars _always_ behave that way.

> If you added an index to a tar archive it would have the same issue.

Yes, if tar were a completely different format it would have different issues.
It's not though.

> Naive whole-archive compression makes efficient random-accesses impossible
> because you have to decompress everything before the data you're trying to
> access. A sufficiently-clever archive and compression format may be able to
> overcome that, but I'd argue that zip made a reasonable trade-off here.

So did tar, they're just _different_ tradeoffs. For instance, archiving tons of
small files with zip is an absolutely miserable experience because you've got
more overhead than you have content. Because tar was designed for
_tape archive_, its use-cases included backups which, for university
computers, would be likely to include lots of small files.

> tar doesn't checksum anything, there isn't even a CRC.

It's not a CRC, but tar has a very simple checksum on headers: the bytes of
the 512-byte header block are summed into an integer (with the checksum field
itself counted as spaces).
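
A small sketch of that header checksum, assuming the ustar layout (checksum stored as octal text at bytes 148 to 155 of the 512-byte header) and using Python's standard tarfile module to produce a test archive. The file name and payload are made up:

```python
import io
import tarfile

# Build a one-file ustar archive in memory.
payload = b"hello\n"
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

raw = buf.getvalue()
header = raw[:512]

# The checksum field holds an octal number as ASCII text.
stored = int(header[148:156].rstrip(b" \0").decode("ascii"), 8)

# Sum every header byte, counting the 8 checksum bytes as spaces (0x20).
computed = sum(header[:148]) + 8 * 0x20 + sum(header[156:512])
print("stored:", stored, "computed:", computed)

# The file data is not covered: flip the first data byte and read it back.
corrupted = bytearray(raw)
corrupted[512] = ord("j")
with tarfile.open(fileobj=io.BytesIO(bytes(corrupted))) as tar:
    damaged = tar.extractfile("hello.txt").read()
print(damaged)
```

The stored and computed values agree, and the altered payload is read back without complaint, which is the "only checksums the header" behaviour mentioned upthread.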

------
appleflaxen
Is there a name for this general class of problems? Where you have an
accidental parsing/scoping error because the encoding format is ambiguous, or
doesn't take into account a specific configuration of the message contents?

~~~
giovannibajo1
I would classify this as using an in-band signal rather than an out-of-band
signal. Twitter used to have this same problem ages ago. I don’t remember the
details but it was something like DMs being standard twitter messages with
“/dm” at the beginning.

