
How I wound up finding a bug in GNU Tar - fanf2
https://utcc.utoronto.ca/~cks/space/blog/sysadmin/TarFindingTruncateBug
======
heinrichhartman
FTR: This gdb script will print full stack traces for every read syscall made:

    
    
      catch syscall read
      commands
         backtrace
         continue
      end
    

Put that into a file `trace-read.gdb` and attach to a running process like so:

    
    
      gdb -x trace-read.gdb -p $(pgrep -n tar)
    

This assumes you are running an executable with built-in debugging symbols
(gcc -g). It should be possible to side-load debugging symbols provided in an
external package, though I don't have a command at hand. (Anyone?)

This works well enough on Linux to quickly debug situations like the one
above. However, it can make the attached process painfully slow, and
occasionally bring it down altogether.

The right tool for this kind of job, in situations where slowing down or
crashing the process is not acceptable, is DTrace:

    
    
      dtrace -p $(pgrep -n tar) -n 'syscall::read:entry { ustack(); }'
    

\- Tutorial:
[https://wiki.freebsd.org/DTrace/Tutorial](https://wiki.freebsd.org/DTrace/Tutorial)

\- ustack: [http://dtrace.org/guide/chp-user.html#chp-user-4](http://dtrace.org/guide/chp-user.html#chp-user-4)

Ironically, DTrace is one of the main selling points for Solaris/OmniOS (or
FreeBSD) over Linux. The situation has gotten better recently, with bpftrace
becoming available:

\- [http://www.brendangregg.com/blog/2018-10-08/dtrace-for-linux...](http://www.brendangregg.com/blog/2018-10-08/dtrace-for-linux-2018.html)

\- [https://github.com/iovisor/bpftrace](https://github.com/iovisor/bpftrace)

Until you have a 4.x kernel with the right configuration options running, I am
afraid the above gdb script is your best option on Linux.

~~~
dima55
You can do this on Linux using perf, if you want less overhead and less impact
on the application-under-test:

    
    
      $ perf record -e syscalls:sys_enter_read -g -- application arg1 arg2 ...
    
      [ application runs while perf writes out a log, recording every read() syscall, and keeping track of the backtrace each time]
    
      $ perf report -g --stdio
      [ perf reads the log, writing out the backtraces ]
    

This is the basic usage. Lots more is available, obviously. This has been
available for a LONG time.

~~~
heinrichhartman
Good point. I did not know perf could print backtraces.

------
gumby
fun trivia: gnu tar predates the GNU project. What?

It started out as an implementation of tar written by John Gilmore whose uname
is 'gnu'. gnu has written a lot of free software as well, but the fact that
his name and the GNU project's name are the same is a complete coincidence.

~~~
jchw
Fascinating but is that actually true? As far as I have been able to search
their implementation seems to have been called pdtar.

~~~
w7
The FSF references this in a news article [1], stating that pdtar is what GNU
tar was based on.

[1] [https://www.fsf.org/news/2009-free-software-awards](https://www.fsf.org/news/2009-free-software-awards)

~~~
jchw
Yes, that is definitely true. I was simply referring to the claim that pdtar
was known as gnu tar.

------
cesarb
While the bug is in tar, the API for the read(2) system call doesn't help. Its
return value can be positive, which indicates the number of bytes read;
negative, which indicates an error, unless errno is EINTR, in which case you
should retry; or zero, which indicates either end of file, or the number of
bytes read when the buffer length is zero (and in this case you can't
distinguish end of file from a normal read). It's easy to forget to handle one
of these five cases.

~~~
saagarjha
The issue with EINTR is pretty standard across most syscalls; if you get one
you should retry (or, depending on what flags you have set, this can be done
for you automatically in some cases). A return value of zero is an EOF; there
is no such thing as an “empty read”. Why would you pass in an empty buffer?

~~~
jandrese
You can get an empty read if you have set up the FD to be nonblocking.

~~~
cesarb
That would be a sixth case I had forgotten above (negative return value with
errno EAGAIN and/or EWOULDBLOCK - these two might or might not be the same
number).
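
For illustration, the cases listed in this subthread can be folded into one retry loop. This is a minimal C sketch, not tar's actual code; the helper name `read_fully` is made up here, and handing EAGAIN/EWOULDBLOCK back to the caller as an error is an assumption about what the caller wants.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read up to `len` bytes from `fd` into `buf`,
 * retrying on EINTR. Returns the number of bytes read; a result shorter
 * than `len` means end of file was reached. Returns -1 on any other
 * error, including EAGAIN/EWOULDBLOCK on a nonblocking descriptor
 * (the caller decides whether to poll and retry).
 */
ssize_t read_fully(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = read(fd, p + done, len - done);
        if (n > 0) {
            done += (size_t)n;      /* partial or full read */
        } else if (n == 0) {
            break;                  /* end of file: trust it */
        } else if (errno == EINTR) {
            continue;               /* interrupted before any data: retry */
        } else {
            return -1;              /* real error, or EAGAIN/EWOULDBLOCK */
        }
    }
    return (ssize_t)done;
}
```

The loop never calls read() with a zero-length buffer, so a zero from read() inside it can only mean end of file. One limitation of the sketch: if an error arrives after some bytes were already read, the partial count is discarded.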

------
ezoe
> if you run GNU Tar with --sparse and a file shrinks while tar is reading it,
> tar fails to properly handle the resulting earlier than expected end of
> file.

Is that a bug? Yes. It's different from changing the data of a file, but still.

~~~
newnewpdro
Yes, it's expected for a general system utility like tar to handle such
simultaneous file operations gracefully.

~~~
hollerith
Agreed. OTOH, it is not a bug if the fact that tar is not instantaneous causes
the tarball to contain combinations of files unlikely to occur during normal
operation of the system. e.g., if a file is renamed while tar is running it is
not a bug for the tarball to contain zero or two copies of that file.

~~~
newnewpdro
It is a bug for tar to enter an infinite loop writing an ever expanding file
of trailing zeroes because the current file was truncated to a size smaller
than its stat returned.

It's basically a classic TOCTOU race; tar needs to honor the EOF returned from
read() as authoritative.

This is actually a very common trap for junior *nix programmers; treating
stat() and a subsequent mmap() or read() loop as if they were atomic with
regards to stat.st_size. The size returned by stat() can be used as an
estimate, but otherwise can't be applied to subsequent operations.
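
A minimal C sketch of that rule, assuming a hypothetical helper `slurp` that reads a whole file into memory: the size from fstat() only picks the initial buffer size, and the loop ends when read() returns 0, whether the file shrank or grew in the meantime. This is not GNU tar's code.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read the whole file at `path` into a malloc'd
 * buffer and store its length in *out_len. Returns NULL on failure.
 * st_size is used only as a sizing hint, never as a loop bound.
 */
char *slurp(const char *path, size_t *out_len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    size_t cap = 4096;                  /* fallback when no estimate exists */
    if (fstat(fd, &st) == 0 && st.st_size > 0)
        cap = (size_t)st.st_size + 1;   /* +1 leaves room to observe EOF */

    char *buf = malloc(cap);
    size_t len = 0;
    while (buf != NULL) {
        if (len == cap) {               /* file grew past the estimate */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                buf = NULL;
                break;
            }
            buf = bigger;
            cap *= 2;
        }
        ssize_t n = read(fd, buf + len, cap - len);
        if (n > 0) {
            len += (size_t)n;
        } else if (n == 0) {
            break;                      /* EOF wins, even if len < st_size */
        } else if (errno != EINTR) {
            free(buf);                  /* real error: give up */
            buf = NULL;
        }
    }
    close(fd);
    if (buf != NULL)
        *out_len = len;
    return buf;
}
```

If the file is truncated after the fstat() call, the function simply returns fewer bytes than the estimate instead of waiting for data that will never arrive.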

------
ac_4763267
A relevant question about tar and backups:

[https://unix.stackexchange.com/questions/333975/is-using-tar...](https://unix.stackexchange.com/questions/333975/is-using-tar-while-source-directory-is-being-updated-safe)

------
Annatar
Finding a bug in GNU tar is not hard... just ask Jörg Schilling.

One of the first things I do when I download source code is decompress it and
then de-archive it with GNU tar, then promptly re-archive it with a real, AT&T
UNIX System V 4.0 tar, because it's a known fact among us old UNIX folks that
GNU tar is buggy as hell and only GNU tar can correctly read GNU tar (that
wasn't the point of a tape archiver, but GNU people didn't get that memo).

~~~
giancarlostoro
Kind of curious as to where you get that version of tar from.

~~~
Annatar
Either from Heirloom tools, or by running any freeware, open source operating
system which is built from the illumos source code, like SmartOS. illumos code
base is the most direct, canonical descendant of AT&T UNIX System V 4.0. If
you run anything based on illumos not only are you running a true UNIX, but a
reference implementation of it.

Additionally, any true System V UNIX like IRIX and HP-UX will have it.

------
oculusthrift
that’s cool! thanks for the write up. i learned a bit

------
noja
I think he should have reported this rather than writing it up as a fairly
detailed blog entry.

~~~
DavidNielsen
I rather enjoyed reading the adventure of hitting a problem in a large system,
and then working towards narrowing down the specific bug or set of bugs.

Such posts have in the past been super helpful for me personally (and to
others I imagine), in going from plain weeping that stuff just randomly
breaks, to learning to enjoy examining the possible causes and understanding
how to explain the problem concisely and with enough detail to make it useful
to a developer.

He’ll be able to reuse much of his blog post in a great bug report with easy
steps to reproduce the problem, expected and actual outcomes of following
those steps, and a scenario where it will happen in real deployments, as well
as providing a workaround.

