
Why not tar?  Limitations of the tar file format. - gnosis
http://duplicity.nongnu.org/new_format.html#nottar
======
cperciva
The first two issues -- a lack of index and the fact that you can't seek
within a deflated tarball -- are true but are easily handled by smarter
compression. Tarsnap, for example, splits off archive headers and stores them
separately in order to speed up archive scanning.

The third issue -- lack of support for modern filesystem features -- is just
plain wrong. Sure, the tar in 7th edition UNIX didn't support these, but
modern tars support modern filesystem features.

The fourth issue -- general cruft -- is correct but irrelevant on modern tars
since the problems caused by the cruft are eliminated via pax extension
headers.

------
enneff
"Other archive formats like WinZip..."

The guy immediately loses credibility in my eyes for referring to the most
popular archive format as 'WinZip'. It's the ZIP file format, designed by Phil
Katz of PKWare Inc.

<http://en.wikipedia.org/wiki/ZIP_(file_format)>

To add insult to injury, the rest of his proposal is pretty similar to ZIP,
which also accomplishes the nice-to-have things he mentions at the end.

~~~
bl4k
What this article describes has already been solved with zip, gzip, 7z, bzip2,
and forks of tar.

The problem is that at the moment there is no open standard (there are IETF
proposals), since each of these is patent-, copyright-, or trademark-
encumbered.

------
nailer
It's very difficult to talk about 'tar' per se. Do you mean:

* GNU tar?

* BSD tar?

* Solaris tar?

Or even Schilly's 'star' program?

Each of these has different limits, advantages, and disadvantages.

------
rarrrrrr
This detail is wrong: the tar that ships on Mac OS X does indeed support
resource forks.

------
anon_d
> Because tar does not support encryption/compression on the inside of
> archives.

Yes it does? Just encrypt/compress all the files before tarring.
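For instance, a minimal sketch of that per-file approach with Python's stdlib
(file names here are just illustrative): gzip each member first, then tar the
already-compressed files, so each one stays independently extractable:

```python
import gzip
import io
import tarfile

# Compress each file individually, then tar the compressed members.
# Unlike a .tar.gz, each member can be pulled out and decompressed
# on its own without touching the rest of the archive.
files = {"a.txt": b"aaaa" * 100, "b.txt": b"bbbb" * 100}

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as t:
    for name, data in files.items():
        gz = gzip.compress(data)
        info = tarfile.TarInfo(name + ".gz")
        info.size = len(gz)
        t.addfile(info, io.BytesIO(gz))

buf.seek(0)
with tarfile.open(fileobj=buf) as t:
    member = t.extractfile("a.txt.gz")
    print(gzip.decompress(member.read())[:4])  # → b'aaaa'
```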

> Not indexed

The reason tar doesn't have an index is so that tarballs can be concatenated.
Also IIRC, you only have to jump through the headers for all files. Still O(n)
where n is the number of files, but you don't have to scan through all of the
data.
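That header-hopping scan is easy to sketch against raw ustar headers, assuming
an uncompressed, seekable archive (field offsets per the POSIX ustar layout:
name at 0, octal size at 124, data padded to 512-byte blocks):

```python
import io
import tarfile

def list_tar_names(f):
    """List member names by hopping header to header, seeking past data.

    Reads only the 512-byte headers, so it's O(number of files) in reads,
    not O(total data) -- exactly the scan described above.
    """
    names = []
    while True:
        header = f.read(512)
        if len(header) < 512 or header == b"\0" * 512:
            break  # end-of-archive marker (zero blocks)
        names.append(header[0:100].rstrip(b"\0").decode())
        size = int(header[124:136].rstrip(b"\0 "), 8)
        f.seek((size + 511) // 512 * 512, io.SEEK_CUR)  # skip padded data
    return names

# Build a small ustar archive in memory to demonstrate.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as t:
    for fname, data in [("a.txt", b"hello"), ("b.txt", b"x" * 2000)]:
        info = tarfile.TarInfo(fname)
        info.size = len(data)
        t.addfile(info, io.BytesIO(data))

buf.seek(0)
print(list_tar_names(buf))  # → ['a.txt', 'b.txt']
```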

~~~
gwern
> The reason tar doesn't have an index is so that tarballs can be
> concatenated.

I'm curious, what's the use-case for this? Offhand, the only use for that
ability I can think of is if I forgot a file in a tarball and have already
deleted the originals; I can tar the missing file and cat the two tarballs.

~~~
dagw
Don't think files, think tapes. Tar stands for Tape ARchive and was originally
used primarily for backing up to tapes. When working with tapes, where
deleting and re-writing archives is basically impossible, concatenating an
archive onto the end of an already backed-up archive to create a new, updated
archive is very useful.
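The byte-level trick still works today. A sketch with Python's tarfile: two
archives joined by plain concatenation, read back with ignore_zeros, which
plays the role of GNU tar's -i flag (skip the end-of-archive zero blocks
sitting between the joined archives):

```python
import io
import tarfile

def make_tar(members):
    """Build a small tarball in memory from (name, bytes) pairs."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as t:
        for name, data in members:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            t.addfile(info, io.BytesIO(data))
    return buf.getvalue()

# Two separately created tarballs, joined by plain byte concatenation.
joined = make_tar([("a.txt", b"one")]) + make_tar([("b.txt", b"two")])

# ignore_zeros=True makes the reader skip the end-of-archive padding
# between the two archives instead of stopping at the first one.
with tarfile.open(fileobj=io.BytesIO(joined), ignore_zeros=True) as t:
    print(t.getnames())  # → ['a.txt', 'b.txt']
```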

~~~
gwern
Ah, I see. Yes, that does sound very useful.

------
micheljansen
I think raising these concerns is fair in a world where nearly all Unix-
related source code and binaries are distributed in (g/bzipped) TAR format.
Unfortunately, the author does not really explain why this is, or what is
wrong with ZIP (i.e. why a new format is needed).

I guess that one of the reasons for TAR's dominance is the lack of a free
alternative? Apparently ZIP is not free enough (as I understand from
<http://en.wikipedia.org/wiki/ZIP_(file_format)#Standardization>).

TAR is old however, and if ZIP cannot take its place, coming up with something
new is not such a bad idea. I think Apple's DMG/UDIF file format deserves to
be mentioned as well: it addresses all the concerns mentioned (it is
essentially a mountable filesystem). I'm pretty sure there is a lot to be
learned from that.

------
farmer_ted
Xar addresses a lot of the issues presented in this article.

<http://code.google.com/p/xar/wiki/xarformat>
<http://code.google.com/p/xar/wiki/whyxar>

But not with the nice descriptive graphics found in the new archive format
proposal.

------
bootload
_"... Because tar does not support encryption/compression on the inside of
archives ..."_

That can be an advantage. Space isn't always what I want for backups - I want
the original data back, and compression gone wrong ( _tar -zxvf_ ) is just
another way to lose data.

~~~
dagw
The pkzip format allows you to "zip" data uncompressed if you are worried
about that. Then you can trivially unpack your files using nothing but seek
and read for those cases where you also accidentally misplace your last copy
of unzip.
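A sketch of that with Python's zipfile (ZIP_STORED is the "no compression"
mode): the stored bytes really can be recovered with nothing but seek and
read, using the local-header layout from the ZIP spec (30 fixed bytes, with
the name and extra-field lengths at offset 26):

```python
import io
import struct
import zipfile

buf = io.BytesIO()
# ZIP_STORED writes members uncompressed: the file's bytes sit verbatim
# in the archive right after its local file header.
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as z:
    z.writestr("notes.txt", "plain bytes, no deflate")

raw = buf.getvalue()
with zipfile.ZipFile(io.BytesIO(raw)) as z:
    info = z.getinfo("notes.txt")

# Recover the data with plain seek + read: skip the 30-byte local header
# plus the name and extra-field whose lengths are stored at offset 26.
n, m = struct.unpack("<HH", raw[info.header_offset + 26:info.header_offset + 30])
start = info.header_offset + 30 + n + m
data = raw[start:start + info.compress_size]
print(data)  # → b'plain bytes, no deflate'
```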

~~~
acqq
It also puts a file description (the local header) before each file's data,
plus a second copy of all of them at the end of the whole archive (the
central directory), allowing fast listing of the contents or locating a file
inside the archive, exactly as the article would want.

The only thing pkzip doesn't cover in the original format is unix/linux-
specific metadata, but maybe this was/can be added. I use info-zip when the
metadata don't matter but tar when they do (though even tar has its
limitations working with unix/linux metadata).

~~~
dagw
pkzip does reserve the possibility of an arbitrary-length extra field
attached to each file. According to the spec
(<http://www.pkware.com/documents/casestudies/APPNOTE.TXT>) this is for
"additional information...for special needs or for specific platforms".
Compatible zip tools are required to ignore any information in this field
that they don't understand, so you can basically write whatever you want
there (although the spec does offer a recommended format for the field). So
if you write a special ACL-preserving zip implementation, you can still
unpack the file with any other zip implementation that knows nothing of your
special version.
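A quick sketch with Python's zipfile, using a made-up extra-field tag (0x4d59,
"MY") purely for illustration - real implementations register their IDs with
PKWARE, like Info-ZIP's unix tags:

```python
import io
import struct
import zipfile

MY_TAG = 0x4d59  # hypothetical tag, just for this demo
payload = struct.pack("<I", 0o100644)  # pretend we stash a unix mode here

# Extra fields are a sequence of (uint16 tag, uint16 size, data) records.
info = zipfile.ZipInfo("file.txt")
info.extra = struct.pack("<HH", MY_TAG, len(payload)) + payload

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr(info, "contents")

# A reader that knows nothing about MY_TAG still gets the file contents;
# one that does can pull the stashed mode back out of the extra field.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as z:
    print(z.read("file.txt"))  # → b'contents'
    extra = z.getinfo("file.txt").extra
    tag, size = struct.unpack("<HH", extra[:4])
    assert tag == MY_TAG
    print(oct(struct.unpack("<I", extra[4:4 + size])[0]))  # → 0o100644
```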

------
nanairo
Does anyone know how duplicity compares to XAR? DAR? Or CFS or 7z?

------
hernan7
As long as they don't make me use cpio, I'm fine.

~~~
joey_bananas
We should go back to that embedded shell-script thingy that was common back
in the day. Its name escapes me.

~~~
gaius
<http://en.wikipedia.org/wiki/Shar>

~~~
rwmj
I was just imagining if we'd all settled on using shar files how big the
virus/worm problem would be on Linux today ...

... Then I thought that effectively that's what Windows does (using *.exe for
installers). No wonder they've got a problem.

~~~
_delirium
Well, the existing Linux package managers aren't really safer as far as the
archive formats go; for example, .debs can run arbitrary shell scripts during
installation. The main thing that seems to add to the safety is the social
practice of grabbing debs via trusted repositories using apt-
get/aptitude/synaptic, rather than manually downloading them from random sites
and doing _dpkg -i_. But if there _is_ malware, it's even worse, because at
least these shar installers are usually installed as non-root, while
installing a .deb needs root.

