
Are tarballs obsolete? - acheron
http://esr.ibiblio.org/?p=6875
======
tritium
No! Things like package managers and git, however pervasive, are _agents_ and
not the testable, serialized representation of data.

Why wouldn't you make the same file set available as a single download via
plain, old HTTP(S)?

What's so hard about that? Why place something on a shelf, behind an agent
that demands a learning curve, when it could also be available to a browser?

You CAN'T hash a tree of files quickly by hand; you need a program that walks
the tree recursively to test its integrity. Take tarballs away from people,
and you're putting things on a shelf, out of reach, taking away a level of
safety and reliability from a certain cross-section of your audience.
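
For instance, a rough sketch (with a hypothetical foo-1.0.tar.gz and its
unpacked tree):

    # a single release artifact is one command to verify:
    sha256sum foo-1.0.tar.gz

    # a tree of files needs a recursive walk, with the order normalized:
    find foo-1.0/ -type f -print0 | sort -z | xargs -0 sha256sum | sha256sum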

Not only that, but EVERY platform has a web browser, even if it's curl or
wget. With a single file, available by HTTP, any platform can get to it, so if
your machine dies, and you have to call a friend for help, and they don't use
the same OS, they won't have to install a special tool, or hope an intervening
third-party web service provides coverage.

If we're simply talking file formats, and possibly exchanging the tarball for
zips, ISOs or what-have-you, as alternative serial representations of the same
data, that's not even a discussion.

If you're worried about practical utility, load, bandwidth and/or availability
(or even, perhaps, culling the dull, dumb, toxic and obnoxious from the herd,
terrible as that sounds) then changing the agent permitted to access the
resource might be one solution, but not necessarily the right solution.

~~~
simula67
> Why place something on a shelf, behind an agent that demands a learning
> curve, when it could also be available to a browser?

wget <filename>

git clone <repo url>

What is the difference in learning curve?

> With a single file, available by HTTP, any platform can get to it, so if
> your machine dies, and you have to call a friend for help, and they don't
> use the same OS, they won't have to install a special tool, or hope an
> intervening third-party web service provides coverage.

People have been making an argument for statically compiling basic tools for a
long time now (I support it). I think git can be considered a pretty basic
tool now, given its popularity.

~~~
madawan
> git can be considered a pretty basic tool

so what about CVS, svn, Mercurial and fossil? Git is hip right now, but
including it in an initrd seems excessive. Furthermore, git downloads the
entire repo with all the history/commits, and that isn't always what you want,
can handle, or want to pay for.

~~~
klibertp
> so what about CVS, svn, Mercurial and fossil

Why, these are pretty basic too. Especially if what you want is just to get
the code. Having a bit of experience with any VCS will let you skim the
relevant manpage and use another VCS in minutes tops.

> including it in an initrd seems excessive.

True...

> Furthermore git downloads the entire repo

Not necessarily? I think you can download just the source files, without any
git metadata, essentially turning git into SVN. I'm not 100% sure, but I'd be
really surprised if that were not the case.

---

I started my VCS adventure with RCS, then (quickly) moved to CVS, then (even
more quickly) to SVN. As soon as DVCSes started to appear I switched to Bazaar
(bzr), then git. These are the VCSes I used for my own projects; I probably
used a couple more in situations where I needed the latest code of some
project that happened to use Mercurial, Darcs, etc.

In general, I think it's fair to say that Version Control, as a whole, is a
basic tool.

~~~
krisdol
>Not necessarily? I think you can download just the source files, without any
git metadata, essentially turning git into SVN. I'm not 100% sure, but I'd be
really surprised if that were not the case.

Partially possible. You're always fetching diffs, not files, with git. You can
get close to what you said with `git clone --depth 1`, which tells git to only
get history for the latest revision, but you still get some git metadata about
available branches. Depending on the version of git you're using, you may also
have to pass --single-branch in order to only fetch one branch's revision.
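
A minimal sketch, with a hypothetical repository URL (the second command only
works where the server enables the remote archive service):

    # shallow clone: one branch, one revision of history, plus some metadata
    git clone --depth 1 --single-branch https://example.com/project.git

    # just the files at HEAD, no .git directory at all
    git archive --remote=git@example.com:project.git HEAD | tar -x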

------
nuxi7
The distributed form of a project (especially one based on GNU autotools) is
often not merely a tarball of the upstream repository. Often the author has
pre-run some stages of the build that she feels would impose unnecessary or
esoteric dependencies. Typically this involves pre-generating Makefile.in from
Makefile.am and configure from configure.ac. Another common one is for the
maintainer to pre-generate the documentation. The user of the distributed
tarball is fully capable of re-running these steps, but they have become
optional.
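
A rough sketch of the difference, assuming a stock autotools project:

    # from a release tarball: configure and Makefile.in are already generated
    ./configure && make

    # from a bare repository checkout: regenerate them first
    autoreconf -i && ./configure && make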

This is a fact often forgotten when projects switch to GitHub and just start
using the automated release tarballs GitHub will make from the repo.

~~~
Jasper_
This becomes more complex when distributions want to patch configure.ac or
Makefile.am -- they need to patch the included Makefile.in files as well.

Trying to balance these two forces is the reasoning behind the
AM_MAINTAINER_MODE flag.

These days, I prefer to simply use git archive for a tarball and make users
run autogen.sh themselves.
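
A sketch of that approach, with a hypothetical project name and tag:

    git archive --format=tar.gz --prefix=project-1.0/ -o project-1.0.tar.gz v1.0

The result is a clean tarball of exactly what is in the tag, with no git
metadata; users then run autogen.sh themselves before configure.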

See [https://blogs.gnome.org/desrt/2011/09/08/am_maintainer_mode-is-not-cool/](https://blogs.gnome.org/desrt/2011/09/08/am_maintainer_mode-is-not-cool/)
and the associated comments.

~~~
voltagex_
To the newcomer, autoconf looks really scary. Are there any good tutorials,
and are there clear advantages to using autoconf over CMake, waf, scons, ninja
or other hipster-language build systems?

I'm going to start looking at CMake to build HTML and JavaScript-based
projects soon.

~~~
ploxiln
I consider myself rather adept with Makefiles for GNU make, and I also find
autoconf scary (I'll be checking out the "autotools mythbuster" linked above,
though).

I'd guess that autotools and CMake have a lot of complexity for
C/C++/library-related stuff that is inapplicable to HTML/JS projects, and I'd
suggest almost any of the others: waf, scons, plain make ...

ninja, as you've probably heard, is designed to be the backend to some other
frontend which generates all the explicit dependencies and commands. That
could be cmake, it could be a custom script of yours. Then ninja identifies
changed files and runs an "incremental" build as fast (parallel and minimal)
as possible.
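
For example, a sketch using CMake as the generator (any frontend that writes a
build.ninja would do):

    mkdir build && cd build
    cmake -G Ninja ..   # CMake emits build.ninja with the dependencies spelled out
    ninja               # ninja rebuilds only what changed, in parallel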

Just btw, I wrote a from-first-principles sort of tutorial on GNU make:
[http://www.ploxiln.net/make.html](http://www.ploxiln.net/make.html)

------
dalke
My comment from yesterday's post on this topic:

If you're willing to hand-wave away certain things as being outside of the
"center of the open-source software-release ritual", then you can get a 'yes'.

But in addition to source-distributions mentioned in the essay, which are "too
tiny a minority", two other issues are:

1) If you define "git" as "pervasive" then you can declare that git is the
solution. In my field, some of the tools are still only accessible via
tarball/zip. Two examples are the IUPAC InChI distribution, from
[http://www.iupac.org/home/publications/e-resources/inchi/dow...](http://www.iupac.org/home/publications/e-resources/inchi/download.html)
, and VMD from
[http://www.ks.uiuc.edu/Development/Download/download.cgi?Pac...](http://www.ks.uiuc.edu/Development/Download/download.cgi?PackageName=VMD)
. PyMol, a more popular free software visualization package, is also not
developed using git.

2) If you define "open-source software-release" to exclude fee-based free
software distribution (perhaps it is also "too tiny a minority"?), then it
preferentially excludes certain flavors of 'open-source software'. For
example, I write free software, and distribute it for money. A tarball is an
easy way to identify the delivery of the contractually obligated product. I
can also send it via email, vs. setting up a private git server and getting
accounts set up for my customers.

How much free software development is part of the hidden world of non-publicly
accessible development? I don't think anyone really knows.

So if your free software baseline assumes "pervasive git", a public
development repo, and no cost to access the code, then sure, the answer is a
"yes". Until then, it's a "no."

And my examples show that baseline is only a subset of free software
development, with no clear idea of how large it is.

~~~
spb
Another place where tarball/zip is significantly more viable than Git: the
browser (and thus Chromebooks). I know there are things like Tim Caswell's
JSGit, but they're woefully unfinished and assembly-required compared to an
interface like JSZip.

------
liw
I am strongly in favour of tarballs for releases. Git tags can be moved, for
example, so that's an extra thing to be careful about, if one is worried about
reproducibility.

[http://yakking.branchable.com/posts/releases/](http://yakking.branchable.com/posts/releases/)

That's my write-up on the topic of making releases, from about a month ago.

~~~
brianzelip
Thanks for posting your blog post link. Having only published static websites,
I've been curious about the release process and your notes provide some
insight, especially the parts about signatures.

------
vezzy-fnord
Tarballs versus DVCS appears to be a false dilemma. They're independent
mechanisms. The more proper question to ask here is "Are tarballs obsolete for
distributing source code?".

~~~
pjscott
That's what the article is asking. Perhaps a misleading title.

------
discreditable
Correct me if I'm wrong but doesn't a git clone involve downloading the
project's entire git history as well as $CURRENT_VERSION? That seems a little
excessive.

~~~
dajohnson89
And what if you find a bug and want to go back to a previous version?

What is so cumbersome about downloading a project's history?

~~~
ubernostrum
Well, some projects have been around for a while. "All the history" can be a
LOT of history.

~~~
eru
Actually, even for the Linux kernel, it's only a few hundred megabytes.

~~~
Freaky
My bare Linux repository comes to 1.2GB. A git gc run got it down to 1.1GB.

~~~
eru
Oh, true. I just checked. I think I mixed up two numbers.

------
tjbiddle
Absolutely not; there are a myriad of reasons to prefer tarballs over Git:

- Offline use
- Permission restrictions (Internal git networks)
- True-persistent versioning (Can always delete and update a tag), no external tools needed to download
- Can restrict extra garbage that may not need to be shipped
- Etc.

~~~
scrollaway
> Offline use

Tarballs are just as offline as git repositories...

> Permission restrictions (Internal git networks)

What?

> True-persistent versioning (Can always delete and update a tag)

Why is that of concern regarding "tarballs vs git"? If anything it's a win for
git.

> Can restrict extra garbage that may not need to be shipped

Only valid point you presented, and it's better discussed elsewhere in this
thread.

~~~
gizmo686
>Tarballs are just as offline as git repositories...

Tarballs are easier to sneakernet onto an offline machine. To do this with git
repositories, you would either need to deal with transferring a directory
structure or, more likely, pack the repository into some sort of tarball.

~~~
voltagex_
`git bundle create` will do what you want.
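
A minimal sketch, with hypothetical file and repo names:

    # on a connected machine, pack the whole repository into a single file
    git bundle create project.bundle --all

    # carry project.bundle over, then clone from it offline
    git clone project.bundle project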

------
joslin01
The problem tarballs solve is bundling files together. That use-case will
probably be with us for a while even if it has fallen out of favor for source
code.

~~~
raimue
The article is discussing whether it is worthwhile to provide tarballs for
software releases, not tarballs in general. The title might be misleading.

------
RexRollman
I like tarballs. Speaking as a non-programmer who compiles things like Vim and
Scrypt, it is nice to be able to download a release without having to install
a VCS first.

~~~
err4nt
Tarballs are also easier to attach to an email or to share with other
computers :)

------
deathanatos
The article seems to mean "tarballs are obsolete for source distribution", not
obsolete on the whole.

> Here’s an advantage of the clone/pull distribution system; every clone is
> implicitly validated by its SHA1 hash chain.

Given that TLS does not consider SHA1 secure[1] anymore… I'm not sure that's
an assumption I'd be making.

[1] in the sense that it's being rapidly deprecated; even if it doesn't
trigger validation failures today, the fact that it will tomorrow is telling.

~~~
JoachimSchipper
SHA-1 is clearly imperfect, but not _that_ big a problem here. The writing is
on the wall for SHA-1's collision resistance. A break in collision resistance
would allow a malicious software maintainer to distribute different tarballs
to people of interest than to the internet at large, but would _not_ allow a
third party to replace an honestly-generated tarball with some other tarball
("second pre-image").

Since you almost certainly need to trust the maintainer not to insert
backdoors anyway (a well-hidden backdoor will not be detected by the normal
packaging process), trusting the maintainer not to maliciously distribute
different software to a small number of people of interest isn't _that_
unreasonable.

------
dvh
Tarballs can contain empty directories

------
Johnny_Brahms
I love tarballs. They got a whole lot more enjoyable with all the parallel
compression utilities out there. My favourite is for a file format that has
fallen out of favour (bzip2) [1]. It is fast enough to make I/O the bottleneck
on my HDD (although not on my SSD).

1:[http://compression.ca/pbzip2/](http://compression.ca/pbzip2/)
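
A quick sketch of swapping it in for tar's built-in bzip2 (assuming pbzip2 is
installed):

    tar --use-compress-program=pbzip2 -cf source.tar.bz2 project/
    tar --use-compress-program=pbzip2 -xf source.tar.bz2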

------
fsiefken
Tarballs, which are also usually compressed (gz, bz2, 7z, xz), have the
benefit of being more efficient to download over bandwidth- or
size-constrained links. Also, being one file, it's only one request, and it
works over zmodem or modern derivatives.

------
jokoon
Seeing how any filesystem will slow down when dealing with many individual
files, I'd say no.

------
ZeWaren
Consider using core OS tools (wget/fetch, tar, etc.) versus the dependency
tree required to run git on your system. Do you need curl? Do you need perl?
Do you need an XML parser and a regex runtime engine?

------
olgeni
Step 1: gems, npm, pip, etc. Step 2: "containers". Step 3: this.

------
tw04
No. Source: Veritas NetBackup.

------
aayala
no

------
tuna
Docker brought back the frenzy of delivering software in a tarball, just like
Slackware in the past. Long live Patrick Volkerding and `tar -zxvf ... -C /`!

~~~
TheDong
Even though Docker's disk and transport formats are technically tar, I think
this is still a wrong statement.

Tarballs are ubiquitous already. Docker actively hides that it uses tarballs
(it abstracts away all interactions with them, to the point that tar really is
an implementation detail, excluding docker export/import).

