
Argh-P-M – Dissecting the RPM file format (2016) - icebraining
https://blog.bethselamin.de/posts/argh-pm.html
======
neonihil
I've authored many RPM packages, it's dead simple. Have a spec file, tar it,
and run rpmbuild on it.

I don't really know why this guy was freaking out.

I also like the fact that transitive dependency and repo handling is separated
from the package handler itself. We have rpm to handle a single package, and
we have dnf (yum) to handle trees of packages.

~~~
bmn__
> I don't really know why this guy was freaking out.

That's clearly explained in the article. This is not about authoring from a
specfile, so your comment is off. He wants to serialise a package into RPM
without the traditional toolchain; it turns out to be much more difficult than
the other serialisers he already has because of the rpm format's bad design
decisions and accretions.

~~~
yjftsjthsd-h
I don't think they're bad decisions; I think he has no idea _why_ it's that
way and so is _assuming_ that it's terrible. The RPM format was designed when
computers were smaller and slower, and is extremely flexible while forcing you
to declare enough that it can validate the file contents as it goes. It's
over-engineered, and there are certainly things I would change, but it's not
just random stupid decisions. If you can't think of any good reason why it
would be written the way it is, the main options are that you're missing
something or the original spec authors were stupid. Of course, sometimes they
were stupid, but maybe don't immediately assume that after your very first
investigation into the format.

~~~
icebraining
The Deb file format was designed even earlier, yet it doesn't contain a
matryoshka of metadata.

~~~
yjftsjthsd-h
Debian under-engineered, Redhat over-engineered. In perfect hindsight, Debian
probably made better tradeoffs, but that's only with decades of hindsight.

~~~
icebraining
I don't see how Debian under-engineered. Does it not fulfill its needs?

Plus, RH could have decided to make a proper v2 rather than build this lasagna
of a format.

~~~
yjftsjthsd-h
I suppose "under" is perhaps a stronger term than I really want, since it did
in fact work out. I agree that Debian, indeed, fulfilled exactly what it
needed, but _only_ what it definitely needed right then; a minimum viable
product that turned out to be sufficient. RH built for everything that they
could imagine ever needing, and left options to extend for anything they
didn't think of in advance. Perhaps "just enough"-engineered vs over-
engineered?

~~~
icebraining
I disagree; RH did not in fact build it for everything. If they had, they
would not need headers-inside-headers or fixed "nonsense" values. What they
built is an over-engineered yet _incomplete_ format, and an inflexible one, so
they then had to pile hacks upon it.
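
To make the "fixed values" point concrete, here is a rough Python sketch of the
96-byte lead that opens an RPM file, based on my reading of the published
format description. The helper names (`pack_lead`, `parse_lead`) are made up
for this example, and the constants are the commonly seen ones, not something
checked against rpm's own parser.

```python
import struct

# 96-byte lead, big-endian: magic, major, minor, type, archnum,
# name (66 bytes), osnum, signature type, 16 reserved bytes.
LEAD = struct.Struct(">4sBBhh66shh16s")
LEAD_MAGIC = b"\xed\xab\xee\xdb"


def pack_lead(name: str, is_source: bool = False) -> bytes:
    """Serialise a lead using the usual fixed values (illustrative only)."""
    return LEAD.pack(
        LEAD_MAGIC,
        3, 0,                       # format version 3.0
        1 if is_source else 0,      # 0 = binary package, 1 = source package
        1,                          # archnum (assumed value for this sketch)
        name.encode("utf-8")[:65],  # struct NUL-pads the name to 66 bytes
        1,                          # osnum (1 = Linux)
        5,                          # signature type (header-style signature)
        b"",                        # reserved, NUL-padded to 16 bytes
    )


def parse_lead(data: bytes) -> dict:
    """Decode the lead and reject anything that lacks the RPM magic."""
    if len(data) < LEAD.size:
        raise ValueError("too short to contain an RPM lead")
    magic, major, minor, kind, arch, name, osnum, sigtype, _ = LEAD.unpack(
        data[:LEAD.size]
    )
    if magic != LEAD_MAGIC:
        raise ValueError("not an RPM file")
    return {
        "major": major,
        "minor": minor,
        "type": kind,
        "archnum": arch,
        "name": name.split(b"\0", 1)[0].decode("utf-8"),
        "osnum": osnum,
        "signature_type": sigtype,
    }
```

Everything after the lead (signature header, main header, payload) is where the
nesting the article complains about actually starts; this only covers the
outermost layer.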

------
renox
That's funny, I've always thought that the various Linux distributions using
different packaging formats was a huge waste of resources and wanted everyone
to standardize on RPM. After reading this article, I still think it is a
waste of resources, but I'd prefer that everyone standardize on .deb, not RPM!

------
rwmj
Almost no one who uses RPMs needs to know the internal format, and if they do
they should use rpmlib (or rpm2archive if they just want to unpack the files).
Does anyone not use JPEG because they think the internals of JPEG binary
format are a bit complicated?

~~~
bmn__
The point of the exercise was to not use rpmlib. The comparison to JPEG is
inappropriate because it has dozens of interoperable implementations.

Formats defined by a single implementation are sickly and much less viable
(remember the lessons of [http://libmng.com/](http://libmng.com/) ?); I
applaud the reverse engineering effort from the article.

------
CptMauli
To plug it here: If you want to build an RPM via Maven (or just want to build
one with Java using the underlying lib)

[https://github.com/ctron/rpm-builder](https://github.com/ctron/rpm-builder)

------
Subsentient
True, RPM is pretty messed up as far as complexity.

But, let's consider that I have never had an RPM-using system have its package
database corrupted beyond repair (you can rebuild it with rpmdb --rebuilddb),
and unlike dpkg/deb, rpm fully supports downgrading packages, which is
something I have missed on Debian multiple times.

RPM is too complex, but it's also a good format from a reliability standpoint.

~~~
RVuRnvbM2e
> I have never had an RPM-using system have its package database corrupted
> beyond repair

you've had that on Debian?

> rpm fully supports downgrading packages, which is something I have missed on
> Debian multiple times

Huh? Downgrades are fully supported by dpkg and apt.

~~~
Subsentient
> Huh? Downgrades are fully supported by dpkg and apt.

No, they aren't. You need to remove the newer version prior to installing the
old one. RPM based systems allow you to downgrade to an entirely different
distro release with one command.

> you've had that on Debian?

More than once, sadly.

------
notacoward
Yes, the RPM format is crappy. A normal person would have seen that quickly
(or known it already), gotten their business done, and moved on. What's the
point of dissecting every little piece to criticize, or of criticizing even
the oversights or practices normal for their time as much as the things that
are truly insane? It's just rage-bait.

~~~
fuzzy2
To learn from past mistakes, probably. If you want to create something better,
you first need to determine what’s bad.

~~~
yjftsjthsd-h
You also need to understand how it got that way. In this case, a lot of the
things the author takes issue with are a combination of eternal backwards
compatibility, really wide portability, and having originally been devised
when memory was tiny and processors were slow. It's one of those things where
if you rebuilt it yourself, by the time you covered all features of the
original, yours would be just as ugly.

~~~
cesarb
I agree. For instance, one of the things the author complains about is the
rpmlib(...) dependencies. To me, they are clearly a way to make older versions
of rpm (which do not know anything about these features) fail cleanly with a
"missing dependency" error.
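
A toy Python sketch of that idea, as I understand it: format features are
expressed as ordinary capabilities, so an old installer needs no special code
to reject a package it cannot read. The capability sets and the
`missing_dependencies` helper below are hypothetical, and version comparison is
left out entirely; this is not how rpm itself implements the check.

```python
def missing_dependencies(requires, provides):
    """Return the required capabilities that nothing provides, in order."""
    return [cap for cap in requires if cap not in provides]


# Hypothetical capability sets for an older and a newer installer.
OLD_RPM = {"rpmlib(CompressedFileNames)", "rpmlib(PayloadFilesHavePrefix)"}
NEW_RPM = OLD_RPM | {"rpmlib(PayloadIsXz)"}

# A package built with an xz-compressed payload declares that it needs it.
PACKAGE_REQUIRES = ["rpmlib(CompressedFileNames)", "rpmlib(PayloadIsXz)"]

for label, provides in (("old", OLD_RPM), ("new", NEW_RPM)):
    missing = missing_dependencies(PACKAGE_REQUIRES, provides)
    if missing:
        print(f"{label}: missing dependency: {', '.join(missing)}")
    else:
        print(f"{label}: ok")
```

The old installer fails through the same dependency check it already has,
without knowing anything about xz payloads.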

------
jiveturkey
Like @nailer says, this article is under-researched and starting from a
prejudiced viewpoint. However I did love one comment.

> The <rpm/rpmtag.h> header has an enum with all acceptable values, but please
> consider the environment before trying to print it.

------
sandGorgon
there was a chance to unify linux packaging in the last year around creating
universal packages - what ended up happening instead was yet another war
between snap and flatpak. Everyone flaunts some superiority or another -
[https://github.com/AppImage/AppImageKit/wiki/Similar-project...](https://github.com/AppImage/AppImageKit/wiki/Similar-projects#comparison)

however, it would be incredible to have a single format

~~~
digi_owl
If you want a universal format, you have a compressed source tar-ball.
Anything else will run into the whims (and on the ground circumstances) of
maintainers and admins.

~~~
bboozzoo
Or cpio. IIRC tar had its own set of issues once you consider old Linux
distros or non-Linux systems in general.

~~~
yjftsjthsd-h
Fun story: I once had to exchange data with a Solaris box, and discovered that
its tar had no "z" flag. Ended up just exchanging uncompressed tar files
because I didn't want to figure out .tar.Z files.

~~~
digi_owl
.Z is compress/uncompress, no?

~~~
yjftsjthsd-h
Think so. And it was probably doable, but I realized that my network
connection was fast enough that the transfer would be done before I could
check.

------
okket
(2016)

~~~
moonbug
Dude, it's been insane for decades

~~~
digi_owl
And still some of the biggest name distros use it.

There are times i wonder if IT has some innate attraction to masochism...

------
lmilcin
The only reason RPM is still in use is RedHat, the company, which did
well to provide a much-needed service for actual businesses.

There is no competition between package managers, there is only competition
between distributions.

~~~
rwmj
How about SUSE and Mageia?

------
nailer
This person tries inspecting the format with a hex debugger, implying he
hasn't done any research. Rpm2cpio then extracts the RPM file. It's a binary
and a text .spec file. The spec file format is divided into sections to build,
make, etc. the app and, while ancient (like dpkg), is not hard to learn.

~~~
icebraining
From what I can tell, spec files are just for source packages, whereas the
author was analyzing (and trying to build) a binary package directly.

~~~
nailer
That's even worse. Why would you analyse a binary RPM to inspect it, then
present that as a normal way to analyse or build RPMs?

> the author was analyzing (and trying to build) a binary package directly.

To analyse, there's two approaches that have been well documented for around
20 years now:

- open the .src.rpm for a package

- Run the rpm binary over the .rpm, which can dump spec file contents too

To build:

- Make a spec file

Again, there's lots to dislike about RPM (my main issue is how the sections
are macros, and so are the autoconf macros, which confused a lot of new
people), but this article is incredibly unresearched.

~~~
pornel
Keep in mind that the author tried to build an independent implementation, so
your complaint is like "Why are Gecko developers wasting time on reverse-
engineering websites, instead of installing Chrome?"

For this purpose it's fitting to do a deep analysis of files actually
produced, rather than trust the documentation to be entirely complete and
sufficient. Such low-level dump is also needed to debug interoperability
issues when files produced by the new implementation are rejected by existing
tools.

I'm developing cargo-deb, which builds Debian packages without need for spec
files. It doesn't use Perl, and can even build Debian packages from macOS. I
had to learn and reproduce the structure of the deb files, which was
relatively easy. I would need to go through exactly the same horror as in the
article if I wanted to add RPM support the same NIH way (so RPM tooling may be
good and easy, but RPM as a format is a failure.)
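
For a sense of what "relatively easy" means, here is a minimal Python sketch of
the outer deb container: an ar archive with three members (`debian-binary`,
`control.tar.gz`, `data.tar.gz`). This is not cargo-deb's code; the function
names are invented for the example, and the result is a skeleton that has not
been run through dpkg. A real package needs a complete control file plus proper
directory entries and ownership in the data tarball.

```python
import io
import tarfile

AR_MAGIC = b"!<arch>\n"


def ar_member(name: str, data: bytes, mtime: int = 0) -> bytes:
    """One ar entry: 60-byte ASCII header, data, newline pad to even length."""
    header = "{:<16}{:<12}{:<6}{:<6}{:<8}{:<10}`\n".format(
        name, mtime, 0, 0, "100644", len(data)
    ).encode("ascii")
    assert len(header) == 60
    return header + data + (b"\n" if len(data) % 2 else b"")


def tar_gz(files: dict) -> bytes:
    """Pack {path: bytes} into an in-memory gzip-compressed tarball."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path, content in files.items():
            info = tarfile.TarInfo(path)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


def build_deb(control: bytes, files: dict) -> bytes:
    """Assemble the three members in the order the deb format expects."""
    members = [
        ("debian-binary", b"2.0\n"),
        ("control.tar.gz", tar_gz({"./control": control})),
        ("data.tar.gz", tar_gz(files)),
    ]
    return AR_MAGIC + b"".join(ar_member(n, d) for n, d in members)


deb = build_deb(
    b"Package: demo\nVersion: 1.0\n",
    {"./usr/share/demo/hello.txt": b"hi\n"},
)
```

That is the whole container: no nested binary headers, just an archive of
archives with a version marker in front.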

~~~
nailer
If I was making something to build RPM packages automatically (like the author
states), I'd still leverage the existing tools.

But even if I wanted to _recreate RPM_ which the author does (although he
doesn't state this) I'd start with the source code.

~~~
icebraining
_although he doesn't state this_

Yes, he does, right in the first paragraphs.

_I'd start with the source code._

He started with the formal specification of the RPM format, which includes
relevant pieces of the source; how is that not a good place to start?

~~~
nailer
No. He states:

> I ship configuration as system packages. Every distribution has their own
> tooling and process for building these packages, but I eventually grew tired
> of the ceremony involved in it, and wrote my own system package compiler.

Which to most engineers would mean making a tool that builds .rpm files, not
one that reimplements the rpm build tools.

~~~
icebraining
Read a bit further: _I stubbornly refused to add dependencies and use existing
tooling (i.e., the rpm-build(1) command). I wanted to serialize the format
directly from my own code, like I did for Pacman and Debian packages._

~~~
nailer
Yeah. That's his problem - it's not "to make matters worse" (as OP writes)
it's the entire issue.

