Argh-P-M – Dissecting the RPM file format (2016) (bethselamin.de)
48 points by icebraining 7 months ago | 55 comments

I've authored many RPM packages; it's dead simple. Have a spec file, tar it, and run rpmbuild on it.

I don't really know why this guy was freaking out.

I also like the fact that transitive dependency and repo handling is separated from the package handler itself. We have rpm to handle a single package, and we have dnf (yum) to handle trees of packages.
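
To make that split concrete, here is a minimal Python sketch of the "tree of packages" half: given a table of direct requirements, it works out an install order in which every dependency comes before the package that needs it. The package names and the `install_order` helper are made up for illustration, and real resolvers deal with versions, conflicts and alternatives, so treat this as a toy model only.

```python
def install_order(package, requires):
    """Return packages in an order where dependencies come first.

    `requires` maps a package name to the names it directly depends on.
    This is a plain depth-first walk, not what dnf actually does.
    """
    order = []
    visiting = set()  # packages on the current path, for cycle detection
    done = set()      # packages already placed in `order`

    def visit(name):
        if name in done:
            return
        if name in visiting:
            raise ValueError("dependency cycle involving " + name)
        visiting.add(name)
        for dep in requires.get(name, []):
            visit(dep)
        visiting.discard(name)
        done.add(name)
        order.append(name)

    visit(package)
    return order


# Hypothetical package set, purely for illustration.
requires = {
    "app": ["libfoo", "libbar"],
    "libfoo": ["libc"],
    "libbar": ["libc"],
    "libc": [],
}
print(install_order("app", requires))
```

The single-package tool only ever needs to be handed one entry of that list at a time, which is the separation being praised here.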

> I don't really know why this guy was freaking out.

That's clearly explained in the article. This is not about authoring from a specfile, so your comment is off. He wants to serialise a package into RPM without the traditional toolchain; it turns out to be much more difficult than the other serialisers he already has because of the rpm format's bad design decisions and accretions.
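
For a sense of what serialising by hand involves, here is a rough Python sketch of just the first 96 bytes of an .rpm file, the so-called lead, as I understand the layout: a 4-byte magic, major and minor version bytes, a package type, an architecture number, a 66-byte NUL-padded name, an OS number, a signature type and 16 reserved bytes, all big-endian. The helper names and the default field values are my own assumptions for illustration; this is not taken from the article or from rpm's source.

```python
import struct

# Big-endian: magic, major, minor, type, archnum, name[66],
# osnum, signature_type, reserved[16]  ->  96 bytes in total.
LEAD_FORMAT = ">4sBBhh66shh16s"
LEAD_MAGIC = b"\xed\xab\xee\xdb"


def build_lead(name, major=3, minor=0, pkg_type=0,
               archnum=1, osnum=1, signature_type=5):
    """Pack a lead. The default values are illustrative assumptions."""
    # Keep at most 65 bytes so the name field always ends in a NUL;
    # struct pads short byte strings with NULs automatically.
    raw_name = name.encode("ascii")[:65]
    return struct.pack(LEAD_FORMAT, LEAD_MAGIC, major, minor, pkg_type,
                       archnum, raw_name, osnum, signature_type, b"")


def parse_lead(blob):
    """Unpack the first 96 bytes of a file into a dict."""
    size = struct.calcsize(LEAD_FORMAT)
    fields = struct.unpack(LEAD_FORMAT, blob[:size])
    magic, major, minor, pkg_type, archnum, raw_name, osnum, sigtype, _ = fields
    if magic != LEAD_MAGIC:
        raise ValueError("not an RPM lead")
    return {
        "version": (major, minor),
        "type": pkg_type,  # 0 = binary package, 1 = source package
        "archnum": archnum,
        "name": raw_name.split(b"\x00", 1)[0].decode("ascii"),
        "osnum": osnum,
        "signature_type": sigtype,
    }


lead = build_lead("hello-1.0-1")
print(len(lead), parse_lead(lead)["name"])
```

The lead is the easy part; the signature and header sections that follow it are where most of the article's complaints are aimed.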

I don't think they're bad decisions; I think he has no idea why it's that way and so is assuming that it's terrible. The RPM format was designed when computers were smaller and slower, and is extremely flexible while forcing you to declare enough that it can validate the file contents as it goes. It's over-engineered, and there are certainly things I would change, but it's not just random stupid decisions. If you can't think of any good reason why it would be written the way it is, the main options are that you're missing something or the original spec authors were stupid. Of course, sometimes they were stupid, but maybe don't immediately assume that after your very first investigation into the format.

The Deb file format was designed even earlier, yet it doesn't contain a matryoshka of metadata.

Debian under-engineered, Redhat over-engineered. In perfect hindsight, Debian probably made better tradeoffs, but that's only with decades of hindsight.

I don't see how Debian under-engineered. Does it not fulfill its needs?

Plus, RH could have decided to make a proper v2 rather than build this lasagna of a format.

I suppose "under" is perhaps a stronger term than I really want, since it did in fact work out. I agree that Debian, indeed, fulfilled exactly what it needed, but only what it definitely needed right then; a minimum viable product that turned out to be sufficient. RH built for everything that they could imagine ever needing, and left options to extend for anything they didn't think of in advance. Perhaps "just enough"-engineered vs over-engineered?

I disagree, RH did not in fact build it for everything; if they had, they would not need headers-inside-headers or fixed "nonsense" values. What they built is an over-engineered yet incomplete format, but also an inflexible one, so they then had to pile hacks upon it.
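
For readers who have not looked at what a "header" is in this context, here is a small Python sketch of the header structure as I understand it: a 4-byte magic (8e ad e8 01), 4 reserved bytes, a big-endian entry count and data-store size, then one 16-byte index entry (tag, type, offset, count) per value, followed by the data store itself. The sketch only handles single NUL-terminated strings (type 6), and `build_header` / `parse_header` are hypothetical helper names, not rpmlib functions.

```python
import struct

HEADER_MAGIC = b"\x8e\xad\xe8\x01"
RPM_STRING_TYPE = 6   # a single NUL-terminated string
RPMTAG_NAME = 1000
RPMTAG_VERSION = 1001


def build_header(entries):
    """Serialise a list of (tag, text) pairs as string-typed entries."""
    index = b""
    store = b""
    for tag, text in entries:
        offset = len(store)  # where this value starts in the data store
        store += text.encode("utf-8") + b"\x00"
        index += struct.pack(">IIII", tag, RPM_STRING_TYPE, offset, 1)
    intro = HEADER_MAGIC + b"\x00" * 4 + struct.pack(">II", len(entries), len(store))
    return intro + index + store


def parse_header(blob):
    """Return {tag: text} for every string-typed entry in the header."""
    if blob[:4] != HEADER_MAGIC:
        raise ValueError("bad header magic")
    nindex, hsize = struct.unpack(">II", blob[8:16])
    store_start = 16 + 16 * nindex
    store = blob[store_start:store_start + hsize]
    values = {}
    for i in range(nindex):
        entry = blob[16 + 16 * i:32 + 16 * i]
        tag, typ, offset, _count = struct.unpack(">IIII", entry)
        if typ != RPM_STRING_TYPE:
            continue  # other value types are out of scope for this sketch
        end = store.index(b"\x00", offset)
        values[tag] = store[offset:end].decode("utf-8")
    return values


blob = build_header([(RPMTAG_NAME, "hello"), (RPMTAG_VERSION, "1.0")])
print(parse_header(blob))
```

A real package stores two of these structures back to back (signature, then the main header) with many more value types, which is roughly where the nesting complaint comes from.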

> RPM format was designed when computers were smaller and slower

So was DEB. That's not a very good excuse.

That's funny, I've always thought that the various Linux distributions using different packaging formats was a huge waste of resources and wanted everyone to standardize on RPM. After reading this article, I still think it is a waste of resources, but I'd prefer that everyone standardize on .deb, not RPM!

Almost no one who uses RPMs needs to know the internal format, and if they do they should use rpmlib (or rpm2archive if they just want to unpack the files). Does anyone not use JPEG because they think the internals of JPEG binary format are a bit complicated?

The point of the exercise was to not use rpmlib. The comparison to JPEG is inappropriate because it has dozens of interoperable implementations.

Formats defined by a single implementation are sickly and much less viable (remember the lessons of http://libmng.com/ ?); I applaud the reverse engineering effort from the article.

> Does anyone not use JPEG because they think the internals of JPEG binary format are a bit complicated?

Yes! There are lots of examples like this.

• Plenty of small tools write BMP or Targa instead of JPEG or PNG, because these formats are dead simple to reinvent.

• JPEG as we know it is JFIF, which has clearly won over alternative JPEG data containers, such as more complicated TIFF (mode 6). BTW JPEG, for what it does, is brilliantly simple.

• MNG never took off, but dumber APNG did.

Formats that are intended for interoperability need multiple implementations, and it does happen that authors say "no, fuck that" and choose a different format instead.

To plug it here: If you want to build an RPM via Maven (or just want to build one with Java using the underlying lib)


True, RPM is pretty messed up as far as complexity.

But, let's consider that I have never had an RPM-using system have its package database corrupted beyond repair (you can rebuild it with rpmdb --rebuilddb), and unlike dpkg/deb, rpm fully supports downgrading packages, which is something I have missed on Debian multiple times.

RPM is too complex, but it's also a good format from a reliability standpoint.

> I have never had an RPM-using system have its package database corrupted beyond repair

You've had that on Debian?

> rpm fully supports downgrading packages, which is something I have missed on Debian multiple times

Huh? Downgrades are fully supported by dpkg and apt*.

> Huh? Downgrades are fully supported by dpkg and apt.

No, they aren't. You need to remove the newer version prior to installing the old one. RPM based systems allow you to downgrade to an entirely different distro release with one command.

> you've had that on Debian?

More than once, sadly.

I've had a Debian system break because I couldn't fix the package database, yes. I might well have just missed a trick, but I did google around for fixes and got nowhere.

How are these qualities descriptive of the file format and not of the particular implementation of the package manager used by the distribution in question?

I am talking about the official implementation of the RPM package manager from Red Hat.

The file format and the quality of the package manager implementation seem to be rather orthogonal, so I don't understand how your top comment is relevant to the article.

Yes, the RPM format is crappy. A normal person would have seen that quickly (or known it already), gotten their business done, and moved on. What's the point of dissecting every little piece to criticize, or of criticizing even the oversights or practices normal for their time as much as the things that are truly insane? It's just rage-bait.

To learn from past mistakes, probably. If you want to create something better, you first need to determine what’s bad.

You also need to understand how it got that way. In this case, a lot of the things the author takes issue with are a combination of eternal backwards compatibility, really wide portability, and having originally been devised when memory was tiny and processors were slow. It's one of those things where if you rebuilt it yourself, by the time you covered all features of the original, yours would be just as ugly.

I agree. For instance, one of the things the author complains about is the rpmlib(...) dependencies. To me, they are clearly a way to make older versions of rpm (which do not know anything about these features) fail cleanly with a "missing dependency" error.
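
A toy Python model of that mechanism, under the assumption that feature flags are checked like any other requirement: the package lists rpmlib(...) capabilities among its requires, and the rpm binary doing the install contributes the set of capabilities it implements. The feature sets assigned to the "old" and "new" rpm below are invented for the example, version constraints on the capabilities are ignored, and `unmet_requirements` is a hypothetical helper.

```python
# Capabilities a hypothetical older and newer rpm binary would advertise.
OLD_RPM_FEATURES = {
    "rpmlib(CompressedFileNames)",
    "rpmlib(PayloadFilesHavePrefix)",
}
NEW_RPM_FEATURES = OLD_RPM_FEATURES | {"rpmlib(PayloadIsXz)"}

# What a package built with the newer feature might declare.
PACKAGE_REQUIRES = [
    "libc.so.6",
    "rpmlib(CompressedFileNames)",
    "rpmlib(PayloadIsXz)",
]


def unmet_requirements(requires, system_provides, rpm_features):
    """List requirements satisfied by neither the system nor rpm itself."""
    available = set(system_provides) | set(rpm_features)
    return [req for req in requires if req not in available]


system = {"libc.so.6"}
print(unmet_requirements(PACKAGE_REQUIRES, system, OLD_RPM_FEATURES))
print(unmet_requirements(PACKAGE_REQUIRES, system, NEW_RPM_FEATURES))
```

The old binary never has to understand the new payload format; it only sees a requirement nobody provides and refuses to install, which is the clean failure described above.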

For the benefit of the next person who has to figure this out, with bonus angry commentary?

Like @nailer says, this article is under-researched and starting from a prejudiced viewpoint. However I did love one comment.

> The <rpm/rpmtag.h> header has an enum with all acceptable values, but please consider the environment before trying to print it.

There was a chance to unify Linux packaging in the last year around creating universal packages. What ended up happening instead was yet another war between snap and flatpak. Everyone flaunts some superiority or another - https://github.com/AppImage/AppImageKit/wiki/Similar-project...

However, it would be incredible to have a single format.

Those formats aren't comparable to RPM/deb. In any case even if we did unify RPM and dpkg, it would make no difference because distros would still have their own standards for naming, dependencies, etc. SLES and RHEL both use RPMs but you cannot copy RPMs from one to the other.

If you want a universal format, you have a compressed source tar-ball. Anything else will run into the whims (and on the ground circumstances) of maintainers and admins.

Or cpio. IIRC tar had its own set of issues once you consider old Linux distros or non-Linux systems in general.

Fun story: I once had to exchange data with a Solaris box, and discovered that its tar had no "z" flag. Ended up just exchanging uncompressed tar files because I didn't want to figure out .tar.Z files.

.Z is compress/uncompress, no?

Think so. And it was probably doable, but I realized that my network connection was fast enough that the transfer would be done before I could check.


Dude, it's been insane for decades

And still some of the biggest name distros use it.

There are times I wonder if IT has some innate attraction to masochism...

The only reason RPM is still in use is Red Hat, the company, which did well by providing a much-needed service for actual businesses.

There is no competition between package managers, there is only competition between distributions.

How about SUSE and Mageia?

This person tries inspecting the format with a hex debugger, implying he hasn't done any research. Run rpm2cpio, then extract the RPM file: it's a binary and a text .spec file. The spec file format is divided into sections to build, make, etc. the app, and while ancient (like dpkg), it's not hard to learn.

From what I can tell, spec files are just for source packages, whereas the author was analyzing (and trying to build) a binary package directly.

That's even worse. Why would you analyse a binary RPM to inspect it, and then present that as a normal way to analyse or build RPMs?

> the author was analyzing (and trying to build) a binary package directly.

To analyse, there's two approaches that have been well documented for around 20 years now:

- open the .src.rpm for a package

- Run the rpm binary over the .rpm, which can dump spec file contents too

To build:

- Make a spec file

Again, there's lots to dislike about RPM (my main issue is how the sections are macros and so are the autoconf macros, which confuses a lot of new people), but this article is incredibly under-researched.

Keep in mind that the author tried to build an independent implementation, so your complaint is like "Why are Gecko developers wasting time on reverse-engineering websites, instead of installing Chrome?"

For this purpose it's fitting to do a deep analysis of files actually produced, rather than trust the documentation to be entirely complete and sufficient. Such low-level dump is also needed to debug interoperability issues when files produced by the new implementation are rejected by existing tools.

I'm developing cargo-deb, which builds Debian packages without need for spec files. It doesn't use Perl, and can even build Debian packages from macOS. I had to learn and reproduce the structure of the deb files, which was relatively easy. I would need to go through exactly the same horror as in the article if I wanted to add RPM support the same NIH way (so RPM tooling may be good and easy, but RPM as a format is a failure.)
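
To illustrate why the deb side is comparatively approachable: the outer container of a .deb is a plain ar archive, i.e. the 8-byte magic `!<arch>\n` followed by members that each have a 60-byte ASCII header (name, mtime, uid, gid, mode, size, terminator) and data padded to an even length. Here is a minimal Python sketch that writes and lists such an archive. This is not how cargo-deb does it; `write_ar` and `list_ar` are hypothetical helpers, and the member contents are dummy bytes rather than real tarballs.

```python
import io

AR_MAGIC = b"!<arch>\n"


def write_ar(members):
    """Serialise (name, data) pairs into an ar archive."""
    out = io.BytesIO()
    out.write(AR_MAGIC)
    for name, data in members:
        # Fields: name[16] mtime[12] uid[6] gid[6] mode[8] size[10] + "`\n"
        header = "{:<16}{:<12}{:<6}{:<6}{:<8}{:<10}`\n".format(
            name, 0, 0, 0, "100644", len(data)).encode("ascii")
        if len(header) != 60:
            raise ValueError("member name too long for this sketch")
        out.write(header)
        out.write(data)
        if len(data) % 2:
            out.write(b"\n")  # member data is padded to an even length
    return out.getvalue()


def list_ar(blob):
    """Return (name, size) for each member of an ar archive."""
    if blob[:8] != AR_MAGIC:
        raise ValueError("not an ar archive")
    pos = 8
    members = []
    while pos + 60 <= len(blob):
        header = blob[pos:pos + 60]
        if header[58:60] != b"`\n":
            raise ValueError("bad member header")
        name = header[:16].decode("ascii").rstrip().rstrip("/")
        size = int(header[48:58].decode("ascii"))
        members.append((name, size))
        pos += 60 + size + (size % 2)
    return members


# Member names follow the usual .deb layout; the payloads are placeholders.
deb = write_ar([
    ("debian-binary", b"2.0\n"),
    ("control.tar.gz", b"abc"),
    ("data.tar.gz", b"xy"),
])
print(list_ar(deb))
```

Everything package-specific then lives in the two inner tarballs, which standard tar tooling can already read and write.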

If I was making something to build RPM packages automatically (like the author states), I'd still leverage the existing tools.

But even if I wanted to recreate RPM, which the author does (although he doesn't state this), I'd start with the source code.

> although he doesn't state this

Yes, he does, right in the first paragraphs.

> I'd start with the source code.

He started with the formal specification of the RPM format, which includes relevant pieces of the source; how is that not a good place to start?

No. He states:

> I ship configuration as system packages. Every distribution has their own tooling and process for building these packages, but I eventually grew tired of the ceremony involved in it, and wrote my own system package compiler.

Which to most engineers would mean making a tool that builds .rpm files, not one that reimplements the rpm build tools.

Read a bit further: "I stubbornly refused to add dependencies and use existing tooling (i.e., the rpm-build(1) command). I wanted to serialize the format directly from my own code, like I did for Pacman and Debian packages."

Yeah. That's his problem - it's not "to make matters worse" (as OP writes) it's the entire issue.

Who cares why you do it? Maybe you're just curious. How does that invalidate the analysis?

It makes it seem like RPM is an incredibly difficult packaging system, when in reality the author has deliberately chosen an obscure and unusual way of working with it.

ASN.1 and XML are easy to work with using the right existing tools, but that doesn't mean someone can't post their thoughts about them being frustrating formats.

OP made it clear what they were focusing on. If a passerby mistakes it for a complaint about using RPM as a sysadmin, it's their own problem.

OP said they were trying to make a tool to automatically generate RPM packages. They didn't do this efficiently or reasonably at all.

This is nothing to do with the packaging system or the ecosystem, the author decided to dive into the file format specifically. They even posted comments from the rpm source that echo the confusion and disbelief reflected in the article. Is the tone the author took the problem you have with it? Otherwise I don’t get why you are so against someone trying to learn how a file format works and documenting the experience.

I've built tools that automatically create RPM files in less time than the author took to write this article. I take umbrage at the bizarre implication that building a tool to make RPMs needs you to dive into the format with a hex editor, when it does not.

In fairness, the author was trying to write his own RPM encoder, so not using existing tools was... less of a bad idea than usual. That said, I agree that this was under-researched.
