
Extending the Linux Kernel with Built-In Kernel Headers - jrepinc
https://www.linuxjournal.com/content/extending-kernel-built-kernel-headers
======
eeeficus
It’s a simple and practical solution. It irks me for some reason but I can’t
think of something better.

~~~
mikepurvis
It's too bad the squashfs idea was dropped. Having to download and unpack the
archive somewhere seems like an extra step that should be unnecessary vs just
loading the module and having the tree appear in /sys for you to point your
compiler directly at.

Also a bummer that it looks like you have to be running the kernel to get
access to this. Perhaps the archive could be marked off in the binary somehow
so that there's a way to access it for non-running kernels (e.g., the DKMS use
case of wanting to build a module for the kernel you just installed, ahead of
booting it for the first time).

Overall binary size is a legit concern, but if this is going to be in an
optionally-loaded module anyway, it seems weird to make that call just based
on memory usage. I imagine most distro kernel builds will just disable this,
since they already ship the headers in separate packages.
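
Concretely, the extra step today looks something like this (a sketch; the module name and archive path are taken from the article, the destination directory and the function wrapper are mine):

```shell
# Unpack the in-kernel header archive somewhere a compiler can be
# pointed at (sketch; /sys path and module name per the article).
extract_kheaders() {
    archive="$1"   # normally /sys/kernel/kheaders.tar.xz
    dest="$2"      # e.g. /tmp/kheaders-$(uname -r)
    # The archive only appears in sysfs once the kheaders module is loaded.
    [ -e "$archive" ] || modprobe kheaders || return 1
    mkdir -p "$dest"
    tar -xJf "$archive" -C "$dest"
}

# Typical use on a running kernel:
# extract_kheaders /sys/kernel/kheaders.tar.xz "/tmp/kheaders-$(uname -r)"
```

With the squashfs idea, the `mkdir`/`tar` half would disappear entirely.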

~~~
tych0
squashfs was dropped because Greg doesn't like squashfs, which I can
understand somewhat (it doesn't really have an active maintainer, there's no
userspace library, just binaries, etc.).

However, at work we're doing a project that depends heavily on squashfs, so it
worries me that it was dropped from this, even though it would have been the
nicer option, just because people don't like that it lacks a maintainer.
Hopefully someone picks it up :)

~~~
hawski
Oh, gosh. Since when has it been without a maintainer? I would have thought
it's quite an important piece of most embedded Linux projects. Those companies
really should sponsor someone to take care of it.

My personal project also depends on squashfs. Are there any other options for
read-only compressed rootfs?

Squashfs-tools are a bit rough around the edges, that's for sure. I patched in
the ability to produce an image without root for my own use, but I think it
would be generally useful. The sort-list file format is funky: it's
"filename_string priority_integer", so it will not work for files with spaces
(or newlines, but that's far less common). It hasn't been a problem for me yet,
and I can always patch it further; one could say that people with spaces in
their filenames deserve it, though that's an entirely different issue.
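
To illustrate, a hypothetical sort file might look like this (filenames and priorities are made up; priorities are integers, higher sorts earlier, and any whitespace in a filename breaks the two-field parse):

```text
boot/vmlinuz        32000
boot/initrd.img     31000
var/cache/blob.bin  -32000
```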

~~~
usr1106
squashfs has horrible performance. All requests to the block layer are 512
bytes. Other filesystems like ext4 make much bigger requests and perform much
better in the end, despite squashfs's compression leading to lower overall
data volume.

Disclaimer: Measured 2 years ago on ARM32, emmc, with a 4.1(?) kernel.

~~~
plougher
Try using the SQUASHFS_4K_DEVBLK_SIZE config option next time

By default Squashfs sets the dev block size (sb_min_blocksize) to 1K or the
smallest block size supported by the block device (if larger). This, because
blocks are packed together and unaligned in Squashfs, should reduce latency.

This, however, gives poor performance on MTD NAND devices where the optimal
I/O size is 4K (even though the devices can support smaller block sizes).

Using a 4K device block size may also improve overall I/O performance for some
file access patterns (e.g. sequential accesses of files in filesystem order)
on all media.

Setting this option will force Squashfs to use a 4K device block size by
default.

If unsure, say N.
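
For reference, enabling it in a kernel .config fragment would be (assuming Squashfs itself is already configured in):

```text
CONFIG_SQUASHFS=y
CONFIG_SQUASHFS_4K_DEVBLK_SIZE=y
```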

~~~
usr1106
I'm quite sure I tried all the options available in the kernel I used back
then without achieving performance comparable to ext4. My project manager was
convinced that squashfs makes things faster (as I initially hoped too, because
the overall data volume is smaller), so I had a hard time convincing him that
we should just drop that "optimization" from the project plan. (He was one of
those who prefer checkmarks over technical merit.) I don't remember the 4K
option for sure, but if it existed, we tried and measured it. What size does
ext4 read from the block device? I'm reading this on holiday on my phone, so I
cannot easily fire up blktrace, but I would guess it's 128K or even 256K. So
still far from 4K.
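
For anyone who wants to reproduce this kind of measurement, the capture side is roughly (a sketch; the device name and duration are assumptions, and it needs root plus the blktrace package):

```shell
# Capture a window of block-layer traffic so request sizes can be
# inspected in the blkparse output (sketch; needs root and blktrace).
trace_requests() {
    dev="$1"         # e.g. /dev/mmcblk0
    secs="${2:-5}"   # how long to trace
    blktrace -d "$dev" -w "$secs" -o - | blkparse -i -
}

# trace_requests /dev/mmcblk0 10
```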

------
dTal
Why /sys and not /proc? After all, the kernel binary itself is under /proc.

~~~
cyphar
There is a very long thread on LKML discussing this. But I'm pretty sure the
primary reason is that procfs has been slowly accumulating more and more
random knobs, while sysfs actually has a structure that can be used reasonably
by userspace.

------
Aissen
Too bad there's no mention of the new BTF file format, which is supposed to
resolve this issue for the eBPF side:

[https://www.kernel.org/doc/html/latest/bpf/btf.html](https://www.kernel.org/doc/html/latest/bpf/btf.html)
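
Newer kernels can also expose BTF for the running image at /sys/kernel/btf/vmlinux, from which usable C type definitions can be regenerated without any headers (a sketch; assumes bpftool is installed and the kernel was built with CONFIG_DEBUG_INFO_BTF):

```shell
# Dump the running kernel's types as C from its BTF data
# (sketch; needs bpftool and CONFIG_DEBUG_INFO_BTF=y).
dump_vmlinux_h() {
    bpftool btf dump file /sys/kernel/btf/vmlinux format c > "$1"
}

# dump_vmlinux_h vmlinux.h
```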

------
riyakhanna1983
Why can't they build the eBPF bytecode offline using the correct kernel API
and ship the bytecode to the Android device?

~~~
derefr
If it’s anything like GPU shaders, it’s because the bytecode format itself is
Turing-complete and not-so-sandboxed, and the actual security/fault-tolerance
comes from a static analysis pass done during compilation that ensures the
_source code_ being compiled isn’t doing anything crazy.

If you were able to load bytecode directly, you’d skip this verification step.

(I’ve always thought VM runtimes should have signing keys, sign build
artifacts—e.g. bytecode—as they create them, and then have the VM’s module
loader check the signature. This way, you could still rely on build-time
static verification for your security, while also being able to share compiled
artifacts among any set of runtimes that trust one-another’s signing keys.)

~~~
stefan_
eBPF is specifically not Turing-complete, because you cannot prove properties
of code in a Turing-complete language in general.

~~~
sp332
eBPF is not Turing-complete, but static analysis can absolutely prove things
about individual programs in Turing-complete languages.

~~~
colechristensen
You can prove some code in Turing-complete languages.

You can construct non-Turing-complete languages in which all code can be
proven, and that is the point the parent is making.

~~~
comex
Forbidding loops in a language does make all code written in that language
provably terminating. But there’s not much _else_ it lets you prove for all
code, at least not if you want to get results before the heat death of the
universe. For example, you can’t determine what inputs to a program will yield
a given output: even classic BPF is probably expressive enough to implement a
cryptographic hash, and eBPF definitely is. You can’t enumerate all possible
paths through the program: the number of paths is exponential in the size of
the program.

On the other hand, forbidding loops does make some properties easier to prove
_for restricted classes of programs_. For instance, Linux’s BPF verifier
tracks, for each instruction, the minimum and maximum value of each register
at that point in the program. It uses that to determine whether array accesses
in the program are bounds checked, and complain if not: that way, it doesn’t
have to insert its own bounds check, which might be redundant. Doing the same
in the presence of loops would require a more expensive algorithm, so
forbidding them is a benefit. Yet... Linux’s verifier is sound: it will forbid
all programs that could possibly index out of bounds, at least barring bugs.
But it is not fully precise: it does not pass _all_ programs that have the
property of never indexing out of bounds for any input. For example, you could
have a program that takes a 256-bit input and indexes out of bounds only if
its SHA-256 hash is a specific value. That program is safe iff there happen to
be no 256-bit strings that hash to that value, something that you could
theoretically verify – but only by going through all 2^256 possible strings
and hashing each of them. Linux does not.

Nor would it be reasonable to. So why does that hypothetical matter? Because
the ability to prove arbitrary properties about _all_ programs, at the cost of
arbitrarily long analysis time, is sort of the main mathematical benefit of
non-Turing-completeness. But from a practical standpoint that's useless. And
if you don't need that – if you only care about the ability to prove things
about restricted classes of programs – well, you can achieve that even with
loops. After all, that's what a type system does, and there are plenty of
Turing-complete languages with type systems. As I said, disallowing loops
makes the job easier in some cases. But that's a matter of degree, not the
kind of hard theoretical barrier that "non-Turing-complete = analyzable" makes
it sound like. That makes it a less convincing rationale for disallowing them.

------
de_watcher
What's wrong with shipping headers and using DKMS for building like in
GNU/Linux?

------
kumbel
> embed the kernel headers within the kernel image itself and make it
> available through the sysfs virtual filesystem (usually mounted at /sys) as
> a compressed archive file (/sys/kernel/kheaders.tar.xz)

eeww

~~~
forgottenpass
File this one under: kludges to get around openly user-hostile userland.

~~~
MBCook
How so?

Seems like an elegant enough solution to ensure you always have the right
headers to build modules against the currently running kernel.

~~~
forgottenpass
Isn't kernel header availability a solved problem on any Linux system that
isn't busy pretending not to be built on Linux?

~~~
danudey
The nice thing about this is that you can keep kernel headers around in a much
smaller format on-disk, while still making them available separately as a
package if you want.

A few problems this solves:

1. Neither the user nor their tooling needs to know how to install the headers
for this specific version of the kernel.

2. Manually installing a header package on e.g. Ubuntu marks it as manually
selected, meaning it doesn't get cleaned up with an 'autoremove'; selecting a
generic kernel headers package (like linux-headers-4.14-generic or something)
means that every time your system auto-updates the kernel package version
(which happens by default on, for example, AWS IIRC, and happens a lot) you
end up with yet another copy of the kernel headers, and then you blow your I/O
budget uninstalling them.

3. It removes one extra step from running e.g. an eBPF program, or building
e.g. an out-of-tree kernel module, so make scripts can now start to take
advantage of that. For example:

        # Get headers for the running kernel
        if modprobe kheaders; then
            mkdir -p /tmp/kheaders
            tar -xJf /sys/kernel/kheaders.tar.xz -C /tmp/kheaders
        else
            # scan /usr/src, /usr/local/src, etc. for what looks like
            # the right version, or give up
            exit 1
        fi

------
jononor
Shipping the kernel headers is complicated, but somehow a whole C compiler is
not? Or doesn't BPF compilation need a C compiler?

~~~
cyphar
BPF is compiled in user-space and the bytecode is provided to the kernel --
the compiler definitely isn't shipped as part of the kernel.
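
For illustration, the usual user-space flow is clang's BPF backend producing an object file that a loader then hands to the kernel (a sketch; the file name is made up and the section name is just a typical convention, not something from the article):

```shell
# Compile a minimal eBPF program to bytecode entirely in user space;
# the kernel only ever sees the resulting object file.
cat > minimal_bpf.c <<'EOF'
/* Trivial do-nothing program; the section name is illustrative. */
__attribute__((section("socket"), used))
int prog(void *ctx) { return 0; }
EOF

compile_bpf() {
    clang -O2 -target bpf -c minimal_bpf.c -o minimal_bpf.o
}

# compile_bpf && llvm-objdump -d minimal_bpf.o
```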

