
The Linux Kernel Is Now VLA (Variable-Length Array) Free - kbumsik
https://www.phoronix.com/scan.php?page=news_item&px=Linux-Kills-The-VLA
======
woodruffw
Awesome. I'm looking forward to the day that the Linux kernel stops using the
(GNU-specific) `sizeof(void) == 1` assumption as well[1].

[1]: Just by way of example, one that I ran into recently:
[https://elixir.bootlin.com/linux/latest/source/include/uapi/...](https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/netfilter/x_tables.h#L129)

~~~
millstone
Why? It's useful for pointer arithmetic and seems harmless, since the
alternative is to make it an error.

It would be better for the C spec to just define sizeof(void) == 1.
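
For comparison, a minimal sketch of what the portable spelling looks like today. `advance_bytes` is a hypothetical helper, not something from the kernel; it only illustrates the cast that standard C requires where GNU C accepts `p + nbytes` on a `void *` directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: advance a void pointer by nbytes bytes.
 * Standard C does not define arithmetic on void *, so the pointer is
 * cast to unsigned char * first. GNU C accepts `p + nbytes` as an
 * extension by treating sizeof(void) as 1. */
static void *advance_bytes(void *p, size_t nbytes)
{
    assert(p != NULL);
    return (unsigned char *)p + nbytes;
}
```

Calling `advance_bytes(buf, sizeof(int))` on an `int buf[4]` yields the address of `buf[1]`.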

~~~
sleavey
I am ignorant of this particular aspect of C and most of C in general, but
shouldn't the kernel developers try to follow the C standard instead of just a
compiler's?

~~~
cesarb
The Linux kernel makes heavy use of GCC extensions, like statement expressions
([https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gcc/Statement-Exprs.html](https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gcc/Statement-Exprs.html)),
__builtin_constant_p
([https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gcc/Other-Builtins.html](https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fconstant_005fp)),
and GCC-style inline assembly. Since this means the kernel can only be
compiled by gcc (or compilers which try to mimic gcc like clang, which
pretends to be gcc 4.2.1), it makes sense to treat GCC (actually "the C89
standard plus GCC extensions") as the standard.

~~~
flurrything
> it makes sense to treat GCC (actually "the C89 standard plus GCC
> extensions") as the standard.

While that might have been the case, this announcement says that one of the
many reasons to stop using VLAs is to allow the kernel to be compiled with
clang. The announcement reads as though being able to compile the kernel with
other compilers is a very desirable property, one that has taken many years of
hard work to achieve because of the incorrect assumption that "GCC is the C
standard".

So while that assumption might have made sense back then, it does not appear
to make sense now. If you treat one compiler as "the standard", chances are
that that's the only compiler that you will ever be able to use. That's a bad
strategic decision for a big project like the Linux kernel.

------
int_19h
I'm actually kinda sad that VLAs didn't work out in general (they have been
downgraded to an optional feature in the latest C standards). For all the
downsides, they make working with matrices in C much easier.
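
A minimal sketch of the matrix convenience, assuming a compiler that supports C99 variably modified types. The function names are hypothetical. The storage here comes from `malloc`, so only the VLA *type* is used and nothing large lands on the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* The dimension is passed first so the array parameter can refer to it. */
static double trace(size_t n, double a[n][n])
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i][i];          /* plain 2D indexing, no manual i*n+j */
    return sum;
}

/* Hypothetical demo: build an n x n identity matrix on the heap and
 * return its trace, or -1.0 if the allocation fails. */
static double identity_trace(size_t n)
{
    assert(n > 0);
    double (*a)[n] = malloc(sizeof(double[n][n]));
    if (a == NULL)
        return -1.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            a[i][j] = (i == j) ? 1.0 : 0.0;
    double t = trace(n, a);
    free(a);
    return t;
}
```

Without the variably modified type, `a[i][j]` would have to be written as `a[i * n + j]` on a flat `double *`.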

~~~
enriquto
Yeah, VLA is one of my favorite features in C. I would love if it was viable
to use them for arbitrarily large temporary arrays. They lead to much cleaner
code. Instead of

    
    
        {
                float *x = malloc(n * sizeof*x);
                ...
                free(x);
        }
    

you do simply

    
    
        {
                float x[n];
                ...
        }
    
    

In image processing, you often need large temporary images, but it is
dangerous to distribute code such as the above unless you play with the stack
limits from outside your program.

~~~
dzdt
It's unfortunate that C and C++ kept the mindset that automatically managed
variables are stored on a SMALL stack, with the consequence of exceeding that
small and unknown size being that your program crashes.

For variable sized arrays, there is already a bit of overhead in sizing the
allocation. Would it really have been impossible to move large allocations to
the heap with automatic free at exit of the scope?

~~~
plasticchris
Not all software written in C has a heap.

~~~
megous
Not all C programs have a stack either. Take Microchip's XC8 compiler for the
pic14 family.

~~~
rbanffy
That has to be "fun" to work with...

~~~
megous
You just can't recurse, and the same function can't be called from an ISR and
from main context at the same time. Basically, nothing is reentrant. Otherwise
it's quite fine. To be sure, you shouldn't be doing those things anyway if you
have 64B of RAM and 512 words of ROM. :D

------
larkeith
Though much less strict, this brings to mind the JPL Coding Standard for C
[1]. It seems to follow the same idea as a lot of JPL's rules: the less static
a data structure is, the harder it is to verify and ensure its stability.
Compare:

    
    
      3: Use verifiable loop bounds for all loops meant to be terminating
    

[1] [https://lars-lab.jpl.nasa.gov/JPL_Coding_Standard_C.pdf](https://lars-lab.jpl.nasa.gov/JPL_Coding_Standard_C.pdf)

~~~
carlmr
Verifiable doesn't mean static. You could have a VLA with an upper bound and
still not violate this rule.

------
ericpauley
Curious that there isn't a static analysis to verify that these have been
removed completely. Is this difficult or was it just not worth the effort?

~~~
ndesaulniers
The hard part with the kernel is its fanciful build system, Kbuild. It's great
at turning off compilation of subsystems that you don't plan to use. The
difficulty is that there are a large number of mutually exclusive
configurations, so the config that's called `allyesconfig` really isn't "all"
because you can only pick one of the mutually exclusive configs. Additionally,
some configs don't just add/remove code, but can change which implementation
gets compiled in (all via C preprocessor and -D flags).

Those configs mean there are massive combinations of what is actually set, and
which code you're actually trying to compile. I imagine that's similarly
difficult for static analysis; due to the build system and what code you run
it over. I think a simpler approach with grep might actually work better.

------
chris_wot
Ok, I’ve looked but failed - where is there an example of what a VLA is?

~~~
asveikau

       void foo(int n)
       {
          int m[n];
       }
    

The downside is if n is large you use a lot of stack, but you wouldn't notice
if it's mostly called with low values. So a bit of a ticking time bomb.

I guess if you have a _really_ large n, like more than a few pages worth, then
certain values of m[i] might wind up in some other page allocation where it
shouldn't.... I seem to recall a vulnerability like this a year or two ago.
(Normally a stack can grow on a page fault by hitting a guard page, but if the
n above is absurdly large...)

~~~
craftyguy
So adding a

    
    
        if (n > MAX){
            return;
        }
    

Would not help?

~~~
olliej
As @benchaney said you have to choose an appropriate guard. But the issues
with VLAs are many

* the code is bigger and, people say, meaningfully slower

* super easy to accidentally blow the stack - in kernel land that is probably going to end in a panic

* using VLAs used to disable stack canaries for those frames, not sure if it still does

For a lot of cases where you are truly going to need variable length you’re
unlikely to be penalised heavily by a heap allocation due to the relative cost
of the operations.

~~~
jcelerier
> meaningfully slower

Really? There's a difference, but unless you keep allocating in a hot loop
this should hardly ever have observable costs:
[https://gcc.godbolt.org/z/SN9Ois](https://gcc.godbolt.org/z/SN9Ois)

~~~
jabl
The problem is with how to access other variables on the stack. Usually the
compiler uses an offset from the stack pointer, which without VLAs is known
at compile time. However, if you have VLAs, you might need a calculation at
runtime to figure out where that variable is stored. Or maybe the offset from
the frame pointer is known statically, but then you can't do the "omit frame
pointer" optimization and you thus waste a register. Or if you have a variable
that is between two VLAs, then you need a runtime calculation regardless.

------
pjmlp
Thankfully! VLAs were a very dumb idea to start with, given C's semantics.

~~~
krzyk
Why? It has quite nice simple syntax. And all C newbies love it (doing malloc
and then free is cumbersome, and not needed in simple cases).

~~~
pjmlp
What about lack of bounds checking leaving the door open to security exploits,
in the days where CVEs caused by C's use increase every month?!?

Hence why they were made optional in C11. Being an optional feature means that
actually most C vendors won't bother.

~~~
viraptor
Why is that different from heap allocations? They don't have automatic bounds
checking either.

~~~
pjmlp
Malloc does not corrupt the heap if it cannot accommodate the requested block
size.

------
delinka
So C's VLA feature has been implemented on stack space. Why not on the heap?
Seems safer. Is this strictly about convenience for implementation? Reclaiming
stack memory at the end of the scope is already implemented, so I understand
the motivation.

What I don't understand is: after all these decades of stack smashing
vulnerabilities, why would such a shortcut have been taken?

Convenience costs performance, safety costs performance ... at some point you
just have to suck it up and pay the performance penalty to make things safer
and to improve programmer productivity.

~~~
drfuchs
You'd have to get longjmp to understand how to unwind these heap-allocated
arrays, not only from the current function's stack frame, but for all the ones
in the longjmp'd-over frames. In C-land, longjmp simply overwrites the stack
pointer (and all other registers), which implicitly "deallocates" all the VLA
arrays, and all alloca() allocations, in one fell swoop. There is no notion of
a step-by-step unwinding of the stack, so it would be a big task to try to
trap all intermediate VLA arrays.
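
A minimal sketch of the problem, using an explicit `malloc` to stand in for a hypothetical heap-backed VLA. All names here are made up for illustration. The `longjmp` skips the cleanup in the jumped-over frame, so the block is never freed.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

static jmp_buf env;
static int live_allocations;   /* blocks allocated but not yet freed */

static void fail_deep(void)
{
    longjmp(env, 1);           /* restores registers, runs no cleanup */
}

static void worker(size_t n)
{
    assert(n > 0);
    float *x = malloc(n * sizeof *x);   /* stand-in for a heap-backed VLA */
    if (x == NULL)
        return;
    live_allocations++;
    fail_deep();               /* control never returns here */
    free(x);                   /* skipped: the scope exit never happens */
    live_allocations--;
}

/* Returns how many blocks were still allocated after the jump. */
static int run_and_count_leaks(size_t n)
{
    live_allocations = 0;
    if (setjmp(env) == 0)
        worker(n);
    return live_allocations;
}
```

Assuming the allocation succeeds, `run_and_count_leaks(16)` reports one leaked block. A stack-based VLA in `worker` would have been discarded for free when the stack pointer was reset.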

------
kristianp
Are there any numbers as to what difference this will make to the compiled
size and speed of the kernel? I imagine the difference in size will be tiny.

Did they profile before optimising[1] and identify it as a problem?

[1]
[http://wiki.c2.com/?ProfileBeforeOptimizing](http://wiki.c2.com/?ProfileBeforeOptimizing)

~~~
gbear605
According to the article, this is in large part for security, and a little bit
for speed, not for size.

~~~
kristianp
There is a Linus quote in the article: "It generates much more code, and much
_slower_ code (and more fragile code),". "generates much more code" means a
size difference.

~~~
rstuart4133
If he had just said what you quoted he would be wrong in the general case.

He actually said it is all those things _in comparison to doing fixed
allocation_. The reason that is relevant is the kernel stack is so small you
can only use a VLA when you know in advance the upper bound on the size is
small, but if it's small you may as well use the fixed upper bound, and if you
do that it is always faster, smaller and less fragile than using a VLA.

He's wrong in the general case, because user land C programmers will replace a
VLA with:

    
    
        if (!(array = malloc(n * sizeof(array[0])))) fatal("I'm out of memory");
    

which compared to the VLA generates more code and is slower. In both the VLA
and malloc() case if you run out of memory the program will die nice and
deterministically (unlike the kernel), but in the malloc() case that will only
happen if you remember to do the check whereas in the VLA case the compiler
ensures it always happens.
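
A minimal sketch of the fixed-upper-bound replacement described above for the kernel case. `MAX_KEY_LEN` and `checksum_key` are hypothetical names chosen for illustration, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_KEY_LEN 64   /* hypothetical bound, known at compile time */

/* Sums the bytes of key into *out.
 * Returns 0 on success, -1 if len exceeds the fixed bound. */
static int checksum_key(const unsigned char *key, size_t len, unsigned *out)
{
    unsigned char buf[MAX_KEY_LEN];   /* was: unsigned char buf[len]; */
    unsigned sum = 0;

    assert(key != NULL && out != NULL);
    if (len > MAX_KEY_LEN)
        return -1;                    /* reject instead of growing the stack */

    memcpy(buf, key, len);
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    *out = sum;
    return 0;
}
```

The frame size no longer depends on `len`, so the worst-case stack use can be read straight from the source.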

------
lclarkmichalek
God bless Kees Cook

------
wiz21c
FTA :

> - VLAs within structures is not supported by the LLVM Clang compiler and
> thus an issue for those wanting to build the kernel outside of GCC, Clang
> only supports the C99-style VLAs.

One less vendor lock-in (but since it's GCC, it makes me sad).

------
z3t4
Meanwhile on the web ...

