
How newlines affect Linux kernel performance - tbodt
https://nadav.amit.zone/blog/linux-inline
======
zbjornson
v8 (before v5.9) used to only inline functions that were under 600 characters
(and 196 AST nodes). That was another fun way to add fuel to the tab vs.
spaces fire: identical functions that used spaces instead of tabs could run
significantly slower because they weren't inlined.

~~~
umvi
> The kernel compiler, GCC, is unaware of the code size that will be generated
> by the inline assembly. It therefore tries to estimate its size based on
> newline characters and statement separators (';' on x86).

Seems like a poor decision to use whitespace as a factor in estimating code
size. Some people like to space out their code with lots of comments even if
it is only a few lines long. This is my naive view, though - does anyone out
there have any insight as to why semicolons alone aren't sufficient to
estimate code size?

~~~
chrisseaton
> Seems like a poor decision to use whitespace as a factor in estimating code
> size.

It's a heuristic. It's always going to be more poor in some respect that doing
something precise. You could probably make it better in a million different
ways, but it's suppose to be quick and simple.

~~~
trhway
To inline, the parsing must be done. Once parsing is done there are no
newlines and semicolons, only AST nodes. Counting AST nodes is very quick and
simple and provides a much better heuristic.

~~~
chrisseaton
> to inline the parsing must be done

Yes... that's the point. For your metric of counting AST nodes, you need to
have parsed the method. The other heuristic works on unparsed source code. So
you don't waste time parsing a method that you won't inline.

> Counting AST nodes is very quick and simple

But parsing is neither. And you need to have done that first. I think (there's
a paper, I'm sure, but I can't find it) that parse time for JS applications is
very substantial.

~~~
MikeHolman
You don't want to inline functions that haven't been parsed. If it hasn't been
called by the time you are JITing, then it probably isn't going to be on a hot
path.

~~~
chrisseaton
If it's been JITed then the AST may well have been discarded after the JIT
completed! Doesn't V8 work like that today? You'll need to parse it again even
if you have already parsed it!

It's not hard to think of worst cases. Think about an enormous function that
you would not want to inline. It's been JITed, or it's running in a bytecode
interpreter, and the AST has been discarded. In your scheme you'll have to
parse it to find it's too big. That's a huge waste of time if you can see it's
too large from the source code.

~~~
MikeHolman
You can't throw away your interpreter bytecode, because the JIT code needs to
be able to bailout if a guard fails (e.g. if you specialized a var as an int
and you get a non-int).

So you can always use the size of your bytecode as an inlining heuristic
(which is what Chakra does).

Of course that wouldn't work before v8 had an interpreter, but they could have
used other heuristics. For example, they could have used the size of the AST
as a heuristic and saved that even after the AST gets thrown away.

------
ndesaulniers
__builtin_constant_p is a rat's nest of edge cases. We're looking into
differences between the GCC and Clang implementations now for the kernel.

~~~
ndesaulniers
Also, we're working on getting Clang's integrated assembler to assemble the
Linux kernel, but there's a long tail of syntactic sugar and simpler basic
things Clang has to implement, as GAS from binutils is the de facto standard
assembler.

------
rurban
I'd really like to see a comparison with clang's inliner. In my cases [1] gcc
constexpr support was always horrible and clang decent. So I'm not sure if
using the built-in assembler would fix the gcc situation.

1:
[https://github.com/rurban/safeclib/blob/master/tests/perf_me...](https://github.com/rurban/safeclib/blob/master/tests/perf_memcpy_s.c)

~~~
ndesaulniers
So I think the inlining decisions in a compiler are some of the most
heuristic-y parts of the compiler. There was a paper from Jeff Dean and some
other Googlers recently that hinted that machine learned models might be the
next leap forward in compiler technology, as a potential means of replacing
these heuristics.

~~~
rurban
Machine learning? If some logic, then precise and reproducible please. Prolog
would be nice, not some neural nets. gcc already has some nice lisp. Jeff Dean
usually has good ideas, but not this one. ML would be more useful for a JIT,
for dynamic optimization problems at run time.

But the problem is only the botched architecture, with unmanageable
interdependencies. Of course inlining is the mother of all optimizations; most
other optimizations should be applied after inlining. But for inlining to be
effective it has to run after constant folding. And then after inlining you
have to run most optimization passes again to get the real benefits.

This problem is much simpler: missing assembler integration and a botched
constexpr API. They should just take it from llvm.

------
amelius
Seems like those competitions where people write one-liners have their merits.

~~~
est
[https://github.com/torvalds/linux/pull/437](https://github.com/torvalds/linux/pull/437)

------
jwilk
Ugh, all external links in this article go through the Google click tracker.
:-/

Here are ungoogled URLs:

[https://elixir.bootlin.com/linux/v4.17/source/include/linux/...](https://elixir.bootlin.com/linux/v4.17/source/include/linux/jhash.h#L70)

[https://elixir.bootlin.com/linux/v4.17/source/include/net/ds...](https://elixir.bootlin.com/linux/v4.17/source/include/net/dst.h#L442)

[https://elixir.bootlin.com/linux/v4.17/source/include/linux/...](https://elixir.bootlin.com/linux/v4.17/source/include/linux/thread_info.h#L121)

[https://c9x.me/x86/html/file_module_x86_id_318.html](https://c9x.me/x86/html/file_module_x86_id_318.html)

[https://elixir.bootlin.com/linux/v4.17/source/arch/x86/inclu...](https://elixir.bootlin.com/linux/v4.17/source/arch/x86/include/asm/bug.h#L33)

[https://gcc.gnu.org/onlinedocs/gcc-4.0.1/gcc/Extended-Asm.ht...](https://gcc.gnu.org/onlinedocs/gcc-4.0.1/gcc/Extended-Asm.html)

[https://www.embecosm.com/appnotes/ean10/ean10-howto-llvmas-1...](https://www.embecosm.com/appnotes/ean10/ean10-howto-llvmas-1.0.html#idp109760)

[https://patchwork.kernel.org/patch/10450037/](https://patchwork.kernel.org/patch/10450037/)

[https://lkml.org/lkml/2018/10/4/25](https://lkml.org/lkml/2018/10/4/25)

[https://elixir.bootlin.com/linux/v4.17/source/include/linux/...](https://elixir.bootlin.com/linux/v4.17/source/include/linux/slab.h#L699)

[https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html)

[https://elixir.bootlin.com/linux/v4.17/source/arch/x86/kvm/v...](https://elixir.bootlin.com/linux/v4.17/source/arch/x86/kvm/vmx.c#L9669)

[https://elixir.bootlin.com/linux/v4.17/source/arch/x86/kvm/v...](https://elixir.bootlin.com/linux/v4.17/source/arch/x86/kvm/vmx.c#L706)

~~~
gruez
I was wondering why that'd be the case (usually the only reason these URLs
even pop up is that someone copied them from a Google results page), then I
saw this at the bottom:

>Made with the new Google Sites, an effortless way to create beautiful sites.

~~~
jrockway
The idea is to prevent you from leaking the URL of your internal google sites
page through the referer header. Maybe Google does some tracking, but the idea
behind the redirect was for privacy, as I understand it. (Disclaimer: former
Googler.)

~~~
taneq
There's usually a good _justification_ like this for features like these. The
fact that every single such feature results in Google tracking more of your
life is a mere coincidence.

~~~
jrockway
I mean, they could just inject Javascript that detected that you clicked the
link if tracking was the only goal. The fact that they prefer an HTTP redirect
means there's more than just tracking involved.

It would be nice if they put a link to the privacy policy on the redirect
page, so the intent would be clearer.

