
John Carmack on Inlined Code (2014) - rinesh
http://number-none.com/blow/blog/programming/2014/09/26/carmack-on-inlined-code.html
======
dzdt
I have had the pleasure of working with lots of other people's code of varying
styles and quality. Nothing is harder to read and understand than code which
is deeply nested calls from one little helper (or wrapper) function to
another. Nothing is easier to read and understand than code which just flows
straight through from top to bottom of a big function.

There are other tradeoffs too: code reuse, speed, worst-case speed, and the
probability of introducing bugs. If you haven't read the article, do; it's
worth it.

I love that Carmack tries to measure which styles introduce more bugs! Who
else does that? Seriously, I would love to see more of that.

~~~
kminehart
Good Lord yes. When I was working on point of sale machines for Toshiba, there
was this unbelievable depth of inheritance and ridiculously deep callstack
constantly, and it made fixing any minor bug take weeks.

* abstract class Connection would have maybe 3 or 4 methods.

* DataConnection would extend Connection and add a couple more methods that were specific to some proprietary protocol.

* POSDataConnection would extend DataConnection and wrap this proprietary protocol for POS machines.

* ControllerDataConnection would extend POSDataConnection because a Point of Sale controller is technically a POS machine with a bit more functionality (really just a couple of flags turned on).

* There was plenty more in between; it's been so long now that I've forgotten it all.
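
Roughly, the shape of it in C++ terms (a hypothetical reconstruction; the
real names and layers are long forgotten):

    
    
      // Hypothetical reconstruction; the real hierarchy was deeper still.
      class Connection {                      // 3 or 4 abstract methods
      public:
          virtual void open() = 0;
          virtual void close() = 0;
          virtual int  send(const char *buf, int len) = 0;
      };
      
      class DataConnection : public Connection {
          // + a couple of methods for the proprietary protocol
      };
      
      class POSDataConnection : public DataConnection {
          // wraps the proprietary protocol for POS machines
      };
      
      class ControllerDataConnection : public POSDataConnection {
          // "a POS machine with a couple of flags turned on"
      };
    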

It's just like we all learned in college! Object Oriented programming is
supposed to model real-life! Except no, it's not. That's stupid and
complicated.

Now the part of the OS that was C/C++ was absolutely beautiful. It took the
UNIX style of programming / applications seriously; every little piece was its
own program, and it worked flawlessly. Anyone could jump in and get to work
immediately because it was so well written. You could follow any program top
to bottom and it just... made sense!

~~~
artursapek
> it made fixing any minor bug take weeks.

My god.

A lot of these fancy language features sound good on paper but can be so
abused that even just coming to understand the code takes up too much space in
your head.

[https://twitter.com/davidcrawshaw/status/507268175256231936](https://twitter.com/davidcrawshaw/status/507268175256231936)

~~~
MyNameIsFred
Emphasis belongs on _abused_. Interfaces are swell. Subclasses of classes
which implement an Interface are swell too, when they extend simply by
altering the way said interface is achieved. When you subclass to "add more
stuff", however, you get the kind of nonsense described.

~~~
deong
The problem, in my opinion, is that the Java community (generalizing, I know)
decided to bless that abuse as a best practice.

------
lj3
Interestingly, Casey Muratori accidentally demonstrates during one of his
Handmade Hero sessions that the compiler won't always be able to optimize
certain bits of code that are put in a function as opposed to being inline.

In the video, he inlines a very simple function and his game gets twice as
fast for no apparent reason. It's instructive to watch him dive into the
generated assembly to figure out why.

[https://www.youtube.com/watch?v=B2BFbs0DJzw](https://www.youtube.com/watch?v=B2BFbs0DJzw)

~~~
imtringued
The compiler probably turned

    
    
      for(int I = 0; I < 4; ++I) {
            real32 PixelPx = (real32)(XI + I);
            real32 PixelPy = (real32)Y;
            real32 dx = PixelPx - Origin.x;
            real32 dy = PixelPy - Origin.y;
      
            real32 U = dx*nXAxis.x + dy*nXAxis.y;
            real32 V = dx*nYAxis.x + dy*nYAxis.y;
      
            //rest of the loop
      }
    

into something like

    
    
      real32 PixelPy = (real32)Y;
      real32 dy = PixelPy - Origin.y;
      
      real32 PixelPx = (real32)(XI);
      real32 dx = PixelPx - Origin.x;
      
      real32 U = dx*nXAxis.x + dy*nXAxis.y;
      real32 V = dx*nYAxis.x + dy*nYAxis.y;
      
      for(int I = 0; I < 4; ++I) {
          //rest of the loop
          
          U += nXAxis.x;
          V += nYAxis.x;
      }
    

PixelPy and dy are not affected by the loop counter, which means they can
safely be moved outside the loop.

This also results in the subexpressions dy*nXAxis.y and dy*nYAxis.y being
lifted outside the loop.

Now we've moved half of the code outside the loop but we aren't done yet.

The same can be done with PixelPx and dx; the trick is to then replace
dx*nXAxis.x with

    
    
      (dx + I)*nXAxis.x
    

Expanding

    
    
      (dx + I)*nXAxis.x
    

yields

    
    
      dx*nXAxis.x + I*nXAxis.x
    

We can now lift the subexpression

    
    
      dx*nXAxis.x
    

out of the loop.

The only thing that is now done in the loop is

    
    
      I*nXAxis.x
    

which can be further simplified to

    
    
      U += nXAxis.x
    

The same happens with nYAxis.x.


~~~
Lerc
Effectively this means the speedup comes from optimizations that assume the
code in question is only ever run in that context. When the code is inline
this is an easy call to make. For a function it's trickier. I would hope some
compilers emit a general version of the function to handle arbitrary contexts
but also try inlining at individual call sites to see if significant gains
like this can be made.

It's another hurdle for the sufficiently smart compiler, though. You need to
know how the program will be run to know which is the better form. Once you
get into code-size vs. speed tradeoffs, things get murky with instruction
caches etc.

------
bshimmin
Imagine how fantastic it must be working with someone like Carmack. Sure, the
first few code reviews would be fairly traumatic - as you quickly realise just
how much faster and generally _better_ he is than you - but I think after a
little while you could just relax and try to absorb as much as possible.

I love how everything in these emails is delivered as a calm series of
reflections, chronicling with great honesty his own changing opinions over
time - nothing is a diktat.

I also found it rather heartening that he makes the same copy/paste mistakes
that the rest of us do. How many times have you duplicated a line and put "x"
or "width" on both lines...? Seemingly Carmack can actually tell you how many
times he's done that!

~~~
smegel
> the first few code reviews would be fairly traumatic

Hopefully because he is saying "did you really think vain attempts at
premature optimization were going to impress me?".

~~~
blackbeard334
Nerd rage:
[https://youtu.be/JjDsP5n2kSM?t=13m41s](https://youtu.be/JjDsP5n2kSM?t=13m41s)

------
bluetomcat
Another perspective in defense of long functions is that they enable you to
spot common expressions/statements within the body, for example:

    
    
        void long_func(void) {
            ...
            if (player.alive && player.health == 100) {
                ....
            }
            ...
            if (some_other_condition && player.alive && player.health == 100) {
            }
        }
    

Conventional wisdom says that you should write a function
`is_player_untouched` and substitute the composite expressions with function
calls, but the code in question can be refactored in a much more
straightforward way:

    
    
        void long_func(void) {
            ...
            const bool player_untouched = is_player_untouched();
    
            if (player_untouched) {
                ....
            }
            ...
            if (some_other_condition && player_untouched) {
            }
        }
    

Had the function body been split into more functions for "clarity", you would
be doing duplicate calls to `is_player_untouched()` which go unnoticed because
they would be buried deep in the call graph.

~~~
rubber_duck
This leads to brittle code. It's easy to verify that the is_player_untouched
state doesn't change between those two conditions when you're writing the
code, but when someone else edits that code later, it's also easy to introduce
something that breaks the assumption. The monolithic/big function makes it
hard to keep track of assumptions like this, and it attracts a lot of code
changes as well. Even worse, these kinds of bugs usually end up being hard to
catch if you're not covered by unit tests, and games rarely are, and big
functions go against testing practices.
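
For instance, a hypothetical later edit (apply_poison_damage is made up here)
silently invalidates the cached flag:

    
    
        void long_func(void) {
            const bool player_untouched = is_player_untouched();
    
            if (player_untouched) {
                /* ... */
            }
    
            apply_poison_damage(&player); /* later edit: mutates player state */
    
            if (some_other_condition && player_untouched) {
                /* bug: tests the value cached before the damage was applied */
            }
        }
    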

~~~
bluetomcat
It sure leads to brittle code if assumptions and contracts in the code are not
crystal clear, but Carmack's point in this particular article was about
writing "consistently performing" code which doesn't degrade under specific
conditions.

In my refactored example, you wouldn't be _eventually_ calling
`is_player_untouched()` once more if `some_other_condition` is true.

------
logfromblammo
If you occasionally inline all the functions and unroll all the loops, you
can find optimizations that even the compiler won't be able to make.

For example, in quaternion-based rotation math, there exists a "sandwich
product" where you take the (non-commutative) product of the transform and the
input, followed by the product of that result and the conjugate of the
transform.

It turns out that several of the embedded multiplication terms cancel out in
that double operation, and if you avoid calculating the canceled terms in the
first place, you can do a "sandwich product" in about 60% of the total
floating-point operations of two consecutive product operations.

In the application that used spatial transforms and rotations, the optimized
quaternion functions were faster than the 4x4 matrix implementation, whereas
the non-optimized quaternion functions were slightly slower. That change alone
(adding an optimized sandwich product function) cut maybe 30 minutes off of
our longest bulk data processing times.

You would _never_ be able to figure that out from this.

    
    
      out = ( rotation * in ) * ( ~rotation );
    

You have to inline all the operations to find the terms that cancel (or
collapse into a scalar multiplication).
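
For reference, a sketch of the standard form of that cancellation (not
necessarily the exact code described above; it assumes a unit quaternion):

    
    
      struct Vec3 { float x, y, z; };
      struct Quat { float w, x, y, z; }; // assumed normalized
      
      static Vec3 cross(Vec3 a, Vec3 b) {
          return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
      }
      
      // Rotate v by q without computing ( q * v ) * ( ~q ) in full.
      // Expanding the sandwich product and cancelling terms leaves:
      //   t  = 2 * (q.xyz x v)
      //   v' = v + q.w * t + (q.xyz x t)
      static Vec3 rotate(Quat q, Vec3 v) {
          const Vec3 u = { q.x, q.y, q.z };
          Vec3 t = cross(u, v);
          t = { 2*t.x, 2*t.y, 2*t.z };
          const Vec3 c = cross(u, t);
          return { v.x + q.w*t.x + c.x,
                   v.y + q.w*t.y + c.y,
                   v.z + q.w*t.z + c.z };
      }
    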

------
apeace
I think there's a big point that's being missed here. Carmack is conflating
inlining code with writing functional code. These are different things.

I'd agree that if the majority of your code is mutating state, it makes sense
to mash all that together in one place. You want to keep an eye on the dirty
stuff.

But on the other hand, inlining pure functions that don't use or mutate any
global state doesn't make sense to me. Why is making it "not possible to call
the function from other places" a benefit?

How about calling that code from a unit test!

~~~
phkahler
>> Why is making it "not possible to call the function from other places" a
benefit?

When it's a pure function that's not a problem. When it changes state then you
lose track of ordering and such. That's his point, state changes need to be
kept in the one big function so you can keep track of them easily.

And of course, almost all interesting software has mutable state. Otherwise
you're just doing a computation and looking for a single output.

~~~
apeace
>> When it's a pure function that's not a problem.

That was my point. He's conflating two different things. I understand why
inlining mutation has benefits. Just not inlining functional code.

>> And of course, almost all interesting software has mutable state.

Of course, but I think most programmers overestimate how prevalent state needs
to be throughout a program.

I once wrote an RSS aggregator as an eight-stage pipeline. It checked about
40k feeds, each every 60 seconds. Every stage had a 'main' file where the vast
majority of state was kept. The rest was functional libraries. I suppose that
would be a demonstration of what Carmack is proposing, with the difference
being that my pure functions (the majority of the code) had clear names and
were unit-tested.

It worked so well that almost every large program I've written since has been
designed the same way!

------
jon-wood
The thing that struck me was Carmack's relentless pursuit of perfection. I
can't think of many people who'd describe a single frame of input latency as a
cold sweat moment!

~~~
dfan
When you are making videogames, a (video) frame of latency is a big deal.

When I worked on Guitar Hero and Rock Band, we worried about sub-frame latency
(timing is more important when you're hitting a drum than when you're firing a
gun).

~~~
speeder
How did you handle CRT vs non-CRT screens?

I still use CRT screens, not because of latency, but because of better
contrast and colour reproduction, and the capability to use whatever
resolution I want.

I noticed that in new games, and with newer video cards, there is some kind
of weird lag, as if they were tuned on purpose for slow LCDs (there even seem
to be some variables related to screen input lag that you can control on AMD
cards, via the Windows Registry or by tweaking the Linux driver; they are in
the "PowerPlay" part of the drivers for some reason, and I couldn't yet
figure out what exactly they do).

EDIT: Also, I stopped playing music games almost entirely; I found many of
them completely unplayable on my setup, and I just can't find the correct
settings to make the timing work. The least aggravating one is "Necrodancer",
which seems to be really good at calibration.

~~~
dfan
There are so many AV setups that we had to leave calibration to the user (we
tried auto-calibration, but it could not always be perfect).

The fundamental problem is that there are two independent delays that both
depend on your individual system: the delay from the time that the console
produces a video frame to the time that the user sees it, and the delay from
the time that the console produces a sound to the time that the user hears it.
In a beatmatching game, you really need the user's perceptions to be in sync,
which means delaying either the video or the audio. Of course, the more you
delay one or the other, the more repercussions you run into.

In a regular video game, it's not a big deal if you fire a gun and hear the
shot 50ms later, but in a beatmatching game, that delay is really noticeable.
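
A minimal sketch of that compensation (hypothetical names; a real calibration
flow measures these two delays or asks the user for them):

    
    
      // Two system-specific delays, e.g. from a calibration screen.
      struct AVCalibration {
          float videoDelayMs; // console emits a frame -> photons reach the eye
          float audioDelayMs; // console emits a sample -> sound reaches the ear
      };
      
      // Hold back whichever stream is perceived first so both line up.
      // Positive result: delay the audio; negative: delay the video.
      static float syncOffsetMs(const AVCalibration &c) {
          return c.videoDelayMs - c.audioDelayMs;
      }
    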

~~~
AstralStorm
In a competitive twitch shooter, 10 ms lag is an important handicap. And that
is less than one frame at 60 FPS.

------
ctrlrsf
Long functions might read more easily, but you lose some testing precision. I
think the recent focus on testing has led to shorter functions with as little
responsibility as possible. When a short function fails a test, you have a
smaller surface area in which to look for the cause.

~~~
leojfc
Doesn’t this imply the need for a new language feature? So that well-defined
sections of inline code can be pulled out, initial conditions set in a testing
environment, and then executed independently.

I guess this could trip you up if the compiler optimisations available when
considering all the code at once mean that the out-of-context code actually
does something different in testing...

~~~
chimprich
Pull out well-defined sections of inline code that can be executed
independently for testing? Sure, that's breaking it down into functions.

------
typedef_struct
This is a pet peeve of mine. If you made a block of code into a separate
function, I'm assuming it's called from multiple places. Or maybe it used to
be. Or will be soon. But still.

~~~
sickbeard
It's hard to unit test large functions.

------
panic
Some good previous discussion here:
[https://news.ycombinator.com/item?id=8374345](https://news.ycombinator.com/item?id=8374345)

------
nickm12
I'm surprised at the enthusiasm for really long functions here. My experience
is just the opposite: I find it much more difficult to read code written that
way than when the different sections are broken up into smaller functions.

It is of course essential that the smaller functions be well-named and manage
side-effects carefully. That is, they should either be pure functions, or the
side effects should be "what the function does", so that readers of the main
function don't generally need to read the function's code to understand its
side effects.

~~~
mark-r
Yes, a long monolithic block is hard to read. But he's suggesting using
comments and braces to visually separate it into blocks, so the end result is
a happy halfway point between monolithic and broken-down functions.

The intro suggests that he agrees that pure functions are an even better
solution.

------
Practicality
I wonder how much of this change is because he can no longer keep track of so
much state in his head (or just doesn't want to).

I only say this because I've gone through a similar transition of valuing my
mental computation time in the last 20 years of coding :).

The efficiency of inlining is compelling when you code the whole thing at
once, in one session. Once you decide to break the work up over multiple
sessions, it's too much to keep in your head over multiple days (or weeks).

------
qwertyuiop924
Although I am a fan of the LISP school of program design (minimal global
state, build small functions and macros, make sure they work, and then build
more functions and macros on top of that, until you have an abstraction that
you can build your app on), Carmack raises some interesting points: If you're
handling a lot of global, mutable state, you may want to abstract minimally,
so that you can see where that state is mutating, which makes bugs easier to
spot.

Not a bad idea.

~~~
mgregory22
Maybe the key idea is to put all the global state mutations as close together
as possible, so they can be compared and contrasted as easily as possible.

~~~
qwertyuiop924
Well, yeah, that's a big part of it.

------
p0nce
About style A vs B vs C:

Robert C. Martin encourages style B because it reads top-down and replaces
comments with names.

------
anotherhacker
Don't we write code for other programmers first--then for the system?

It seems counter-intuitive but in the long run this mentality best serves the
business.

~~~
softawre
Depends if you're working on some LOB application with average programmers or
if you're writing a game that pushes the limits of modern technology.

------
schlipity
>I now strongly encourage explicit loops for everything, and hope the compiler
unrolls it properly.

I get why this is a thing. Sometimes an unrolled loop is faster. But if this
is really an issue, why isn't there a [UnRoll] modifier or a preprocessor or
something that handles that for you?

Something like this:

    
    
      for (int i = 0; i < n; i++) {
        dothing(x[i]);
      }
    

versus:

    
    
      unroll for (int i = 0; i < n; i++) {
        dothing(x[i]);
      }
    

Only the compiler / preprocessor would unroll the second one. You have the
best of both worlds with a reduced chance of subtle errors.
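
For what it's worth, some real compilers do expose this kind of hint as a
pragma: Clang accepts #pragma unroll N, and GCC 8+ accepts #pragma GCC unroll
N. A sketch:

    
    
      void dothing(int v); // whatever the loop body does (from the example above)
      
      void dothings(int *x, int n) {
      #pragma unroll 4     // Clang spelling; GCC 8+: #pragma GCC unroll 4
          for (int i = 0; i < n; i++) {
              dothing(x[i]);
          }
      }
    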

~~~
mschaef
This is essentially the thinking behind the 'register' keyword. The idea was
to make it possible to mark which variables were supposed to go in registers
and which could be stored in memory. That may have made sense back in the
70's, but these days, the compiler's heuristic is usually better. This also
applies to the 'inline' keyword. Maybe you're right... but maybe you're wrong
and inlining the function blows the cache, etc.

I think the same logic applies to a putative 'unroll' keyword. Even if it's a
short-term win, the environmental properties that make it a win are likely to
change before the code is retired. To me, that argues for relying on the
heuristic.

One note to this is that MSVC has both the usual 'inline' keyword as well as
a proprietary, stronger '__forceinline' keyword. __forceinline overrides the
heuristic and forces the inlining of the function even if the compiler doesn't
agree it makes sense. I can see how that kind of compiler-specific annotation
might be useful tactically (i.e., you've found the compiler making the wrong
choice on a specific platform and you wish to overrule it). But not a
full-fledged language keyword...

~~~
Symmetry
And letting the compiler decide means you can choose between -Os and -O3
later, depending on how your constraints change. Really, for performance the
only keywords you should be using are 'const' and 'static'. Both just tell the
compiler it's free to make certain kinds of optimizations it might not
otherwise figure out that it's allowed to do.
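
A small illustration of what those two keywords license (a sketch):

    
    
      // 'static' gives the function internal linkage: the compiler sees every
      // caller in this translation unit and is free to inline or specialize it.
      static int triple(int x) { return 3 * x; }
      
      // 'static const' data can be folded to a constant instead of loaded.
      static const int kScale = 16;
      
      int scaled(int v) { return triple(v) * kScale; }
    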

------
hellofunk
One of his definitions of a pure function is one that has only by-value
parameters and doesn't change state. Am I correct in thinking that C++11
lambdas let you be explicit about this, and prevent the compiler from
allowing you to accidentally use variables from outside the scope of the
function's parameters? Writing lambdas (named, if necessary, like a normal
function) with empty capture lists ("[]") would force you to work in a more
pure style. In C++, what other method might help you enforce purity?

~~~
pbsd
You can still freely access global variables inside lambdas, with or without
captures.
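
A minimal illustration:

    
    
      int g = 42; // global: visible inside any lambda, captured or not
      
      int main() {
          int local = 7;
          auto f = []() {
              // return local; // error: 'local' cannot be used without a capture
              return g;        // fine: globals aren't captured, they're in scope
          };
          return f();
      }
    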

~~~
hellofunk
I have been testing this, and I'm not convinced. A lambda with no capture at
all (just []) will not compile if it accesses any variable in the scope of the
lambda definition (including globals). I must add [=] or similar to get access
to them. What were you referring to?

The compiler error is clear: "variable 'lam' cannot be implicitly captured in
a lambda with no capture-default specified"

~~~
pbsd
[https://godbolt.org/g/gimahO](https://godbolt.org/g/gimahO)

~~~
hellofunk
Indeed, I was noticing after my comment that the automatic access of globals
is different behavior from access to other variables outside the lambda's
scope, which must be captured.

Thanks for the example.

------
TickleSteve
Correct me if I'm wrong (I only skimmed it), but this is less about not
liking inlining than about having deterministic/time-bounded performance.

These are two separate/orthogonal issues, I doubt he would turn his nose up at
the processor doing less work _iff_ it was also deterministic and had
predictable worst-case timing.

~~~
bluetomcat
I would sum it up as "reducing variability in code paths", i.e. not causing
degraded performance when certain conditions change.

~~~
TickleSteve
yes, but the intention behind that is to have control and knowledge over the
maximum time a function will take.

What he is effectively saying is to treat all code in the same manner as you
would for a hard-real-time system.

I certainly agree with this for performance critical code (performance being
overall duration or latency), but this is not a one-size-fits-all solution.
There are a lot of cases where this is not appropriate.

~~~
ajuc
It depends if we optimize for thtoughput or latency.

~~~
TickleSteve
(I mentioned overall duration or latency which covers your point).

but those two cases are the same.... they're both performance-critical code.

Not all code is.

~~~
ajuc
I would argue that in this context "optimize running time" is the different
one, while "optimize latency" and "don't care about performance" are similar.

I mean: if you don't care about total running time, it makes sense to remove
special cases that don't change the worst case, for the sake of readability
and fewer bugs, whether or not you care about latency.

------
Animats
Both games and low-level real time systems have one big loop executed at a
fixed rate. That leads to the architecture Carmack describes.

It's not particularly helpful to a server that's fielding vast numbers of
requests of various types.

------
saynsedit
I think Carmack is conflating FP with good abstractions.

Haskell abstractions are often good because they flow from category theory and
there are usually well established mathematical laws associated with them. I'm
thinking of the "monad laws" and the "monoid laws."

Mathematicians tend to create abstractions if the abstraction satisfies
coherent and provable properties. Programmers tend to be less rigorous about
what and how they abstract.

There is nothing about C++ that prevents making good abstractions. It's just
the culture of the language. Industry programmers are taught to not duplicate
code and to keep functions short but they are not taught the fundamentals of
what makes a good abstraction.

------
andy_ppp
Yes, once you understand functional programming, you never want to go back to
having the contents of your variables changed implicitly, without your
knowledge or explicit consent.

~~~
xxs
a snarky remark: 'citation need'

Functional programming ain't a panacea, either.

~~~
andy_ppp
Have you read the article? John Carmack suggests FP and writing pure
functions is better.

~~~
accatyyc
Not really. He suggests that if a function is called from multiple places, one
should try to make it pure to avoid subtle bugs. If called from a single place
it may be better to inline it.

Citation: "I don’t think that purely functional programming writ large is a
pragmatic development plan, because it makes for very obscure code and
spectacular inefficiencies, but if a function only references a piece or two
of global state, it is probably wise to consider passing it in as a variable."

~~~
Malice
Your citation is from his 2007 thoughts.

His 2014 thoughts: "No matter what language you work in, programming in a
functional style provides benefits. You should do it whenever it is
convenient, and you should think hard about the decision when it isn't
convenient."

~~~
jbooth
Which is still a very different statement from the way functional zealots
would put it.

Carmack's talking about pure functions at the architecture/design level.
Within those functions, there's still lots of temporary mutable state, I'd be
willing to bet. He's writing graphics code, he's probably not passing
functions to functions in order to sum an array, he'll just do the fast,
iterative thing.

~~~
rtpg
I'm not going to do the whole spiel about performance, but I know a lot of
FPers who wouldn't care about the implementation of map, so long as the
external contracts are kept.

The beauty of functional programming is that it doesn't matter how map works.
So you can make map work as fast as possible through all the techniques you
want, since code can't rely on the internal behaviour, only on the inputs and
outputs.

~~~
jbooth
Interfaces you supply will necessarily constrain implementation details. So it
winds up mattering.

Passing a function to a function is a bunch of indirection and extra stack
frames(!) compared to updating very small, memory aligned mutable state in-
line with the work you're doing. It's even worse with closures where you're
creating anonymous data structures and passing them around. You can read up on
TLBs and the speed difference between L1 cache and main memory if you'd like
to know more.

You might not care about the above if it's more 'beautiful' to you, but it's
vastly, vastly less performant.

~~~
rtpg
that's if you implement it as a function :)

You could totally implement these things as compile-time macros, or do many
different optimisation passes, or so many other things.

~~~
AstralStorm
The old school of algorithm developers used to introduce hard, explicitly
verified preconditions and postconditions. Typically this is done in C and C++
using assertions.

In general, any dependence on external mutable state should be asserted or
otherwise verified. Those checks can be disabled for performance later. Meshes
very well with actual tests too.
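
For example, a sketch of the style (clamp here is just an illustrative
function):

    
    
      #include <cassert>
      
      // Preconditions and postconditions made explicit with assert();
      // compiled out by -DNDEBUG for release/performance builds.
      static int clamp(int v, int lo, int hi) {
          assert(lo <= hi);              // precondition
          const int r = v < lo ? lo : (v > hi ? hi : v);
          assert(lo <= r && r <= hi);    // postcondition
          return r;
      }
    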

------
corysama
Reposting my comment from the last time this was posted. There was a lot of
nice discussion there:
[https://news.ycombinator.com/item?id=8374345](https://news.ycombinator.com/item?id=8374345)

===

The older I get, the more my code (mostly C++ and Python) has been moving
towards mostly-functional, mostly-immutable assignment (let assignments).

Lately, I've noticed a pattern emerging that I think John is referring to in
the second part. The situation is that often a large function will be composed
of many smaller, clearly separable steps that involve temporary, intermediate
results. These are clear candidates to be broken out into smaller functions.
But, a conflict arises from the fact that they would each only be invoked at
exactly one location. So, moving the tiny bits of code away from their only
invocation point has mixed results on the readability of the larger function.
It becomes more readable because it is composed of only short, descriptive
function names, but less readable because deeper understanding of the
intermediate steps requires disjointly bouncing around the code looking for
the internals of the smaller functions.

The compromise I have often found is to reformat the intermediate steps in the
form of control blocks that resemble function definitions. The pseudocode
below is not a great example because, to keep it brief, the control flow is so
simple that it could have been just a chain of method calls on anonymous
return values.

    
    
        AwesomenessT largerFunction(Foo1 foo1, Foo2 foo2)
        {
            // state the purpose of step1
            ResultT1 result1; // inline ResultT1 step1(Foo1 foo)
            {
                Bar bar = barFromFoo1(foo1);
                Baz baz = bar.makeBaz();
                result1 = baz.awesome(); // return baz.awesome();
            }  // bar and baz no longer require consideration
    
            // state the purpose of step2
            ResultT2 result2; // inline ResultT2 step2(Foo2 foo)
            {
                Bar bar = barFromFoo2(foo2); // 2nd bar's lifetime does not overlap with the 1st
                result2 = bar.awesome(); // return bar.awesome();
            }
    
            return result1.howAwesome(result2);
        }
    

If it's done strictly in the style that I've shown above then refactoring the
blocks into separate functions should be a matter of "cut, paste, add function
boilerplate". The only tricky part is reconstructing the function parameters.
That's one of the reasons I like this style. The inline blocks often do get
factored out later. So, setting them up to be easy to extract is a guilt-free
way of putting off extracting them until it really is clearly necessary.

===

In the earlier discussion, sjolsen did a good job of illustrating how to
implement this using lambdas:
[https://news.ycombinator.com/item?id=8375341](https://news.ycombinator.com/item?id=8375341)
Improvements on his version would be to make everything const and the lambda
inputs explicit.

    
    
        AwesomenessT largerFunction(Foo1 foo1, Foo2 foo2)
        {
            const ResultT1 result1 = [foo1] {
                const Bar bar = barFromFoo1(foo1);
                const Baz baz = bar.makeBaz();
                return baz.awesome();
            } ();
    
            const ResultT2 result2 = [foo2] {
                const Bar bar = barFromFoo2(foo2);
                return bar.awesome();
            } ();
    
            return result1.howAwesome(result2);
        }
    

It's my understanding that compilers are already surprisingly good at
optimizing out local lambdas. I recall a demo from Herb Sutter where
std::for_each(someLambda) was faster than a classic for(int i=0;i<100000;i++)
loop with a trivial body, because the for_each internally unrolled the loop
and the lambda body was therefore inlined into the unrolled iterations.

------
dustingetz
50% of HN comments have misread this! The first few paragraphs mentioning FP
were written in 2014 and are retracting the opinion of the long email about
inlining, which is from 2007.

~~~
abritinthebay
Not... quite. He's not retracting so much as saying he's much more positive
about FP now, in its ability to solve the issues his 2007 email talks about.

It's not a retraction so much as an expansion of the set of solutions to
include FP (with caveats).

Also it clarifies some drawbacks to the approach on mobile/limited resource
platforms.

------
hiou
I think this is a great example of something that is different for an
exceptional developer as opposed to an average one.

A developer like Carmack and likely the teams he works with are able to keep a
much larger system in their head at one time than an average developer.

And this is typically why they can write larger functions like that and get
away with it.

A less talented developer will be much more likely to introduce bugs near the
top of that function over time as they struggle to maintain the entire
function in their head.

Sometimes choosing the correct tool has more to do with the craftsman than the
craft.

~~~
ajuc
As an average developer I find it much easier to keep the system in head when
it's not cut into 10-line-long pieces in random order.

Hiding the complexity doesn't suddenly make it irrelevant. That's how you get
code that does the same thing 5 times in 5 different branches of a highly
nested call tree, "just to be sure".

~~~
yxhuvud
Functions at the correct level of abstraction don't hide the complexity; they
categorize it and put names to the different categories.

------
Kenji
_and I was quite surprised at how often copy-paste-modify operations resulted
in subtle bugs that weren’t immediately obvious._

I noticed this quite some time ago. It was also a major source of bugs in my
own code, until I decided to stop copy-pasting anything longer than a word
and to retype everything character by character when I need it again.
Interestingly enough, this saves a lot of time, because the bugs I would
otherwise generate cost way more time than a bit of typing.

