
Link-time code generation invalidates classical assumptions about linking - luu
http://blogs.msdn.com/b/oldnewthing/archive/2014/06/06/10531604.aspx
======
skybrian
We call this "tree shaking" when compiling to JavaScript. The biggest downside
is compile time. Your linker is doing more work and it can't easily be
distributed.

~~~
gioele
For those like me that do not know the term "tree shaking", it is «a technique
to "shake" off unused code» [1].

I suppose it is dead code removal done at link time. But what is "link time"
in JavaScript?

[1] [http://blog.sethladd.com/2013/01/minification-is-not-enough-...](http://blog.sethladd.com/2013/01/minification-is-not-enough-you-need.html)

------
4ad
Hey, they rediscovered the way Plan 9 C compilers work: [http://plan9.bell-labs.com/sys/doc/compiler.html](http://plan9.bell-labs.com/sys/doc/compiler.html)

------
panzi
The assumptions about inlining virtual functions seem flawed to me. What if
the program loads a plugin that defines (and returns an instance of) a class
that overrides said function?

~~~
ksherlock
If you follow the link
([http://blogs.msdn.com/b/oldnewthing/archive/2012/08/31/10345...](http://blogs.msdn.com/b/oldnewthing/archive/2012/08/31/10345196.aspx)),
they're optimizing undefined behavior. Proper code won't be overly optimized.

------
na85
I dread the day someone figures out a way to weaponize this in a practical
fashion. I'd be astounded to learn that not-very-practical attacks don't
already exist.

~~~
klodolph
You can't really weaponize this, because if you're controlling the source code
to a program written in C or C++ on a victim's computer, you've basically
already won.

What I'd expect is that some loads / stores get optimized out across modules,
causing bugs in multithreaded programs that were living the high life in x86
memory model land. Of course, these programs probably already have other bugs.

~~~
vetrom
To be honest, I'm not so sure about that. Two attack surfaces I can think of
off the top of my head where that could be leveraged are dynamic plugin
loading, à la old-school ActiveX, and LD_LIBRARY_PATH attacks on ld.so-based
platforms.

One might say that both of these vectors are obsolete, but any time you have a
dynamic coding situation, that adds one more term to the equation of how one
can interact with a system, regardless of its legitimacy.

I'm reminded of this example:
[http://archive09.linux.com/feature/42031](http://archive09.linux.com/feature/42031)

Or perhaps another example involving attacks which use many vectors, a dual
action bytecode verifier & x86 shellcode attack:
[http://www.securityfocus.com/blogs/746](http://www.securityfocus.com/blogs/746)

If you can convince any piece of code to do the unintended, that can be a
constructive attack vector. Sure, we can harken back to the old days when code
only did simple parsing and stayed away from data parsed in multiple contexts.
I doubt that will be a sellable market proposition in the age of the Internet
of Advertising, however!

~~~
phaedrus
I think you are confusing linking with DLL loading. They are
different things. Linking is a step done during program development, when a
compiler toolchain program combines object code into an executable. DLL loading is
a step done by the OS on a user's machine at runtime, when an executable is loaded.
Most non-developer Windows machines will not even have a linker installed.

------
MaysonL
This reminds me a little bit of Michael Franz's PhD thesis, which explored his
implementation of _load-time_ code generation for multiple architectures.

------
michaelfeathers
You can call that linking, but it's not linking.

~~~
jerf
Terminology warfare is so tedious. Whatever it is, it isn't going to flip one
bit of it for us sitting here and arguing about what word best describes it,
and the space of possible practical compilers is so, so much larger than the
space of words we might have for them that it's not even funny. That is to
say, it's not like there _is_ a word for this, so why argue about whether it's
"linking"? (It's not like looking at a functional language and calling it OO,
where even if all the terms involved are fuzzy, it's still clearly wrong.)

If compilers were somehow first developed on machines with the resources we
have today, I bet we wouldn't even have a "link" step in practice. It's an
artifact of having to break compilation up into little units that fit into a
small number of kilobytes. Linkers fading into the background of general
"partial compilation" and "final compilation" steps is pretty inevitable, and
even then only as a way of caching compilation rather than as any sort of
theoretically necessary step.

~~~
michaelfeathers
> That is to say, it's not like there is a word for this

I'd argue that there is a word, er, _term_ for this - code generation. It's
the last phase of compilation. It doesn't matter whether the linker does it.

In fact, I'm not sure there even is a real identifiable linker in this scheme.

