
Slim Binaries (1997) - tosh
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.108.1711&rep=rep1&type=pdf
======
EdwardCoffin
This reminds me of a technique I read about in the early nineties, used by a
language called (I think) 'C@T' (don't bother
Googling that: I found nothing relevant, and the results seem to be mostly
NSFW).

The technique was instruction interleaving. They supported several very
different architectures from _one_ platform-independent binary. I think they
were x86, 68k, MIPS and SPARC. Each executable started with a boilerplate
header: a bit of black magic which worked on all of the platforms (more on
that below). When execution emerged at the end of the header, the instruction
pointer was pointed at the entry point appropriate for the architecture it was
executing on. Each entry point (one for each architecture) made sure that the
instruction pointer would always jump over the parts for the other
architectures, to the next applicable instruction in its own architecture.

The boilerplate header worked something like this: the first few bytes, when
interpreted on one particular architecture, would jump to the address
immediately following the header, which was the starting point for _that_
architecture's thread-of-control. All other architectures interpreted that
instruction as something that was effectively a no-op for the purpose of flow
control (like maybe adding two registers into a third). Consequently,
the next instruction would only be seen by one of the remaining architectures,
and a similar process would peel off the remaining architectures, one-by-one,
directing each to the appropriate real entry point for that architecture. As I
recall they did not do something simple like have monolithic blocks of code
separated by architecture: if there was a loop, the body of the loop had
interleaved instructions for all architectures. I'm sure this was in the days
before instruction caches were a thing.
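
To make the peel-off mechanism concrete, here is a toy sketch of how such a
header might be assembled. Everything in it is hypothetical: the opcode
encodings and displacement semantics are invented placeholders, and the
genuinely hard part (finding byte sequences that decode as a jump on exactly
one architecture and as something harmless on all the rest) is waved away.

    # Toy sketch of the "peel-off" dispatch header described above.
    # All encodings below are HYPOTHETICAL placeholders, not verified
    # opcodes: the real black magic is choosing bytes so that slot i
    # is a jump on architecture i but harmless on every architecture
    # that has not been peeled off yet.
    ENCODERS = [
        # (arch, function producing "jump forward disp bytes")
        ("x86",   lambda disp: bytes([0xEB, disp & 0xFF])),
        ("m68k",  lambda disp: bytes([0x60, disp & 0xFF])),
        ("sparc", lambda disp:
            (0x10800000 | ((disp >> 2) & 0x3FFFFF)).to_bytes(4, "big")),
    ]

    def build_header(entry_offsets):
        """entry_offsets: arch -> file offset of its real entry point.
        Slot i jumps architecture i to its entry point; each later
        architecture must decode slot i as a no-op and fall through."""
        header = bytearray()
        for arch, encode_jump in ENCODERS:
            header += encode_jump(entry_offsets[arch] - len(header))
        return bytes(header)

    print(build_header({"x86": 0x40, "m68k": 0x80, "sparc": 0xC0}).hex())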

I think they were able to get some space savings with this, at least by
sharing constant data segments (maybe). At the time I read it I did not
fully appreciate some of what they did; now I would, but I can't find the
article I read nor any papers about it. I probably read it in _Computer
Language_ or _Dr. Dobb's_ or something like that, probably in the early
nineties. If anyone
else remembers this and can point me to a source, I'd love to revisit it. I
think it _might_ have been a project out of AT&T labs.

~~~
jandrese
> As I recall they did not do something simple like have monolithic blocks of
> code separated by architecture: if there was a loop, the body of the loop
> had interleaved instructions for all architectures. I'm sure this was in the
> days before instruction caches were a thing.

The first part of your post sounded pretty clever and workable, but this part
sounds completely insane. Managing all of the side effects for all of those
other architectures constantly would have been a never-ending nightmare.
Keeping separate blocks and only sharing data segments seems like the only way
to keep your sanity.

~~~
kazinator
Well, there are never going to be side effects for the other architectures,
because those parts simply can't be touched. Code for the i386 only jumps to
other i386 code. Anything that is not i386 code might as well be donkey
droppings. If we think about how sequences of NOP instructions are routinely
added to machine code for cache-aligning the start of a block, or how non-code
data such as string literals is woven into the middle of code, it doesn't seem
far-fetched at all.

~~~
jandrese
That makes even less sense. Was the code flow like:

    
    
      x86_instruction
      jump +6
      mips instruction
      jump +6
      sparc instruction
      jump +6
      68k instruction
      jump +6
    

Not separating the code out by blocks seems unworkable to me.

~~~
kazinator
Sounds like it was like:

    
    
         block
         of
         x86
         code
         je L43
         more
         x86
         code
         jmp L57
    
       L33:
         sparc
         code
         never
         mind
    
       L43:
         x86
         code
     
       L37:
         sparc
         never
         mind
    
       L57:
         x86
         code
         again
    
    

That kind of thing: blocks of code mashed together, but everything is
correctly generated to jump only to its own kind.

In regular compiler-generated machine code, we already have "foreign" blocks
of stuff in the middle of the instructions, such as string (and other)
literals, and computed branch tables. The generated code doesn't accidentally
jump into these things. This is kind of the same: all the other architecture
stuff is just a literal (that is not referenced). From the x86 POV, the stuff
at L33 and L37 above is just data.

------
sedatk
I was thinking to myself “why did these guys talk about inventing another
Java, like it was a new thing?” Then it hit me that the article was written in
1997. I mean, Java existed back then, but HotSpot JIT didn’t.

~~~
lnanek2
Yeah, this part: " We have implemented a system taking a wholly different
approach that we call “slim binaries.” The object files in our system do not
really contain native code for any particular processor at all, but a highly
compact, architecture-neutral intermediate program representation "

is basically the definition of Java byte code. A Wikipedia search for the author
seems to say he came up with it first, though: " His doctoral dissertation,
entitled "Code Generation On-The-Fly: A Key To Portable Software"[11] proposed
to make software portable among different target computer architectures by way
of using on-the-fly compilation from a compressed intermediate data structure
at load time. Two years later, the Java programming language and system were
launched and took this idea mainstream, albeit using the term "just-in-time
compilation" instead of the term "on-the-fly compilation" that Franz had used.
"
[https://en.wikipedia.org/wiki/Michael_Franz](https://en.wikipedia.org/wiki/Michael_Franz)

~~~
eterps
> is basically the definition of Java byte code

Except that the slim binaries concept is not based on byte code at all, but
more like a Huffman compressed abstract syntax tree. Not only are the binaries
smaller, they can potentially be optimised further because no semantic
information is lost.

See also this blog post for more details:
[https://hokstad.com/semantic-dictionary-encoding](https://hokstad.com/semantic-dictionary-encoding)

Original website on Wayback machine:
[https://web.archive.org/web/20020617132121/http://caesar.ics...](https://web.archive.org/web/20020617132121/http://caesar.ics.uci.edu/oberon/research.html)
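
To give a flavor of that, here is a toy LZW-style dictionary encoder over an
AST (the node types and all details here are invented; the real semantic
dictionary encoding described at the links above is considerably more
elaborate). Each subtree that has been emitted once gets a dictionary index,
so the next occurrence of the same fragment compresses to a single number:

    # Toy dictionary encoder over an AST: nodes are tuples of
    # (operator, child, child, ...); leaves are (operator,).
    def flatten(node):
        return (node[0],) + tuple(flatten(c) for c in node[1:])

    def encode(node, dictionary, out):
        key = flatten(node)
        if key in dictionary:            # whole subtree seen before:
            out.append(dictionary[key])  # one back-reference replaces it
            return
        out.append(dictionary[node[0]])  # emit the operator's index
        for child in node[1:]:
            encode(child, dictionary, out)
        # Register this subtree so its next occurrence is one index.
        dictionary[key] = len(dictionary)

    # Seed the dictionary with the primitive operators and symbols.
    d = {op: i for i, op in enumerate(["assign", "add", "x", "y", "1"])}

    # x := x + 1; y := x + 1 -- the repeated "x + 1" becomes one index.
    encoded = []
    encode(("assign", ("x",), ("add", ("x",), ("1",))), d, encoded)
    encode(("assign", ("y",), ("add", ("x",), ("1",))), d, encoded)
    print(encoded)  # [0, 2, 1, 5, 4, 0, 3, 7]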

~~~
kragen
The Self bytecode was pretty close to an abstract syntax tree, though not
Huffman-compressed. How is it that nobody has mentioned Self yet in this
thread? Self was doing JIT compiling from a platform-independent bytecode in
the late 80s and getting within a factor of 2 of C performance by the early
90s.

------
Scaevolus
Microsoft Office used a vaguely similar VM-based technique in the 90s, called
P-Code. Cold functions in Office were compiled to a space-efficient stack-based
bytecode, minimizing how much native code had to be written and validated for
each architecture.
[https://web.archive.org/web/20010222035711/http://msdn.micro...](https://web.archive.org/web/20010222035711/http://msdn.microsoft.com/library/backgrnd/html/msdn_c7pcode2.htm)
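
For anyone who hasn't run into the approach: a stack-based bytecode gets its
density from implicit operands, so most instructions are a single opcode
byte. A toy illustration (this is not Microsoft's actual P-Code instruction
set):

    # Toy stack-machine interpreter: most instructions are one byte,
    # because operands live on the stack.  NOT the real P-Code ISA.
    PUSH, ADD, MUL, PRINT = range(4)

    def run(code):
        stack, pc = [], 0
        while pc < len(code):
            op = code[pc]; pc += 1
            if op == PUSH:           # only PUSH carries an operand byte
                stack.append(code[pc]); pc += 1
            elif op == ADD:
                b, a = stack.pop(), stack.pop(); stack.append(a + b)
            elif op == MUL:
                b, a = stack.pop(), stack.pop(); stack.append(a * b)
            elif op == PRINT:
                print(stack.pop())

    # (2 + 3) * 4 in nine bytes of bytecode:
    run(bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, PRINT]))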

~~~
acqq
Not similar at all, if you compare what each did. The p-code is practically
assembly for a virtual CPU, one which uses more or less the same resources
that assembly code for the "real" CPU would, only with fewer bytes per
instruction.

------
jacinabox
In the early aughts I was kicking around an idea for minimization of
binaries/libraries. The idea was to take a compiled binary plus debug
information, and use disassembly to work out which code paths are reachable
from a selected set of entry points (say the ones used by your application).
Then code paths considered "dead" would be zeroed out, producing a minimized
binary.
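
A sketch of what the reachability half of that could look like, assuming the
binary has already been disassembled into a call graph (building that graph
reliably, especially through indirect calls, is the actual hard part):

    # Worklist reachability over a disassembled call graph, then zero
    # out every function not reachable from the chosen entry points.
    def reachable(call_graph, entry_points):
        seen, worklist = set(), list(entry_points)
        while worklist:
            fn = worklist.pop()
            if fn not in seen:
                seen.add(fn)
                worklist.extend(call_graph.get(fn, ()))
        return seen

    def minimize(code, functions, call_graph, entry_points):
        """functions: addr -> (offset, length) of each function's bytes."""
        live = reachable(call_graph, entry_points)
        out = bytearray(code)
        for addr, (off, length) in functions.items():
            if addr not in live:
                out[off:off + length] = bytes(length)  # zero dead code
        return bytes(out)

    # e.g. 0x300 is unreachable from 0x100 and would be zeroed:
    print(reachable({0x100: [0x200], 0x200: [], 0x300: []}, [0x100]))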

~~~
thefreeman
Don't most compilers already do this?

~~~
saagarjha
Yeah, isn't this just link-time optimization?

~~~
throwaway542134
It's called dead code elimination/optimization, and compilers can do it
without LTO; with LTO you can get more savings, but YMMV.

~~~
saagarjha
The original comment pointed out "entry points", so I assumed that they were
talking about libraries rather than standard dead code elimination (which has
been a thing for quite some time).

------
NieDzejkob
Please add [pdf] to the title next time.

> If you submit a link to a video or pdf, please warn us by appending [video]
> or [pdf] to the title.

[https://news.ycombinator.com/newsguidelines.html](https://news.ycombinator.com/newsguidelines.html)

------
rbanffy
How architecture-neutral is LLVM's intermediate representation?

It's interesting how not having the source code drove technology in the '90s.

~~~
saagarjha
LLVM IR is relatively agnostic, but not so much so that you can use it to
recompile for another architecture (unless you take effort to design an entire
ABI to work around the platform-specific things that creep into it, as Apple
did with arm64_32).

------
saagarjha
> And since each fat binary usually contains only a single code image per
> supported architecture, rather than separate versions for different
> implementations of each architecture, fat binaries still do not solve the
> problem of intra-architectural hardware variation.
    
    
      $ lipo -info /System/Library/Frameworks/CoreFoundation.framework/CoreFoundation
      Architectures in the fat file: /System/Library/Frameworks/CoreFoundation.framework/CoreFoundation are: x86_64 x86_64h

~~~
juliangoldsmith
Haswell wasn't quite out yet when the article was written, so that was true at
the time.

~~~
saagarjha
Yeah, I know. Just pointing out that this is no longer true.

------
dleslie
PDF is invalid and won't load.

~~~
aloknnikhil
Works fine on Safari for me.

~~~
dleslie
Failed on Drive PDF on Android

~~~
stronglikedan
Acrobat (Pro) isn't complaining. Looks like Drive PDF is what's invalid and
won't load (the valid PDF).

