
Understanding V8’s Bytecode - kiyanwang
https://medium.com/dailyjs/understanding-v8s-bytecode-317d46c94775
======
tyingq
Looks like Wikipedia's V8 page needs an update.

[https://en.m.wikipedia.org/wiki/Chrome_V8](https://en.m.wikipedia.org/wiki/Chrome_V8)

_"V8 compiles JavaScript directly to native machine code before executing
it, instead of more traditional techniques such as interpreting bytecode"_

~~~
amelius
V8's central documentation could use some work too. Why do we need to read
about internals in a Medium post first?

~~~
hyperpape
Which languages have thorough first-party documentation of how the runtime
works in an easily accessible form that's available to the public? I don't
think Python or Ruby do. Lua maybe, but that's "cheating" because the runtime
is small enough to comfortably read through.

It seems to me like Hotspot has more resources and better documentation of its
architecture than most, but when it comes to things like inlining policies,
the specific compiler optimizations that happen, safepoints, etc. it's mostly
a matter of tracking down random blog-posts, most of which come from third
parties.

Edit: Python is better than I remember.

~~~
amelius
Check the documentation of Mozilla's Spidermonkey in comparison:

[https://developer.mozilla.org/en-US/docs/Mozilla/Projects/Sp...](https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/Internals)

~~~
johncolanduoni
Wow! The last time I looked at the SpiderMonkey API documentation (a few
months ago) it was like reading the rings on a tree, so much so that I gave up
and went with V8 even though there were actively developed SpiderMonkey
bindings for the language I was using (Rust). Nice to see things have improved
greatly!

------
CalChris
I don't really like this article.

 _Bytecode is an abstraction of machine code._

No, it isn't an _abstraction_ if a few paragraphs later the author gives
_SuspendGenerator_ as an example bytecode. Abstraction is the _wrong_ word
here. A bytecode is a machine instruction for a virtual machine. Saying that
V8's bytecode is a _higher level ISA_ than, for example, ARMv8 would be more
accurate. System/38 MI is a higher level ISA as well.

 _Ignition, the interpreter, generates bytecode from this syntax tree._

No, interpreters don't generate bytecode. They interpret bytecode; that's why
they're called interpreters. Code generators generate code; that's why they're
called code generators. That said, this must be a strange description internal
to the V8 group, since it shows up here as well:

[https://v8project.blogspot.com/2016/08/firing-up-ignition-in...](https://v8project.blogspot.com/2016/08/firing-up-ignition-interpreter.html)

You can generate code (here, bytecode) directly from an AST, and that is
probably what is happening and what is meant here. By contrast, with LLVM a
front end such as Clang lowers the AST into LLVM's SSA IR, which LLVM then
optimizes and lowers to machine code. V8 is a JIT pipeline and LLVM is a
traditional compiler pipeline.
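To make the generator/interpreter split concrete, here is a toy Python sketch (not V8 code) that keeps the two roles in separate functions: `generate` walks a small AST and emits bytecode, and `run` executes that bytecode without ever seeing the AST. The opcode names `LdaSmi`, `Star`, `Add`, `Mul` and `Return` are loosely modeled on Ignition's accumulator-style opcodes, but the operands are simplified (no feedback slots), and the AST classes and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical toy AST: integer literals and binary + / * nodes.
@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str            # "+" or "*"
    left: "Expr"
    right: "Expr"

Expr = Union[Num, BinOp]
Instr = Tuple[str, Optional[int]]

def generate(node: Expr, code: List[Instr], free_reg: int = 0) -> None:
    """Code generator: emit bytecode that leaves node's value in the accumulator.

    Registers below free_reg hold pending left operands, so nested
    right-hand expressions only use higher-numbered registers.
    """
    if isinstance(node, Num):
        code.append(("LdaSmi", node.value))        # acc = literal
        return
    generate(node.left, code, free_reg)            # acc = left operand
    code.append(("Star", free_reg))                # r[free_reg] = acc
    generate(node.right, code, free_reg + 1)       # acc = right operand
    opcode = "Add" if node.op == "+" else "Mul"
    code.append((opcode, free_reg))                # acc = r[free_reg] <op> acc

def compile_expr(node: Expr) -> List[Instr]:
    code: List[Instr] = []
    generate(node, code)
    code.append(("Return", None))
    return code

def run(code: List[Instr]) -> int:
    """Interpreter: execute straight-line bytecode; it never sees the AST."""
    acc = 0
    regs: Dict[int, int] = {}
    for opcode, operand in code:
        if opcode == "LdaSmi":
            acc = operand
        elif opcode == "Star":
            regs[operand] = acc
        elif opcode == "Add":
            acc = regs[operand] + acc
        elif opcode == "Mul":
            acc = regs[operand] * acc
        elif opcode == "Return":
            return acc
        else:
            raise ValueError(f"unknown opcode {opcode}")
    raise ValueError("bytecode ended without Return")

expr = BinOp("*", BinOp("+", Num(1), Num(2)), Num(4))   # (1 + 2) * 4
bytecode = compile_expr(expr)
print(bytecode)
print(run(bytecode))  # -> 12
```

Only `generate` knows about the AST, and only `run` executes anything. Whether the pair together should be called "the interpreter" (as the V8 posts do for Ignition) or the word should be reserved for `run` is exactly the terminology question at issue.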

~~~
dahart
> No, interpreters don't generate bytecode. They interpret bytecode; that's
> why they're called interpreters. Code generators generate code;

FWIW, personally I prefer the article's idea of interpreters over yours, and I
speculate the article's is more representative of what most people think. I've
never heard interpreters described the way you just did, as the part that
interprets bytecode. (But I did find discussion of "bytecode interpreters" in
the WP article, link below).

Normally, an interpreter is a complete alternative to a compiler. Usually, an
interpreter is more or less synonymous with a programming language "shell", or
even the whole programming language. The Python shell would be an interpreter,
for example.

In my mind, virtual machines are what you call the thing that executes
bytecode, not interpreters. I'd assume that someone talking about an
"interpreter" without any other qualification was talking about a program that
reads & runs another program written in a scripting language.

As time passes, these lines are getting blurrier. V8 is liberally mixing
concepts from compilers and interpreters, so it's dangerous to draw hard
lines. But you might be interested to double-check the broad strokes.

[https://en.m.wikipedia.org/wiki/Interpreter_(computing)](https://en.m.wikipedia.org/wiki/Interpreter_\(computing\))

> Abstraction is the _wrong_ word here.

The bytecode binary runs without modification on multiple hardware platforms.
It's pretty reasonable to call that an abstraction of machine code. Because it
is.

~~~
CalChris
> Normally, an interpreter is a complete alternative to a compiler.

Yes, that would be the Bill Gates MS Basic or Forth sense of interpreter. But
it wouldn't be the _post parsing, here's an AST, now what do we do with it_
sense. If they called V8 an interpreter (they call it an engine), I could see
that, although it would still seem like archaic usage. But they're stitching
Ignition onto the side of V8 and calling that an interpreter, which is just
strange.

> The bytecode binary runs without modification on multiple hardware
> platforms. It's pretty reasonable to call that an abstraction of machine
> code.

No, that's portability onto multiple platforms. But the article said _Bytecode
is an abstraction of machine code_. 'Abstraction of' vs 'runs on', these are
two very different concepts.

~~~
dahart
> Yes, that would be the Bill Gates MS Basic or Forth sense of interpreter.
> But it wouldn't be the post parsing, here's an AST, now what do we do with
> it sense.

It's also the JavaScript, Python, Haskell, Perl, Ruby, bash, etc. etc. etc.
sense of the word interpreter. You can use it to describe _all_ languages that
aren't compiled. You said interpreters "interpret bytecode". You claimed that
a bytecode interpreter was the broadest definition of interpreter, and that
saying the interpreter is what reads source JavaScript and turns it into
bytecode was wrong. I disagree, and Wikipedia does too. Many, perhaps even
most, interpreters don't involve bytecode at all. You're entitled to your
opinion, but I guess just expect some pushback if you're going to try to
correct people using your narrow, uncommon idea of what makes an interpreter.

> No, that's portability onto multiple platforms.

Normally when I'm talking about code with other people, "portable" means code
you can re-compile for a platform, not that the binary runs without
modification. But, sure, I'd agree that it's reasonable to call bytecode a
mechanism for portability. The way it does that is by abstracting away a
specific platform's assembly language in favor of one that runs on multiple
platforms. People call programming languages like C & Python an abstraction of
machine language. JavaScript's bytecode is just that - it's a low level
programming language. And as such, it's abstracting the hardware. Virtual
machines are an attempt to abstract the CPU specifics and have code that runs
anywhere.

> 'Abstraction of' vs 'runs on', these are very different concepts.

I don't know what your definition of abstraction is, but based on your
objections, it feels like you have a narrow and rigid idea that perhaps
doesn't match the common usage.

A function that takes a parameter is an abstraction of a block of code, just
like any programming language that runs on multiple platforms is an
abstraction of a specific CPU or machine language.

~~~
CalChris
SICP uses a John Locke quote to define _abstraction_ :

      The third is separating them from all other ideas
      that accompany them in their real existence:
      this is called abstraction

Oxford says: _Freedom from representational qualities in art._ You can't say
that and then at the same time say, oh and it fits in a byte. You can't say
that a function abstracts some code.

You could say that an interface abstracts a module. Or as _Principles of
Computer System Design_ says:

      The separation of the interface specification of a module
      from its internal implementation so that one can understand
      and make use of that module with no need to know how it is
      implemented internally

BTW, Wikipedia is awesome but I'm hella not going to take their description on
this. H+P, SICP, DragonBook, ... some primary source but _not_ Wikipedia. The
above mentioned POCSD says:

      The abstraction that models the active mechanism performing
      computations. An interpreter comprises three components:
      an instruction reference, a context reference,
      and an instruction repertoire.

MS-BASIC doesn't fit that definition.
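Read concretely, the POCSD definition maps onto a fetch/dispatch loop. The Python sketch below is a toy, not how V8 or any real engine is built: the program counter `pc` is the instruction reference, the hypothetical `Context` object (an accumulator plus registers) is the context reference, and the `REPERTOIRE` table of handlers is the instruction repertoire. Opcode names are loosely Ignition-flavoured, but `JumpIfZero`, the handler functions, and all operand semantics here are made up for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Instr = Tuple[str, int]

class Context:
    """Context reference: the state that instructions read and write."""
    def __init__(self, num_regs: int) -> None:
        self.acc = 0
        self.regs = [0] * num_regs

# A handler receives (context, operand, pc) and returns the next pc, or None to halt.
Handler = Callable[[Context, int, int], Optional[int]]

def lda_smi(ctx: Context, arg: int, pc: int) -> Optional[int]:
    ctx.acc = arg                      # acc = small integer constant
    return pc + 1

def ldar(ctx: Context, arg: int, pc: int) -> Optional[int]:
    ctx.acc = ctx.regs[arg]            # acc = r[arg]
    return pc + 1

def star(ctx: Context, arg: int, pc: int) -> Optional[int]:
    ctx.regs[arg] = ctx.acc            # r[arg] = acc
    return pc + 1

def add(ctx: Context, arg: int, pc: int) -> Optional[int]:
    ctx.acc = ctx.regs[arg] + ctx.acc  # acc = r[arg] + acc
    return pc + 1

def add_smi(ctx: Context, arg: int, pc: int) -> Optional[int]:
    ctx.acc += arg                     # acc = acc + constant
    return pc + 1

def jump(ctx: Context, arg: int, pc: int) -> Optional[int]:
    return arg                         # unconditional jump to index arg

def jump_if_zero(ctx: Context, arg: int, pc: int) -> Optional[int]:
    return arg if ctx.acc == 0 else pc + 1

def ret(ctx: Context, arg: int, pc: int) -> Optional[int]:
    return None                        # halt; the result stays in acc

# Instruction repertoire: opcode name -> handler.
REPERTOIRE: Dict[str, Handler] = {
    "LdaSmi": lda_smi, "Ldar": ldar, "Star": star, "Add": add,
    "AddSmi": add_smi, "Jump": jump, "JumpIfZero": jump_if_zero, "Return": ret,
}

def interpret(program: List[Instr], num_regs: int) -> int:
    ctx = Context(num_regs)            # context reference
    pc: Optional[int] = 0              # instruction reference
    while pc is not None:
        opcode, operand = program[pc]              # fetch
        pc = REPERTOIRE[opcode](ctx, operand, pc)  # dispatch and execute
    return ctx.acc

# Sum 5 + 4 + 3 + 2 + 1 with r0 as the counter and r1 as the running total.
SUM_TO_5: List[Instr] = [
    ("LdaSmi", 5), ("Star", 0),        # 0-1: r0 = 5
    ("LdaSmi", 0), ("Star", 1),        # 2-3: r1 = 0
    ("Ldar", 0),                       # 4:   loop: acc = r0
    ("JumpIfZero", 12),                # 5:   exit loop when r0 == 0
    ("Add", 1), ("Star", 1),           # 6-7: r1 = r1 + r0
    ("Ldar", 0), ("AddSmi", -1),       # 8-9: acc = r0 - 1
    ("Star", 0),                       # 10:  r0 = acc
    ("Jump", 4),                       # 11:  back to loop
    ("Ldar", 1), ("Return", 0),        # 12-13: acc = r1; halt
]

print(interpret(SUM_TO_5, num_regs=2))  # -> 15
```

Because each handler returns the next `pc`, jumps need no special case in the loop itself; the loop only fetches and dispatches.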

~~~
dahart
Hey I'm sorry you didn't like the article. I hope you can find some reading
you do like, and comment on that instead.

I love SICP, but I feel like you mistook an example of one kind of abstraction
for the definition of all abstraction.

A function does abstract code. A programming language does abstract machine
language. If you feel otherwise, good for you, but you've confirmed my
suspicion that your definition is far removed from common usage. I hope that
helps you understand all the pushback on your first comment, but otherwise I
have nothing else to add. I don't want to argue over what abstraction is, we
are already too far away from any specifics that matter.

~~~
CalChris
Thanks. I follow V8+WASM sort of closely and there are things I like there.
But the Iron Rule of HN is that if you criticize, you will suffer! That's OK.
I think it's important every once in a while to plant your flag in the ground
and make sure you can defend it; not to the point of trolling, but to ask: do I
really understand my opinion? I'll admit that my notion of interpreter is perhaps
historically limited and that there have been others. In any case, I
appreciate the back and forth. I learned things.

------
pitaj
I'd love to see a tutorial on how a certain optimization pattern is coded into
the compiler.

------
Annatar
If they have all the plumbing, why are they forcing us all to go through the
virtual machine?

Why didn't they finish the job and produce a proper compiler that generates
straight machine code, like a C compiler does?

~~~
bobinatorino
Seriously though. Omitting this detail seems strange; providing an actual
compiler seems like such an obvious thing to do:

1\. In the browser, even if you can't allow raw machine code (for security
reasons), you could at least compile your V8 bytecode into LLVM IR and then
into WebAssembly. It would make front end code way faster at running (b/c of
compiler optimizations on the ENTIRE code base) and at loading (smaller sizes,
probably).

2\. On the back end, you can compile straight to machine code instead of V8
compiling stuff at runtime. It would certainly be faster, as that compilation
is happening in advance versus at runtime.

There has to be some reason why this isn't being done. Perhaps the information
at runtime is required, and a proper compiler would be too slow? Maybe there
would be no major performance benefit? I could be wrong about the above two
ideas. The V8 team has a ton of really smart people, so I can't imagine they
haven't already considered this.

------
tmzt
I glanced at the title late last night and thought it said VB bytecode, as in
PCode.

