

Bytecode - acqq
http://blog.mecheye.net/2012/12/bytecode/

======
brian_cloutier
Interestingly enough, you don't send Bitcoin to addresses. You send Bitcoin to
a script of your design, and whoever can provide an input that causes that
script to evaluate to true is allowed to spend those coins.

This was originally meant to allow you to come up with all sorts of fantastic
transaction types, such as coins which need M of N signatures in order to be
released, or a Kickstarter clause that only releases the coins if a certain
amount has been pledged.

But after the scripts had caused enough security vulnerabilities, they were
severely restricted. Clients will now only accept transactions that use one of
the standard scripts.

~~~
rictic
Yeah, this makes me sad. I hope that someone takes a hard look at the bytecode
and repairs the situation. Transactions as programs is kinda brilliant.

------
lambda
I would say that Java bytecode is more common than ACPI's. Sure, if you only
consider desktops and laptops, ACPI will be more common, but feature phones
generally include a Java VM, and the basebands on smartphones generally do
too. For that matter, SIM cards themselves include a Java Card interpreter,
which interprets a stripped-down version of JVM bytecode. Your phone may
contain two or three different variants of the JVM just to make a phone call.

Furthermore, exactly what counts as bytecode versus machine language can be a
bit of an open question. After all, Intel CPUs don't actually execute the x86
instruction set directly; they decode x86 instructions into micro-ops, which
are what the CPU really executes. So you could say that x86 is the ultimate
bytecode. And hey, for a while Mac OS X had PowerPC emulation support, and
before that System 7 had 68k emulation support. On the other hand, people
have implemented Java in hardware, so today's bytecode may become tomorrow's
machine language, and vice versa.

 _edit_ : Can whoever downvoted please provide an explanation? This comment is
on topic and polite; if you disagree, please explain, as I would be interested
to know why. If there's something that's incorrect, a correction would be
appreciated.

~~~
Jasper_
I don't really care which bytecode is the "most popular". I just thought it
would be cool to introduce people to some of the specialized bytecode running
inside subsystems you probably never thought about before.

~~~
cturner
Jasper that was a fabulous start to the morning. A morsel of hacker news on
hacker news. Thanks.

Would be interested in your thoughts about code generation -

I'm writing a VM, playing with ideas. I have wondered at this as an approach
to software development: whenever you have a significant task to do, first
build a virtual machine. Then create bytecode to satisfy your application.

You can have a rich instruction set to meet your needs - writing performant or
hardware-oriented features in C, but getting easy access to them through your
upstream high-level language. Highly portable, no library dependencies.

I'm fine at hand-editing bytecode, but code generation from a high-level
language is still a mystery to me. I want to find a notation that gives me
enough power to deal with high-level concepts, but for which it is easy to
write a compiler to bytecode.

Current options in mind: Scheme (lots of resources, but might be too
complicated - can tail recursion be done simply? adequate GC?); Forth; some
subset of C; something fancy with OMeta.

Or, could I just write Scheme functions to output machine code, and build my
application logic in macros on top of that? That would bypass the need for a
conventional compiler.

~~~
noahl
I like the idea of scheme functions to output machine code. If you know
scheme, you're probably already familiar with this, but since you say code
generation is a mystery, let me recommend SICP
([http://mitpress.mit.edu/sicp/full-text/book/book.html](http://mitpress.mit.edu/sicp/full-text/book/book.html)).

~~~
cturner
I scratch around at SICP every few months, and generally get stuck because
there's a lot of assumption of mathematics knowledge in there that I don't
have. But I started to watch the MIT lectures just last weekend. I'll keep at
it, sounds like I'm on the right track. Thanks :)

------
Erwin
The BIOS in old Sun machines used to run some kind of Forth interpreter which
would run code from your expansion cards to initialize them (not sure if that
was meant to make this cross-platform/cross-architecture).

I vaguely recall trying to get an old SPARCstation to boot and figuring out
how to work the Forth shell (which is similar to what Grub is now) --
[http://en.wikipedia.org/wiki/Open_Firmware](http://en.wikipedia.org/wiki/Open_Firmware)

That was one of the first "small form factor" pizza-box machines:
[http://en.wikipedia.org/wiki/Pizza_box_form_factor](http://en.wikipedia.org/wiki/Pizza_box_form_factor)

~~~
mikeash
Macs used this for years as well. You could boot into the Forth interpreter
and do interesting things.

Amusingly, many of the older models with Open Firmware had no display drivers
in the interpreter, so while you could start it, you had to talk to it through
a serial port rather than using your keyboard and screen.

~~~
trurl42
The newer models had display drivers. You could write small graphical
programs for them, like an animated version of the Towers of Hanoi:
[http://www.kernelthread.com/projects/hanoi/html/macprom-gfx.html](http://www.kernelthread.com/projects/hanoi/html/macprom-gfx.html) I
remember running that on my iBook.

------
delroth
RAR archives also contain a virtual machine used to implement custom
compression filters: [http://blog.cmpxchg8b.com/2012/09/fun-with-constrained-programming.html](http://blog.cmpxchg8b.com/2012/09/fun-with-constrained-programming.html)

~~~
est
WinRAR could also display ANSI color fonts. :D

------
majika
This is related to Meredith Patterson's talk at the 28th CCC, "The Science of
Insecurity":

[https://www.youtube.com/watch?v=3kEfedtQVOY](https://www.youtube.com/watch?v=3kEfedtQVOY)

> Why is the overwhelming majority of common networked software still not
> secure, despite all effort to the contrary? Why is it almost certain to get
> exploited so long as attackers can craft its inputs? Why is it the case that
> no amount of effort seems to be enough to fix software that must speak
> certain protocols?

> The answer to these questions is that for many protocols and services
> currently in use on the Internet, the problem of recognizing and validating
> their "good", expected inputs from bad ones is either not well-posed or is
> undecidable (i.e., no algorithm can exist to solve it in the general case),
> which means that their implementations cannot even be comprehensively
> tested, let alone automatically checked for weaknesses or correctness. The
> designers' desire for more functionality has made these protocols
> effectively unsecurable.

~~~
comex
Although interesting, that talk greatly exaggerates its claims. There _is_
certainly a strong correlation between the power exposed to file formats and
both the likelihood of bugs and their exploitability, and reducing that power
is certainly a good idea, but such protocols are far from "effectively
unsecurable". It's entirely possible to create a safe bytecode parser and
even formally prove it correct with automated tools. And while length fields
are easier to get wrong than simpler formats, that's mostly because C integer
and pointer computations are so easy to mess up; the problems could be
effectively solved with little overhead by using bigints and checked pointers
inside parsers - a matter of engineering, not computer science.

------
jackhammer2022
Google Cache link:
[http://webcache.googleusercontent.com/search?q=cache:9eL0cMo...](http://webcache.googleusercontent.com/search?q=cache:9eL0cMog-iIJ:blog.mecheye.net/2012/12/bytecode/+&cd=1&hl=en&ct=clnk&gl=us)

~~~
Jasper_
Yeah, sorry about that! Didn't expect this post to take down my blog. I'm
working on getting it back up.

EDIT: OK, it should be back up now!

~~~
dmm
In the article you claim that "the BSDs" use the Intel reference ACPI
implementation. OpenBSD wrote their own, as is their wont. This is cool
because it's the only free implementation that I know of that is independent
of the Intel one.

[http://www.openbsd.org/cgi-bin/cvsweb/src/sys/dev/acpi/](http://www.openbsd.org/cgi-bin/cvsweb/src/sys/dev/acpi/)

~~~
tptacek
Didn't FreeBSD also implement their own?

~~~
Jasper_
Nope:
[http://svnweb.freebsd.org/base/stable/10/sys/contrib/dev/acp...](http://svnweb.freebsd.org/base/stable/10/sys/contrib/dev/acpica/)

------
zdw
Regarding ACPI specifically, not everyone uses the Intel-developed one -
Microsoft supposedly implemented their own, as did OpenBSD (see section
3.2 and on of this:
[http://www.openbsd.org/papers/zzz.pdf](http://www.openbsd.org/papers/zzz.pdf))

ACPI is notoriously broken in many places - the OpenBSD devs frequently had
to do a lot of "bug for bug" hacks to talk to the hardware exactly the way
Windows does, in order for things to work.

~~~
tedunangst
My understanding is that the Windows ACPI code is actually very similar to or
derived from the Intel code.

As for bug-for-bug compat, that's really more an issue of broken BIOSes.
I.e., the ACPI bytecode is broken, not the implementation that interprets the
bytecode. Windows doesn't necessarily do anything crazy; it's the BIOS that
asks "is this Windows?" and then shits itself if the answer is no.

------
lakwn
When you have bytecode, you have programs that run on it. I understand how
these are a good thing in fonts or in PDF, but what is running in this ACPI
machine the author described?

~~~
sigil
The kernel. On Linux, for instance, this

    $ ps aux | grep acpi

turns up the following kernel process on my machine:

    root 663 0.0 0.0 0 0 ? S< Oct29 0:00 [ktpacpid]

~~~
panzi
I think the question is the reverse: not what program implements the VM but
which programs run on it and what are they doing? at least that is what I'm
asking.

------
pjmlp
A few more examples:

\- OS/400 user space is also bytecode based, JIT compiled on first run or at
installation time.

\- Inferno userspace applications are coded in Limbo

\- Native Oberon has implementations where the kernel modules were AOT
compiled and the remaining modules are JITed on load

\- Lilith (Modula-2 workstation)

~~~
acqq
Before Modula and Oberon there was UCSD Pascal
([http://en.wikipedia.org/wiki/UCSD_Pascal](http://en.wikipedia.org/wiki/UCSD_Pascal))
(1978) which had its own p-code machine
([http://en.wikipedia.org/wiki/P-code_machine](http://en.wikipedia.org/wiki/P-code_machine))
to which all the code was compiled.

Microsoft also used p-code
([http://en.wikipedia.org/wiki/Microsoft_P-Code](http://en.wikipedia.org/wiki/Microsoft_P-Code))
to reduce the code footprint in order to fit more of the big applications
into RAM, which was limited then.

While we're still on Microsoft:
[http://en.wikipedia.org/wiki/Windows_Metafile_vulnerability](http://en.wikipedia.org/wiki/Windows_Metafile_vulnerability)
"the underlying architecture of such files is from a previous era, and
includes features which allow actual code to be executed whenever a WMF file
opens. The original purpose of this was mainly to handle the cancellation of
print jobs during spooling"

~~~
pjmlp
Yes, regarding P-Code, the funny thing is that Wirth originally planned to use
it as a means to bootstrap the compiler, which would then be used to compile
one that could generate native code - not as the final execution medium. :)

------
mtdewcmu
I never thought of bytecode as its own class with its own identity before. I
guess you could say that a lot of the web runs on bytecode. Maybe 99% in fact,
if you include bytecode that was passed through a JIT.

~~~
panzi
> I never thought of bytecode as its own class with its own identity before.

What do you mean? How did you think of it before?

~~~
mtdewcmu
Hmm. Like codes in bytes, I guess, as a way to pick low-hanging fruit in an
interpreter. It just never occurred to me to think much of the bytecodes
themselves as the most salient piece. Perhaps bytecode is the best word to
describe anything non-native that runs? I'd normally want to say interpreted,
but I suppose that technically would exclude anything running in a JIT.
Although, where something happens to be running isn't an intrinsic property of
that thing. On the other hand, whether a language compiles to bytecode or gets
directly interpreted is not a static feature of that language, either. Maybe
bytecode is the best way to split the difference. Is this the birth of a new
bit of language? You heard it here first.

~~~
panzi
What do you mean by "gets directly interpreted"? Interpreting the AST?
Writing a bytecode interpreter is pretty easy, and it can run much faster
than a tree interpreter, especially when you use GCC extensions (taking the
addresses of labels and filling a jump table with them, which eliminates the
long "if ... else if ... else if ..." style code a switch statement compiles
to).

