

Python-based implementation of Notch's DCPU-16 - jtauber
https://github.com/jtauber/dcpu16py

======
binarycrusader
For the love of Pete, please put a copyright notice and license on your code!

Even if you want to use the same license as notch, you must explicitly specify
that; otherwise, no one can really use your code.

~~~
guelo
Doesn't no copyright notice mean that it's in the public domain?

~~~
eridius
No. Copyright is implicit. Something can only be in the Public Domain if it's
explicitly placed there (and supposedly, not all countries even recognize
Public Domain, hence the creation of the WTFPL[1]).

[1] <http://sam.zoy.org/wtfpl/>

~~~
thristian
Actually, something can only be in the Public Domain if it's not covered by
copyright, or if its copyright has expired; saying "I put this in the public
domain" might have no effect whatsoever.

I don't know of any countries that don't recognise public domain (that is,
implement eternal copyright).

~~~
rmc
You could release it under a copyright licence that is de facto the same as
the public domain. It's "copyrighted to you", but anyone can do anything with
it.

------
look_lookatme
What is a good resource for understanding how all of this works? Books, site,
tutorial, minix usenet threads... I don't care, just something to get me
started.

edit:

Assuming I'm a standard, competent web dev-ish Ruby, Python programmer with
little experience below those languages.

~~~
hendzen
First read K&R C and get comfortable writing C programs. Then pick up the
latest edition of Patterson & Hennessy and learn the MIPS ISA. It's a very
simple instruction set that avoids most of the pedagogical distractions
imposed by x86 or some other more complicated architecture.

~~~
tptacek
Second Hennessy & Patterson. Read it 12 years ago, still remember it vividly;
everything made more sense afterwards.

It's expensive though; you might do just as well with a used older edition.

A somewhat perverse alternative approach: get a book on how debuggers work
(like "How Debuggers Work").

~~~
hendzen
I actually have the pleasure of taking a course with Patterson this semester.
He's a great lecturer and he manages to keep the (sometimes very dry) material
very interesting.

------
tzs
Looks like implementing this thing is becoming the new national pastime, at
least based on the number of different implementations that have been
discussed on HN in the last day.

I wonder how long till someone implements it in Minecraft?

------
dbh937
There should just be one thread for all the dcpu-16 implementations

------
krollew
I wonder why so many people started to implement VMs for that processor. Just
impatience while waiting for his next game to be released?

~~~
krasin
implementing a VM is a game itself

~~~
krollew
There are so many things to implement. Why a VM from a game that hasn't even
been created yet?

~~~
baq
it's something you can do in one evening.

~~~
krollew
OK, that makes sense. :)

------
jmpeax
Waiting for a glsl implementation.

~~~
jlawer
I am fairly sure notch would have one... if he is planning on running this,
then I imagine he is basically going to have massive farms of GPGPUs planned
for doing this emulation.

~~~
jerf
You could try to run a simulation of branch-heavy code on your hardware
optimized for branchless number crunching, but you're probably better off
compiling the opcodes into something that you can then use to simulate branch-
heavy code on your branch-heavy-code-optimized hardware.

~~~
jlawer
Good point, sir...

But I was kind of imagining that you could essentially emulate a single
virtual system per "Stream processor" or whatever they are labeling the basic
units. I was figuring that they could run a couple of hundred "virtual cores"
per card despite the fact they weren't that optimized. But I will be the first
to admit to not really knowing the details.

The other option of course is something like intel's Knights Corner
architecture, which wouldn't pay such a penalty on performance for branching.

~~~
lucian1900
Branching used to be done by turning the memory of the cores that failed an if
test to read-only and just letting all of them continue the computation.

It's gotten better now, but branching is still extremely unwieldy to do. No
branch prediction either.

~~~
jlawer
I suppose the question is whether you would need to have multiple cores
running simultaneously on the same processing element, or whether having so
many processing elements means you can just be inefficient and give each core
the role of emulating a single processor. I haven't seen anything about the
clock speed of the virtual CPU; however, if it's 5 or 10MHz, you don't really
need high performance or efficiency, you just need a way of cramming more jobs
into your servers and leaving the CPUs to run other game code.

~~~
jerf
I was trying to be somewhat polite, but... GPUs aren't magic speed juice. You
know those big speed gains that get GPU advocates so pumped up? CPUs have the
exact same massive speed advantages over GPUs too! That is, when you have a
task that the CPU is designed for and the GPU isn't, CPUs kick the GPU's ass.

There's no point in trying to jump through hoops to convince the GPU to be
something it isn't. It isn't going to be faster than a CPU, or rather, a lot
of CPUs.

Being 5 or 10MHz is irrelevant. Being able to simulate them faster means you
need fewer servers to do it. (You can tell who actually works on clouds and
who doesn't by the attitude towards performance; people who don't actually
work on clouds think performance matters _less_ in the cloud....)

~~~
jlawer
Sorry but I think you misunderstood me.

I am fully aware that your average GPU isn't optimal for this task; however, I
was imagining that there would still be value in shifting the workload off
the primary CPUs.

My line of thinking is around being able to use a single GPU stream processor
to emulate this CPU at the required performance (i.e. 10MHz). If you could
essentially do that, you could have hundreds of these processors emulated for
the cost of managing the IO to them.

I am not expecting it to be "Magic Speed Juice"; I am actually expecting to
get 1-5% of the performance the GPUs are capable of. However, I would see
this as a net advantage if it took the workload off the CPU. Something like
Knights Corner could easily do this (it's basically a Pentium 1 core).

The point I am making is that Notch's CPU is basically a home computer CPU
from the 80s. They don't require that much functionality to emulate (as if a
dozen emulators in a few hours wasn't a good enough indication), and since
OpenCL is Turing complete you can emulate anything (see running ARM Linux on
an 8-bit processor); the question is whether it's efficient enough to be viable.

Can a 1GHz stream processor emulate a 10MHz single-issue simple RISC core? I
have no idea, but I suspect it's not the part we have seen so far that will be
the determining factor; I believe it will instead be the IO devices that
determine the requirements.

------
severb
Here's a Python asm compiler to go with those tools
<https://github.com/severb/0x10c-asm>

~~~
coderdude
This is so cool. Between jtauber's work and the other versions floating around
I can't imagine that we won't be seeing basic compilers for higher level
languages soon. So far there are emulators for the CPU, multiple assemblers,
and a disassembler. Have you checked out the C version on the front page? He's
been updating it like mad.

------
overshard
This is certainly the best implementation so far: clean, readable, and it
actually works. The assembler also allows writing in ASM rather than raw hex.

------
anthonyb
That Cell class looks a bit odd, and I can't imagine it's doing good things
for your memory use. Perhaps you could simplify it by making your registers a
list or a dictionary, rather than a tuple?

i.e.

    self.registers = { 'A': 0x0000, ... }

then you could access it directly:

    self.registers['A'] = 0x0001

~~~
jerf
Memory use isn't the problem, it's the added code complexity. Not having

    self.registers = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

instead adds ".value" all over the place where it isn't necessary. I haven't
actually tried this, but it looks like dropping the Cell class entirely,
putting that line in as the definition of registers, and then running
s/\.value//g ought to work, or very nearly work.

PC, SP, and O are already defined as variables containing the index for that
register, which is a fine way to do it.
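For concreteness, here's a minimal sketch of that layout (register names
follow the DCPU-16 spec; placing PC/SP/O at the end of the same list is an
assumption here, not taken from jtauber's code):

    
        # Registers as a plain list of ints, indexed by named constants.
        # PC/SP/O sharing the list with A-J is an illustrative choice.
        A, B, C, X, Y, Z, I, J, PC, SP, O = range(11)
    
        registers = [0] * 11
    
        # No Cell wrapper and no ".value" needed; reads and writes are direct:
        registers[A] = 0x0001
        registers[PC] += 1  # advance the program counter
    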

~~~
jtauber
The reason for the Cell() is explained in a comment: I have to be able to pass
around references to registers and memory locations distinct from their
values. I'm open to alternatives, but the above won't work: how do you pass in
"register A" or "memory location 0x1000" as the arg to an instruction method
in that case?

~~~
jerf
As an offset, probably, with appropriate changes. You're probably better off
channeling C design here than Python. I'm running on the assumption that while
speed may not be your overriding priority, you will want this to run with
_some_ speed. I haven't examined the opcodes, but in this case even if I had
to distinguish between a number and a register reference I'd probably do
something like let numbers be numbers and let register references be one-
element tuples containing the register offset, then switch on the type when it
came time to try to use them. That is most likely going to be significantly
faster than going through the very powerful class/instance machinery in
Python, and should you be inclined to play with PyPy it'll probably JIT a heck
of a lot better too. (Although I'd also play with having a number with a very
high bit set to indicate that it's a register reference, which would probably
JIT even better, since there'd be no type check.)

By splitting the difference and playing with PyPy you should be able to use
Python to dodge out on a lot of the C bookkeeping BS while potentially not
paying very much on the speed penalty. Using a lot of Python constructs could
result in a multiple orders of magnitude slowdown for only marginal gain in
this case.
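A rough sketch of both tagging schemes (function names and the REG_FLAG
constant are hypothetical, not part of the actual codebase):

    
        # Two ways to distinguish "register reference" operands from literals.
    
        # 1. Tuple tagging: a one-element tuple holds a register offset;
        #    a bare int is a literal value.
        def read_operand(registers, operand):
            if type(operand) is tuple:       # register reference
                return registers[operand[0]]
            return operand                   # plain number
    
        # 2. High-bit tagging: any value with REG_FLAG set is a register
        #    reference; everything below it is a 16-bit literal.
        REG_FLAG = 1 << 16                   # above any 16-bit literal
    
        def read_operand_tagged(registers, operand):
            if operand & REG_FLAG:
                return registers[operand & 0xFFFF]
            return operand
    
        # Example: register 0 holds 0x1234.
        regs = [0x1234] + [0] * 10
    

The tuple version pays a type check per operand; the high-bit version is a
single integer test, which is the point above about it JITting better.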

~~~
jtauber
I did think about passing around a (type, identifier) tuple where type =
REGISTER|MEMORY|LITERAL but I was put off by writing code conditional on type.
The OO programmer in me dies a little when that is done rather than
polymorphism.

~~~
jerf
Match the tools to the task. Organizational schemes suitable for multi-
hundred-thousand line codebases aren't always needed for something like this,
which just isn't going to get that large. Old-school bitbashing can be both
fast and easy enough to read. OO can cost you a lot here for not very much
gain.

Or whatever. Your program, of course. (No sarcasm.)

------
mahmud
This one is one of the more complete ones: it has a disassembler. It should be
trivial to add stepping and breakpoints.

------
praptak
How did you decide which value of the PC to use (current instruction? next
instruction? second word of current instruction?) when PC is one of the
operands? The spec is not very clear on when exactly PC is incremented.

------
VMG
I'd love to see a Haskell or Lisp implementation

~~~
clavalle
Here you go:

[https://docs.google.com/?pli=1#folders/0BxArLOveUH1bb081YWhD...](https://docs.google.com/?pli=1#folders/0BxArLOveUH1bb081YWhDOHVTUzZCR2czdFdPOHNXZw)

