
Qnice – An Elegant 16 Bit Processor - doener
http://qnice.sourceforge.net/
======
throwaway_pdp09
Tangential question, something I've always wanted to ask. The ISA here is
given by example, here's a couple

    
    
      MOVE src, dst:     dst := src 
      ADD src, dst:     dst := dst + src 
    

I have always wondered why assembler is written the first way

    
    
      MOVE src, dst
      ADD src, dst
    

rather than the far more intuitive (and slightly more compact) second,
something like

    
    
      dst := src 
      dst += src 
    

This also completely eliminates questions about which direction data goes, is
'mov a,b' a:=b or b:=a for example.

I can't see any reason for not using the established C-type notation, so why
is the original style always perpetuated?

I'm aware that C approximately maps onto the original PDP ISA and has been
called a high-level assembler, true or not that's irrelevant, but why the
higher-level syntax has never made its way to lower level ASM has baffled me.
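
For what it's worth, the two notations carry the same information, so going from one to the other is mechanical. A minimal sketch, assuming only the two forms shown above exist; the `OPS` table and the `to_mnemonic` name are made up for illustration and are not part of any QNICE tool:

```python
import re

# Hypothetical operator-to-mnemonic table. Only MOVE and ADD appear in the
# thread; a real ISA would need many more entries.
OPS = {":=": "MOVE", "+=": "ADD"}

def to_mnemonic(line):
    """Turn 'dst := src' or 'dst += src' into 'MOVE src, dst' / 'ADD src, dst'."""
    m = re.fullmatch(r"\s*(\S+)\s*(:=|\+=)\s*(\S+)\s*", line)
    if m is None:
        raise ValueError(f"not a simple infix instruction: {line!r}")
    dst, op, src = m.groups()
    return f"{OPS[op]} {src}, {dst}"

print(to_mnemonic("R1 := R0"))  # MOVE R0, R1
print(to_mnemonic("R1 += R0"))  # ADD R0, R1
```

Note that this only handles one operator per line, which matches the "no general expressions" restriction.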

~~~
pjc50
> why the higher-level syntax has never made its way to lower level ASM has
> baffled me

Traditionally assembler was both for bootstrap processes and for the older
heavily resource constrained systems, so there was a lot of emphasis on making
it as simple to parse as possible; opcode first, then arguments, because the
opcode is first in the byte stream of almost all variable-instruction length
systems.

And of course no support for complex expressions, so no point in building a
full mathematical expression parser.
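
As a rough illustration of how little parsing the opcode-first form needs, here is a sketch that splits a line into an opcode and its operands with no grammar at all. The function name `parse` and the operand-free `RET` example are assumptions for illustration, not taken from any particular assembler:

```python
def parse(line):
    """Split 'OPCODE arg, arg' into (opcode, [args]) using plain string ops."""
    opcode, _, rest = line.strip().partition(" ")
    operands = [p.strip() for p in rest.split(",")] if rest.strip() else []
    return opcode.upper(), operands

print(parse("MOVE R0, R1"))  # ('MOVE', ['R0', 'R1'])
print(parse("RET"))          # ('RET', [])
```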

Then there's the question of _which_ add instruction you want. The vast
proliferation of instructions in things like AVX gives you "VADDPS" and even
"VFMSUBADD132PD", which expands to:
[https://www.felixcloutier.com/x86/vfmsubadd132pd:vfmsubadd21...](https://www.felixcloutier.com/x86/vfmsubadd132pd:vfmsubadd213pd:vfmsubadd231pd)

    
    
        VFMSUBADD132PD DEST, SRC2, SRC3
        IF (VEX.128) THEN
            DEST[63:0]←RoundFPControl_MXCSR(DEST[63:0]*SRC3[63:0] + SRC2[63:0])
            DEST[127:64]←RoundFPControl_MXCSR(DEST[127:64]*SRC3[127:64] - SRC2[127:64])
            DEST[MAXVL-1:128] ←0
        ELSEIF (VEX.256)
            DEST[63:0]←RoundFPControl_MXCSR(DEST[63:0]*SRC3[63:0] + SRC2[63:0])
            DEST[127:64]←RoundFPControl_MXCSR(DEST[127:64]*SRC3[127:64] - SRC2[127:64])
            DEST[191:128]←RoundFPControl_MXCSR(DEST[191:128]*SRC3[191:128] + SRC2[191:128])
            DEST[255:192]←RoundFPControl_MXCSR(DEST[255:192]*SRC3[255:192] - SRC2[255:192])
        FI
        VFMSUBADD213PD DEST, SRC2, SRC3
        IF (VEX.128) THEN
            DEST[63:0]←RoundFPControl_MXCSR(SRC2[63:0]*DEST[63:0] + SRC3[63:0])
            DEST[127:64]←RoundFPControl_MXCSR(SRC2[127:64]*DEST[127:64] - SRC3[127:64])
            DEST[MAXVL-1:128] ←0
        ELSEIF (VEX.256)
            DEST[63:0]←RoundFPControl_MXCSR(SRC2[63:0]*DEST[63:0] + SRC3[63:0])
            DEST[127:64]←RoundFPControl_MXCSR(SRC2[127:64]*DEST[127:64] - SRC3[127:64])
            DEST[191:128]←RoundFPControl_MXCSR(SRC2[191:128]*DEST[191:128] + SRC3[191:128])
            DEST[255:192]←RoundFPControl_MXCSR(SRC2[255:192]*DEST[255:192] - SRC3[255:192])
        FI

~~~
throwaway_pdp09
We don't have resource constrained systems now. I doubt parsing asm ever took
up much anyway.

I'm not suggesting we permit complex (ie. general opcode combination)
expressions. I deliberately never suggested it.

As to straightforward FMAC-type instructions, that's even clearer in my
notation: a += b * c

But as for the proliferation of those add types you linked to, OK, possibly
valid, but how much of this are you going to be manually writing compared to
the very mundane non-SIMD, non-packed instructions? My guess is very little,
as you have relatively few such instructions (although they're doing a great
deal of work over streams of data, but that's not relevant).

~~~
pjc50
Oddly I think it's the weird instructions that people are going to write more
often: hardly anyone _writes_ assembler compared to the more likely use case
of reading disassembly, and when they are writing it it's usually specifically
to do something that's difficult or impossible to get a high level language to
emit.

~~~
throwaway_pdp09
I think that makes a lot of sense.

------
magicalhippo
Ah neat!

Very recently I picked up an FPGA dev board and started playing with
implementing my first toy soft-CPU. For fun, and definitely not for profit, I
decided to design my own ISA for it.

I decided to see if I could make a RISC-y MISC (minimal instruction set
computer) design. From what I could see a lot of MISC-based computers had
quite complex instructions.

While my resulting ISA is likely quite crap, being my very first ISA ever,
it's been quite a fun exercise so far. I programmed quite a lot of asm back in
the days, but thinking about which instructions are needed and why was
something else.

~~~
colatkinson
I've always wanted to do exactly what you just described! Can you recommend
any resources to get started with FPGA design for a software dev?

~~~
magicalhippo
I've so far been quite happy with the iCE40UP5k[1] based dev kit I got, though
there are a lot of options out there[2].

The iCE40 FPGAs are a bit wimpy compared to Altera and Xilinx offerings from
what I understand, but I really liked the idea of an open-source
toolchain[3][4] being available.

To get a taste without committing cash you could just use a simulator[5],
which I imagine you'd be using a fair bit anyway as it allows you easier
access to the internal state.

As to actually programming, I've found the following resources useful. I
started with Verilog mainly because code-gen tools like nMigen[6] generate
Verilog, but for writing by hand it seems VHDL is preferred.

Anyway, links: first off, some exercises to get going[7]. An introduction to
Verilog[8], which also has a nice general overview of HDLs. Details of how
non-blocking vs blocking statements in Verilog work[9], quite specific but
very informative for me.
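
To give a flavour of the blocking vs non-blocking difference without needing a simulator, here is a small Python model of the classic two-register swap. It is only a sketch of the update semantics (the function names are made up), not a substitute for the paper in [9]:

```python
def blocking_swap(a, b):
    # Models Verilog blocking assignments "a = b; b = a;":
    # each statement takes effect before the next one runs.
    a = b
    b = a
    return a, b

def nonblocking_swap(a, b):
    # Models non-blocking assignments "a <= b; b <= a;":
    # all right-hand sides are sampled first, then the updates land together.
    new_a, new_b = b, a
    return new_a, new_b

print(blocking_swap(1, 2))     # (2, 2)
print(nonblocking_swap(1, 2))  # (2, 1)
```

The blocking version loses the old value of `a`, while the non-blocking version behaves like two flip-flops exchanging values on the same clock edge.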

There's also quite a lot of activity over at Reddit[10], and good experiences
over at the EEVBlog forums[11].

As I said I just got started so no expert yet :)

[1]: [https://www.digikey.com/product-detail/en/lattice-
semiconduc...](https://www.digikey.com/product-detail/en/lattice-
semiconductor-corporation/ICE40UP5K-B-EVN/220-2134-ND/6596291)

[2]:
[https://joelw.id.au/FPGA/CheapFPGADevelopmentBoards](https://joelw.id.au/FPGA/CheapFPGADevelopmentBoards)

[3]: [https://symbiflow.github.io/](https://symbiflow.github.io/)

[4]:
[https://github.com/cliffordwolf/icestorm](https://github.com/cliffordwolf/icestorm)

[5]: [http://iverilog.icarus.com/](http://iverilog.icarus.com/)

[6]: [https://nmigen.org](https://nmigen.org)

[7]: [https://www.fpga4fun.com/](https://www.fpga4fun.com/)

[8]: [https://www.chipverify.com/verilog/verilog-
tutorial](https://www.chipverify.com/verilog/verilog-tutorial)

[9]: [http://sunburst-
design.com/papers/CummingsSNUG2002Boston_NBA...](http://sunburst-
design.com/papers/CummingsSNUG2002Boston_NBAwithDelays.pdf)

[10]: [https://www.reddit.com/r/FPGA/](https://www.reddit.com/r/FPGA/)

[11]:
[https://www.eevblog.com/forum/fpga/](https://www.eevblog.com/forum/fpga/)

~~~
tails4e
Verilog for writing RTL is fine, especially if you use the synthesizable
subset of SystemVerilog. There used to be a bit of a religious war between
VHDL and Verilog, as VHDL had some syntax that prevented certain errors, but
with SV and some basic coding guidelines it's fine. While I'm sure some will
still be hanging onto VHDL, I'd say most of the industry is going the SV way.

~~~
aseipp
SystemVerilog improves things a lot actually, but the problem is that there is
(currently) no freely available, robust synthesis frontend that supports the
majority of the useful SV features (e.g. interfaces). In fact I don't think
there are any robust, ~complete FOSS SystemVerilog simulators either -- though
Icarus and Verilator support SV to varying degrees...

So if you want to stick with FOSS tools, then you're stuck with synthesizable
Verilog-2005 at best, for the moment. And standard Verilog very much sucks in
a lot of ways, I would argue, synthesizable subset or not. It's an
understandable choice though, in a field full of awful options. One day I'm
hopeful Yosys will support most of the necessary SystemVerilog features people
want for synthesis... (Then, it can also serve as an effective SystemVerilog
-> Verilog translation tool, which would be very useful on its own.)

------
titzer
I really like conditional subroutine calls. I wish x86 had them. It makes it
really easy to inline a fastpath of some safety check (e.g. nullcheck, bounds
check, write barrier, etc), and have the slowpath be factored out to a common
place. What PL implementations like the JVM, JavaScript engines, etc typically
have to do without this is they insert a conditional branch to "deferred code"
which is at the end of the function (statically predicted not-taken), but that
deferred code can't be shared, because it needs to branch back to the mainline
code. That costs code size. A conditional subroutine call is exactly the right
mechanism to solve this!
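
A toy model of the idea, assuming a made-up instruction set (`CALLZ`, `NOTE`, `RET`, `HALT` are all hypothetical mnemonics, not QNICE or x86 ones): two checks share one out-of-line slow path, and the return address on the stack is what lets each of them get back to its own mainline code.

```python
def run(program, regs):
    """Tiny interpreter; every mnemonic here is invented for illustration."""
    pc, stack, trace = 0, [], []
    while True:
        op, *args = program[pc]
        pc += 1
        if op == "CALLZ":          # conditional call: taken only if register is 0
            reg, target = args
            if regs[reg] == 0:
                stack.append(pc)   # return address = instruction after the call
                pc = target
        elif op == "NOTE":         # stand-in for real work; records a label
            trace.append(args[0])
        elif op == "RET":
            pc = stack.pop()
        elif op == "HALT":
            return trace
        else:
            raise ValueError(f"unknown op {op!r}")

program = [
    ("CALLZ", "p", 5),   # 0: check on p, fast path falls through
    ("NOTE", "use p"),   # 1
    ("CALLZ", "q", 5),   # 2: second check reuses the same slow path
    ("NOTE", "use q"),   # 3
    ("HALT",),           # 4
    ("NOTE", "slow"),    # 5: shared slow path, placed out of line
    ("RET",),            # 6: returns to whichever check called it
]

print(run(program, {"p": 1, "q": 0}))  # ['use p', 'slow', 'use q']
```

The slow path here simply returns, as a write barrier would; a failing null check would more likely raise instead of returning.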

~~~
aparashk
Can’t agree more!

------
chalst
> 16 registers divided into two areas: R0 to R7 are in fact a window to a
> register bank containing 256 times 8 registers while R8 to R15 are fixed.
> This architecture makes subroutine calls and saving registers very easy
> (just increment/decrement the register bank pointer which is part of the
> status register). All in all QNICE features 256 * 8 + 8 = 2056 registers.

This is indeed nice and the kind of thing whose availability can affect the
design of low-level languages, e.g. non-ISO local variables in a Forth
dialect.

Cf. [http://www.complang.tuwien.ac.at/forth/gforth/Docs-
html/Gfor...](http://www.complang.tuwien.ac.at/forth/gforth/Docs-html/Gforth-
locals.html)
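
To make the quoted scheme concrete, a minimal sketch of such a register file, based only on the numbers in the quote. The class and method names are made up, and the wrap-around of the bank pointer at 256 is an assumption (the quote does not say what happens at the ends):

```python
class RegisterFile:
    """Sketch of a banked register file: R0-R7 windowed, R8-R15 fixed."""
    BANKS, WINDOW, FIXED = 256, 8, 8

    def __init__(self):
        self.banked = [[0] * self.WINDOW for _ in range(self.BANKS)]
        self.fixed = [0] * self.FIXED
        self.bank = 0  # register bank pointer (part of the status register)

    def _slot(self, r):
        if not 0 <= r < self.WINDOW + self.FIXED:
            raise IndexError(f"no such register R{r}")
        if r < self.WINDOW:
            return self.banked[self.bank], r
        return self.fixed, r - self.WINDOW

    def read(self, r):
        regs, i = self._slot(r)
        return regs[i]

    def write(self, r, value):
        regs, i = self._slot(r)
        regs[i] = value & 0xFFFF  # 16-bit registers

    def enter(self):  # hypothetical name: what a subroutine entry would do
        self.bank = (self.bank + 1) % self.BANKS  # assumed wrap-around

    def leave(self):  # hypothetical name: what a subroutine exit would do
        self.bank = (self.bank - 1) % self.BANKS

    def total(self):
        return self.BANKS * self.WINDOW + self.FIXED

rf = RegisterFile()
rf.write(0, 111)   # R0 in the caller's bank
rf.write(8, 222)   # R8 is fixed, visible from every bank
rf.enter()
print(rf.read(0), rf.read(8))  # 0 222
rf.leave()
print(rf.read(0), rf.total())  # 111 2056
```

Saving and restoring R0 to R7 costs a single pointer update, and the fixed registers are the natural place for arguments and return values.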

~~~
throwaway_pdp09
This sounds like SPARC's register windows
[https://en.wikipedia.org/wiki/SPARC#Features](https://en.wikipedia.org/wiki/SPARC#Features)

I got it from a _very_ reliable source, someone who was involved in evolving
that hardware, that this feature caused Sun "an awful lot of pain" (his words,
best I can recall).

My understanding is that it badly hurt the ability to do out-of-order
execution, which came on top of the original designers' failure to understand
what the compiler could do with inlining that would largely negate the value
of this. From what I've read over the years, the SPARC hardware guys didn't
talk to the compiler guys - a trap the designers of the DEC Alpha very
carefully did not fall into.

Since this is a teaching project, I'm sure that feature is fine, but just
saying.

~~~
chalst
This is a very nice comment.

The Wikipedia article would benefit from a citeable source for the problems
they had: is that something you could help find?

~~~
throwaway_pdp09
I heard it said by a guy at a conference. This guy:
[https://en.wikipedia.org/wiki/Ivan_Sutherland](https://en.wikipedia.org/wiki/Ivan_Sutherland)
Sutherland was interested in the Sparc, surprising given he's better known for
graphics but there you go.

As it happened, Steve Furber was there too. Irrelevant but bragging rights &
all that.

A criticism from wiki itself after a very quick DDG:
[https://en.wikipedia.org/wiki/Register_window#Criticism](https://en.wikipedia.org/wiki/Register_window#Criticism)

HTH

~~~
chalst
Sutherland was something like a cofounder of Sun Labs, so I guess he was
likely to develop some interest. I'm afraid I couldn't track down anything
more specific, but I'll keep my eyes peeled. An interesting point I did find:
the SPARC architecture was later extended so that register windows could be
saved and restored other than via SUB calls, allowing instruction reordering
and their use for context switching.

The Register window article doesn't talk about Sun's experience or the issue
of reordering instructions.

~~~
throwaway_pdp09
I didn't know he was a Sun co-founder. I remember him saying that he had an
interest in geometry rather than graphics, which related to chip design - but
that link is to me a bit nebulous, and it was a long time ago anyway.

Sun would not advertise a horrifically expensive design mistake, so no
surprise you can't find much. I've picked up a fair bit from random reading
around over the years so I can't remember where much of it came from.

Perhaps email Mr. Sutherland and just ask? Worst that can happen is he doesn't
respond.

(thanks for the bit about using other than SUB calls, I didn't know).

~~~
jecel
Not Sun but Sun Labs. Ivan's Sutherland, Sproull and Associates was bought by
Sun in 1990 to become the seed of the new Sun Labs.

About register windows, overflows and underflows generated traps to the
operating system, and for the combination of Sparc version 8 and SunOS that
meant thousands of clock cycles. That was improved in later products.
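
A simplified sketch of when those traps fire, assuming a fixed number of hardware windows (8 here is just a value picked for illustration, and real SPARC keeps one window in reserve, which this model ignores). It only counts traps and says nothing about how many cycles each one costs:

```python
def count_traps(events, windows=8):
    """Count overflow/underflow traps for a stream of +1 (call) / -1 (return)."""
    resident = 1   # windows currently held in hardware, including the current one
    spilled = 0    # windows the trap handler has saved to memory
    traps = 0
    for ev in events:
        if ev == +1:
            if resident == windows:   # no free window: overflow trap spills one
                traps += 1
                spilled += 1
            else:
                resident += 1
        elif ev == -1:
            if resident > 1:
                resident -= 1
            elif spilled > 0:         # caller's window is in memory: underflow trap
                traps += 1
                spilled -= 1
            else:
                raise ValueError("return without a matching call")
        else:
            raise ValueError(f"unknown event {ev!r}")
    return traps

def recursion(depth):
    """Call `depth` levels deep, then return all the way back."""
    return [+1] * depth + [-1] * depth

print(count_traps(recursion(5)))   # 0
print(count_traps(recursion(10)))  # 6
```

Shallow call chains never trap, but once the depth exceeds the number of windows every extra level costs one trap on the way in and one on the way out.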

Berkeley's RISC I to IV all had register windows but RISC-V doesn't with the
argument that we have far better compilers now. Altera's NIOS processor had
register windows which were dropped in NIOS II because it made the processor
smaller without reducing the performance too much.

The AMD 29000 had a more flexible register window scheme and the Itanium a
very complex scheme.

Computer history is often more like a spiral than a line and old ideas that
have become bad might be good again in the future. With out-of-order execution
and register renaming you might once again get better performance out of a
binary with register windows.

------
Taniwha
Very pdp11ish, even looks like it supports "mov -(pc), -(pc)"

(if you ignore the register windows and the lack of byte instructions - word
addressing harks back to a previous age)

------
projektfu
Curious why it leaves out PC-relative addressing. By 1990 it seems that was
one of the obvious shortcomings of 8086.

------
arethuza
Have there ever been any attempts to systematically generate processor
instruction set designs, evaluate them against 'real' code and measure the
results?

Edit: You'd need to generate compiler back ends for each design as well, which
might be fun...

~~~
carapace
I don't have a link handy to give you but, yes, people have done things like
that. Look into Prolog research on compiling.

------
FeepingCreature
ABRA because it jumps away?

------
ngcc_hk
Also got some spare FPGAs, any testing required?

------
saagarjha
Has anyone implemented this in hardware yet?

~~~
xellisx
[http://qnice-fpga.com/](http://qnice-fpga.com/)

~~~
dimator
that page has a WebRing at the bottom! what a blast from the past.

this is what sites used to do before aggregators and search engines dominated.

~~~
xellisx
Oh I had a couple sites that were on web rings. Haha

------
ape4
Misspelling "architectur" on the 3rd line doesn't exactly inspire confidence.

