
Resurrecting the SuperH architecture - justin66
http://lwn.net/Articles/647636/
======
rwmj
RISC-V seems like a better bet ([http://riscv.org/](http://riscv.org/)). It is
a clean, patent-free modern architecture. It already has kernel support, and
supposedly there will be both FPGAs and ASICs "soon"
([http://www.lowrisc.org/](http://www.lowrisc.org/)). Plus you can run it
under qemu: [https://rwmj.wordpress.com/2015/06/11/booting-risc-v-
linux-w...](https://rwmj.wordpress.com/2015/06/11/booting-risc-v-linux-with-
qemu/)

------
unwind
This:

 _There have been some minor additions, he said: the J2 adds four new
instructions. One for atomic operations, one to work around the barrel
shifter, "which did not work the way the compiler wanted it to [...]_

Is _so_ intriguing! Does anyone know what was wrong with the original barrel
shifter design? I tried reading up on it but failed to find much reference
material. I followed the link to the J-core community site to read the code,
but it wasn't immediately browsable, just available for download.

I assume there were compilers for SuperH back in the day, didn't they use the
shifter? Why not fix the compiler to teach it the existing instruction, rather
than adding an instruction just for this? How wrong can a shifter be, really?
The questions just heap up.

~~~
TapamN
Compilers did use the shifter. I don't know if this is exactly what he was
referring to, but one oddity with the SH4's dynamic shift instruction is that
it only shifts to the left (there are also a limited number of instructions
that shift by a small constant: 1, 2, 8 or 16). To shift to the right, you have
to first negate the shift amount, then perform a left shift. So if you did a
right shift by a non-constant, you would always see a negation of the shift
amount before the shift. My guess as to why it was implemented like this is
that since the SH4 had a fixed-length, 2-byte instruction set, running out of
possible instructions for future expansion was a real hazard, and not encoding
both directions was done to save space.

On the original SH4 implementation, under certain conditions, there had to be
one cycle in-between when a shift-amount was generated and when it was used,
otherwise there would be a one-cycle CPU stall. A real right shift would avoid
the need to schedule around this stall. This isn't necessarily something that
needs an extra instruction to fix; the implementation could be designed to not
need the stall, but it might be difficult to work around. I don't do circuit
design, but dynamic shift instructions typically look at as few bits of the
shift amount as possible to simplify and speed up the design of the shifter.
The reason for the delay in the original SH4 is probably that it analyzes and
tags each register with information for the correct shift direction and amount,
and certain units won't have this information ready for the shifter in time,
hence the stall if the shift is too close to the shift amount generation. (I've
read that certain CPU implementations have done similar work in tagging if a
register is zero or not, in order to help keep branch-on-zero/not-zero
instructions quick.) If the instruction talked about is a dedicated right
shift, it could be defined in a way that doesn't need a negation and extra
tagging, would be much more compiler friendly, and faster.
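
To make the negate-then-shift pattern concrete, here is a rough Python model of a signed-amount dynamic shift. It follows my understanding of how the SH `SHLD` instruction is documented (non-negative amount shifts left, negative amount shifts right); the function names `shld` and `shift_right_variable` are made up for illustration, and the exact edge-case behavior should be treated as an assumption rather than a reference.

```python
MASK32 = 0xFFFFFFFF


def shld(value, amount):
    """Model of a signed-amount logical shift on a 32-bit value.

    Assumed semantics (not taken from the thread itself):
      amount >= 0 -> shift left by the low 5 bits of amount
      amount <  0 -> shift right by ((~amount & 31) + 1), i.e. 1..32 bits
    """
    value &= MASK32
    if amount >= 0:
        return (value << (amount & 0x1F)) & MASK32
    count = (~amount & 0x1F) + 1  # always in the range 1..32
    return value >> count  # a right shift by 32 leaves 0


def shift_right_variable(value, n):
    """What a compiler has to emit for `value >> n`: negate, then shift."""
    negated = -n  # the extra negation instruction
    return shld(value, negated)
```

With this model, `shift_right_variable(0xF0, 3)` goes through `shld(0xF0, -3)`, which is the negation-before-shift sequence described above. A dedicated right-shift instruction would let the compiler skip the `negated = -n` step.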

~~~
unwind
Thanks!

Does that mean that the shifter is actually capable of doing rotates?
Otherwise the negation part doesn't make any sense.

If you have 0xf0 and want to shift it three bits to the right to get 0x1e, no
amount of negated-amount left-shifting is going to do that unless the
instruction is a rotate.

If, on the other hand, you can do an 8-bit rotate left of 8-3 = 5 bits, that
would produce the same result and need that "negation" (which is actually an
inversion).
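
The arithmetic in that example can be checked with a small sketch. The helper `rotl8` is a hypothetical 8-bit rotate written for illustration only; it does not model any particular SuperH instruction.

```python
def rotl8(x, n):
    """Rotate an 8-bit value left by n bits (illustrative helper)."""
    x &= 0xFF
    n %= 8
    return ((x << n) | (x >> (8 - n))) & 0xFF


value = 0xF0
plain_right_shift = value >> 3       # the result we want
rotated = rotl8(value, 8 - 3)        # rotate left by 5 instead
plain_left_shift = (value << 5) & 0xFF  # a plain left shift loses the bits
```

For `0xF0` the rotate and the right shift agree because the three bits rotated around to the top are all zero. For a value with low bits set, such as `0xF1`, the rotate carries those bits into the high end, so a rotate alone would still need a mask to behave like a logical right shift.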

------
__david__
We used an SH2 for the main processor of a DDS (DAT) tape drive at a company I
worked at. We had a prototype that used an SH3 and I remember spending a few
days hacking Linux to boot on our hardware (I think we made it to user space
and then the project petered out).

GCC has supported the SH series since at least the 2.7 era (though Hitachi's
compiler seemed to produce better code in those days, but only ran under DOS).

------
cbd1984
From the comments there, early MIPS architectures are also patent-expired at
this point.

They might have had more actual work done with them back in the old days, so
their code might be in better shape now.

~~~
kevin_thibedeau
The key point is that this is the first widely deployed 32-bit RISC platform
with a 16-bit instruction set to come off patent. That has advantages for the
embedded applications being targeted in this case. You won't get that with a
MIPS or ARM clone because MIPS16 and Thumb are still under patent.

------
spydum
mm superH. reminds me of the old HP Jornadas.. they also ran on an SH3
processor (well, some did). back in the day these were mindblowing to me.. a
real pocket-sized PC, with a modem no less!

------
hoggle
I always had high admiration for the SuperH architecture and now to read about
its potential to fuel the much-needed open hardware movement is _fantastic_
news.

------
nickpsecurity
Good to see them doing it. I included SuperH in my list [1] of non-Intel
architectures and old hardware to use post-Snowden. Additionally, I proposed
that it falling out of favor despite Japanese chip-makers backing it might
make it a nice candidate for trying to get them to open the design but still
sell it. Concept is a proven design which can be verified by third parties,
masked by whoever, and taped out at fab of their choice. Although, there's
work in moving to new nodes and that cost would be on whoever did it. The
precedent is Gaisler's SPARC-based processors and I.P. [2] that are dual-
licensed as commercial and GPL with tools for easy customization.

Alternatively, I proposed the security enhancements for processors showing up
in academia be applied to this or another processor with low market share as a
differentiator. Some of these enhancements take almost no chip real-estate,
esp simple tags & tag-checks. The chip designer could also make money for the
semi-custom work. Time has passed, that didn't happen for low market chips,
and did happen for AMD+Intel for non-security applications [that I know of].
Matter of fact, even though my scheme didn't happen, AMD is making so much
money off the other half of my proposal that they could be cited when trying
to convince chip-makers to do it for security enhancements with mass-market
availability. So long as they don't bear the cost of failure (huge in ASIC's)
they might go for it.

Finally, anyone wanting to deploy this or other things, remember the
Structured ASIC's with FPGA conversions. eASIC [3] has a long track record in
this with offers down to 28nm. Gigoptix [4] does S-ASIC's down to 28nm. Tekmos
[5] offers a similar product at 350nm (good for budget masks). Just make sure
you design in FPGA's with ASIC transition in mind from the start & follow
published advice on that (available with Google or consultation). The result
is you prove it in FPGA's, even use it on FPGA boards, and then move it to
S-ASIC later for reduced costs/power + maybe speed increase. Authors are right
that 180nm is a sweet spot although proving it at 350nm or higher first might
be smarter given costs.

[1]
[https://www.schneier.com/blog/archives/2013/09/surreptitious...](https://www.schneier.com/blog/archives/2013/09/surreptitiously.html#c1762647)

[2] [http://www.gaisler.com/](http://www.gaisler.com/)

[3] [http://www.easic.com/products/28-nm-easic-
nextreme-3/](http://www.easic.com/products/28-nm-easic-nextreme-3/)

[4] [http://www.gigoptix.com/products/asics/asic-
type/structured-...](http://www.gigoptix.com/products/asics/asic-
type/structured-asic/)

[5] [http://www.tekmos.com/products/asics/process-
technologies](http://www.tekmos.com/products/asics/process-technologies)

~~~
nickpsecurity
EDIT to add: Triad Semiconductor [1] has a mixed-signal ASIC take on S-ASIC's.
Interesting stuff. Just found it.

[1] [http://www.triadsemi.com/vca-technology/](http://www.triadsemi.com/vca-
technology/)

------
listic
Has Xilinx bitstream been reverse-engineered and reimplemented in open-source?
I haven't heard about it.

~~~
bri3d
I don't think so - at least according to their site, the J2 build chain uses
Xilinx ISE: [http://0pf.org/j-core.html](http://0pf.org/j-core.html) . The
only fully open FPGA toolchain I'm aware of targets Lattice/SiliconBLUE iCE40:
[https://github.com/cseed/arachne-pnr](https://github.com/cseed/arachne-pnr)

------
thrownaway2424
"Resurrecting" something that's not actually dead. SuperH is still a
commercially-used CPU that you can buy off the shelf, as the CPUs themselves
or inside many devices.

------
kjs3
I do like SuperH. Especially the later ones. The SH4 had the most delightfully
odd, fully pipelined 4x4 matrix X 4x1 vector instructions. If you could fit
your problem in that box, you could get some remarkable speed for the clock.
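
For anyone who hasn't seen the operation, this is the computation being described: a 4x4 matrix times a 4-element column vector. I believe the SH4 instruction in question is `FTRV`, but that name is my assumption, not something stated above. The sketch below is plain Python with a made-up function name and says nothing about register layout or pipelining.

```python
def mat4_mul_vec4(m, v):
    """Multiply a 4x4 matrix (list of 4 rows) by a 4-element vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


# Example: scale x, y, z by 2 and leave the fourth component alone.
scale = [
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
point = [1.0, 2.0, 3.0, 1.0]
transformed = mat4_mul_vec4(scale, point)
```

Each result component is a 4-term dot product, so one transform is 16 multiplies and 12 adds. That is the "box" a problem has to fit into, and it is the same shape as a typical 3D vertex transform.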

