
Russia’s Elbrus 8CB Microarchitecture: 8-Core VLIW on TSMC 28nm - zdw
https://www.anandtech.com/show/15823/russias-elbrus-8cb-microarchitecture-8core-vliw-on-tsmc-28nm
======
mobilio
This is a lesser-known fact, but some years (15+) ago Intel hired the Elbrus
design team: [https://www.extremetech.com/extreme/56406-intel-hires-
elbrus...](https://www.extremetech.com/extreme/56406-intel-hires-elbrus-
microprocessor-design-team)

"On Monday, Intel representatives confirmed reports in Russian-language
newspapers that the American chip giant had hired approximately 500 engineers
and related staff from the Elbrus Moscow Center of Sparc Technology, a state-
sponsored design house in Russia. Some of the engineers will be hired away
from Unipro, a related company. The new hires include Boris Babayan, Alexander
Kim and Ivan Bolozov, said to be the architects of the E2K processor, a failed
“Itanium-killer”."

And the same happened in the past too:
[https://www.theregister.com/Print/1999/06/07/intel_uses_russ...](https://www.theregister.com/Print/1999/06/07/intel_uses_russia_military_technologies/)

But I couldn't find more information about this team.

~~~
thodin
"Russian designer could have been inspiration for Pentium name" - it's
actually fake news; Pentkovsky joined Intel several years after the start of
Pentium R&D.

~~~
monocasa
I've heard that Pentium was named Pentium really late in its dev cycle, once
the trademark office told Intel that trademarking model numbers wasn't going
to be kosher.

~~~
phkahler
I thought Intel had a contest for naming the 486 successor.

~~~
fb03
And I thought Pentium was basically 'penta' = 5 ... 486 + 1 arch bump.

------
rurban
I find this wide addressing mode interesting

> -m128

> compile in 128-bit secure addressing mode with hardware access control to
> objects.

> In this mode, a pointer to data and functions takes 128 bits. It contains
> the 64-bit address of the object, its size (no more than 4 GB) and the
> position of the pointer inside the object. The mode enhances program memory
> control during execution.

Like Intel's unused bounds-checking additions, it checks the starting offset
and the ending offset, but without the HW hash table.

~~~
larozin
There was a video on YouTube (in Russian) where a lead developer from MCST
confessed that they were unable to use this technology in practice. The
entire software stack would have to be ported to strictly eliminate any use
of pointer magic everywhere. They poured years into this and only succeeded
in porting libc and a few small base libraries.

------
skissane
There seem to be very few technical details available on the E2K processor's
instruction set. Although Linux has been ported to run natively on it, the
E2K Linux distribution and sources don't appear to be publicly available.

Some people suggest this is security-through-obscurity connected with E2K's
use in Russian military systems. I wonder if that is true.

The other issue, from what I understand, is that the only C compiler for E2K
is proprietary [1] (it uses the EDG frontend). There is no support for E2K in
GCC or LLVM. This also means their port of Linux can't be upstreamed, since
it can't be compiled with GCC.

[1] [https://lvee.org/en/abstracts/303](https://lvee.org/en/abstracts/303)

~~~
ltt481
There's an article in Russian about porting V8 and Firefox's SpiderMonkey onto
these processors:
[https://habr.com/ru/company/jugru/blog/419155/](https://habr.com/ru/company/jugru/blog/419155/)

You may glean some architectural details from the article and the comments.

~~~
Symmetry
That was well worth the read. Looks like the pipeline isn't exposed, which is
what I'd expect for a VLIW in this role. They've got some support for
speculative loads, but it seems pretty limited, so I'm not surprised they're
having trouble adapting to general-purpose code.

~~~
throwaway_pdp09
I'm not sure I understand what you mean by the pipeline being exposed. No
processor I'm aware of other than the i860 exposed its pipeline, so I don't
know what you mean by that here. Can you elaborate?

~~~
Symmetry
Let's say that a load from the L1 data cache takes two cycles. Take the
following instruction sequence:

    r1 = 3
    r1 = load(r2) // will return 8
    r3 = r1 + 2

In a typical processor the system will stall or reorder things so that the
load completes before the third instruction, so the final result in r3 is 10.
In an exposed-pipeline model the assignment to r3 will happen before the load
instruction returns, so the final value of r3 will be 5. I don't think this
is very common outside VLIW systems, but there are some famous ones that have
used it, like the Transmeta Crusoe. And the Mill guys are hoping to use it,
basically, though they're calling it phasing and it's not _exactly_ the same
thing.

------
dogma1138
Some benchmarks (posted in the AnandTech comments).

[https://raw.githubusercontent.com/EntityFX/anybench/master/d...](https://raw.githubusercontent.com/EntityFX/anybench/master/doc/results.xlsx)

[https://github.com/EntityFX/anybench/tree/master/results](https://github.com/EntityFX/anybench/tree/master/results)

~~~
bufferoverflow
Wow, quite pathetic results.

~~~
AlEinstein
The floating point results are suspiciously good compared to the rest of the
results. I wonder if they have some good VLIW instructions for floating
point, or if it's just an error in the spreadsheet.

~~~
Symmetry
VLIWs tend to do well on programs with memory access patterns that are
predictable at compile time, which tends to correlate with
floating-point-heavy code.

------
eqvinox
Just from a "proliferation and evaluation of technology" point of view, I
find it nice that VLIW is getting a bit of a refresh (as an application
processor, as opposed to GPUs or DSPs). It's been quite some time since the
Itanic sank, and compilers have evolved... even if it doesn't go far, this is
a nice opportunity to evaluate how well this can do in 2020.

~~~
Symmetry
We had NVidia keeping Transmeta's lineage alive in their automotive SoCs
until quite recently, but now they seem to have switched over to Arm's
latest. Too bad; the combination of code-morphing and an exposed-pipeline
VLIW design seemed like it could have been a good one, though scheduling
around variable-delay memory accesses remains a challenge.

------
kick
For those of you in Moscow, if you go to the Yandex Museum you can play with
an Elbrus box. It's really neat! Pretty fast, and has an entire Linux
environment running on it.

------
amq
Another benchmark comparison:
[https://translate.google.com/translate?sl=auto&tl=en&u=https...](https://translate.google.com/translate?sl=auto&tl=en&u=https%3A%2F%2Fhabr.com%2Fru%2Fpost%2F501588%2F)

------
tester34
It's great to see that countries are starting to become more and more
technologically independent.

Losing dependency on Silicon Valley may be another step after becoming
hardware independent, or those efforts can even run in parallel.

I think I'd want to see tech hubs like London or Hong Kong hold a
significant part of the "giant corps" market.

~~~
iagovar
You can't be independent when your only realistic fab is TSMC.

There are some fabs here in Europe, but all of them are WAY behind TSMC, be
it Infineon, STM or even GlobalFoundries, if they are still running.

Designing hardware is nice; manufacturing it is the real problem.

~~~
arianvanp
The biggest fab equipment company in the world is Dutch [0]. TSMC, Intel and
friends are customers of theirs and use _their_ equipment and IP.

So the intellectual property seems to be there? It's just that we don't have
any actual factories in the EU.

TSMC seems wholly dependent on Europe here.

[0] -
[https://en.wikipedia.org/wiki/ASML_Holding](https://en.wikipedia.org/wiki/ASML_Holding)

~~~
mrweasel
Wow, but does that mean that TSMC's 7nm technology is really ASML's 7nm
technology, or how does that work?

~~~
wmf
No, developing a semiconductor manufacturing process requires far more than
the equipment.

~~~
mrweasel
I don't know how to make a semiconductor, but does that mean that Intel
couldn't make a 7nm chip even if you gave them access to TSMC's fab, because
they don't know how to design one?

~~~
deepnotderp
Assuming no knowledge of the process, yup!

Although it's maybe complicated by the fact that Intel has active development
on 7nm, so they might be able to pick up hints from an existing 7nm line.

------
BruceEel
So x86/64 support via BT, cool. But I couldn't readily find any (English
language) docs describing the _native_ instruction set, have you folks seen
any?

~~~
AlEinstein
The google translation is not bad:

[https://translate.googleusercontent.com/translate_c?depth=1&...](https://translate.googleusercontent.com/translate_c?depth=1&nv=1&pto=aue&rurl=translate.google.com&sl=auto&sp=nmt4&tl=en&u=http://ftp.altlinux.org/pub/people/mike/elbrus/docs/elbrus_prog/html/chapter10.html&usg=ALkJrhhrsISU2r8IfGYdVCux0QkhRjbErA)

~~~
BruceEel
Indeed, thanks!

------
peter_d_sherman
>"All of the world’s major superpowers have a vested interest in building
their own custom silicon processors. The vital ingredient to this allows the
superpower to wean itself off of US-based processors, guarantee there are no
supplemental backdoors, and if needed add their own."

Any major company that deals in anything Internet, Web or Cloud related --
also has a "vested interest in building their own custom silicon processors",
for exactly this reason -- security.

Some have realized this (Apple, Google, etc.), and have the resources to do
this.

Others have realized this and do not have the resources to do this (smaller
companies).

Still others have not realized this yet (thinking that software virus
protection will solve problems which are in hardware), and are still in the
process of waking up.

Prediction: We're going to see a lot of new CPUs and CPU designs by many
players in the next 20 years...

------
misterhtmlcss
I'm surprised they aren't just going with ARM-designed chips and moving to a
Linux system. That would seem to give them scale, pricing,
availability/compatibility, and also, if they support open source with
financial contributions, a really strong level of application security out of
the box.

State security is important, but corporate security is critical as a first
line of defense, because without those big successful companies and their
contributions in taxes, R&D and tech support, the state loses flexibility and
top talent. Then a state will lose ground in preserving its independence.

~~~
kick
Elbrus is a more interesting architecture and presents many benefits over
existing architectures; allowing it to be lost would be bad for both technical
and political reasons.

~~~
drivebycomment
I don't know about the political aspect, but I don't think Elbrus is
interesting technically. Frankly, it's just another dead-end VLIW
architecture. VLIW just isn't interesting or even relevant for
general-purpose computing or even high-performance computing. The DSP market
is the only niche where VLIW has found meaningful usage, and even there the
advantage is not huge by any means.

~~~
kick
It's not "just" another VLIW architecture, and hearing that take almost
assures me that you haven't looked into Elbrus.

~~~
drivebycomment
I have worked with MCST people (though that was a very long time ago at this
point). My view of Elbrus is that it's kept alive by politics rather than
technical merit.

The core challenge of modern high-performance computing is memory/cache
latency - i.e. whichever architecture can generate the most outstanding cache
misses at all levels of the cache hierarchy as quickly as possible will
perform best.

Between superscalar and SIMT (and lots of SIMD), VLIW has no design space
left for high-performance computing, as superscalar and SIMT are simply more
flexible (superscalar is better for single-thread performance, and SIMT for
highly parallel streaming workloads). SIMD also didn't help, since it's
available for both SIMT and superscalar, negating part of the VLIW advantage.

Case in point: GPUs are one area where the workload is better suited to
VLIW. Yet AMD moved away from VLIW, as their new architectures are not VLIW.
nVidia has been SIMT for a long time.

The niche where VLIW still has some value is DSP, where the overhead of extra
die space for superscalar becomes significant and the workload is
predictable.

------
Symmetry
No surprise that it's got a large L1 instruction cache given that it's a
VLIW. Those aren't known for small code sizes, though I wonder why we don't
tend to see variable-length VLIW architectures.

------
MintelIE
Where can I get one of these? Sick of involuntarily being included in the
Talpiot Program.

