
New MIPS64-based Loongson processors break performance barrier - alexvoica
http://blog.imgtec.com/mips-processors/loongson-mips64-processors-performance-barrier
======
rasz_pl
Just the other day someone (with a name in Chinese characters) with a
loongson.cn email started dropping MPlayer patches (fixes, optimisations) on
the official MPlayer mailing list.

I don't remember ever seeing that from Chinese SoC vendors. They usually
ninja-patch internally and ship a half-working garbage binary (I'm looking at
you, Rockchip cocks) to selected favourite vendors.

~~~
kryptiskt
> They usually ninja-patch internally and ship a half-working garbage binary
> (I'm looking at you, Rockchip cocks) to selected favourite vendors.

And if you actually do get their source, you will likely also get your own
private hell in trying to replicate their Windows XP-based build environment.

------
creshal
Interesting benchmark results:
[http://blog.imgtec.com/wp-content/uploads/2015/09/64-bit-CPU...](http://blog.imgtec.com/wp-content/uploads/2015/09/64-bit-CPU-performance-AMD-Intel-ARM-MIPS.png)

• Both MIPS and ARM have surpassed AMD in the low-power SoC area (the only
one where AMD is even remotely competitive right now), and are comparable to
AMD's desktop/server products

• A (2? 3?) year old Intel architecture still beats all three by a wide
margin. And the recent Skylakes are again some 10% more efficient.

~~~
Someone
Uninteresting, I would say. Give me performance/Watt, performance/dollar or
performance, period, not performance/GHz, or delve into details explaining
what makes this CPU do more per cycle.

For example, this CPU runs at 1.5GHz, the Cortex A57 at around 2GHz. There
goes quite a bit of the difference in speed.
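
To put a number on that clock gap, here is a back-of-the-envelope sketch in Python. The function names are made up for illustration, and the model assumes absolute performance is simply per-GHz score times clock, which ignores memory-bound behaviour.

```python
def clock_advantage(fast_ghz: float, slow_ghz: float) -> float:
    """Ratio of clock speeds between two chips."""
    return fast_ghz / slow_ghz


def absolute_score(score_per_ghz: float, clock_ghz: float) -> float:
    """Naive model: absolute performance = per-GHz score * clock."""
    return score_per_ghz * clock_ghz


# Clocks quoted above: ~2 GHz for the Cortex A57, 1.5 GHz for this CPU.
ratio = clock_advantage(2.0, 1.5)
print(f"clock advantage for the A57: {ratio:.2f}x")
```

Under that model, the slower-clocked chip needs about a third more work per cycle just to break even in absolute terms.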

~~~
alexvoica
I think I address some of your points in the article. I specify that overall
peak power consumption for the chip is 30W - this is for an octa-core
configuration. This means one CPU roughly consumes 3.5W (if you take out the
coherency manager and fabric). Then I also referenced the SPEC CPU2000
performance number at 1GHz. From that you can easily calculate performance per
W. If you want more technical details about the architecture, they do have a
user manual here:
[http://www.loongson.cn/uploadfile/cpumanual/Loongson3B1500_p...](http://www.loongson.cn/uploadfile/cpumanual/Loongson3B1500_processor_user_manual_P1_v1.5.pdf)

I also state that a new processor will be released next year and it will run
at above 2GHz which should solve the difference in frequency.
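
The arithmetic described above can be sketched roughly as follows. This is only an illustration: the function names are hypothetical, the SPEC score is left as a parameter because the number itself is in the article rather than in this comment, and scaling the 1GHz score linearly with clock is an optimistic assumption.

```python
def implied_uncore_watts(chip_watts: float, cores: int, core_watts: float) -> float:
    """Power left over for the coherency manager and fabric."""
    return chip_watts - cores * core_watts


def perf_per_watt(score_at_1ghz: float, clock_ghz: float, core_watts: float) -> float:
    """Per-core score per watt, assuming the score scales linearly with clock."""
    return score_at_1ghz * clock_ghz / core_watts


# Figures from the comment: 30W peak, octa-core, roughly 3.5W per CPU.
uncore = implied_uncore_watts(30.0, 8, 3.5)
print(f"implied uncore budget: {uncore:.1f} W")

# SCORE_AT_1GHZ is a placeholder, NOT a real measurement; plug in the
# SPEC CPU2000 number from the article before reading anything into this.
SCORE_AT_1GHZ = 1.0
print(perf_per_watt(SCORE_AT_1GHZ, clock_ghz=1.0, core_watts=3.5))
```

Note that 30W split evenly over eight cores would be 3.75W each, so the 3.5W figure implies about 2W for everything that is not a core.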

------
Symmetry
_Both Loongson-3A2000 and 3B2000 are 4-way superscalar processors built on a
9-stage, super-pipelined architecture with in-order execution units, two
floating-point units, a memory management unit, and an innovative crossbar
interconnect._

Wow, that's pretty wide for an in order processor that isn't VLIW.

EDIT: All the other references to the Godson/Loongson 3 series I can find say
that it's out of order in general - at least those articles that didn't repeat
that phrase verbatim (I sense a press release). And you can see the reorder
queue in the diagram. Unless they're doing something like the Atom did where
the ALUs are in order but the AGUs are out of order, but then why have
register renaming?

I found some details here[1] making it very clear this is out of order and
what happens at each pipeline stage and some (pretty bad) benchmark results
here[2].

[1][http://www.7-cpu.com/cpu/Loongson.html#Loongson3A](http://www.7-cpu.com/cpu/Loongson.html#Loongson3A)

[2][http://www.7-cpu.com/](http://www.7-cpu.com/)

~~~
rasz_pl
As soon as I read 'in-order' I knew it won't be pretty IRL. It might benchmark
ok, but real world code is another story. Remember IA64? Itanium was also
in-order, relying heavily on compilers.

~~~
hga
Well, IA64 is not just in-order, but Very Long Instruction Word (VLIW): per
Wikipedia, a 128-bit bundle holds 3 instructions. If your compiler is not
smart enough (hmmm, especially if your language is too low level, like C/C++),
it sure looks like "just in time" out-of-ordering will beat VLIW in keeping
your execution engines running. And surely caching makes a big difference
here.
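
For the curious, the IA-64 bundle is laid out as a 5-bit template in the low bits followed by three 41-bit instruction slots (5 + 3 * 41 = 128). A minimal Python sketch of pulling one apart, with `split_bundle` and `make_bundle` being made-up helper names and the bundle treated as a plain 128-bit integer:

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOT_MASK = (1 << SLOT_BITS) - 1


def split_bundle(bundle: int) -> tuple[int, tuple[int, int, int]]:
    """Split a 128-bit IA-64 bundle into (template, (slot0, slot1, slot2))."""
    if not 0 <= bundle < (1 << 128):
        raise ValueError("bundle must fit in 128 bits")
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = tuple(
        (bundle >> (TEMPLATE_BITS + SLOT_BITS * i)) & SLOT_MASK
        for i in range(3)
    )
    return template, slots


def make_bundle(template: int, slots: tuple[int, int, int]) -> int:
    """Inverse of split_bundle, used here only to build a test value."""
    bundle = template & ((1 << TEMPLATE_BITS) - 1)
    for i, slot in enumerate(slots):
        bundle |= (slot & SLOT_MASK) << (TEMPLATE_BITS + SLOT_BITS * i)
    return bundle


# Arbitrary values, not real instruction encodings.
print(split_bundle(make_bundle(0x10, (1, 2, 3))))
```

The template is what tells the hardware which execution unit types the three slots go to, which is why so much of the scheduling burden lands on the compiler.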

I don't know how to compare a MIPS family in-order superscalar with Intel's P6
style out-of-order. ARM out-of-order cores are supposed to be quite a bit
faster than their in-order designs IRL, aren't they?

~~~
sliken
Most in-order architectures approximately double their performance per clock
when they switch to out-of-order.

Examples include Cortex-A57 vs A53 and Alpha 21264 vs Alpha 21164, as well as
many of the Atoms until very recently vs Core 2 and similar chips.
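
A toy illustration of where that gain comes from, in Python. This is not a model of any real core: it assumes a single-issue machine, an unlimited out-of-order window, and a made-up six-instruction program where a 10-cycle load is followed by one dependent instruction and four independent ones.

```python
# Each instruction is (latency_in_cycles, [indices of instructions it depends on]).
Program = list[tuple[int, list[int]]]


def in_order_cycles(prog: Program) -> int:
    """Single-issue, in-order: a stalled instruction blocks everything behind it."""
    done: list[int] = []
    next_issue = 0
    for latency, deps in prog:
        start = max([next_issue] + [done[d] for d in deps])
        done.append(start + latency)
        next_issue = start + 1
    return max(done)


def out_of_order_cycles(prog: Program) -> int:
    """Single-issue, out-of-order: issue the oldest instruction whose inputs are ready."""
    done: dict[int, int] = {}
    pending = set(range(len(prog)))
    cycle = 0
    while pending:
        ready = [
            i for i in sorted(pending)
            if all(d in done and done[d] <= cycle for d in prog[i][1])
        ]
        if ready:
            i = ready[0]
            done[i] = cycle + prog[i][0]
            pending.remove(i)
        cycle += 1
    return max(done.values())


# A slow load, one consumer of the load, then four independent adds.
prog: Program = [(10, []), (1, [0]), (1, []), (1, []), (1, []), (1, [])]
print(in_order_cycles(prog), out_of_order_cycles(prog))  # 15 11
```

The out-of-order version hides the independent adds under the load latency. How close real code gets to the "double" figure depends on how much independent work is actually available, which this toy makes no attempt to capture.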

With the various ARM derivatives from Qualcomm, Samsung, and Apple, a sub-2
GHz in-order CPU doesn't sound particularly impressive. Especially when they
talk about "under 30 watts"; that's Intel territory, not tablets, let alone
smartphones.

Trying to emulate ARM and beat ARM on price/perf or perf/watt seems like a
long shot with an in-order MIPS64. Similarly, beating Intel at price/perf or
perf/watt seems exceedingly unlikely with lower IPC and a MUCH slower clock.

The only thing that makes these MIPS64 chips look good is the even slower
previous generation of MIPS.

~~~
alexvoica
I agree on the out-of-/in-order commentary. But to paraphrase, with great
performance comes great power consumption.
[http://www.anandtech.com/show/9330/exynos-7420-deep-dive/5](http://www.anandtech.com/show/9330/exynos-7420-deep-dive/5)

I perceive the situation on binary translation in a slightly different way. I
think the aim here is to perhaps ensure that some legacy code written for
x86/ARM also runs alongside the apps written for MIPS. So they are not trying
to compete, but ensure they have a broad(er) ecosystem.

~~~
hga
_with great performance comes great power consumption_

Heh. And I suppose that's intrinsically true, with the added shadow registers
and logic to drive it all?

------
lallysingh
So let's not forget the larger impact here:

\- It's a CPU architecture whose primary R&D and fab are outside the US, and
in fact in China.

\- As people move to mobile, they could just as easily move to Loongson as to
ARM.

\- Hey, guess where there's a giant mobile market?

Separately, can these run Irix? I miss Irix.

~~~
photosinensis
They might be able to run Irix. But why would you use an operating system that
hasn't been updated in 9 years? It's not going to have sufficient crypto
support to be useful.

That said, the smartphone market in China has been dominated by ARM chips. The
real question is whether there's enough market demand for a MIPS-based
architecture right now, or if the Chinese government can drive demand away
from ARM and towards MIPS.

~~~
yellowapple
> But why would you use an operating system that hasn't been updated in 9
> years?

The same reason why people still run Apple ][s and Commodore 64s and old DOS
machines: nostalgia.

I mean, if I were to try and build myself a _real_ Jurassic Park workstation,
it would be a disservice for it to be running Linux on an x86. No, I want the
real deal: IRIX on MIPS, complete with an animated Nedry confronting
unauthorized users with "Ah ah ah! You didn't say the magic word! Ah ah ah!".

And yes, I'm aware of jurassicsystems.com, but it just doesn't feel the same.
It also doesn't have `fsn`, so I can't zoom around my filesystem in 3-D while
muttering that "it's a UNIX system; I _know_ this!".

------
tinco
Anyone know if these CPUs require chipsets with locked-down firmware? Would
be really cool if we could get a modern RMS-proof laptop again.

~~~
rogerbraun
The most modern RMS-proof laptop is the Thinkpad X200 with Libreboot. It's
probably still faster than this chip.

------
alexvoica
I don't know if people are aware of this but Richard Stallman (rms) uses a
Loongson-based laptop from Lemote.

~~~
scintill76
He moved to a Thinkpad X60:
[https://stallman.org/stallman-computing.html](https://stallman.org/stallman-computing.html)

~~~
alexvoica
Oh, that must have happened recently. Thanks for letting me know!

