
Neural network spotted deep inside Samsung's Galaxy S7 silicon brain - dragonbonheur
http://www.theregister.co.uk/2016/08/22/samsung_m1_core/
======
hcs
My Computer Architecture professor Daniel Jimenez worked on (invented?)
something like this:

"Dynamic Branch Prediction with Perceptrons" (PDF)

[http://hpca23.cse.tamu.edu/taco/pdfs/hpca7_dist.pdf](http://hpca23.cse.tamu.edu/taco/pdfs/hpca7_dist.pdf)
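For anyone who hasn't seen the technique, here is a minimal Python sketch of a perceptron branch predictor in the style of that paper. Each table entry holds one small weight vector. The entry is selected by the branch address, and its dot product with the global taken/not-taken history gives the prediction. Training happens only on a misprediction or when the output magnitude is below a threshold. The table size, history length, 8-bit saturating weights and the "PC modulo table size" hash are illustrative assumptions. They are not the paper's tuned configuration and certainly not Samsung's M1 design.

```python
class PerceptronPredictor:
    """Sketch of a global-history perceptron branch predictor.

    Table size, history length and weight width are illustrative choices.
    """

    def __init__(self, num_perceptrons=64, history_len=8, weight_bits=8):
        self.n = num_perceptrons
        # Training threshold theta = floor(1.93 * h + 14), as in the paper.
        self.theta = int(1.93 * history_len + 14)
        # Saturating signed weights, e.g. -128..127 for 8 bits.
        self.wmax = 2 ** (weight_bits - 1) - 1
        self.wmin = -(2 ** (weight_bits - 1))
        # One bias weight w[0] plus one weight per history bit, per entry.
        self.weights = [[0] * (history_len + 1) for _ in range(num_perceptrons)]
        # Global history as +1 (taken) / -1 (not taken); last entry is newest.
        self.history = [-1] * history_len

    def _output(self, pc):
        w = self.weights[pc % self.n]  # hypothetical hash: PC modulo table size
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))

    def predict(self, pc):
        return self._output(pc) >= 0  # non-negative output means "taken"

    def update(self, pc, taken):
        y = self._output(pc)
        t = 1 if taken else -1
        w = self.weights[pc % self.n]
        # Train on a misprediction, or when the output is not confident enough.
        if (y >= 0) != taken or abs(y) <= self.theta:
            w[0] = max(self.wmin, min(self.wmax, w[0] + t))
            for i, xi in enumerate(self.history, start=1):
                w[i] = max(self.wmin, min(self.wmax, w[i] + t * xi))
        # Shift the actual outcome into the global history.
        self.history = self.history[1:] + [t]


# Usage: a single branch that alternates taken / not taken.
pred = PerceptronPredictor()
misses = 0
for i in range(300):
    taken = i % 2 == 0
    if i >= 200 and pred.predict(0x40) != taken:
        misses += 1
    pred.update(0x40, taken)
print(misses)  # 0: the alternating pattern is learned long before step 200
```

The alternating pattern is one that a simple per-branch 2-bit counter keeps getting wrong. The perceptron learns it because a single history weight (the one for the most recent outcome) captures "do the opposite of last time".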

~~~
akash_m
I read the paper description and immediately jumped to the conclusion that you are
Calvin's student :) We had to study this paper during one of my classes at
UTCS.

------
sehugg
I don't see why branching is so hard -- it's two cycles to test the branch,
one more if the branch is taken, one more if it crosses a page boundary!
/6502joke
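Joke aside, that is the actual timing rule for the 6502's relative branches, and it is short enough to write down. Below is a small Python sketch (the function name and interface are just illustrative). The page-crossing test compares the high byte of the address after the two-byte branch instruction with the high byte of the target.

```python
def branch_cycles(addr, offset, taken):
    """Cycles for a 6502 relative branch (BNE, BEQ, ...) at address `addr`.

    `offset` is the signed displacement byte, -128..127.
    """
    if not -128 <= offset <= 127:
        raise ValueError("offset must fit in a signed byte")
    if not taken:
        return 2                         # opcode + offset fetch, flag tested
    next_pc = (addr + 2) & 0xFFFF        # PC after the two-byte instruction
    target = (next_pc + offset) & 0xFFFF
    if (next_pc & 0xFF00) == (target & 0xFF00):
        return 3                         # taken, same page
    return 4                             # taken, extra cycle to fix the high byte


print(branch_cycles(0x1000, 5, False))    # 2
print(branch_cycles(0x1000, 5, True))     # 3: 0x1002 -> 0x1007, same page
print(branch_cycles(0x10F0, 0x20, True))  # 4: 0x10F2 -> 0x1112, crosses a page
print(branch_cycles(0x1000, -4, True))    # 4: 0x1002 -> 0x0FFE, crosses backwards
```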

~~~
pawadu
Well... for the size of its pipeline (1?) and the amount of speculative
execution going on (0), the nonexistent branch prediction of the 6502 was
basically optimal.

~~~
Someone
The 6502 has a tiny bit of pipeline/speculative execution. The CPU fetches the
next instruction before the current one has finished.

[http://www.atarihq.com/danb/files/64doc.txt](http://www.atarihq.com/danb/files/64doc.txt):

_"If an instruction does not store data in memory on its last cycle, the
processor can fetch the opcode of the next instruction while executing the
last cycle."_

The 6502 also sometimes does excess reads and writes. For example, _"Read-modify-write
instructions (like INC) read the original data, write it back, and then write
the modified data."_
([http://forum.6502.org/viewtopic.php?t=2](http://forum.6502.org/viewtopic.php?t=2)).

One could call that speculative execution, where the processor writes back the
value, just in case adding one to it doesn't increase it, and then, having
discovered that adding one changes the value, writes the right value :-)
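The INC case is easiest to see as a bus trace. Here is a rough Python sketch of the five bus cycles for `INC zp` (opcode 0xE6), following the cycle-by-cycle read-modify-write description in the 64doc file linked above. The flat 64 KiB memory model and the function name are made up for illustration, and CPU flags are ignored.

```python
def inc_zero_page(memory, pc):
    """Execute `INC zp` at `pc`, returning the list of bus cycles it performs."""
    trace = []

    def read(addr):
        trace.append(("R", addr, memory[addr]))
        return memory[addr]

    def write(addr, value):
        memory[addr] = value
        trace.append(("W", addr, value))

    if read(pc) != 0xE6:                   # cycle 1: fetch opcode
        raise ValueError("not an INC zero-page instruction")
    zp = read((pc + 1) & 0xFFFF)           # cycle 2: fetch zero-page address
    value = read(zp)                       # cycle 3: read the original data
    write(zp, value)                       # cycle 4: write it back unchanged
    write(zp, (value + 1) & 0xFF)          # cycle 5: write the incremented value
    return trace


mem = bytearray(0x10000)
mem[0x0200:0x0202] = bytes([0xE6, 0x10])   # INC $10
mem[0x10] = 0x41
for kind, addr, value in inc_zero_page(mem, 0x0200):
    print(kind, hex(addr), hex(value))
```

This prints `R 0x200 0xe6`, `R 0x201 0x10`, `R 0x10 0x41`, `W 0x10 0x41`, `W 0x10 0x42`. The fourth line is the "just in case" write of the unchanged value.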

~~~
phire
The 6502 had no concept of not doing a memory operation.

Either it's doing a read or it's doing a write; there is no option to leave the
bus unused. So the 6502 spams dummy reads, and occasionally dummy writes,
whenever it's doing something internally.

Edit: And that's not the only interesting thing I've learned recently about
the 6502. There is a very good reason why the stack is on page 1 and not some
other constant page: there are only 3 constant values which can be pushed onto
the SB/ADH buses, 0xff, 0x00 and 0x01.

0x00 is loaded into the upper address register for zero page instructions. For
Push operations, it loads 0x01 into the upper address register and puts the
stack pointer + 0xff into the adder, which subtracts one from the stack
pointer. (The 0xff is just the default bus value if nothing else is pulling
it low, so you get it for free.)

For Pop operations, it pushes the same 0x01 into both the upper address
register and the adder, at the same time.

So if they had wanted to put the stack at another constant page, or at a
variable page, they would have needed more gates to provide another constant
somewhere.
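In software terms, the stack arithmetic described above looks roughly like the Python sketch below. It models only the address and stack-pointer math, not the internal buses. The function names are illustrative, and `& 0xFF` stands in for the 8-bit adder dropping its carry.

```python
STACK_PAGE = 0x01   # high address byte for every stack access


def push(memory, sp, value):
    """Store at $01xx, then decrement SP by adding 0xFF in an 8-bit adder."""
    memory[(STACK_PAGE << 8) | sp] = value
    return (sp + 0xFF) & 0xFF


def pull(memory, sp):
    """Increment SP by adding 0x01, then read from $01xx."""
    sp = (sp + 0x01) & 0xFF
    return memory[(STACK_PAGE << 8) | sp], sp


# Adding 0xFF modulo 256 is the same as subtracting 1, including the wrap at 0.
assert all((s + 0xFF) & 0xFF == (s - 1) & 0xFF for s in range(256))

mem = bytearray(0x10000)
sp = 0xFF
sp = push(mem, sp, 0xAB)    # writes $01FF, SP -> 0xFE
sp = push(mem, sp, 0xCD)    # writes $01FE, SP -> 0xFD
value, sp = pull(mem, sp)   # reads $01FE -> 0xCD, SP -> 0xFE
print(hex(value), hex(sp))  # 0xcd 0xfe
```

Both directions use a constant that is already available (0xFF on push, 0x01 on pull), and the page is always 0x01. That fits the point above about why page 1 costs nothing extra.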

------
dingo_bat
This is literally insane. I've never heard of a neural net branch predictor.

Edit: my layman knowledge about branch prediction and neural networks is
showing its gross inadequacy :/

~~~
nilved
e: was wrong about everything

~~~
xlayn
If you have to pull out an army of cores to fight two, the really impressive
architecture belongs to Apple.

OTOH I see this as optimizing hardware to make up for the limits that software,
paradigms and devs impose.

~~~
dingo_bat
As long as the die area is the same, I see it as a fair fight.

~~~
pawadu
In fact, for the same die size, Android hardware has been adapted to work
better with the way Android works (many small processes working together).

This is maybe not the best approach for certain games or benchmarks, but for
normal use this has proven to be great.

------
Jerry2
Even with all those neural net advantages, the dual-core, 2GB RAM iPhone 6s
beats the octa-core, 4GB RAM Galaxy Note 7:

[http://www.redmondpie.com/galaxy-note-7-vs-iphone-6s-real-world-speed-test/](http://www.redmondpie.com/galaxy-note-7-vs-iphone-6s-real-world-speed-test/)

Video (3:29):
[https://www.youtube.com/watch?v=3-61FFoJFy0](https://www.youtube.com/watch?v=3-61FFoJFy0)

PS: I highly recommend you check out the video... the iPhone completely
obliterated the Note 7.

~~~
viraptor
That test is relevant for the end user, but it doesn't make sense as a
hardware test. It's a different system, on different hardware, with different
apps (that happen to share the same name), and essentially what you're
comparing is resource loading strategies. The iPhone is better at that; great.

But for a hardware test, I'd expect a single running app, executing a shared
native codebase, in airplane mode with no disruptions, measuring a specific
part of the hardware. And that's not even close to measuring before/after
improved branch prediction, which could be useful in both phones.

~~~
ddebernardy
> That test is relevant for the end user, but it doesn't make sense as a
> hardware test.

Err? It seems to me that the only tests that make any sense are those that
count for end users. Everything else is pointless. It doesn't matter if it's
the same system or code base or whatever other engineering fetish.

Users who compare their phones to see which goes faster look at how the same
app opens and runs on two separate phones. Which incidentally is what this
test is about. It's the only sensible test you can do.

~~~
MaulingMonkey
> Everything else is pointless.

Better understanding individual components lets engineers build better overall
systems. Millions of choices and decisions go into something like a phone,
most of which can't have their result easily measured by just looking at the
end result. But if enough of them are good, you end up with the iPhone.

If enough of them are bad, you end up with mass recalls and a class action
lawsuit when your phones catch fire while charging overnight, because your
engineers didn't think to test that specific use case, didn't test their
capacitors, didn't test their QA process, didn't unit test their battery
management code, or otherwise failed to indulge their "fetish" (read: job).

Proper hardware tests might be pointless to end users, but that doesn't make
them pointless.

------
AstralStorm
This is not new; AMD has been using perceptron-based predictors, probably even
since before Bulldozer. They are good, but harder to optimise for than standard
rule-based heuristic ones.

~~~
sitkack
They should expose the NN coefficients to the end user so that one can at
least start from a known-better value. This problem would seem analogous to
JIT warmup.

------
pepijndevos
The latter half of the article makes me wonder to what extent the cycle count
of instructions is specified by the architecture, or left up to a specific
implementation.

------
mrfusion
Hmm now I'm wondering if the brain does something like this? Might the brain
need branch prediction? Interesting line of thought.

~~~
sbashyal
I have a background in Machine Learning and it is my belief that the mind does
use branch prediction. Take for example a scenario where you are walking down
the stairs in your house. Your brain already predicts that you are about to land
on the next step. Now if a step is lower than usual, the prediction fails and
you immediately focus back on the climbing-down process to see what has
happened and carry out whatever fallback actions are needed to minimize
disruption.

~~~
randomacct44
Reminds me of what happens when you step onto an escalator that isn't working
(the broken escalator phenomenon):

[https://en.wikipedia.org/wiki/Broken_escalator_phenomenon](https://en.wikipedia.org/wiki/Broken_escalator_phenomenon)

------
goombastic
Why is it still slower than the OnePlus 3 and HTC 10?

~~~
Grazester
Samsung bloat?

------
ForFreedom
The iPhone excels because the OS and HW are designed by one company, unlike
Samsung, where the HW and OS come from different companies; on top of Android
you also get Samsung's bloatware.

iOS works well with the iPhone simply because it is fine-tuned for the
product. On the Android side, the OS is tuned to add the bloatware of the
respective HW manufacturers. To add the so-called speed, the HW manufacturers
put in more RAM.

~~~
gambiting
You do realise that Samsung designs and makes most of Apple's stuff, right?

~~~
eknkc
What parts do Apple get exclusively from Samsung and depend on their expertise
these days?

It seems like they go for redundancy, using two suppliers etc. for common parts
and chips, which means these are built to Apple's designs. Samsung just
manufactures them.

I don't follow closely though so I might be wrong.

~~~
pawadu
I think that was the plan after the court fight with Samsung. But after some
quality problems (IIRC with LG displays) Apple had to give up the dual-source
approach for at least some components.

Now, the original iPhone was probably 80% Samsung design and manufacturing, but
these days the design part is closer to 0%.

