
Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM [pdf] - espeed
https://people.inf.ethz.ch/omutlu/pub/ambit-bulk-bitwise-dram_micro17.pdf
======
BenoitP
Very interesting. At first, I thought this was pushing the processing closer
to the data with some new special CPU logic inside the DRAM. I'm not sure,
but it seems to work with out-of-the-box commodity memory.
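
For context, the kernel being moved into the DRAM arrays is plain bulk
bitwise work, something like this (illustrative C, not from the paper):

    #include <stddef.h>
    #include <stdint.h>

    /* e.g. ANDing two bitmap-index columns: on a CPU both inputs and
     * the output stream over the memory bus; Ambit performs the AND
     * inside the DRAM banks so only commands cross the channel. */
    void bulk_and(uint64_t *dst, const uint64_t *a,
                  const uint64_t *b, size_t n) {
        for (size_t i = 0; i < n; i++)
            dst[i] = a[i] & b[i];
    }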

The fact that Intel and NVIDIA researchers are among the paper's authors is
very exciting too. Does this mean we could see these improvements with a
simple software update?

This end of 2017 and start of 2018 has been very weird. We got an
industry-wide performance degradation with Meltdown, and now we are seeing
an industry-wide performance upgrade from another hardware trick!

~~~
deepnotderp
Huh? If I'm not mistaken, Ambit is not a new development. I remember reading
about it (or at least about some other commodity-DRAM bitwise accelerator).

~~~
puzzle
Twenty years ago, Sun had special RAM in their Elite3D cards to do write-only
alpha blending and z-buffering (saturated updates) instead of the much slower
read-modify-write: one PCI write instead of one read + CPU computation + one
write. That might have been UPA or even SBus instead of PCI, but you get the
idea. ATI and E&S later used it too.

[http://www.michaelfrankdeering.com/Projects/HardWare/p3DRAM/...](http://www.michaelfrankdeering.com/Projects/HardWare/p3DRAM/p3DRAM.html)

This looks like a variant of the Sun/Mitsubishi work, after a quick skim.
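
The trick was folding the read-modify-write into a single host write. What
the 3DRAM did internally was roughly this per-pixel update (illustrative C;
the real chip also did saturated alpha blends):

    #include <stddef.h>
    #include <stdint.h>

    /* With plain VRAM the host runs this itself: a read over the bus,
     * a compare, then the write-backs. 3DRAM accepted one write of
     * (z, color) and ran the compare/update on-chip instead. */
    void z_write_rmw(uint32_t *zbuf, uint32_t *fb, size_t i,
                     uint32_t z, uint32_t color) {
        if (z < zbuf[i]) {   /* read + compare over the bus */
            zbuf[i] = z;     /* write-back #1 */
            fb[i]   = color; /* write-back #2 (blend omitted) */
        }
    }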

------
mkj
Having RAM chips that can do bzero() internally would give a general speedup
across a pretty wide range of applications. It might also help security by
making zero-on-free cheap.
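
A minimal sketch of where that would plug in, assuming a glibc-style
allocator (malloc_usable_size() is a glibc extension; a hardened version
would use explicit_bzero() so the compiler can't elide the store):

    #include <malloc.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical zero-on-free wrapper. Today the memset costs a full
     * pass over the buffer through the CPU and memory bus; done inside
     * the DRAM it would be nearly free. */
    void free_zeroed(void *p) {
        if (p == NULL)
            return;
        memset(p, 0, malloc_usable_size(p)); /* candidate for in-DRAM bzero */
        free(p);
    }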

~~~
reacweb
malloc() and free() could also be implemented in hardware.

------
Growlzler
Reminiscent of HPE's comment at the In-Memory Compute Summit in September
2017 that GSI's PIM paper would be interesting with a few changes. Persistent
memory will also most probably be announced in the May time frame, which
GSI's approach requires, since it needs a non-destructive read.

------
deepnotderp
As an unrelated side note, I feel really bad for Onur Mutlu: his thesis
work, runahead execution, basically became a giant attack vector thanks to
Spectre (and perhaps also Meltdown).

~~~
sitkack
I don't think speculation is dead, but I do think the MMU/L1/L2 system will
be turned on its head.

It needed to be anyway for cloud workloads. Peer guest VMs should never have
been able to flush my caches. Modern CPUs are still designed for DOS with
some extra cruft tacked on the side.

~~~
xenadu02
Meltdown is fairly well handled (per-process tagging of TLB entries plus
separate kernel vs. userspace top-level mappings solve the problem without
much perf impact).

For Spectre, the problem is ultimately a failure to completely roll back
processor state. I don't want to downplay the complexity, but the state of
the cache needs to be tracked, and if speculative execution loaded cache
lines where the speculation turned out to be incorrect, those lines need to
be thrown away, effectively rolling back the cache. That shouldn't have a
huge performance impact, but it seems to require new CPU designs.
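
For reference, the gadget class in question is the variant-1 bounds-check
bypass; a minimal sketch (names illustrative, after Kocher et al.):

    #include <stddef.h>
    #include <stdint.h>

    uint8_t array1[16];
    size_t  array1_size = 16;
    uint8_t array2[256 * 4096]; /* probe array: one page per byte value */
    uint8_t temp;               /* sink so the load isn't optimized away */

    /* If the branch mispredicts on an attacker-chosen x, the CPU
     * speculatively reads array1[x] out of bounds and fetches the
     * array2 line indexed by the secret byte. Squashing the speculation
     * restores the registers but leaves that line cached, which is
     * exactly the state the comment above wants rolled back. */
    void victim(size_t x) {
        if (x < array1_size)
            temp &= array2[array1[x] * 4096];
    }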

~~~
rwmj
Intel CAT (Cache Allocation Technology) and similar features help a bit by
subdividing the cache between trust zones. So if you have two VMs rented by
different tenants and you have set up CAT to isolate them, they shouldn't be
able to interfere with each other. Of course, CAT only covers the last-level
cache, so there may still be attacks on stuff in L1/L2.

Edit: Paper on this subject:
[http://palms.ee.princeton.edu/system/files/CATalyst_vfinal_c...](http://palms.ee.princeton.edu/system/files/CATalyst_vfinal_correct.pdf)
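
On Linux, CAT is driven through the resctrl filesystem. A rough sketch of
walling a tenant's vCPU threads into a slice of L3 (the mask, paths, and
PID are illustrative; requires CAT hardware and a mounted resctrl):

    #include <stdio.h>
    #include <sys/stat.h>

    int main(void) {
        /* one resctrl group per trust zone */
        mkdir("/sys/fs/resctrl/tenantA", 0755);

        /* give the group 4 of the L3 ways on cache domain 0 */
        FILE *f = fopen("/sys/fs/resctrl/tenantA/schemata", "w");
        if (!f) return 1;
        fprintf(f, "L3:0=f\n");
        fclose(f);

        /* move the VM's vCPU thread (illustrative PID) into the group */
        f = fopen("/sys/fs/resctrl/tenantA/tasks", "w");
        if (!f) return 1;
        fprintf(f, "%d\n", 12345);
        fclose(f);
        return 0;
    }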

