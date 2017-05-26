I read the whole article and I don't see anywhere in this claiming it is a new idea. But certainly there is no other processor on the market like this one. So it fits the category of "new type of processor", even if it's for a dedicated use-case - just like those ML optimized processors.
There's a big difference between knowing it is theoretically possible and the value you get from a real-world implementation with real users. That sounds pretty newsworthy to me.
I started reading the comments before I read the article. And, sadly, once I saw your comment I knew exactly who wrote the article. One click and confirmed!
An interesting approach to non-Von Neumann computing is to put ALUs in memory, to take advantage of the fact that DRAMs have far more internal bandwidth than what is exposed in traditional systems: http://researcher.ibm.com/researcher/files/us-leejinho/tvlsi....
However, their business model presumes "Wait until a DRAM manufacturer buys us", which IMO is why nothing's moved forward. DRAM manufacture is low-margin and not really the place to look for this kind of risky introduction to the market. I'd love to see this form of parallelism, and their take on breaking the memory bandwidth wall; it meshes great with the types of problems I work on.
Do you see interposer style designs as linking up terabytes of DRAM? (at least in the near future) All the chips you're talking about are pretty major dies, not really suitable for having many stacks of them in conventionally tightly spaced DIMM arrays to reach such RAM sizes.
Of course, 3d chip advances might throw all current assumptions out the window and change the layout of everything.
I don't think we're going to see a terabyte of DRAM on an interposer for a while (4GB is about the max you can get commercially right now). I'm not sure what you're trying to get at, though; even with logic in DRAM you have to go off chip to get to terabyte levels, so I don't see the advantage.
> All the chips you're talking about are pretty major dies, not really suitable for having many stacks of them in conventionally tightly spaced DIMM arrays to reach such RAM sizes.
The stacking happens in package (<1mm thick). Your DIMM array is going to have to be pretty damn tight for that to matter.
> Of course, 3d chip advances might throw all current assumptions out the window and change the layout of everything.
TSVs are 3D (or "2.5D", depending on the configuration). You should have thrown out the assumptions back in 2014.
Many-core processors with low-latency wide-bus on-chip random access speeds need to scale horizontally as well. Focusing on large chips means you're not going to have very many on a single motherboard, where QPI/HyperTransport/memory-bus style communication can achieve higher and more user-transparent shared memory access performance, compared to offboard communication networking.
The "stacks" I was talking about are just the rows of DIMM slots stacked together in tight proximity, compared to the number of CPUs/GPUs/etc per unit area on a multi-socket motherboard to achieve the same memory footprint. (easily apples and oranges in the current incarnations, admittedly, but focusing on end-user expandability and configuration options)
In my opinion, this type of on-chip fast-RAM model in larger memory systems would best take advantage of splitting up processing to where the memory is, as opposed to a fatter node model, especially when it comes to physical size and inter-chip communication of many chips.
However, if we soon have many-core chips with 32 parallel memory buses leading to in-package 256GB DRAM silicon, it does become more moot.
Yes, I know that 3d silicon stacking, HBM, etc exist now. While they've had some good speed & power advantages, they remain very limited in terms of memory footprint. And of course, the memory size is fixed per such a chip, and there doesn't seem to be a path for many-chip expansion solutions for anything but the top-end enterprise market. I think the Venray model has a simplicity and expandability that keeps the most advantageous tradeoffs.
So with that out of the way, what exactly is the advantage of logic in memory? Because so far nothing you have described is actually an intrinsic advantage.
Logic on DRAM should be simpler & cheaper, which would in the long tail lend itself to more horizontal scaling (and horizontal scaling is currently required to get large memory footprints economically). More elaborate & expensive designs would end up more in fat node designs. There's really no technical difference when looking at many-chip architectures, since the chip package is a black box at that level; the difference is more an economic one.
Yeah, that's the big black box titled "magic happens here" in their diagram. Maybe something like HBM2?
which effectively extended the MTA thread pool to...infinity
Similar to the machine found in TIS-100:
I wonder if that was a translation-to-PR artifact, or if there might be something there to accelerate some of the Java or .NET memory access patterns that we all use.
On top of what access patterns the developers tend to use, there's always the JVM garbage collector (the bane of efficiency) which runs a basic graph algorithm over the entire program's network of pointers. Although, I suspect in many applications the graph in question is small (by comparison to big-data-scale graphs) and throwing heavy machinery like this at it would be overkill.
Then again, maybe I'm not dreaming big enough, and this kind of processor will make the need for cache line locality optimizations, careful instruction scheduling around memory I/O, and half-second freezes for GC a thing of the past?
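To make the "basic graph algorithm" point concrete, here is a minimal sketch of the mark phase of a tracing collector. The toy heap layout (a dict from object id to the ids it references) and the name `mark_reachable` are assumptions for illustration only; real JVM collectors are generational and concurrent and look nothing like this internally, but the core step is still a reachability traversal from the roots.

```python
from collections import deque

def mark_reachable(heap, roots):
    """Return the set of object ids reachable from the roots.

    heap: dict mapping object id -> list of object ids it points to
    (a hypothetical toy heap, not how any real JVM stores objects).
    """
    marked = set(roots)
    worklist = deque(roots)
    while worklist:
        obj = worklist.popleft()
        # Follow every outgoing pointer; each one is a dependent memory read.
        for ref in heap.get(obj, ()):
            if ref not in marked:
                marked.add(ref)
                worklist.append(ref)
    return marked

heap = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": [],
    "d": ["a"],  # cycle back to a root
    "e": ["f"],  # e and f only point at each other: garbage
    "f": ["e"],
}
live = mark_reachable(heap, ["a"])
garbage = set(heap) - live
```

The pointer chasing in the inner loop is the part with poor cache locality, which is why it resembles the graph workloads discussed in this thread.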
> "This non-von-Neumann approach allows one big map that can be accessed by many processors at the same time, each using its own local scratch-pad memory while simultaneously performing scatter-and-gather operations across global memory."
As they collect more data related to this, they'll need better ways to traverse these graphs.
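For anyone unfamiliar with the terms in that quote, a minimal sketch of gather and scatter, assuming a flat list stands in for global memory and a small list stands in for the local scratch-pad. The function names and data are made up for illustration and say nothing about how the actual chip implements this.

```python
def gather(memory, indices):
    """Read scattered locations of global memory into a dense local buffer."""
    return [memory[i] for i in indices]

def scatter(memory, indices, values):
    """Write a dense local buffer back to scattered locations of global memory."""
    for i, v in zip(indices, values):
        memory[i] = v

global_memory = [10, 20, 30, 40, 50, 60, 70, 80]
neighbors = [6, 1, 4]  # e.g. the neighbor ids of one graph vertex

scratch = gather(global_memory, neighbors)  # dense copy in the "scratch-pad"
scratch = [x + 1 for x in scratch]          # local work on the dense copy
scatter(global_memory, neighbors, scratch)  # write results back
```

On a conventional CPU each of those indexed accesses can be a separate cache miss, which is the cost this kind of hardware is trying to hide.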
The SPUs are still designed for sequential processing of memory, just smaller, discrete blocks. The whole chip is orchestrated by a standard von Neumann processor anyway, so that acts as a bottleneck to keeping the SPUs busy.
If anything, we should be doing more of this... create dedicated academic R&D funding streams by taxing established or dying industries in order to publicly explore new fields, and use X Prize-style programs or NASA's COTS/CRS programs to incentivize private commercialization. America needs jobs, what's better than creating new industries?
Plus, public investment in R&D has a good track record. This study from 1980 [1] indicates a $17 return over 18 years for every $1 invested in NASA - 1700% ROI. Returns depend on programs and the administration in office, but similar numbers can be obtained from other programs (NOAA and agricultural, military and medical research, etc). R&D is speculative, but most programs do much better than breaking even.
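A rough back-of-the-envelope on those figures, assuming the $17 is the total amount returned per $1 and that it compounds evenly over the 18 years (both are assumptions; the study may define things differently):

```python
invested = 1.0
returned = 17.0
years = 18

# Gross multiple: total returned per dollar invested.
gross_multiple = returned / invested

# Net ROI, if the $17 includes getting the original dollar back.
net_roi = (returned - invested) / invested

# Equivalent compound annual growth rate over the period.
annualized = gross_multiple ** (1 / years) - 1

print(f"gross multiple: {gross_multiple:.0f}x")
print(f"net ROI: {net_roi:.0%}")
print(f"annualized: {annualized:.1%}")
```

Under those assumptions the headline number works out to roughly 17% per year compounded.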
Versus the taxpayer-funded I.P. being licensed to all parties in the country that funded it with the big corporations being forced to compete on better implementations delivered faster to customers. Big players can try to make billions, small players can innovate on top of same I.P. without a lawsuit, and individuals can even attempt to homebrew their own without a lawsuit.
The stuff that DARPA gets right is building stealth planes and prosthetics; ARPANET was a one-off exception in their history.
Marc Andreessen started off working on an NSF grant at NCSA. One of those things Al Gore encouraged (when people misquote him as inventing the internet).
Edit: also why web browsers are free. Netscape had to compete with a free browser.
Seriously, it's a graph processor. Graph analysis is basically the entire job of a modern intelligence agency.
There's a cold-war level arms race going on in cybersecurity. Russia embarrassed us with their sophisticated cyberwarfare capability in the last election -- they were able to infiltrate both campaigns AND the FBI (they used false information to manipulate Comey into making a statement -- which required them to know how conflicted he was over interference in the Clinton e-mail thing), and undoubtedly were behind the news cycles in the months before the election.
Better/more condensed graph analysis capability, at the scale that the three-letter-agencies use it? That's a strategic advantage. You can bet the Russians are working on something similar. If they haven't already -- throughout the cold war they tended to push the frontier of technology faster than the US, but had trouble mobilizing that advantage because communism was so damn inefficient.
I'm still unclear how "sophisticated" these attacks were. Per the unclassified information, all I've heard mentioned is:
1. social media manipulation
2. traditional media manipulation (RT)
3. email spearphishing
None of the above strikes me as particularly sophisticated, even if it's potentially effective. (I don't know if it was effective or not in the case of the election.)
Of course, I have no idea what the classified information says regarding any of this.
Source?
Sidenote: Comey strongly implied (damn near said it really) in his testimony before the Senate Intelligence Committee yesterday that this did not actually happen and the news articles were way off. Makes you wonder what did actually happen.
I missed this exchange in the livestream, relevant bit:
"BURR: Were there other things that contributed to that that you can describe in an open session?
COMEY: There were other things that contributed to that. One significant item I can’t, I know the committee’s been briefed on. There’s been some public accounts of it, which are nonsense, but I understand the committee’s been briefed on the classified facts."
We obviously don't know what the classified facts are.
This also seems to be in reference to the announcements in late summer vs the disclosure right before the election which I haven't found anything in the transcript about.
Full Transcript: https://www.nytimes.com/2017/06/08/us/politics/senate-hearin...
(Aside: linking to NY times because it was slightly less ad heavy than other sites I could find. Couldn't find a single one without auto-play video...)
What? Not such a good idea anymore? It's funny how ironic and shortsighted these HN commies are.
The hilarious part is that you don't appear to be complaining about the government funding of research part.
So we're reaching the conclusion that a capitalist economy combined with a functional government allocating resources to projects that hold the promise of long term potential but not short term financial gains is a good system? That both capitalism and government activity play important roles that each is better at filling than the other?
And we're also realizing that a debate which reduces to 'capitalist vs communist' is idiotic? Wow. I'm proud of us.
