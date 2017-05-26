Hacker News
Darpa Funds Development of New Type of Processor (eetimes.com)
199 points by farseer 9 months ago | 97 comments



Graph processing machines are not new. The signal-to-noise ratio of this article is so low that I can't tell how this architecture differs from e.g. the Cray Eldorado. Or the Connection Machine for that matter.


There is a real use-case here that has become so widespread and common in the intelligence and law enforcement community that they saw utility in having a processor optimized for graph analysis. They likely see some uses for it outside of that market as well which is why both Intel and Qualcomm are interested in working on it. Even if it is a variation on 'Threadstorm' it's still optimized for this particular type of data processing.

I read the whole article and I don't see it claiming anywhere that this is a new idea. But certainly there is no other processor on the market like this one. So it fits the category of "new type of processor", even if it's for a dedicated use-case, just like those ML-optimized processors.

There's a big difference between knowing it is theoretically possible and the value you get from a real world implementation with real users. That sounds pretty newsworthy to me.


The signal-to-noise ratio of this article is so low

I started reading the comments before I read the article. And, sadly, once I saw your comment I knew exactly who wrote the article. One click and confirmed!


What does the memory interface look like for this? None of the papers I can find indicate how you can get so much parallelism out of the CPU <-> memory interface.

An interesting approach to non-Von Neumann computing is to put ALUs in memory, to take advantage of the fact that DRAMs have far more internal bandwidth than what is exposed in traditional systems: http://researcher.ibm.com/researcher/files/us-leejinho/tvlsi....


Manufacturing logic on the same wafer as DRAM is difficult since the processes are so different. On the other hand, manufacturing on separate dies and connecting with an interposer or TSVs gets you tremendous bandwidth and relatively low latency; this is how some newer generation graphics memory is implemented (see https://en.m.wikipedia.org/wiki/High_Bandwidth_Memory).
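A back-of-the-envelope calculation shows why the wide in-package bus matters. The figures below are nominal peak rates assumed for illustration (a 1024-bit HBM2 stack at 2 Gbit/s per pin versus a single 64-bit DDR4-3200 channel), not measurements of any particular product.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_per_sec: float) -> float:
    """Peak bandwidth in GB/s: bits per transfer * transfers per second / 8 / 1e9."""
    return bus_width_bits * transfers_per_sec / 8 / 1e9

# Nominal figures, assumed for illustration only.
hbm2_stack = peak_bandwidth_gb_s(1024, 2.0e9)   # 1024-bit interface, 2 GT/s per pin
ddr4_channel = peak_bandwidth_gb_s(64, 3.2e9)   # 64-bit DIMM channel, DDR4-3200

print(f"HBM2 stack:   {hbm2_stack:.1f} GB/s")
print(f"DDR4 channel: {ddr4_channel:.1f} GB/s")
print(f"ratio:        {hbm2_stack / ddr4_channel:.1f}x")
```

Most of the gain comes from bus width rather than clock rate, and a 1024-bit bus is only practical when the wires stay inside the package.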


CPU logic on DRAM process is apparently a solved problem: http://venraytechnology.com/Implementations.htm


You can make logic on a DRAM process, but either your logic will be slow or you'll have to increase the cost and power consumption of your DRAM cells. With separate chips connected with TSVs you can have your cake and eat it too (each process can be specialized for its components), plus you get the added benefit of increased yields (chips are smaller, no special processing steps).


That's kind of the entire point of Venray's claims: the CPUs are both plentiful and fast, with far lower power consumption than computationally equivalent discrete CPU + RAM combos. They apparently only add a couple percent to the die size of the DRAMs.

However, their business model presumes "Wait until a DRAM manufacturer buys us", which IMO is why nothing's moved forward. DRAM manufacture is low-margin and not really the place to look for this kind of risky introduction to the market. I'd love to see this form of parallelism, and their take on breaking the memory bandwidth wall; it meshes great with the types of problems I work on.


The point of my post was that there's not much upside that integrated DRAM/logic has that TSVs don't, but plenty of downsides. Regardless of Venray's claims, there's a reason modern high-performance parallel architectures go with TSVs and interposers (Knights Landing, new GPUs, some deep learning platforms, etc.) instead of logic in DRAM.


Part of that is simply silicon design inertia, though.

Do you see interposer style designs as linking up terabytes of DRAM? (at least in the near future) All the chips you're talking about are pretty major dies, not really suitable for having many stacks of them in conventionally tightly spaced DIMM arrays to reach such RAM sizes.

Of course, 3d chip advances might throw all current assumptions out the window and change the layout of everything.


> Do you see interposer style designs as linking up terabytes of DRAM?

I don't think we're going to see a terabyte of dram on an interposer for a while (4GB is about the max you can get commercially right now). I'm not sure what you're trying to get at though; even with logic in DRAM you have to go off chip to get to terabyte levels, so I don't see the advantage.

> All the chips you're talking about are pretty major dies, not really suitable for having many stacks of them in conventionally tightly spaced DIMM arrays to reach such RAM sizes.

The stacking happens in package (<1mm thick). Your DIMM array is going to have to be pretty damn tight for that to matter.

> Of course, 3d chip advances might throw all current assumptions out the window and change the layout of everything.

TSVs are 3D (or "2.5" depending on the configuration). You should have thrown out the assumptions back in 2014.


> I'm not sure what you're trying to get at though; even with logic in DRAM you have to go off chip to get to terabyte levels, so I don't see the advantage.

Many-core processors with low-latency wide-bus on-chip random access speeds need to scale horizontally as well. Focusing on large chips means you're not going to have very many on a single motherboard, where QPI/HyperTransport/memory-bus style communication can achieve higher and more user-transparent shared memory access performance, compared to offboard communication networking.

The "stacks" I was talking about are just the rows of DIMM slots stacked together in tight proximity, compared to the number of CPUs/GPUs/etc per unit area on a multi-socket motherboard to achieve the same memory footprint. (easily apples and oranges in the current incarnations, admittedly, but focusing on end-user expandability and configuration options)

In my opinion, this type of on-chip fast-RAM model in larger memory systems would best take advantage of splitting up processing to where the memory is, as opposed to a fatter node model, especially when it comes to physical size and inter-chip communication of many chips.

However, if we soon have many-core chips with 32 parallel memory buses leading to in-package 256GB DRAM silicon, it does become more moot.

Yes, I know that 3d silicon stacking, HBM, etc exist now. While they've had some good speed & power advantages, they remain very limited in terms of memory footprint. And of course, the memory size is fixed per such a chip, and there doesn't seem to be a path for many-chip expansion solutions for anything but the top-end enterprise market. I think the Venray model has a simplicity and expandability that keeps the most advantageous tradeoffs.


Okay, I think we're talking past each other at this point so let me be as clear as possible: the original comparison was between TSVs and logic in DRAM. Both of these are a way to get DRAM on chip and as physically close to the core logic as possible. Logic in memory is on die, while TSVs are on package; neither can be "extended" by an end-user without connecting off chip. Neither changes the physical package size very much (TSVs are not intrinsically bigger than logic in DRAM). Both have nothing to do with off chip connections; as soon as you start talking about things happening off package they behave identically (grids of processor/DRAM combos can be done with either in exactly the same way). Any chip you like can have TSVs (many core, single core, big, small, whatever); there's no architecture that logic in DRAM can have that TSVs can't. Both can be used to "split up processing to where the memory is". Neither has to be a "fat node".

So with that out of the way, what exactly is the advantage of logic in memory? Because so far nothing you have described is actually an intrinsic advantage.


Right, it becomes less about individual chips' TSVs vs logic in DRAM, and more about the scalability of the architecture. In the marketplace right now, the trend for these TSV/interposer/multi-die sorts of devices is in "fat node" designs, instead of more on-board distributed designs.

Logic on DRAM should be simpler & cheaper, which would in the long tail lend itself to more horizontal scaling (and horizontal scaling is currently required to get large memory footprints economically). More elaborate & expensive designs would end up more in fat node designs. There's really no technical difference when looking at many-chip architectures as the chip package is a black box at that level, but it's more an economic one.


But logic and SRAM are more compatible. I'm still curious because multiple processors having random access to a large memory is not easy. The scratchpad they mention sounds like a cache, but maybe it's something more than that.


'Scratchpad' implies they are directly addressable; I guess you could call it a software-controlled cache.
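A minimal sketch of what "software-controlled" means in practice, assuming a hypothetical fixed-size local buffer: the program itself decides what to copy in before computing on it (much like the DMA transfers into the Cell SPUs' local store), whereas a hardware cache would make that decision transparently. The names `SCRATCHPAD_WORDS` and `sum_of_squares_tiled` are illustrative only.

```python
SCRATCHPAD_WORDS = 4  # hypothetical capacity of the local, directly addressed buffer

def sum_of_squares_tiled(global_mem: list[float]) -> float:
    """Process a large array in tiles that fit in a software-managed scratchpad."""
    scratchpad = [0.0] * SCRATCHPAD_WORDS  # explicitly addressed local memory
    total = 0.0
    for base in range(0, len(global_mem), SCRATCHPAD_WORDS):
        n = min(SCRATCHPAD_WORDS, len(global_mem) - base)
        # Explicit copy-in: software, not hardware, decides what is resident locally.
        scratchpad[:n] = global_mem[base:base + n]
        # Compute only on local addresses 0..n-1.
        for i in range(n):
            total += scratchpad[i] * scratchpad[i]
    return total

print(sum_of_squares_tiled([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]))  # 91.0
```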


> What does the memory interface look like for this?

Yeah, that's the big black box titled "magic happens here" in their diagram. Maybe something like HBM2?


There was a great architecture proposal from Thomas Sterling and the old MTA folks that matched the hardware thread context to the DRAM row size.

which effectively extended the MTA thread pool to...infinity


Is it just me, or is the illustration of 'graphs' totally the wrong kind of graph? I'm sure they don't mean bar charts.


It didn't even dawn on me that that figure was supposed to represent graphs until I read your comment (and then the caption), I had just assumed it was supposed to represent Big Data.


Yeah, they missed including a pie chart and a gantt chart for good measure.


Most of the industry has moved on to Deep Pie Charts.


FYI, they're called 'chicago charts' for those in the know...


Yeah, a bit embarrassing for a magazine that covers technology; HIVE definitely works on graph-theory graphs. Even weirder is the (Source: DARPA) attribution.


Also they say the processor will use "1000 times less energy". Incredibly confused thinking goes into writing a sentence like that.


Maybe an Igon Value Problem?

http://rationalwiki.org/wiki/Igon_Value_Problem


That would make sense to me.

I had to stop and chuckle when I saw the author put "[John] von Neumann architecture..." when quoting somebody who was simply talking about the industry standard term of "von Neumann architecture".

What's next? "[Alan] Turing machines"?


It's hard to display 2^n dimensions in a graph.


How is this different than FORTH processors, like

http://www.greenarraychips.com/

http://www.forth.org/cores.html

http://www.ultratechnology.com/chips.htm

https://www.complang.tuwien.ac.at/anton/euroforth2007/papers...

http://wiki.c2.com/?ChuckMoore

etc.


Those do stack based computation?


They are parallel chips with many cores (128 IIRC), each with a small amount of 'stack' RAM, running Forth directly. Chuck designs them 'by hand' using Color Forth, a transistor simulation model and ... Forth words.

Similar to the machine found in TIS-100:

http://www.zachtronics.com/tis-100/


Exactly, the chips in the OP are not stack-based, are they?


Probably.


Yes?


More about this processor here http://www.darpa.mil/attachments/BAA-16-52_HIVE_FAQ_20160831...


The way that they describe sparse graph processing in memory sounds like the kind of pointer chasing that makes run-time object-oriented programming memory access patterns slow.

I wonder if that was an artifact of translation into PR-speak, or if there might be something there to accelerate some of the Java or .NET memory access patterns that we all use.


There is a very direct sense in which pointer chasing _is_ the fundamental operation of sparse graph processing.
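A minimal sketch of why: in a breadth-first search over a graph stored in compressed sparse row (CSR) form, each step loads an offset, then a neighbor index, then that neighbor's state, and the address of each load depends on the result of the previous one. The CSR layout and the example graph are assumptions chosen for illustration, not HIVE's data format.

```python
from collections import deque

def bfs_levels(row_ptr: list[int], col_idx: list[int], source: int) -> list[int]:
    """Breadth-first search over a CSR graph; returns hop distance (-1 if unreachable)."""
    n = len(row_ptr) - 1
    level = [-1] * n
    level[source] = 0
    frontier = deque([source])
    while frontier:
        v = frontier.popleft()
        # Dependent loads: row_ptr[v] -> col_idx[k] -> level[u]. Each address is only
        # known after the previous access returns, which limits what a prefetcher can do.
        for k in range(row_ptr[v], row_ptr[v + 1]):
            u = col_idx[k]
            if level[u] == -1:
                level[u] = level[v] + 1
                frontier.append(u)
    return level

# Directed edges: 0->1, 0->2, 1->3, 2->3, 3->4; vertex 5 is isolated.
row_ptr = [0, 2, 3, 4, 5, 5, 5]
col_idx = [1, 2, 3, 3, 4]
print(bfs_levels(row_ptr, col_idx, 0))  # [0, 1, 1, 2, 3, -1]
```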

On top of what access patterns the developers tend to use, there's always the JVM garbage collector (the bane of efficiency) which runs a basic graph algorithm over the entire program's network of pointers. Although, I suspect in many applications the graph in question is small (by comparison to big-data-scale graphs) and throwing heavy machinery like this at it would be overkill.
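The mark phase of a tracing collector is, at heart, a reachability traversal from the roots. A minimal sketch, assuming a toy heap modeled as a dict from object id to the ids it references (real JVM collectors are far more elaborate, e.g. generational and concurrent):

```python
def mark_reachable(heap: dict[int, list[int]], roots: list[int]) -> set[int]:
    """Mark phase of a toy tracing collector: depth-first traversal from the roots."""
    marked: set[int] = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue  # already visited; also terminates reference cycles
        marked.add(obj)
        # Following each reference is a pointer chase into possibly cold memory.
        stack.extend(heap.get(obj, []))
    return marked

def sweep(heap: dict[int, list[int]], marked: set[int]) -> list[int]:
    """Return the ids of unreachable objects that a sweep would reclaim."""
    return sorted(obj for obj in heap if obj not in marked)

heap = {1: [2, 3], 2: [3], 3: [1], 4: [5], 5: []}  # 4 and 5 are unreachable
live = mark_reachable(heap, roots=[1])
print(sorted(live), sweep(heap, live))  # [1, 2, 3] [4, 5]
```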

Then again, maybe I'm not dreaming big enough, and this kind of processor will make the need for cache line locality optimizations, careful instruction scheduling around memory I/O, and half-second freezes for GC a thing of the past?


"Darpa Funds Development of New Type of Processor: Worlds First Non-Von-Neumann "

https://en.wikipedia.org/wiki/Harvard_architecture ?


https://en.wikipedia.org/wiki/Dataflow_architecture ?

Cheap clickbait title.


That was my thought as well: "What? AVRs are Harvard architecture and I can buy them for $0.10." But after looking, it apparently has more to do with new parallelism paradigms:

> "This non-von-Neumann approach allows one big map that can be accessed by many processors at the same time, each using its own local scratch-pad memory while simultaneously performing scatter-and-gather operations across global memory."
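Scatter-and-gather here means reading from (gather) and writing to (scatter) arbitrary, data-dependent addresses in a shared global array. A minimal single-threaded sketch, assuming a hypothetical setup where a worker gathers the values it needs into a local scratch buffer, computes on them, and scatter-adds results back; it illustrates the access pattern only and says nothing about how HIVE actually implements it.

```python
def gather(global_mem: list[float], indices: list[int]) -> list[float]:
    """Gather: copy values at data-dependent addresses into a local scratch buffer."""
    return [global_mem[i] for i in indices]

def scatter_add(global_mem: list[float], indices: list[int], values: list[float]) -> None:
    """Scatter: accumulate local results back into data-dependent global addresses."""
    for i, v in zip(indices, values):
        # Repeated indices accumulate; with many concurrent workers this
        # would need atomic updates.
        global_mem[i] += v

# One "worker": push half of each source vertex's value to a destination vertex.
global_mem = [4.0, 8.0, 0.0, 0.0]
src = [0, 1, 1]
dst = [2, 2, 3]
scratch = gather(global_mem, src)  # local copy: [4.0, 8.0, 8.0]
scatter_add(global_mem, dst, [0.5 * x for x in scratch])
print(global_mem)  # [4.0, 8.0, 6.0, 4.0]
```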


Processors with cheap, limited local memory and costly, scalable external memory are also not new... and that doesn't really make them non-von-Neumann, except that it doesn't make sense to execute out of global memory.


And any compute fabric.


I'm happy to see that a group from Georgia Tech is working on this. Actually, my group just sent in a proposal for the DARPA SSITH[1] program. The high-level goal of SSITH is to design low-level (firmware or hardware) protection techniques that can guard against common software vulnerabilities that lead to hardware exploitation.

[1]: http://www.darpa.mil/news-events/2017-04-10


My guess is that what's driving this need is intelligence agencies trying to make sense of, and traverse, the large and complicated graphs used to map real-world relationships.

As they collect more data related to this, they'll need better ways to traverse these graphs.


I can't tell if the processing in TFA is the kind of graph processing that machines like TIGRE or SKIM were proposed to do in the '80s. Or perhaps the graph nodes are less specialized than in these machines?
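For contrast, TIGRE and SKIM were combinator graph reduction machines: a functional program is compiled into a graph of combinators such as S, K and I, which is rewritten until it reaches normal form. A minimal sketch of that style of rewriting, using nested tuples for application nodes (a simplification: real reducers share and overwrite graph nodes in place, which this tree-based version does not):

```python
def apply_all(head, args):
    """Rebuild a left-nested application (((head a0) a1) ...)."""
    for a in args:
        head = (head, a)
    return head

def step(t):
    """One normal-order reduction step; returns (new_term, changed)."""
    # Unwind the spine to find the head combinator and its arguments.
    args, head = [], t
    while isinstance(head, tuple):
        head, arg = head
        args.append(arg)
    args.reverse()  # args[0] is the first argument applied to head
    if head == "I" and len(args) >= 1:   # I x -> x
        return apply_all(args[0], args[1:]), True
    if head == "K" and len(args) >= 2:   # K x y -> x
        return apply_all(args[0], args[2:]), True
    if head == "S" and len(args) >= 3:   # S x y z -> x z (y z)
        x, y, z = args[0], args[1], args[2]
        return apply_all(((x, z), (y, z)), args[3:]), True
    # Head cannot fire: reduce the arguments left to right.
    for i, a in enumerate(args):
        new_a, changed = step(a)
        if changed:
            return apply_all(head, args[:i] + [new_a] + args[i + 1:]), True
    return t, False

def normalize(t, max_steps=1000):
    """Rewrite until no redex remains (or give up after max_steps)."""
    for _ in range(max_steps):
        t, changed = step(t)
        if not changed:
            return t
    raise RuntimeError("no normal form within step limit")

# S K K behaves as the identity: S K K a -> K a (K a) -> a
print(normalize(((("S", "K"), "K"), "a")))  # a
```

The work there is rewriting a small, dynamically changing expression graph, which is quite different from traversing a huge, mostly static sparse graph of data.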


Sounds a lot like the Cell in concept.

https://en.wikipedia.org/wiki/Cell_(microprocessor)


The SPUs on the Cell were closer in concept to DSPs than they are to what the article describes. The Cell chip itself was essentially a general-purpose CPU joined in silicon to several DSPs.

The SPUs are still designed for sequential processing of memory, just smaller, discrete blocks. The whole chip is orchestrated by a standard von Neumann processor anyway, so that acts as a bottleneck to keeping the SPUs busy.


What's the "community detection" benchmark referenced in the article?
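The article does not say which algorithm the benchmark uses. As one common example of community detection, here is a minimal sketch of label propagation, assuming an undirected graph given as an adjacency dict: every vertex starts in its own community and repeatedly adopts the most frequent label among its neighbors until nothing changes. The fixed visiting order and the tie-breaking rule are assumptions made so the result is reproducible; practical implementations usually randomize both.

```python
from collections import Counter

def label_propagation(adj: dict[int, list[int]], max_iters: int = 100) -> dict[int, int]:
    """Asynchronous label propagation; returns a mapping vertex -> community label."""
    labels = {v: v for v in adj}  # every vertex starts in its own community
    for _ in range(max_iters):
        changed = False
        for v in sorted(adj):  # fixed visiting order, for reproducibility
            if not adj[v]:
                continue  # isolated vertices keep their own label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = [lbl for lbl, c in counts.items() if c == best]
            # Tie-break: keep the current label if it is among the most frequent,
            # otherwise take the smallest candidate.
            new_label = labels[v] if labels[v] in candidates else min(candidates)
            if new_label != labels[v]:
                labels[v] = new_label  # in-place update, visible to later vertices
                changed = True
        if not changed:
            break
    return labels

# Two 4-cliques, {0,1,2,3} and {4,5,6,7}, joined by the single edge 0-7.
adj = {
    0: [1, 2, 3, 7], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2],
    4: [5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [0, 4, 5, 6],
}
print(label_propagation(adj))  # {0: 1, 1: 1, 2: 1, 3: 1, 4: 5, 5: 5, 6: 5, 7: 5}
```

Each sweep is dominated by irregular reads of neighbors' labels. The outcome also depends on visiting order and tie-breaking, a known weakness of the basic algorithm.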


From a layman: Would this necessitate a different programming paradigm?


Yes, it's mentioned on page 2 of the article (you may have missed that it was split into two pages).


Thanks! I did miss that page. But even though they mention "... calls for the development of software tools to help programming the new architecture...", I'm still unsure what programming for such a processor might look like.


Perhaps it will make trees faster than hashmaps? Linked lists better than arrays?


Yeah, that's a design choice I just don't understand. The article is already lengthy, what improvement does splitting it to pages manually give? I completely missed the fact that it's multiple pages.


In the old days this technique was used to measure the number of people who found the article interesting enough to read the next page, all without JavaScript. This of course assumes that readers notice the "next page" link, which is where they seem to have failed (or maybe readers' expectations have changed so much that the new breed of readers no longer notices such links - quite possible).


It was used for ad impressions. Each page read = new ads = more $$ for the publisher.


For better or worse, it also improves analytics for the publisher— they can measure what proportion of readers made it to page 2, and of those, how long they spent on page 1.

(Yes, some of that could be done with scroll tracking too, but "clicked to view page 2" is still a very strong signal of engagement.)


The civilian application proposed in the article, mapping the many to many relationships between amazon purchasers and items purchased, is unsettling.


That doesn't need anything special in terms of hardware so it's a bad example, current hardware is well capable of making those connections.


Open-sourced last year; silly example...

https://github.com/amzn/amazon-dsstne


Can't wait for this publicly funded research to make some private corporation billions!


Versus, what? No public funding? No private utilization of public research? Public research should fall in the public domain, free of IP and open for any use.

If anything, we should be doing more of this... create dedicated academic R&D funding streams by taxing established or dying industries in order to publicly explore new fields, and use X Prize-style programs or NASA's COTS/CRS programs to incentivize private commercialization. America needs jobs, what's better than creating new industries?

Plus, public investment in R&D has a good track record. This study from 1980 [1] indicates a $17 return over 18 years for every $1 invested in NASA - 1700% ROI. Returns depend on programs and the administration in office, but similar numbers can be obtained from other programs (NOAA and agricultural, military and medical research, etc). R&D is speculative, but most programs do much better than breaking even.
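For reference, the arithmetic behind such figures, assuming the study's $17 is the total (gross) return per $1 spent over 18 years:

```python
def net_roi(gross_return: float, invested: float) -> float:
    """Net return on investment as a fraction: (return - cost) / cost."""
    return (gross_return - invested) / invested

def annualized_rate(gross_multiple: float, years: float) -> float:
    """Constant yearly growth rate that compounds to the given multiple."""
    return gross_multiple ** (1 / years) - 1

roi = net_roi(17.0, 1.0)            # 16.0, i.e. 1600% net
annual = annualized_rate(17.0, 18)  # about 0.17
print(f"net ROI: {roi:.0%}, annualized: {annual:.1%}")  # net ROI: 1600%, annualized: 17.0%
```

So a 17x gross return is a 1600% net return (1700% is the gross return expressed as a percentage of the dollar invested), and spread over 18 years it corresponds to roughly 17% a year, compounded.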

[1] https://er.jsc.nasa.gov/seh/economics.html


"Versus, what? No public funding? No private utilization of public research?"

Versus the taxpayer-funded I.P. being licensed to all parties in the country that funded it with the big corporations being forced to compete on better implementations delivered faster to customers. Big players can try to make billions, small players can innovate on top of same I.P. without a lawsuit, and individuals can even attempt to homebrew their own without a lawsuit.


There is no sign in the article that only Intel and Qualcomm will have rights to the new design.


It's private companies doing the work, with a track record of patenting and suing over their tech. Quite a bit of tech funded through NSF and DARPA gets patented. So, it should be assumed and mitigated.


You mean like the internet? Or self driving cars?


The original ARPANET was a proof of concept of a network which could withstand the simultaneous failure of multiple nodes, that is, survive a nuclear strike. Whether knowing you have a robust command and control network makes you more or less likely to use your own nuclear weapons in a first strike is arguable. But it did guarantee a second strike would be possible (which, if you believe in MAD, made the world safer). And robust second-strike capability, in the end, is about killing people.


Or government-protected AT&T Bell Labs, which invented everything we do!


The "Internet" would've happened regardless; Xerox PARC and its "PARC Universal Packet" protocol are one example.

[0] https://en.wikipedia.org/wiki/PARC_Universal_Packet


Not sure if sarcastic or sincere. You can damn near predict what will be the future tech 10 years down the line based on what DARPA is doing to make killing more efficient.


Have you actually been to a DARPA kickoff? I guarantee you that 90-95% of the stuff that they are funding will not be broadly useful in 10 years.

The stuff that DARPA gets right is building stealth planes and prosthetics; ARPANET was a one-off exception in their history.


Does killing the RIGHT people count as efficient?


That's just a formality. The implementation is trivial and left to the reader of any good DARPA project as an exercise.


Killing the wrong people isn't DARPA's problem, we have bombs that we can drop out of a plane and down a chimney. Blame the elected officials that ordered the bomb dropped in the first place and the intelligence apparatus that convinced them it was good idea.


And the venal election system that put them in charge, and the social order that creates vast power and wealth imbalances.


Like web browsers?

Marc Andreessen started off working on an NSF grant at NCSA. That was one of the things Al Gore encouraged (which is why people misquote him as claiming to have invented the internet).

Edit: also why web browsers are free. Netscape had to compete with a free browser.


You mean that private corporations can somehow just sit there and collect rent from this publicly funded research? Amazing!


"Intel [assuming Intel wins] will also have the rights to productize versions for the worldwide market." This means that Intel gets exclusive license for the tech they didn't pay for, and no one else can use it even though all taxpayers funded it.


No way; this is the kind of thing that can become a state-secret level advantage.

Seriously, it's a graph processor. Graph analysis is basically the entire job of a modern intelligence agency.

There's a cold-war level arms race going on in cybersecurity. Russia embarrassed us with their sophisticated cyberwarfare capability in the last election -- they were able to infiltrate both campaigns AND the FBI (they used false information to manipulate Comey into making a statement -- which required them to know how conflicted he was over interference in the Clinton e-mail thing), and undoubtedly were behind the news cycles in the months before the election.

Better/more condensed graph analysis capability, at the scale that the three-letter-agencies use it? That's a strategic advantage. You can bet the Russians are working on something similar. If they haven't already -- throughout the cold war they tended to push the frontier of technology faster than the US, but had trouble mobilizing that advantage because communism was so damn inefficient.


> Russia embarrassed us with their sophisticated cyberwarfare capability in the last election

I'm still unclear how "sophisticated" these attacks were. Per the unclassified information, all I've heard mentioned is:

1. social media manipulation

2. traditional media manipulation (RT)

3. email spearphishing

None of the above strikes me as particularly sophisticated, even if it's potentially effective. (I don't know if it was effective or not in the case of the election.)

Of course, I have no idea what the classified information says regarding any of this.


Sophistication means technical complexity, but also scale. Certainly social-media manipulation requires a large-scale approach to make it feasible.


> they used false information to manipulate Comey into making a statement -- which required them to know how conflicted he was over interference in the Clinton e-mail thing

Source?


> they used false information to manipulate Comey into making a statement -- which required them to know how conflicted he was over interference in the Clinton e-mail thing

Sidenote: Comey strongly implied (damn near said it really) in his testimony before the Senate Intelligence Committee yesterday that this did not actually happen and the news articles were way off. Makes you wonder what did actually happen.


I think the comment is referring to this story: http://www.cnn.com/2017/05/26/politics/james-comey-fbi-inves...

I missed this exchange in the livestream; relevant bit: "BURR: Were there other things that contributed to that that you can describe in an open session?

COMEY: There were other things that contributed to that. One significant item I can’t, I know the committee’s been briefed on. There’s been some public accounts of it, which are nonsense, but I understand the committee’s been briefed on the classified facts."

We obviously don't know what the classified facts are. This also seems to refer to the announcements in late summer rather than the disclosure right before the election, about which I haven't found anything in the transcript.

Full Transcript: https://www.nytimes.com/2017/06/08/us/politics/senate-hearin... (Aside: linking to NY times because it was slightly less ad heavy than other sites I could find. Couldn't find a single one without auto-play video...)


That's actually one of the most interesting things I've heard to come out of this. Not sure why I haven't read more about this until now.


If it did happen, would it be in his interest to admit it?


Would he even know it (without the Russians coming out and saying so)?


This is DARPA not NSF. It is meant to make someone rich, so they can supply what is under research.


Maybe the money taxed from that said corporation could be channeled to pay for universal income.


Yeah, we could take the money from a corporation that is creating jobs and value right now, and use it to create incentives for people to never figure out how to be productive citizens because hey, there's no jobs.


The way things are now, that corporation is probably creating jobs in China, India or SEA and storing its "value" in Ireland. Just to sidetrack for a second, I know we're still quite far from AI taking away jobs from the software industry, but oh boy, when that day comes I won't feel one bit sorry for you. It doesn't take much to fall through the cracks in society and I hope you learn this the hard way.


So you start with false hyperbole before moving onto vengeful and mean-spirited hopes for my future, all while presumably trying to capture the moral high ground because to you all problems can be solved with high taxes. I don't think you are alone and we are probably doomed. Because when your system doesn't work, people will simply double-down on the same ideas and make it worse.


Someone stem the tide against these anti-capitalists! They are everywhere!


If it annoys you that much, stop using internet, throw away your laptop and pretty much everything else in your life. No need to support these "evil private corporations". Yeah, really. I'm sure you can do it.

What? Not so good idea anymore? It's funny how ironic and shortsighted these HN commies are.


> It's funny how ironic and shortsighted these HN commies are.

The hilarious part is that you don't appear to be complaining about the government funding of research part.


Government funded research played a role in the development of the Internet. Private enterprise also played a huge role.

So we're reaching the conclusion that a capitalist economy combined with a functional government allocating resources to projects that hold the promise of long term potential but not short term financial gains is a good system? That both capitalism and government activity play important roles that each is better at filling than the other?

And we're also realizing that a debate which reduces to 'capitalist vs communist' is idiotic? Wow. I'm proud of us.


Just keep Sarah Connor away from the lab:

https://gndn.files.wordpress.com/2016/04/shot00341.jpg


Worlds first, my ass. Journalism is dead; has been for a long, long time; we're living in the end-times of the walking dead.




