> The Vanilla-5 cores are 5-stage, in-order RV32IM cores, so they support the integer and multiply extensions
So, roughly comparable to a high-speed Cortex-M.
> instead of using caches, the entire memory address space is mapped across all the nodes in the network using a 32-bit address scheme. This approach, which also means no virtualization or translation, simplifies the design a great deal.
The diagram shows each core has icache and dcache; what they've ditched is cache coherency. That certainly makes it simpler to implement but now the cores have to be responsible for their own coherency. Also, none of your protected mode operating system nonsense - this is designed to run a single program and get everything out of the way. Every core can potentially overwrite any other core's memory, and if it does so you won't know until you have a cache miss. Good luck figuring that one out in the debugger.
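For a sense of how exposed that is, here is a sketch of what forming a remote pointer under a flat scheme like this could look like. The bit layout and the remote_word() helper are invented for illustration; the article doesn't give the real encoding:

    #include <stdint.h>

    /* Hypothetical encoding -- the article only says the whole 32-bit space
       is mapped across the array, so these field widths are made up:
       bits [31:26] node Y, bits [25:20] node X, bits [19:0] local offset. */
    #define Y_SHIFT     26
    #define X_SHIFT     20
    #define OFFSET_MASK 0x000FFFFFu

    /* Any core can form a pointer into any other core's memory; there is
       no MMU or permission check to stop a stray write. */
    static inline volatile uint32_t *remote_word(uint32_t x, uint32_t y,
                                                 uint32_t offset)
    {
        uint32_t addr = (y << Y_SHIFT) | (x << X_SHIFT)
                      | (offset & OFFSET_MASK);
        return (volatile uint32_t *)(uintptr_t)addr;
    }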
This is very clearly intended for the sort of AI or image-processing workload that you can cleanly partition two-dimensionally across an array of identical nodes, and then have those nodes collaborate locally by passing messages across the edges.
This is not quite true: the local data memories are not caches, i.e., they do not implicitly move memory in from a more distant tier in the memory hierarchy. They are just plain explicitly managed local memories (sometimes called "scratchpads" to distinguish them from caches).
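Concretely, the difference looks something like this sketch; the ".dmem" section name and the dram_buf symbol are invented for illustration:

    #include <stdint.h>
    #include <string.h>

    /* With a cache, a plain load would implicitly fetch the line from
       distant memory; with a scratchpad, nothing moves unless the program
       moves it. */
    static uint8_t spad_buf[1024] __attribute__((section(".dmem")));

    extern const uint8_t dram_buf[];     /* buffer in a more distant tier */

    uint32_t sum_block(uint32_t len)     /* caller keeps len <= sizeof spad_buf */
    {
        memcpy(spad_buf, dram_buf, len); /* explicit move into local memory */
        uint32_t sum = 0;
        for (uint32_t i = 0; i < len; i++)
            sum += spad_buf[i];          /* every access after that is local */
        return sum;
    }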
Like the quote says: “There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors.”
It is kind of surprising that that is acceptable. I suppose the usual case is that each core already knows what its neighbors will want to see, and sends those values before each neighbor needs them.
Yeah, it's not even eventual consistency. They get away with it by using a heavyweight mailbox message-passing setup for synchronization, plus separate address spaces.
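Roughly this pattern, I'd guess; everything below (the layout, the send()/receive() names) is invented to illustrate the idea, not taken from their toolchain:

    #include <stdint.h>

    /* Invented layout: each core reserves a mailbox in its own local
       memory, and by convention only one designated neighbor ever writes
       into it -- a discipline the software maintains, not anything the
       hardware enforces. */
    struct mailbox {
        volatile uint32_t full;          /* 0 = empty, 1 = message present */
        volatile uint32_t payload[15];
    };

    /* Producer, running on the neighbor: "remote" points into the
       consumer's slice of the global address space (formed like the
       remote_word() sketch above). */
    void send(struct mailbox *remote, const uint32_t *data, int n)
    {
        while (remote->full)             /* wait for the consumer to drain */
            ;
        for (int i = 0; i < n; i++)
            remote->payload[i] = data[i];
        /* On real hardware a fence belongs here, so the payload lands
           before the flag does. */
        remote->full = 1;
    }

    /* Consumer: spin on the flag in local memory, then copy the payload. */
    void receive(struct mailbox *local, uint32_t *out, int n)
    {
        while (!local->full)
            ;
        for (int i = 0; i < n; i++)
            out[i] = local->payload[i];
        local->full = 0;                 /* hand the slot back */
    }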
If this is the case, then I assume I'd just need to restrict, by design, one processor from writing into another's area, yes?
I've worked on worse. Quite promising set of ideas from my limited reading.
The question is whether it's worthwhile to adopt that new programming model and architecture. What can be achieved that is not possible, or not price-competitive, with something boring and old-fashioned like 128 x86-64 cores?
I am all for new architectures. But the trick is getting people to make the effort to make use of them.
I think that when I write a good comment it gets buried and the only time comments are really popular is when I write something trite or silly that a lot of people are already thinking.
probably more to do with the gamification of user-generated content. you see the same thing on imgur and reddit: the well-thought-out comments don't get upvoted nearly as much as the memes and in-jokes.
>I am all for new architectures. But the trick is getting people to make the effort to make use of them.
IIRC at the time there were a lot of complaints against the Cell processor that it was "too hard" to program for:
https://www.cnet.com/news/sony-ps3-is-hard-to-develop-for-on...
>What can be achieved that is not possible, or not price-competitive, with something boring and old-fashioned like 128 x86-64 cores?
That's probably the main issue today: x86-64 is cheap, and nearly all the problems can be fixed in software. Changing architectures and instruction sets is too big of an upfront cost for most people/companies to deal with, and I think that is why we're only seeing Google and Amazon starting to look for other solutions.
Indeed. Interestingly, GPUs were initially very hard to program (and still are kind of a pain). What made them viable was the introduction of practical development tools, which Cell never had. This machine shares (some of) the instruction set between the fat and the puny cores, which makes it a much easier target to program.
If it's going to be compared to something, though, I think a GPU would make much more sense.