P.S. We're hiring Chisel developers! If you don't know Chisel but want to learn and have RTL experience, we'd love to have you learn on the job! Check out our website: http://rexcomputing.com
Once we get closer to having silicon in hand, I'd love to publish our experience, both as a startup making a new processor in this day and age and with using Chisel and other new tools.
The NoC is also entirely non-blocking... a router is able to read/write to its core's scratchpad and do a passthrough in the same cycle.
1. Neo has a 64-bit core, and conforms to the IEEE 754-2008 floating point standard... Epiphany is 32-bit, and is not fully IEEE compliant (along with only being capable of single precision FP).
2. The existing Epiphany chips cap out at 32KB of local memory per core (with the Epiphany IV having a total of 2MB of on chip memory), while the planned Neo chip will have 128KB of local memory per core (32MB of on chip memory).
3. Epiphany is limited to using its 4 eLink (based on ARM's AXI interface) connectors to access the outside world, and would typically be connected to either other Epiphany chips or to its host processor. Each eLink port only supports 1.6GB/s of bidirectional traffic, giving a total of 6.4GB/s of aggregate chip bandwidth. For Neo, we have developed a new 96GB/s (bidirectional, 48GB/s each way) interface, with either 3 or 4 interfaces per chip, giving an aggregate chip-to-chip bandwidth of 288-384GB/s.
4. Neo can directly address DRAM attached to it, instead of having to go through a host processor.
5. Neo is a quad-issue VLIW core (capable of one 64-bit ALU op, either one 64-bit FPU op or two 32-bit FPU ops, and two load/store ops every cycle), compared to Epiphany's standard superscalar core (capable of one 32-bit ALU op, one 32-bit FPU op, and one load/store op per cycle).
All of this adds up to Neo actually being a commercially viable processor (for industry, not hobbyists). Above all, memory bandwidth is what kills Epiphany and completely prevents it from reaching its advertised performance.
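For anyone who wants to sanity-check the memory and bandwidth figures in points 2 and 3, here is a rough back-of-the-envelope sketch in Python. The helper names are made up for illustration, and the core counts are simply derived by dividing total on-chip memory by local memory per core (assuming binary KB/MB); they are not an official spec.

```python
# Back-of-the-envelope check of the figures quoted in points 2 and 3.
# Assumption: KB and MB are binary units (1024-based).
KB = 1024
MB = 1024 * KB


def implied_core_count(total_on_chip_bytes, local_bytes_per_core):
    """Cores implied by total on-chip memory / local memory per core."""
    return total_on_chip_bytes // local_bytes_per_core


def aggregate_bandwidth(per_link_gb_s, links):
    """Aggregate off-chip bandwidth in GB/s across all links."""
    return per_link_gb_s * links


# Point 2: on-chip memory.
epiphany_iv_cores = implied_core_count(2 * MB, 32 * KB)
neo_cores = implied_core_count(32 * MB, 128 * KB)

# Point 3: off-chip bandwidth (bidirectional GB/s).
epiphany_bw = aggregate_bandwidth(1.6, 4)
neo_bw_min = aggregate_bandwidth(96, 3)
neo_bw_max = aggregate_bandwidth(96, 4)

print("Implied cores: Epiphany IV =", epiphany_iv_cores, ", Neo =", neo_cores)
print("Aggregate GB/s: Epiphany =", epiphany_bw,
      ", Neo =", neo_bw_min, "to", neo_bw_max)
print("Neo / Epiphany bandwidth ratio: roughly",
      round(neo_bw_min / epiphany_bw), "to", round(neo_bw_max / epiphany_bw))
```

This only checks that the quoted totals are consistent with the per-core and per-link numbers; it says nothing about sustained bandwidth under a real workload.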