AFAIK Taalas, the company behind this demo, still only have their initially "hardwarized" model available to test in ChatJimmy, which IIRC is a rather stupid Llama 3ish 8b.
Don't get me wrong though, that demo is still incredibly impressive & makes me very much excited for the hardware-based model era (potentially) ahead.
Once you've experienced those speeds, you really start to think about the whole class of things that becomes possible; massively parallel decode paths, extensive reasoning loops, etc…
Don't get me wrong though, that demo is still incredibly impressive & makes me very much excited for the hardware-based model era (potentially) ahead.
Once you've experienced those speeds, you really start to think about the whole class of things that becomes possible; massively parallel decode paths, extensive reasoning loops, etc…