Tiny Classifier Circuits: Evolving Accelerators for Tabular Data (arxiv.org)
40 points by PaulHoule on March 3, 2023 | 12 comments



That's really interesting. I skimmed the paper quickly, so I definitely need to do a deeper dive.

From the Abstract:

Despite Tiny Classifiers being constrained to a few hundred logic gates, we observe no statistically significant difference in prediction performance in comparison to the best-performing ML baseline. When synthesised as a Silicon chip, Tiny Classifiers use 8-56x less area and 4-22x less power. When implemented as an ultra-low cost chip on a flexible substrate (i.e., FlexIC), they occupy 10-75x less area and consume 13-75x less power compared to the most hardware-efficient ML baseline. On an FPGA, Tiny Classifiers consume 3-11x fewer resources.

Seems like this would be a pretty significant result if true.


Wonder what the catch is


The catch seems to be that there's no detailed implementation/code, as it's a private company and likely proprietary technology.

Haven't read the whole paper and would be delighted to be wrong.


It seems this is using actual circuits, as in logic gates.

Which is interesting, and offers the benefits discussed, but probably not ideal for things like microcontrollers and other embedded devices. I wonder what the results would be for a more general program search, or a more tractable operation graph involving commonly available instructions like *, +, =, >>, <<, etc. (I guess you could use AST symbols directly? I unfortunately know very little about compilers). This would make them a more general version of neural networks.
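
Something like this toy sketch is what I have in mind (purely hypothetical, not from the paper; the expression and threshold would come out of the search):

    # Hypothetical "program" classifier over small integer features,
    # using only cheap integer ops (*, +, >>); not the paper's method.
    def tiny_program_classifier(x):
        score = x[0] * 3 + (x[1] >> 2) + (x[2] * x[3] >> 4)
        return 1 if score > 17 else 0      # threshold found by the search

    print(tiny_program_classifier([5, 12, 3, 9]))   # -> 1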

Since this uses genetic algorithms, which can in principle tackle any kind of structure, I think Turing-completeness (i.e. recurrence and memory) of the programs could yield significantly greater inference capabilities. They do mention flip-flops in the article (which I haven't read completely); I wonder if there is significant recurrence or it's just gate buffers. Turing completeness of course opens the gate (no pun intended) for more strange effects and bugs (but that's kind of expected of any similar algorithm like neural nets?).
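
To make the genetic-algorithm angle concrete, here's a minimal toy version (my own sketch, not the paper's Tiny Classifier method): evolve a small feed-forward netlist of two-input gates against a tiny truth table. Adding flip-flops would basically mean carrying a state wire between evaluations.

    # Toy sketch: evolve a feed-forward netlist of 2-input gates
    # to match a small truth table. Not the paper's algorithm.
    import random

    GATES = {
        "AND":  lambda a, b: a & b,
        "OR":   lambda a, b: a | b,
        "XOR":  lambda a, b: a ^ b,
        "NAND": lambda a, b: 1 - (a & b),
    }

    def random_gate(n_inputs, position):
        # sources may be primary inputs or outputs of earlier gates
        pool = n_inputs + position
        return (random.choice(list(GATES)),
                random.randrange(pool), random.randrange(pool))

    def random_circuit(n_inputs, n_gates):
        return [random_gate(n_inputs, g) for g in range(n_gates)]

    def mutate(circ, n_inputs):
        circ = list(circ)
        g = random.randrange(len(circ))
        circ[g] = random_gate(n_inputs, g)
        return circ

    def run(circ, inputs):
        wires = list(inputs)
        for name, a, b in circ:
            wires.append(GATES[name](wires[a], wires[b]))
        return wires[-1]            # last gate drives the output

    def fitness(circ, data):
        return sum(run(circ, x) == y for x, y in data) / len(data)

    # toy target: label = XOR of the first two bits
    data = [((a, b, c), a ^ b) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
    pop = [random_circuit(3, 4) for _ in range(50)]
    for _ in range(200):
        pop.sort(key=lambda c: fitness(c, data), reverse=True)
        pop = pop[:10] + [mutate(random.choice(pop[:10]), 3) for _ in range(40)]
    print("best accuracy:", fitness(pop[0], data))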

The benefits for constrained environments are interesting; I'd love to try it for something like making a tiny insect-like robot and other fun applications :)


The future of machine learning, I think, is systems that are a lot more efficient. It's just nuts that we are encoding linguistic information as float32 (how many of those bits are you really using?), and it's not much better to be using float16 or float8.
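
As a back-of-the-envelope illustration (hypothetical numbers, not from any real model), even plain symmetric int8 quantization cuts storage 4x with tiny round-trip error:

    # Hypothetical illustration: symmetric int8 quantization of float32 weights.
    import numpy as np

    w = np.random.randn(1000).astype(np.float32)   # stand-in for real weights
    scale = np.abs(w).max() / 127.0
    w_int8 = np.round(w / scale).astype(np.int8)
    w_back = w_int8.astype(np.float32) * scale

    print("bytes:", w.nbytes, "->", w_int8.nbytes)            # 4000 -> 1000
    print("max abs error:", float(np.abs(w - w_back).max()))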

I think we might be seeing the beginning of the end of GPU progress, but for inference particularly, this kind of few-orders-of-magnitude efficiency gain is what matters.


I agree on f32, although I still think the differentiable approach will remain dominant for large systems. Formally, that's likely because of the training efficiency of gradient-based approaches. More speculatively, I think beliefs in general benefit from having associated values (strongly believing something versus weakly believing something), which to me suggests an inherent benefit of numeric values inside the networks (see fuzzy logic[1]). Of course, sometimes you're dealing with rules that are less gradual, such that boolean logic (or maybe something like fp4) may suffice.
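
For what I mean by graded beliefs, the standard fuzzy-logic connectives (textbook min/max definitions, nothing from the paper) make it concrete:

    # Fuzzy connectives on graded "belief" values in [0, 1]
    # (standard Zadeh min/max definitions).
    def fuzzy_and(a, b): return min(a, b)
    def fuzzy_or(a, b):  return max(a, b)
    def fuzzy_not(a):    return 1.0 - a

    print(fuzzy_and(0.9, 0.4))   # 0.4 -- keeps track of how strongly both hold
    print(fuzzy_or(0.9, 0.4))    # 0.9
    print(fuzzy_not(0.4))        # 0.6; restricted to values in {0, 1} this
                                 # collapses back to ordinary boolean logic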

To me, the non-differentiable approach will shine in very constrained environments and open up quite a few applications. In the near future we might have very efficient and effective code generation from LLMs, which changes the landscape as well.


Only skimmed the paper, but I'd say one potential issue is that they show good results in tabular data domains, where it's likely easier to get away with minimal models than in other domains, such as visual/lidar/audio data. And it seems to me those are the domains where it'd actually be more interesting to deploy this kind of model. In either case, there seems to be a tradeoff: while you're saving space/energy with the hardware model, updating that model after it's deployed might be more complicated than with a typical software-only model.


> updating that model after it's deployed might be more complicated than with a typical software-only model.

Using an FPGA would help here.


I think the catch is that these are going to be fairly small classification problems, and that they've reported the results where it worked well and we're not seeing the ones where it didn't?


That you have to have at least an FPGA to run it?


You could build one out of discrete logic.


With discrete logic you'd be committed to the model for whatever its lifetime is, which might be acceptable depending on the application. With an FPGA you would still be able to update it after it's deployed.



