How would one generate these 16x16 images without needing a lot more compute power than the inference itself? Maybe by using the sensor from an optical mouse, which seems to have a similar resolution? [0] According to a quick web search, the CH32V003 seems to support SPI and I²C out of the box [1], which the mentioned sensor supports?
IO does tend to take considerable resources and power. In fact, that is one of the reasons it is desirable to run ML as close to the sensor as possible: it allows extracting and transmitting onwards just the information of interest (usually at a very low bitrate) instead of the raw sensor data. This is especially important for wireless and battery-powered devices.
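To put rough numbers on that, here is a back-of-the-envelope sketch. The frame format (16x16 pixels, 8 bits each), the 10 frames per second, and the one-byte result per frame are all assumptions picked for illustration, not figures from any particular sensor.

```python
def raw_bitrate(width, height, bits_per_pixel, fps):
    """Bits per second needed to stream every raw frame."""
    return width * height * bits_per_pixel * fps


def result_bitrate(bits_per_result, fps):
    """Bits per second needed to send only one inference result per frame."""
    return bits_per_result * fps


# Assumed values: 16x16 frames, 8-bit pixels, 10 fps, 1-byte class label.
raw = raw_bitrate(16, 16, 8, 10)
result = result_bitrate(8, 10)

print(f"raw frames:   {raw} bit/s")
print(f"labels only:  {result} bit/s")
print(f"reduction:    {raw // result}x")
```

Under these assumptions the link carries 80 bit/s instead of 20480 bit/s, and sending a label only when it changes would cut that further.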
One area where very low-resolution images are used is 3D and IR sensing, for example an 8x8 depth image from a time-of-flight sensor like the ST VL53L5CX. It could be mounted in, say, a household and detect, for example, a human vs. a pet vs. a static object. Though the sensor is the expensive part, so one could probably afford a larger microcontroller :D
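As a toy illustration of how little data an 8x8 depth frame is, here is a hand-written heuristic standing in for a trained model. It is not ML and not anything from the sensor's driver. It assumes the sensor looks horizontally so grid rows map to height, that depths are in millimetres, and both thresholds are made up.

```python
# Hypothetical thresholds, chosen only for illustration.
CHANGE_MM = 200  # a zone is "foreground" if this much closer than the background
TALL_ROWS = 5    # foreground spanning at least this many rows is called "human"


def classify(background, frame):
    """Label an 8x8 depth frame (mm) as 'static', 'pet' or 'human'."""
    # Rows containing at least one zone that moved closer than the background.
    rows_hit = [
        r for r in range(8)
        if any(background[r][c] - frame[r][c] > CHANGE_MM for c in range(8))
    ]
    if not rows_hit:
        return "static"
    height = max(rows_hit) - min(rows_hit) + 1
    return "human" if height >= TALL_ROWS else "pet"


def frame_with_object(background, rows, col, depth_mm):
    """Copy the background and place an object in one column over the given rows."""
    frame = [row[:] for row in background]
    for r in rows:
        frame[r][col] = depth_mm
    return frame


background = [[3000] * 8 for _ in range(8)]
print(classify(background, background))
print(classify(background, frame_with_object(background, range(6, 8), 3, 2000)))
print(classify(background, frame_with_object(background, range(1, 8), 3, 2000)))
```

A real deployment would more likely train a small classifier on recorded frames, but the input would still be just 64 values per frame.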
What would one do with such a system?
[0] https://pickandplace.wordpress.com/2012/05/16/2d-positioning...
[1] https://www.wch-ic.com/products/CH32V003.html