>> I have no problem imagining a security camera application needing to monitor quite a few video channels.
As a joke I sometimes tell people the automatic flushing toilets in public bathrooms work by having a little camera monitored by someone in a 3rd world country who watches a whole lot of video feeds and remotely flushes as needed. They usually don't buy it, but will often acknowledge that our world is uncomfortably close to having stuff like that become reality.
On the inference accelerator? IIUC, the RAM is just to hold the model and whatever state it needs during a particular inference operation. I'm not an expert on ML but AFAIK 16 GiB is plenty. I suppose it'd also need to hold onto reference frames for the video decoding, but at 1080p with e.g. YUV420 (12 bits per pixel), you can hold a lot of those in 16 GiB. edit: e.g., 4 references for each of the 96 streams would take ~1 GiB.
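If anyone wants to check that arithmetic, here it is as a quick Python sketch (the stream count and reference depth are just the figures from my comment, not anything the vendor specifies):

    # Back-of-the-envelope check of the reference-frame math above.
    WIDTH, HEIGHT = 1920, 1080     # 1080p
    BITS_PER_PIXEL = 12            # YUV 4:2:0
    STREAMS = 96
    REFS_PER_STREAM = 4

    frame_bytes = WIDTH * HEIGHT * BITS_PER_PIXEL // 8
    total_bytes = frame_bytes * REFS_PER_STREAM * STREAMS

    print(f"one frame: {frame_bytes / 2**20:.2f} MiB")  # ~2.97 MiB
    print(f"all refs:  {total_bytes / 2**30:.2f} GiB")  # ~1.11 GiB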
Even on the host, 16 GiB is fine for, say, an NVR. They don't need to keep a lot of state in RAM (or, for that matter, to do a lot of on-CPU computation). I can run an 8-stream NVR on a Raspberry Pi 2 without on-NVR analytics. That's about its limit because the network and disk are on the same USB2 bus, but there's spare CPU and RAM.
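Rough sanity check on the bus math, assuming ~4 Mbit/s per 1080p H.264 stream (an assumption, not a measured figure):

    # On a Pi 2 the Ethernet NIC and any external disk hang off one
    # USB 2.0 controller, so every recorded byte crosses that bus twice.
    STREAMS = 8
    MBIT_PER_STREAM = 4.0                    # assumed camera bitrate

    ingest_mbit = STREAMS * MBIT_PER_STREAM  # network -> RAM
    bus_mbit = ingest_mbit * 2               # plus RAM -> disk, same bus

    print(f"~{bus_mbit:.0f} Mbit/s across the shared USB2 bus")
    # USB 2.0 is 480 Mbit/s nominal, but protocol overhead and the
    # Pi's USB controller leave much less than that usable in practice.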
I have models in production that currently monitor ~400 cameras, with 2-3 cameras added per month. If it were cheap enough, this would be useful for our use case (Quality Control). We generally pull roughly 6400 pixels per region of interest from the cameras (an ~80x80 patch, say), and a single instance may have 4-30 RoIs spread across N cameras.
Not sure how much about the models is open information, but on the tooling/infra side we are running k8s with an in-house API for image acquisition. Features are defined as (x, y) coordinates denoting the center of a feature, plus a pixel count denoting the size of the rectangle in each direction from the center; see the sketch below.
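A minimal sketch of that feature encoding, assuming a numpy frame; the names (RoI, crop) are illustrative, not our actual API:

    from dataclasses import dataclass

    import numpy as np

    @dataclass
    class RoI:
        cx: int        # center x, pixels
        cy: int        # center y, pixels
        half_w: int    # pixels from center to edge, x direction
        half_h: int    # pixels from center to edge, y direction

    def crop(frame: np.ndarray, roi: RoI) -> np.ndarray:
        """Extract the RoI rectangle from a frame, clamped to the image."""
        h, w = frame.shape[:2]
        x0 = max(roi.cx - roi.half_w, 0)
        x1 = min(roi.cx + roi.half_w, w)
        y0 = max(roi.cy - roi.half_h, 0)
        y1 = min(roi.cy + roi.half_h, h)
        return frame[y0:y1, x0:x1]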
16 GiB RAM / 96 video channels ... I haven't done any of that work, but it feels like they expect that "96" not to be fully used in practice.