> Another architecture that would work is to stream the encoded video from the encoding machines to other machines to decode and inspect. That would work as well. And again avoid the inefficiencies with saving and passing around individual images.
No, that’s still a bad architecture. Bandwidth within AWS may be “free” within the same AZ, but it’s very limited. Until you get to very very large instance types, you max out at 30 Gbps instance networking, and even the largest types only hit 200 Gbps. A single 1080p uncompressed stream is 3 Gbps or so. There is no way you can effectively use any of the large M7g instances to decode and stream uncompressed video.(Maybe the very smallest, but that has its own issues.)
In contrast, if you decode and process the data on the same machine, you can very easily fit enough buffers in memory, getting the full memory bandwidth, which is more like 1Tbps. If you can process partial frames so you never write whole frames to memory, you can live in cache for even more bandwidth and improved multi core scalability.
Ah. I was thinking that the encoding machines were not bandwidth limited but rather cpu limited as they were doing expensive encoding algorithms. So I was thinking the streams were streaming out at less than real time. I figured this was better than the dual/multi encode method I think they are now relying upon when all the detection code doesn’t fit on the same machine as the encoder.
No, that’s still a bad architecture. Bandwidth within AWS may be “free” within the same AZ, but it’s very limited. Until you get to very very large instance types, you max out at 30 Gbps instance networking, and even the largest types only hit 200 Gbps. A single 1080p uncompressed stream is 3 Gbps or so. There is no way you can effectively use any of the large M7g instances to decode and stream uncompressed video.(Maybe the very smallest, but that has its own issues.)
In contrast, if you decode and process the data on the same machine, you can very easily fit enough buffers in memory, getting the full memory bandwidth, which is more like 1Tbps. If you can process partial frames so you never write whole frames to memory, you can live in cache for even more bandwidth and improved multi core scalability.