You take a video stream and compress it down into a timestamped stream of IDs. It's really lossy, but it's the same idea as OCR or speech-to-text: a tool that lets us better handle large streams of data.
As always, the tool isn't the problem. It's the use of the tool.
(But I think we all know that.)