I can't imagine that there aren't already some Palantir-like efforts to accomplish this.
Imagine a REALLY good zoom lens on a very small drone that cannot be seen or heard by a target, with that drone doing something like this to gain info.
Imagine the same zooming through windows as well.
This will be the next big ML-military step taken towards Total Information Awareness, if it's not already available in the wild.
Frequency attenuation + sub-pixel color profiling means you don't even need an expensive camera in a lot of cases.
Get a plastic cup of water or similar object, put it on someone's desk, record video from far away, combine with something like this, and you've got a very interesting avenue for corporate espionage. If you could reconstruct typed passwords from the object's vibrations, it would be a really powerful technique.
Also, this is only tangentially related, but you can also see through walls using WiFi:
Doesn't deep learning imply training on sample results?
edit: now I see it is being used to match audio samples, not to generate text, so it wouldn't create an independent value from the audio in this arrangement, other than, e.g., the speaker attribution they mentioned.