You'd likely lose the following features:
* Access to the camera from your phone
* The ability to talk to people near the camera
* The "people detection" image classification stuff
* The ability to highlight a range of video and get a shareable link for it
A sane default DNS configuration could get around this automatically. The default could be a managed name like your-preferred-name.cameraservice.com or, if you're particularly adventurous, camera.yourdomain.com. IPv6 would greatly simplify the NAT complications here. A device that can double as a firewall, or talk to your existing firewall and configure it automatically, would go a long way.
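To make the NAT point concrete, here's a rough sketch of how the box might decide whether a given address of its own could even be reached from your phone. It assumes the box already knows its interface addresses, and the three labels are made up for illustration. A "direct" result still depends on the firewall allowing inbound connections.

```python
import ipaddress

def classify_address(addr: str) -> str:
    """Roughly classify whether one of the box's addresses is usable for remote access."""
    ip = ipaddress.ip_address(addr)
    if ip.is_global:
        # Globally routable (e.g. a typical ISP-assigned IPv6 address):
        # a phone could connect directly, firewall permitting.
        return "direct"
    if ip.version == 4 and ip.is_private:
        # RFC 1918 space behind a home router: needs port forwarding,
        # automatic router configuration, or a relay.
        return "needs-nat-traversal"
    # Link-local, loopback, etc.: not useful for reaching the camera remotely.
    return "unreachable"
```

With IPv6, the common case lands in the first branch, and the only remaining job is opening the firewall. That's why IPv6 plus a firewall-aware device removes most of the hassle.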
I don't see why the cloud is necessary for this, other than to slightly simplify notifications. Can't the box at your house just shoot off a text message with a link? A centralized notification service could still be used, but it would carry only a message along the lines of "there is activity at your camera", and the phone app could then initiate the stream directly.
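Here's a minimal sketch of how little the central service would need to see. Everything in it is hypothetical: the payload fields, the `known_devices` mapping that the phone would store when pairing with the camera, and the `/live` path on the box.

```python
import json

def build_activity_notification(device_id: str) -> str:
    """All the central relay ever handles: no video, no thumbnails."""
    payload = {
        "device_id": device_id,  # tells the phone which camera box to contact
        "message": "There is activity at your camera",
    }
    return json.dumps(payload, sort_keys=True)

def handle_notification(raw: str, known_devices: dict[str, str]) -> str:
    """On the phone: look up the camera's own hostname and stream from it directly."""
    payload = json.loads(raw)
    host = known_devices[payload["device_id"]]
    return f"https://{host}/live"  # hypothetical endpoint served by the box itself
```

Usage: `handle_notification(build_activity_notification("garage-cam"), {"garage-cam": "camera.yourdomain.com"})` yields `https://camera.yourdomain.com/live`. The relay only knows that *something* happened, not what the camera saw.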
How hard is this really? Once the model/algorithm is in place, the actual computation is easy, right? Does it really have to run in the cloud?
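As a rough illustration of the per-frame cost, here's simple frame differencing (motion detection). This is a deliberately simpler technique swapped in for people detection, not an actual person classifier. Frames are assumed to be grayscale pixel grids, and both thresholds are arbitrary placeholders. A trained detector costs more per frame, but it's still a fixed forward pass rather than training, which is the part that actually needs big hardware.

```python
def changed_fraction(prev: list[list[int]], curr: list[list[int]], threshold: int = 25) -> float:
    """Fraction of pixels whose brightness changed by more than `threshold` between frames."""
    total = 0
    changed = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            total += 1
            if abs(a - b) > threshold:
                changed += 1
    return changed / total if total else 0.0

def motion_detected(prev: list[list[int]], curr: list[list[int]],
                    threshold: int = 25, min_fraction: float = 0.01) -> bool:
    """Flag motion when enough of the frame changed (both thresholds are placeholders)."""
    return changed_fraction(prev, curr, threshold) >= min_fraction
```

That's a single pass over the pixels per frame, well within what a small box at home can do continuously.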
This can be done either with a share-to-YouTube link for videos you don't mind making public, or simply with direct links to your device for small audiences.
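For the direct-link option, one way the box could hand out links to a video range without any accounts is an HMAC-signed, expiring URL. This is a sketch under assumptions: the secret key, the `/clip` path, and the query parameter names are all hypothetical. `start`, `end`, and `expires` are Unix timestamps.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"replace-with-a-random-per-device-key"  # hypothetical; never leaves the camera box

def _sign(start: int, end: int, expires: int) -> str:
    msg = f"{start}:{end}:{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

def make_clip_link(host: str, start: int, end: int, expires: int) -> str:
    """Shareable link for video between `start` and `end`, valid until `expires`."""
    query = urlencode({"start": start, "end": end, "expires": expires,
                       "sig": _sign(start, end, expires)})
    return f"https://{host}/clip?{query}"

def verify_clip_link(url: str, now: int) -> bool:
    """Run on the box when someone opens the link."""
    params = {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}
    try:
        start, end, expires = int(params["start"]), int(params["end"]), int(params["expires"])
    except (KeyError, ValueError):
        return False
    if now > expires or end <= start:
        return False
    # Constant-time comparison so the signature can't be guessed byte by byte.
    return hmac.compare_digest(params.get("sig", ""), _sign(start, end, expires))
```

Because the signature covers the range and the expiry, a recipient can't widen the clip or extend the link. Nothing has to be stored server-side to revoke it later, apart from rotating the key.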