A guy with dark hair in front of a white wall? I could luma key that in 10 seconds. The book example is more interesting, but there you can already see a bit of chatter (which might have to do with compression and noise tho).
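For the "10 seconds" claim: a luma key is just a brightness threshold on each pixel. Here's a minimal numpy sketch; the threshold and softness values are made-up illustrations, not anything the product uses:

```python
import numpy as np

def luma_key(frame, threshold=200, softness=20):
    """Naive luma key: pixels brighter than `threshold` become transparent.

    `frame` is an H x W x 3 uint8 RGB image; returns an H x W float alpha
    matte in [0, 1]. Threshold/softness values here are illustrative guesses.
    """
    # Rec. 601 luma weights
    luma = frame @ np.array([0.299, 0.587, 0.114])
    # Soft falloff around the threshold instead of a hard cut
    alpha = np.clip((threshold - luma) / softness + 0.5, 0.0, 1.0)
    return alpha

# A dark "subject" pixel stays opaque, a white "wall" pixel goes transparent
frame = np.array([[[30, 30, 30], [255, 255, 255]]], dtype=np.uint8)
alpha = luma_key(frame)
```

This is exactly why the white-wall case is easy and the book example (textured, similarly-bright background) is the more interesting one.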
In your defense you probably aim at a different target audience than people like me.
I just purchased a very basic greenscreen and 3 point lighting kit for about $120 on Amazon.
There are dozens of techniques of varying success that have been developed over the course of a decade and a half. My guess is that this is taking some more common implementation like 'closed form matting' and putting it on a server with ffmpeg. To guess the foreground I would use motion vectors as a starting point.
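The motion-vector idea can be sketched cheaply. A real pipeline might read the codec's motion vectors directly; plain frame differencing is a stand-in here that captures the same intuition, that regions moving differently from the background are likely foreground:

```python
import numpy as np

def motion_prior(prev, curr, thresh=15):
    """Crude foreground prior from frame differencing.

    Inputs are H x W uint8 grayscale frames; returns a binary map where
    1 means "probably foreground".  The threshold is a made-up value.
    """
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    return (diff > thresh).astype(np.uint8)

prev = np.zeros((4, 4), dtype=np.uint8)
curr = prev.copy()
curr[1:3, 1:3] = 100           # a small patch "moved"
prior = motion_prior(prev, curr)
```

A prior like this would only seed a matting method such as closed-form matting, which then refines it into a soft alpha.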
Also note that an alpha channel doesn't get you all the way there. You have to solve the full matting equation to extract both the foreground and alpha. You can see a bright edge around the hair in the example. The result they show still looks pretty good in general though.
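The matting equation in question is I = αF + (1−α)B, so alpha alone doesn't give you the foreground colour F. One classical way to see both unknowns recovered is triangulation matting (Smith & Blinn), where the same foreground is shot over two known backgrounds. A toy numpy sketch, not what this product does:

```python
import numpy as np

def triangulation_matting(i1, i2, b1, b2):
    """Recover alpha and foreground from shots over two known backgrounds.

    Compositing equation: I = alpha * F + (1 - alpha) * B.  Subtracting the
    two shots cancels F, leaving (1 - alpha) * (B1 - B2), which is solved
    in a least-squares sense over the colour channels.  Arrays are
    H x W x 3 floats in [0, 1]; the returned foreground is premultiplied.
    """
    num = ((i1 - i2) * (b1 - b2)).sum(axis=-1)
    den = ((b1 - b2) ** 2).sum(axis=-1)
    alpha = np.clip(1.0 - num / np.maximum(den, 1e-8), 0.0, 1.0)
    fg = i1 - (1.0 - alpha)[..., None] * b1
    return alpha, np.clip(fg, 0.0, 1.0)

# Synthetic check: composite a known foreground over blue, then green
true_fg = np.array([[[0.8, 0.2, 0.2]]])
b1 = np.array([[[0.0, 0.0, 1.0]]])
b2 = np.array([[[0.0, 1.0, 0.0]]])
i1 = 0.5 * true_fg + 0.5 * b1
i2 = 0.5 * true_fg + 0.5 * b2
alpha, fg = triangulation_matting(i1, i2, b1, b2)
```

With a single shot and unknown background, as here, the problem is underdetermined, which is exactly why edge artifacts like that bright hair fringe creep in.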
Deep learning is making decades of research obsolete by delivering better results, with better generalisation, in less time.
Separately, because I'd bet the makers are reading, are there any plans to offer the segmentation models or APIs locally? Was looking for this for the remove.bg product as well.
It uses whatever AI systems they made to single out the foreground objects from the background objects. And then it's basically just taking the camera input, applying filters or transparency and outputting it as a new video device.
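The per-frame step of that description can be sketched generically; `segment` stands in for whatever model they use, and pushing the result to a virtual camera device (e.g. v4l2loopback on Linux) is a separate step left out here:

```python
import numpy as np

def process_frame(frame, segment, background):
    """One frame of the pipeline: segment, then composite the foreground
    over a replacement background (or leave it transparent).

    `segment` is any callable producing a soft H x W alpha matte in [0, 1];
    it is a placeholder for the actual model.
    """
    alpha = segment(frame)[..., None]          # H x W x 1
    return (alpha * frame + (1 - alpha) * background).astype(np.uint8)

# Toy run: a "segmenter" that keeps the left half of the frame
frame = np.full((2, 4, 3), 200, dtype=np.uint8)
background = np.zeros((2, 4, 3), dtype=np.uint8)
segment = lambda f: np.repeat([[1.0, 1.0, 0.0, 0.0]], 2, axis=0)
out = process_frame(frame, segment, background)
```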
I once had a clip that I trimmed off another scene. Only after converting the video file did a frame of that scene come back.
However, simple frame-by-frame segmentation will probably not be enough to get temporal consistency, so for each frame's segmentation they probably take previous and following frames into account.
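The simplest form of that temporal smoothing is an exponential moving average over the per-frame mattes, which damps frame-to-frame flicker; real systems likely use learned temporal models instead, and the decay value here is made up:

```python
import numpy as np

def smooth_masks(masks, decay=0.6):
    """Exponential moving average over a sequence of per-frame alpha mattes.

    Blends each frame's raw segmentation with the smoothed history, which
    damps the frame-to-frame "chatter" at the cost of some lag.
    """
    smoothed = []
    prev = masks[0].astype(np.float64)
    for m in masks:
        prev = decay * prev + (1 - decay) * m
        smoothed.append(prev.copy())
    return smoothed

# A pixel that flickers 1, 0, 1 gets pulled toward a steadier value
masks = [np.ones((1, 1)), np.zeros((1, 1)), np.ones((1, 1))]
out = smooth_masks(masks)
```

Using *following* frames too, as suggested above, turns this into an offline (non-causal) filter, which fits a server-side service better than a live one.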
For a deep learning approach, I would start by looking into literature on semantic segmentation. Here is a blog post I just found which gives an intro: 
With state-of-the-art models (e.g. DeepLabV3) and a good dataset of foreground/background segmentations, the results could be of useful quality already.
The next step would be to look into the literature on image matting (e.g. Deep Image Matting), which, instead of trying to classify each pixel as foreground/background, regresses the foreground colour and transparency.
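To make the segmentation-versus-matting distinction concrete, here is a single-pixel toy example (all numbers invented). At a hair edge the camera pixel is a mix of dark hair and white wall; a hard segmentation keeps the baked-in white, while a regressed alpha blends correctly:

```python
import numpy as np

# Ground truth at a hair-edge pixel: 30% dark foreground over a white wall
fg = np.array([0.1, 0.1, 0.1])
bg = np.array([1.0, 1.0, 1.0])
alpha_true = 0.3
observed = alpha_true * fg + (1 - alpha_true) * bg   # what the camera sees

new_bg = np.array([0.0, 0.5, 0.0])

# Hard segmentation: the pixel is classified "foreground" and pasted over
# the new background wholesale -- the baked-in white haloes through.
hard = observed

# Matting: with the regressed alpha (and an estimated foreground colour)
# the pixel re-blends correctly into the new background.
soft = alpha_true * fg + (1 - alpha_true) * new_bg
```

The bright edge around the hair in their example looks like exactly this kind of residual background colour.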
I have some knowledge of creating an OCR program using deep learning from the last online course I took, but this looks like a very different beast, so it would be great fun to learn.
The examples are great! I recorded a short video of myself and the processing failed horribly. Whelp.