There is a short audio sample in the YouTube link. Your video does not sound even remotely similar. If you could take the audio from the original and extract speech that is easy for a human to understand, your confidence that this won’t work would obviously be justified. But so far you don’t appear to have even understood how the device works.

