Hello from San Francisco!
I built this tool in 2023 to detect whether a voice is AI-generated.
It takes an audio clip of somebody speaking as input, and gives a binary classification ('human' or 'AI') as output.
I tested it tonight (2025) on some ElevenLabs clips and it still works!
I built it using a fairly simple Convolutional Neural Network (CNN).
Essentially, the audio is pre-processed into a Mel spectrogram, and the CNN then does image classification on that spectrogram.
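To make the pre-processing step concrete, here is a minimal sketch of turning a clip into a log-Mel spectrogram using only NumPy. It is not the code from the notebook: the sample rate, FFT size, hop length, number of Mel bands, and all function names below are assumptions picked for illustration, and a synthetic tone stands in for a real voice clip.

```python
import numpy as np


def hz_to_mel(hz):
    # Common HTK-style formula for converting Hz to the Mel scale.
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose centers are evenly spaced on the Mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=64):
    # All default values here are illustrative assumptions.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft//2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T  # (n_mels, n_frames)
    return 10.0 * np.log10(mel + 1e-10)  # decibel scale; epsilon avoids log(0)


# Stand-in for a voice clip: one second of a 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2.0 * np.pi * 440.0 * t)

spec = mel_spectrogram(tone, sr=sr)
print(spec.shape)  # (64, 61)
```

The result is a 2D array (Mel bands by time frames), which is the kind of "image" a CNN can then classify as 'human' or 'AI'.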
The Jupyter notebook that I wrote to train the model is in the 'model' dir, but if you just wanna use the tool, there's a Python script in the root directory of the project.
I trained the model on a Paperspace (since acquired by DigitalOcean) cloud server with one GPU.
Check it out!
Thanks,
Zuri Obozuwa