Show HN: Using 3D Convolutional Neural Networks for Speaker Verification

irsina · on June 25, 2017

We used a 3D convolutional architecture to build the speaker model, so that it simultaneously captures the speech-related and temporal information in the speakers' utterances.
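
To illustrate the general idea (not the project's actual code), here is a minimal pure-Python sketch of a single 3D convolution sliding over a stack of utterances. The input layout (utterances x time frames x spectral features), the toy sizes, and the averaging kernel are all assumptions made for this example; a real model would learn many kernels and stack several layers.

```python
def conv3d_valid(volume, kernel):
    """Slide a 3D kernel over a 3D volume (stride 1, no padding)."""
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for d in range(D - kd + 1):          # across utterances
        plane = []
        for h in range(H - kh + 1):      # across time frames
            row = []
            for w in range(W - kw + 1):  # across spectral features
                acc = 0.0
                for i in range(kd):
                    for j in range(kh):
                        for k in range(kw):
                            acc += volume[d + i][h + j][w + k] * kernel[i][j][k]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Hypothetical toy input: 4 utterances x 6 time frames x 5 spectral features.
volume = [[[float(u + t + f) for f in range(5)] for t in range(6)] for u in range(4)]

# Hypothetical 2 x 3 x 3 averaging kernel (a trained network would learn its weights).
kernel = [[[1.0 / 18.0] * 3 for _ in range(3)] for _ in range(2)]

features = conv3d_valid(volume, kernel)
shape = (len(features), len(features[0]), len(features[0][0]))
print(shape)
```

Because the kernel spans more than one utterance as well as a window of frames and features, each output value mixes information along all three axes at once, which is the intuition behind "simultaneously" above.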

woodson · on June 25, 2017

Have you compared your system to state-of-the-art results on the NIST Speaker Recognition Evaluations? I'm asking because the datasets released for these evaluations every two years or so have been driving progress in the field for close to two decades, and your dataset is completely unknown in the speaker recognition literature. I'm not trying to discredit your work, but that's THE obvious data to use for evaluating anything related to speaker recognition, and not using it raises some flags.

irsina · on June 28, 2017

Thank you so much for your feedback. You are absolutely correct. For publications, people often use whatever dataset they have; Google, for example, usually uses its own huge dataset for the speaker-dependent scenario rather than NIST. But I accept that a comparison on the NIST dataset could showcase the work better. Unfortunately, I am no longer working on that project, so I have no time to do that, but I am open to collaborating if anyone is interested in continuing this effort.

skndr · on June 25, 2017

Any hints about future improvements for open-set speaker diarization?

irsina · on June 28, 2017

Deep learning is heading in that direction, but when? That's obviously another question. Lol!