spkrec-ecapa-voxceleb
Speaker Verification with ECAPA-TDNN embeddings on Voxceleb
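This repository provides all the necessary tools to perform speaker verification with a pretrained ECAPA-TDNN model using SpeechBrain.
The system can also be used to extract speaker embeddings.
It is trained on the Voxceleb1 + Voxceleb2 training data.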
For a better experience, we encourage you to learn more about
SpeechBrain. The model's performance on the Voxceleb1 test set (cleaned) is:
| Release | EER (%) |
|---|---|
| 21-03-05 | 0.80 |
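The EER in the table is the error rate at the decision threshold where the false acceptance rate (FAR) equals the false rejection rate (FRR), reported as a percentage. As a rough illustration, the sketch below estimates it by sweeping every observed score as a threshold. `compute_eer` is a hypothetical helper, not SpeechBrain's own scoring code, and it picks the closest FAR/FRR crossing instead of interpolating between thresholds.

```python
import numpy as np


def compute_eer(scores, labels):
    """Estimate the equal error rate from trial scores by a threshold sweep.

    scores: similarity scores (higher = more likely the same speaker).
    labels: 1 for same-speaker (target) trials, 0 for different-speaker trials.
    Returns the EER as a fraction; multiply by 100 for a percentage.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    target, nontarget = scores[labels == 1], scores[labels == 0]

    thresholds = np.unique(scores)  # sorted candidate thresholds
    # FRR: target trials rejected (score below the threshold).
    frr = np.array([(target < t).mean() for t in thresholds])
    # FAR: non-target trials accepted (score at or above the threshold).
    far = np.array([(nontarget >= t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest and average them.
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2)
```

For example, `compute_eer([0.6, 0.4, 0.5, 0.3], [1, 1, 0, 0])` returns 0.5, i.e. an EER of 50%.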
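Pipeline description
This system is composed of an ECAPA-TDNN model, which combines convolutional and residual blocks. The embeddings are extracted with attentive statistical pooling, and the system is trained with an additive angular margin softmax loss. Speaker verification is performed by computing the cosine similarity between speaker embeddings.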
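To make the pooling and loss steps concrete, here is a minimal NumPy sketch. It assumes a simplified attentive statistics pooling with one attention weight per frame (the ECAPA-TDNN paper uses channel-dependent, context-aware attention) and illustrative margin and scale values. The function names are hypothetical and this is not SpeechBrain's implementation.

```python
import numpy as np


def attentive_stats_pooling(frames, w, v):
    """Pool frame-level features of shape (T, D) into one vector of shape (2 * D,).

    Simplified attention: one scalar score per frame, e_t = v . tanh(W h_t).
    """
    scores = np.tanh(frames @ w.T) @ v              # (T,) attention score per frame
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                             # softmax over time
    mean = (alpha[:, None] * frames).sum(axis=0)     # attention-weighted mean
    var = (alpha[:, None] * frames**2).sum(axis=0) - mean**2
    std = np.sqrt(np.clip(var, 1e-9, None))          # attention-weighted std
    return np.concatenate([mean, std])


def aam_softmax_logits(embedding, class_weights, label, margin=0.2, scale=30.0):
    """Additive angular margin logits for one training example.

    The target class gets scale * cos(theta + margin); every other class
    gets scale * cos(theta). Cases where theta + margin exceeds pi are
    not handled in this sketch.
    """
    emb = embedding / np.linalg.norm(embedding)
    weights = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = np.clip(weights @ emb, -1.0, 1.0)          # cosine to each speaker class
    logits = scale * cos
    theta = np.arccos(cos[label])
    logits[label] = scale * np.cos(theta + margin)
    return logits
```

During training, these logits feed an ordinary cross-entropy loss. Because the margin lowers the target logit, the network is pushed to place each embedding at a smaller angle to its own speaker's weight vector.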
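Install SpeechBrain
First of all, please install SpeechBrain with the following command:
```bash
pip install git+https://github.com/speechbrain/speechbrain.git@develop
```
Please note that we encourage you to read our tutorials and learn more about
SpeechBrain.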
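Compute your speaker embeddings
```python
import torchaudio
from speechbrain.inference.speaker import EncoderClassifier

# Download (on first use) and load the pretrained ECAPA-TDNN encoder.
classifier = EncoderClassifier.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb")
# torchaudio.load returns a [channels, time] tensor and the sampling rate.
signal, fs = torchaudio.load('tests/samples/ASR/spk1_snt1.wav')
# For a mono file, the single channel acts as a batch of one signal.
embeddings = classifier.encode_batch(signal)
```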
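`encode_batch` takes a batch of signals. To embed several recordings of different lengths in one call, you can zero-pad them and pass their relative lengths. This sketch assumes that `encode_batch` accepts a `wav_lens` tensor of lengths relative to the longest signal, as SpeechBrain's batch interfaces generally do; `pad_batch` is a hypothetical helper.

```python
import torch
from torch.nn.utils.rnn import pad_sequence


def pad_batch(signals):
    """Zero-pad 1-D waveforms into a [batch, max_time] tensor.

    Also returns wav_lens: each signal's length divided by the longest
    length, so the longest signal gets 1.0.
    """
    lengths = torch.tensor([s.shape[-1] for s in signals], dtype=torch.float32)
    wavs = pad_sequence(list(signals), batch_first=True)  # pads with zeros at the end
    wav_lens = lengths / lengths.max()
    return wavs, wav_lens


# Hypothetical usage with the classifier loaded above (sig1, sig2 from torchaudio.load):
# wavs, wav_lens = pad_batch([sig1[0], sig2[0]])  # [0] takes the first channel
# embeddings = classifier.encode_batch(wavs, wav_lens)
```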
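The system is trained on recordings sampled at 16 kHz (single channel).
The code automatically normalizes your audio (i.e., resampling + mono channel selection) when you call classify_file. If you use encode_batch or classify_batch, make sure your input tensor matches the expected sampling rate.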
Perform speaker verification
```python
from speechbrain.inference.speaker import SpeakerRecognition

# Load the verification model; downloaded files are cached under savedir.
verification = SpeakerRecognition.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb", savedir="pretrained_models/spkrec-ecapa-voxceleb")

# Each call returns a similarity score and a same/different-speaker prediction.
score, prediction = verification.verify_files("tests/samples/ASR/spk1_snt1.wav", "tests/samples/ASR/spk2_snt1.wav")  # Different speakers
score, prediction = verification.verify_files("tests/samples/ASR/spk1_snt1.wav", "tests/samples/ASR/spk1_snt2.wav")  # Same speaker
```
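The prediction is 1 if the two input signals are from the same speaker and 0 otherwise; it is obtained by comparing the score against a decision threshold.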
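The decision can be sketched as a cosine-similarity score compared against a threshold. In this NumPy sketch, `verify` is a hypothetical helper and the default threshold of 0.25 is an assumption for illustration; in practice you would tune it on held-out trials from your own data.

```python
import numpy as np


def verify(emb1, emb2, threshold=0.25):
    """Score two speaker embeddings with cosine similarity and threshold the score.

    Returns (score, prediction): prediction is 1 for "same speaker"
    (score above the threshold) and 0 otherwise.
    """
    e1 = np.asarray(emb1, dtype=float).ravel()   # flatten any leading batch dimensions
    e2 = np.asarray(emb2, dtype=float).ravel()
    score = float(e1 @ e2 / (np.linalg.norm(e1) * np.linalg.norm(e2)))
    prediction = int(score > threshold)
    return score, prediction
```

With embeddings returned by `encode_batch`, move them to the CPU first, for example with `embeddings.squeeze().cpu().numpy()`.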
Inference on GPU
To perform inference on the GPU, add run_opts={"device":"cuda"} when calling the from_hparams method.
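A small sketch for picking the device at run time, so the same script also runs on machines without a GPU. It assumes PyTorch is installed; `inference_run_opts` and `load_verifier` are hypothetical helper names, while `run_opts`, `source`, and `savedir` are the `from_hparams` arguments used above.

```python
import torch


def inference_run_opts() -> dict:
    """Select CUDA when a GPU is visible to PyTorch, otherwise fall back to CPU."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return {"device": device}


def load_verifier():
    """Load the pretrained verifier on the selected device (downloaded on first use)."""
    # Imported inside the function so this file imports without SpeechBrain.
    from speechbrain.inference.speaker import SpeakerRecognition

    return SpeakerRecognition.from_hparams(
        source="speechbrain/spkrec-ecapa-voxceleb",
        savedir="pretrained_models/spkrec-ecapa-voxceleb",
        run_opts=inference_run_opts(),
    )
```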
Training
The model was trained with SpeechBrain (commit aa018540).
To train it from scratch, follow these steps:
1. Clone SpeechBrain:
```bash
git clone https://github.com/speechbrain/speechbrain/
```
2. Install it:
```bash
cd speechbrain
pip install -r requirements.txt
pip install -e .
```
3. Run training:
```bash
cd recipes/VoxCeleb/SpeakerRec
python train_speaker_embeddings.py hparams/train_ecapa_tdnn.yaml --data_folder=your_data_folder
```
You can find our training results (models, logs, etc.) here.
Limitations
The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
Citing ECAPA-TDNN
```bibtex
@inproceedings{DBLP:conf/interspeech/DesplanquesTD20,
  author    = {Brecht Desplanques and
               Jenthe Thienpondt and
               Kris Demuynck},
  editor    = {Helen Meng and
               Bo Xu and
               Thomas Fang Zheng},
  title     = {{ECAPA-TDNN:} Emphasized Channel Attention, Propagation and Aggregation
               in {TDNN} Based Speaker Verification},
  booktitle = {Interspeech 2020},
  pages     = {3830--3834},
  publisher = {{ISCA}},
  year      = {2020},
}
```
Citing SpeechBrain
Please cite SpeechBrain if you use it for your research or business.
```bibtex
@misc{speechbrain,
  title={{SpeechBrain}: A General-Purpose Speech Toolkit},
  author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
  year={2021},
  eprint={2106.04624},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  note={arXiv:2106.04624}
}
```
About SpeechBrain
- Website: https://speechbrain.github.io/
- Code: https://github.com/speechbrain/speechbrain/
- HuggingFace: https://huggingface.co/speechbrain/