wespeaker-voxceleb-resnet34-LM
Introduction
Using this open-source model in production? Consider switching to pyannoteAI for better and faster options.
Model card
License: cc-by-4.0
Datasets: voxceleb
Tags: pyannote, pyannote-audio, pyannote-audio-model, wespeaker, audio, voice, speech, speaker, speaker-recognition, speaker-verification, speaker-identification, speaker-embedding
Model details
🎹 Wrapper around wespeaker-voxceleb-resnet34-LM
This model requires pyannote.audio version 3.1 or higher.
This is a wrapper around the WeSpeaker wespeaker-voxceleb-resnet34-LM pretrained speaker embedding model, for use in pyannote.audio.
Basic usage
# instantiate pretrained model
from pyannote.audio import Model
model = Model.from_pretrained("pyannote/wespeaker-voxceleb-resnet34-LM")
from pyannote.audio import Inference
inference = Inference(model, window="whole")
embedding1 = inference("speaker1.wav")
embedding2 = inference("speaker2.wav")
# `embeddingX` is (1 x D) numpy array extracted from the file as a whole.
from scipy.spatial.distance import cdist
distance = cdist(embedding1, embedding2, metric="cosine")[0,0]
# `distance` is a `float` describing how dissimilar speakers 1 and 2 are.
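To turn the distance into a same-speaker or different-speaker decision, it is typically compared against a threshold tuned on held-out data. The sketch below illustrates the idea with random stand-in vectors instead of real embeddings. The `cosine_distance` and `same_speaker` helpers, the dimension of 256, and the threshold of 0.5 are all assumptions made for illustration, not values published for this model.

```python
import numpy as np


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance between two (1 x D) embeddings.

    Same quantity as cdist(a, b, metric="cosine")[0, 0] in the snippet above.
    """
    a, b = a.ravel(), b.ravel()
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def same_speaker(a: np.ndarray, b: np.ndarray, threshold: float = 0.5) -> bool:
    # The threshold is a placeholder: tune it on your own labelled trials.
    return cosine_distance(a, b) < threshold


# Random stand-ins for real embeddings (D = 256 is chosen arbitrarily here).
rng = np.random.default_rng(0)
embedding1 = rng.normal(size=(1, 256))
embedding1_bis = embedding1 + 0.1 * rng.normal(size=(1, 256))  # slightly perturbed copy
embedding2 = rng.normal(size=(1, 256))  # unrelated vector

print(same_speaker(embedding1, embedding1_bis))
print(same_speaker(embedding1, embedding2))
```

With real data, `embedding1` and `embedding2` would come from `inference(...)` as shown above, and the threshold would be chosen to balance false accepts against false rejects on a development set.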
Advanced usage
Running on GPU
import torch
inference.to(torch.device("cuda"))
embedding = inference("audio.wav")
Extract embedding from an excerpt
from pyannote.audio import Inference
from pyannote.core import Segment
inference = Inference(model, window="whole")
excerpt = Segment(13.37, 19.81)
embedding = inference.crop("audio.wav", excerpt)
# `embedding` is (1 x D) numpy array extracted from the file excerpt.
Extract embeddings using a sliding window
from pyannote.audio import Inference
inference = Inference(model, window="sliding",
                      duration=3.0, step=1.0)
embeddings = inference("audio.wav")
# `embeddings` is a (N x D) pyannote.core.SlidingWindowFeature
# `embeddings[i]` is the embedding of the ith position of the
# sliding window, i.e. from [i * step, i * step + duration].
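The mapping from window index to time span described in the comments above can be sketched with plain NumPy. The example below works on a random (N x D) array standing in for the sliding-window output (with pyannote.core, the underlying array should be available as `embeddings.data`). The helper names `window_spans` and `consecutive_cosine_distances` are hypothetical, and comparing neighbouring windows is only one possible way to use the result.

```python
import numpy as np


def window_spans(num_windows: int, duration: float = 3.0, step: float = 1.0) -> np.ndarray:
    """Return an (N x 2) array of [start, end] times, one row per window.

    Window i covers [i * step, i * step + duration], as stated above.
    """
    starts = step * np.arange(num_windows)
    return np.stack([starts, starts + duration], axis=1)


def consecutive_cosine_distances(data: np.ndarray) -> np.ndarray:
    """Cosine distance between each window embedding and the next one."""
    unit = data / np.linalg.norm(data, axis=1, keepdims=True)
    return 1.0 - np.sum(unit[:-1] * unit[1:], axis=1)


# Random stand-in for the (N x D) embedding matrix; N = 5 and D = 256 are arbitrary.
rng = np.random.default_rng(0)
data = rng.normal(size=(5, 256))

spans = window_spans(len(data), duration=3.0, step=1.0)
jumps = consecutive_cosine_distances(data)

for (start, end), embedding in zip(spans, data):
    print(f"[{start:.1f}s, {end:.1f}s] -> embedding of shape {embedding.shape}")
```

A large value in `jumps` between two neighbouring windows may hint at a speaker change, although overlapping windows (step smaller than duration) smooth such transitions.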
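License
According to this page:
Pretrained models in WeNet follow the license of their corresponding dataset. For example, the pretrained model trained on VoxCeleb follows the Creative Commons Attribution 4.0 International License, since that is the license of the VoxCeleb dataset; see https://mm.kaist.ac.kr/datasets/voxceleb/.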
Citation
@inproceedings{Wang2023,
title={Wespeaker: A research and production oriented speaker embedding learning toolkit},
author={Wang, Hongji and Liang, Chengdong and Wang, Shuai and Chen, Zhengyang and Zhang, Binbin and Xiang, Xu and Deng, Yanlei and Qian, Yanmin},
booktitle={ICASSP 2023, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={1--5},
year={2023},
organization={IEEE}
}
@inproceedings{Bredin23,
author={Hervé Bredin},
title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
year=2023,
booktitle={Proc. INTERSPEECH 2023},
pages={1983--1987},
doi={10.21437/Interspeech.2023-105}
}