wespeaker-voxceleb-resnet34-LM

Vendor: pyannote
Framework: pyannote-audio
Model ID: pyannote/wespeaker-voxceleb-resnet34-LM
Downloads: 10,855,696
Favorites: 320
Views: 17
License: cc-by-4.0

Overview

Using this open-source model in production? Consider switching to pyannoteAI for better and faster options.

Model card

Dataset: voxceleb
Tags: pyannote, pyannote-audio, pyannote-audio-model, wespeaker, audio, voice, speech, speaker, speaker-recognition, speaker-verification, speaker-identification, speaker-embedding

Model details

🎹 Wrapper around wespeaker-voxceleb-resnet34-LM

This model requires pyannote.audio version 3.1 or higher.

This is a wrapper around the WeSpeaker wespeaker-voxceleb-resnet34-LM pretrained speaker embedding model, for use in pyannote.audio.

Basic usage

# instantiate pretrained model
from pyannote.audio import Model
model = Model.from_pretrained("pyannote/wespeaker-voxceleb-resnet34-LM")
from pyannote.audio import Inference
inference = Inference(model, window="whole")
embedding1 = inference("speaker1.wav")
embedding2 = inference("speaker2.wav")
# `embeddingX` is (1 x D) numpy array extracted from the file as a whole.

from scipy.spatial.distance import cdist
distance = cdist(embedding1, embedding2, metric="cosine")[0,0]
# `distance` is a `float` describing how dissimilar speakers 1 and 2 are.
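
To make the `cosine` metric used above concrete, here is a minimal, dependency-free sketch of the same quantity, 1 - (u · v) / (‖u‖ ‖v‖), which ranges from 0 (same direction) to 2 (opposite direction). The helper names `cosine_distance` and `same_speaker`, and the default `threshold=0.5`, are illustrative assumptions and not part of pyannote.audio or WeSpeaker; a usable threshold has to be tuned on labeled data.

```python
import math


def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v) between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def same_speaker(u, v, threshold=0.5):
    """Hypothetical decision rule: accept when the distance is below threshold.

    The default threshold is a placeholder, not a recommended value.
    """
    return cosine_distance(u, v) < threshold
```

With the (1 x D) arrays from the snippet above, the call would be `cosine_distance(embedding1[0], embedding2[0])`, which should agree with the `cdist` result up to floating-point error.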

Advanced usage

Running on GPU

import torch
inference.to(torch.device("cuda"))
embedding = inference("audio.wav")

Extract embedding from an excerpt

from pyannote.audio import Inference
from pyannote.core import Segment
inference = Inference(model, window="whole")
excerpt = Segment(13.37, 19.81)
embedding = inference.crop("audio.wav", excerpt)
# `embedding` is (1 x D) numpy array extracted from the file excerpt.

Extract embeddings using a sliding window

from pyannote.audio import Inference
inference = Inference(model, window="sliding",
                      duration=3.0, step=1.0)
embeddings = inference("audio.wav")
# `embeddings` is a (N x D) pyannote.core.SlidingWindowFeature
# `embeddings[i]` is the embedding of the ith position of the
# sliding window, i.e. from [i * step, i * step + duration].
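
The index-to-time mapping described in the comment above can be sketched as a small standalone helper. `window_spans` is a hypothetical name introduced here for illustration; it only applies the `[i * step, i * step + duration]` rule and makes no claim about how pyannote.audio handles the end of a file.

```python
def window_spans(num_windows, duration=3.0, step=1.0):
    """Return (start, end) in seconds for each sliding-window position.

    Position i covers [i * step, i * step + duration], matching the
    duration/step values passed to Inference in the snippet above.
    """
    return [(i * step, i * step + duration) for i in range(num_windows)]
```

For example, `window_spans(3)` gives the spans of the first three windows, so row `i` of the (N x D) feature can be paired with the time span it was computed from.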

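License

According to this page:

The pretrained model in WeNet follows the license of its corresponding dataset. For example, the pretrained model on VoxCeleb follows the Creative Commons Attribution 4.0 International License, since it is used as the license of the VoxCeleb dataset; see https://mm.kaist.ac.kr/datasets/voxceleb/.

Citation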

@inproceedings{Wang2023,
  title={Wespeaker: A research and production oriented speaker embedding learning toolkit},
  author={Wang, Hongji and Liang, Chengdong and Wang, Shuai and Chen, Zhengyang and Zhang, Binbin and Xiang, Xu and Deng, Yanlei and Qian, Yanmin},
  booktitle={ICASSP 2023, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}
@inproceedings{Bredin23,
  author={Hervé Bredin},
  title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
  pages={1983--1987},
  doi={10.21437/Interspeech.2023-105}
}
