wespeaker-voxceleb-resnet34-LM

pyannote/wespeaker-voxceleb-resnet34-LM

License: cc-by-4.0

Overview

Using this open-source model in production? Consider switching to pyannoteAI for better and faster options.

Dataset: voxceleb

Tags: pyannote, pyannote-audio, pyannote-audio-model, wespeaker, audio, voice, speech, speaker, speaker-recognition, speaker-verification, speaker-identification, speaker-embedding

Model details

🎹 Wrapper around wespeaker-voxceleb-resnet34-LM

This model requires pyannote.audio version 3.1 or higher.

This is a wrapper around the WeSpeaker wespeaker-voxceleb-resnet34-LM pretrained speaker embedding model, for use in pyannote.audio.

Basic usage

# instantiate pretrained model
from pyannote.audio import Model
model = Model.from_pretrained("pyannote/wespeaker-voxceleb-resnet34-LM")

from pyannote.audio import Inference
inference = Inference(model, window="whole")
embedding1 = inference("speaker1.wav")
embedding2 = inference("speaker2.wav")
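# `embeddingX` is a (1 x D) numpy array extracted from the file as a whole.

import numpy as np
from scipy.spatial.distance import cdist
# cdist needs 2-D inputs: np.atleast_2d leaves a (1 x D) array unchanged
# and reshapes a flat (D,) array, should the installed version return one.
distance = cdist(np.atleast_2d(embedding1), np.atleast_2d(embedding2), metric="cosine")[0, 0]
# `distance` is a `float` describing how dissimilar speakers 1 and 2 are.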
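
The cosine distance used above ranges from 0 (same direction) to 2 (opposite direction). The following is a minimal sketch of turning such a distance into a same-speaker decision, using random vectors in place of real embeddings. The helper names, the dimension of 256, and the 0.5 threshold are illustrative assumptions, not values from the model card; a real threshold would need tuning on held-out data.

```python
import numpy as np


def cosine_distance(a, b):
    """Cosine distance between two embeddings, accepting (D,) or (1 x D) arrays."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def same_speaker(a, b, threshold=0.5):
    """Hypothetical decision rule: accept when the distance is below `threshold`."""
    return cosine_distance(a, b) < threshold


# Stand-in vectors (assumption); real embeddings would come from `inference(...)`.
rng = np.random.default_rng(0)
reference = rng.normal(size=(1, 256))
near = reference + 0.1 * rng.normal(size=(1, 256))  # small perturbation of the reference
unrelated = rng.normal(size=(1, 256))                # independent random vector

print(same_speaker(reference, near))
print(same_speaker(reference, unrelated))
```

Keeping the threshold as a parameter makes it easy to trade false accepts against false rejects once real trial data is available.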

Advanced usage

Running on GPU

import torch
inference.to(torch.device("cuda"))
embedding = inference("audio.wav")

Extract embedding from an excerpt

from pyannote.audio import Inference
from pyannote.core import Segment
inference = Inference(model, window="whole")
excerpt = Segment(13.37, 19.81)
embedding = inference.crop("audio.wav", excerpt)
# `embedding` is a (1 x D) numpy array extracted from the file excerpt.
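
When several excerpts of the same speaker are available, one common option (an assumption here, not something the model card prescribes) is to pool their embeddings into a single reference vector. The sketch below length-normalizes each embedding and averages them. `pool_embeddings` is a hypothetical helper, and the small arrays stand in for outputs of `inference.crop`.

```python
import numpy as np


def pool_embeddings(embeddings):
    """Average length-normalized embeddings into one (1 x D) unit-length vector."""
    # Stack K embeddings, each (D,) or (1 x D), into a (K x D) matrix.
    stacked = np.vstack([np.atleast_2d(e) for e in embeddings]).astype(float)
    # Normalize each row so that long vectors do not dominate the average.
    unit = stacked / np.linalg.norm(stacked, axis=1, keepdims=True)
    centroid = unit.mean(axis=0, keepdims=True)
    return centroid / np.linalg.norm(centroid)


# Stand-ins for embeddings of three excerpts; D = 4 is arbitrary.
excerpts = [
    np.array([[1.0, 0.0, 0.0, 0.0]]),
    np.array([[2.0, 0.0, 0.0, 0.0]]),
    np.array([[0.0, 3.0, 0.0, 0.0]]),
]
reference = pool_embeddings(excerpts)
print(reference.shape)  # (1, 4)
```

The pooled vector keeps the (1 x D) shape, so it can be compared against other embeddings with the same cosine distance as in the basic usage.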

Extract embeddings using a sliding window

from pyannote.audio import Inference
inference = Inference(model, window="sliding",
                      duration=3.0, step=1.0)
embeddings = inference("audio.wav")
# `embeddings` is a (N x D) pyannote.core.SlidingWindowFeature
# `embeddings[i]` is the embedding of the ith position of the
# sliding window, i.e. from [i * step, i * step + duration].
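
To make the indexing in the comment above concrete, the sketch below maps a window index to its time span and finds the window closest to a reference embedding. It works on a plain (N x D) numpy array with random stand-in data (for a `SlidingWindowFeature`, the underlying array is its `data` attribute). `window_bounds` and `closest_window` are hypothetical helpers, and the `duration` and `step` defaults simply mirror the example above.

```python
import numpy as np


def window_bounds(i, duration=3.0, step=1.0):
    """Start and end time in seconds of the i-th window: [i * step, i * step + duration]."""
    start = i * step
    return start, start + duration


def closest_window(embeddings, reference):
    """Index and cosine distance of the row of `embeddings` (N x D) closest to `reference`."""
    rows = np.asarray(embeddings, dtype=float)
    ref = np.asarray(reference, dtype=float).ravel()
    similarity = rows @ ref / (np.linalg.norm(rows, axis=1) * np.linalg.norm(ref))
    best = int(np.argmax(similarity))
    return best, 1.0 - float(similarity[best])


# Stand-in data (assumption): 5 windows of dimension 8,
# with window 3 built to closely resemble the reference.
rng = np.random.default_rng(0)
reference = rng.normal(size=8)
embeddings = rng.normal(size=(5, 8))
embeddings[3] = reference + 0.01 * rng.normal(size=8)

index, distance = closest_window(embeddings, reference)
print(index, window_bounds(index))  # 3 (3.0, 6.0)
```

Because consecutive windows overlap when `step` is smaller than `duration`, neighbouring indices cover partly the same audio.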

License

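According to this page:

The pretrained models in WeNet follow the license of their corresponding dataset. For example, the pretrained model on VoxCeleb follows the Creative Commons Attribution 4.0 International License, since that is the license of the VoxCeleb dataset; see https://mm.kaist.ac.kr/datasets/voxceleb/.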

Citation

@inproceedings{Wang2023,
  title={Wespeaker: A research and production oriented speaker embedding learning toolkit},
  author={Wang, Hongji and Liang, Chengdong and Wang, Shuai and Chen, Zhengyang and Zhang, Binbin and Xiang, Xu and Deng, Yanlei and Qian, Yanmin},
  booktitle={ICASSP 2023, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}

@inproceedings{Bredin23,
  author={Hervé Bredin},
  title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
  pages={1983--1987},
  doi={10.21437/Interspeech.2023-105}
}