OmniVoice
License
apache-2.0
Languages
aae
aal
aao
ab
abb
abn
abr
abs
abv
acm
acw
acx
adf
adx
ady
aeb
aec
af
afb
afo
ahl
ahs
ajg
aju
ala
aln
alo
am
amu
an
anc
ank
anp
anw
aom
apc
apd
arb
arq
ars
ary
arz
as
ast
avl
awo
ayl
ayp
az
ba
bag
bas
bax
bba
bbj
bbl
bbu
bce
bci
bcs
bcy
bda
bde
bdm
be
beb
bew
bfd
bft
bg
bgp
bhb
bhh
bho
bhp
bhr
bjj
bjk
bjn
bjt
bkh
bkm
bky
bmm
bmq
bn
bnm
bnn
bns
bo
bou
bqg
br
bra
brh
bri
brx
bs
bsh
bsj
bsk
btm
btv
bug
bum
buo
bux
bwr
bxf
byc
bys
byv
byx
bzc
bzw
ca
ccg
ceb
cen
cfa
cgg
chq
cjk
ckb
ckl
ckr
cky
cnh
cpy
cs
cte
ctl
cut
cux
cv
cy
da
dag
dar
dav
dbd
dcc
de
deg
dgh
dgo
dje
dmk
dml
dru
dty
dua
dv
dyu
dzg
ebr
ebu
ego
eiv
eko
ekr
el
elm
en
eo
es
esu
et
eto
ets
etu
eu
ewo
ext
eyo
fa
fan
fat
ff
ffm
fi
fia
fil
fip
fkk
fmp
fr
fub
fuc
fue
fuf
fuh
fui
fuq
fuv
fy
ga
gbm
gbr
gby
gcc
gdf
gej
ges
ggg
gid
gig
giz
gjk
gju
gl
glw
gn
gol
gom
gsl
gu
gui
gur
guz
gv
gwc
gwe
gwt
gya
gyz
ha
hah
hao
haw
haz
hbb
he
hem
hi
hia
hkk
hla
hno
hoj
hr
hsb
ht
hu
hue
hul
hux
hwo
hy
hz
ia
ibb
id
ida
idu
ig
ijc
ijn
ik
ikw
is
ish
iso
it
its
itw
itz
ja
jal
jax
jgo
jmx
jns
jqr
juk
juo
jv
ka
kab
kai
kaj
kam
kbd
kbl
kbt
kcq
kdh
kea
keu
kfe
kfk
kfp
khg
khw
kj
kjc
kjk
kk
kln
kls
km
kmr
kmy
kn
kna
knn
ko
kol
koo
kpo
kqo
ks
ksd
ksf
kto
kuh
kvx
kw
kwm
kxp
ky
kyx
lag
lb
lcm
ldb
lg
lij
lir
lkb
lla
ln
lnu
lo
loa
lrk
lss
lt
ltg
lto
lua
luo
lus
lv
lwg
mab
maf
mai
mau
max
mbo
mcf
mcn
mcx
mdd
mde
mdf
mek
mer
meu
mfm
mfn
mfo
mfv
mgg
mgi
mhk
mhr
mi
mig
miu
mk
mkf
mki
ml
mlq
mn
mne
mni
mqy
mr
mrj
mrr
mrt
ms
mse
msh
msw
mt
mtr
mtu
mtx
mua
mug
mui
mve
mvy
mxs
mxu
mxy
my
myv
mzl
nal
nan
nap
nb
nbh
ncf
nco
ncx
ndi
ng
ngi
nhg
nhi
nhn
nhq
nja
nl
nla
nlv
nmg
nmz
nn
nnh
no
noe
npi
nso
ny
nyu
oc
odk
odu
ogo
om
orc
oru
ory
os
pa
pbs
pbt
pbu
pcm
pex
phl
phr
pip
piy
pko
pl
plk
plt
pmq
pms
pmy
pnb
poc
poe
pow
prq
ps
pst
pt
pua
pwn
qug
qum
qup
qur
qus
quv
qux
quy
qva
qvi
qvj
qvl
qwa
qws
qxa
qxp
qxt
qxu
qxw
rag
rm
ro
rob
rof
roo
rth
ru
rup
rw
sa
sah
sat
sau
say
sbn
sc
scl
scn
sd
sei
shu
si
sip
siw
sjr
sk
skg
skr
sl
sn
snc
snk
so
sol
sps
sq
sr
src
sro
ssi
ste
sua
sv
sva
sw
szy
ta
tan
tar
tay
tbf
tcf
tcy
tdn
tdx
te
tg
tgc
th
the
thq
thr
thv
ti
tig
tio
tk
tkg
tkt
tli
tlp
tn
tok
tpl
tpz
tqp
tr
trp
trq
trv
trw
tt
ttj
ttr
ttu
tui
tul
tuq
tuv
tuy
tvo
tvu
tw
twu
txs
txy
udl
ug
uk
uki
umb
ur
ush
uz
uzn
vai
var
ver
vi
vmc
vmj
vmm
vmp
vmz
vot
vro
wbl
wci
weo
wes
wja
wji
wo
wof
xh
xhe
xka
xmf
xmv
xmw
xpe
xti
xtu
yaq
yav
yay
ydd
ydg
yer
yes
yi
yo
yue
zga
zgh
zh
zoc
zoh
zor
zpv
zpy
ztg
ztn
ztp
zts
ztu
zu
zza
Framework
omnivoice
Task
text-to-speech
Tags
zero-shot
multilingual
voice-cloning
voice-design
Model configuration
Model type
omnivoice
Architecture
OmniVoice
Model details
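OmniVoice 🌍
OmniVoice is a massively multilingual zero-shot text-to-speech (TTS) model that supports over 600 languages. Built on a novel diffusion language model-style architecture, it generates high-quality speech at high inference speed and supports both voice cloning and voice design.
- Paper: OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models
- Repository: GitHub
- Demo: Hugging Face Space
- Colab: Google Colab Notebook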
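Key Features
- 600+ languages supported: the broadest language coverage among zero-shot TTS models.
- Voice cloning: state-of-the-art voice cloning quality from a short reference audio clip.
- Voice design: control the voice through assigned speaker attributes (gender, age, pitch, dialect/accent, whisper, etc.).
- Fine-grained control: non-verbal symbols (e.g. [laughter]) and pronunciation correction via pinyin or phonemes.
- Fast inference: real-time factor (RTF) as low as 0.025 (40x faster than real time).
- Diffusion language model-style architecture: a clean, streamlined, and scalable design that delivers both quality and speed.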
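The RTF figure in the list above can be unpacked with a little arithmetic. The sketch below assumes the usual definition of real-time factor (time spent synthesizing divided by the duration of the audio produced); the function names are illustrative and are not part of the omnivoice library.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def speedup_over_real_time(rtf: float) -> float:
    """How many times faster than real time a given RTF is (1 / RTF)."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


if __name__ == "__main__":
    # Hypothetical timing: 10 s of speech synthesized in 0.25 s.
    rtf = real_time_factor(synthesis_seconds=0.25, audio_seconds=10.0)
    print(f"RTF = {rtf:.3f}, about {speedup_over_real_time(rtf):.0f}x real time")
```

At an RTF of 0.025, ten seconds of speech would take roughly a quarter of a second to generate, which is where the 40x figure comes from.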
Usage
First, install the omnivoice library.
We recommend using a fresh virtual environment (e.g. conda or venv) to avoid conflicts.
Step 1: Install PyTorch
NVIDIA GPU
# Install pytorch with your CUDA version, e.g.
pip install torch==2.8.0+cu128 torchaudio==2.8.0+cu128 --extra-index-url https://download.pytorch.org/whl/cu128
For other versions, see the official PyTorch website.
Apple Silicon
pip install torch==2.8.0 torchaudio==2.8.0
Step 2: Install OmniVoice
pip install omnivoice
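After both steps, a quick check can confirm that the packages are importable from the active environment. This is a minimal sketch using only the standard library; the helper name `missing_packages` is illustrative, and the import names `torch`, `torchaudio`, and `omnivoice` are assumed to match the pip package names used above.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    required = ["torch", "torchaudio", "omnivoice"]
    missing = missing_packages(required)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

`find_spec` only locates the modules without importing them, so the check stays fast even when a heavy package such as PyTorch is present.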
Python API
You can use OmniVoice for zero-shot voice cloning as follows:
from omnivoice import OmniVoice
import soundfile as sf
import torch
# Load the model
model = OmniVoice.from_pretrained(
    "k2-fsa/OmniVoice",
    device_map="cuda:0",
    dtype=torch.float16,
)
# Generate audio
audio = model.generate(
    text="Hello, this is a test of zero-shot voice cloning.",
    ref_audio="ref.wav",
    ref_text="Transcription of the reference audio.",
)  # audio is a list of `np.ndarray` with shape (T,) at 24 kHz.
sf.write("out.wav", audio[0], 24000)
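Because the result is a list of one-dimensional arrays sampled at 24 kHz, the length of each clip in seconds follows directly from its sample count. The helpers below are a small sketch of that bookkeeping; `clip_duration_seconds` and `output_paths` are hypothetical names introduced here and are not part of the omnivoice API.

```python
SAMPLE_RATE = 24_000  # Hz, the output rate stated in the example above


def clip_duration_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Duration of a clip with shape (T,), given T = num_samples."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_samples / sample_rate


def output_paths(num_clips: int, stem: str = "out") -> list:
    """One WAV file name per generated clip, e.g. out_0.wav, out_1.wav."""
    return [f"{stem}_{i}.wav" for i in range(num_clips)]


if __name__ == "__main__":
    # A clip of 48,000 samples at 24 kHz lasts 2 seconds.
    print(clip_duration_seconds(48_000))
    print(output_paths(2))
```

With the `audio` list from the example, every clip could then be saved in a loop such as `for path, clip in zip(output_paths(len(audio)), audio): sf.write(path, clip, SAMPLE_RATE)`.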
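For more generation modes (e.g. voice design), features (e.g. non-verbal symbols, pronunciation correction), and complete usage instructions, see our GitHub repository.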
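Discussion & Communication
You can discuss directly on GitHub Issues.
You can also scan the QR codes to join our WeChat group or follow our WeChat official account.
| WeChat Group | WeChat Official Account |
|---|---|
| ![WeChat group QR code]() | ![WeChat official account QR code]() |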
Citation
@article{zhu2026omnivoice,
title={OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models},
author={Zhu, Han and Ye, Lingxuan and Kang, Wei and Yao, Zengwei and Guo, Liyong and Kuang, Fangjun and Han, Zhifeng and Zhuang, Weiji and Lin, Long and Povey, Daniel},
journal={arXiv preprint arXiv:2604.00688},
year={2026}
}
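Disclaimer
Users are strictly prohibited from using this model for unauthorized voice cloning, voice impersonation, fraud, scams, or any other illegal or unethical activity. All users must ensure full compliance with applicable local laws, regulations, and ethical standards. The developers assume no liability for any misuse of this model. They advocate responsible AI development and use, and encourage the community to uphold safety and ethical principles in AI research and applications.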

