
OmniVoice

Model card

License: apache-2.0
Languages
aae aal aao ab abb abn abr abs abv acm acw acx adf adx ady aeb aec af afb afo ahl ahs ajg aju ala aln alo am amu an anc ank anp anw aom apc apd arb arq ars ary arz as ast avl awo ayl ayp az ba bag bas bax bba bbj bbl bbu bce bci bcs bcy bda bde bdm be beb bew bfd bft bg bgp bhb bhh bho bhp bhr bjj bjk bjn bjt bkh bkm bky bmm bmq bn bnm bnn bns bo bou bqg br bra brh bri brx bs bsh bsj bsk btm btv bug bum buo bux bwr bxf byc bys byv byx bzc bzw ca ccg ceb cen cfa cgg chq cjk ckb ckl ckr cky cnh cpy cs cte ctl cut cux cv cy da dag dar dav dbd dcc de deg dgh dgo dje dmk dml dru dty dua dv dyu dzg ebr ebu ego eiv eko ekr el elm en eo es esu et eto ets etu eu ewo ext eyo fa fan fat ff ffm fi fia fil fip fkk fmp fr fub fuc fue fuf fuh fui fuq fuv fy ga gbm gbr gby gcc gdf gej ges ggg gid gig giz gjk gju gl glw gn gol gom gsl gu gui gur guz gv gwc gwe gwt gya gyz ha hah hao haw haz hbb he hem hi hia hkk hla hno hoj hr hsb ht hu hue hul hux hwo hy hz ia ibb id ida idu ig ijc ijn ik ikw is ish iso it its itw itz ja jal jax jgo jmx jns jqr juk juo jv ka kab kai kaj kam kbd kbl kbt kcq kdh kea keu kfe kfk kfp khg khw kj kjc kjk kk kln kls km kmr kmy kn kna knn ko kol koo kpo kqo ks ksd ksf kto kuh kvx kw kwm kxp ky kyx lag lb lcm ldb lg lij lir lkb lla ln lnu lo loa lrk lss lt ltg lto lua luo lus lv lwg mab maf mai mau max mbo mcf mcn mcx mdd mde mdf mek mer meu mfm mfn mfo mfv mgg mgi mhk mhr mi mig miu mk mkf mki ml mlq mn mne mni mqy mr mrj mrr mrt ms mse msh msw mt mtr mtu mtx mua mug mui mve mvy mxs mxu mxy my myv mzl nal nan nap nb nbh ncf nco ncx ndi ng ngi nhg nhi nhn nhq nja nl nla nlv nmg nmz nn nnh no noe npi nso ny nyu oc odk odu ogo om orc oru ory os pa pbs pbt pbu pcm pex phl phr pip piy pko pl plk plt pmq pms pmy pnb poc poe pow prq ps pst pt pua pwn qug qum qup qur qus quv qux quy qva qvi qvj qvl qwa qws qxa qxp qxt qxu qxw rag rm ro rob rof roo rth ru rup rw sa sah sat sau say sbn sc scl scn sd sei shu si sip siw sjr sk skg skr sl sn snc snk so sol sps sq sr 
src sro ssi ste sua sv sva sw szy ta tan tar tay tbf tcf tcy tdn tdx te tg tgc th the thq thr thv ti tig tio tk tkg tkt tli tlp tn tok tpl tpz tqp tr trp trq trv trw tt ttj ttr ttu tui tul tuq tuv tuy tvo tvu tw twu txs txy udl ug uk uki umb ur ush uz uzn vai var ver vi vmc vmj vmm vmp vmz vot vro wbl wci weo wes wja wji wo wof xh xhe xka xmf xmv xmw xpe xti xtu yaq yav yay ydd ydg yer yes yi yo yue zga zgh zh zoc zoh zor zpv zpy ztg ztn ztp zts ztu zu zza
Framework: omnivoice
Task: text-to-speech
zero-shot multilingual voice-cloning voice-design

Model configuration

Model type: omnivoice
Architecture: OmniVoice

Model details

OmniVoice 🌍

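OmniVoice is a massively multilingual zero-shot text-to-speech (TTS) model that supports more than 600 languages. Built on a novel diffusion language model-style architecture, it generates high-quality speech at high inference speed and supports both voice cloning and voice design.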

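Key features

  • 600+ languages supported: the broadest language coverage among zero-shot TTS models.
  • Voice cloning: state-of-the-art voice cloning quality from a short reference audio clip.
  • Voice design: control the voice through assigned speaker attributes (gender, age, pitch, dialect/accent, whisper, etc.).
  • Fine-grained control: non-verbal symbols (e.g. [laughter]) and pronunciation correction via pinyin or phonemes.
  • Fast inference: RTF as low as 0.025 (40x faster than real time).
  • Diffusion language model-style architecture: a clean, streamlined, and scalable design that delivers both quality and speed.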
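
The "40x faster than real time" figure follows from the usual definition of the real-time factor (RTF): processing time divided by the duration of the generated audio, so the speedup is 1 / RTF. The sketch below only illustrates that arithmetic. The function names and the 10 s / 0.25 s timings are hypothetical, not measurements from the model.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def speedup_over_real_time(rtf: float) -> float:
    """How many times faster than real time a given RTF is."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


if __name__ == "__main__":
    # Illustrative numbers only: 10 s of audio synthesized in 0.25 s.
    rtf = real_time_factor(processing_seconds=0.25, audio_seconds=10.0)
    print(f"RTF = {rtf:.3f}, speedup = {speedup_over_real_time(rtf):.0f}x")
```

An RTF below 1 means synthesis finishes faster than the audio takes to play back.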

Usage

First, install the omnivoice library:

We recommend using a fresh virtual environment (e.g. conda, venv) to avoid conflicts.

Step 1: Install PyTorch

NVIDIA GPU

# Install pytorch with your CUDA version, e.g.
pip install torch==2.8.0+cu128 torchaudio==2.8.0+cu128 --extra-index-url https://download.pytorch.org/whl/cu128

For other versions, see the official PyTorch website.

Apple Silicon

pip install torch==2.8.0 torchaudio==2.8.0

Step 2: Install OmniVoice

pip install omnivoice
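
After both steps, a quick sanity check can confirm that the packages are importable and show which device PyTorch can see. This is a minimal sketch, not part of the omnivoice package: `check_environment` is a hypothetical helper, and treating `"mps"` as the device string for Apple Silicon is an assumption, since this card only shows `"cuda:0"`.

```python
import importlib.util


def check_environment() -> dict:
    """Report whether torch and omnivoice are importable and which device looks usable."""
    report = {
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "omnivoice_installed": importlib.util.find_spec("omnivoice") is not None,
        "device": "cpu",  # fallback when no accelerator is detected
    }
    if report["torch_installed"]:
        import torch

        mps = getattr(torch.backends, "mps", None)
        if torch.cuda.is_available():
            report["device"] = "cuda:0"  # same device string as the example on this card
        elif mps is not None and mps.is_available():
            report["device"] = "mps"  # assumption: Apple Silicon backend name
    return report


if __name__ == "__main__":
    print(check_environment())
```

If `torch_installed` or `omnivoice_installed` comes back `False`, the corresponding install step above likely ran in a different environment.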

Python API

You can use OmniVoice for zero-shot voice cloning as follows:

from omnivoice import OmniVoice
import soundfile as sf
import torch

# Load the model
model = OmniVoice.from_pretrained(
    "k2-fsa/OmniVoice",
    device_map="cuda:0",
    dtype=torch.float16
)

# Generate audio
audio = model.generate(
    text="Hello, this is a test of zero-shot voice cloning.",
    ref_audio="ref.wav",
    ref_text="Transcription of the reference audio.",
) # audio is a list of `np.ndarray` with shape (T,) at 24 kHz.

sf.write("out.wav", audio[0], 24000)
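
Since each returned clip is a 1-D array at 24 kHz, its duration is simply the sample count divided by 24000. The sketch below shows that, plus a standard-library alternative to `sf.write` for environments without soundfile. The helper names are hypothetical, and the code assumes mono float samples in the range [-1.0, 1.0], which this card does not state explicitly.

```python
import struct
import wave

SAMPLE_RATE = 24_000  # output sample rate stated in the example above


def duration_seconds(samples, sample_rate: int = SAMPLE_RATE) -> float:
    """Length of a 1-D clip in seconds."""
    return len(samples) / sample_rate


def write_wav_int16(path: str, samples, sample_rate: int = SAMPLE_RATE) -> None:
    """Write mono float samples as 16-bit PCM WAV using only the standard library."""
    # Assumption: samples are floats in [-1.0, 1.0]; out-of-range values are clipped.
    clipped = (max(-1.0, min(1.0, float(s))) for s in samples)
    pcm = b"".join(struct.pack("<h", int(round(c * 32767))) for c in clipped)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
```

With the `audio` list from the example above, `duration_seconds(audio[0])` gives the clip length and `write_wav_int16("out.wav", audio[0])` saves it. The per-sample loop is slow for long clips, so `sf.write` remains the better choice when it is available.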

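For more generation modes (e.g. voice design), features (e.g. non-verbal symbols, pronunciation correction), and full usage instructions, see our GitHub repository.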

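Discussion & Communication

You can discuss directly on GitHub Issues.

You can also scan the QR codes to join our WeChat group or follow our WeChat official account.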

[QR codes: WeChat group | WeChat official account]

Citation

@article{zhu2026omnivoice,
      title={OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models},
      author={Zhu, Han and Ye, Lingxuan and Kang, Wei and Yao, Zengwei and Guo, Liyong and Kuang, Fangjun and Han, Zhifeng and Zhuang, Weiji and Lin, Long and Povey, Daniel},
      journal={arXiv preprint arXiv:2604.00688},
      year={2026}
}

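Disclaimer

Users are strictly prohibited from using this model for unauthorized voice cloning, voice impersonation, fraud, scams, or any other illegal or unethical activity. All users must ensure full compliance with applicable local laws, regulations, and ethical standards. The developers assume no liability for any misuse of this model. They advocate responsible AI development and use, and encourage the community to uphold safety and ethical principles in AI research and applications.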
