
BibleMMS

ModelScope community. Updated 2025-12-04. Indexed 2024-06-22.
Download link:
https://modelscope.cn/datasets/AI-ModelScope/BibleMMS
Description:
The dataset associated with the paper "Meta Learning Text-to-Speech Synthesis in over 7000 Languages" by Florian Lux, Sarina Meyer, Lyonel Behringer, Frank Zalkow, Phat Do, Matt Coler, Emanuël A. P. Habets and Ngoc Thang Vu (Interspeech 2024).

We generate 2000 spoken utterances per language, using the subsets of the eBible dataset [1] that are under free licenses as the text input to the MMS TTS models [2].

The languages associated with the following ISO-639-3 codes are represented in this dataset:

```
acf, bss, deu, inb, nca, quh, wap, acr, bus, dgr, ind, maz, nch, qul, tav, wmw, acu, byr, dik, iou, mbb, ncj, qvc, tbc, xed, agd, bzh, djk, ipi, mbc, ncl, qve, tbg, xon, agg, bzj, dop, jac, mbh, ncu, qvh, tbl, xtd, agn, caa, jic, mbj, ndj, qvm, tbz, xtm, agr, cab, emp, jiv, mbt, nfa, qvn, tca, yaa, agu, cap, eng, jvn, mca, ngp, qvs, tcs, yad, aia, car, ese, mcb, ngu, qvw, yal, cax, kaq, mcd, nhe, qvz, tee, ycn, ake, cbc, far, mco, qwh, yka, alp, cbi, fra, kdc, mcp, nhu, qxh, ame, cbr, gai, kde, mcq, nhw, qxn, tew, yre, amf, cbs, gam, kdl, mdy, nhy, qxo, tfr, yva, amk, cbt, geb, kek, med, nin, rai, zaa, apb, cbu, glk, ken, mee, nko, rgu, zab, apr, cbv, meq, nld, tgo, zac, arl, cco, gng, kje, met, nlg, rop, tgp, zad, grc, klv, mgh, nnq, rro, zai, ata, cek, gub, kmu, mib, noa, ruf, tna, zam, atb, cgc, guh, kne, mie, not, rug, tnk, zao, atg, chf, knf, mih, npl, rus, tnn, zar, awb, chz, gum, knj, mil, sab, tnp, zas, cjo, guo, ksr, mio, obo, seh, toc, zav, azg, cle, gux, kue, mit, omw, sey, tos, zaw, azz, cme, gvc, kvn, miz, ood, sgb, tpi, zca, bao, cni, gwi, kwd, mkl, shp, tpt, zga, bba, cnl, gym, kwf, mkn, ote, sja, trc, ziw, bbb, cnt, gyr, kwi, mop, otq, snn, ttc, zlm, cof, hat, kyc, mox, pab, snp, tte, zos, bgt, con, kyf, mpm, pad, som, tue, zpc, bjr, cot, heb, kyg, mpp, soy, tuf, zpl, bjv, cpa, kyq, mpx, pao, spa, tuo, zpm, bjz, cpb, hlt, kyz, mqb, pib, spp, tur, zpo, bkd, cpu, hns, lac, mqj, pir, spy, txq, zpu, blz, crn, hto, lat, msy, pjt, sri, txu, zpz, bmr, cso, hub, lex, mto, pls, srm, udu, ztq, bmu, ctu, lgl, muy, poi, srn, ukr, zty, bnp, cuc, lid, mxb, pol, stp, upv, zyp, boa, cui, huu, mxq, por, sus, ura, boj, cuk, huv, llg, mxt, poy, suz, urb, box, cwe, hvn, prf, swe, urt, bpr, cya, ign, lww, myk, ptu, swh, usp, bps, daa, ikk, maj, myy, sxb, vid, bqc, dah, nab, qub, tac, vie, bqp, ded, imo, maq, nas, quf, taj, vmy
```

[1] V. Akerman, D. Baines, D. Daspit, U. Hermjakob et al., “The eBible Corpus: Data and Model Benchmarks for Bible Translation for Low-Resource Languages,” arXiv:2304.09919, 2023.

[2] V. Pratap, A. Tjandra, B. Shi, P. Tomasello, A. Babu, S. Kundu, A. Elkahky, Z. Ni et al., “Scaling speech technology to 1,000+ languages,” Journal of Machine Learning Research, 2024.
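To check whether a language is covered, the comma-separated code list can be parsed into a lookup structure. The following is a minimal sketch, not part of the dataset's tooling: the names `parse_codes`, `is_covered`, and `expected_utterances` are hypothetical, and `CODES_EXCERPT` holds only the first few codes from the list above rather than the full set. The per-language count of 2000 is taken from the description.

```python
import re

# Excerpt only: the first few codes from the list above, not the full set.
CODES_EXCERPT = "acf, bss, deu, inb, nca, quh, wap, acr, bus, dgr, ind"

# Stated in the dataset description: 2000 generated utterances per language.
UTTERANCES_PER_LANGUAGE = 2000


def parse_codes(raw: str) -> list[str]:
    """Split a comma-separated code list, skipping blanks and repeats (order kept)."""
    seen: dict[str, None] = {}
    for token in raw.split(","):
        code = token.strip().lower()
        if not code:
            continue  # tolerate trailing commas and stray line breaks
        if not re.fullmatch(r"[a-z]{3}", code):
            raise ValueError(f"not a three-letter code: {code!r}")
        seen.setdefault(code, None)
    return list(seen)


def is_covered(code: str, codes: list[str]) -> bool:
    """Return True if the given code appears in the parsed list."""
    return code.strip().lower() in set(codes)


def expected_utterances(codes: list[str]) -> int:
    """Nominal utterance count for the given languages at 2000 per language."""
    return len(codes) * UTTERANCES_PER_LANGUAGE


if __name__ == "__main__":
    codes = parse_codes(CODES_EXCERPT)
    print(len(codes), is_covered("deu", codes), expected_utterances(codes))
```

Pasting the full list into `CODES_EXCERPT` would give the nominal total for the whole dataset. The actual file count may differ if any utterances were filtered out, which the description does not say.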

Provider:
maas
Created:
2024-06-21