espnet/mms_ulab_v2
收藏Hugging Face2025-02-04 更新2024-07-06 收录
下载链接:
https://hf-mirror.com/datasets/espnet/mms_ulab_v2
下载链接
链接失效反馈官方服务:
资源简介:
该数据集主要包含音频数据,采样率为16000Hz。数据集的特征包括id、iso3和audio字段。数据集分为一个训练集,包含221,689个样本,总大小为572,226,455,183.244字节。数据集的下载大小为569,642,045,980字节。数据集中包含多种语言,列出的语言代码从aaa到stj不等。
This dataset primarily contains audio data with a sampling rate of 16000Hz. The features of the dataset include id, iso3, and audio fields. The dataset is divided into a training set containing 221,689 samples, with a total size of 572,226,455,183.244 bytes. The download size of the dataset is 569,642,045,980 bytes. The dataset includes multiple languages, with language codes ranging from aaa to stj.
提供机构:
espnet
原始信息汇总
数据集概述
数据集信息
-
特征:
id: 字符串类型iso3: 字符串类型audio: 音频类型,采样率为16000
-
分割:
train: 训练集,包含221689个样本,大小为572226455183.244字节
-
下载大小: 569642045980字节
-
数据集大小: 572226455183.244字节
配置
- 配置名称:
default - 数据文件:
train: 路径为data/train-*
语言
- 数据集包含以下语言:
- aaa, aab, aac, aad, aaf, aai, aal, aao, aap, aar, aau, aaw, aaz, aba, abh, abi, abm, abn, abo, abr, abs, abt, abu, abz, aca, acd, ace, acf, ach, acm, acn, acq, acr, acu, acv, acw, acz, ada, add, ade, adh, adi, adj, adl, adn, ado, adq, adx, ady, adz, aeb, aec, aee, ael, aeu, aey, aez, afb, afe, afi, afo, afr, afu, afz, agb, agc, agd, age, agf, agg, agh, agi, agl, agn, agq, agr, ags, agt, agu, agw, agy, aha, ahb, ahg, ahk, ahl, ahp, ahr, ahs, aia, aif, aii, aik, aim, aio, aiw, aix, ajg, aji, akb, akc, akd, ake, akf, akg, akh, aki, akl, akp, akq, akr, aks, akt, akw, ala, ald, ale, alf, alh, alj, alk, all, aln, alp, alq, als, alt, alu, alw, alx, aly, alz, amb, amc, ame, amf, amh, ami, amk, amm, amn, amo, amr, amt, amu, anc, anf, anj, ank, anl, anm, ann, ano, anp, anr, anu, anv, anw, anx, any, aoe, aof, aog, aoi, aoj, aol, aom, aon, aot, aoz, apb, apc, apd, ape, apj, apm, apn, app, apr, apt, apu, apw, apy, apz, aqg, aqm, aqt, arb, are, arg, arh, arl, arn, aro, arp, arq, arr, arv, arw, arx, ary, arz, asa, asb, asc, asi, ask, asm, aso, asr, ass, asu, asy, ata, atb, atd, atg, ati, atk, ato, atp, atq, ats, att, atu, aty, auc, aug, aui, auk, aul, aun, aup, auq, auu, auy, ava, avd, avi, avl, avn, avt, avu, awa, awb, awe, awi, awn, awu, aww, axk, ayb, ayg, ayi, ayn, ayo, ayp, ayr, ayt, ayu, ayz, azb, azd, azg, azj, azm, azt, azz, baa, bab, bac, bag, bam, ban, bao, bap, bar, bas, bau, bav, baw, bax, bba, bbb, bbc, bbf, bbi, bbk, bbo, bbp, bbq, bbr, bbt, bbu, bbv, bbw, bby, bca, bcc, bcf, bcg, bci, bcj, bcl, bcn, bco, bcp, bcq, bcr, bcs, bcv, bcw, bcy, bcz, bda, bdb, bdd, bde, bdh, bdi, bdl, bdm, bdq, bdu, bdv, bdw, bea, bec, bee, bef, beh, bei, bej, bek, bel, bem, ben, beo, bep, beq, bet, beu, bev, bew, bex, bey, bez, bfa, bfb, bfd, bfe, bfg, bfh, bfj, bfm, bfo, bfq, bfr, bfs, bft, bfu, bfw, bfy, bfz, bga, bgc, bgd, bge, bgf, bgg, bgi, bgj, bgn, bgp, bgq, bgr, bgs, bgt, bgv, bgw, bgx, bgz, bha, bhb, bhd, bhf, bhg, bhh, bhi, bhj, bhl, bho, bhp, bhq, bhr, bhs, bht, bhu, bhw, bhx, bhy, bhz, bib, bid, bif, big, bil, bim, bin, bio, bip, bis, bit, biu, biv, bix, biy, biz, bja, bjc, bje, bjg, bjh, bji, bjj, bjk, bjn, bjo, bjp, bjr, bjt, bjx, bjz, bka, bkc, bkd, bkg, bkk, bkl, bkm, bkq, bkr, bks, bku, bkv, bkw, bkx, bky, bla, blb, blc, ble, blf, blh, bli, blk, blm, blo, blq, blr, blt, blw, bly, blz, bma, bmb, bmd, bmf, bmi, bmj, bmk, bmm, bmq, bmr, bmu, bmv, bni, bnj, bnm, bnn, bno, bnp, bns, bnv, bnx, boa, bob, bod, bof, boh, bol, bom, bon, boo, boq, bor, bos, bot, bou, bov, box, boz, bpa, bpe, bpn, bpp, bpr, bps, bpu, bpv, bpw, bpx, bpy, bpz, bqa, bqc, bqg, bqh, bqi, bqj, bqo, bqr, bqs, bqt, bqv, bqw, bqx, bra, brb, brd, bre, brf, brg, brh, bri, brl, brp, brq, brr, brt, bru, brv, brx, bsc, bse, bsf, bsh, bsi, bsk, bsn, bsp, bsq, bss, bst, bsy, bta, btd, bte, btg, btm, bts, btt, btu, btx, bub, bud, buf, bug, buh, bui, buj, buk, bul, bum, bun, buo, bus, buu, buw, bux, buz, bva, bvc, bvd, bvh, bvi, bvm, bvr, bvu, bvw, bvz, bwd, bwe, bwf, bwi, bwm, bwo, bwq, bwr, bws, bwt, bwu, bww, bwx, bxa, bxb, bxg, bxh, bxk, bxl, bxq, bxr, bxs, bya, byc, byd, bye, byj, byn, byo, byp, bys, byv, byx, byz, bza, bzd, bze, bzf, bzh, bzi, bzu, bzv, bzw, bzx, bzy, bzz, caa, cab, cac, cae, caf, cag, cak, can, cao, cap, caq, car, cas, cat, cav, cax, cay, caz, cbc, cbd, cbg, cbi, cbj, cbk, cbn, cbo, cbr, cbs, cbt, cbu, cbv, cce, ccg, cch, ccj, ccl, cco, ccp, cde, cdf, cdh, cdi, cdj, cdm, cdn, cdo, cdr, cdz, ceb, ceg, cek, ces, cfa, cfd, cfg, cfm, cgg, cgk, chb, chd, che, chf, chj, chk, chl, cho, chp, chq, chr, chw, chx, chy, cia, cib, cih, cik, cin, ciw, cja, cje, cjk, cjm, cjo, cjv, ckb, ckh, ckl, cko, ckt, cku, ckx, cky, cla, clc, cld, cle, cli, clj, clk, cll, clo, clt, clu, cly, cma, cme, cmn, cmo, cmr, cna, cnb, cnc, cnh, cni, cnk, cnl, cnq, cns, cnt, cnw, cob, coc, cod, cof, cog, coh, coj, com, con, cos, cou, cov, cox, coz, cpa, cpx, cqd, cra, crc, crh, crj, crk, crn, cro, crq, crt, crv, crw, crx, cry, csa, csh, csk, cso, csy, cta, ctd, cte, ctg, ctl, cto, ctp, ctt, ctu, ctz, cua, cub, cuc, cui, cuk, cul, cut, cuv, cux, cvg, cvn, cya, cyb, cym, cyo, czh, czn, czt, daa, dad, dag, dai, dak, dan, dao, daq, das, dav, daw, dax, dbb, dbd, dbi, dbj, dbm, dbn, dbq, dbv, dby, dcc, dde, ddg, ddn, dee, def, deg, deh, dei, dem, der, deu, dez, dga, dgc, dgd, dge, dgg, dgh, dgi, dgo, dgr, dgx, dgz, dhd, dhg, dhi, dhm, dhn, dho, dhv, dhw, dia, dib, did, dig, dih, dij, dik, dil, dim, dio, dip, dir, dis, diu, div, diw, diz, djc, dje, djk, djm, djn, djo, djr, dka, dks, dkx, dln, dma, dme, dmg, dmo, dmr, dms, dmw, dna, dnd, dni, dnj, dnn, dnw, dny, doa, dob, dof, doo, dop, dor, dos, dot, dow, dox, doy, doz, drd, dre, drg, dri, drs, dru, dry, dsh, dsn, dsq, dta, dtb, dtm, dtp, dts, dty, dua, dub, duc, due, dug, duh, dun, duq, dur, dus, duu, duv, duw, dva, dwa, dwr, dwu, dww, dwy, dwz, dya, dyg, dyi, dyo, dyu, dza, dzg, dzl, dzo, ebo, ebr, ebu, efi, ega, ego, eip, eit, eja, eka, ekg, ekl, ekp, ekr, eky, elk, ell, elm, ema, emb, eme, emg, emk, emn, emp, ems, ena, enb, end, eng, enl, enn, enq, env, enx, eot, epi, erg, erh, erk, ert, ese, esg, esh, esi, esk, ess, esu, etn, eto, etr, ets, etu, etx, eus, eve, evn, ewe, ewo, eyo, eza, eze, faa, fai, fak, fal, fan, fap, far, fat, fay, ffm, fie, fij, fin, fir, fla, fli, fll, flr, fod, foi, fon, for, fqs, fra, frc, frd, fry, fub, fuc, fue, fuf, fuh, fun, fuq, fut, fuu, fuv, fuy, fvr, fwe, gaa, gab, gad, gae, gaf, gah, gai, gaj, gaq, gar, gas, gau, gaw, gax, gaz, gbe, gbg, gbh, gbi, gbk, gbl, gbm, gbn, gbo, gbr, gbv, gby, gbz, gcd, gcf, gcn, gcr, gdb, gde, gdf, gdl, gdn, gdr, gdu, gdx, gea, geb, gec, ged, geg, gej, gek, gel, gew, gfk, gga, ggb, ggg, ggu, ggw, ghe, ghk, ghl, ghn, ghr, ghs, gia, gid, gig, gil, gim, gis, git, giw, giz, gjk, gjn, gju, gkn, gkp, gla, gle, glg, glh, glj, glk, glo, glr, glw, gmb, gmm, gmv, gmz, gna, gnb, gnd, gng, gni, gnk, gnm, gnn, gno, gnu, gnw, goa, gof, gog, goj, gok, gol, gom, gop, gor, gou, gow, gox, goz, gpa, gqa, gra, grd, grh, gri, grj, gro, grs, grt, gru, grv, grx, gry, gsw, gua, gub, guc, gud, gue, guf, gug, guh, gui, guj, guk, gul, gum, gun, guo, gup, guq, gur, gut, guu, guw, gux, guz, gvc, gvf, gvj, gvn, gvo, gvp, gvr, gvs, gwa, gwd, gwi, gwn, gwr, gwt, gww, gxx, gya, gyd, gym, gyr, gyz, haa, hac, had, hae, hag, hah, haj, hak, hal, haq, har, has, hat, hau, hav, haw, hay, haz, hbb, hbn, hca, hch, hdn, hdy, hea, heb, hed, heg, heh, hei, her, hgm, hgw, hia, hid, hif, hig, hii, hil, hin, hio, hix, hkk, hla, hlb, hld, hlt, hmb, hmd, hmg, hmj, hml, hmo, hmr, hms, hmt, hmw, hmz, hna, hnd, hne, hni, hnj, hnn, hno, hns, hoa, hoc, hoe, hoj, hol, hoo, hop, hot, how, hoy, hra, hre, hrm, hru, hrv, hsn, hto, hts, hub, huc, hue, huf, huh, hui, hul, hum, hun, hup, hur, hus, hut, huv, hux, hve, hvn, hvv, hwo, hye, hyw, iai, ian, iar, iba, ibb, ibd, ibg, ibl, ibm, ibo, iby, ica, ich, icr, ida, idi, idu, ifa, ifb, ife, ifk, ifm, ifu, ify, igb, ige, igl, ign, ihp, iii, ijc, ijj, ijn, ijs
搜集汇总
数据集介绍

背景与挑战
背景概述
MMS ulab v2是一个大规模多语言语音数据集,包含约8900小时的未标记语音,覆盖4023种语言和189个语系,主要用于语言识别、口语语言建模和语音表示学习。该数据集是原始MMS ulab的扩展版本,提供16kHz单通道的原始未分段音频,需配合语音活动检测模型使用,并采用CC BY-NC-SA 4.0许可证发布。
以上内容由遇见数据集搜集并总结生成



