five

timothyhartzog/biblenlp-corpus

收藏
Hugging Face2026-03-16 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/timothyhartzog/biblenlp-corpus
下载链接
链接失效反馈
官方服务:
资源简介:
--- annotations_creators: - no-annotation language_creators: - expert-generated language: - aai - aak - aau - aaz - abt - abx - aby - acf - acr - acu - adz - aer - aey - agd - agg - agm - agn - agr - agt - agu - aia - aii - aka - ake - alp - alq - als - aly - ame - amf - amk - amm - amn - amo - amp - amr - amu - amx - anh - anv - aoi - aoj - aom - aon - apb - ape - apn - apr - apu - apw - apz - arb - are - arl - arn - arp - asm - aso - ata - atb - atd - atg - att - auc - aui - auy - avt - awb - awk - awx - azb - azg - azz - bao - bba - bbb - bbr - bch - bco - bdd - bea - bef - bel - ben - beo - beu - bgs - bgt - bhg - bhl - big - bjk - bjp - bjr - bjv - bjz - bkd - bki - bkq - bkx - bla - blw - blz - bmh - bmk - bmr - bmu - bnp - boa - boj - bon - box - bpr - bps - bqc - bqp - bre - bsj - bsn - bsp - bss - buk - bus - bvd - bvr - bxh - byr - byx - bzd - bzh - bzj - caa - cab - cac - caf - cak - cao - cap - car - cav - cax - cbc - cbi - cbk - cbr - cbs - cbt - cbu - cbv - cco - ceb - cek - ces - cgc - cha - chd - chf - chk - chq - chz - cjo - cjv - ckb - cle - clu - cme - cmn - cni - cnl - cnt - cof - con - cop - cot - cpa - cpb - cpc - cpu - cpy - crn - crx - cso - csy - cta - cth - ctp - ctu - cub - cuc - cui - cuk - cut - cux - cwe - cya - daa - dad - dah - dan - ded - deu - dgc - dgr - dgz - dhg - dif - dik - dji - djk - djr - dob - dop - dov - dwr - dww - dwy - ebk - eko - emi - emp - eng - enq - epo - eri - ese - esk - etr - ewe - faa - fai - far - ffm - for - fra - fue - fuf - fuh - gah - gai - gam - gaw - gdn - gdr - geb - gfk - ghs - glk - gmv - gng - gnn - gnw - gof - grc - gub - guh - gui - guj - gul - gum - gun - guo - gup - gux - gvc - gvf - gvn - gvs - gwi - gym - gyr - hat - hau - haw - hbo - hch - heb - heg - hin - hix - hla - hlt - hmo - hns - hop - hot - hrv - hto - hub - hui - hun - hus - huu - huv - hvn - ian - ign - ikk - ikw - ilo - imo - inb - ind - ino - iou - ipi - isn - ita - iws - ixl - jac - jae - jao - jic - jid - jiv - jni - jpn - jvn - kan - kaq - kbc - kbh - kbm - kbq - kdc - kde - kdl - kek - ken - kew - kgf - kgk - kgp - khs - khz - kik - kiw - kiz - kje - kjn - kjs - kkc - kkl - klt - klv - kmg - kmh - kmk - kmo - kms - kmu - kne - knf - knj - knv - kos - kpf - kpg - kpj - kpr - kpw - kpx - kqa - kqc - kqf - kql - kqw - ksd - ksj - ksr - ktm - kto - kud - kue - kup - kvg - kvn - kwd - kwf - kwi - kwj - kyc - kyf - kyg - kyq - kyz - kze - lac - lat - lbb - lbk - lcm - leu - lex - lgl - lid - lif - lin - lit - llg - lug - luo - lww - maa - maj - mal - mam - maq - mar - mau - mav - maz - mbb - mbc - mbh - mbj - mbl - mbs - mbt - mca - mcb - mcd - mcf - mco - mcp - mcq - mcr - mdy - med - mee - mek - meq - met - meu - mgc - mgh - mgw - mhl - mib - mic - mie - mig - mih - mil - mio - mir - mit - miz - mjc - mkj - mkl - mkn - mks - mle - mlh - mlp - mmo - mmx - mna - mop - mox - mph - mpj - mpm - mpp - mps - mpt - mpx - mqb - mqj - msb - msc - msk - msm - msy - mti - mto - mux - muy - mva - mvn - mwc - mwe - mwf - mwp - mxb - mxp - mxq - mxt - mya - myk - myu - myw - myy - mzz - nab - naf - nak - nas - nay - nbq - nca - nch - ncj - ncl - ncu - ndg - ndj - nfa - ngp - ngu - nhe - nhg - nhi - nho - nhr - nhu - nhw - nhy - nif - nii - nin - nko - nld - nlg - nmw - nna - nnq - noa - nop - not - nou - npi - npl - nsn - nss - ntj - ntp - ntu - nuy - nvm - nwi - nya - nys - nyu - obo - okv - omw - ong - ons - ood - opm - ory - ote - otm - otn - otq - ots - pab - pad - pah - pan - pao - pes - pib - pio - pir - piu - pjt - pls - plu - pma - poe - poh - poi - pol - pon - por - poy - ppo - prf - pri - ptp - ptu - pwg - qub - quc - quf - quh - qul - qup - qvc - qve - qvh - qvm - qvn - qvs - qvw - qvz - qwh - qxh - qxn - qxo - rai - reg - rgu - rkb - rmc - rmy - ron - roo - rop - row - rro - ruf - rug - rus - rwo - sab - san - sbe - sbk - sbs - seh - sey - sgb - sgz - shj - shp - sim - sja - sll - smk - snc - snn - snp - snx - sny - som - soq - soy - spa - spl - spm - spp - sps - spy - sri - srm - srn - srp - srq - ssd - ssg - ssx - stp - sua - sue - sus - suz - swe - swh - swp - sxb - tac - taj - tam - tav - taw - tbc - tbf - tbg - tbl - tbo - tbz - tca - tcs - tcz - tdt - tee - tel - ter - tet - tew - tfr - tgk - tgl - tgo - tgp - tha - thd - tif - tim - tiw - tiy - tke - tku - tlf - tmd - tna - tnc - tnk - tnn - tnp - toc - tod - tof - toj - ton - too - top - tos - tpa - tpi - tpt - tpz - trc - tsw - ttc - tte - tuc - tue - tuf - tuo - tur - tvk - twi - txq - txu - tzj - tzo - ubr - ubu - udu - uig - ukr - uli - ulk - upv - ura - urb - urd - uri - urt - urw - usa - usp - uvh - uvl - vid - vie - viv - vmy - waj - wal - wap - wat - wbi - wbp - wed - wer - wim - wiu - wiv - wmt - wmw - wnc - wnu - wol - wos - wrk - wro - wrs - wsk - wuv - xav - xbi - xed - xla - xnn - xon - xsi - xtd - xtm - yaa - yad - yal - yap - yaq - yby - ycn - yka - yle - yml - yon - yor - yrb - yre - yss - yuj - yut - yuw - yva - zaa - zab - zac - zad - zai - zaj - zam - zao - zap - zar - zas - zat - zav - zaw - zca - zga - zia - ziw - zlm - zos - zpc - zpl - zpm - zpo - zpq - zpu - zpv - zpz - zsr - ztq - zty - zyp - be - br - cs - ch - zh - de - en - eo - fr - ht - he - hr - id - it - ja - la - nl - ru - sa - so - es - sr - sv - to - uk - vi license: - cc-by-4.0 - other multilinguality: - translation - multilingual pretty_name: biblenlp-corpus size_categories: - 1M<n<10M source_datasets: - original task_categories: - translation task_ids: [] --- # Dataset Card for BibleNLP Corpus ### Dataset Summary Partial and complete Bible translations in 833 languages, aligned by verse. ### Languages aai, aak, aau, aaz, abt, abx, aby, acf, acr, acu, adz, aer, aey, agd, agg, agm, agn, agr, agt, agu, aia, aii, aka, ake, alp, alq, als, aly, ame, amf, amk, amm, amn, amo, amp, amr, amu, amx, anh, anv, aoi, aoj, aom, aon, apb, ape, apn, apr, apu, apw, apz, arb, are, arl, arn, arp, asm, aso, ata, atb, atd, atg, att, auc, aui, auy, avt, awb, awk, awx, azb, azg, azz, bao, bba, bbb, bbr, bch, bco, bdd, bea, bef, bel, ben, beo, beu, bgs, bgt, bhg, bhl, big, bjk, bjp, bjr, bjv, bjz, bkd, bki, bkq, bkx, bla, blw, blz, bmh, bmk, bmr, bmu, bnp, boa, boj, bon, box, bpr, bps, bqc, bqp, bre, bsj, bsn, bsp, bss, buk, bus, bvd, bvr, bxh, byr, byx, bzd, bzh, bzj, caa, cab, cac, caf, cak, cao, cap, car, cav, cax, cbc, cbi, cbk, cbr, cbs, cbt, cbu, cbv, cco, ceb, cek, ces, cgc, cha, chd, chf, chk, chq, chz, cjo, cjv, ckb, cle, clu, cme, cmn, cni, cnl, cnt, cof, con, cop, cot, cpa, cpb, cpc, cpu, cpy, crn, crx, cso, csy, cta, cth, ctp, ctu, cub, cuc, cui, cuk, cut, cux, cwe, cya, daa, dad, dah, dan, ded, deu, dgc, dgr, dgz, dhg, dif, dik, dji, djk, djr, dob, dop, dov, dwr, dww, dwy, ebk, eko, emi, emp, eng, enq, epo, eri, ese, esk, etr, ewe, faa, fai, far, ffm, for, fra, fue, fuf, fuh, gah, gai, gam, gaw, gdn, gdr, geb, gfk, ghs, glk, gmv, gng, gnn, gnw, gof, grc, gub, guh, gui, guj, gul, gum, gun, guo, gup, gux, gvc, gvf, gvn, gvs, gwi, gym, gyr, hat, hau, haw, hbo, hch, heb, heg, hin, hix, hla, hlt, hmo, hns, hop, hot, hrv, hto, hub, hui, hun, hus, huu, huv, hvn, ian, ign, ikk, ikw, ilo, imo, inb, ind, ino, iou, ipi, isn, ita, iws, ixl, jac, jae, jao, jic, jid, jiv, jni, jpn, jvn, kan, kaq, kbc, kbh, kbm, kbq, kdc, kde, kdl, kek, ken, kew, kgf, kgk, kgp, khs, khz, kik, kiw, kiz, kje, kjn, kjs, kkc, kkl, klt, klv, kmg, kmh, kmk, kmo, kms, kmu, kne, knf, knj, knv, kos, kpf, kpg, kpj, kpr, kpw, kpx, kqa, kqc, kqf, kql, kqw, ksd, ksj, ksr, ktm, kto, kud, kue, kup, kvg, kvn, kwd, kwf, kwi, kwj, kyc, kyf, kyg, kyq, kyz, kze, lac, lat, lbb, lbk, lcm, leu, lex, lgl, lid, lif, lin, lit, llg, lug, luo, lww, maa, maj, mal, mam, maq, mar, mau, mav, maz, mbb, mbc, mbh, mbj, mbl, mbs, mbt, mca, mcb, mcd, mcf, mco, mcp, mcq, mcr, mdy, med, mee, mek, meq, met, meu, mgc, mgh, mgw, mhl, mib, mic, mie, mig, mih, mil, mio, mir, mit, miz, mjc, mkj, mkl, mkn, mks, mle, mlh, mlp, mmo, mmx, mna, mop, mox, mph, mpj, mpm, mpp, mps, mpt, mpx, mqb, mqj, msb, msc, msk, msm, msy, mti, mto, mux, muy, mva, mvn, mwc, mwe, mwf, mwp, mxb, mxp, mxq, mxt, mya, myk, myu, myw, myy, mzz, nab, naf, nak, nas, nay, nbq, nca, nch, ncj, ncl, ncu, ndg, ndj, nfa, ngp, ngu, nhe, nhg, nhi, nho, nhr, nhu, nhw, nhy, nif, nii, nin, nko, nld, nlg, nmw, nna, nnq, noa, nop, not, nou, npi, npl, nsn, nss, ntj, ntp, ntu, nuy, nvm, nwi, nya, nys, nyu, obo, okv, omw, ong, ons, ood, opm, ory, ote, otm, otn, otq, ots, pab, pad, pah, pan, pao, pes, pib, pio, pir, piu, pjt, pls, plu, pma, poe, poh, poi, pol, pon, por, poy, ppo, prf, pri, ptp, ptu, pwg, qub, quc, quf, quh, qul, qup, qvc, qve, qvh, qvm, qvn, qvs, qvw, qvz, qwh, qxh, qxn, qxo, rai, reg, rgu, rkb, rmc, rmy, ron, roo, rop, row, rro, ruf, rug, rus, rwo, sab, san, sbe, sbk, sbs, seh, sey, sgb, sgz, shj, shp, sim, sja, sll, smk, snc, snn, snp, snx, sny, som, soq, soy, spa, spl, spm, spp, sps, spy, sri, srm, srn, srp, srq, ssd, ssg, ssx, stp, sua, sue, sus, suz, swe, swh, swp, sxb, tac, taj, tam, tav, taw, tbc, tbf, tbg, tbl, tbo, tbz, tca, tcs, tcz, tdt, tee, tel, ter, tet, tew, tfr, tgk, tgl, tgo, tgp, tha, thd, tif, tim, tiw, tiy, tke, tku, tlf, tmd, tna, tnc, tnk, tnn, tnp, toc, tod, tof, toj, ton, too, top, tos, tpa, tpi, tpt, tpz, trc, tsw, ttc, tte, tuc, tue, tuf, tuo, tur, tvk, twi, txq, txu, tzj, tzo, ubr, ubu, udu, uig, ukr, uli, ulk, upv, ura, urb, urd, uri, urt, urw, usa, usp, uvh, uvl, vid, vie, viv, vmy, waj, wal, wap, wat, wbi, wbp, wed, wer, wim, wiu, wiv, wmt, wmw, wnc, wnu, wol, wos, wrk, wro, wrs, wsk, wuv, xav, xbi, xed, xla, xnn, xon, xsi, xtd, xtm, yaa, yad, yal, yap, yaq, yby, ycn, yka, yle, yml, yon, yor, yrb, yre, yss, yuj, yut, yuw, yva, zaa, zab, zac, zad, zai, zaj, zam, zao, zap, zar, zas, zat, zav, zaw, zca, zga, zia, ziw, zlm, zos, zpc, zpl, zpm, zpo, zpq, zpu, zpv, zpz, zsr, ztq, zty, zyp ## Dataset Structure ### Data Fields **translation** - **languages** - an N length list of the languages of the translations, sorted alphabetically - **translation** - an N length list with the translations each corresponding to the language specified in the above field **files** - **lang** - an N length list of the languages of the files, in order of input - **file** - an N length list of the filenames from the corpus on github, each corresponding with the lang above **ref** - the verse(s) contained in the record, as a list, with each represented with: ``<a three letter book code> <chapter number>:<verse number>`` **licenses** - an N length list of licenses, corresponding to the list of files above **copyrights** - information on copyright holders, corresponding to the list of files above ### Usage The dataset loading script requires installation of tqdm, ijson, and numpy Specify the languages to be paired with a list and ISO 693-3 language codes, such as ``languages = ['eng', 'fra']``. By default, the script will return individual verse pairs, as well as verses covering a full range. If only the individual verses is desired, use ``pair='single'``. If only the maximum range pairing is desired use ``pair='range'`` (for example, if one text uses the verse range covering GEN 1:1-3, all texts would return only the full length pairing). ## Sources https://github.com/BibleNLP/ebible-corpus
提供机构:
timothyhartzog
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作