eswardivi/Aksharantar
Source: Hugging Face. Updated 2024-03-09; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/eswardivi/Aksharantar
Summary:
Aksharantar is a multilingual dataset of word pairs in a number of Indian languages; each pair consists of a native word (the word in the native language) and an english word. The data is divided into train, validation, and test splits, with the byte size and example count recorded for each split (the doi configuration lists only train and test). The supported languages are Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Marathi, Manipuri, Nepali, Oriya, Punjabi, Sanskrit, Sindhi, Tamil, Telugu, and Urdu. The dataset is released under a CC license.
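As a minimal sketch of how these word pairs might be read, the snippet below assumes the Hugging Face `datasets` library is installed and that the repository ID `eswardivi/Aksharantar` and the configuration names listed on this page resolve on the Hub. The helper names `to_pair` and `load_pairs` are hypothetical, and the sample record is a placeholder rather than real data.

```python
from typing import Dict, Iterator, Tuple

# Column names as listed on this page (note the embedded spaces).
NATIVE_COLUMN = "native word"
ENGLISH_COLUMN = "english word"


def to_pair(record: Dict[str, str]) -> Tuple[str, str]:
    """Turn one dataset record into a (native, english) tuple."""
    return record[NATIVE_COLUMN], record[ENGLISH_COLUMN]


def load_pairs(config: str, split: str = "train") -> Iterator[Tuple[str, str]]:
    """Yield word pairs for one language configuration, e.g. "asm".

    Assumes the `datasets` package is installed and the Hub is reachable.
    """
    from datasets import load_dataset  # imported lazily so to_pair works offline

    dataset = load_dataset("eswardivi/Aksharantar", config, split=split)
    for record in dataset:
        yield to_pair(record)


# Placeholder record for illustration only; not taken from the dataset.
sample = {NATIVE_COLUMN: "<native word>", ENGLISH_COLUMN: "<english word>"}
print(to_pair(sample))
```

Calling `load_pairs("asm")` would then iterate over the Assamese training pairs, provided the assumptions above hold.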
Provider:
eswardivi
Summary of original information
Dataset overview
Dataset configurations
asm
- Features:
  - native word: string
  - english word: string
- Splits:
  - train: 7246553 bytes, 178630 examples
  - valid: 155473 bytes, 3788 examples
  - test: 215853 bytes, 5506 examples
- Download size: 4806305 bytes
- Dataset size: 7617879 bytes
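The reported dataset size appears to be the sum of the per-split byte counts. The short check below works this out for the asm configuration using only the figures listed above; the variable names are illustrative.

```python
# Split sizes in bytes for the asm configuration, copied from the listing above.
asm_split_bytes = {"train": 7246553, "valid": 155473, "test": 215853}
asm_dataset_size = 7617879  # reported dataset size in bytes

# Compare the sum of the split sizes with the reported dataset size.
total_bytes = sum(asm_split_bytes.values())
sizes_match = total_bytes == asm_dataset_size
print(f"sum of splits = {total_bytes}, reported = {asm_dataset_size}, match = {sizes_match}")

# Share of the data held by each split, as a fraction of total bytes.
split_share = {name: size / total_bytes for name, size in asm_split_bytes.items()}
for name, share in split_share.items():
    print(f"{name}: {share:.2%}")
```

The same check can be repeated for any other configuration by swapping in its figures.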
ben
- Features:
  - native word: string
  - english word: string
- Splits:
  - train: 53625021 bytes, 1231428 examples
  - valid: 425704 bytes, 11276 examples
  - test: 536999 bytes, 14167 examples
- Download size: 33797771 bytes
- Dataset size: 54587724 bytes
brx
- Features:
  - native word: string
  - english word: string
- Splits:
  - train: 1549176 bytes, 35618 examples
  - valid: 127620 bytes, 3068 examples
  - test: 158976 bytes, 4081 examples
- Download size: 1041579 bytes
- Dataset size: 1835772 bytes
doi
- Features:
  - native word: string
  - english word: string
- Splits:
  - train: 50960 bytes, 1584 examples
  - test: 62772 bytes, 2000 examples
- Download size: 75793 bytes
- Dataset size: 113732 bytes
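The doi configuration lists no valid split. One possible workaround, sketched below, is to hold out part of the training data for validation. This is an assumption about how a user might proceed, not something the dataset itself prescribes; `hold_out_validation`, the 10% fraction, and the seed are all hypothetical choices, and integers stand in for the real records.

```python
import random
from typing import List, Sequence, Tuple


def hold_out_validation(
    examples: Sequence, valid_fraction: float = 0.1, seed: int = 0
) -> Tuple[List, List]:
    """Shuffle indices deterministically and split off a validation subset."""
    if not 0.0 < valid_fraction < 1.0:
        raise ValueError("valid_fraction must be between 0 and 1")
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    n_valid = int(len(indices) * valid_fraction)
    valid = [examples[i] for i in indices[:n_valid]]
    train = [examples[i] for i in indices[n_valid:]]
    return train, valid


# Stand-in for the 1584 doi training examples; integers replace real records.
doi_train_stand_in = list(range(1584))
train_part, valid_part = hold_out_validation(doi_train_stand_in)
print(len(train_part), len(valid_part))
```

With only 1584 training examples, any held-out set is small, so the fraction is worth choosing with care.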
guj
- Features:
  - native word: string
  - english word: string
- Splits:
  - train: 48426490 bytes, 1143212 examples
  - valid: 457631 bytes, 12419 examples
  - test: 690823 bytes, 18077 examples
- Download size: 31145762 bytes
- Dataset size: 49574944 bytes
hin
- Features:
  - native word: string
  - english word: string
- Splits:
  - train: 52038534 bytes, 1299155 examples
  - valid: 223121 bytes, 6357 examples
  - test: 368927 bytes, 10112 examples
- Download size: 34053230 bytes
- Dataset size: 52630582 bytes
kan
- Features:
  - native word: string
  - english word: string
- Splits:
  - train: 158229246 bytes, 2906728 examples
  - valid: 318367 bytes, 7025 examples
  - test: 534114 bytes, 11380 examples
- Download size: 91749260 bytes
- Dataset size: 159081727 bytes
kas
- Features:
  - native word: string
  - english word: string
- Splits:
  - train: 1310641 bytes, 46635 examples
  - valid: 117768 bytes, 4456 examples
  - test: 175480 bytes, 6908 examples
- Download size: 1175597 bytes
- Dataset size: 1603889 bytes
kok
- Features:
  - native word: string
  - english word: string
- Splits:
  - train: 29164783 bytes, 612525 examples
  - valid: 154507 bytes, 3502 examples
  - test: 194477 bytes, 5042 examples
- Download size: 17786669 bytes
- Dataset size: 29513767 bytes
mai
- Features:
  - native word: string
  - english word: string
- Splits:
  - train: 11088031 bytes, 282639 examples
  - valid: 145082 bytes, 3790 examples
  - test: 195832 bytes, 5449 examples
- Download size: 7353930 bytes
- Dataset size: 11428945 bytes
mal
- Features:
  - native word: string
  - english word: string
- Splits:
  - train: 255792875 bytes, 4100621 examples
  - valid: 364734 bytes, 7613 examples
  - test: 613721 bytes, 12451 examples
- Download size: 141329273 bytes
- Dataset size: 256771330 bytes
mar
- Features:
  - native word: string
  - english word: string
- Splits:
  - train: 70379039 bytes, 1452748 examples
  - valid: 306473 bytes, 7646 examples
  - test: 501632 bytes, 12190 examples
- Download size: 42714793 bytes
- Dataset size: 71187144 bytes
mni
- Features:
  - native word: string
  - english word: string
- Splits:
  - train: 359476 bytes, 10060 examples
  - valid: 112250 bytes, 3260 examples
  - test: 166708 bytes, 4889 examples
- Download size: 384776 bytes
- Dataset size: 638434 bytes
nep
- Features:
  - native word: string
  - english word: string
- Splits:
  - train: 115703649 bytes, 2397414 examples
  - valid: 128685 bytes, 2804 examples
  - test: 161326 bytes, 4101 examples
- Download size: 70685486 bytes
- Dataset size: 115993660 bytes
ori
- Features:
  - native word: string
  - english word: string
- Splits:
  - train: 15223026 bytes, 346492 examples
  - valid: 133701 bytes, 3093 examples
  - test: 168260 bytes, 4228 examples
- Download size: 9415265 bytes
- Dataset size: 15524987 bytes
pan
- Features:
  - native word: string
  - english word: string
- Splits:
  - train: 18625789 bytes, 514724 examples
  - valid: 280876 bytes, 8880 examples
  - test: 363793 bytes, 11237 examples
- Download size: 12634738 bytes
- Dataset size: 19270458 bytes
san
- Features:
  - native word: string
  - english word: string
- Splits:
  - train: 103031038 bytes, 1813369 examples
  - valid: 175843 bytes, 3398 examples
  - test: 218125 bytes, 5302 examples
- Download size: 61369090 bytes
- Dataset size: 103425006 bytes
sid
- Features:
  - native word: string
  - english word: string
- Splits:
  - train: 1590769 bytes, 59715 examples
  - valid: 207035 bytes, 8375 examples
  - test: 153505 bytes, 6407 examples
- Download size: 1471769 bytes
- Dataset size: 1951309 bytes
tam
- Features:
  - native word: string
  - english word: string
- Splits:
  - train: 189446572 bytes, 3230902 examples
  - valid: 405125 bytes, 8824 examples
  - test: 512678 bytes, 11499 examples
- Download size: 103185235 bytes
- Dataset size: 190364375 bytes
tel
- Features:
  - native word: string
  - english word: string
- Splits:
  - train: 125668188 bytes, 2429562 examples
  - valid: 327494 bytes, 7681 examples
  - test: 433170 bytes, 10260 examples
- Download size: 75120677 bytes
- Dataset size: 126428852 bytes
urd
- Features:
  - native word: string
  - english word: string
- Splits:
  - train: 21546318 bytes, 699024 examples
  - valid: 317819 bytes, 12419 examples
  - test: 384213 bytes, 14878 examples
- Download size: 16824949 bytes
- Dataset size: 22248350 bytes
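The configurations differ widely in size. As a rough illustration, the snippet below collects the training example counts from the listing above and derives a few summary figures; the variable names are illustrative, and the numbers are copied from this page rather than fetched from the Hub.

```python
# Training example counts per configuration, copied from the listing above.
train_examples = {
    "asm": 178630, "ben": 1231428, "brx": 35618, "doi": 1584,
    "guj": 1143212, "hin": 1299155, "kan": 2906728, "kas": 46635,
    "kok": 612525, "mai": 282639, "mal": 4100621, "mar": 1452748,
    "mni": 10060, "nep": 2397414, "ori": 346492, "pan": 514724,
    "san": 1813369, "sid": 59715, "tam": 3230902, "tel": 2429562,
    "urd": 699024,
}

total_train = sum(train_examples.values())
largest = max(train_examples, key=train_examples.get)
smallest = min(train_examples, key=train_examples.get)

print(f"configurations: {len(train_examples)}")
print(f"total training examples: {total_train}")
print(f"largest: {largest} ({train_examples[largest]})")
print(f"smallest: {smallest} ({train_examples[smallest]})")

# Ratio between the largest and smallest training sets, to show the imbalance.
imbalance = train_examples[largest] / train_examples[smallest]
print(f"largest/smallest ratio: {imbalance:.0f}x")
```

An imbalance of this scale may be worth accounting for, for example by sampling per language, when training a single model across all configurations.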



