saifulhaq9/indicmarco
Source: Hugging Face. Updated 2024-01-16; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/saifulhaq9/indicmarco
Resource description:
---
license: mit
---
# IndicIRSuite: Multilingual Dataset and Neural Information Models for Indian Languages
Paper link: https://arxiv.org/abs/2312.09508
Dataset link: https://huggingface.co/datasets/saifulhaq9/indicmarco
Model link: https://huggingface.co/saifulhaq9/indiccolbert
## Contributors & Acknowledgements
Key Contributors and Team Members: Saiful Haq, Ashutosh Sharma, Pushpak Bhattacharyya
## Kindly cite our paper if you are using our datasets or models:
@article{haq2023indicirsuite,
title={IndicIRSuite: Multilingual Dataset and Neural Information Models for Indian Languages},
author={Haq, Saiful and Sharma, Ashutosh and Bhattacharyya, Pushpak},
journal={arXiv preprint arXiv:2312.09508},
year={2023}
}
## About
This repository contains `query.train.tsv` and `collection.tsv` files in 11 Indian languages,
for training multilingual IR models.
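
The following is a minimal sketch of how such a file might be read. It assumes each line holds exactly two tab-separated columns (an id and a text), as in the MS MARCO passage-ranking format; this card does not state the column layout, so verify it against the actual files. The function name `read_id_text_tsv` and the sample rows are hypothetical.

```python
import csv
import io
from typing import Dict, TextIO


def read_id_text_tsv(handle: TextIO) -> Dict[str, str]:
    """Read a two-column TSV (id, text) into a dict keyed by id.

    Assumption: the MS MARCO-style layout "id<TAB>text" with no header row.
    """
    rows: Dict[str, str] = {}
    # QUOTE_NONE keeps quotation marks inside passages as literal characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_no, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, got {len(row)}")
        rows[row[0]] = row[1]
    return rows


# Illustrative in-memory sample; the rows are made up, not taken from the dataset.
sample = io.StringIO("0\tपहला अनुच्छेद\n1\tदूसरा अनुच्छेद\n")
passages = read_id_text_tsv(sample)
print(len(passages))
```

For a file on disk, pass a handle opened with `open(path, encoding="utf-8", newline="")`, where `path` points to wherever the `collection.tsv` or `query.train.tsv` for a given language was downloaded.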
## Language Code to Language Mapping
asm_Beng: Assamese Language
ben_Beng: Bengali Language
guj_Gujr: Gujarati Language
hin_Deva: Hindi Language
kan_Knda: Kannada Language
mal_Mlym: Malayalam Language
mar_Deva: Marathi Language
ory_Orya: Oriya Language
pan_Guru: Punjabi Language
tam_Taml: Tamil Language
tel_Telu: Telugu Language
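
The codes above appear to pair an ISO 639-3 language code with an ISO 15924 script code, joined by an underscore. As a sketch under that assumption, the mapping can be held in a dictionary and split into its two parts; the names `LANGUAGES`, `split_code`, and `by_script` are hypothetical and not part of the dataset.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Language code to language name, as listed in this card.
LANGUAGES: Dict[str, str] = {
    "asm_Beng": "Assamese",
    "ben_Beng": "Bengali",
    "guj_Gujr": "Gujarati",
    "hin_Deva": "Hindi",
    "kan_Knda": "Kannada",
    "mal_Mlym": "Malayalam",
    "mar_Deva": "Marathi",
    "ory_Orya": "Oriya",
    "pan_Guru": "Punjabi",
    "tam_Taml": "Tamil",
    "tel_Telu": "Telugu",
}


def split_code(code: str) -> Tuple[str, str]:
    """Split a code such as 'hin_Deva' into (language, script)."""
    language, script = code.split("_", 1)
    return language, script


# Group language names by script, e.g. to find languages that share one.
by_script: Dict[str, List[str]] = defaultdict(list)
for code, name in LANGUAGES.items():
    by_script[split_code(code)[1]].append(name)

print(by_script["Deva"])
```

Grouping by script shows that Hindi and Marathi share Devanagari (`Deva`), and Assamese and Bengali share the Bengali script (`Beng`).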
Provider:
saifulhaq9



