
saifulhaq9/indicmarco

Source: Hugging Face. Updated 2024-01-16. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/saifulhaq9/indicmarco
Resource description:
---
license: mit
---

# IndicIRSuite: Multilingual Dataset and Neural Information Models for Indian Languages

Paper link: https://arxiv.org/abs/2312.09508

Dataset link: https://huggingface.co/datasets/saifulhaq9/indicmarco

Model link: https://huggingface.co/saifulhaq9/indiccolbert

## Contributors & Acknowledgements

Key Contributors and Team Members: Saiful Haq, Ashutosh Sharma, Pushpak Bhattacharyya

## Kindly cite our paper if you are using our datasets or models:

@article{haq2023indicirsuite,
  title={IndicIRSuite: Multilingual Dataset and Neural Information Models for Indian Languages},
  author={Haq, Saiful and Sharma, Ashutosh and Bhattacharyya, Pushpak},
  journal={arXiv preprint arXiv:2312.09508},
  year={2023}
}

## About

This repository contains query.train.tsv and collection.tsv files in 11 Indian languages, for training multilingual IR models.

## Language Code to Language Mapping

- asm_Beng: Assamese Language
- ben_Beng: Bengali Language
- guj_Gujr: Gujarati Language
- hin_Deva: Hindi Language
- kan_Knda: Kannada Language
- mal_Mlym: Malayalam Language
- mar_Deva: Marathi Language
- ory_Orya: Oriya Language
- pan_Guru: Punjabi Language
- tam_Taml: Tamil Language
- tel_Telu: Telugu Language
Provider:
saifulhaq9
Summary of original information

IndicIRSuite: Multilingual Dataset and Neural Information Models for Indian Languages

Dataset link

https://huggingface.co/datasets/saifulhaq9/indicmarco

Model link

https://huggingface.co/saifulhaq9/indiccolbert

About the dataset

The dataset contains query.train.tsv and collection.tsv files in 11 Indian languages, intended for training multilingual information retrieval models.
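A minimal sketch of reading such TSV files in Python. It assumes each row has the MS MARCO-style layout `id<TAB>text`, which this page does not confirm; the helper name `read_id_text_tsv` and the sample rows are hypothetical and only illustrate the parsing step.

```python
import csv
import io
from typing import Dict, Iterable


def read_id_text_tsv(lines: Iterable[str]) -> Dict[str, str]:
    """Parse tab-separated `id<TAB>text` rows into a dict keyed by id.

    Assumption: two columns per row (id, text). Rows with fewer than
    two columns are skipped.
    """
    # QUOTE_NONE keeps quote characters inside passages as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    table: Dict[str, str] = {}
    for row in reader:
        if len(row) < 2:
            continue  # blank or malformed row
        table[row[0]] = row[1]
    return table


# Hypothetical sample rows standing in for the real files. A real file
# would be opened with open(path, encoding="utf-8", newline="").
sample_collection = io.StringIO("0\tपहला अनुच्छेद\n1\tदूसरा अनुच्छेद\n")
sample_queries = io.StringIO("10\tएक प्रश्न\n")

collection = read_id_text_tsv(sample_collection)
queries = read_id_text_tsv(sample_queries)
print(len(collection), len(queries))
```

Opening the real files with `encoding="utf-8"` matters here, since every language in the dataset is written in a non-Latin script.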

Language code to language mapping

  • asm_Beng: Assamese
  • ben_Beng: Bengali
  • guj_Gujr: Gujarati
  • hin_Deva: Hindi
  • kan_Knda: Kannada
  • mal_Mlym: Malayalam
  • mar_Deva: Marathi
  • ory_Orya: Oriya
  • pan_Guru: Punjabi
  • tam_Taml: Tamil
  • tel_Telu: Telugu
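The mapping above can be held in a small lookup table. The sketch below assumes each code has the form `language_Script` (a three-letter language tag, an underscore, then a four-letter script tag), which is how the listed codes appear; the names `LANGUAGES` and `split_code` are hypothetical.

```python
from typing import Dict, List, Tuple

# Codes and language names exactly as listed on this page.
LANGUAGES: Dict[str, str] = {
    "asm_Beng": "Assamese",
    "ben_Beng": "Bengali",
    "guj_Gujr": "Gujarati",
    "hin_Deva": "Hindi",
    "kan_Knda": "Kannada",
    "mal_Mlym": "Malayalam",
    "mar_Deva": "Marathi",
    "ory_Orya": "Oriya",
    "pan_Guru": "Punjabi",
    "tam_Taml": "Tamil",
    "tel_Telu": "Telugu",
}


def split_code(code: str) -> Tuple[str, str]:
    """Split a code such as 'hin_Deva' into (language tag, script tag)."""
    language, script = code.split("_", 1)
    return language, script


# Group codes by script tag to see which languages share a script.
by_script: Dict[str, List[str]] = {}
for code in LANGUAGES:
    _, script = split_code(code)
    by_script.setdefault(script, []).append(code)

for script, codes in by_script.items():
    print(script, [LANGUAGES[c] for c in codes])
```

Grouping by script tag shows that the 11 languages span 9 distinct scripts, with Assamese and Bengali sharing `Beng` and Hindi and Marathi sharing `Deva`.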