TIGQA
Source: arXiv, 2024-04-26 (updated 2024-06-21)
Download link:
https://github.com/hailaykidu/TigQA-Datasets
Resource summary:
TIGQA is a question-answering dataset for the Tigrinya language, developed jointly by the L3S Research Center and IIT Kharagpur. It contains 2,685 question-answer pairs covering 122 topics, such as climate, water, and transportation, drawn from 537 passages in publicly accessible Tigrinya and biology books. The text was extracted from scanned books with optical character recognition (OCR) and then annotated by experts. The dataset is intended mainly for education: it aims to improve educational quality in Tigrinya-speaking regions through machine reading comprehension, and to advance natural language processing research on low-resource languages.
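As a rough orientation, 2,685 pairs over 537 passages works out to exactly 5 question-answer pairs per passage on average. The sketch below shows one way to inspect such a reading-comprehension dataset. It assumes, without confirmation from this description, that the files follow the SQuAD v1.1 JSON layout (`data` → `paragraphs` → `context` / `qas` → `answers` with `text` and `answer_start`); check the repository's actual format before relying on it. The function names are hypothetical. Because the passages were produced by OCR, the sketch also counts answers whose character offset does not match the passage text, which is a cheap way to spot alignment problems.

```python
import json
from typing import Iterator, Tuple

# ASSUMPTION: SQuAD v1.1-style layout, not confirmed for TIGQA:
# {"data": [{"title": ..., "paragraphs": [{"context": ..., "qas": [
#     {"id": ..., "question": ..., "answers": [{"text": ..., "answer_start": int}]}]}]}]}


def load_squad_like(path: str) -> dict:
    """Read a JSON file as UTF-8 (Tigrinya uses the Ge'ez script)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_examples(dataset: dict) -> Iterator[Tuple[str, str, str, int]]:
    """Yield (context, question, answer_text, answer_start) for every answer."""
    for article in dataset.get("data", []):
        for para in article.get("paragraphs", []):
            context = para["context"]
            for qa in para.get("qas", []):
                for ans in qa.get("answers", []):
                    yield context, qa["question"], ans["text"], ans["answer_start"]


def summarize(dataset: dict) -> dict:
    """Count passages, questions, and answers whose offset does not match the context."""
    passages = questions = misaligned = 0
    for article in dataset.get("data", []):
        for para in article.get("paragraphs", []):
            passages += 1
            questions += len(para.get("qas", []))
    for context, _, text, start in iter_examples(dataset):
        # An answer is aligned if slicing the context at its offset reproduces the text.
        if context[start:start + len(text)] != text:
            misaligned += 1
    return {"passages": passages, "questions": questions, "misaligned": misaligned}
```

Usage, with a hypothetical file name: `summarize(load_squad_like("tigqa_train.json"))`. Checking offsets by slicing on Python string indices works because Ge'ez syllables are single Unicode code points; a nonzero `misaligned` count would point to answers that need re-annotation.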
Providing institutions:
L3S Research Center, Leibniz University Hannover, Germany; Department of Computer Science and Engineering, IIT Kharagpur, India
Created:
2024-04-26



