Hindi and Marathi Question-Answering Dataset

Name: Hindi and Marathi Question-Answering Dataset
Creator: University of Southern California
Published: 2024-02-17 15:02:26
License: not specified

Source: arXiv (updated 2024-02-17; indexed 2024-08-06)

Download link:

http://arxiv.org/abs/2308.09862v3

Description:

This dataset is a large-scale question-answering (QA) resource for two low-resource languages of India: Hindi and Marathi. It contains 28,000 samples and is intended to ease the data scarcity that hinders building effective QA systems for these languages. It was created by translating the SQuAD 2.0 dataset into Hindi and Marathi, and it is suited to natural language understanding (NLU) and machine learning applications, particularly those serving Hindi- and Marathi-speaking communities. During construction, the research team used a novel method to determine the exact index of each answer within its translated context, which supports the quality and practical usability of the dataset.
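The description does not say how the answer index is determined, so the following is only a minimal sketch of why the problem arises and one generic way to approach it. After translation, the translated answer string may no longer appear verbatim in the translated context, so a plain substring search can fail. The sketch tries an exact match first and then falls back to fuzzy matching over token windows. The function name `find_answer_span` and the `min_ratio` threshold are hypothetical, and this is not the authors' method.

```python
import re
from difflib import SequenceMatcher


def find_answer_span(context, answer, min_ratio=0.6):
    """Return (start, end) character offsets of the best match for
    `answer` inside `context`, or None if nothing is similar enough.

    Hypothetical helper for illustration; not the dataset authors' method.
    """
    if not answer.strip():
        return None

    # 1. Exact match: the translated answer appears verbatim in the context.
    exact = context.find(answer)
    if exact != -1:
        return exact, exact + len(answer)

    # 2. Fuzzy fallback: compare the answer against windows of
    #    whitespace-delimited tokens of roughly the same length.
    tokens = [m.span() for m in re.finditer(r"\S+", context)]
    n_answer = len(answer.split())
    best, best_ratio = None, min_ratio
    for size in range(max(1, n_answer - 1), n_answer + 2):
        for i in range(len(tokens) - size + 1):
            start, end = tokens[i][0], tokens[i + size - 1][1]
            ratio = SequenceMatcher(None, context[start:end], answer).ratio()
            if ratio > best_ratio:
                best, best_ratio = (start, end), ratio
    return best


if __name__ == "__main__":
    context = "ताजमहल आगरा में स्थित है।"
    span = find_answer_span(context, "आगरा")
    if span is not None:
        start, end = span
        print(start, end, context[start:end])
```

Offsets here are Python string indices (Unicode code points), which matters for Devanagari text where one visible character can span several code points. A real pipeline would need to fix one offset convention and apply it consistently.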

Provided by: University of Southern California

Created: 2023-08-19
