Facebook bAbI Tasks for Malayalam Language
Source: doi.org. Indexed: 2025-01-21
Download link:
http://doi.org/10.17632/h26g4n9w5j.1
Description:
This Malayalam question answering dataset contains 5,000 training samples and 5,000 testing samples, generated by translating Facebook's bAbI tasks. The bAbI tasks were originally created in English and have since been translated into other languages, including French, German, Hindi, Chinese, and Russian. The original collection comprises twenty synthetic tasks that test a system's text comprehension and reasoning across a range of themes. This dataset covers five of those tasks, which share comparable sentence patterns and range in difficulty. Each task has 1,000 training samples and 1,000 test samples. We created the dataset for the proposed work by translating the English bAbI data into Malayalam for five tasks (original tasks 1, 4, 11, 12, and 13), renumbered here as tasks 1 through 5. The tasks are titled "Single Supporting Fact," "Two Argument Relations," "Basic Coreference," "Conjunction," and "Compound Coreference." Every sample comprises a series of statements (sometimes called a story) about people's movements among objects and places, a question, and the correct answer.
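The renumbering and the sample counts described above can be checked with a short sketch. The names `TASKS`, `SAMPLES_PER_SPLIT`, and `total_samples` are hypothetical and chosen only for illustration; the task numbers, titles, and counts are taken from the description.

```python
# Dataset task number -> (original bAbI task number, title), as listed
# in the description. The dictionary name is illustrative only.
TASKS = {
    1: (1, "Single Supporting Fact"),
    2: (4, "Two Argument Relations"),
    3: (11, "Basic Coreference"),
    4: (12, "Conjunction"),
    5: (13, "Compound Coreference"),
}

# Each task has 1,000 training samples and 1,000 test samples.
SAMPLES_PER_SPLIT = 1000


def total_samples(tasks=TASKS, per_task=SAMPLES_PER_SPLIT):
    """Return the number of samples in one split (training or testing)."""
    return len(tasks) * per_task


if __name__ == "__main__":
    for new_id, (babi_id, title) in TASKS.items():
        print(f"Task {new_id} <- bAbI task {babi_id}: {title}")
    print("Samples per split:", total_samples())
```

Five tasks at 1,000 samples each give the 5,000 training and 5,000 testing samples stated in the description.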
Tasks:
Task 1: Single Supporting Fact: This task tests whether a model can identify the single relevant fact in a story needed to answer a question. The story usually contains several sentences, but only one of them is directly useful for answering the question.
Task 2: Two Argument Relations: This task involves understanding the relationship between two entities. The model must infer relationships between pairs of objects, people, or places.
Task 3: Basic Coreference: Coreference resolution is the task of linking pronouns or phrases to the correct entities. In this task, the model must resolve simple pronominal references.
Task 4: Conjunction: This task tests the model's ability to understand sentences in which several actions or facts are joined by conjunctions such as "and" or "or". The model must process these linked statements to answer the questions correctly.
Task 5: Compound Coreference: This task is more complex because it requires the model to resolve references to compound entities, such as a pronoun that refers to several people joined by a conjunction.
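As a rough illustration of how samples of this kind (story, question, answer) might be read, the sketch below parses the plain-text layout used by the original English bAbI release: each line starts with a numeric ID that resets to 1 at the start of a new story, and question lines hold the question, the answer, and the supporting fact IDs separated by tabs. This listing does not confirm that the Malayalam files keep that layout, so treat it as an assumption. The function name `parse_babi` is hypothetical, and the demo lines are in English and for illustration only.

```python
def parse_babi(lines):
    """Parse bAbI-style lines into (story, question, answer) tuples.

    Assumes the original English bAbI layout; the Malayalam files may differ.
    """
    samples, story = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        idx, _, text = line.partition(" ")
        if int(idx) == 1:
            story = []  # a line ID of 1 marks the start of a new story
        if "\t" in text:
            # Question line: question <TAB> answer <TAB> supporting fact IDs
            question, answer, *_ = text.split("\t")
            samples.append((list(story), question.strip(), answer.strip()))
        else:
            story.append(text)
    return samples


if __name__ == "__main__":
    demo = [
        "1 Mary moved to the bathroom.",
        "2 John went to the hallway.",
        "3 Where is Mary? \tbathroom\t1",
    ]
    for story, question, answer in parse_babi(demo):
        print(story, question, answer)
```

Each returned tuple pairs a question with all statements seen so far in its story, which matches the sample structure described above.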
Provider:
Mendeley Data



