ilmariky/WikiQA-100-fi

Name: ilmariky/WikiQA-100-fi
Creator: ilmariky
Published: 2022-10-25 15:47:21
License: gpl-3.0 (as stated in the dataset card metadata)

Hugging Face · Updated 2022-10-25 · Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/ilmariky/WikiQA-100-fi

Resource description:

---
language:
- fi
license:
- gpl-3.0
multilinguality:
- monolingual
size_categories:
- n<1k
task_categories:
- question-answering
task_ids:
- extractive-qa
pretty_name: WikiQA-100-fi
tags:
- question-generation
train-eval-index:
- config: plain_text
  task: question-answering
  task_id: extractive_question_answering
  splits:
    train_split: train
    eval_split: validation
  col_mapping:
    question: question
    context: context
    answers:
      text: text
      answer_start: answer_start
---

# Dataset Card for "WikiQA-100-fi"

### Dataset Summary

WikiQA-100-fi dataset contains 100 questions related to Finnish Wikipedia articles. The dataset is in the [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) format, and there are 10 questions for each category identified by the authors of SQuAD. Unlike SQuAD2.0, WikiQA-100-fi contains only answerable questions. The dataset is tiny compared to actual QA test sets, but it still gives an impression of the models' performance on purely native text data collected by a native speaker. The dataset was originally created as an evaluation set for models that had been mostly fine-tuned with automatically translated QA data. More information about the dataset and models created with it can be found [here](https://helda.helsinki.fi/handle/10138/344973).

## Dataset Structure

### Data Instances

Example data:

```
{
  "title": "Folksonomia",
  "paragraphs": [
    {
      "qas": [
        {
          "question": "Minkälaista sisältöä käyttäjät voivat luokitella folksonomian avulla?",
          "id": "6t4ufel624",
          "answers": [
            {
              "text": "www-sivuja, valokuvia ja linkkejä",
              "answer_start": 155
            }
          ],
          "is_impossible": false
        }
      ],
      "context": "Folksonomia (engl. folksonomy) on yhteisöllisesti tuotettu, avoin luokittelujärjestelmä, jonka avulla internet-käyttäjät voivat luokitella sisältöä, kuten www-sivuja, valokuvia ja linkkejä. Etymologisesti folksonomia on peräisin sanojen \"folk\" (suom. väki) ja \"taxonomy\" (suom. taksonomia) leikkimielisestä yhdistelmästä."
    }
  ]
}
```

### Data Fields

#### plain_text

- `id`: a `string` feature.
- `title`: a `string` feature.
- `context`: a `string` feature.
- `question`: a `string` feature.
- `answers`: a dictionary feature containing:
  - `text`: a `string` feature.
  - `answer_start`: a `int32` feature.

### Data Splits

| name       | test |
|------------|-----:|
| plain_text |  100 |

### Citation Information

```
@MastersThesis{3241c198b3f147faacbc6d8b64ed9419,
  author = "Kylli{\"a}inen, {Ilmari}",
  title = "Neural Factoid Question Answering and Question Generation for Finnish",
  language = "en",
  address = "Helsinki, Finland",
  school = "University of Helsinki",
  year = "2022",
  month = "jun",
  day = "15",
  url = "https://helda.helsinki.fi/handle/10138/344973"
}
```

Provider:

ilmariky

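Summary of original information

WikiQA-100-fi dataset overview

Dataset summary

The WikiQA-100-fi dataset contains 100 questions about Finnish Wikipedia articles. It follows the SQuAD format, with 10 questions for each question category identified by the SQuAD authors. Unlike SQuAD2.0, WikiQA-100-fi contains only answerable questions. Although the dataset is small, it still gives an indication of how models perform on purely native text data.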

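Dataset structure

Data instances

A data instance includes:

title: the article title, string.
context: the article paragraph, string.
question: the question, string.
answers: the answer, containing:
- text: the answer text, string.
- answer_start: the start position (character offset) of the answer within the context, integer.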
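Since answer_start is an offset into context, a quick consistency check is to slice the context at that offset and compare the result with the answer text. The following is a minimal sketch of such a check; the helper name `answer_span_matches` is hypothetical, and the context is the first sentence of the example instance from the dataset card, truncated here for brevity.

```python
def answer_span_matches(context: str, text: str, answer_start: int) -> bool:
    """Return True if `text` occurs in `context` exactly at `answer_start`."""
    return context[answer_start:answer_start + len(text)] == text


# First sentence of the example context from the dataset card (truncated).
context = (
    "Folksonomia (engl. folksonomy) on yhteisöllisesti tuotettu, avoin "
    "luokittelujärjestelmä, jonka avulla internet-käyttäjät voivat luokitella "
    "sisältöä, kuten www-sivuja, valokuvia ja linkkejä."
)
answer = {"text": "www-sivuja, valokuvia ja linkkejä", "answer_start": 155}

# Python strings are indexed by code point, so non-ASCII letters such as
# "ä" and "ö" count as one position each (assuming precomposed characters).
ok = answer_span_matches(context, answer["text"], answer["answer_start"])
```

A check like this can be run over all instances before evaluation to catch offsets that were shifted by text normalization or re-encoding.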

Data fields

id: string.
title: string.
context: string.
question: string.
answers: dictionary, containing:
- text: string.
- answer_start: integer.
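The example instance in the dataset card is nested in the SQuAD JSON layout (title, paragraphs, qas), while the fields listed here are flat. The sketch below shows one way the nested layout could be flattened into records with these fields. The function name `flatten_squad_article` is hypothetical, and storing `text` and `answer_start` as parallel lists is an assumption borrowed from how SQuAD-style data is commonly represented, not something the dataset card states.

```python
from typing import Any, Dict, Iterator


def flatten_squad_article(article: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat record per question in a SQuAD-style article dict."""
    title = article["title"]
    for paragraph in article["paragraphs"]:
        context = paragraph["context"]
        for qa in paragraph["qas"]:
            yield {
                "id": qa["id"],
                "title": title,
                "context": context,
                "question": qa["question"],
                # Assumption: parallel lists, one entry per reference answer.
                "answers": {
                    "text": [a["text"] for a in qa["answers"]],
                    "answer_start": [a["answer_start"] for a in qa["answers"]],
                },
            }


# Example instance from the dataset card, with the context truncated
# to its first sentence.
article = {
    "title": "Folksonomia",
    "paragraphs": [
        {
            "qas": [
                {
                    "question": "Minkälaista sisältöä käyttäjät voivat "
                                "luokitella folksonomian avulla?",
                    "id": "6t4ufel624",
                    "answers": [
                        {
                            "text": "www-sivuja, valokuvia ja linkkejä",
                            "answer_start": 155,
                        }
                    ],
                    "is_impossible": False,
                }
            ],
            "context": "Folksonomia (engl. folksonomy) on yhteisöllisesti "
                       "tuotettu, avoin luokittelujärjestelmä, jonka avulla "
                       "internet-käyttäjät voivat luokitella sisältöä, kuten "
                       "www-sivuja, valokuvia ja linkkejä.",
        }
    ],
}

rows = list(flatten_squad_article(article))
```

The `is_impossible` flag is dropped in the flat records because, according to the dataset summary, every question in WikiQA-100-fi is answerable.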

Data splits

plain_text: 100 instances in the test split.

Citation information

@MastersThesis{3241c198b3f147faacbc6d8b64ed9419,
  author = "Kylli{\"a}inen, {Ilmari}",
  title = "Neural Factoid Question Answering and Question Generation for Finnish",
  language = "en",
  address = "Helsinki, Finland",
  school = "University of Helsinki",
  year = "2022",
  month = "jun",
  day = "15",
  url = "https://helda.helsinki.fi/handle/10138/344973"
}
