five

ilmariky/SQuAD_v2_fi

收藏
Hugging Face2022-10-25 更新2024-03-04 收录
下载链接:
https://hf-mirror.com/datasets/ilmariky/SQuAD_v2_fi
下载链接
链接失效反馈
官方服务:
资源简介:
--- annotations_creators: - crowdsourced language_creators: - crowdsourced - found language: - fi license: - gpl-3.0 multilinguality: - monolingual size_categories: - 10K<n<100K task_categories: - question-answering task_ids: - extractive-qa pretty_name: SQuAD-v2-fi tags: - question-generation train-eval-index: - config: plain_text task: question-answering task_id: extractive_question_answering splits: train_split: train eval_split: validation col_mapping: question: question context: context answers: text: text answer_start: answer_start --- # Dataset Card for "squad-v2-fi" ### Dataset Summary Machine translated and normalized Finnish version of the SQuAD-v2.0 dataset. Details about the translation and normalization processes can be found [here](https://helda.helsinki.fi/handle/10138/344973). Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable. ## Dataset Structure ### Data Instances Example data: ``` { "title": "Josefina (Ruotsin kuningatar)", "paragraphs": [ { "qas": [ { "question": "Milloin Josefina Maximiliana Eugenia Napoleona av Leuchtenberg syntyi?", "id": "2149392872931478957", "answers": [ { "answer_start": 59, "text": "14. maaliskuuta 1807" } ], "is_impossible": false } ], "context": "Josefina Maximiliana Eugenia Napoleona av Leuchtenberg (14. maaliskuuta 1807 − 7. kesäkuuta 1876, Tukholma) oli Ruotsi-Norjan kuningatar ja kuningas Oskar I:n puoliso." } ] } ``` ### Data Fields The data fields are the same among all splits. #### plain_text - `id`: a `string` feature. - `title`: a `string` feature. - `context`: a `string` feature. - `question`: a `string` feature. - `answers`: a dictionary feature containing: - `text`: a `string` feature. - `answer_start`: a `int32` feature. ### Data Splits | name |train|validation| |----------|----:|---------:| |plain_text|92383| 8737| ### Citation Information ``` @MastersThesis{3241c198b3f147faacbc6d8b64ed9419, author = "Kylli{\"a}inen, {Ilmari}", title = "Neural Factoid Question Answering and Question Generation for Finnish", language = "en", address = "Helsinki, Finland", school = "University of Helsinki", year = "2022", month = "jun", day = "15", url = "https://helda.helsinki.fi/handle/10138/344973" } ```
提供机构:
ilmariky
原始信息汇总

数据集卡片 "squad-v2-fi"

数据集概述

机器翻译和规范化后的芬兰语版本的SQuAD-v2.0数据集。

数据集结构

数据实例

示例数据: json { "title": "Josefina (Ruotsin kuningatar)", "paragraphs": [ { "qas": [ { "question": "Milloin Josefina Maximiliana Eugenia Napoleona av Leuchtenberg syntyi?", "id": "2149392872931478957", "answers": [ { "answer_start": 59, "text": "14. maaliskuuta 1807" } ], "is_impossible": false } ], "context": "Josefina Maximiliana Eugenia Napoleona av Leuchtenberg (14. maaliskuuta 1807 − 7. kesäkuuta 1876, Tukholma) oli Ruotsi-Norjan kuningatar ja kuningas Oskar I:n puoliso." } ] }

数据字段

所有分割的数据字段相同。

plain_text

  • id: 字符串特征。
  • title: 字符串特征。
  • context: 字符串特征。
  • question: 字符串特征。
  • answers: 包含以下内容的字典特征:
    • text: 字符串特征。
    • answer_start: 整数特征。

数据分割

name train validation
plain_text 92383 8737

引用信息

plaintext @MastersThesis{3241c198b3f147faacbc6d8b64ed9419, author = "Kylli{"a}inen, {Ilmari}", title = "Neural Factoid Question Answering and Question Generation for Finnish", language = "en", address = "Helsinki, Finland", school = "University of Helsinki", year = "2022", month = "jun", day = "15", url = "https://helda.helsinki.fi/handle/10138/344973" }

5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作