LLukas22/lfqa_preprocessed

Name: LLukas22/lfqa_preprocessed
Creator: LLukas22
Published: 2023-01-10 14:21:56
License: MIT

Hugging Face · Updated 2023-01-10 · Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/LLukas22/lfqa_preprocessed

Resource description:

---
license: mit
task_categories:
- question-answering
- sentence-similarity
language:
- en
size_categories:
- 100K<n<1M
---

# Dataset Card for "lfqa_preprocessed"

## Table of Contents

- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Additional Information](#additional-information)
  - [Licensing Information](#licensing-information)

## Dataset Description

- **Homepage:** [https://towardsdatascience.com/long-form-qa-beyond-eli5-an-updated-dataset-and-approach-319cb841aabb](https://towardsdatascience.com/long-form-qa-beyond-eli5-an-updated-dataset-and-approach-319cb841aabb)

### Dataset Summary

This is a simplified version of [vblagoje's](https://huggingface.co/vblagoje) *[lfqa_support_docs](https://huggingface.co/datasets/vblagoje/lfqa_support_docs)* and *[lfqa](https://huggingface.co/datasets/vblagoje/lfqa)* datasets.

I generated it to provide a more straightforward way to train Seq2Seq models on context-based long-form question answering tasks.

## Dataset Structure

### Data Instances

An example from the 'train' split looks as follows.

```json
{
    "question": "what's the difference between a forest and a wood?",
    "answer": "They're used interchangeably a lot. You'll get different answers from different resources, but the ...",
    "context": [
        "Wood is divided, according to its botanical origin, into two kinds: softwoods, ...",
        "Processing and products differs especially with regard to the distinction between softwood and hardwood ..."
    ]
}
```

### Data Fields

The data fields are the same among all splits.

- `question`: a `string` feature.
- `answer`: a `string` feature.
- `context`: a list feature containing `string` features.

### Data Splits

| name | train  | validation |
|------|-------:|-----------:|
|      | 226147 |       3020 |

## Additional Information

### Licensing Information

This dataset is distributed under the MIT license.
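As a rough illustration of how the records described in the card could feed a Seq2Seq model, the sketch below loads a split with the `datasets` library and flattens one record into a source/target text pair. The helper names (`load_split`, `to_seq2seq_pair`), the `max_passages` parameter, the `question: ... context: ...` layout, and the sample record are all assumptions made for this example; the card does not prescribe an input format.

```python
from typing import Dict, List, Tuple

DATASET_ID = "LLukas22/lfqa_preprocessed"


def load_split(split: str = "train"):
    """Download one split from the Hugging Face Hub (needs network access)."""
    # Imported lazily so the formatting helper below works without `datasets`.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)


def to_seq2seq_pair(record: Dict, max_passages: int = 3) -> Tuple[str, str]:
    """Turn one record into (source_text, target_text).

    The "question: ... context: ..." layout is an assumption of this sketch,
    not something defined by the dataset card.
    """
    passages: List[str] = record["context"][:max_passages]
    source = f"question: {record['question']} context: {' '.join(passages)}"
    return source, record["answer"]


# Made-up record with the same three fields as the dataset.
example = {
    "question": "what is a wood?",
    "answer": "A small forest.",
    "context": ["Passage one.", "Passage two."],
}
source_text, target_text = to_seq2seq_pair(example)
```

Calling `load_split("validation")` would fetch the smaller split first, which may be a convenient way to try the formatting before downloading the full training data.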

Provider:

LLukas22

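Summary of original information

Dataset overview

Dataset description

Dataset summary

This dataset is a simplified version of vblagoje's *lfqa_support_docs* and *lfqa* datasets.
It is intended to provide a more straightforward way to train Seq2Seq models on context-based long-form question answering tasks.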

Dataset structure

Data instances

Each example includes:
- question: the question, a string.
- answer: the answer, a string.
- context: the context, a list of strings.

Data fields

question: a string.
answer: a string.
context: a list of strings.
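The field types listed above can be checked mechanically before training. The following is a minimal sketch; the function name `is_valid_record` and the sample records are hypothetical and not part of the dataset or any library.

```python
from typing import Any


def is_valid_record(record: Any) -> bool:
    """Return True if `record` has the three documented fields with the documented types."""
    if not isinstance(record, dict):
        return False
    # `question` and `answer` must both be strings.
    if not isinstance(record.get("question"), str):
        return False
    if not isinstance(record.get("answer"), str):
        return False
    # `context` must be a list whose elements are all strings.
    context = record.get("context")
    return isinstance(context, list) and all(isinstance(p, str) for p in context)


good = {"question": "q", "answer": "a", "context": ["passage 1", "passage 2"]}
bad = {"question": "q", "answer": "a", "context": "not a list"}
```

Here `is_valid_record(good)` is true and `is_valid_record(bad)` is false, because `context` must be a list rather than a single string.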

Data splits

train: 226147 records.
validation: 3020 records.
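To put the two split sizes in proportion, the short sketch below computes the total record count and each split's share from the numbers listed above. The names `SPLIT_SIZES` and `split_fractions` are illustrative only.

```python
from typing import Dict

# Record counts as listed for the two splits.
SPLIT_SIZES: Dict[str, int] = {"train": 226147, "validation": 3020}


def split_fractions(sizes: Dict[str, int]) -> Dict[str, float]:
    """Return each split's share of the total number of records."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


total_records = sum(SPLIT_SIZES.values())
fractions = split_fractions(SPLIT_SIZES)

for name, share in fractions.items():
    print(f"{name}: {SPLIT_SIZES[name]} records ({share:.2%})")
```

The validation split works out to a little over 1% of the 229167 records in total.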

Additional information

Licensing information

This dataset is distributed under the MIT license.
