
LLukas22/lfqa_preprocessed

Source: Hugging Face | Updated: 2023-01-10 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/LLukas22/lfqa_preprocessed
Resource description:
---
license: mit
task_categories:
- question-answering
- sentence-similarity
language:
- en
size_categories:
- 100K<n<1M
---

# Dataset Card for "lfqa_preprocessed"

## Table of Contents

- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Additional Information](#additional-information)
  - [Licensing Information](#licensing-information)

## Dataset Description

- **Homepage:** [https://towardsdatascience.com/long-form-qa-beyond-eli5-an-updated-dataset-and-approach-319cb841aabb](https://towardsdatascience.com/long-form-qa-beyond-eli5-an-updated-dataset-and-approach-319cb841aabb)

### Dataset Summary

This is a simplified version of [vblagoje's](https://huggingface.co/vblagoje) *[lfqa_support_docs](https://huggingface.co/datasets/vblagoje/lfqa_support_docs)* and *[lfqa](https://huggingface.co/datasets/vblagoje/lfqa)* datasets.
I generated it to provide a more straightforward way to train Seq2Seq models on context-based long-form question answering tasks.

## Dataset Structure

### Data Instances

An example of 'train' looks as follows.

```json
{
    "question": "what's the difference between a forest and a wood?",
    "answer": "They're used interchangeably a lot. You'll get different answers from different resources, but the ...",
    "context": [
        "Wood is divided, according to its botanical origin, into two kinds: softwoods, ...",
        "Processing and products differs especially with regard to the distinction between softwood and hardwood ..."
    ]
}
```

### Data Fields

The data fields are the same among all splits.

- `question`: a `string` feature.
- `answer`: a `string` feature.
- `context`: a list feature containing `string` features.

### Data Splits

| name | train  | validation |
|------|-------:|-----------:|
|      | 226147 |       3020 |

## Additional Information

### Licensing Information

This dataset is distributed under the MIT license.
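
Since the card says the dataset exists to make Seq2Seq training on context-based long-form QA easier, here is a minimal sketch of how one record might be flattened into a (source, target) text pair. The function name `build_seq2seq_pair`, the `max_passages` parameter, the `question: ... context: ...` template, and the toy record are all assumptions made for illustration; the card does not prescribe an input format.

```python
from typing import List, Tuple, TypedDict


class LfqaRecord(TypedDict):
    """Record layout as described under Data Fields."""
    question: str
    answer: str
    context: List[str]


def build_seq2seq_pair(record: LfqaRecord, max_passages: int = 3) -> Tuple[str, str]:
    """Turn one record into a (source, target) pair of strings.

    The prompt template below is hypothetical, not the dataset author's.
    """
    # Keep only the first few passages so the source text stays short.
    passages = record["context"][:max_passages]
    context_text = " ".join(p.strip() for p in passages)
    source = f"question: {record['question'].strip()} context: {context_text}"
    target = record["answer"].strip()
    return source, target


# Toy record with made-up placeholder text (not taken from the dataset).
toy: LfqaRecord = {
    "question": "toy question?",
    "answer": "toy answer",
    "context": ["first passage", "second passage"],
}

source, target = build_seq2seq_pair(toy)
print(source)
print(target)
```

With the Hugging Face `datasets` library, records of this shape would typically come from `load_dataset("LLukas22/lfqa_preprocessed", split="train")`; that call needs network access and is left out of the sketch so the block runs offline.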
Provided by:
LLukas22
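Summary of original information

Dataset overview

Dataset description

Dataset summary

  • This dataset is a simplified version of vblagoje's lfqa_support_docs and lfqa datasets.
  • It is intended to provide a more straightforward way to train Seq2Seq models on context-based long-form question answering tasks.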

Dataset structure

Data instances

  • Each example contains:
    • question: the question, a string.
    • answer: the answer, a string.
    • context: the context, a list whose elements are strings.

Data fields

  • question: a string.
  • answer: a string.
  • context: a list whose elements are strings.
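
The field list above can be expressed as a small schema check. This is a sketch only: the helper name `is_valid_record` is hypothetical, and it checks types alone, on the assumption that a record arrives as a plain Python dict.

```python
from typing import Any


def is_valid_record(record: Any) -> bool:
    """Return True if `record` has the three documented fields with the documented types."""
    if not isinstance(record, dict):
        return False
    # `question` and `answer` must both be strings.
    if not isinstance(record.get("question"), str):
        return False
    if not isinstance(record.get("answer"), str):
        return False
    # `context` must be a list containing only strings.
    context = record.get("context")
    return isinstance(context, list) and all(isinstance(p, str) for p in context)


# Toy records with placeholder text (not taken from the dataset).
good = {"question": "q?", "answer": "a", "context": ["passage one", "passage two"]}
bad = {"question": "q?", "answer": "a", "context": "a single string, not a list"}

print(is_valid_record(good))
print(is_valid_record(bad))
```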

Data splits

  • train: 226147 records.
  • validation: 3020 records.
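
To put the two split sizes in proportion, the short calculation below derives the total record count and the validation share from the figures listed above. The variable names are illustrative.

```python
# Split sizes as listed above.
SPLIT_SIZES = {"train": 226147, "validation": 3020}

total = sum(SPLIT_SIZES.values())
validation_share = SPLIT_SIZES["validation"] / total

print(f"total records: {total}")  # total records: 229167
print(f"validation share: {validation_share:.2%}")
```

The validation split is a little over 1% of all records, so nearly all of the data is available for training.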

Additional information

Licensing information

  • This dataset is released under the MIT license.