emozilla/qasper-pruned-llama-gptneox-8k

Name: emozilla/qasper-pruned-llama-gptneox-8k
Creator: emozilla
Published: 2023-04-29 05:05:12
License: No description available

Hugging Face | Updated 2023-04-29 | Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/emozilla/qasper-pruned-llama-gptneox-8k


Resource description:

---
dataset_info:
  features:
  - name: id
    dtype: string
  - name: title
    dtype: string
  - name: abstract
    dtype: string
  - name: full_text
    sequence:
    - name: section_name
      dtype: string
    - name: paragraphs
      list: string
  - name: qas
    sequence:
    - name: question
      dtype: string
    - name: question_id
      dtype: string
    - name: nlp_background
      dtype: string
    - name: topic_background
      dtype: string
    - name: paper_read
      dtype: string
    - name: search_query
      dtype: string
    - name: question_writer
      dtype: string
    - name: answers
      sequence:
      - name: answer
        struct:
        - name: unanswerable
          dtype: bool
        - name: extractive_spans
          sequence: string
        - name: yes_no
          dtype: bool
        - name: free_form_answer
          dtype: string
        - name: evidence
          sequence: string
        - name: highlighted_evidence
          sequence: string
      - name: annotation_id
        dtype: string
      - name: worker_id
        dtype: string
  - name: figures_and_tables
    sequence:
    - name: caption
      dtype: string
    - name: file
      dtype: string
  splits:
  - name: train
    num_bytes: 24427288.12162162
    num_examples: 762
  - name: validation
    num_bytes: 9089856.918149466
    num_examples: 258
  - name: test
    num_bytes: 13925108.735576924
    num_examples: 374
  download_size: 20505240
  dataset_size: 47442253.77534801
---
# Dataset Card for "emozillaqasper-pruned-llama-gptneox-8k"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

Provider:

emozilla

Summary of original information

Dataset overview

Dataset features

id: string
title: string
abstract: string
full_text: sequence, containing:
- section_name: string
- paragraphs: list of strings
qas: sequence, containing:
- question: string
- question_id: string
- nlp_background: string
- topic_background: string
- paper_read: string
- search_query: string
- question_writer: string
- answers: sequence, containing:
  - answer: struct, containing:
    - unanswerable: boolean
    - extractive_spans: sequence of strings
    - yes_no: boolean
    - free_form_answer: string
    - evidence: sequence of strings
    - highlighted_evidence: sequence of strings
  - annotation_id: string
  - worker_id: string
figures_and_tables: sequence, containing:
- caption: string
- file: string
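
The nested `qas` field can be awkward to traverse, so here is a minimal Python sketch of one way to walk it. It assumes the columnar layout that the Hugging Face `datasets` library typically uses for a sequence of named fields (a dict of parallel lists rather than a list of dicts); that layout is an assumption here, not something this page states. `SAMPLE_RECORD`, `answer_to_text`, and `iter_question_answers` are hypothetical names, the sample values are invented for illustration, and the precedence among answer types is an arbitrary choice.

```python
from typing import Any, Dict, Iterator, Tuple

# Hypothetical record shaped like the schema above (values are made up).
# Only the qas fields used below are filled in; the others are omitted.
SAMPLE_RECORD: Dict[str, Any] = {
    "id": "hypothetical-0001",
    "title": "A Made-Up Paper",
    "abstract": "Placeholder abstract.",
    "full_text": {
        "section_name": ["Introduction"],
        "paragraphs": [["First paragraph.", "Second paragraph."]],
    },
    "qas": {
        "question": ["Which corpus is used?", "Is the code released?"],
        "question_id": ["q1", "q2"],
        "answers": [
            {
                "answer": [{
                    "unanswerable": False,
                    "extractive_spans": ["a made-up corpus"],
                    "yes_no": None,
                    "free_form_answer": "",
                    "evidence": ["First paragraph."],
                    "highlighted_evidence": ["First paragraph."],
                }],
                "annotation_id": ["a1"],
                "worker_id": ["w1"],
            },
            {
                "answer": [{
                    "unanswerable": False,
                    "extractive_spans": [],
                    "yes_no": True,
                    "free_form_answer": "",
                    "evidence": [],
                    "highlighted_evidence": [],
                }],
                "annotation_id": ["a2"],
                "worker_id": ["w2"],
            },
        ],
    },
    "figures_and_tables": {"caption": [], "file": []},
}


def answer_to_text(answer: Dict[str, Any]) -> str:
    """Collapse one answer struct into a string (precedence is an assumption)."""
    if answer["unanswerable"]:
        return "Unanswerable"
    if answer["extractive_spans"]:
        return ", ".join(answer["extractive_spans"])
    if answer["free_form_answer"]:
        return answer["free_form_answer"]
    if answer["yes_no"] is not None:
        return "Yes" if answer["yes_no"] else "No"
    return ""


def iter_question_answers(record: Dict[str, Any]) -> Iterator[Tuple[str, str]]:
    """Yield (question, answer_text) pairs from one record.

    Assumes record["qas"] is a dict of parallel lists and that each entry of
    record["qas"]["answers"] holds its answer structs under the "answer" key.
    """
    qas = record["qas"]
    for question, answers in zip(qas["question"], qas["answers"]):
        for answer in answers["answer"]:
            yield question, answer_to_text(answer)


if __name__ == "__main__":
    for question, text in iter_question_answers(SAMPLE_RECORD):
        print(f"{question} -> {text}")
```

If the real records turn out to be stored as lists of dicts instead, only the two indexing lines in `iter_question_answers` would need to change.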

Dataset splits

train: 762 examples, 24427288.12162162 bytes
validation: 258 examples, 9089856.918149466 bytes
test: 374 examples, 13925108.735576924 bytes

Dataset size

Download size: 20505240 bytes
Dataset size: 47442253.77534801 bytes
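
As a quick sanity check on the figures above, the following Python sketch totals the split sizes and computes each split's share of the examples. The numbers are copied from this page; the variable names are illustrative only.

```python
import math

# (num_examples, num_bytes) per split, as listed on this page.
SPLITS = {
    "train": (762, 24427288.12162162),
    "validation": (258, 9089856.918149466),
    "test": (374, 13925108.735576924),
}
DATASET_SIZE = 47442253.77534801  # reported dataset size in bytes

total_examples = sum(n for n, _ in SPLITS.values())
total_bytes = sum(b for _, b in SPLITS.values())

# Fraction of all examples that falls in each split.
shares = {name: n / total_examples for name, (n, _) in SPLITS.items()}

if __name__ == "__main__":
    print("total examples:", total_examples)
    print("bytes match reported size:", math.isclose(total_bytes, DATASET_SIZE, rel_tol=1e-9))
    for name, share in shares.items():
        print(f"{name}: {share:.1%}")
```

The per-split byte counts add up to the reported dataset size, and the three splits together hold 1394 examples.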
