gigant/robust_long_abstractive_human_annotation

Name: gigant/robust_long_abstractive_human_annotation
Creator: gigant
Published: 2024-04-25 22:03:55
License: not specified

Source: Hugging Face. Updated 2024-04-25; indexed 2024-06-12.

Download link:

https://hf-mirror.com/datasets/gigant/robust_long_abstractive_human_annotation

Resource description:

---
dataset_info:
  features:
  - name: model_type
    dtype: string
  - name: dataset
    dtype: string
  - name: factual_consistency
    dtype: float64
  - name: relevance
    dtype: float64
  - name: model_summary
    dtype: string
  - name: dataset_id
    dtype: string
  splits:
  - name: test
    num_bytes: 766812
    num_examples: 408
  download_size: 234773
  dataset_size: 766812
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
---

[Original repository](https://github.com/huankoh/How-Far-are-We-from-Robust-Long-Abstractive-Summarization/tree/main)

# How Far are We from Robust Long Abstractive Summarization? (EMNLP 2022)

[[`Paper`]](https://arxiv.org/abs/2210.16732)

#### Huan Yee Koh\*, Jiaxin Ju\*, He Zhang, Ming Liu, Shirui Pan
#### (\* denotes equal contribution)

## Human Annotation of Model-Generated Summaries

| **Data Field** | **Definition** |
| :--------: |:---- |
| **dataset** | Whether the model-generated summary is from the arXiv or GovReport dataset. |
| **dataset_id** | ID_ + document ID of the dataset. To match the IDs with the original datasets, remove the "ID_" string. The IDs are from the original datasets of [arXiv](https://github.com/armancohan/long-summarization) and [GovReport](https://gov-report-data.github.io/). |
| **model_type** | Model variant that generated the summary. 1K, 4K and 8K represent input token limits of 1,024, 4,096 and 8,192 respectively. For more information, please refer to the original paper. |
| **model_summary** | Model-generated summary. |
| **relevance** | Percentage of the reference summary’s main ideas contained in the generated summary. Higher = Better. |
| **factual_consistency** | Percentage of factually consistent sentences. Higher = Better. |

## Citation

For more information, please refer to: [How Far are We from Robust Long Abstractive Summarization?](https://arxiv.org/abs/2210.16732)

```
@inproceedings{koh-etal-2022-far,
    title = "How Far are We from Robust Long Abstractive Summarization?",
    author = "Koh, Huan Yee and Ju, Jiaxin and Zhang, He and Liu, Ming and Pan, Shirui",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.emnlp-main.172",
    pages = "2682--2698"
}
```

Provider:

gigant

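Summary of original information

Dataset overview

Dataset features

model_type: string; the model variant that generated the summary. 1K, 4K and 8K denote the model's input token limit.
dataset: string; indicates whether the model-generated summary comes from the arXiv or GovReport dataset.
factual_consistency: float; percentage of factually consistent sentences in the generated summary.
relevance: float; percentage of the reference summary's main ideas contained in the generated summary.
model_summary: string; the model-generated summary.
dataset_id: string; the dataset ID, formatted as ID_ + document ID. Remove "ID_" to match the original dataset IDs.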
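
The `dataset_id` convention can be sketched as a small helper. This is a minimal sketch and not part of the dataset's own tooling: the function name `to_original_id` and the sample ID are hypothetical, and it assumes the prefix is exactly the literal string `ID_`.

```python
PREFIX = "ID_"  # prefix described in the dataset card


def to_original_id(dataset_id: str) -> str:
    """Strip the leading "ID_" so the value matches the original arXiv/GovReport ID."""
    if not dataset_id.startswith(PREFIX):
        raise ValueError(f"expected an ID starting with {PREFIX!r}, got {dataset_id!r}")
    # Remove only the first occurrence at the start; the rest is left untouched.
    return dataset_id[len(PREFIX):]


# Hypothetical ID, for illustration only (not taken from the dataset).
print(to_original_id("ID_example-doc-001"))
```

Raising on a missing prefix, rather than returning the input unchanged, is a design choice here: it surfaces rows that do not follow the documented format instead of silently mismatching them later.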

Dataset splits

test: 408 examples, 766812 bytes in total.
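
Because the single test split is small, per-model averages of the two human scores can be computed in memory. The following is a minimal sketch using only the standard library; the function name `mean_scores` and the sample rows are made up for illustration, and the scores are averaged on whatever scale the columns actually use.

```python
from collections import defaultdict
from statistics import mean


def mean_scores(rows):
    """Average relevance and factual_consistency per (dataset, model_type) pair."""
    grouped = defaultdict(list)
    for row in rows:
        key = (row["dataset"], row["model_type"])
        grouped[key].append((row["relevance"], row["factual_consistency"]))
    return {
        key: {
            "relevance": mean(r for r, _ in scores),
            "factual_consistency": mean(f for _, f in scores),
            "n": len(scores),
        }
        for key, scores in grouped.items()
    }


# Placeholder rows: labels and values are invented, not taken from the dataset.
sample = [
    {"dataset": "arXiv", "model_type": "model-1K", "relevance": 0.5, "factual_consistency": 1.0},
    {"dataset": "arXiv", "model_type": "model-1K", "relevance": 1.0, "factual_consistency": 0.5},
    {"dataset": "GovReport", "model_type": "model-8K", "relevance": 0.25, "factual_consistency": 0.75},
]
for key, stats in mean_scores(sample).items():
    print(key, stats)
```

Any iterable of dict-like rows with these four keys should work, so the same function could be applied to rows loaded from the real split.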

Dataset size

Download size: 234773 bytes
Dataset size: 766812 bytes

Configuration

config_name: default
data_files:
- split: test
  path: data/test-*
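
Given this configuration, the split could be loaded with the Hugging Face `datasets` library. This is a sketch under assumptions: it presumes the `datasets` package is installed, network access is available, and the repository is still hosted under the same name. The helper names `load_annotations` and `missing_columns` are hypothetical.

```python
REPO_ID = "gigant/robust_long_abstractive_human_annotation"

# Column names as listed in the dataset card's features.
EXPECTED_COLUMNS = {
    "model_type",
    "dataset",
    "factual_consistency",
    "relevance",
    "model_summary",
    "dataset_id",
}


def load_annotations():
    """Download the test split of the default config (requires network access)."""
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name="default", split="test")


def missing_columns(column_names):
    """Return the expected columns that are absent from `column_names`."""
    return EXPECTED_COLUMNS - set(column_names)
```

A typical use would be to call `ds = load_annotations()` and then check that `missing_columns(ds.column_names)` is empty before relying on the fields.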
