
gigant/robust_long_abstractive_human_annotation

Hugging Face · Updated 2024-04-25 · Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/gigant/robust_long_abstractive_human_annotation
Description:
---
dataset_info:
  features:
  - name: model_type
    dtype: string
  - name: dataset
    dtype: string
  - name: factual_consistency
    dtype: float64
  - name: relevance
    dtype: float64
  - name: model_summary
    dtype: string
  - name: dataset_id
    dtype: string
  splits:
  - name: test
    num_bytes: 766812
    num_examples: 408
  download_size: 234773
  dataset_size: 766812
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
---

[Original repository](https://github.com/huankoh/How-Far-are-We-from-Robust-Long-Abstractive-Summarization/tree/main)

# How Far are We from Robust Long Abstractive Summarization? (EMNLP 2022)

[[`Paper`]](https://arxiv.org/abs/2210.16732)

#### Huan Yee Koh<sup>\*</sup>, Jiaxin Ju<sup>\*</sup>, He Zhang, Ming Liu, Shirui Pan
#### (\* denotes equal contribution)

## Human Annotation of Model-Generated Summaries

| **Data Field** | **Definition** |
| :--------: |:---- |
| **dataset** | Whether the model-generated summary is from the arXiv or GovReport dataset. |
| **dataset_id** | ID_ + document ID of the dataset. To match the IDs with the original datasets, remove the "ID_" string. The IDs are from the original datasets of [arXiv](https://github.com/armancohan/long-summarization) and [GovReport](https://gov-report-data.github.io/). |
| **model_type** | Model variant that generated the summary. 1K, 4K, and 8K represent input token limits of 1,024, 4,096, and 8,192 for the model. For more information, please refer to the original paper. |
| **model_summary** | Model-generated summary. |
| **relevance** | Percentage of the reference summary’s main ideas contained in the generated summary. Higher = Better. |
| **factual_consistency** | Percentage of factually consistent sentences. Higher = Better. |

## Citation

For more information, please refer to: [<i>How Far are We from Robust Long Abstractive Summarization?</i>](https://arxiv.org/abs/2210.16732)

```
@inproceedings{koh-etal-2022-far,
    title = "How Far are We from Robust Long Abstractive Summarization?",
    author = "Koh, Huan Yee and Ju, Jiaxin and Zhang, He and Liu, Ming and Pan, Shirui",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.emnlp-main.172",
    pages = "2682--2698"
}
```
Provider:
gigant
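Summary of Original Information

Dataset Overview

Dataset Features

  • model_type: string; the model variant that generated the summary. 1K, 4K, and 8K denote the model's input token limit.
  • dataset: string; whether the model-generated summary comes from the arXiv or GovReport dataset.
  • factual_consistency: float; percentage of sentences in the generated summary that are factually consistent.
  • relevance: float; percentage of the reference summary's main ideas that appear in the generated summary.
  • model_summary: string; the model-generated summary.
  • dataset_id: string; the dataset ID in the form ID_ + document ID. Remove "ID_" to match the IDs of the original datasets.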
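
Recovering the original document ID from `dataset_id` amounts to dropping the leading "ID_" string. The following is a minimal sketch of that step; the helper name `strip_id_prefix` and the sample values are hypothetical and are not taken from the dataset.

```python
def strip_id_prefix(dataset_id: str, prefix: str = "ID_") -> str:
    """Return the original document ID by dropping the leading prefix, if present."""
    if dataset_id.startswith(prefix):
        return dataset_id[len(prefix):]
    # Leave values without the prefix untouched rather than guessing.
    return dataset_id


# Hypothetical IDs, for illustration only.
examples = ["ID_12345", "ID_abc-678", "no-prefix"]
original_ids = [strip_id_prefix(value) for value in examples]
print(original_ids)
```

Only a prefix at the very start is removed, so an ID that happens to contain "ID_" elsewhere in the string is left intact.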

Dataset Splits

  • test: 408 examples, 766812 bytes in total.

Dataset Size

  • Download size: 234773 bytes
  • Dataset size: 766812 bytes

Configuration

  • config_name: default
  • data_files:
    • split: test
    • path: data/test-*
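
With a single `default` config and a single `test` split, the data can probably be loaded through the Hugging Face `datasets` library and summarized per model. The sketch below assumes that library is installed and that the Hub repository is reachable. The helper names `mean_scores_by_model` and `load_test_split` are illustrative and are not part of the dataset or the original repository.

```python
from collections import defaultdict
from statistics import mean


def mean_scores_by_model(rows):
    """Average relevance and factual_consistency per (dataset, model_type) pair."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["dataset"], row["model_type"])].append(row)
    return {
        key: {
            "relevance": mean(r["relevance"] for r in members),
            "factual_consistency": mean(r["factual_consistency"] for r in members),
            "n": len(members),
        }
        for key, members in groups.items()
    }


def load_test_split():
    """Load the `test` split of the `default` config from the Hugging Face Hub."""
    # Imported lazily so the aggregation helper works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(
        "gigant/robust_long_abstractive_human_annotation", "default", split="test"
    )
```

Calling `mean_scores_by_model(load_test_split())` would give one entry per dataset and model variant. The aggregation makes no assumption about the scale of the two scores and simply averages the stored float values.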