Blaise-g/SumPubmed
Source: Hugging Face; updated 2022-07-28; indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/Blaise-g/SumPubmed
Resource description:
---
language:
- en
paperswithcode_id:
pretty_name: SumPubmed
train-eval-index:
- config: Blaise-g--SumPubmed
  task: summarization
  task_id: summarization
  splits:
    eval_split: test
  col_mapping:
    text: text
    abstract: target
---
# Dataset Card for "SumPubmed"
## Original Dataset Description
- **Repository:** [https://github.com/vgupta123/sumpubmed](https://github.com/vgupta123/sumpubmed)
- **Paper:** [SumPubMed paper (PDF)](https://vgupta123.github.io/docs/121_paper.pdf)
## Description of dataset processing
Five rows were dropped from the original dataset, obtained from Kaggle, because their `shorter_abstract` entries were missing.
The `line_text` and `filename_text` columns were left untouched. In the remaining columns, the following strings were removed as unnecessary for summarization: `\n` (the original dataset contains many repeated newlines), `<dig>`, `<cit>`, `BACKGROUND`, `RESULTS` and `CONCLUSIONS`. Extra spaces were also removed, and spacing around punctuation was fixed.
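The two processing steps can be sketched in Python as follows. This is a minimal, hypothetical reconstruction, not the script actually used to build the dataset: the function names, the regular expressions, and the treatment of a "missing" `shorter_abstract` as absent, `None`, or blank are all assumptions. Removed tokens are replaced with a space before whitespace is collapsed, so that words on either side of a removed `\n` or `<cit>` do not run together.

```python
import re
from typing import Dict, List, Optional

# Hypothetical reconstruction of the processing described above. The card names
# the strings that were removed but not the exact patterns, so every regex here
# is an assumption.

UNTOUCHED = {"line_text", "filename_text"}  # columns the card says were left as-is

MARKERS = re.compile(r"<dig>|<cit>|\b(?:BACKGROUND|RESULTS|CONCLUSIONS)\b")
WHITESPACE = re.compile(r"\s+")
SPACE_BEFORE_PUNCT = re.compile(r"\s+([,.;:!?)\]])")
SPACE_AFTER_OPEN = re.compile(r"([(\[])\s+")

Row = Dict[str, Optional[str]]


def clean_text(text: str) -> str:
    """Approximate the string cleaning applied to the processed columns."""
    # Replace placeholder tokens and section labels with a space so that the
    # words on either side do not merge.
    text = MARKERS.sub(" ", text)
    # Collapse newlines (often repeated) and runs of spaces into single spaces.
    text = WHITESPACE.sub(" ", text)
    # Remove the space before closing punctuation and after opening brackets.
    text = SPACE_BEFORE_PUNCT.sub(r"\1", text)
    text = SPACE_AFTER_OPEN.sub(r"\1", text)
    return text.strip()


def process(rows: List[Row]) -> List[Row]:
    """Drop rows without a shorter_abstract, then clean every other string column."""
    out = []
    for row in rows:
        short = row.get("shorter_abstract")
        # Assumption: a "missing" entry is absent, None, or blank.
        if short is None or not short.strip():
            continue
        out.append({
            key: value if key in UNTOUCHED or value is None else clean_text(value)
            for key, value in row.items()
        })
    return out


if __name__ == "__main__":
    sample = [
        {"text": "BACKGROUND\nProtein levels were elevated <cit> .\n\nRESULTS\nThe effect was significant , as expected .",
         "line_text": "kept\nas is",
         "shorter_abstract": "Levels rose ."},
        {"text": "no summary here", "shorter_abstract": None},
    ]
    for row in process(sample):
        print(row["text"])
    # prints: Protein levels were elevated. The effect was significant, as expected.
```

The punctuation rules here only tighten spaces before closing punctuation and after opening brackets; the card does not say exactly which punctuation marks were handled.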
Provided by:
Blaise-g
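Summary of original information
Dataset overview: SumPubmed
Basic information
- Name: SumPubmed
- Language: English
- Task: text summarization
- Task ID: summarization
- Evaluation split: test
Dataset processing
- The original dataset comes from Kaggle; 5 rows missing `shorter_abstract` entries were dropped.
- Strings such as `\n`, `<dig>`, `<cit>`, `BACKGROUND`, `RESULTS` and `CONCLUSIONS` were removed, extra spaces were removed, and spacing around punctuation was fixed.
- The `line_text` and `filename_text` columns were left unchanged.
Column mapping
- text: text
- abstract: target (summary)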
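The column mapping (mirroring `col_mapping` in the metadata) says that the `text` column is the model input and the `abstract` column is the reference summary (`target`). A minimal sketch of applying such a mapping to a single record, assuming records are plain Python dicts; `apply_col_mapping` is a hypothetical helper for illustration, not part of the Hugging Face `datasets` API:

```python
from typing import Dict, Mapping

# Copied from the card's train-eval-index metadata:
# dataset column -> column name expected by the summarization task.
COL_MAPPING: Dict[str, str] = {"text": "text", "abstract": "target"}


def apply_col_mapping(record: Mapping[str, str],
                      mapping: Mapping[str, str] = COL_MAPPING) -> Dict[str, str]:
    """Keep only the mapped columns and rename them to their task roles.

    Hypothetical helper; raises KeyError if a mapped column is missing.
    """
    return {task_col: record[data_col] for data_col, task_col in mapping.items()}


if __name__ == "__main__":
    record = {"text": "Article body.", "abstract": "Reference summary.", "line_text": "Article body."}
    print(apply_col_mapping(record))
    # prints: {'text': 'Article body.', 'target': 'Reference summary.'}
```

Columns not named in the mapping (such as `line_text`) are simply dropped, since the summarization task only needs an input and a target.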