
Blaise-g/SumPubmed

Source: Hugging Face. Updated: 2022-07-28. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/Blaise-g/SumPubmed
Resource description:

---
language:
- en
paperswithcode_id:
pretty_name: SumPubmed
train-eval-index:
- config: Blaise-g--SumPubmed
  task: summarization
  task_id: summarization
  splits:
    eval_split: test
  col_mapping:
    text: text
    abstract: target
---

# Dataset Card for "SumPubmed"

## Original Dataset Description

- **Repository:** [https://github.com/vgupta123/sumpubmed](https://github.com/vgupta123/sumpubmed)
- **Paper:** [More Information Needed](https://vgupta123.github.io/docs/121_paper.pdf)

## Description of dataset processing

5 rows were dropped from the original dataset, taken from Kaggle, because they were missing their 'shorter_abstract' entries. The 'line_text' and 'filename_text' columns were left untouched. The remaining columns were processed to remove the strings '\n' (repeated many times in the original dataset), '\<dig\>', '\<cit\>', 'BACKGROUND', 'RESULTS' and 'CONCLUSIONS', which were deemed unnecessary for summarization. Additionally, extra spaces were removed and spacing around punctuation was fixed.
Provided by:
Blaise-g
Summary of original information

Dataset overview: SumPubmed

Basic dataset information

  • Name: SumPubmed
  • Language: English
  • Task: text summarization
  • Task ID: summarization
  • Evaluation split: test

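Dataset processing

  • The original dataset comes from Kaggle; 5 rows missing their shorter_abstract entries were dropped.
  • Processing removed the strings '\n', <dig>, <cit>, BACKGROUND, RESULTS and CONCLUSIONS, stripped extra spaces, and fixed the spacing around punctuation.
  • The line_text and filename_text columns were left unchanged.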
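
The processing steps listed above could be approximated as in the sketch below. This is not the dataset author's actual script, which the card does not include. The names `clean_text`, `process_row` and `drop_incomplete` are hypothetical, and the exact punctuation rule (removing whitespace before `. , ; : ! ?`) is an assumption about what "fixing spacing around punctuation" meant.

```python
import re

# Marker strings the card says were removed from the processed columns.
MARKERS = ["<dig>", "<cit>", "BACKGROUND", "RESULTS", "CONCLUSIONS"]

# Columns the card says were left untouched.
UNTOUCHED = {"line_text", "filename_text"}


def clean_text(raw: str) -> str:
    """Approximate the cleanup described in the card (assumed rules)."""
    text = raw.replace("\n", " ")
    for marker in MARKERS:
        text = text.replace(marker, " ")
    # Collapse runs of whitespace into a single space.
    text = re.sub(r"\s+", " ", text).strip()
    # Assumption: "fixing spacing" means dropping spaces before punctuation.
    text = re.sub(r"\s+([.,;:!?])", r"\1", text)
    return text


def process_row(row: dict) -> dict:
    """Clean every string column except the ones left untouched."""
    return {
        key: value if key in UNTOUCHED else clean_text(value)
        for key, value in row.items()
    }


def drop_incomplete(rows: list) -> list:
    """Drop rows whose 'shorter_abstract' entry is missing or empty."""
    return [row for row in rows if row.get("shorter_abstract")]
```

For example, `clean_text("BACKGROUND\n\nthe gene <dig> was expressed <cit> .")` yields `"the gene was expressed."` under these assumed rules.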

Dataset field mapping

  • text: text
  • abstract: target (summary)
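
A minimal sketch of how this field mapping might be applied, assuming the mapping reads as "dataset column name → name the summarization evaluator expects". The helper `apply_col_mapping` is hypothetical and not part of any library.

```python
# Mapping taken from the card's col_mapping metadata:
# dataset column -> field name expected by the evaluator (assumed reading).
COL_MAPPING = {"text": "text", "abstract": "target"}


def apply_col_mapping(row: dict, mapping: dict = COL_MAPPING) -> dict:
    """Rename mapped columns and drop any column not listed in the mapping."""
    return {new_name: row[old_name] for old_name, new_name in mapping.items()}


example = {
    "text": "full article body",
    "abstract": "reference summary",
    "filename_text": "unused here",
}
mapped = apply_col_mapping(example)
# mapped keeps only the source text and the summary, now keyed as "target".
```

Dropping unmapped columns is a design choice for this sketch; an evaluation pipeline could just as well carry them through.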