
InMedData/Cardio_v1

Hugging Face · Updated 2024-03-12 · Indexed 2024-06-11
Download link:
https://hf-mirror.com/datasets/InMedData/Cardio_v1
Resource description:
---
license: cc-by-nc-sa-4.0
language:
- en
tags:
- biology
- medical
size_categories:
- 100K<n<1M
---

# Dataset Card

This dataset consists of abstracts of heart-related papers collected from PubMed. It can be used to pre-train a language model specialized in cardiology. The abstracts were retrieved through the PubMed API, using the names of cardiology journals and several glossaries of cardiology terms as search keywords.

# Dataset

## Data Sources

- **[PubMed](https://pubmed.ncbi.nlm.nih.gov/)**: PubMed is a database of research-paper abstracts in the life sciences, biomedicine, health psychology, and health and welfare. From it, we collected the abstracts of heart-related papers.

## Keyword Sources

- **[Scimago Journal & Country Rank](https://www.scimagojr.com/journalrank.php?category=2705#google_vignette)**: We used SJR's list of cardiology journals as keywords for data collection.
- **[National Institutes of Health](https://www.nia.nih.gov/health/heart-health/heart-health-glossary)**: We used the glossary provided by NIH as keywords for data collection.
- **[The Texas Heart Institute](https://www.texasheart.org/heart-health/heart-information-center/topics/a-z)**: We used the glossary provided by the Texas Heart Institute as keywords for data collection.
- **[Aiken Physicians Alliance](https://aikenphysicians.com/services/cardiology/cardiology-glossary-of-terms)**: We used the glossary provided by Aiken Physicians Alliance as keywords for data collection.

## Dataset Fields

| Field | Data Type | Description |
| --- | --- | --- |
| title | string | The title of the paper. |
| abst | string | The abstract of the paper. |

## Dataset Structure

```python
DatasetDict({
    train: Dataset({
        features: ['title', 'abst'],
        num_rows: 2600900
    })
})
```

## Use

```python
from datasets import load_dataset

dataset = load_dataset("InMedData/Cardio_v1")
```

### Dataset Contact

khs1220@inmed-data.com
Provided by:
InMedData
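Summary of the Original Information

Dataset Overview

Dataset Description

  • Content: Abstracts of heart-related papers collected from PubMed, suitable for pre-training a language model specialized in cardiology.
  • Collection method: Collected through the PubMed API, using cardiology journal names and glossaries of cardiology terms as keywords.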

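Data Sources

  • PubMed: A database of research-paper abstracts in the life sciences, biomedicine, health psychology, and health and welfare. This dataset collects the heart-related abstracts from it.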

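Keyword Sources

  • Scimago Journal & Country Rank: The list of cardiology journals provided by SJR was used as keywords for data collection.
  • National Institutes of Health: The heart-health glossary provided by NIH was used as keywords for data collection.
  • The Texas Heart Institute: The glossary from the Texas Heart Institute's Heart Information Center was used as keywords for data collection.
  • Aiken Physicians Alliance: The cardiology glossary of terms provided by Aiken Physicians Alliance was used as keywords for data collection.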
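The card does not show its collection code. As a minimal sketch only, assuming "PubMed API" refers to NCBI's E-utilities `esearch` endpoint and that journal names and glossary terms were OR-ed into one boolean query, the keyword step might look like this. `build_query`, `esearch_url`, and the example journal and term values are hypothetical illustrations, not the authors' pipeline. The sketch only builds the request URL; retrieving the abstracts themselves (for example with E-utilities `efetch`) is not shown.

```python
from urllib.parse import urlencode

# Assumption: the collection used NCBI E-utilities esearch; the card only says "PubMed API".
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(journals, terms):
    """Combine journal names and glossary terms into one PubMed boolean query.

    Journal names are restricted to the [Journal] field tag; glossary terms are
    matched against titles and abstracts via [Title/Abstract]. Joining
    everything with OR is an illustrative assumption.
    """
    journal_clauses = [f'"{j}"[Journal]' for j in journals]
    term_clauses = [f'"{t}"[Title/Abstract]' for t in terms]
    return " OR ".join(journal_clauses + term_clauses)


def esearch_url(query, retmax=100, retstart=0):
    """Return an esearch URL that would list matching PMIDs as JSON (no request is sent)."""
    params = {
        "db": "pubmed",
        "term": query,
        "retmode": "json",
        "retmax": retmax,
        "retstart": retstart,  # offset for paging through large result sets
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical example keywords, not the dataset's actual keyword lists.
    q = build_query(["Circulation"], ["atrial fibrillation"])
    print(q)
    print(esearch_url(q, retmax=10))
```

Paging with `retstart` is included because a keyword list of this size would match far more records than a single request returns.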

Dataset Fields

| Field | Data Type | Description |
| --- | --- | --- |
| title | string | Paper title |
| abst | string | Paper abstract |

Dataset Structure

  • Training set (train): 2,600,900 records, with the features title and abst.
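Each record carries only a title and an abstract, so pre-training text has to be assembled from the two fields. The card describes no preprocessing; the following is a minimal sketch assuming one document per paper (title, blank line, abstract). Skipping empty abstracts and dropping exact duplicates (a paper could match several keywords) are illustrative assumptions, and `to_pretraining_text` and `build_corpus` are hypothetical helpers.

```python
import hashlib
from typing import Iterable, Iterator, Optional


def to_pretraining_text(record: dict, sep: str = "\n\n") -> Optional[str]:
    """Join a record's title and abstract into one training document.

    Returns None when the abstract is missing or blank, so the row can be skipped.
    """
    title = (record.get("title") or "").strip()
    abst = (record.get("abst") or "").strip()
    if not abst:
        return None
    # Assumption: title first, then the abstract, separated by a blank line.
    return f"{title}{sep}{abst}" if title else abst


def build_corpus(records: Iterable[dict]) -> Iterator[str]:
    """Yield formatted documents, dropping empty abstracts and exact duplicates."""
    seen = set()  # stores digests rather than full texts to keep memory modest
    for record in records:
        text = to_pretraining_text(record)
        if text is None:
            continue
        key = hashlib.sha256(text.encode("utf-8")).digest()
        if key in seen:
            continue
        seen.add(key)
        yield text


if __name__ == "__main__":
    rows = [
        {"title": "Title A", "abst": "Abstract A."},
        {"title": "Title B", "abst": "  "},
        {"title": "Title A", "abst": "Abstract A."},
    ]
    for doc in build_corpus(rows):
        print(repr(doc))
```

Iterating a split loaded with the `datasets` library yields plain dicts, so `build_corpus(dataset["train"])` should work unchanged; for all 2.6 million rows, `Dataset.map` or `Dataset.filter` would likely scale better than a pure-Python loop.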

Usage

```python
from datasets import load_dataset

dataset = load_dataset("InMedData/Cardio_v1")
```

Contact

  • Email: khs1220@inmed-data.com