InMedData/Cardio_v1
Hugging Face | Updated 2024-03-12 | Indexed 2024-06-11
Download link:
https://hf-mirror.com/datasets/InMedData/Cardio_v1
Resource description:
---
license: cc-by-nc-sa-4.0
language:
- en
tags:
- biology
- medical
size_categories:
- 100K<n<1M
---
# Dataset Card
This dataset consists of abstracts from heart-related papers collected from PubMed. It can be used for pre-training a language model specialized in cardiology.
The dataset was collected through the PubMed API, based on the names of heart-related journals and a glossary of cardiology terms.
# Dataset
## Data Sources
- **[PubMed](https://pubmed.ncbi.nlm.nih.gov/)**: PubMed is a database of research paper abstracts covering the life sciences, biomedicine, health psychology, and health and welfare. From it, we collected the abstracts of heart-related papers.
## Keywords Sources
- **[Scimago Journal & Country Rank](https://www.scimagojr.com/journalrank.php?category=2705#google_vignette)** : We used a list of cardiology-related journals provided by SJR as keywords for data collection.
- **[National Institutes of Health](https://www.nia.nih.gov/health/heart-health/heart-health-glossary)** : We used a glossary provided by NIH as keywords for data collection.
- **[The Texas Heart Institute](https://www.texasheart.org/heart-health/heart-information-center/topics/a-z)** : We used a glossary provided by Texas Heart Institute as keywords for data collection.
- **[Aiken Physicians Alliance](https://aikenphysicians.com/services/cardiology/cardiology-glossary-of-terms)** : We used a glossary provided by Aiken Physicians Alliance as keywords for data collection.
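
The card does not publish the exact queries used. As a minimal sketch of how journal names and glossary terms could be turned into a PubMed search, the snippet below builds a request URL for the NCBI E-utilities `esearch` endpoint. The helper names (`build_term`, `build_esearch_url`), the `[Journal]` and `[Title/Abstract]` field choices, and the sample keywords are assumptions made for illustration, not the authors' pipeline.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(journals, glossary_terms):
    """Join journal names and glossary terms into one OR-ed PubMed query.

    Hypothetical helper: the field tags used here are an assumption.
    """
    journal_clauses = [f'"{name}"[Journal]' for name in journals]
    term_clauses = [f'"{term}"[Title/Abstract]' for term in glossary_terms]
    return " OR ".join(journal_clauses + term_clauses)


def build_esearch_url(term, retmax=100):
    """Return an esearch URL that asks for up to `retmax` PubMed IDs as JSON."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Illustrative keywords only; the real keyword lists come from the sources above.
term = build_term(["Circulation"], ["atrial fibrillation"])
url = build_esearch_url(term)
print(term)
print(url)
```

The snippet only constructs the URL. Fetching IDs and then abstracts (for example with `efetch`) would be a separate step and is subject to NCBI's usage limits.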
## Dataset Fields
| Field | Data Type | Description |
| --- | --- | --- |
| title | string | The title of the paper. |
| abst | string | The abstract of the paper. |
## Dataset Structure
```python
DatasetDict({
    train: Dataset({
        features: ['title', 'abst'],
        num_rows: 2600900
    })
})
```
## Use
```python
from datasets import load_dataset
dataset = load_dataset("InMedData/Cardio_v1")
```
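
Since the card suggests the data for pre-training, one possible preparation step is to merge `title` and `abst` into a single text field. The following is a sketch under that assumption: `to_pretraining_text`, the `text` column name, and the blank-line separator are hypothetical choices, not part of the dataset.

```python
def to_pretraining_text(example, sep="\n\n"):
    """Merge the title and abstract of one record into a single `text` field.

    Hypothetical helper: missing or empty fields are skipped.
    """
    title = (example.get("title") or "").strip()
    abst = (example.get("abst") or "").strip()
    return {"text": sep.join(part for part in (title, abst) if part)}


# Quick check on a hand-written record (not taken from the dataset).
sample = {"title": "Example title", "abst": "Example abstract."}
print(to_pretraining_text(sample)["text"])

# With the dataset loaded as shown above, the same function can be applied
# to every row, which adds a `text` column:
# dataset = dataset.map(to_pretraining_text)
```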
### Dataset Contact
khs1220@inmed-data.com
Provider:
InMedData
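Summary of original information
Dataset overview
Dataset description
- Content: Abstracts of heart-related papers collected from PubMed, suitable for pre-training a language model specialized in cardiology.
- Collection method: Collected through the PubMed API, using the names of cardiology-related journals and cardiology glossary terms as keywords.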
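Data sources
- PubMed: A database of research paper abstracts covering the life sciences, biomedicine, health psychology, and health and welfare. This dataset contains the abstracts of heart-related papers.
Keyword sources
- Scimago Journal & Country Rank: The list of cardiology-related journals provided by SJR was used as keywords for data collection.
- National Institutes of Health: The heart health glossary provided by NIH was used as keywords for data collection.
- The Texas Heart Institute: The Heart Information Center glossary provided by the Texas Heart Institute was used as keywords for data collection.
- Aiken Physicians Alliance: The cardiology glossary of terms provided by Aiken Physicians Alliance was used as keywords for data collection.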
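Dataset fields
| Field | Data Type | Description |
|---|---|---|
| title | string | The title of the paper |
| abst | string | The abstract of the paper |
Dataset structure
- Training set: 2600900 records with the features title and abst.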
Usage
- Load the dataset with `load_dataset("InMedData/Cardio_v1")` from the `datasets` library.
Contact
- Email: khs1220@inmed-data.com



