Pile-NIH_ExPorter
Source: ModelScope community. Updated 2025-03-21; indexed 2024-08-31.
Download link:
https://modelscope.cn/datasets/OmniData/Pile-NIH_ExPorter
Resource overview:
displayName: Pile-NIH_ExPorter
license:
- MIT
taskTypes:
- Natural Language Generation
- Language Modelling
mediaTypes:
- Text
labelTypes:
- English Corpus
tags: []
publisher:
- EleutherAI
publishDate: '2023-07-18'
publishUrl: https://pile.eleuther.ai/
paperUrl: ''
---
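# Dataset Introduction
## Overview
The Pile-NIH ExPorter dataset is a large-scale collection of medical research text built from the NIH ExPorter database. It gathers abstracts and metadata of medical research projects funded through the NIH (the U.S. National Institutes of Health), spanning fields from basic science to clinical research.
The dataset gives researchers and developers a rich source of medical research information. It can be used for medical text analysis, research trend analysis, scientific discovery, and similar applications that support research and innovation in medicine.
## Data Content
### Data Description
The Pile-NIH ExPorter dataset contains about 1.7 GB of data.
### Data Example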
```
{
"id": "134603122",
"source_id": "",
"doc_id": "419370",
"data_type": "text",
"data_source": "pile",
"data_url": "enwiki-c4-pile-ccnews",
"content": "Receptors for transplantation antigens may be visualized directly on unsensitized cells with the use of an anti-idiotypic antisera. It should be possible to eliminate these cells specifically by treatment of the unsensitized population with sera directed at the specific binding site. Sera have been raised to cells cytotoxic for transplantation alloantigen in strain combinations selected so that only the variable portion of the antigen specific receptor could be recognized. Bona fide anti-idiotypic sera have not been raised. Refinements in the approach under way include 1) use of cytotoxic populations as immunogen with a demonstrably high proportion of receptor-bearing cells, 2) use of parent strains differing only by a point mutation to restrict the heterogeneity of the immune response of one against the other, and 3) use of continuously cultured cytotoxic cell lines to increase the homogeneity of the receptors used as immunogen.\n",
"remark": {
"pile_set_name": "NIH ExPorter"
},
"sub_path": "nih-exporter/test"
}
```
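The record layout above can be read with the standard library alone. The following is a minimal sketch, not the publisher's loader: it assumes the files store one JSON object per line (JSON Lines), which the card does not state, and the helper name `iter_nih_abstracts` is hypothetical. The first sample line is a shortened copy of the example record; the second is an invented record from another subset, included only to show the filter.

```python
import json
from typing import Iterable, Iterator


def iter_nih_abstracts(lines: Iterable[str]) -> Iterator[dict]:
    """Yield id/text pairs for records whose pile_set_name is "NIH ExPorter".

    Assumption: each input line holds one JSON object shaped like the
    example record in this card.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("remark", {}).get("pile_set_name") != "NIH ExPorter":
            continue  # keep only the NIH ExPorter subset
        yield {"id": record["id"], "text": record["content"].strip()}


# Shortened copy of the example record shown above.
nih_line = json.dumps({
    "id": "134603122",
    "doc_id": "419370",
    "data_type": "text",
    "data_source": "pile",
    "content": "Receptors for transplantation antigens may be visualized "
               "directly on unsensitized cells with the use of an "
               "anti-idiotypic antisera.\n",
    "remark": {"pile_set_name": "NIH ExPorter"},
    "sub_path": "nih-exporter/test",
})

# Hypothetical record from a different subset, used only to exercise the filter.
other_line = json.dumps({
    "id": "0",
    "content": "unrelated text",
    "remark": {"pile_set_name": "Other"},
})

abstracts = list(iter_nih_abstracts([nih_line, "", other_line]))
for item in abstracts:
    print(item["id"], item["text"][:60])
```

To read a real file, pass an open file handle in place of the list, for example `iter_nih_abstracts(open(path, encoding="utf-8"))`, where `path` points to wherever the downloaded data lives.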
## Citation
```
@misc{conghui2022opendatalab,
title={OpenDataLab: Empowering General Artificial Intelligence with Open Datasets},
author={Conghui He and Wei Li and Zhenjiang Jin and Bin Wang and Chao Xu and Dahua Lin},
journal={https://opendatalab.com/},
year={2022}
}
```
## Download dataset
:modelscope-code[]{type="git"}
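Outside the ModelScope page, the directive above does not render a command. As a rough sketch of what a git-based download could look like, the helper below builds a `git clone` command from the dataset ID. The `.git` URL pattern is an assumption derived from the dataset page URL listed in this card, and `build_clone_command` is a hypothetical name. The code only constructs the command; it does not run it.

```python
from typing import List


def build_clone_command(dataset_id: str,
                        host: str = "https://modelscope.cn") -> List[str]:
    """Build a git clone command for a dataset ID of the form namespace/name.

    Assumption: the repository URL is the dataset page URL plus ".git".
    """
    namespace, name = dataset_id.split("/")
    url = f"{host}/datasets/{namespace}/{name}.git"
    return ["git", "clone", url]


command = build_clone_command("OmniData/Pile-NIH_ExPorter")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)`. Large data files in such repositories are often stored with Git LFS, so it may need to be installed first.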
Provider: maas
Created: 2024-07-09



