slaf-project/Parse-10M
Hugging Face · Updated 2026-01-30 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/slaf-project/Parse-10M
Resource description:
---
viewer: true
license: cc-by-nc-sa-4.0
configs:
- config_name: cells
data_dir: "cells.lance"
- config_name: expression
data_dir: "expression.lance"
- config_name: genes
data_dir: "genes.lance"
language:
- en
tags:
- biology
- genomics
- PBMC
- RNA
- single-cell
- lance
- slaf
pretty_name: Parse-10M
---
# Parse 10M PBMC Dataset (SLAF Format)
## Attribution
**This is a re-release of data originally generated by [Parse Biosciences](https://www.parsebiosciences.com/).**
- **Original Dataset**: Parse 10M PBMC 12donor 90cytokines dataset
- **Original Format**: H5AD file
- **Original Source**: https://www.parsebiosciences.com/datasets/10-million-human-pbmcs-in-a-single-experiment/
- **This Release**: Same data in SLAF (Sparse Lazy Array Format) for SLAF tool compatibility
- **License**: CC-BY-NC-4.0 (Creative Commons Attribution-NonCommercial 4.0)
For detailed information about the dataset and methodology, please refer to the original source.
## About This Release
This release provides the Parse 10M PBMC dataset in SLAF format, enabling direct use with SLAF tools and libraries. The data are identical to the original release; only the storage format differs.
## Dataset Description
Parse 10M PBMC is a single-cell RNA sequencing dataset containing 10 million peripheral blood mononuclear cells (PBMCs) from 12 donors across 90 cytokine conditions. This release provides the same data in SLAF format for compatibility with SLAF tools.
## Usage
This dataset is stored in [SLAF (Sparse Lazy Array Format)](https://slaf-project.github.io/slaf/), which is built on the [Lance](https://lance.org/) table format.
You can access it with Hugging Face Datasets (for Parquet access), the `slaf` library (for SLAF format), or the `pylance` library (for direct Lance access).
### Using SLAF (Recommended for SLAF Format)
```bash
pip install slafdb
```
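```python
from slaf import SLAFArray

# Open the dataset directly from the Hugging Face Hub
hf_path = "hf://datasets/slaf-project/Parse-10M"
slaf_array = SLAFArray(hf_path)

# Run a SQL query against the cells table and return the first 5 rows
slaf_array.query("SELECT * FROM cells LIMIT 5")
```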
### Using Lance Directly
```bash
pip install pylance
```
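```python
import lance

# Open the cells table directly from the Hugging Face Hub as a Lance dataset
hf_path = "hf://datasets/slaf-project/Parse-10M"
ds = lance.dataset(f"{hf_path}/cells.lance")

# Draw a random sample of 10 rows (returned as a pyarrow.Table)
ds.sample(10)
```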
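The card's YAML metadata maps each config (`cells`, `expression`, `genes`) to its own `.lance` directory, so the other two tables should be reachable the same way as `cells.lance`. Below is a minimal, stdlib-only sketch that builds those URIs. The `table_uri` helper and the `CONFIG_DIRS` mapping are hypothetical conveniences, not part of `slaf` or `pylance`, and the sketch assumes the directory layout listed in the metadata matches the repository.

```python
# Hypothetical helper (not part of slaf or pylance): build the hf:// URI of
# each Lance table in this repository. Config and directory names are copied
# from the dataset card's YAML metadata.
HF_PATH = "hf://datasets/slaf-project/Parse-10M"

CONFIG_DIRS = {
    "cells": "cells.lance",
    "expression": "expression.lance",
    "genes": "genes.lance",
}


def table_uri(config_name: str, base: str = HF_PATH) -> str:
    """Return the URI of the Lance table that backs a given config."""
    try:
        data_dir = CONFIG_DIRS[config_name]
    except KeyError:
        raise ValueError(
            f"unknown config {config_name!r}; expected one of {sorted(CONFIG_DIRS)}"
        ) from None
    # Strip a trailing slash so the join never produces "//"
    return f"{base.rstrip('/')}/{data_dir}"


if __name__ == "__main__":
    for name in CONFIG_DIRS:
        print(name, "->", table_uri(name))
```

Each returned URI could then be passed to `lance.dataset(...)` exactly as in the `cells.lance` example, assuming `pylance` resolves `hf://` paths for every table the same way it does for `cells`.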
## Dataset Structure
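The dataset contains single-cell RNA sequencing data covering 10 million peripheral blood mononuclear cells (PBMCs) from 12 donors across 90 cytokine conditions. It is split into three Lance tables, each exposed as a separate config: `cells` (`cells.lance`), `expression` (`expression.lance`), and `genes` (`genes.lance`).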
For more detailed information about the dataset structure and metadata, please refer to the original source documentation.
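SLAF is described as a sparse array format stored in Lance tables. As a rough mental model only, the following stdlib-only sketch assumes the expression matrix is kept as one record per nonzero entry of the cells × genes matrix (a coordinate, or COO, layout). The field names `cell_id`, `gene_id`, and `value` and the helper functions are hypothetical and may not match the real `expression` schema; inspect the actual columns with the SLAF or Lance examples before relying on them.

```python
# Illustrative sketch (hypothetical schema): a sparse cells x genes matrix
# stored as records, one row per nonzero entry (COO layout).
from collections import defaultdict


def to_records(dense):
    """Flatten a dense matrix into (cell_id, gene_id, value) records, skipping zeros."""
    return [
        {"cell_id": c, "gene_id": g, "value": v}
        for c, row in enumerate(dense)
        for g, v in enumerate(row)
        if v != 0
    ]


def cell_vector(records, cell_id, n_genes):
    """Rebuild one cell's dense expression vector from its records."""
    vec = [0] * n_genes
    for r in records:
        if r["cell_id"] == cell_id:
            vec[r["gene_id"]] = r["value"]
    return vec


def gene_totals(records):
    """Sum values per gene, the record-level analogue of a SQL GROUP BY."""
    totals = defaultdict(int)
    for r in records:
        totals[r["gene_id"]] += r["value"]
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    dense = [[0, 3, 0, 0], [1, 0, 0, 2], [0, 0, 0, 0]]
    records = to_records(dense)
    print(len(records), "nonzero records")      # 3 nonzero records
    print(cell_vector(records, 1, n_genes=4))   # [1, 0, 0, 2]
    print(gene_totals(records))                 # {0: 1, 1: 3, 3: 2}
```

Storing only nonzero entries keeps the table compact for sparse single-cell data, and operations such as per-gene totals become ordinary filters and aggregations that a query engine can run without materializing the full matrix.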
## Citation
If you use this dataset, please cite the original Parse Biosciences dataset and this re-release:
```bibtex
@dataset{parse_10m_pbmc_2024,
  title={Parse 10M PBMC 12donor 90cytokines Dataset},
  author={{Parse Biosciences}},
  year={2024},
  url={https://www.parsebiosciences.com/datasets/10-million-human-pbmcs-in-a-single-experiment/}
}
```
Provider:
slaf-project



