nkuznet/CNV-Finder
Source: Hugging Face. Updated 2026-03-13; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/nkuznet/CNV-Finder
Resource description:
---
license: apache-2.0
---
# CNV-Finder Dataset
Example data for the [CNV-Finder](https://github.com/nvk23/CNV-Finder) pipeline — an LSTM-based tool for large-scale identification of copy number variants (CNVs) from SNP array data.
## Dataset Contents
### SNP Metrics (`snp_metrics/`)
Per-sample signal intensity files containing **Log R Ratio (LRR)** and **B Allele Frequency (BAF)** values extracted from genotyping arrays. Files are stored in Hive-partitioned Parquet format, organized by barcode, sample, and chromosome.
```
snp_metrics/
├── {barcode}/
│   └── {barcode}_{sample}/
│       ├── chromosome=1/
│       │   └── *.parquet
│       ├── chromosome=2/
│       ├── ...
│       ├── chromosome=22/
│       ├── chromosome=X/
│       ├── chromosome=Y/
│       └── chromosome=M/
```
The dataset includes 20 samples across 2 cohorts (TEST1: barcode 2231, TEST2: barcode 4784), with 25 chromosome partitions per sample (1-22, X, Y, and M).
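Because the barcode, sample, and chromosome are encoded in the directory names rather than in the file name, it can help to recover them from a path. The following is a minimal sketch, assuming the layout shown above; the function name `parse_snp_metrics_path`, the sample label `R01C01`, and the file name `part.0.parquet` are hypothetical and used only for illustration.

```python
from pathlib import PurePosixPath


def parse_snp_metrics_path(path: str) -> dict:
    """Split a snp_metrics Parquet path into barcode, sample, and chromosome."""
    parts = PurePosixPath(path).parts
    # Locate the snp_metrics/ root so absolute and relative paths both work.
    root = parts.index("snp_metrics")
    barcode, sample_dir, partition = parts[root + 1 : root + 4]

    # The sample directory is named {barcode}_{sample}; strip the barcode prefix.
    prefix = barcode + "_"
    if not sample_dir.startswith(prefix):
        raise ValueError(f"unexpected sample directory: {sample_dir}")

    # Hive-style partition directories look like key=value.
    key, _, chromosome = partition.partition("=")
    if key != "chromosome" or not chromosome:
        raise ValueError(f"unexpected partition directory: {partition}")

    return {
        "barcode": barcode,
        "sample": sample_dir[len(prefix):],
        "chromosome": chromosome,
    }


# Hypothetical path: the sample label and file name are made up for illustration.
info = parse_snp_metrics_path(
    "example_data/snp_metrics/2231/2231_R01C01/chromosome=X/part.0.parquet"
)
print(info)
```

The chromosome is returned as a string because the partition labels mix numbers with `X`, `Y`, and `M`.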
### NBA Metadata (`NBA_metadata/`)
Reference metadata files containing the per-SNP values that repeat across samples (SNP ID, position, GenTrain score), partitioned by chromosome. They are used during the data preparation step of the pipeline.
```
NBA_metadata/
├── CHROM=1/
│   └── part.0.parquet
├── CHROM=2/
├── ...
└── CHROM=25/
```
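The metadata partitions use numeric `CHROM` values (1 through 25), whereas `snp_metrics/` uses labels such as `X`, `Y`, and `M`; this card does not state how the two map to each other. To see which partitions a local copy actually contains, a small standard-library sketch along these lines may be enough. The helper name `list_partitions` is hypothetical, and the `ref_files/` location is taken from the download example in the Usage section.

```python
from pathlib import Path


def list_partitions(root, key: str = "CHROM") -> dict:
    """Map each `key=value` partition under root to its sorted Parquet file names."""
    partitions = {}
    for child in Path(root).iterdir():
        name, sep, value = child.name.partition("=")
        if child.is_dir() and sep and name == key:
            partitions[value] = sorted(p.name for p in child.glob("*.parquet"))

    # Sort numerically where possible so CHROM=2 comes before CHROM=10.
    def order(item):
        value = item[0]
        return (0, int(value), "") if value.isdigit() else (1, 0, value)

    return dict(sorted(partitions.items(), key=order))


if __name__ == "__main__":
    # Assumes the metadata was downloaded to ref_files/ (see Usage).
    root = Path("ref_files/NBA_metadata")
    if root.is_dir():
        for value, files in list_partitions(root).items():
            print(f"CHROM={value}: {len(files)} file(s)")
```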
## Usage
### Quick Download (Python)
```python
from huggingface_hub import snapshot_download

# Download SNP metrics into example_data/
snapshot_download(
    repo_id="nkuznet/CNV-Finder",
    repo_type="dataset",
    allow_patterns="snp_metrics/**",
    local_dir="example_data",
)

# Download NBA metadata into ref_files/
snapshot_download(
    repo_id="nkuznet/CNV-Finder",
    repo_type="dataset",
    allow_patterns="NBA_metadata/**",
    local_dir="ref_files",
)
```
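After the download, a quick sanity check can confirm that every sample directory has all 25 chromosome partitions. This is a rough sketch, not part of the CNV-Finder pipeline: the helper `missing_partitions` is hypothetical, and it assumes that the files land under `example_data/snp_metrics/` and that each partition directory holds its Parquet files directly.

```python
from pathlib import Path

# Chromosome labels shown in the snp_metrics layout: 1-22 plus X, Y, and M.
EXPECTED = [str(n) for n in range(1, 23)] + ["X", "Y", "M"]


def missing_partitions(snp_root) -> dict:
    """Return {sample_dir: [missing chromosome labels]} for incomplete samples."""
    report = {}
    # Sample directories sit two levels down: {barcode}/{barcode}_{sample}/
    for sample_dir in sorted(Path(snp_root).glob("*/*")):
        if not sample_dir.is_dir():
            continue
        # A partition counts as present only if it holds at least one Parquet file.
        present = {
            p.name.split("=", 1)[1]
            for p in sample_dir.glob("chromosome=*")
            if any(p.glob("*.parquet"))
        }
        missing = [c for c in EXPECTED if c not in present]
        if missing:
            report[sample_dir.name] = missing
    return report


root = Path("example_data/snp_metrics")
if root.is_dir():
    print(missing_partitions(root) or "all samples have 25 chromosome partitions")
```

An empty report means no sample is missing a partition; it does not validate the contents of the Parquet files themselves.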
See `run_pipeline.ipynb` in the [main repository](https://github.com/nvk23/CNV-Finder) for a full walkthrough.
## Related
- **GitHub Repository:** [nvk23/CNV-Finder](https://github.com/nvk23/CNV-Finder)
- **SNP Metrics Generation:** [nvk23/SNP_metrics](https://github.com/nvk23/SNP_metrics)
Provider:
nkuznet



