
nkuznet/CNV-Finder

Hugging Face | Updated 2026-03-13 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/nkuznet/CNV-Finder
Description:
---
license: apache-2.0
---

# CNV-Finder Dataset

Example data for the [CNV-Finder](https://github.com/nvk23/CNV-Finder) pipeline, an LSTM-based tool for large-scale identification of copy number variants (CNVs) from SNP array data.

## Dataset Contents

### SNP Metrics (`snp_metrics/`)

Per-sample signal intensity files containing **Log R Ratio (LRR)** and **B Allele Frequency (BAF)** values extracted from genotyping arrays. Files are in Hive-partitioned Parquet format, organized by barcode, sample, and chromosome.

```
snp_metrics/
├── {barcode}/
│   └── {barcode}_{sample}/
│       ├── chromosome=1/
│       │   └── *.parquet
│       ├── chromosome=2/
│       ├── ...
│       ├── chromosome=22/
│       ├── chromosome=X/
│       ├── chromosome=Y/
│       └── chromosome=M/
```

Includes 20 samples across 2 cohorts (TEST1: barcode 2231, TEST2: barcode 4784), with 25 chromosome partitions each (1 to 22, X, Y, and M).

### NBA Metadata (`NBA_metadata/`)

Reference metadata files containing repeating per-SNP values (SNP ID, position, GenTrain score), partitioned by chromosome. They are used during the data preparation step of the pipeline.

```
NBA_metadata/
├── CHROM=1/
│   └── part.0.parquet
├── CHROM=2/
├── ...
└── CHROM=25/
```

## Usage

### Quick Download (Python)

```python
from huggingface_hub import snapshot_download

# Download SNP metrics into example_data/
snapshot_download(
    repo_id="nkuznet/CNV-Finder",
    repo_type="dataset",
    allow_patterns="snp_metrics/**",
    local_dir="example_data"
)

# Download NBA metadata into ref_files/
snapshot_download(
    repo_id="nkuznet/CNV-Finder",
    repo_type="dataset",
    allow_patterns="NBA_metadata/**",
    local_dir="ref_files"
)
```

See `run_pipeline.ipynb` in the [main repository](https://github.com/nvk23/CNV-Finder) for a full walkthrough.

## Related

- **GitHub Repository:** [nvk23/CNV-Finder](https://github.com/nvk23/CNV-Finder)
- **SNP Metrics Generation:** [nvk23/SNP_metrics](https://github.com/nvk23/SNP_metrics)
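
### Inspecting the SNP metrics layout (sketch)

As a quick sanity check after downloading, the `snp_metrics/` layout described under Dataset Contents can be walked with the standard library alone. The snippet below is a minimal sketch, not part of the CNV-Finder pipeline: the `Partition` type and `iter_partitions` helper are hypothetical names introduced here, and the `example_data/snp_metrics` path assumes the download preserved the repository's folder structure under `local_dir`.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class Partition(NamedTuple):
    """One chromosome partition for one sample (hypothetical helper type)."""
    barcode: str
    sample: str
    chromosome: str
    path: Path


def iter_partitions(root: Path) -> Iterator[Partition]:
    """Yield one Partition per chromosome=* directory under root.

    Assumes the layout {barcode}/{barcode}_{sample}/chromosome={value}/.
    """
    for chrom_dir in sorted(root.glob("*/*/chromosome=*")):
        if not chrom_dir.is_dir():
            continue
        sample_dir = chrom_dir.parent
        barcode = sample_dir.parent.name
        # The sample directory is named {barcode}_{sample}; strip the prefix.
        sample = sample_dir.name.removeprefix(f"{barcode}_")
        # Hive-style partition directories encode the value after "=".
        chromosome = chrom_dir.name.split("=", 1)[1]
        yield Partition(barcode, sample, chromosome, chrom_dir)


if __name__ == "__main__":
    # Assumed location; adjust if the files were downloaded elsewhere.
    root = Path("example_data/snp_metrics")
    if root.is_dir():
        for part in iter_partitions(root):
            print(part.barcode, part.sample, part.chromosome)
```

If the download is complete, each sample should yield 25 partitions. The Parquet files themselves can then be opened with any Parquet reader; this sketch only lists the partitions and does not read the data.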
Provider:
nkuznet