
just-dna-seq/ensembl_variations

Hugging Face, updated 2026-01-15, indexed 2025-08-30
Download link:
https://hf-mirror.com/datasets/just-dna-seq/ensembl_variations
Resource description:
---
language:
- en
license: apache-2.0
task_categories:
- tabular-classification
tags:
- biology
- genomics
- variant-annotation
- ensembl
- vcf
- variants
- parquet
- bioinformatics
pretty_name: Ensembl Variations (Parquet)
size_categories:
- 10G<n<100G
---

# Ensembl Variations (Parquet Format)

This dataset contains Ensembl human genetic variations converted to Parquet format for fast and efficient VCF annotation.

## Dataset Description

- **Purpose**: Fast annotation of VCF files with Ensembl variation data
- **Format**: Apache Parquet (columnar storage)
- **Source**: [Ensembl Variation Database](https://www.ensembl.org/info/genome/variation/)
- **Updated**: 2026-01-15
- **Total Files**: 25
- **Total Size**: ~13.7 GB

## Usage

### With Polars (Recommended)

```python
import polars as pl

# Lazily scan the variants for chromosome 21 (nothing is read until .collect())
df = pl.scan_parquet("hf://datasets/just-dna-seq/ensembl_variations/data/homo_sapiens-chr21.parquet")

# Keep only variants whose position falls in the 10-20 Mb window (inclusive)
variants = df.filter(
    (pl.col("POS") >= 10000000) & (pl.col("POS") <= 20000000)
).collect()

print(variants)
```

## License

This dataset is released under the Apache 2.0 license. The original Ensembl data is available under Ensembl's terms of use.

## Maintenance

This dataset is maintained by the GenoBear project.

- GitHub: [https://github.com/dna-seq/just-dna-lite](https://github.com/dna-seq/just-dna-lite)
- HuggingFace: [https://huggingface.co/just-dna-seq](https://huggingface.co/just-dna-seq)
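
The Polars usage example hard-codes one chromosome and one position window. A minimal sketch of how that query might be generalized follows. It assumes that the other files in the dataset follow the same `homo_sapiens-chr<name>.parquet` naming as the chromosome 21 file, which the dataset card does not state. The helper names `parquet_url` and `scan_region` are hypothetical and not part of the dataset or of Polars.

```python
# Base path taken from the chromosome 21 example in the dataset card.
BASE = "hf://datasets/just-dna-seq/ensembl_variations/data"


def parquet_url(chrom) -> str:
    """Build the Parquet path for one chromosome.

    ASSUMPTION: every file is named like the documented chr21 file,
    i.e. homo_sapiens-chr<name>.parquet. Check the repository file
    listing before relying on this.
    """
    name = str(chrom).strip()
    # Accept "21", 21, or "chr21" style input.
    if name.lower().startswith("chr"):
        name = name[3:]
    return f"{BASE}/homo_sapiens-chr{name}.parquet"


def scan_region(chrom, start: int, end: int):
    """Return a lazy query for variants with start <= POS <= end."""
    # Imported here so parquet_url works without Polars installed.
    import polars as pl

    return pl.scan_parquet(parquet_url(chrom)).filter(
        # is_between is inclusive on both ends by default,
        # matching the >= / <= filter in the original example.
        pl.col("POS").is_between(start, end)
    )
```

Calling `scan_region(21, 10_000_000, 20_000_000).collect()` would run the same query as the example in the dataset card. Because the scan is lazy, nothing is downloaded until `.collect()` is called.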
Provider: just-dna-seq