
Yoshigold/plant-msyn-data

Source: Hugging Face | Updated 2026-01-25 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/Yoshigold/plant-msyn-data
Description:
---
license: mit
task_categories:
- other
tags:
- biology
- genomics
- synteny
- plants
pretty_name: Plant Microsynteny Data
size_categories:
- 1B<n<10B
---

# Plant Microsynteny Dataset (plant-msyn-data)

Pre-computed MCscan protein-based synteny analysis results for 30+ plant genomes.

## Dataset Structure

```
plant-msyn-data/
├── mcscan_results/
│   ├── bed_files/              # Gene position files (*.bed) + database_genomes.txt whitelist
│   ├── i1_blocks/              # Synteny block files (*.i1.blocks)
│   ├── lifted_anchors/         # Syntenic anchor pair files (*.lifted.anchors)
│   ├── last_filtered_by_i1/    # I1-filtered LAST alignments
│   ├── pep_files/              # Protein FASTA files (*.pep)
│   ├── custom_meta/            # Custom genome metadata (user uploads)
│   └── custom_synteny_meta/    # Custom synteny project metadata
├── sql/
│   ├── search_catalogs/        # Per-genome SQLite catalogs for fast gene lookups
│   └── plantmsyn_metadata.db   # Central metadata database
└── annotations/
    └── [species_name]/
        └── gene_annotation.tsv # Gene functional descriptions
```

## Supported Genomes

The dataset includes synteny data for 30+ plant genomes including:

- Barley (*Hordeum vulgare*)
- Wheat (*Triticum aestivum*)
- Rice (*Oryza sativa*)
- Maize (*Zea mays*)
- Arabidopsis (*Arabidopsis thaliana*)
- And 25+ more species

## Usage

This dataset is designed to be used with the [Plant-mSyn web application](https://huggingface.co/spaces/yoshigold/plant-msyn).

To use programmatically:

```python
from huggingface_hub import snapshot_download

# Download the full dataset
local_path = snapshot_download(
    repo_id="yoshigold/plant-msyn-data",
    repo_type="dataset"
)
```

## File Formats

### BED Files (`*.bed`)

Tab-separated files with gene positions:

- Column 1: Chromosome
- Column 2: Start position
- Column 3: End position
- Column 4: Gene ID
- Column 5: Score
- Column 6: Strand

### Blocks Files (`*.i1.blocks`)

Tab-separated synteny block definitions linking genes between species.
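
The six BED columns described above can be read with a few lines of standard-library Python. The following is a minimal sketch, not part of the dataset's own tooling: the `BedRecord` type, the `read_bed` helper, and the two sample genes are hypothetical names introduced for illustration. It assumes every data line has at least six tab-separated fields and keeps the score as a string, since the dataset card does not say whether it is always numeric.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class BedRecord(NamedTuple):
    """One gene position row (hypothetical helper type)."""
    chrom: str
    start: int
    end: int
    gene_id: str
    score: str   # kept as text; numeric type is not confirmed by the dataset card
    strand: str


def read_bed(handle: TextIO) -> Iterator[BedRecord]:
    """Yield one BedRecord per non-empty, non-comment line."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        chrom, start, end, gene_id, score, strand = row[:6]
        yield BedRecord(chrom, int(start), int(end), gene_id, score, strand)


# Made-up sample rows standing in for a real *.bed file.
sample = "chr1\t100\t500\tgeneA\t0\t+\nchr1\t800\t1200\tgeneB\t0\t-\n"
records = list(read_bed(io.StringIO(sample)))

# Index by gene ID for quick position lookups.
positions = {r.gene_id: r for r in records}
```

To read a real file, pass an open file handle instead of the `io.StringIO` sample, for example a file under `mcscan_results/bed_files/` in the downloaded snapshot.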
### Annotation Files (`gene_annotation.tsv`)

Tab-separated files with:

- Column 1: gene (Gene ID)
- Column 2: description (Functional annotation)

## Citation

If you use this dataset, please cite:

- MCscan/JCVI: Tang et al. (2008) Synteny and Collinearity in Plant Genomes
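
As a companion to the annotation file format described above, the two-column `gene_annotation.tsv` layout maps naturally onto a dictionary from gene ID to description. This is a minimal sketch under stated assumptions: `read_annotations` is a hypothetical helper, the sample rows are made up, and the header row named `gene` / `description` is assumed rather than confirmed by the dataset card (the code skips it only if present).

```python
import csv
import io
from typing import Dict, TextIO


def read_annotations(handle: TextIO) -> Dict[str, str]:
    """Map gene ID -> functional description from a two-column TSV."""
    annotations: Dict[str, str] = {}
    # QUOTE_NONE so quote characters inside descriptions are kept literally.
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        gene, description = row[0], row[1]
        if gene == "gene" and description == "description":
            continue  # assumed header row
        annotations[gene] = description
    return annotations


# Made-up sample content standing in for a real gene_annotation.tsv.
sample = "gene\tdescription\ngeneA\tputative kinase\ngeneB\thypothetical protein\n"
annotations = read_annotations(io.StringIO(sample))
```

With a real snapshot, open `annotations/[species_name]/gene_annotation.tsv` and pass the handle to `read_annotations`.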
Provider:
Yoshigold