Lake Malawi cichlid pangenome graph reveals extensive structural variation driven by transposable elements
Source: NIAID Data Ecosystem; indexed 2026-05-02
Download link:
https://zenodo.org/record/14029308
Resource description:
This repository represents a data snapshot associated with the manuscript Lake Malawi cichlid pangenome graph reveals extensive structural variation driven by transposable elements. The manuscript is currently available as a preprint on bioRxiv.
Under an Access and Benefit Sharing agreement, these data are made available on an open access basis for research use only. Any person who wishes to use these data for any form of commercial purpose must first enter into a commercial licensing and benefit sharing arrangement with the Government of Malawi. For further information, contact the Access and Benefit-sharing National Focal Point (ABS NFP) for Malawi registered with CBD at https://www.cbd.int/information/nfp.shtml.
Data description
Provided in this repository are the FASTA files of six new genome assemblies.
troMau: Tropheops sp. ‘mauve’, PacBio Sequel II
aulStu: Aulonocara stuartgranti, PacBio Sequel II
rhaChi: Rhamphochromis sp. ‘chillingali’ (male), PacBio Sequel II
otoArg: Otopharynx argyrosoma, R9 MinION
copChr: Copadichromis chrysonotus, R9 MinION
rhaChi2: Rhamphochromis sp. ‘chillingali’ (female), R9 MinION
Two previously published genomes from Ensembl v103 were also included in the pangenome graph: Astatotilapia calliptera (fAstCal1.2, GCF_900246225.1) and Maylandia zebra (M_zebra_UMD2a, GCA_000238955.4).
Other files that are also included:
malawi_haplochromines-graph.gfa: pangenome graph in GFA format constructed using the minigraph software package
malawi_haplochromines-variants.xlsx: detected structural variants, as defined on the fAstCal1.2 reference coordinates
malawi_haplochromines-genelists.xlsx: genes that overlap and do not overlap with structural variants
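As a quick sanity check on the graph file, the following minimal sketch tallies segments and links per rank. It assumes the GFA follows the rGFA conventions that minigraph writes, where each segment (S) line carries `SN`, `SO` and `SR` tags and rank 0 marks the reference backbone. That the backbone here is fAstCal1.2 is an assumption based on the variant coordinates, and the function name `summarize_rgfa` is illustrative only.

```python
from collections import Counter
from typing import Dict, Iterable


def summarize_rgfa(lines: Iterable[str]) -> Dict[str, object]:
    """Count segments and bases per rGFA rank, plus the total number of links."""
    seg_count: Counter = Counter()
    seg_bases: Counter = Counter()
    n_links = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            seq = fields[2]
            # Optional tags look like "SR:i:0": two-letter key, type, value.
            tags = {
                f[:2]: f[5:]
                for f in fields[3:]
                if len(f) >= 5 and f[2] == ":" and f[4] == ":"
            }
            # Assumption: segments without an SR tag are grouped under rank -1.
            rank = int(tags.get("SR", -1))
            # Use the LN tag only when the sequence itself is omitted ("*").
            length = int(tags["LN"]) if seq == "*" and "LN" in tags else len(seq)
            seg_count[rank] += 1
            seg_bases[rank] += length
        elif fields[0] == "L":
            n_links += 1
    return {
        "segments_per_rank": dict(seg_count),
        "bases_per_rank": dict(seg_bases),
        "links": n_links,
    }
```

To run it on the real file, pass an open file handle, for example `summarize_rgfa(open("malawi_haplochromines-graph.gfa"))`. The file is streamed line by line, so memory use stays small even for a large graph.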
Access information for raw reads
Raw reads used to generate the new assemblies are accessible on NCBI.
Sample    BioProject    Genome           Biosample       Run ID(s)
troMau    PRJEB80840    GCA_964274065.1  SAMEA11293786   ERR12954135
aulStu    PRJEB80765    GCA_964273965.1  SAMEA115846654  ERR13382500
rhaChi    PRJEB80761    GCA_964273455.1  SAMEA115846655  ERR13382499
otoArg    PRJNA1144831  GCA_046255105.1  SAMN43044617    SRR30633342
copChr    PRJNA1144838  -                SAMN43044710    SRR30633337, SRR30633338
rhaChi2   PRJNA1144843  -                SAMN43044956    SRR30633436, SRR30633437, SRR30633438
Notes
Some of the assemblies are still being uploaded to NCBI, whose quality checks have flagged a few contigs:
ctg00001557 in otoArg (mitochondria)
ctg00005350 in copChr (BLAST hits to amphibian and fish E3 SUMO-protein ligase)
ctg00002210 in rhaChi2 (“worm” contaminant)
It is very likely that these contigs will be removed from the final NCBI assemblies. However, none of these contigs are included in the pangenome graph, and therefore, the findings from the paper remain unaltered. A mapping between the Zenodo contigs and their NCBI counterparts will be provided at a later stage to facilitate coordinate conversions.
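Until that mapping is available, users who want a local copy of an assembly without the flagged contigs could filter the FASTA themselves. The sketch below is one way to do it, assuming the first whitespace-delimited token of each FASTA header is exactly the contig name listed above. The `FLAGGED` table and the function name `drop_contigs` are illustrative, and the result is not guaranteed to match the final NCBI assemblies.

```python
from typing import Dict, Iterable, Iterator, Set

# Contigs flagged by NCBI, as listed in the notes above, keyed by assembly.
FLAGGED: Dict[str, Set[str]] = {
    "otoArg": {"ctg00001557"},
    "copChr": {"ctg00005350"},
    "rhaChi2": {"ctg00002210"},
}


def drop_contigs(fasta_lines: Iterable[str], flagged: Set[str]) -> Iterator[str]:
    """Yield FASTA lines, skipping every record whose name is in `flagged`."""
    keep = True
    for line in fasta_lines:
        if line.startswith(">"):
            # Assumption: the contig name is the first token after '>'.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            keep = name not in flagged
        if keep:
            yield line
```

For example, writing `drop_contigs(open(path), FLAGGED["otoArg"])` to a new file, where `path` is wherever the otoArg FASTA was saved, yields a copy without ctg00001557. Because the flagged contigs are not in the pangenome graph, this step is not needed to reproduce the graph itself.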
Created:
2025-02-11



