Datasets and Scripts associated with "Cryptic endogenous retrovirus subfamilies in the primate lineage"
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/10016499
Resource description:
The scripts and datasets included here were used for the main analyses in the study "Cryptic endogenous retrovirus subfamilies in the primate lineage".
Study abstract:
Many endogenous retroviruses (ERVs) in the human genome are primate-specific and have contributed novel cis-regulatory elements and transcripts. However, current approaches for classifying and annotating ERVs and their long terminal repeats (LTRs) have limited resolution and are inaccurate. Here, we developed a new annotation based on phylogenetic analysis and cross-species conservation. Focusing on the evolutionarily young LTR subfamilies known as MER11A/B/C, we revealed the presence of four ‘new subfamilies’ that better explain the epigenetic heterogeneity observed within the MER11 instances, suggesting a new annotation for 412 (19.8%) of these repeat elements. Furthermore, we functionally validated the regulatory potential of these four new subfamilies using a massively parallel reporter assay (MPRA), which also identified motifs associated with their differential activities. Combining MPRA with new annotations across primates revealed an ape-specific gain of SOX-related motifs through a single-nucleotide deletion. Lastly, by applying our approach across 53 simian-enriched LTR subfamilies, we defined a total of 75 new subfamilies and found that 3,807 (30.0%) instances from 26 LTR subfamilies could be categorized into a novel annotation, many of which have a distinct epigenetic profile. Thus, with our refined annotation of simian-enriched LTRs, it will be possible to better understand the evolution of primate genomes and potentially identify new roles for ERVs and their LTRs in the hosts.
Created:
2025-03-04



