Epigenetic silencing and genome dynamics determine the fate of giant virus endogenizations in Acanthamoeba
Source: Zenodo. Updated: 2025-05-23. Indexed: 2026-05-26.
Download link:
https://zenodo.org/doi/10.5281/zenodo.15269025
Description:
This is a repository of data files used and produced in the bioinformatic analyses for the publication "Epigenetic silencing and genome dynamics determine the fate of giant virus endogenizations in Acanthamoeba." The code associated with the files stored here can be found on GitHub.
Genome assembly and gene predictions
The genome assemblies and gene models that form the backbone of this publication were generated as part of work for the Acanthamoeba genome paper by Matthey-Doret, Colp et al. (2022). Because the two projects ran partly in parallel, the assemblies and gene models used here represent different stages of the work by Matthey-Doret, Colp et al. (2022), and do not perfectly match the records publicly provided for this publication. We therefore provide here the versions of the files used for this study.
Archive: Assemblies_and_genes_Colp_Matthey-Doret.tar.gz
Protein sequences and function
This directory includes FASTA files for the predicted proteins (based on the gene models discussed above) and for intergenic ORFs (identifiable by the string "extraction" in their IDs). InterProScan output files for both Neff and C3 are also included.
Archive: Proteins_and_function.tar.gz
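The ID convention described above can be used to separate the two kinds of records. The following is a minimal sketch, assuming only that intergenic ORF IDs contain the substring "extraction" as stated; the function names and the example IDs are hypothetical and are not taken from the archive.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (record_id, sequence) pairs from FASTA-formatted lines."""
    record_id = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            # Assumption: the ID is the first whitespace-delimited token of the header.
            tokens = line[1:].split()
            record_id = tokens[0] if tokens else ""
            chunks = []
        else:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def is_intergenic_orf(record_id: str) -> bool:
    """Intergenic ORFs are described as carrying 'extraction' in their ID."""
    return "extraction" in record_id


def split_records(lines: Iterable[str]) -> Tuple[List[Record], List[Record]]:
    """Return (predicted_proteins, intergenic_orfs) from FASTA lines."""
    predicted: List[Record] = []
    intergenic: List[Record] = []
    for record in read_fasta(lines):
        (intergenic if is_intergenic_orf(record[0]) else predicted).append(record)
    return predicted, intergenic
```

To apply this to a file, pass an open file handle, for example `split_records(open(path))`, where `path` points to one of the extracted FASTA files.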
Viral genes, viral regions, divergent regions
This repository holds the relevant coordinates and information on viral genes. The "*_all_genes_info_virus_conservation.bed" files have the coordinates for all genes and specify whether they are viral or non-viral, and whether they are conserved, degraded or unconserved (for the purpose of all analyses, degraded genes are treated as conserved). The "*_mummer_concatenated_divergent_regions.gff" files have the coordinates of all divergent regions in Neff and C3, and specify which ones contain at least one full viral gene. The "*_viral_regions_curated.gff" files show the coordinates for manually curated viral regions based on the divergent regions.
Archive: Viral_annotations.tar.gz
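The rule that degraded genes are treated as conserved can be applied when tallying conservation categories. The sketch below is illustrative only: the exact label spellings and the column that holds them in the .bed files are assumptions, so the labels are passed in directly rather than parsed from a file.

```python
from collections import Counter
from typing import Iterable

# Assumption: the three categories are spelled like this (case-insensitive).
# Degraded genes are folded into the conserved class, as stated in the description.
ANALYSIS_CLASS = {
    "conserved": "conserved",
    "degraded": "conserved",
    "unconserved": "unconserved",
}


def analysis_class(label: str) -> str:
    """Map a raw conservation label to the class used in the analyses."""
    key = label.strip().lower()
    if key not in ANALYSIS_CLASS:
        raise ValueError(f"unexpected conservation label: {label!r}")
    return ANALYSIS_CLASS[key]


def count_classes(labels: Iterable[str]) -> Counter:
    """Count genes per analysis class after collapsing 'degraded'."""
    return Counter(analysis_class(label) for label in labels)
```

Raising an error on unknown labels, rather than silently skipping them, makes a mismatch between the assumed and actual spellings visible immediately.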
Genome alignment information
This repository holds the output files for the MUMmer4 Neff-C3 genome alignments.
Archive: mummer4_genome_alignments.tar.gz
Methylation
Methylation frequency at every CpG site in Neff and C3. Calls are based on raw reads provided by Colp and Matthey-Doret. Because the raw nanopore fast5 files are very large, they are available only on request.
Archive: Methylation.tar.gz
Custom python input
The customized .gff files indicating whether a gene is viral are in the format necessary to be used with the script mummer_concat_for_publication_inputs.py.
Archive: Custom_python_input.tar.gz
BLAST hits and taxonomy
The initial output files for the Neff and C3 DIAMOND BLAST searches (diamond_*_v_NCBI.tsv), modified to include the taxonomy of subject proteins, are provided together with the filtered versions that include only viral candidates (diamond_*_viral_candidates.tsv). Files breaking down the taxonomy of viral proteins are also included.
Archive: BLAST.tar.gz
Viral hallmark genes
Output files for the ViralRecall HMM search for viral hallmark genes.
Archive: ViralRecall.tar.gz
Expression
Transcriptome FASTA files for Neff and C3, based on the exon sequences of the Colp and Matthey-Doret gene models and the nucleotide sequences of viral intergenic ORFs. Kallisto output files are also included. These files are modified to indicate whether each gene is viral and whether it is conserved.
Archive: Expression.tar.gz
Mobile elements
RepeatMasker .out files, with simple and low-complexity repeats filtered out, and modified to be easier to process in R.
Archive: Mobile_elements.tar.gz
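For readers who want to reproduce a comparable filter on an unmodified RepeatMasker .out file, the sketch below drops rows whose repeat class is "Simple_repeat" or "Low_complexity". It assumes the standard whitespace-delimited .out layout, in which the class/family is the eleventh column. The files in this archive have been modified for use in R, so their layout may differ, and this is not necessarily how the authors performed the filtering.

```python
from typing import Iterable, List

# Class names RepeatMasker uses for simple and low-complexity repeats.
EXCLUDED_CLASSES = {"Simple_repeat", "Low_complexity"}


def filter_repeatmasker_out(lines: Iterable[str]) -> List[List[str]]:
    """Return data rows (as field lists) that are not simple or low-complexity repeats."""
    kept: List[List[str]] = []
    for line in lines:
        fields = line.split()
        # Data rows start with an integer Smith-Waterman score;
        # header and blank lines do not, so they are skipped here.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        # Column 11 (index 10) holds the repeat class/family in the standard layout.
        if fields[10] in EXCLUDED_CLASSES:
            continue
        kept.append(fields)
    return kept
```

Returning field lists instead of raw lines keeps the result easy to write back out as a tab-delimited table.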
Provider:
Zenodo
Created:
2025-04-24



