Data files for 'The fate of artificial transgenes in Acanthamoeba castellanii'
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/15019834
Description:
Data associated with the publication "The fate of artificial transgenes in Acanthamoeba castellanii". The record contains one archive for each of the two main bioinformatic analyses performed in the study. More information on the implementation of all analyses can be found at https://github.com/morgancolp/Acastellanii_transgene_analysis.
Intermediate files are included in these archives, but the full sequencing readsets are not, because of their large file sizes. The SRA accession numbers for the reads are given at the relevant points below.
Search for candidate transgene integrations
Files generated at each step of the approach used to identify potential genomic insertions of artificial transgenes. The archive contains one directory for each isolate included in the analysis. The top level of the archive contains a FASTA file with the wild-type A. castellanii genome assembly (12_Ac_Neff_rcor_polish.fa), a FASTA file with the plasmid sequence (pGAPDH-EGFP.fasta), and an Oracle Grid Engine submission script for minimap2 (minimap2.sh) with placeholders for the input/output file names and the number of threads. The assembly is roughly equivalent to the A. castellanii Neff reference genome sequence (Matthey-Doret, Colp, et al., 2022), except that many small 'junk' scaffolds that were removed from the reference are still present in this version. This is not expected to significantly affect our inferences, given the nature of these scaffolds (their size and sequence content).
The archive for this analysis is integration_search_files.tar.gz.
Each subdirectory of this archive contains: a tabular BLAST output identifying reads with hits to the plasmid, a FASTA file containing the identified reads, a BAM file of those reads mapped to the wild-type genome (12_Ac_Neff_rcor_polish.fa in this archive), and the corresponding BAM index file. The Clone 1 subdirectory contains these files for both the initial barcoded Clone 1 run (shallow) and the later, more deeply covered sequencing run (deep).
Reads:
Clone 1: SRX27968418 (shallow), SRX7813525 (deep)
Clone LT6: SRX27968413
Clone LT8: SRX27968414
Clone LT9: SRX27968415
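The read-selection step described for each subdirectory (tabular BLAST hits to the plasmid, then a FASTA file of the matching reads) can be sketched in Python as below. This is a minimal illustration, not the authors' implementation (see the GitHub repository for that). It assumes the BLAST search was run with -outfmt 6 and its default column layout, that the reads are available as FASTA, and that you know whether the reads were the BLAST query or the subject (the hypothetical read_column parameter). The function names, file names and identity threshold are all hypothetical.

```python
from typing import Iterable, Iterator, Set, TextIO, Tuple


def hit_read_ids(blast_lines: Iterable[str], read_column: int = 0,
                 min_pident: float = 0.0) -> Set[str]:
    """Collect read IDs from BLAST tabular output (-outfmt 6, default columns).

    read_column is 0 if the reads were the query (qseqid) or 1 if they were
    the subject (sseqid); which one applies is an assumption here.
    Column index 2 holds percent identity (pident) in the default layout.
    """
    ids: Set[str] = set()
    for line in blast_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and -outfmt 7 style comment lines
        fields = line.split("\t")
        if float(fields[2]) >= min_pident:
            ids.add(fields[read_column])
    return ids


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA text, joining wrapped lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_selected(records: Iterable[Tuple[str, str]], wanted: Set[str],
                   out: TextIO) -> int:
    """Write records whose ID (first word of the header) is in `wanted`."""
    written = 0
    for header, seq in records:
        if header.split()[0] in wanted:
            out.write(f">{header}\n{seq}\n")
            written += 1
    return written


# Hypothetical usage (file names are placeholders, not files from the archive):
# with open("hits.tsv") as blast, open("reads.fasta") as reads, \
#         open("plasmid_hit_reads.fasta", "w") as out:
#     write_selected(read_fasta(reads), hit_read_ids(blast), out)
```

The functions take iterables of lines rather than paths, so the same code works on open files, decompressed streams, or in-memory test data.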
Estimating rate of chimeric reads
Files generated to estimate the rate of chimeric reads in each sequencing run. The archive contains one directory for each transformed clone for which chimerism was assessed (Clones 1, LT6, LT8 and LT9). The top level of the archive contains an Oracle Grid Engine submission script for minimap2 (minimap2.sh) with placeholders for the input/output file names and the number of threads, a FASTA file containing the same genome assembly described above with the plasmid sequence added as an additional scaffold (WT_genome_plus_plasmid.fasta), and a Perl script that identifies reads meeting our definition of chimerism (chim_reads.pl). The plasmid scaffold is the one with the header 'c350_g1_i2 len=5858 path=[354:0-71 354:72-143 @426@!:144-1707 11747:1708-2441 2533:2442-5857]'.
The archive for this analysis is chimerism_files.tar.gz.
Each subdirectory contains: a PAF file of nanopore reads mapped against the genome assembly plus plasmid, the same PAF file with the plasmid-mapping lines removed, a list of all genome-mapping read IDs with the number of unique mappings for each read, and a list retaining only those reads from the previous list that map exactly twice. The nanopore reads used for mapping are the same readsets listed above. For Clone 1, only the deep-sequenced readset (SRX7813525) was used.
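The filtering chain described above (PAF output, plasmid-mapping lines removed, per-read mapping counts, reads mapping exactly twice) can be approximated with the Python sketch below. It is not a port of chim_reads.pl: it assumes that a "unique mapping" is a distinct (target, target start, target end, strand) alignment, and it does not apply any further criteria the Perl script may use (for example mapping-quality or read-coverage checks). It relies on minimap2 reporting only the first whitespace-delimited word of a FASTA header as the sequence name, so the plasmid scaffold should appear as c350_g1_i2 in PAF column 6. Function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# minimap2 keeps only the first whitespace-delimited word of a FASTA header,
# so the plasmid scaffold appears under this name in PAF column 6.
PLASMID_NAME = "c350_g1_i2"


def drop_plasmid_hits(paf_lines: Iterable[str],
                      plasmid: str = PLASMID_NAME) -> List[List[str]]:
    """Parse PAF lines, keeping only alignments to genome scaffolds."""
    records = []
    for line in paf_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if fields[5] != plasmid:  # column 6 (index 5): target sequence name
            records.append(fields)
    return records


def unique_mapping_counts(records: Iterable[List[str]]) -> Dict[str, int]:
    """Count distinct genome mappings per read (column 1: query name).

    Assumption: a mapping is identified by (target, target start, target end,
    strand); repeated lines with the same key are counted once.
    """
    seen: Dict[str, Set[Tuple[str, int, int, str]]] = {}
    for f in records:
        key = (f[5], int(f[7]), int(f[8]), f[4])
        seen.setdefault(f[0], set()).add(key)
    return {read: len(hits) for read, hits in seen.items()}


def reads_mapping_twice(counts: Dict[str, int]) -> List[str]:
    """Keep only reads with exactly two genome mappings (candidate chimeras)."""
    return sorted(read for read, n in counts.items() if n == 2)
```

A read that aligns once to the plasmid and once to the genome ends up with a count of 1 after the plasmid lines are dropped, so only reads joining two separate genomic locations survive the final filter.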
Created:
2025-03-21



