
Pocillopora and Cladocopium gene expression levels and Cladocopium SNPs

Indexed by the NIAID Data Ecosystem on 2026-03-14
Download link:
https://zenodo.org/record/6341760
Description:
Pocillopora holobiont gene expression levels

The following files contain the gene expression levels of Pocillopora and its Cladocopium photosymbiont for 102 Pocillopora coral colonies collected during the Tara Pacific expedition.

Pocillopora_MetaT_ReadCount.tab: Pocillopora raw read counts
Pocillopora_MetaT_TPM.tab: Pocillopora normalized read counts (TPM)
CladocopiumC1_MetaT_ReadCount.tab: Cladocopium raw read counts
CladocopiumC1_MetaT_TPM.tab: Cladocopium normalized read counts (TPM)

Methods: Fragments from 102 Pocillopora colonies were processed for RNA extraction and sequencing. Metatranscriptomic reads (Illumina, 150 bp, paired-end) were aligned separately to the predicted coding sequences (CDS) of the Pocillopora meandrina host reference genome, the CDS of the Cladocopium goreaui genome, and a Durusdinium transcriptome using the Burrows–Wheeler Aligner (BWA-MEM v0.7.15) with default settings. Host- and symbiont-mapped reads were then sorted and processed with SAMtools v1.10.282 to generate the respective BAM files. A read was assigned to the host if it aligned to a P. meandrina predicted coding sequence with ≥ 95% sequence identity and with ≥ 50% of the sequence aligned. Reads aligned to Cladocopium goreaui coding sequences with ≥ 98% sequence identity over ≥ 80% of the read length were retained as symbiont reads. Reads were further filtered to remove those in which more than 75% of the read length was low complexity or less than 30% was high complexity. Read counts were normalized as transcripts per million (TPM).

Cladocopium variants

The following files contain Cladocopium variants called from the metatranscriptomic reads of 82 samples aligned to the Cladocopium goreaui coding sequences:
Cladocopium_FilteredSNPs_4x.vcf.gz: filtered variants in each sample, in VCF format
Cladocopium_FilteredSNPs_4x.freq.tab: alternative allele frequencies of the filtered variants in each sample

Methods: For the Pocillopora colonies hosting Cladocopium (84 colonies), we further investigated the population structure of their Cladocopium symbionts using the distribution of single nucleotide polymorphisms (SNPs) across their coding sequences. Briefly, we identified a set of transcriptome-wide SNPs from metatranscriptomic reads mapped to the Cladocopium goreaui CDS using the Genome Analysis Toolkit (GATK v3.7.0). We followed a modified version of the GATK best-practices guide for variant discovery, which included indexing the reference (Picard tools v2.6.0, CreateSequenceDictionary), identifying realignment targets (GATK RealignerTargetCreator), and realigning around detected indels (GATK IndelRealigner). Variants were called for each colony individually (GATK HaplotypeCaller), and the resulting variant call files (VCFs) were merged into island-specific, multi-sample cohort files (GATK CombineGVCFs) before joint genotyping across all 11 islands (GATK GenotypeGVCFs) with ploidy set to 1. Joint analysis of multiple samples (i.e., joint genotyping) is recommended for discovering germline SNPs and indels because it provides information on population-wide variance across a cohort. We excluded from this analysis colonies containing a large proportion (>25%) of a second ITS2 profile, as this could affect SNP calling. SNPs were filtered using VCFtools (v0.1.12) to retain only biallelic SNPs with a quality score ≥ 30 and a coverage ≥ 4.
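The host and symbiont read-assignment cutoffs from the gene expression Methods can be sketched as a small Python filter. This is a minimal illustration, not the authors' pipeline. The Alignment record is hypothetical. Two definitions are assumptions, since the description does not define either quantity precisely: identity is taken as matching bases over aligned bases, and the aligned fraction as aligned bases over read length.

```python
from dataclasses import dataclass

# Cutoffs taken from the dataset description.
HOST_MIN_IDENTITY = 0.95            # P. meandrina CDS
HOST_MIN_ALIGNED_FRACTION = 0.50
SYMBIONT_MIN_IDENTITY = 0.98        # C. goreaui CDS
SYMBIONT_MIN_ALIGNED_FRACTION = 0.80


@dataclass
class Alignment:
    """Hypothetical per-read alignment summary (not a BWA or SAMtools type)."""
    matches: int         # aligned bases identical to the reference
    aligned_length: int  # read bases covered by the alignment
    read_length: int     # total read length


def identity(aln: Alignment) -> float:
    # Assumption: identity = matching bases / aligned bases.
    return aln.matches / aln.aligned_length if aln.aligned_length else 0.0


def aligned_fraction(aln: Alignment) -> float:
    # Assumption: fraction of the read length covered by the alignment.
    return aln.aligned_length / aln.read_length if aln.read_length else 0.0


def passes(aln: Alignment, min_identity: float, min_fraction: float) -> bool:
    return identity(aln) >= min_identity and aligned_fraction(aln) >= min_fraction


def is_host_read(aln: Alignment) -> bool:
    return passes(aln, HOST_MIN_IDENTITY, HOST_MIN_ALIGNED_FRACTION)


def is_symbiont_read(aln: Alignment) -> bool:
    return passes(aln, SYMBIONT_MIN_IDENTITY, SYMBIONT_MIN_ALIGNED_FRACTION)


read = Alignment(matches=140, aligned_length=145, read_length=150)
print(is_host_read(read), is_symbiont_read(read))  # True False
```

Take a 150 bp read with 140 matches over 145 aligned bases. Its identity is 140/145 ≈ 96.6%, which clears the 95% host cutoff but not the 98% symbiont cutoff.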
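The TPM values in the *_MetaT_TPM.tab files come from a two-step normalization. Each gene's read count is first divided by its length, and the results are then rescaled so that the values in a sample sum to one million. Below is a minimal Python sketch. The description does not state which lengths or which tool were used, so CDS length in bases is an assumption, and the gene IDs are hypothetical.

```python
def tpm(counts: dict[str, float], lengths: dict[str, float]) -> dict[str, float]:
    """Transcripts per million from raw read counts and feature lengths."""
    # Step 1: reads per base for each gene (length normalization).
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    # Step 2: rescale so that the sample's values sum to 1e6.
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Hypothetical gene IDs and CDS lengths (bases).
counts = {"geneA": 10, "geneB": 20, "geneC": 0}
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 500}
print(tpm(counts, lengths))  # {'geneA': 500000.0, 'geneB': 500000.0, 'geneC': 0.0}
```

geneB has twice the reads of geneA but is also twice as long, so both genes receive the same TPM.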
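Two steps from the variant Methods can be approximated in plain Python: the final SNP filter (biallelic SNPs, quality ≥ 30, coverage ≥ 4) and the per-sample alternative allele frequencies in Cladocopium_FilteredSNPs_4x.freq.tab. This is a sketch under stated assumptions, not a reconstruction of the authors' VCFtools commands. It assumes the coverage cutoff applies per sample. It also computes each frequency from the FORMAT AD field as ALT / (REF + ALT), whereas the description does not say how the .freq.tab values were derived. The example records and sample names are hypothetical.

```python
import gzip
from typing import Iterable, Iterator, Optional

MIN_QUAL = 30.0  # site quality cutoff from the description
MIN_DEPTH = 4    # coverage cutoff; applied per sample here (assumption)
BASES = {"A", "C", "G", "T"}


def open_vcf(path: str):
    """Open a plain or gzip-compressed VCF file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def is_biallelic_snp(ref: str, alt: str) -> bool:
    """True for a single-base REF with exactly one single-base ALT."""
    alts = alt.split(",")
    return ref.upper() in BASES and len(alts) == 1 and alts[0].upper() in BASES


def alt_frequency(fmt_keys: list[str], sample: str,
                  min_depth: int = MIN_DEPTH) -> Optional[float]:
    """ALT / (REF + ALT) from the AD field, or None if missing or too shallow."""
    values = dict(zip(fmt_keys, sample.split(":")))
    ad = values.get("AD", ".").split(",")
    if len(ad) != 2 or not all(x.isdigit() for x in ad):
        return None
    ref_n, alt_n = int(ad[0]), int(ad[1])
    depth = ref_n + alt_n
    return alt_n / depth if depth >= min_depth else None


def filtered_frequencies(
    lines: Iterable[str],
) -> Iterator[tuple[str, int, dict[str, Optional[float]]]]:
    """Yield (CDS, position, {sample: ALT frequency}) for sites passing the filters."""
    samples: list[str] = []
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]  # sample names from the header line
            continue
        chrom, pos, _id, ref, alt, qual = cols[:6]
        if qual == "." or float(qual) < MIN_QUAL or not is_biallelic_snp(ref, alt):
            continue
        fmt_keys = cols[8].split(":")
        freqs = {name: alt_frequency(fmt_keys, field)
                 for name, field in zip(samples, cols[9:])}
        yield chrom, int(pos), freqs


# Hypothetical records: one passing site, one multiallelic site, one low-quality site.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n",
    "cds1\t100\t.\tA\tG\t50.0\t.\t.\tGT:AD:DP\t1:2,6:8\t0:3,0:3\n",
    "cds1\t200\t.\tA\tG,T\t80.0\t.\t.\tGT:AD\t1:0,5,1\t0:4,0,0\n",
    "cds2\t50\t.\tC\tT\t12.0\t.\t.\tGT:AD\t1:0,9\t1:0,9\n",
]
print(list(filtered_frequencies(example)))
# [('cds1', 100, {'S1': 0.75, 'S2': None})]
```

To apply this to the released file, pass the handle from open_vcf("Cladocopium_FilteredSNPs_4x.vcf.gz") to filtered_frequencies. It yields one tuple per retained site, with None for samples whose depth falls below the cutoff.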
Created:
2022-12-13