Investigation of Psylliodes chrysocephala aestivation by RNA-seq
Source: DataCite Commons; updated 2023-11-10; indexed 2024-08-18
Download link:
https://figshare.com/articles/dataset/Investigation_of_Psylliodes_chrysocephala_aestivation_by_RNA-seq/24085815
Resource description:
Library preparation: Total RNA from pre-aestivation (5-day-old), aestivation (30-day-old), and post-aestivation (55-day-old) female beetles was extracted using the ZYMO Quick-RNA Tissue/Insect Kit (ZYMO Research, Irvine, CA, USA) and cleaned using the TURBO DNA-free™ kit (Thermo Fisher Scientific, Langenselbold, Germany) according to the manufacturers’ instructions. Only females were sampled, to eliminate sex-related variation. RNA quantity was determined using a Nanodrop ND-1000 UV/Vis spectrophotometer (Thermo Fisher Scientific), and RNA integrity was assessed using the Agilent 2100 Bioanalyzer with an RNA 6000 Nano Kit (Agilent Technologies, Santa Clara, CA, USA). Samples with RIN values ≥ 7.0 were considered suitable for mRNA library preparation. In total, 10 libraries (4, 3, and 3 for the pre-aestivation, aestivation, and post-aestivation stages, respectively) were prepared using the NEBNext® Poly(A) mRNA Magnetic Isolation Module (NEB E7490, New England Biolabs) according to the manufacturer’s instructions. Library quality was checked by fragment analysis on the Agilent 2100 Bioanalyzer using the Agilent DNF-935 Reagent Kit (Agilent Technologies). The libraries were pooled according to their concentrations, giving an overall concentration of 3.4 ng/µL. Sequencing was performed by BGI Genomics Tech Solutions Co. Ltd (Hong Kong) on a DNBSEQ-T7 platform. The ten raw read files were deposited in the NCBI Sequence Read Archive (SRA) under the accessions SAMN33022552 to SAMN33022561.
De novo assembly and functional annotation: Sequencing errors in the paired-end reads were corrected using Rcorrector (v1.0.5) with default options (Song & Florea, 2015), and unfixable reads were discarded using the “FilterUncorrectabledPEfastq.py” script from Transcriptome Assembly Tools (Song & Florea, 2015).
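The uncorrectable-read filtering step can be illustrated with a minimal Python sketch. It is not the Transcriptome Assembly Tools script itself: it assumes that Rcorrector marks reads it cannot fix with the string `unfixable_error` in the FASTQ header, and that the R1 and R2 files list mates in the same order. A pair is discarded if either mate carries the flag, so the two outputs stay synchronized. The helper names are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# A FASTQ record as (header, sequence, separator, quality).
Record = Tuple[str, str, str, str]

# Assumption: Rcorrector appends this tag to headers of reads it cannot correct.
UNFIXABLE_TAG = "unfixable_error"


def fastq_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records from an iterable of lines."""
    it = iter(lines)
    while True:
        chunk = [line.rstrip("\n") for line in islice(it, 4)]
        if not chunk:
            return  # clean end of input
        if len(chunk) != 4:
            raise ValueError("truncated FASTQ record")
        yield (chunk[0], chunk[1], chunk[2], chunk[3])


def drop_unfixable_pairs(
    r1_lines: Iterable[str], r2_lines: Iterable[str]
) -> Tuple[List[Record], List[Record], int]:
    """Keep a read pair only if neither mate is flagged as unfixable.

    Hypothetical helper; assumes R1 and R2 list mates in the same order.
    Returns the kept R1 records, the kept R2 records, and the number of
    discarded pairs.
    """
    kept_r1: List[Record] = []
    kept_r2: List[Record] = []
    dropped = 0
    for rec1, rec2 in zip(fastq_records(r1_lines), fastq_records(r2_lines)):
        if UNFIXABLE_TAG in rec1[0] or UNFIXABLE_TAG in rec2[0]:
            dropped += 1  # discard both mates so the two files stay in sync
            continue
        kept_r1.append(rec1)
        kept_r2.append(rec2)
    return kept_r1, kept_r2, dropped
```

For example, with two pairs where the second R1 header reads `@read2/1 unfixable_error`, the function keeps only the first pair and reports one dropped pair. Real files can be passed as open file handles instead of lists.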
Adapter sequences were removed, and reads with a quality score above 30 were retained, using TrimGalore! (v0.6.7). The cleaned reads (n = 3 per adult stage) were assembled de novo using Trinity with default options. In total, 224 million bases in 341,670 transcripts, including putative isoforms, were assembled. The assembly had an N50 of 1,532 bp and a BUSCO (v5.4.2) completeness score of 96.7% against the Endopterygota lineage (BUSCO v4 datasets). The putative isoforms were then merged into a supertranscriptome containing 189,229 transcripts in total, which was deposited in GenBank as a Transcriptome Shotgun Assembly (TSA) under the accession GKIH00000000.1. The transcriptome (including isoforms) was annotated using Trinotate (v3.2.2), which combines the outputs of NCBI BLAST+ (v2.13.0; nucleotide and predicted-protein BLAST), TransDecoder (v5.5.0; coding-region prediction), SignalP (v4.0; signal peptide prediction), TMHMM (v2.0; transmembrane domain prediction), and HMMER (v3.3.2; homology search) into an SQLite annotation database. The latest uniprot_sprot (04/2022) and the Pfam-A (11/2015) databases were downloaded using Trinotate, and the default E-value thresholds were used for the BLAST+ and HMMER searches. Gene ontology (GO) terms associated with individual genes were extracted from the annotation database using “extract_GO_assignments_from_Trinotate_xls.pl”, whereas the SignalP and TMHMM outputs were extracted manually using Excel spreadsheets. The longest protein-coding regions predicted by TransDecoder in the supertranscript data were subjected to Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotation via GhostKOALA v2.2 (https://www.kegg.jp/ghostkoala/). The annotation database is publicly available on Figshare (https://doi.org/10.6084/m9.figshare.21922938).
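The N50 reported for the assembly is a length statistic: sort transcripts from longest to shortest and report the length at which the running total first reaches half of all assembled bases. A minimal Python sketch of that definition follows; the function name `n50` is ours and is not part of Trinity or BUSCO.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths (in bp).

    Sort lengths from longest to shortest and return the first length at
    which the cumulative sum reaches at least half of the total.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # integer-safe form of running >= total / 2
            return length
    raise AssertionError("unreachable: the loop always returns")
```

For toy lengths of 100, 200, 300, 400, and 500 bp (1,500 bp in total), the running sum passes 750 bp at the 400 bp transcript, so `n50([100, 200, 300, 400, 500])` returns 400. The value depends on which transcript set is summarized; computing it over all transcripts versus one representative per gene can give different results.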
Differentially expressed genes: Read counts per putative gene were calculated with Salmon (v1.9) by mapping the cleaned reads onto our de novo transcriptome. Genes with fewer than 15 reads across all samples were filtered out, and the R package “DESeq2” (v4.2) was used to identify differentially expressed genes in the following comparisons: aestivation vs. pre-aestivation, aestivation vs. post-aestivation, and pre-aestivation vs. post-aestivation (DESeq2 was also allowed to apply its default filtering). For each comparison, genes with an adjusted P value below 0.05 (for the test of the null hypothesis that the log2 fold change, LFC, equals 0) and an LFC below -1 or above 1 were considered significantly down- or up-regulated, respectively.
Enrichment analyses: The “enricher” function in the R package “clusterProfiler” was used to test the enrichment of GO terms and KEGG pathways associated with the differentially expressed genes in the three pairwise comparisons. All genes that had passed the filtering step before the DESeq2 analysis served as the background. We did not distinguish between up- and down-regulation in the enrichment analyses because of the ambiguous nature of the term and pathway annotations. The 14 most significantly enriched GO terms and the 3 most significantly enriched KEGG pathways are shown in the bubble plots (full enrichment results are provided in Fig. S). The dataset was also examined for the number of genes predicted to have signal peptides, transmembrane domains, both, or neither. The number of genes in each category was determined by manually inspecting the SQLite annotation database, and Chi-squared tests were used to compare the proportion of each category among differentially expressed genes with that among the background genes.
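The count pre-filter and the significance thresholds from the differential-expression step can be sketched in Python as below. This is an illustration, not the authors' R code: it assumes that "fewer than 15 reads across all samples" means the sum of counts over all samples, and it treats a missing adjusted P value (reported as NA by DESeq2, for example for genes removed by its independent filtering) as not significant. `DEResult`, `prefilter_counts`, and `classify` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DEResult:
    """One gene's result from a single pairwise comparison (hypothetical type)."""
    gene: str
    log2_fold_change: float
    padj: Optional[float]  # None stands in for DESeq2's NA


def prefilter_counts(
    counts: Dict[str, List[int]], min_total: int = 15
) -> Dict[str, List[int]]:
    """Drop genes with fewer than `min_total` reads summed over all samples.

    Assumption: the threshold applies to the total across samples.
    """
    return {gene: c for gene, c in counts.items() if sum(c) >= min_total}


def classify(result: DEResult, alpha: float = 0.05, lfc_cutoff: float = 1.0) -> str:
    """Label a gene 'up', 'down', or 'ns' using padj < alpha and |LFC| > cutoff."""
    if result.padj is None or result.padj >= alpha:
        return "ns"
    if result.log2_fold_change > lfc_cutoff:
        return "up"
    if result.log2_fold_change < -lfc_cutoff:
        return "down"
    return "ns"  # significant, but the fold change is too small
```

Applying `classify` to every gene in one comparison and counting the labels gives the numbers of up- and down-regulated genes for that comparison. Note that an LFC of exactly 1 or -1 is not counted, matching the strict "below -1" and "above 1" criteria.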
In the chi-squared tests, upregulated and downregulated genes were analyzed separately, and a Bonferroni correction was applied (P < .05/18 ≈ .0028). Genes from significantly enriched GO terms of interest were selected to visualize their expression at the three adult stages. A custom R script was used to Z-normalize the expression of each gene across the three adult stages, and GraphPad Prism v10.0 was used to construct the heat maps. Gene names were taken from the annotation database constructed in this study.
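Per-gene Z-normalization rescales each gene's values to mean 0 and standard deviation 1 across the three stages, so heat-map colors reflect relative rather than absolute expression. A minimal Python sketch follows. It assumes one expression value per gene per stage (for example, a stage mean of normalized counts; the source does not say which value was used) and the sample standard deviation (n - 1 denominator), as R's `sd()` and `scale()` use. The authors' custom R script may differ in these choices, and `z_normalize` is a hypothetical name.

```python
from statistics import mean, stdev
from typing import Dict, List


def z_normalize(expr: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Z-normalize each gene's values across stages (row-wise scaling).

    Assumes one value per stage, e.g. [pre, aestivation, post]. Uses the
    sample standard deviation (n - 1), like R's sd(). Genes with no variation
    are mapped to all zeros instead of dividing by zero (R's scale() would
    return NaN here).
    """
    scaled: Dict[str, List[float]] = {}
    for gene, values in expr.items():
        mu = mean(values)
        sd = stdev(values)  # requires at least two values
        if sd == 0:
            scaled[gene] = [0.0] * len(values)
        else:
            scaled[gene] = [(v - mu) / sd for v in values]
    return scaled
```

For a gene with stage values 1, 2, and 3, the mean is 2 and the sample standard deviation is 1, so the Z-scores are -1, 0, and 1.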
Provider: figshare
Created: 2023-11-10



