Genomic Variants, Transcriptome and Genome Annotation of I. puberulus
Source: DataCite Commons; updated 2024-06-10; indexed 2024-08-19
Download link:
https://figshare.com/articles/dataset/Genomic_Variants_Transcriptome_and_Genome_Annotation_of_I_puberulus/25607139/1
Description:
Transcriptome Annotation
The transcriptome was annotated using OmicsBox v3.0.30 (BioBam, Valencia, Spain). Contigs were first filtered to retain only those longer than 300 bp. These were then aligned against the NCBI non-redundant (NR) protein database for Viridiplantae using a stringent e-value cutoff of 10e−5. Further annotation was performed using InterProScan to identify protein domains, and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database to map Enzyme Commission (EC) terms and associated metabolic pathways. The annotation is provided as a tab-delimited text file.

Genome Annotation (Gene Prediction) of I. puberulus Using Haplophase 1
Repetitive regions in the genome were identified and soft-masked using RepeatModeler and RepeatMasker. RNA-seq data from two individuals per sampled population of I. puberulus were aligned to the genome using BBMap to aid in genome annotation. Structural annotation was conducted with the BRAKER2 pipeline, using the RNA-seq data as extrinsic evidence. The process focused on the 13 largest scaffolds of the I. puberulus genome. Two files are provided here: one with the annotation of all assembled scaffolds, and a second with the annotation of only the 13 largest, chromosome-scale scaffolds. Both are in GTF format.

Genomic Variants of I. puberulus from RNA-Seq Analysis
The data were generated using the STAR aligner for read mapping and the Genome Analysis Toolkit (GATK) for SNP calling and filtering. The dataset is intended to support analyses of genetic diversity and population structure in I. puberulus. Filtered RNA-Seq reads were aligned to the haplophase 1 reference genome of I. puberulus (BioProject PRJNA1095439) using STAR v2.7.9, following a two-pass mapping strategy to improve the accuracy of splice junction recovery and mapping. Variant calling was performed using GATK’s HaplotypeCaller, followed by variant filtration to ensure high-quality SNP discovery. Subsequent analyses for population genetic assessments were conducted using VCFtools. Quality control measures included duplicate marking, read group addition, and stringent SNP filtering to exclude SNP clusters and low-confidence variants. Only high-confidence, biallelic SNPs with sufficient read depth and allele frequency were retained. This dataset is in VCF format.
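As an illustration of what "biallelic SNPs with sufficient read depth" means at the level of a VCF record, the following is a minimal Python sketch, not the authors' GATK/VCFtools workflow. It keeps header lines and only those records whose REF and ALT are each a single base and whose INFO field reports a depth of at least a chosen value. The function names, the use of the site-level INFO DP tag, and the min_depth default of 10 are assumptions made for this example; the description does not state the thresholds actually applied, and the allele-frequency criterion is not covered here.

```python
BASES = {"A", "C", "G", "T"}


def is_biallelic_snp(ref, alt):
    """True when REF and ALT are each one A/C/G/T base and differ.

    A multi-allelic ALT such as "G,T", an indel, or "." fails this test.
    """
    ref, alt = ref.upper(), alt.upper()
    return ref in BASES and alt in BASES and ref != alt


def info_depth(info):
    """Return the integer DP value from a VCF INFO string, or None if absent."""
    for item in info.split(";"):
        if item.startswith("DP="):
            try:
                return int(item[3:])
            except ValueError:
                return None
    return None


def filter_vcf_lines(lines, min_depth=10):
    """Yield header lines unchanged, plus biallelic SNP records with DP >= min_depth.

    min_depth is a hypothetical threshold chosen for illustration only.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:  # a VCF record has at least 8 fixed columns
            continue
        ref, alt, info = fields[3], fields[4], fields[7]
        depth = info_depth(info)
        if is_biallelic_snp(ref, alt) and depth is not None and depth >= min_depth:
            yield line


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "scaffold_1\t10\t.\tA\tG\t50\tPASS\tDP=25\n",    # kept
        "scaffold_1\t20\t.\tA\tG,T\t50\tPASS\tDP=25\n",  # multi-allelic: dropped
        "scaffold_1\t30\t.\tAT\tA\t50\tPASS\tDP=25\n",   # indel: dropped
        "scaffold_1\t40\t.\tC\tT\t50\tPASS\tDP=3\n",     # low depth: dropped
    ]
    for kept in filter_vcf_lines(example):
        print(kept, end="")
```

To apply the same check to a downloaded file, the lines of an opened text file (or of a gzip.open(..., "rt") handle for a compressed VCF) can be passed to filter_vcf_lines in place of the in-memory list.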
Provider:
figshare
Created:
2024-04-15



