Genomic Variants, Transcriptome and Genome Annotation of I. puberulus

Source: DataCite Commons. Updated 2024-06-10; indexed 2024-08-19.
Download link:
https://figshare.com/articles/dataset/Genomic_Variants_Transcriptome_and_Genome_Annotation_of_I_puberulus/25607139/1
Description:
Transcriptome Annotation
The transcriptome was annotated using OmicsBox v3.0.30 (BioBam, Valencia, Spain). Contigs were first filtered to retain only those longer than 300 bp. The retained contigs were then aligned against the Viridiplantae subset of the NCBI non-redundant (NR) protein database using a stringent e-value cutoff of 10e−5. Further annotation used InterProScan to identify protein domains, and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database to map Enzyme Commission (EC) numbers and their associated metabolic pathways. This annotation is provided as a tab-delimited text file.

Genome Annotation (Gene Prediction) of I. puberulus Using Haplophase 1
Repetitive regions in the genome were identified and soft-masked using RepeatModeler and RepeatMasker. RNA-seq data from two individuals per sampled population of I. puberulus were aligned to the genome with BBMap to support annotation. Structural annotation was performed with the BRAKER2 pipeline, using the RNA-seq data as extrinsic evidence, and focused on the 13 largest scaffolds of the I. puberulus genome. Two files are provided, both in GTF format: one with the annotation of all assembled scaffolds, and one restricted to the 13 largest, chromosome-scale scaffolds.

Genomic Variants of I. puberulus from RNA-Seq Analysis
The data were generated using the STAR aligner for read mapping and the Genome Analysis Toolkit (GATK) for SNP calling and filtering. The dataset is intended to provide insight into the genetic diversity and structure of I. puberulus populations. Filtered RNA-seq reads were aligned to the haplophase 1 reference genome of I. puberulus (BioProject PRJNA1095439) using STAR v2.7.9 with a two-pass mapping strategy, which improves splice-junction recovery and mapping accuracy. Variants were called with GATK HaplotypeCaller and then filtered to retain high-quality SNPs.
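The mapping and calling steps described above can be sketched as an ordered list of command lines for one sample. This is a minimal sketch, not the authors' script: the description gives no parameter values, so the file names, sample and library labels, thread count, and the assumption of gzip-compressed paired-end FASTQ input are all hypothetical, as are the function names. It includes only steps the text names: STAR two-pass mapping, the read-group addition and duplicate marking listed under quality control, and HaplotypeCaller. The functions only build argument lists; nothing is executed.

```python
from typing import List
import shlex


def star_two_pass(genome_dir: str, r1: str, r2: str, prefix: str, threads: int = 8) -> List[str]:
    """STAR alignment using per-sample two-pass mode (hypothetical helper)."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,     # STAR index of the haplophase 1 assembly, built beforehand (path hypothetical)
        "--readFilesIn", r1, r2,       # assumes paired-end reads
        "--readFilesCommand", "zcat",  # assumes gzip-compressed FASTQ
        "--twopassMode", "Basic",      # junctions found in pass 1 are inserted into the index before final mapping
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", prefix,
    ]


def variant_pipeline(sample: str, ref_fa: str, genome_dir: str, r1: str, r2: str) -> List[List[str]]:
    """Return the command lines, in order, for one sample (all file names hypothetical)."""
    prefix = f"{sample}_"
    star_bam = f"{prefix}Aligned.sortedByCoord.out.bam"  # STAR's default name for coordinate-sorted BAM
    rg_bam = f"{sample}.rg.bam"
    dedup_bam = f"{sample}.dedup.bam"
    return [
        star_two_pass(genome_dir, r1, r2, prefix),
        # Read-group addition; library/platform/unit labels are placeholders
        ["gatk", "AddOrReplaceReadGroups", "-I", star_bam, "-O", rg_bam,
         "--RGID", sample, "--RGLB", "lib1", "--RGPL", "ILLUMINA",
         "--RGPU", "unit1", "--RGSM", sample],
        # Duplicate marking; also writes the BAM index that HaplotypeCaller needs
        ["gatk", "MarkDuplicates", "-I", rg_bam, "-O", dedup_bam,
         "-M", f"{sample}.dup_metrics.txt", "--CREATE_INDEX", "true"],
        # Variant calling against the reference FASTA (expects .fai and .dict files next to it)
        ["gatk", "HaplotypeCaller", "-R", ref_fa, "-I", dedup_bam,
         "-O", f"{sample}.vcf.gz"],
    ]


if __name__ == "__main__":
    for cmd in variant_pipeline("popA_ind1", "hap1.fa", "star_index_hap1",
                                "popA_ind1_R1.fq.gz", "popA_ind1_R2.fq.gz"):
        print(shlex.join(cmd))
```

GATK's RNA-seq short-variant workflow normally also runs SplitNCigarReads before HaplotypeCaller. The description does not mention it, so it is left out here.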
Subsequent population genetic analyses were conducted with VCFtools. Quality control included duplicate marking, read-group addition, and stringent SNP filtering to exclude clustered and low-confidence variants. Only high-confidence, biallelic SNPs with sufficient read depth and allele frequency were retained. This dataset is provided in VCF format.
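The final SNP filter described above (biallelic SNPs with sufficient depth and allele frequency, no clustered or low-confidence calls) can be sketched as a small streaming VCF filter. This is a minimal sketch under stated assumptions, not the authors' procedure, which used GATK and VCFtools. The function names are hypothetical. The thresholds `min_depth=10` and `min_maf=0.05` are also hypothetical, because the description gives no values. Site depth is read from the INFO `DP` field and allele frequency from INFO `AF`, both of which HaplotypeCaller writes. Clustered or low-confidence sites are assumed to have already been flagged in the FILTER column by an upstream filter. VCFtools options such as `--remove-indels`, `--min-alleles 2 --max-alleles 2` and `--maf` cover broadly similar ground.

```python
from typing import Dict, Iterable, Iterator


def parse_info(info: str) -> Dict[str, str]:
    """Turn an INFO string like 'DP=25;AF=0.5;DB' into {'DP': '25', 'AF': '0.5', 'DB': ''}."""
    out: Dict[str, str] = {}
    if info == ".":
        return out
    for field in info.split(";"):
        key, _, value = field.partition("=")
        out[key] = value
    return out


def keep_biallelic_snps(lines: Iterable[str], min_depth: int = 10,
                        min_maf: float = 0.05) -> Iterator[str]:
    """Yield header lines plus records passing hypothetical biallelic-SNP thresholds."""
    for line in lines:
        if line.startswith("#"):
            yield line  # header lines pass through unchanged
            continue
        cols = line.rstrip("\n").split("\t")
        ref, alt, filt = cols[3], cols[4], cols[6]
        # Biallelic SNP: single-base REF and exactly one single-base ALT
        # (multi-allelic ALTs such as 'G,T' and indels fail the length check)
        if len(ref) != 1 or len(alt) != 1 or alt in (".", "*"):
            continue
        # Drop sites flagged upstream, e.g. by a SNP-cluster filter (assumption)
        if filt not in ("PASS", "."):
            continue
        info = parse_info(cols[7])
        if "DP" not in info or int(info["DP"]) < min_depth:
            continue
        if "AF" not in info:
            continue
        af = float(info["AF"])
        if min(af, 1.0 - af) < min_maf:  # minor allele frequency threshold
            continue
        yield line
```

For a bgzip-compressed VCF, the input lines can come from `gzip.open(path, "rt")`, since bgzip output is readable as ordinary gzip.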
Provider:
figshare
Created:
2024-04-15