Halyomorpha halys Official Gene Set v1.2
Source: NIAID Data Ecosystem (indexed 2026-04-25)
Download link:
https://figshare.com/articles/dataset/Halyomorpha_halys_Official_Gene_Sets_v1_0_and_v1_1/25060844
Description:
This dataset presents the Halyomorpha halys Official Gene Set (OGS) v1.2. OGSv1.2 is an update of Halyomorpha halys OGSv1.1 (https://doi.org/10.15482/USDA.ADC/1504240) to the coordinates of genome assembly GCA_000696795.3 (https://www.ncbi.nlm.nih.gov/assembly/GCA_000696795.3) using https://github.com/NAL-i5K/coordinates_conversion/.
The original OGSv1.0 is an integration of automatic gene predictions from NCBI's eukaryotic annotation pipeline, NCBI Halyomorpha halys Annotation Release 100 (https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Halyomorpha_halys/100/; ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/696/795/GCF_000696795.1_Hhal_1.0), with manual annotations by the research community (performed via the Apollo manual curation software, http://genomearchitect.org/). Manual annotations performed by the community were downloaded from Apollo, QC'd, and merged with NCBI Halyomorpha halys Annotation Release 100 using the GFF3toolkit software (https://github.com/NAL-i5K/GFF3toolkit/releases/tag/v1.4.4). The resulting merged dataset was formatted for ingest into the i5k Workspace and GenBank databases, resulting in Halyomorpha halys Official Gene Set (OGS) v1.0.
Halyomorpha halys Official Gene Set halhal_OGSv1.1 is a minor update of halhal_OGSv1.0: Alias attributes were added to all manually annotated cathepsin models; six models from contaminated scaffolds were removed; and notes were added to three models located on possibly contaminated scaffolds.
Resources in this dataset:
Resource Title: Halyomorpha halys Official Gene Set OGSv1.2.
File Name: halhal_OGSv1.2.tar.gz
Resource Description: The attached tar.gz archive (halhal_OGSv1.2.tar.gz) contains the following files:
halhal_OGSv1.2.gff. GFF3 file of all gene predictions in the Halyomorpha halys genome annotation OGSv1.2
halhal_OGSv1.2_CDS.fa. CDS sequences of the Halyomorpha halys genome annotation OGSv1.2
halhal_OGSv1.2_pep.fa. Amino acid sequences of the Halyomorpha halys genome annotation OGSv1.2
halhal_OGSv1.2_trans.fa. Transcript sequences of the Halyomorpha halys genome annotation OGSv1.2
readme. Readme file describing the Halyomorpha halys genome annotation OGSv1.2
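As a quick sanity check after downloading, the archive contents and the GFF3 file can be inspected with the Python standard library. The sketch below is illustrative only and is not part of the dataset. It assumes halhal_OGSv1.2.tar.gz sits in the working directory, and the helper names list_archive_members and count_feature_types are hypothetical. It relies only on the general GFF3 layout (nine tab-separated columns, feature type in column 3), not on anything specific to this gene set.

```python
import tarfile
from collections import Counter
from pathlib import Path


def list_archive_members(archive_path):
    """Return the names of the files stored in a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return archive.getnames()


def count_feature_types(gff_lines):
    """Count GFF3 feature types (column 3), skipping comments and blank lines."""
    counts = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section, if any, ends the feature table
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a well-formed GFF3 feature line
        counts[columns[2]] += 1
    return counts


if __name__ == "__main__":
    # Assumed location of the downloaded archive (hypothetical path).
    archive = Path("halhal_OGSv1.2.tar.gz")
    if archive.exists():
        for name in list_archive_members(archive):
            print(name)
```

Once the archive is extracted, passing an open handle to halhal_OGSv1.2.gff into count_feature_types gives a per-type tally (gene, mRNA, CDS, and so on) that can be compared against the counts reported in the readme, if it lists any.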
Created:
2020-06-03



