
Data_Sheet_1_An Educational Bioinformatics Project to Improve Genome Annotation.docx

NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://figshare.com/articles/dataset/Data_Sheet_1_An_Educational_Bioinformatics_Project_to_Improve_Genome_Annotation_docx/13339274
Description:
Scientific advancement is hindered without proper genome annotation because biologists lack a complete understanding of cellular protein functions. In bacterial cells, hypothetical proteins (HPs) are open reading frames with unknown functions. HPs result from either an outdated database or insufficient experimental evidence (i.e., indeterminate annotation). While automated annotation reviews help keep genome annotation up to date, manual reviews are often needed to verify proper annotation. Students can provide the manual review necessary to improve genome annotation. This paper outlines an innovative classroom project that determines whether HPs have outdated or indeterminate annotation. The Hypothetical Protein Characterization Project uses multiple well-documented, freely available, web-based bioinformatics resources that analyze an amino acid sequence to (1) detect sequence similarities to other proteins, (2) identify domains, (3) predict tertiary structure including active site characterization and potential binding ligands, and (4) determine cellular location. Enough evidence can be generated from these analyses to support re-annotation of HPs or prioritize HPs for experimental examinations such as structural determination via X-ray crystallography. Additionally, this paper details several approaches for selecting HPs to characterize using the Hypothetical Protein Characterization Project. These approaches include student- and instructor-directed random selection, selection using differential gene expression from mRNA expression data, and selection based on phylogenetic relations. This paper also provides additional resources to support instructional use of the Hypothetical Protein Characterization Project, such as example assignment instructions with grading rubrics, links to training videos on YouTube, and several step-by-step example projects to demonstrate and interpret the range of achievable results that students might encounter.
Educational use of the Hypothetical Protein Characterization Project provides students with an opportunity to learn and apply knowledge of bioinformatic programs to address scientific questions. The project is highly customizable in that HP selection and analysis can be specifically formulated based on the scope and purpose of each student’s investigations. Programs used for HP analysis can be easily adapted to course learning objectives. The project can be used in both online and in-seat instruction for a wide variety of undergraduate and graduate classes as well as undergraduate capstone, honors, and experiential learning projects.
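
The random-selection approach mentioned in the description can be illustrated with a minimal sketch. It assumes the genome's protein records are available as FASTA text and that unannotated entries carry the phrase "hypothetical protein" in their header line; that is a common convention, but the description does not specify it. The function names and the sample records are hypothetical placeholders, not part of the project's materials.

```python
import random


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_hypothetical_proteins(records):
    """Keep records whose header mentions 'hypothetical protein' (assumed convention)."""
    return [(h, s) for h, s in records if "hypothetical protein" in h.lower()]


def pick_random(candidates, k, seed=None):
    """Randomly pick up to k candidates; a seed makes the pick reproducible."""
    rng = random.Random(seed)
    return rng.sample(candidates, min(k, len(candidates)))


# Invented placeholder records for illustration only; not real annotations.
EXAMPLE_FASTA = """>prot_001 DNA gyrase subunit A
MSDLAREITPVNIEEELK
>prot_002 hypothetical protein
MKKLLPTAAAGLLLLAAQPAMA
>prot_003 Hypothetical protein
MTTQAPTFTQPLLE
"""

if __name__ == "__main__":
    hps = find_hypothetical_proteins(parse_fasta(EXAMPLE_FASTA))
    for header, seq in pick_random(hps, k=1, seed=0):
        print(header, len(seq))
```

A selected sequence would then be submitted to the web-based tools for the four analyses listed in the description. Fixing the seed lets an instructor reproduce the same assignment of HPs to students.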

Created:
2020-12-07