A New Public Dataset of Fungal Biosynthetic Gene Clusters
Source: arXiv, updated 2020-01-10, indexed 2024-06-21
Download link:
https://github.com/bioinfoUQAM/fungalbgcdata
Resource description:
This resource is a newly public dataset of fungal biosynthetic gene clusters (BGCs), created at the University of Montreal, Canada. It contains 200 fungal BGC instances and is intended to support gene cluster discovery with supervised learning. The data span multiple fungal genera and BGC types and are designed to mimic fungal genome profiles, with the aim of improving supervised learning performance on fungal BGC classification tasks. Positive instances were extracted from the MIBiG database; synthetic negative instances were generated from homologous genes in the OrthoDB database. The main application is improving the supervised prediction of fungal BGCs, in support of the discovery of novel bioactive compounds.
Provided by:
University of Montreal, Canada
Created:
2020-01-10
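
The resource description says that positive instances come from MIBiG and that synthetic negatives are generated from OrthoDB genes. The sketch below shows one way such a labeled set could be assembled for a supervised classifier. It is not the authors' procedure: the function names, the gene identifiers, the size matching between each negative and a positive cluster, and the uniform random sampling are all assumptions made for illustration.

```python
import random


def make_synthetic_negatives(positive_clusters, gene_pool, seed=0):
    """Build one synthetic negative per positive cluster (hypothetical helper).

    Assumption: each negative has as many genes as its positive counterpart
    and is drawn at random from gene_pool, skipping any gene that already
    appears in a positive cluster.
    """
    rng = random.Random(seed)
    positive_genes = {gene for cluster in positive_clusters for gene in cluster}
    # Sort so the result depends only on the seed, not on input ordering.
    candidates = sorted(g for g in set(gene_pool) if g not in positive_genes)
    negatives = []
    for cluster in positive_clusters:
        if len(cluster) > len(candidates):
            raise ValueError("gene pool too small for a cluster of this size")
        # Sample without replacement within a single synthetic cluster.
        negatives.append(rng.sample(candidates, len(cluster)))
    return negatives


def build_labeled_dataset(positive_clusters, negative_clusters):
    """Pair each cluster with a label: 1 for a BGC, 0 for a synthetic negative."""
    labeled = [(list(c), 1) for c in positive_clusters]
    labeled += [(list(c), 0) for c in negative_clusters]
    return labeled


# Made-up identifiers; real entries would come from MIBiG and OrthoDB files.
positives = [["pksA", "tfR1", "mfs1"], ["nrps1", "p450a"]]
pool = ["og%03d" % i for i in range(20)] + ["pksA"]

negatives = make_synthetic_negatives(positives, pool, seed=42)
dataset = build_labeled_dataset(positives, negatives)
```

Fixing the seed keeps the synthetic negatives reproducible across runs, which helps when comparing classifiers trained on the same split.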



