Generative Artificial Intelligence GPT‑4 Accelerates Knowledge Mining and Machine Learning for Synthetic Biology
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://figshare.com/articles/dataset/Generative_Artificial_Intelligence_GPT_4_Accelerates_Knowledge_Mining_and_Machine_Learning_for_Synthetic_Biology/24106943
Description:
Knowledge mining from synthetic biology journal articles for machine learning (ML) applications is a labor-intensive process. Natural language processing (NLP) tools such as GPT-4 can accelerate the extraction of published information on microbial performance under complex strain engineering and bioreactor conditions. As a proof of concept, we developed a prompt-engineered GPT-4 workflow pipeline to extract knowledge from 176 publications on two oleaginous yeasts (Yarrowia lipolytica and Rhodosporidium toruloides). After human intervention, the pipeline yielded a total of 2037 data instances. The structured data sets and feature selection enabled ML approaches (e.g., a random forest model) to predict Yarrowia fermentation titers with an R² of 0.86 on unseen test data. Via transfer learning, the trained model could also assess the production potential of engineered R. toruloides, a nonconventional yeast with fewer published reports. This work demonstrates the potential of generative artificial intelligence to streamline information extraction from research articles, thereby facilitating fermentation predictions and biomanufacturing development.
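The abstract summarizes model quality as R² = 0.86 on unseen test data. As a reminder of what that metric measures, here is a minimal sketch of the coefficient of determination, R² = 1 − SS_res / SS_tot, computed on held-out observations. This is not the authors' evaluation code, which the description does not include. The function name `r_squared` and the titer values are hypothetical, chosen only for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared error of always predicting the test-set mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out titers and model predictions (illustrative only, not from the dataset).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(r_squared(observed, predicted))  # 0.9 for these illustrative values
```

Read this way, an R² of 0.86 means the model's squared prediction error on the test set is about 14% of the variance of the observed titers around their mean. A score of 0 corresponds to simply predicting that mean. For real workflows, scikit-learn's `sklearn.metrics.r2_score` computes the same quantity.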
Created:
2023-09-08



