EnzChemRED
arXiv · Updated 2024-04-22 · Indexed 2024-06-21
Download link:
https://ftp.expasy.org/databases/rhea/nlp/
Overview:
EnzChemRED was created by the National Center for Biotechnology Information (NCBI). It contains 1,210 expert-curated PubMed abstracts annotated for enzyme chemistry relation extraction. The annotations cover enzymes and the chemical reactions they catalyze, normalized to identifiers from the UniProt Knowledgebase (UniProtKB) for enzymes and from Chemical Entities of Biological Interest (ChEBI) for chemicals. Using the TeamTat annotation tool, experts annotated at a fine-grained level each chemical conversion described in an abstract together with the enzyme that catalyzes it. The dataset is intended to support the development of natural language processing (NLP) methods, including large language models (LLMs), that assist enzyme annotation and extract knowledge of enzyme function from the scientific literature.
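Each annotation links an enzyme (a UniProtKB accession) to the chemical conversion it catalyzes (ChEBI identifiers for the chemicals involved). As a minimal sketch, one such relation could be held in a Python record like the one below. The class and field names are hypothetical and do not describe the actual file format published at the download link; only the identifier patterns follow the documented UniProtKB accession and ChEBI ID formats.

```python
import re
from dataclasses import dataclass

# UniProtKB accession pattern as documented by UniProt (6- or 10-character accessions).
UNIPROT_ACCESSION = re.compile(
    r"^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
# ChEBI identifiers are the prefix "CHEBI:" followed by an integer.
CHEBI_ID = re.compile(r"^CHEBI:[0-9]+$")


@dataclass(frozen=True)
class EnzymeConversion:
    """Hypothetical record: one enzyme-catalyzed conversion mentioned in an abstract."""

    pmid: str                    # PubMed ID of the source abstract
    enzyme: str                  # UniProtKB accession of the catalyzing enzyme
    substrates: tuple[str, ...]  # ChEBI IDs of the chemicals consumed
    products: tuple[str, ...]    # ChEBI IDs of the chemicals produced

    def __post_init__(self) -> None:
        # Reject malformed identifiers early so downstream NLP code sees clean IDs.
        if not self.pmid.isdigit():
            raise ValueError(f"not a PubMed ID: {self.pmid!r}")
        if not UNIPROT_ACCESSION.match(self.enzyme):
            raise ValueError(f"not a UniProtKB accession: {self.enzyme!r}")
        for chem in self.substrates + self.products:
            if not CHEBI_ID.match(chem):
                raise ValueError(f"not a ChEBI ID: {chem!r}")


if __name__ == "__main__":
    # Format-valid placeholder identifiers, not annotations taken from EnzChemRED.
    rec = EnzymeConversion("12345678", "P12345", ("CHEBI:15377",), ("CHEBI:15378",))
    print(rec)
```

Validating identifiers in `__post_init__` keeps the record immutable while catching malformed IDs (for example, a bare `15377` without the `CHEBI:` prefix) at construction time.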
Provided by:
National Center for Biotechnology Information (NCBI)
Created:
2024-04-22



