BactInt Sentence Corpus
Source: arXiv · 2023-04-27 · Updated 2024-08-06
Download link:
http://arxiv.org/abs/2305.07468v1
Resource summary:
The BactInt Sentence Corpus was developed by researchers at TCS Research and IIT Delhi as a dataset for automatically extracting inter-bacterial interactions from biomedical literature. It contains approximately 1,400 sentences manually annotated by biomedical domain experts, covering bacterial interactions in environments ranging from the human host to marine ecosystems. The corpus was built in three stages: PubMed abstracts mentioning multiple bacteria were selected, these were further filtered with a dedicated classifier, and the remaining sentences were annotated by experts using the BRAT annotation tool. The dataset targets applications in medicine, agriculture, and nutrition, where a better understanding of microbial community structure and of the roles microbes play in their environments could benefit both industry and healthcare.
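For readers who want to load the annotations programmatically, the following is a minimal sketch of a reader for BRAT standoff files. It assumes the corpus is distributed in BRAT's standard `.ann` standoff format (the format the BRAT tool writes); the function name `parse_brat_ann`, the entity type `Bacteria`, and the relation type `Interacts` are hypothetical placeholders, not the corpus's confirmed schema. Only text-bound (`T`) and relation (`R`) lines are parsed; events, attributes, equivalences, and annotator notes are skipped.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    id: str
    type: str
    start: int  # character offset into the matching .txt file
    end: int
    text: str


@dataclass
class Relation:
    id: str
    type: str
    args: dict  # e.g. {"Arg1": "T1", "Arg2": "T2"}


@dataclass
class Annotation:
    entities: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)


def parse_brat_ann(content: str) -> Annotation:
    """Hypothetical helper: parse T and R lines of a BRAT standoff .ann file."""
    ann = Annotation()
    for line in content.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # malformed line; ignore
        ann_id, body, rest = fields[0], fields[1], fields[2:]
        if ann_id.startswith("T"):
            # T1<TAB>Type start end<TAB>surface text
            # Discontinuous spans look like "0 5;10 15"; keep the outer bounds only.
            type_, spans = body.split(" ", 1)
            offsets = [int(x) for part in spans.split(";") for x in part.split()]
            surface = rest[0] if rest else ""
            ann.entities[ann_id] = Entity(ann_id, type_, min(offsets), max(offsets), surface)
        elif ann_id.startswith("R"):
            # R1<TAB>Type Arg1:T1 Arg2:T2
            type_, *arg_parts = body.split()
            args = dict(part.split(":", 1) for part in arg_parts)
            ann.relations.append(Relation(ann_id, type_, args))
        # E (event), A/M (attribute), * (equivalence), # (note) lines are skipped.
    return ann


if __name__ == "__main__":
    # Invented example; types "Bacteria" and "Interacts" are placeholders.
    sentence = "Bacillus subtilis inhibits Staphylococcus aureus."
    sample = (
        "T1\tBacteria 0 17\tBacillus subtilis\n"
        "T2\tBacteria 27 48\tStaphylococcus aureus\n"
        "R1\tInteracts Arg1:T1 Arg2:T2\n"
    )
    ann = parse_brat_ann(sample)
    for rel in ann.relations:
        head = ann.entities[rel.args["Arg1"]]
        tail = ann.entities[rel.args["Arg2"]]
        print(rel.type, sentence[head.start:head.end], "->", sentence[tail.start:tail.end])
```

BRAT offsets index into the accompanying `.txt` file, so slicing the raw text with `start:end` should reproduce each entity's surface string, which is a cheap consistency check when loading a new release.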
Providing institutions:
TCS Research, Pune, India; and School of Information Technology, IIT Delhi, India
Created:
2023-04-27



