
Using text mining to link journal articles to neuroanatomical databases

Source: DataONE. Updated 2019-03-05; indexed 2024-06-08.
Download link:
https://search.dataone.org/view/sha256:c66a121db740498ab263fd4125ac5f0aff89aabb94e99f7a6cb335ef5fb4b2cf
Description:
The electronic linking of neuroscience information, including data embedded in the primary literature, would permit powerful queries and analyses driven by structured databases. This task would be facilitated by automated procedures that can identify biological concepts in journals. Here we apply an approach for automatically mapping formal identifiers of neuroanatomical regions to text found in journal abstracts, applying it to a large body of abstracts from the Journal of Comparative Neurology (JCN). The analyses yield over 100,000 brain region mentions, which we map to 8,225 brain region concepts in multiple organisms. Based on the analysis of a manually annotated corpus, we estimate mentions are mapped at 95% precision and 63% recall. Our results provide insights into the patterns of publication on brain regions and species of study in JCN but also point to important challenges in the standardization of neuroanatomical nomenclatures. We find that many terms in the formal terminologies never appear in a JCN abstract, and, conversely, many terms that authors use are not reflected in the terminologies. To improve the terminologies, we deposited 136 unrecognized brain regions into the Neuroscience Lexicon (NeuroLex). The training data, terminologies, normalizations, evaluations, and annotated journal abstracts are freely available at http://www.chibi.ubc.ca/WhiteText/.
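
The precision and recall figures quoted above come from comparing automatic mappings against a manually annotated corpus. A minimal sketch of that kind of comparison follows. It assumes each mention is represented as an (abstract ID, mention text, concept ID) tuple; the function name, identifiers, and toy data are hypothetical and are not taken from the WhiteText release.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of
    (abstract_id, mention_text, concept_id) tuples."""
    predicted, gold = set(predicted), set(gold)
    # A prediction counts as correct only if the same mention is mapped
    # to the same concept in the manual annotations.
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy annotations (identifiers are made up for illustration).
gold = {
    ("a1", "hippocampus", "R:001"),
    ("a1", "dentate gyrus", "R:002"),
    ("a2", "superior colliculus", "R:003"),
    ("a2", "optic tectum", "R:004"),
}
predicted = {
    ("a1", "hippocampus", "R:001"),
    ("a2", "superior colliculus", "R:003"),
    ("a2", "thalamus", "R:005"),  # not in the gold set: a false positive
}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

In this toy case two of three predicted mappings are correct (precision 0.67) and two of four annotated mentions are recovered (recall 0.50). High precision with lower recall, as reported for the dataset, would mean that most emitted mappings are right while a sizeable share of annotated mentions are missed.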

Created: 2023-12-28