
MilkOligoCorpus, a rich semantic annotated resource for milk oligosaccharide complex information extraction

Recherche Data Gouv France | Updated 2025-01-01 | Indexed 2026-04-09
Download link:
https://entrepot.recherche.data.gouv.fr/citation?persistentId=doi:10.57745/LFXGFO
Description:
The MilkOligoCorpus is a dataset of 30 PubMed abstracts and full-text extracts from scientific articles on the composition of milk oligosaccharides in mammalian species, manually annotated for training and evaluating information extraction tools. The corpus is designed to support the development and assessment of tools for named entity recognition, entity linking, and relation extraction that capture the variability of milk oligosaccharide profiles. Named entity linking is essential for integrating information from diverse sources: it maps entity mentions to standard categories and associates them with unique identifiers. Thus, alongside the corpus annotation, we developed four semantic resources to address the absence of existing ontologies for several entities: (i) the Female parity thesaurus, (ii) the sample thesaurus, (iii) the MO methods thesaurus, and (iv) the Oligo type thesaurus, available at https://doi.org/10.57745/RA5DAC. An annotation schema was also developed that identifies the entities of interest and establishes relations between them. This schema, together with the guidelines, a 66-page document giving instructions on how to perform the annotations and available in the repository Z, serves as the foundation for the manual annotations. This archive includes: (i) the HoloOligo corpus dataset, (ii) the list of the documents annotated in the HoloOligo corpus, (iii) the three thesauri required for the manual annotation, which are not available elsewhere, and (iv) the annotation schema. An article detailing the development of the annotation schema and the creation of the gold standard corpus will be submitted to PLOS One.
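The entity linking step described above (mapping a mention to a standard category with a unique identifier) can be illustrated with a minimal dictionary-lookup sketch. Everything in it is an assumption made for illustration: the `Concept` structure, the `TOY:` identifiers, and the two toy entries are invented and do not come from the MilkOligoCorpus thesauri, whose actual format and contents are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """One thesaurus entry (hypothetical structure, not the corpus format)."""
    identifier: str          # invented ID scheme, for illustration only
    label: str               # preferred label
    synonyms: tuple = ()     # alternative surface forms


# Toy entries invented for this sketch; not taken from the real thesauri.
TOY_THESAURUS = [
    Concept("TOY:0001", "colostrum", ("first milk",)),
    Concept("TOY:0002", "mature milk"),
]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so surface variants compare equal."""
    return " ".join(text.lower().split())


def build_index(concepts):
    """Map every normalized label and synonym to its concept."""
    index = {}
    for concept in concepts:
        for name in (concept.label, *concept.synonyms):
            index[normalize(name)] = concept
    return index


def link_mention(mention: str, index):
    """Return the concept matching a mention, or None if it is not covered."""
    return index.get(normalize(mention))


index = build_index(TOY_THESAURUS)
linked = link_mention("First  Milk", index)
unlinked = link_mention("whey permeate", index)
```

Exact lookup after normalization is only the simplest possible baseline. Tools trained and evaluated on a manually annotated corpus would typically also need to handle abbreviations, spelling variants, and mentions that depend on context.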
Created:
2025-01-01