
Food and Drugs

arXiv | Updated 2022-04-20 | Indexed 2024-08-06
Download link:
http://arxiv.org/abs/2204.09081v1
Description:
Food and Drugs is a dataset developed at the University of Alberta. It is a partially annotated corpus, extracted semi-automatically from Wikipedia, for named entity recognition (NER) of new entity categories. It contains 500 sentences, addressing the food and drug categories respectively. During construction, Wikipedia's category system was used to filter articles and sentences so that the data stays relevant. The dataset is mainly used to test and validate methods for training NER models from partially annotated data, as an alternative to fully manual annotation, which is time-consuming and costly.
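
The description above does not spell out the extraction procedure, so the following is only a minimal sketch of what category-based partial annotation could look like. It assumes each sentence comes with hyperlink spans and that a lookup from article title to Wikipedia categories is available. The names `Link`, `annotate`, `UNKNOWN`, and the toy data are hypothetical and are not taken from the dataset or its paper.

```python
from dataclasses import dataclass

UNKNOWN = "?"  # token whose label was not observed (partial annotation)


@dataclass
class Link:
    start: int   # index of the first token covered by the hyperlink anchor
    end: int     # index one past the last covered token
    target: str  # title of the linked Wikipedia article


def annotate(tokens, links, article_categories, wanted, label):
    """Return partial BIO tags, or None if no link points into `wanted`."""
    tags = [UNKNOWN] * len(tokens)
    found = False
    for link in links:
        # Keep a link only if its target article is in a wanted category.
        if wanted & article_categories.get(link.target, set()):
            tags[link.start] = f"B-{label}"
            for i in range(link.start + 1, link.end):
                tags[i] = f"I-{label}"
            found = True
    return tags if found else None


# Toy data, invented for illustration only.
article_categories = {
    "Aspirin": {"Drugs"},
    "Germany": {"Countries"},
}
tokens = ["Aspirin", "was", "first", "sold", "in", "Germany", "."]
links = [Link(0, 1, "Aspirin"), Link(5, 6, "Germany")]

tags = annotate(tokens, links, article_categories, {"Drugs"}, "DRUG")
print(list(zip(tokens, tags)))
```

Tokens outside a matching link stay marked as unknown rather than as non-entities, which is the sense in which such data is partially annotated: a missing label is not treated as evidence that the token is outside every entity.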
Provider:
University of Alberta
Created:
2022-04-20