Food and Drugs

Name: Food and Drugs
Creator: University of Alberta
Published: 2022-04-20 02:17:09
License: No description available

Source: arXiv. Updated: 2022-04-20. Indexed: 2024-08-06.

Download link:

http://arxiv.org/abs/2204.09081v1

Description:

This study introduces Food and Drugs, a dataset developed at the University of Alberta. It was extracted semi-automatically from Wikipedia as a partially annotated corpus for named entity recognition (NER) of new entity categories. It contains 500 sentences for the food and drug categories, respectively. During construction, Wikipedia's category system was used to select articles and sentences so that the data stays relevant to each category. The dataset is mainly used to test and validate methods for training NER models from partially annotated data, which addresses the time and cost of building fully manually annotated datasets.
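The category-based filtering and partial annotation described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the description does not say how entity mentions were identified, so matching the titles of selected articles is an assumption made here, and the names `Article`, `select_articles`, `partially_annotate`, and the `?` tag for unannotated tokens are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

UNKNOWN = "?"  # hypothetical tag: token is unannotated, it may or may not be an entity


@dataclass
class Article:
    """A simplified stand-in for a Wikipedia article (assumed structure)."""
    title: str
    categories: set[str]
    sentences: list[str]


def select_articles(articles: list[Article], target_categories: set[str]) -> list[Article]:
    """Keep only articles that belong to at least one target category."""
    return [a for a in articles if a.categories & target_categories]


def partially_annotate(sentence: str, known_entities: list[str], label: str) -> list[tuple[str, str]]:
    """Tag tokens that match a known entity with B-/I- tags; leave the rest unknown."""
    tokens = sentence.split()
    # Compare case-insensitively and ignore trailing punctuation on each token.
    lowered = [t.lower().strip(".,;:") for t in tokens]
    tags = [UNKNOWN] * len(tokens)
    for entity in known_entities:
        parts = entity.lower().split()
        n = len(parts)
        for i in range(len(tokens) - n + 1):
            if lowered[i:i + n] == parts:
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
    return list(zip(tokens, tags))


def build_partial_dataset(articles: list[Article], target_categories: set[str], label: str):
    """Filter articles by category, then keep sentences with at least one annotation."""
    selected = select_articles(articles, target_categories)
    # Assumption: titles of the selected articles serve as the known entity names.
    entities = [a.title for a in selected]
    dataset = []
    for article in selected:
        for sentence in article.sentences:
            tagged = partially_annotate(sentence, entities, label)
            if any(tag != UNKNOWN for _, tag in tagged):
                dataset.append(tagged)
    return dataset
```

Calling `build_partial_dataset(articles, {"Foods"}, "FOOD")` on a small list of `Article` objects returns token/tag pairs in which only matched mentions carry a label. Tokens tagged `?` are left undecided rather than marked as non-entities, which is what makes the annotation partial.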

Provided by:

University of Alberta

Created:

2022-04-20
