Data for: Improving Named Entity Recognition in Noisy User-generated Text with Local Distance Neighbor Feature
Source: Mendeley Data | Updated: 2020-03-31 | Indexed: 2026-04-09
Download link:
https://data.mendeley.com/datasets/nsfdt6m47j
Description:
NUToT (Noisy User-generated Text on Tor) is a dataset annotated for the Named Entity Recognition (NER) task. It covers six entity categories: Person, Location, Group, Creative work, Corporation, and Product. The text is drawn from two categories of the DUTA dataset (http://gvis.unileon.es/dataset/duta-darknet-usage-text-addresses/): Drugs and Weapons. The dataset contains 851 sentences with 1200 named entities. It is also available on the authors' group website: http://gvis.unileon.es/dataset/nutot/
Created:
2020-03-31



