
Data from: A Corpus for Entity Profiling in Microblog Posts

Research Data Australia, indexed 2024-12-14
Download link:
https://researchdata.edu.au/from-corpus-entity-microblog-posts/1307638
Description:
This page provides the datasets presented in the paper A Corpus for Entity Profiling in Microblog Posts. It includes two manually annotated corpora for evaluating the task of identifying aspects on Twitter, both based on the WePS-3 ORM task dataset. The aspects dataset was annotated using a pooling methodology, for which we implemented various methods for automatically extracting aspects that are relevant to an entity from tweets. The dataset is organized into the following three files:

1. aspects_terms_annotations.tsv: A tab-separated values file containing the annotations. Each line corresponds to a term, and the columns include the entity name, the term itself, and the assessments given by the three judges (J1, J2, and J3). Assessments are encoded as follows: 1 = relevant, 2 = not relevant, 3 = competitor, 4 = unknown.
2. aspects_goldstandard_qrels: This file contains the terms annotated as relevant/competitor by two or more judges. It is a typical TREC qrels file, so it can be used as a gold standard in evaluation tools such as trec_eval.
3. aspects_queries_ids.tsv: A table that maps each query_id used in the qrels file above to the company name in the WePS-3 ORM task dataset.
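As an illustration of how the judge assessments relate to the gold standard, the following minimal Python sketch selects the terms that at least two judges marked as relevant or competitor. It rests on several assumptions that the description does not confirm: that each row has exactly the column order entity, term, J1, J2, J3; that a judge counts as positive when assigning either 1 or 3; and that any non-numeric row (such as a possible header) can be skipped. The function name majority_positive_terms and the sample rows are hypothetical and not part of the dataset.

```python
import csv
import io
from collections import defaultdict

# Label codes as given in the dataset description.
LABELS = {1: "relevant", 2: "not relevant", 3: "competitor", 4: "unknown"}
# Assumption: a vote of "relevant" or "competitor" counts as positive.
POSITIVE = {1, 3}


def majority_positive_terms(tsv_file, min_judges=2):
    """Return {entity: [terms]} marked positive by at least `min_judges` judges.

    Assumes five tab-separated columns per row: entity, term, J1, J2, J3.
    Rows that are too short or whose judge columns are not integers
    (for example a header row) are skipped.
    """
    gold = defaultdict(list)
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 5:
            continue
        entity, term = row[0], row[1]
        try:
            votes = [int(v) for v in row[2:5]]
        except ValueError:
            continue
        if sum(v in POSITIVE for v in votes) >= min_judges:
            gold[entity].append(term)
    return dict(gold)


# Made-up rows for illustration only; they are not taken from the dataset.
sample = io.StringIO(
    "acme\tbattery\t1\t1\t2\n"
    "acme\tweather\t2\t2\t4\n"
    "acme\trivalco\t3\t1\t2\n"
)
result = majority_positive_terms(sample)
print(result)
```

To try this on the real file, pass an open file handle instead of the sample, for example open("aspects_terms_annotations.tsv", newline="", encoding="utf-8"); the UTF-8 encoding is also an assumption. The output may not match aspects_goldstandard_qrels exactly if the authors applied a different agreement rule.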
Provider:
RMIT University, Australia