Data from: A Corpus for Entity Profiling in Microblog Posts
Source: Research Data Australia (indexed 2024-12-14)
Download link:
https://researchdata.edu.au/from-corpus-entity-microblog-posts/1307638
Description:
This page provides the datasets presented in the paper A Corpus for Entity Profiling in Microblog Posts. They comprise two manually annotated corpora for evaluating the task of identifying aspects on Twitter, both based on the WePS-3 ORM task dataset.
The aspects dataset was annotated using a pooling methodology, for which we implemented various methods that automatically extract from tweets the aspects relevant to an entity.
The dataset is organized into the following three files:
1. aspects_terms_annotations.tsv: A tab-separated values file containing the annotations. Each line corresponds to a term, and the columns include the entity name, the term itself, and the assessments given by the three judges (J1, J2 and J3). Assessments are encoded as follows: 1 = relevant, 2 = not relevant, 3 = competitor, 4 = unknown.
2. aspects_goldstandard_qrels: This file contains the terms annotated as relevant/competitor by two or more judges. It is a typical TREC qrels file, so it can be used as the gold standard in evaluation tools such as trec_eval.
3. aspects_queries_ids.tsv: A table that maps each query_id used in the qrels file above to the company name in the WePS-3 ORM task dataset.
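
The relationship between the annotations file and the gold standard can be sketched in Python as follows. This is only an illustration, not the procedure used to build the released qrels file. It assumes that aspects_terms_annotations.tsv has exactly five columns in the order entity, term, J1, J2, J3, and that a term counts as gold when at least two judges assigned it either code 1 (relevant) or code 3 (competitor). The function name gold_terms and the sample rows are hypothetical.

```python
import csv
import io

# Assessment codes as documented for aspects_terms_annotations.tsv.
RELEVANT, NOT_RELEVANT, COMPETITOR, UNKNOWN = 1, 2, 3, 4
POSITIVE = {RELEVANT, COMPETITOR}


def gold_terms(tsv_file, min_votes=2):
    """Return (entity, term) pairs that at least `min_votes` judges
    marked as relevant or competitor.

    Assumption: five tab-separated columns in the order
    entity, term, J1, J2, J3. Check the real file before relying on this.
    """
    gold = []
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 5:
            continue  # skip blank or malformed lines
        entity, term = row[0], row[1]
        try:
            votes = [int(v) for v in row[2:5]]
        except ValueError:
            continue  # non-numeric judge columns, e.g. a header row
        if sum(v in POSITIVE for v in votes) >= min_votes:
            gold.append((entity, term))
    return gold


# Made-up rows for illustration only; they are not taken from the dataset.
sample = io.StringIO(
    "acme\tbattery\t1\t1\t2\n"
    "acme\tweather\t2\t2\t4\n"
    "acme\trivalco\t3\t1\t2\n"
)
print(gold_terms(sample))
```

To run this against the real data, pass an open file handle instead of the sample, for example `open("aspects_terms_annotations.tsv", newline="", encoding="utf-8")`. The encoding is also an assumption.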
Provider:
RMIT University, Australia



