cardiffnlp/tweet_ter

Name: cardiffnlp/tweet_ter
Creator: cardiffnlp
Published: 2024-05-21 02:27:26
License: Not specified

Hugging Face. Updated 2024-05-21; indexed 2024-06-11.

Download link:

https://hf-mirror.com/datasets/cardiffnlp/tweet_ter


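Overview:

TweetTER (Tweet Target Entity Retrieval) is a novel benchmark dataset that addresses challenges in entity linking, particularly in noisy domains such as social media. Unlike traditional entity linking tasks, which depend on a comprehensive knowledge base, TweetTER reframes entity linking as a binary entity retrieval task. This makes it possible to evaluate language models without relying on a conventional knowledge base, and gives a more practical and versatile framework for assessing how effective language models are at entity retrieval.

Provider: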

cardiffnlp

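Summary of original information

Dataset overview

Name: TweetTER

Description: TweetTER (Tweet Target Entity Retrieval) is a novel benchmark designed for the challenges of entity linking in noisy domains such as social media. It reframes entity linking as a binary entity retrieval task that does not rely on a conventional knowledge base, providing a more practical and versatile framework for evaluating language models on entity retrieval.

Language: English

License: Unknown

Multilinguality: monolingual

Size: under 50K

Source: extended from other datasets

Task category: other

Task ID: named entity recognition

Tags: Tweet_ter, natural language processing

Dataset structure

Configuration: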

config_name: tweet_ter
data_files:
- split: train, path: data/train.tsv
- split: test, path: data/test.tsv
- split: validation, path: data/val.tsv
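The configuration above can be sketched as a small loader. This is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the repository id `cardiffnlp/tweet_ter` and config name `tweet_ter` listed on this page resolve as written; the helper name `load_split` and the `SPLIT_FILES` mapping are illustrative and not part of the dataset.

```python
# Split names and file paths exactly as listed in the configuration above.
SPLIT_FILES = {
    "train": "data/train.tsv",
    "test": "data/test.tsv",
    "validation": "data/val.tsv",
}


def load_split(split: str):
    """Load one split of the tweet_ter config (hypothetical helper).

    Assumes network access and that the `datasets` library can resolve
    the repository; neither is verified here.
    """
    if split not in SPLIT_FILES:
        raise ValueError(f"unknown split {split!r}; expected one of {sorted(SPLIT_FILES)}")
    # Imported lazily so the mapping above is usable without the library.
    from datasets import load_dataset

    return load_dataset("cardiffnlp/tweet_ter", "tweet_ter", split=split)
```

Calling `load_split("validation")` would then request the split backed by `data/val.tsv`. Note that the split is named `validation` while its file is `val.tsv`, so code should refer to the split name rather than the file name.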

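Dataset features

target (string): the target named entity.
context (string): the tweet in which the target entity appears.
start (integer): the start character index of the target within the provided context.
end (integer): the end character index of the target within the provided context.
definition (string): a candidate definition for the target entity, collected from Wikidata.
date (string): the date of the tweet.
label (integer): binary label (0 or 1) indicating whether the provided definition matches the target entity (1) or does not (0).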
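To make the record layout concrete, the fields can be modelled as a small data class with a span check. This is a rough sketch under stated assumptions: the example row is invented for illustration and is not taken from the dataset, the page does not say whether `end` is inclusive or exclusive (the check below defaults to exclusive and lets you switch), and `to_pair` is only one hypothetical way to present a row to a binary classifier.

```python
from dataclasses import dataclass


@dataclass
class TweetTerExample:
    target: str      # target named entity
    context: str     # tweet containing the target
    start: int       # start character index of the target in context
    end: int         # end character index of the target in context
    definition: str  # candidate definition from Wikidata
    date: str        # date of the tweet
    label: int       # 1 if the definition matches the target, else 0


def span_matches(ex: TweetTerExample, end_inclusive: bool = False) -> bool:
    """Check that context[start:end] recovers the target string.

    Whether `end` is inclusive is an assumption; try both conventions
    on real rows before relying on either.
    """
    stop = ex.end + 1 if end_inclusive else ex.end
    return ex.context[ex.start:stop] == ex.target


def to_pair(ex: TweetTerExample) -> tuple[str, str]:
    """Format a row as a (tweet, candidate) text pair for a binary classifier."""
    return ex.context, f"{ex.target}: {ex.definition}"


# Invented row, for illustration only.
example = TweetTerExample(
    target="Apple",
    context="Apple just announced a new phone",
    start=0,
    end=5,
    definition="American technology company",
    date="2021-01-01",
    label=1,
)
```

Running `span_matches` over a loaded split is a cheap sanity check that the character offsets and the `target` column agree before any model is trained or evaluated.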

Citation

If you use this dataset, please cite the following paper:

@inproceedings{rezaee-etal-2024-tweetter-benchmark,
    title = "{T}weet{TER}: A Benchmark for Target Entity Retrieval on {T}witter without Knowledge Bases",
    author = "Rezaee, Kiamehr and Camacho-Collados, Jose and Pilehvar, Mohammad Taher",
    editor = "Calzolari, Nicoletta and Kan, Min-Yen and Hoste, Veronique and Lenci, Alessandro and Sakti, Sakriani and Xue, Nianwen",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italy",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.1468",
    pages = "16890--16896",
    abstract = "Entity linking is a well-established task in NLP consisting of associating entity mentions with entries in a knowledge base. Current models have demonstrated competitive performance in standard text settings. However, when it comes to noisy domains such as social media, certain challenges still persist. Typically, to evaluate entity linking on existing benchmarks, a comprehensive knowledge base is necessary and models are expected to possess an understanding of all the entities contained within the knowledge base. However, in practical scenarios where the objective is to retrieve sentences specifically related to a particular entity, strict adherence to a complete understanding of all entities in the knowledge base may not be necessary. To address this gap, we introduce TweetTER (Tweet Target Entity Retrieval), a novel benchmark that aims to bridge the challenges in entity linking. The distinguishing feature of this benchmark is its approach of re-framing entity linking as a binary entity retrieval task. This enables the evaluation of language models{'} performance without relying on a conventional knowledge base, providing a more practical and versatile evaluation framework for assessing the effectiveness of language models in entity retrieval tasks.",
}
