cardiffnlp/tweet_ter
Dataset Overview
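Name: TweetTER
Description: TweetTER (Tweet Target Entity Retrieval) is a new benchmark that targets the challenges of entity linking in noisy domains such as social media. It reframes entity linking as a binary entity retrieval task that does not depend on a conventional knowledge base. This gives a more practical and versatile framework for evaluating how well language models perform entity retrieval.
Language: English
License: Unknown
Multilinguality: Monolingual
Size: Less than 50K
Source datasets: Extended from other datasets
Task categories: Other
Task IDs: Named entity recognition
Tags: Tweet_ter, natural language processing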
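Under the description above, each example is a yes/no decision about whether a candidate definition fits the target mention in a tweet. The card does not say which evaluation metric to use. The sketch below assumes precision, recall and F1 on the positive class (label 1), computed from 0/1 predictions. `binary_prf` is a hypothetical helper and is not code from the dataset or the paper.

```python
def binary_prf(gold, pred):
    """Precision, recall and F1 for the positive class (label 1).

    `gold` and `pred` are equal-length sequences of 0/1 labels.
    Returns 0.0 for any ratio whose denominator is zero.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels, not taken from the dataset: 2 true positives, 1 false positive,
# 1 false negative, so precision, recall and F1 are each approximately 2/3.
gold = [1, 0, 1, 1, 0]
pred = [1, 1, 0, 1, 0]
print(binary_prf(gold, pred))
```

A model that outputs scores rather than labels needs a threshold before this step. The threshold value is left to the user.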
Dataset Structure
Configs:
- config_name: tweet_ter
- data_files:
  - split: train, path: data/train.tsv
  - split: test, path: data/test.tsv
  - split: validation, path: data/val.tsv
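The configuration above maps each split to a TSV file. You can also read the raw files without the Hugging Face `datasets` library. The minimal reader below is a sketch that makes three assumptions the card does not confirm:

- each file has a header row using the feature names listed under Dataset Features;
- cells follow standard CSV quoting;
- the text is UTF-8.

`read_split`, `SPLIT_FILES` and `INT_FIELDS` are hypothetical names.

```python
import csv
import io
from pathlib import Path

# Split name -> file path, as listed in the tweet_ter config above.
SPLIT_FILES = {
    "train": "data/train.tsv",
    "validation": "data/val.tsv",
    "test": "data/test.tsv",
}

# Columns the card declares as integers.
INT_FIELDS = ("start", "end", "label")


def read_split(source):
    """Read one split from a TSV path or an already-open text stream.

    Assumes (unconfirmed) a header row whose names match the card's features.
    """
    if isinstance(source, (str, Path)):
        with open(source, encoding="utf-8", newline="") as f:
            return read_split(f)
    rows = []
    for row in csv.DictReader(source, delimiter="\t"):
        for key in INT_FIELDS:
            row[key] = int(row[key])  # TSV cells arrive as strings; cast integer fields
        rows.append(row)
    return rows


# Usage on an invented toy row (not taken from the dataset):
sample = io.StringIO(
    "target\tcontext\tstart\tend\tdefinition\tdate\tlabel\n"
    "Paris\tLanded in Paris today!\t10\t15\tcapital of France\t2020-01-01\t1\n"
)
rows = read_split(sample)
print(rows[0]["target"], rows[0]["label"])  # Paris 1
```

On real data, call `read_split(SPLIT_FILES["train"])` from the dataset root. If the actual files lack a header row, pass `fieldnames` to `csv.DictReader` explicitly.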
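Dataset Features
- target (string): The target named entity.
- context (string): The tweet in which the target entity appears.
- start (int): Character index in context at which the target begins.
- end (int): Character index in context at which the target ends.
- definition (string): A candidate definition collected from Wikidata, which may or may not match the target entity.
- date (string): The date of the tweet.
- label (int): Binary label indicating whether the provided definition matches the target entity (1) or not (0).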
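The card gives start and end as character indices but does not say whether end is exclusive (Python slice style) or inclusive. The sketch below checks which convention reproduces target from context, so you can verify this on the real files before relying on the offsets. `offset_convention` and `summarize_offsets` are hypothetical helpers. The records are invented and not taken from the dataset.

```python
from collections import Counter


def offset_convention(record):
    """Report which end-index convention reproduces `target` from `context`.

    Returns "exclusive" if context[start:end] == target (Python slicing),
    "inclusive" if context[start:end + 1] == target, otherwise None.
    """
    ctx, target = record["context"], record["target"]
    start, end = record["start"], record["end"]
    if ctx[start:end] == target:
        return "exclusive"
    if ctx[start:end + 1] == target:
        return "inclusive"
    return None


def summarize_offsets(records):
    """Count records per convention; the None key counts records matching neither."""
    return Counter(offset_convention(r) for r in records)


# Invented toy records, not taken from the dataset:
toy = [
    {"target": "Paris", "context": "Landed in Paris today!", "start": 10, "end": 15},
    {"target": "Paris", "context": "Landed in Paris today!", "start": 10, "end": 14},
]
print(summarize_offsets(toy))
```

If nearly all real records land in one bucket, that is the convention to use. A sizeable None count would suggest a different encoding or normalization of the tweet text.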
Citation Information
If you use this dataset, please cite the following paper:
@inproceedings{rezaee-etal-2024-tweetter-benchmark,
    title = "{T}weet{TER}: A Benchmark for Target Entity Retrieval on {T}witter without Knowledge Bases",
    author = "Rezaee, Kiamehr and Camacho-Collados, Jose and Pilehvar, Mohammad Taher",
    editor = "Calzolari, Nicoletta and Kan, Min-Yen and Hoste, Veronique and Lenci, Alessandro and Sakti, Sakriani and Xue, Nianwen",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italy",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.1468",
    pages = "16890--16896",
    abstract = "Entity linking is a well-established task in NLP consisting of associating entity mentions with entries in a knowledge base. Current models have demonstrated competitive performance in standard text settings. However, when it comes to noisy domains such as social media, certain challenges still persist. Typically, to evaluate entity linking on existing benchmarks, a comprehensive knowledge base is necessary and models are expected to possess an understanding of all the entities contained within the knowledge base. However, in practical scenarios where the objective is to retrieve sentences specifically related to a particular entity, strict adherence to a complete understanding of all entities in the knowledge base may not be necessary. To address this gap, we introduce TweetTER (Tweet Target Entity Retrieval), a novel benchmark that aims to bridge the challenges in entity linking. The distinguishing feature of this benchmark is its approach of re-framing entity linking as a binary entity retrieval task. This enables the evaluation of language models{'} performance without relying on a conventional knowledge base, providing a more practical and versatile evaluation framework for assessing the effectiveness of language models in entity retrieval tasks.",
}