
Reddit Entity Linking Dataset

Source: arXiv | Updated: 2021-02-26 | Indexed: 2024-06-21
Download link:
https://doi.org/10.5281/zenodo.3970806
Description:
The Reddit Entity Linking Dataset is a publicly available dataset created by the University of Notre Dame. It contains 17,316 linked entities, each annotated by three human annotators and assigned to one of three tiers (Gold, Silver, or Bronze) according to inter-annotator agreement. The dataset is intended for analyzing errors and disagreements among annotators and for testing existing entity linking models on social media data. It covers posts and their comments from multiple subreddits and targets entity linking in social media text, where slang, grammatical errors, and inconsistent formatting make the task harder.
Provider:
University of Notre Dame
Created:
2021-01-05