TweetNER7
arXiv | Updated 2022-11-15 | Indexed 2024-06-21
Download link:
https://huggingface.co/datasets/tner/tweetner7
Description:
TweetNER7 is a named entity recognition (NER) dataset for Twitter, created jointly by Cardiff University and Snap Inc. It contains 11,382 tweets posted between September 2019 and August 2021, annotated with seven entity types. Tweets were selected using weekly trending keywords, which keeps the topics diverse and spread across the collection period, and the collection was then deduplicated and filtered to improve data quality. The dataset is mainly intended for studying short-term temporal change in social media, particularly in NER, where language models tend to lose accuracy as the content drifts over time.
Provided by:
School of Computer Science and Informatics, Cardiff University
Created:
2022-10-08



