EnCBP
Source: arXiv | Updated: 2022-03-28 | Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/2203.14498v1
Description:
EnCBP is a news-based dataset for English cultural background prediction, created by the Department of Computer Science at Dartmouth College. It contains 2,000 articles on five widely debated topics, drawn from five English-speaking countries and four U.S. states. The dataset was built through sampling, annotation, and manual verification to ensure annotation quality and accurate cultural background labels. EnCBP addresses the overly coarse-grained treatment of culture in existing NLP research by providing finer-grained cultural background information. Its intended applications include language modeling, semantic analysis, and psycholinguistic tasks, where cultural background information is meant to improve model performance on natural language understanding.
Provided by:
Department of Computer Science, Dartmouth College
Created:
2022-03-28



