
PiC

Source: arXiv. Updated: 2023-02-02. Indexed: 2024-06-21.
Download link:
https://phrase-in-context.github.io
Description:
The PiC dataset was created jointly by Auburn University and Adobe Research. It contains approximately 28,000 noun phrases, each paired with its context from a Wikipedia page, and is intended for training and evaluating phrase embeddings. The dataset aims to fill a gap in existing phrase-understanding benchmarks, in particular their lack of context dependence. Annotation and verification followed a strict process carried out by both linguistics experts and non-experts to ensure data quality. Applications include semantic search, phrase similarity comparison, and phrase sense disambiguation, with the goal of improving a machine's ability to understand what a phrase means in a specific context.
Provider:
Auburn University and Adobe Research
Created:
2022-07-19