illuin-conteb/football
Source: Hugging Face · Updated 2025-05-30 · Indexed 2025-10-18
Download link:
https://hf-mirror.com/datasets/illuin-conteb/football
Description:
The ConTEB - Football dataset is part of the Context-aware Text Embedding Benchmark (ConTEB) and covers a sports theme, specifically football. It comprises 301 original documents, 6259 chunks, and 2682 queries, and is designed to evaluate contextual embedding models. The documents were collected from the Wikipedia pages of famous footballers; GPT-4o then rewrote the paragraphs to remove explicit mentions of each original document's theme, so that retrieval requires context. The queries were also generated with GPT-4o: they explicitly mention the person's name but exclude other named entities such as dates or proper nouns.
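To make the "need for context" in the description concrete, here is a minimal toy sketch. It is not the benchmark's evaluation code and uses no real dataset records: the title, chunk, and query are invented, and the helpers `tokens` and `overlap_score` are hypothetical names, with plain word overlap standing in for an embedding model's similarity score. It only shows that a chunk with the person's name removed matches a name-bearing query poorly unless document-level context is added back.

```python
# Toy illustration only: all texts below are invented, not taken from the dataset.

def tokens(text: str) -> set[str]:
    """Lowercase a string and split it into a set of alphanumeric word tokens."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return set(cleaned.split())


def overlap_score(query: str, passage: str) -> int:
    """Count the distinct word tokens shared by the query and the passage."""
    return len(tokens(query) & tokens(passage))


# Hypothetical document: the title names the player, the rewritten chunk does not.
doc_title = "Alex Example"
chunk = "He joined the club as a teenager and scored in his first match."
other_chunk = "She retired after a long career as a goalkeeper."

# The query mentions the person's name, as the dataset description states.
query = "When did Alex Example score for the club?"

score_chunk_alone = overlap_score(query, chunk)
score_with_context = overlap_score(query, doc_title + ". " + chunk)
score_other = overlap_score(query, other_chunk)

print("chunk alone:", score_chunk_alone)
print("chunk with document context:", score_with_context)
print("unrelated chunk:", score_other)
```

In this sketch the context-augmented chunk scores higher than the chunk alone because only the document-level context carries the name that the query relies on.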
Provider:
illuin-conteb



