zilliz/gooaq-context-relevance-130k-context-relevance-with-think

Name: zilliz/gooaq-context-relevance-130k-context-relevance-with-think
Creator: zilliz
Published: 2026-01-06 07:44:15
License: No description available

Hugging Face · Updated 2026-01-06 · Indexed 2026-02-07

Download link:

https://hf-mirror.com/datasets/zilliz/gooaq-context-relevance-130k-context-relevance-with-think

Resource overview:

This dataset is used for training the `zilliz/semantic-highlight-bilingual-v1` model for semantic highlighting in RAG (Retrieval-Augmented Generation) systems. It contains query-context pairs with relevance annotations on context spans, which identify the parts of a document that are semantically relevant to a query even when they don't contain exact keyword matches. Key fields include: `context_spans` (the positions of the segmented spans within the context text), `context_spans_relevance` (binary labels indicating whether each span should be highlighted, i.e., whether it provides key information for answering the query), and `think_process` (the reasoning produced during annotation, kept to make the span relevance labels more accurate and to improve observability and interpretability). The annotations were generated with Qwen3-8B, and its complete thinking process is preserved in the `think_process` field.
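A minimal sketch of how the span annotations might be consumed to recover the highlighted text. It assumes each entry of `context_spans` is an end-exclusive `(start, end)` character-offset pair into a context string stored under a `context` key. The card does not document the exact span format or the name of the context column, so both are assumptions, and the record below is invented for illustration rather than taken from the dataset.

```python
from typing import List, Sequence, Tuple

# Assumption: a span is an end-exclusive (start, end) character-offset pair.
# The dataset card does not specify the exact format of `context_spans`.
Span = Tuple[int, int]


def highlighted_text(
    context: str, spans: Sequence[Span], relevance: Sequence[int]
) -> List[str]:
    """Return the text of every span whose binary relevance label is 1."""
    if len(spans) != len(relevance):
        raise ValueError("context_spans and context_spans_relevance must align")
    return [
        context[start:end]
        for (start, end), label in zip(spans, relevance)
        if label == 1
    ]


# Hypothetical record; the "context" key name is an assumption.
record = {
    "query": "how long do cats sleep",
    "context": "Cats are carnivores. Adult cats sleep 12 to 16 hours a day.",
    "context_spans": [(0, 20), (21, 59)],
    "context_spans_relevance": [0, 1],
}

print(
    highlighted_text(
        record["context"],
        record["context_spans"],
        record["context_spans_relevance"],
    )
)
# prints ['Adult cats sleep 12 to 16 hours a day.']
```

Keeping spans as offsets rather than copied strings lets a highlighter mark the original document in place; the length check guards against misaligned label arrays.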

Provided by:

zilliz
