zilliz/dureader-context-relevance-with-think

Name: zilliz/dureader-context-relevance-with-think
Creator: zilliz
Published: 2026-01-06 07:45:46
License: No description available

Hugging Face: updated 2026-01-06, indexed 2026-02-07

Download link:

https://hf-mirror.com/datasets/zilliz/dureader-context-relevance-with-think

Description:

This dataset is used to train the `zilliz/semantic-highlight-bilingual-v1` model, which performs semantic highlighting in RAG (Retrieval-Augmented Generation) systems. It contains query-context pairs in which each context span carries a relevance annotation, so that the parts of a document semantically relevant to a query can be identified even when they share no exact keywords with it. Key features include: `context_spans` (positions of the segmented spans), `context_spans_relevance` (binary relevance labels), and `think_process` (the reasoning produced during annotation). The dataset was annotated with Qwen3-8B, and the complete thinking process is preserved. The data is derived from `sentence-transformers/dureader` and follows the original dataset's license.
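As a rough illustration of how the span features described above might fit together, the sketch below pulls out the text of every span labeled relevant. The record is hypothetical and made up for this example. The `query` and `context` column names, the encoding of `context_spans` as `[start, end)` character offsets, and the use of `1` for "relevant" are all assumptions, not details confirmed by the dataset card.

```python
from typing import Dict, List


def relevant_spans(record: Dict) -> List[str]:
    """Return the text of every span whose relevance label is 1.

    Assumes `context_spans` holds [start, end) character offsets into
    `context`, aligned one-to-one with `context_spans_relevance`.
    """
    context = record["context"]
    return [
        context[start:end]
        for (start, end), label in zip(
            record["context_spans"], record["context_spans_relevance"]
        )
        if label == 1
    ]


# Hypothetical record for illustration only; not taken from the dataset.
example = {
    "query": "How tall is Mount Everest?",
    "context": "Everest is in the Himalayas. Its summit is 8,849 m high.",
    "context_spans": [[0, 28], [29, 56]],   # assumed offset format
    "context_spans_relevance": [0, 1],      # assumed: 1 = relevant
    "think_process": "...",                 # annotation reasoning (elided)
}

highlighted = relevant_spans(example)
print(highlighted)
```

Note that the second sentence is marked relevant even though it never repeats the word "Everest", which is the kind of non-keyword match the description refers to. Real rows could presumably be fetched with `load_dataset` from the Hugging Face `datasets` library, after which the actual column layout should be checked before relying on this helper.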

Provider:

zilliz