zilliz/dureader-context-relevance-with-think
Source: Hugging Face | Updated: 2026-01-06 | Indexed: 2026-02-07
Download link:
https://hf-mirror.com/datasets/zilliz/dureader-context-relevance-with-think
Description:
This dataset is used to train the `zilliz/semantic-highlight-bilingual-v1` model, which performs semantic highlighting in RAG (Retrieval-Augmented Generation) systems. It contains query-context pairs with relevance annotations on context spans, which help identify the parts of a document that are semantically relevant to a query even when they contain no exact keyword match. Key features include `context_spans` (positions of the segmented spans), `context_spans_relevance` (binary relevance labels), and `think_process` (the reasoning recorded during annotation). The dataset was annotated with Qwen3-8B, and the complete thinking process is preserved. The data is derived from `sentence-transformers/dureader` and follows the original dataset's license.
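As a rough illustration of how the span annotations might be consumed, the sketch below pulls the relevant spans out of a single record. The listing does not specify the schema, so the layout used here is an assumption: `context_spans` is taken to be a list of `[start, end)` character offsets into a `context` string, and `context_spans_relevance` a parallel list of 0/1 labels. The `query` and `context` field names, the helper `relevant_spans`, and the sample record are all hypothetical and should be checked against the actual dataset before use.

```python
from typing import Any, Dict, List


def relevant_spans(record: Dict[str, Any]) -> List[str]:
    """Return the text of every span labeled relevant (label == 1).

    Assumes (not confirmed by the listing) that spans are [start, end)
    character offsets into record["context"].
    """
    context = record["context"]
    spans = record["context_spans"]
    labels = record["context_spans_relevance"]
    if len(spans) != len(labels):
        raise ValueError("context_spans and context_spans_relevance differ in length")
    return [
        context[start:end]
        for (start, end), label in zip(spans, labels)
        if label == 1
    ]


# Hypothetical record, written by hand for illustration only.
example = {
    "query": "How tall is Mount Everest?",
    "context": "Everest lies in the Himalayas. Its summit is 8,849 m above sea level.",
    "context_spans": [[0, 30], [31, 69]],
    "context_spans_relevance": [0, 1],
    "think_process": "(reasoning text recorded during annotation)",
}

highlighted = relevant_spans(example)
print(highlighted)
```

In this made-up record the second span is the relevant one even though it shares no keyword with the query, which is the kind of case the description says the annotations are meant to capture.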
Provider:
zilliz



