zilliz/dureader-context-relevance-with-think

Source: Hugging Face · Updated 2026-01-06 · Indexed 2026-02-07
Download link:
https://hf-mirror.com/datasets/zilliz/dureader-context-relevance-with-think
Description:
This dataset is used to train the `zilliz/semantic-highlight-bilingual-v1` model, which performs semantic highlighting in RAG (Retrieval-Augmented Generation) systems. It contains query-context pairs in which each context span carries a relevance annotation, so a model can learn to identify the parts of a document that are semantically relevant to a query even when they share no exact keywords with it. Key features include `context_spans` (positions of the segmented spans), `context_spans_relevance` (binary relevance labels), and `think_process` (the reasoning recorded during annotation). The annotations were produced with Qwen3-8B, and the complete thinking process is preserved. The data is derived from `sentence-transformers/dureader` and follows the original dataset's license.
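As a rough illustration of how the three annotation fields might be used together, the sketch below pulls out the text of the spans labelled relevant. It rests on assumptions the description does not confirm: that `context_spans` stores `[start, end)` character offsets into a `context` string, that `context_spans_relevance` holds one 0/1 label per span, and that records also carry `query` and `context` fields. The helper `relevant_spans` and the sample record are hypothetical.

```python
from typing import Any


def relevant_spans(record: dict[str, Any]) -> list[str]:
    """Return the text of every span labelled relevant.

    Assumption (not confirmed by the dataset description): `context_spans`
    holds [start, end) character offsets into `context`, and
    `context_spans_relevance` holds one 0/1 label per span.
    """
    context = record["context"]
    spans = record["context_spans"]
    labels = record["context_spans_relevance"]
    if len(spans) != len(labels):
        raise ValueError("each span needs exactly one relevance label")
    # Keep only the slices whose label marks them as relevant.
    return [
        context[start:end]
        for (start, end), label in zip(spans, labels)
        if label == 1
    ]


# Hypothetical record, invented for illustration; not taken from the dataset.
sample = {
    "query": "What is the capital of France?",
    "context": "Paris is the capital of France. It rains often in spring.",
    "context_spans": [[0, 31], [32, 57]],
    "context_spans_relevance": [1, 0],
    "think_process": "The first sentence names the capital; the second is off-topic.",
}

print(relevant_spans(sample))
```

Real records could presumably be loaded with `datasets.load_dataset("zilliz/dureader-context-relevance-with-think")`. The available split names and the exact field layout should be checked against the dataset viewer before relying on this sketch.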
Provider:
zilliz