Chinese CQA Dataset
Source: arXiv, indexed 2025-09-30
Download link:
https://github.com/wangrui6/Zhihu-KOL
Description:
This is a curated dataset of conversational question-answer pairs, annotated for question rewriting, keyword extraction, and the utility of retrieved passages. It includes the rewritten questions, the extracted keywords, and features for evaluating how useful retrieved passages are in conversational question answering. The dataset contains 1229 question-answer pairs and is intended to advance research in conversational question answering (CQA), in particular work on improving Retrieval-Augmented Generation (RAG) systems.
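As a rough illustration of the annotations described above, a single record could be modeled as follows. This is a minimal sketch only: the class names, field names, and the boolean usefulness label are all hypothetical, since the listing does not specify the dataset's actual schema, and the sample values are placeholders rather than real data.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievedPassage:
    """A retrieved passage with a (hypothetical) usefulness label."""
    text: str
    useful: bool


@dataclass
class CQARecord:
    """One conversational QA pair with its annotations (hypothetical layout)."""
    question: str
    answer: str
    rewritten_question: str
    keywords: list[str] = field(default_factory=list)
    passages: list[RetrievedPassage] = field(default_factory=list)

    def useful_passages(self) -> list[str]:
        """Return the text of the passages labeled as useful."""
        return [p.text for p in self.passages if p.useful]


# Placeholder values for illustration; not taken from the dataset.
record = CQARecord(
    question="What about its population?",
    answer="placeholder answer",
    rewritten_question="What is the population of the city discussed earlier?",
    keywords=["population", "city"],
    passages=[
        RetrievedPassage(text="passage about population", useful=True),
        RetrievedPassage(text="passage about cuisine", useful=False),
    ],
)
```

Calling `record.useful_passages()` filters the retrieved passages down to those marked useful, which is the kind of signal a RAG pipeline could use when selecting context.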
Provided by:
Authors of the paper
Collected Summary
Dataset Introduction

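Background and Challenges
Background Overview
Chinese CQA Dataset is a dataset built by crawling data from the Zhihu website, intended to support the Open Assistant LLM project. The project uses a Python stack, including tools such as Playwright and Ray, to crawl and store data at multiple levels, from topics down to question-and-answer content.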
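The topic-to-question-to-answer crawl described above can be sketched in outline. This is a minimal sketch under stated assumptions, not the project's code: `crawl_hierarchy`, `FAKE_QUESTIONS`, and `FAKE_ANSWERS` are hypothetical names, and the page fetching (Playwright in the project) and parallel execution (Ray) are replaced by plain in-memory stub functions so the example runs without a network or extra packages.

```python
from typing import Callable, Dict, List

# topic -> question -> list of answers
Store = Dict[str, Dict[str, List[str]]]


def crawl_hierarchy(
    topics: List[str],
    list_questions: Callable[[str], List[str]],
    list_answers: Callable[[str], List[str]],
) -> Store:
    """Walk topics, then each topic's questions, then each question's answers."""
    store: Store = {}
    for topic in topics:
        store[topic] = {q: list_answers(q) for q in list_questions(topic)}
    return store


# Stub data standing in for real pages (hypothetical placeholders).
FAKE_QUESTIONS = {"topic-a": ["q1", "q2"], "topic-b": []}
FAKE_ANSWERS = {"q1": ["a1"], "q2": ["a2", "a3"]}

result = crawl_hierarchy(
    list(FAKE_QUESTIONS),
    lambda topic: FAKE_QUESTIONS.get(topic, []),
    lambda question: FAKE_ANSWERS.get(question, []),
)
```

Passing the two fetchers in as callables keeps the traversal logic separate from how pages are retrieved, so a browser-backed or distributed fetcher could be substituted without changing the loop.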
The above content was collected and summarized by 遇见数据集.



