
LCQMC

ModelScope Community (魔搭社区), updated 2026-05-20, indexed 2024-05-15
Download link:
https://modelscope.cn/datasets/C-MTEB/LCQMC
Resource overview:
# Dataset Card for "LCQMC" [More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

---
displayName: LCQMC (Large-scale Chinese Question Matching Corpus)
labelTypes:
  - Chinese Corpus
license:
  - LCQMC Custom
paperUrl: https://aclanthology.org/C18-1166.pdf
publishDate: "2018-06-06"
publishUrl: http://icrc.hitsz.edu.cn/info/1037/1146.htm
publisher:
  - Harbin Institute of Technology
  - Alibaba
tags:
  - Chinese
---

# Dataset Introduction

## Overview

Question matching is a fundamental task in QA. It is generally treated as a semantic matching task and sometimes as a paraphrase recognition task. The goal is to retrieve questions with similar intent to the input query from existing databases.

We introduce a large-scale Chinese question matching corpus named LCQMC. Unlike paraphrase corpora, LCQMC is more general because it focuses on intent matching rather than paraphrase. The corpus contains 260,068 manually annotated question pairs, split into three subsets:

- a training set with 238,766 question pairs
- a development set with 8,802 question pairs
- a test set with 12,500 question pairs

We evaluated several state-of-the-art sentence matching methods on this corpus. The experimental results not only verify the quality of LCQMC but also provide reliable baseline performance for further research on this corpus.

## Download Dataset

:modelscope-code[]{type="git"}
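As a quick sanity check on the figures quoted in the overview, the following minimal sketch confirms that the three split sizes add up to the stated total and reports each split's share of the corpus. The names `SPLITS`, `STATED_TOTAL`, and `split_summary` are hypothetical helpers introduced for illustration; they are not part of the dataset or of any ModelScope API.

```python
# Split sizes and total as stated in the dataset card overview.
SPLITS = {"train": 238_766, "dev": 8_802, "test": 12_500}
STATED_TOTAL = 260_068


def split_summary(splits):
    """Return the total pair count and each split's fraction of it."""
    total = sum(splits.values())
    fractions = {name: size / total for name, size in splits.items()}
    return total, fractions


total, fractions = split_summary(SPLITS)

# The three subsets should account for every annotated pair.
assert total == STATED_TOTAL

for name, frac in fractions.items():
    print(f"{name}: {SPLITS[name]:,} pairs ({frac:.1%})")
```

The split is heavily weighted toward training data: roughly 92% train, 3% development, and 5% test.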
Provider:
maas
Created:
2024-09-06
Collected summary
Dataset introduction
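Background and challenges
Background overview
LCQMC is a large-scale Chinese question matching corpus released in 2018 by Harbin Institute of Technology and Alibaba. It contains 260,068 manually annotated question pairs and focuses on the task of question intent matching. The dataset is split into training, development, and test sets, is intended to support research on semantic matching and question answering systems, and provides reliable baseline performance for evaluation.
The content above was collected and summarized by 遇见数据集.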