Chinese Text Semantic Matching Dataset
Science Data Bank, updated 2025-10-09, indexed 2026-04-23
Download link:
https://www.scidb.cn/detail?dataSetId=10dd817f8f774569aae4e1a413b64e5f
Description:
This dataset is designed for Chinese text semantic matching. It incorporates the original ATEC dataset and additional data we collected ourselves. The content covers everyday life, the financial sector, daily conversations, idioms and proverbs, web slang, and other semantic-matching scenarios.

Size: 70,860 KB
Training pairs: 735,956
Test-set size: 5,137 KB
Test pairs: 59,523

All data were gathered automatically from the open web, so the topical span is very broad.

Format: a tabular file with three columns
Column A: first text to be matched
Column B: second text to be matched
Column C: label (1 = match, 0 = no match)

Reference dataset cited:
ATEC: Alibaba Taobao E-commerce Click-Through Rate Prediction Dataset. Alibaba Group. 20 Oct 2023. Available at: https://www.atecup.cn/ods
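The three-column layout described above can be read with a few lines of Python. This is only a minimal sketch: the listing does not state the file type or delimiter, so the tab delimiter, the `Pair` type, and the `load_pairs` helper are assumptions made for illustration, and the sample rows are invented rather than taken from the dataset.

```python
import csv
import io
from typing import Iterable, List, NamedTuple


class Pair(NamedTuple):
    text_a: str  # Column A: first text to be matched
    text_b: str  # Column B: second text to be matched
    label: int   # Column C: 1 = match, 0 = no match


def load_pairs(lines: Iterable[str], delimiter: str = "\t") -> List[Pair]:
    """Parse (text A, text B, label) rows, skipping malformed ones.

    The tab delimiter is an assumption; pass another one if the
    actual file is comma-separated.
    """
    pairs = []
    reader = csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip header rows, blank lines, and rows with an unexpected label.
        if len(row) != 3 or row[2].strip() not in {"0", "1"}:
            continue
        pairs.append(Pair(row[0].strip(), row[1].strip(), int(row[2])))
    return pairs


# Invented sample rows for illustration only.
sample = io.StringIO(
    "花呗怎么还款\t花呗如何还款\t1\n"
    "花呗怎么还款\t借呗怎么开通\t0\n"
)
pairs = load_pairs(sample)

# Share of rows labeled as a match in the sample.
match_rate = sum(p.label for p in pairs) / len(pairs)
```

With the real file, an open file handle (for example `open(path, encoding="utf-8", newline="")`) can be passed in place of the `StringIO` sample; the encoding is likewise an assumption to verify against the download.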
Provider:
Xihua University
Created:
2025-09-30



