CPRet-data

Name: CPRet-data
Creator: maas
Published: 2025-12-04 16:40:07
License: No description provided

ModelScope Community · Updated 2025-12-04 · Indexed 2025-07-05

Download link:

https://modelscope.cn/datasets/AI-ModelScope/CPRet-data

Resource overview:

# CPRet-data

This repository hosts the datasets for **CPRet: A Dataset, Benchmark, and Model for Retrieval in Competitive Programming**.

[![GitHub Repo](https://img.shields.io/badge/GitHub-coldchair%2FCPRet-181717?logo=github)](https://github.com/coldchair/CPRet)
[![🤗 Hugging Face](https://img.shields.io/badge/HuggingFace-CPRet-yellow?logo=huggingface)](https://huggingface.co/collections/coldchair16/cpret-682451276f05c5988fcbdf34)
[![arXiv](https://img.shields.io/badge/arXiv-2505.12925-b31b1b.svg)](https://arxiv.org/abs/2505.12925)

Visit https://cpret.online/ to see CPRet in action on competitive programming problem retrieval.

---

## 💡 CPRet Benchmark Tasks

The CPRet dataset supports **four retrieval tasks** relevant to competitive programming:

1. **Text-to-Code Retrieval**
   Retrieve relevant code snippets based on a natural-language problem description.
2. **Code-to-Code Retrieval**
   Retrieve semantically similar code snippets, often corresponding to related problems.
3. **Problem-to-Duplicate Retrieval**
   Retrieve near-duplicate or reused problems.
4. **Simplified-to-Full Retrieval**
   Retrieve the full original problem descriptions based on simplified or abridged versions.

---

## 💾 Contents

This repository includes all datasets used in the CPRet project, organized as follows:

* **Training datasets**
  Used to train retrieval models with contrastive or supervised learning strategies.
* **Evaluation datasets**
  Benchmarks for the four retrieval tasks above, with relevance annotations.
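To illustrate how relevance annotations are typically used, here is a minimal sketch that ranks a corpus by cosine similarity to a query embedding and scores the ranking with nDCG@k. Everything in it is an assumption for illustration: the function names (`cosine`, `rank_by_cosine`, `ndcg_at_k`), the toy vectors, and the annotation format are hypothetical, and nDCG@k is simply one common retrieval metric, not necessarily the exact metric or pipeline CPRet uses. For the official evaluation, refer to the pipelines in the CPRet GitHub repository.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_cosine(query: Sequence[float], corpus: Dict[str, Sequence[float]]) -> List[str]:
    """Return corpus document IDs sorted by descending cosine similarity to the query."""
    return sorted(corpus, key=lambda doc_id: cosine(query, corpus[doc_id]), reverse=True)


def ndcg_at_k(ranking: List[str], relevance: Dict[str, float], k: int) -> float:
    """nDCG@k: DCG of the top-k ranking divided by the DCG of an ideal ordering.

    `relevance` maps document IDs to graded relevance; unlisted IDs count as 0.
    """
    def dcg(gains: List[float]) -> float:
        # The item at 0-based position i is discounted by log2(i + 2).
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

    actual = dcg([relevance.get(doc_id, 0.0) for doc_id in ranking[:k]])
    ideal = dcg(sorted(relevance.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical 3-dimensional "embeddings"; real ones would come from a trained retriever.
    corpus = {
        "prob_a": [1.0, 0.0, 0.0],
        "prob_b": [0.0, 1.0, 0.0],
        "prob_c": [0.9, 0.1, 0.0],
    }
    query = [1.0, 0.0, 0.0]
    qrels = {"prob_a": 1.0, "prob_b": 1.0}  # hypothetical relevance annotations

    ranking = rank_by_cosine(query, corpus)
    print(ranking)  # ['prob_a', 'prob_c', 'prob_b']
    print(round(ndcg_at_k(ranking, qrels, k=2), 4))  # 0.6131
```

In this toy case the relevant `prob_b` falls outside the top 2, so the score is penalized relative to the ideal ordering. The pure-Python loops keep the sketch dependency-free; a real deployment over precomputed embeddings would normally use a vectorized library or an approximate nearest-neighbor index instead.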
---

## 🚀 How to Use

### 🔧 Code

To use these datasets for training or evaluation, please refer to the main CPRet repository:

👉 [**CPRet GitHub**](https://github.com/coldchair/CPRet)

It provides:

* Preprocessing and training scripts
* Evaluation pipelines for all retrieval tasks

### 🤗 Hugging Face Collection

All CPRet-related resources are available in a unified Hugging Face Collection:

👉 [**CPRet on Hugging Face**](https://huggingface.co/collections/coldchair16/cpret-682451276f05c5988fcbdf34)

This includes:

* 🔹 Trained models (`CPRetriever-Code`, `CPRetriever-Prob`, etc.)
* 🔹 Datasets for training and benchmarking
* 🔹 Precomputed embeddings along with full problem descriptions, titles, platforms, and URLs, supporting plug-and-play deployment of retrieval services.

---

## 📜 Citation

If you use CPRet in your work, please cite:

```
@misc{deng2025cpretdatasetbenchmarkmodel,
  title         = {CPRet: A Dataset, Benchmark, and Model for Retrieval in Competitive Programming},
  author        = {Han Deng and Yuan Meng and Shixiang Tang and Wanli Ouyang and Xinzhu Ma},
  year          = {2025},
  eprint        = {2505.12925},
  archivePrefix = {arXiv},
  primaryClass  = {cs.SE},
  url           = {https://arxiv.org/abs/2505.12925}
}
```

Provider: maas

Created: 2025-07-04
