CPRet-data
Source: ModelScope community (updated 2025-12-04, indexed 2025-07-05)
Download link:
https://modelscope.cn/datasets/AI-ModelScope/CPRet-data
# CPRet-data
This repository hosts the datasets for **CPRet: A Dataset, Benchmark, and Model for Retrieval in Competitive Programming**.
[GitHub](https://github.com/coldchair/CPRet)
[Hugging Face Collection](https://huggingface.co/collections/coldchair16/cpret-682451276f05c5988fcbdf34)
[arXiv:2505.12925](https://arxiv.org/abs/2505.12925)
Visit https://cpret.online/ to try out CPRet in action for competitive programming problem retrieval.
---
## 💡 CPRet Benchmark Tasks
The CPRet dataset supports **four retrieval tasks** relevant to competitive programming:
1. **Text-to-Code Retrieval**
Retrieve relevant code snippets based on a natural language problem description.
2. **Code-to-Code Retrieval**
Retrieve semantically similar code snippets, often corresponding to related problems.
3. **Problem-to-Duplicate Retrieval**
Retrieve near-duplicate or reused problems.
4. **Simplified-to-Full Retrieval**
Retrieve the full original problem descriptions based on simplified or abridged versions.
---
## 💾 Contents
This repository includes all datasets used in the CPRet project, organized as follows:
* **Training datasets**
Used to train retrieval models with contrastive or supervised learning strategies.
* **Evaluation datasets**
Benchmarks for the four retrieval tasks above, with relevance annotations.
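
To make the "contrastive" training strategy mentioned above concrete, here is a minimal sketch of an in-batch InfoNCE loss, a commonly used contrastive objective for retrieval models. This README does not state which loss CPRet uses, so treat the function name `info_nce_loss`, the temperature value, and the toy vectors as illustrative assumptions rather than the project's actual training code.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(queries, positives, temperature=0.05):
    """In-batch InfoNCE (illustrative, not CPRet's confirmed objective).

    queries[i] is expected to match positives[i]; every other positive
    in the batch serves as a negative for query i.
    """
    n = len(queries)
    total = 0.0
    for i in range(n):
        logits = [dot(queries[i], positives[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching pair as the target class.
        total += log_denominator - logits[i]
    return total / n


# Toy embeddings (hypothetical): aligned pairs vs. swapped pairs.
aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

The loss is low when each query is most similar to its own positive and high when the pairs are mismatched, which is the signal a retrieval model is trained on.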
---
## 🚀 How to Use
### 🔧 Code
To use these datasets for training or evaluation, please refer to the main CPRet repository:
👉 [**CPRet GitHub**](https://github.com/coldchair/CPRet)
It provides:
* Preprocessing and training scripts
* Evaluation pipelines for all retrieval tasks
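
An evaluation pipeline for a retrieval task generally comes down to scoring a ranked list against relevance annotations. The sketch below computes nDCG@k with linear gains as one example of such a metric. Which metrics the CPRet pipelines actually report should be checked in the main repository. The function names and problem IDs here are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k relevance values."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_ids, relevance_by_id, k=10):
    """nDCG@k for one query.

    ranked_ids: document IDs in the order the retriever returned them.
    relevance_by_id: annotated relevance per document ID (missing means 0).
    """
    gains = [relevance_by_id.get(doc_id, 0) for doc_id in ranked_ids]
    ideal = sorted(relevance_by_id.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents annotated for this query
    return dcg_at_k(gains, k) / ideal_dcg


# Hypothetical annotations: two relevant problems for one query.
relevance = {"p1": 1, "p7": 1}
perfect = ndcg_at_k(["p1", "p7", "p3"], relevance, k=3)
worse = ndcg_at_k(["p3", "p1", "p7"], relevance, k=3)
print(perfect > worse)
```

Averaging this per-query score over all queries of a task gives a single number per task that can be compared across models.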
### 🤗 Hugging Face Collection
All CPRet-related resources are available in a unified Hugging Face Collection:
👉 [**CPRet on Hugging Face**](https://huggingface.co/collections/coldchair16/cpret-682451276f05c5988fcbdf34)
This includes:
* 🔹 Trained models (`CPRetriever-Code`, `CPRetriever-Prob`, etc.)
* 🔹 Datasets for training and benchmarking
* 🔹 Precomputed embeddings along with full problem descriptions, titles, platforms, and URLs, supporting plug-and-play deployment of retrieval services.
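
As a rough illustration of how precomputed embeddings plus problem metadata can back a retrieval service, the sketch below ranks stored records by cosine similarity to a query embedding. The `ProblemRecord` schema, the `search` function, and the toy records (including the `example.com` URLs) are assumptions made for this example. The actual file layout and field names are defined by the released resources.

```python
import math
from dataclasses import dataclass


@dataclass
class ProblemRecord:
    """Hypothetical record: metadata plus a precomputed embedding."""
    title: str
    platform: str
    url: str
    embedding: list


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def search(query_embedding, records, top_k=3):
    """Return the top_k (score, record) pairs by cosine similarity."""
    q = normalize(query_embedding)
    scored = []
    for record in records:
        d = normalize(record.embedding)
        scored.append((sum(a * b for a, b in zip(q, d)), record))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy data, invented for illustration only.
records = [
    ProblemRecord("Two Sum Variant", "ExampleJudge", "https://example.com/p/1", [0.9, 0.1, 0.0]),
    ProblemRecord("Shortest Path Grid", "ExampleJudge", "https://example.com/p/2", [0.0, 0.2, 0.9]),
    ProblemRecord("Pair Sum Queries", "OtherJudge", "https://example.com/p/3", [0.8, 0.3, 0.1]),
]

hits = search([1.0, 0.0, 0.0], records, top_k=2)
for score, rec in hits:
    print(f"{score:.3f}  {rec.title}  ({rec.platform})  {rec.url}")
```

In a real deployment the query embedding would come from one of the trained models, and a vector index would replace the linear scan once the corpus grows large.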
---
## 📜 Citation
If you use CPRet in your work, please cite:
```
@misc{deng2025cpretdatasetbenchmarkmodel,
title = {CPRet: A Dataset, Benchmark, and Model for Retrieval in Competitive Programming},
author = {Han Deng and Yuan Meng and Shixiang Tang and Wanli Ouyang and Xinzhu Ma},
year = {2025},
eprint = {2505.12925},
archivePrefix = {arXiv},
primaryClass = {cs.SE},
url = {https://arxiv.org/abs/2505.12925}
}
```
Provider: maas
Created: 2025-07-04



