
CPRet-data

ModelScope community · Updated 2025-12-04 · Indexed 2025-07-05
Download link:
https://modelscope.cn/datasets/AI-ModelScope/CPRet-data
Resource description:
# CPRet-data

This repository hosts the datasets for **CPRet: A Dataset, Benchmark, and Model for Retrieval in Competitive Programming**.

[![GitHub Repo](https://img.shields.io/badge/GitHub-coldchair%2FCPRet-181717?logo=github)](https://github.com/coldchair/CPRet)
[![🤗 Hugging Face](https://img.shields.io/badge/HuggingFace-CPRet-yellow?logo=huggingface)](https://huggingface.co/collections/coldchair16/cpret-682451276f05c5988fcbdf34)
[![arXiv](https://img.shields.io/badge/arXiv-2505.12925-b31b1b.svg)](https://arxiv.org/abs/2505.12925)

Visit https://cpret.online/ to try out CPRet in action for competitive programming problem retrieval.

---

## 💡 CPRet Benchmark Tasks

The CPRet dataset supports **four retrieval tasks** relevant to competitive programming:

1. **Text-to-Code Retrieval**
   Retrieve relevant code snippets based on a natural language problem description.

2. **Code-to-Code Retrieval**
   Retrieve semantically similar code snippets, often corresponding to related problems.

3. **Problem-to-Duplicate Retrieval**
   Retrieve near-duplicate or reused problems.

4. **Simplified-to-Full Retrieval**
   Retrieve the full original problem descriptions based on simplified or abridged versions.

---

## 💾 Contents

This repository includes all datasets used in the CPRet project, organized as follows:

* **Training datasets**
  Used to train retrieval models with contrastive or supervised learning strategies.

* **Evaluation datasets**
  Benchmarks for the four retrieval tasks above, with relevance annotations.
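As a rough illustration of what evaluating a retrieval task against relevance annotations can look like, the sketch below ranks a toy corpus by cosine similarity and scores the ranking with recall@k. Everything in it is an assumption made for illustration: the vectors, the ids (`p1`, `p2`, `p3`), the helper names, and the choice of recall@k are hypothetical and are not taken from the CPRet codebase. The actual evaluation pipelines and metrics live in the CPRet GitHub repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query_vec, corpus):
    """Return corpus ids sorted by descending similarity to the query."""
    return sorted(
        corpus,
        key=lambda doc_id: cosine(query_vec, corpus[doc_id]),
        reverse=True,
    )


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of annotated relevant items found in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


# Hypothetical toy data: three embedded corpus items and one query.
corpus = {
    "p1": [1.0, 0.0, 0.0],
    "p2": [0.9, 0.1, 0.0],
    "p3": [0.0, 1.0, 0.0],
}
query = [1.0, 0.05, 0.0]
relevant = {"p1", "p2"}  # hypothetical relevance annotations for this query

ranked = rank(query, corpus)
print(ranked)
print(recall_at_k(ranked, relevant, k=2))
```

In a real setup the vectors would come from an embedding model rather than being written by hand, and the relevance sets would be read from the evaluation datasets.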
---

## 🚀 How to Use

### 🔧 Code

To use these datasets for training or evaluation, please refer to the main CPRet repository:
👉 [**CPRet GitHub**](https://github.com/coldchair/CPRet)

It provides:

* Preprocessing and training scripts
* Evaluation pipelines for all retrieval tasks

### 🤗 Hugging Face Collection

All CPRet-related resources are available in a unified Hugging Face Collection:
👉 [**CPRet on Hugging Face**](https://huggingface.co/collections/coldchair16/cpret-682451276f05c5988fcbdf34)

This includes:

* 🔹 Trained models (`CPRetriever-Code`, `CPRetriever-Prob`, etc.)
* 🔹 Datasets for training and benchmarking
* 🔹 Precomputed embeddings along with full problem descriptions, titles, platforms, and URLs, supporting plug-and-play deployment of retrieval services.

---

## 📜 Citation

If you use CPRet in your work, please cite:

```
@misc{deng2025cpretdatasetbenchmarkmodel,
  title         = {CPRet: A Dataset, Benchmark, and Model for Retrieval in Competitive Programming},
  author        = {Han Deng and Yuan Meng and Shixiang Tang and Wanli Ouyang and Xinzhu Ma},
  year          = {2025},
  eprint        = {2505.12925},
  archivePrefix = {arXiv},
  primaryClass  = {cs.SE},
  url           = {https://arxiv.org/abs/2505.12925}
}
```

Provider:
maas
Created:
2025-07-04