CoIR
Source: arXiv. Indexed 2025-09-30.
Download link:
https://github.com/coir-team/coir
Resource description:
CoIR is a comprehensive benchmark designed for code retrieval, covering a wide range of retrieval challenges across multiple programming languages. It comprises 8 fine-grained retrieval subtasks spanning 14 major programming languages. The benchmark is assembled from 10 distinct code retrieval datasets and contains more than 2 million entries in total.



