nomic-ai/cornstack-python-v1
Source: Hugging Face. Updated 2025-03-27; indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/nomic-ai/cornstack-python-v1
Resource overview:
The CoRNStack dataset is a large-scale, high-quality training dataset for code retrieval across multiple programming languages. It consists of `<query, positive, negative>` triplets for training. The text-code pairs are filtered for quality with dual-consistency filtering, and negatives are chosen with a novel curriculum-based hard negative mining strategy, with the aim of improving a model's performance on code retrieval and reranking.
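The overview names dual-consistency filtering without describing it, so the following is only a minimal sketch of one plausible reading: keep a text-code pair when the paired code ranks among the query's `top_k` most similar candidates and its similarity also clears a `threshold`. The function names, the cosine similarity, and the default values of `top_k` and `threshold` are assumptions made for illustration, not details confirmed by this listing.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def dual_consistency_filter(query_vecs, code_vecs, top_k=2, threshold=0.7):
    """Return indices i whose pair (query i, code i) passes both checks.

    Assumed checks (hypothetical, for illustration only):
      1. code i is among the top_k codes most similar to query i;
      2. the similarity between query i and code i is at least threshold.
    """
    kept = []
    for i, query in enumerate(query_vecs):
        sims = [cosine(query, code) for code in code_vecs]
        # Candidate indices sorted from most to least similar to this query.
        ranked = sorted(range(len(sims)), key=lambda j: sims[j], reverse=True)
        if i in ranked[:top_k] and sims[i] >= threshold:
            kept.append(i)
    return kept


# Toy embeddings: pairs 0 and 1 are well aligned, pair 2 is mismatched.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
codes = [[0.9, 0.1], [0.1, 0.9], [1.0, -1.0]]
kept_pairs = dual_consistency_filter(queries, codes)
```

In this toy input the third pair is dropped because its code is orthogonal to its query, so it fails both the rank check and the threshold check. Real embeddings would come from a text-code embedding model, which is outside the scope of this sketch.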
Provider:
nomic-ai



