minishlab/tokenlearn-cornstack-queries-coderankembed
Source: Hugging Face | Updated: 2026-04-30 | Indexed: 2026-05-03
Download link:
https://hf-mirror.com/datasets/minishlab/tokenlearn-cornstack-queries-coderankembed
Description:
This dataset was created for training Model2Vec models on code retrieval. It contains mean token embeddings produced by nomic-ai/CodeRankEmbed for natural language queries drawn from CornStack. The queries cover six programming languages (Python, Java, PHP, Go, JavaScript, Ruby) with 50,000 rows per language, for a total of 300,000 rows. Each row has two columns: the truncated input text and its mean token embedding.
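As a rough illustration of what a "mean token embedding" is, the sketch below averages per-token vectors into a single vector and checks the row count stated above. It is a minimal example under assumptions, not the procedure used to build the dataset: the function names, the attention-mask handling, and the toy 2-dimensional vectors are all hypothetical, and the description does not say how padding or special tokens were treated.

```python
from typing import List, Sequence

# Languages and per-language row count as stated in the description.
LANGUAGES = ["Python", "Java", "PHP", "Go", "JavaScript", "Ruby"]
ROWS_PER_LANGUAGE = 50_000


def total_rows(languages: Sequence[str] = LANGUAGES,
               rows_per_language: int = ROWS_PER_LANGUAGE) -> int:
    """Total number of rows: one equal-sized slice per language."""
    return len(languages) * rows_per_language


def mean_token_embedding(token_vectors: Sequence[Sequence[float]],
                         attention_mask: Sequence[int]) -> List[float]:
    """Average the vectors of tokens whose mask entry is non-zero.

    Assumption: masked (padding) tokens are excluded from the mean.
    The dataset description does not confirm this detail.
    """
    if len(token_vectors) != len(attention_mask):
        raise ValueError("token_vectors and attention_mask differ in length")
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("no unmasked tokens to average")
    dim = len(kept[0])
    # Component-wise mean over the kept token vectors.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


if __name__ == "__main__":
    # Toy input: three tokens, the last one is padding.
    vectors = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
    mask = [1, 1, 0]
    print(mean_token_embedding(vectors, mask))
    print(total_rows())
```

In practice the per-token vectors would come from the embedding model's last hidden layer; here they are hand-written so the example runs without any downloads.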
Provider:
minishlab



