
crumb/functions-656k

Source: Hugging Face. Updated: 2026-04-29. Indexed: 2026-05-03.
Download link:
https://hf-mirror.com/datasets/crumb/functions-656k
Description:

The Functions-656K dataset contains 656,000 code samples, each annotated with its input and output types and with any dependencies or libraries it requires. All variable names have been anonymized to reduce high-frequency features and make the model space smoother to navigate. The dataset was derived from 1,105,000 files from The Stack, which yielded 1,376,000 cleanly typed functions with dependencies; these were then anonymized and de-duplicated using MinHash and LSH (locality-sensitive hashing). The README also provides a detailed list of allowed libraries for filtering the dataset.
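As an illustration of the de-duplication step mentioned above, here is a minimal sketch of MinHash with LSH banding, using only the Python standard library. It is not the dataset author's pipeline: the source names only the techniques, so the shingle size, the number of hash functions (`NUM_PERM`), the band count (`BANDS`), the similarity threshold, and all function names are assumptions chosen for illustration.

```python
import hashlib
import re

NUM_PERM = 64  # hash functions per signature (assumed value)
BANDS = 16     # LSH bands; each band spans NUM_PERM // BANDS rows (assumed)


def shingles(code, k=3):
    """Return the set of k-token shingles of a code string."""
    tokens = re.findall(r"\w+|[^\w\s]", code)
    if len(tokens) < k:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def _seeded_hash(seed, text):
    """64-bit hash of text, varied by seed to simulate independent hashes."""
    data = f"{seed}:{text}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(shingle_set):
    """MinHash signature: the minimum seeded hash over the set, per seed."""
    return tuple(
        min(_seeded_hash(seed, s) for s in shingle_set)
        for seed in range(NUM_PERM)
    )


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def deduplicate(samples, threshold=0.8):
    """Keep the first sample of every group of near-duplicates."""
    rows = NUM_PERM // BANDS
    buckets = {}     # (band index, band slice) -> indices of kept samples
    signatures = {}  # index of kept sample -> its signature
    kept = []
    for idx, code in enumerate(samples):
        sig = minhash(shingles(code))
        keys = [(b, sig[b * rows:(b + 1) * rows]) for b in range(BANDS)]
        # Only samples sharing at least one band are compared in full.
        candidates = {j for key in keys for j in buckets.get(key, ())}
        if any(estimated_similarity(sig, signatures[j]) >= threshold
               for j in candidates):
            continue
        kept.append(idx)
        signatures[idx] = sig
        for key in keys:
            buckets.setdefault(key, []).append(idx)
    return [samples[i] for i in kept]


samples = [
    "def f(a, b):\n    return a + b",
    "def f(a, b):\n    return a + b",
    "def g(x):\n    return [i * 2 for i in x]",
]
unique = deduplicate(samples)
```

Because LSH only compares samples that collide in at least one band, most dissimilar pairs are never compared directly, which is what makes this approach practical at the scale of a million or more functions.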
Provider:
crumb