crumb/Functions-139K
Source: Hugging Face. Updated 2026-04-29, indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/crumb/Functions-139K
Description:
The Functions-139K dataset contains 139 thousand code samples, each annotated with its input and output types and with any dependencies or libraries required to use it. All variable names are anonymized to reduce high-frequency features, with the aim of giving a model a smoother space to navigate. The dataset is derived from 1,645k files from The Stack, which were processed down to 204k cleanly typed functions with dependencies, then anonymized and de-duplicated with MinHash and LSH to reach 139k samples. The README also lists in detail the libraries allowed during filtering.
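The description names MinHash and LSH for de-duplication but does not give the parameters or code used. The following is only a minimal sketch of that general technique in pure Python, not the dataset author's pipeline. The shingle size, the number of hash functions, the band count, the similarity threshold, and all function names are assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict

NUM_PERM = 64  # hash functions per signature (assumed value)
BANDS = 16     # LSH bands, so 64 / 16 = 4 rows per band (assumed value)


def shingles(code, k=5):
    """Return the set of character k-grams of a whitespace-normalized string."""
    text = " ".join(code.split())
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def minhash_signature(items):
    """For each salted hash function, keep the smallest hash over all items."""
    signature = []
    for seed in range(NUM_PERM):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in items
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def deduplicate(samples, threshold=0.8):
    """Keep the first sample of every group of near-duplicates."""
    rows = NUM_PERM // BANDS
    buckets = defaultdict(list)  # (band index, band values) -> kept sample indices
    signatures = {}              # kept sample index -> signature
    kept = []
    for idx, code in enumerate(samples):
        sig = minhash_signature(shingles(code))
        keys = [(b, tuple(sig[b * rows:(b + 1) * rows])) for b in range(BANDS)]
        # Only samples sharing at least one full band are compared in detail.
        candidates = {j for key in keys for j in buckets.get(key, ())}
        if any(estimated_jaccard(sig, signatures[j]) >= threshold for j in candidates):
            continue
        kept.append(idx)
        signatures[idx] = sig
        for key in keys:
            buckets[key].append(idx)
    return [samples[i] for i in kept]


if __name__ == "__main__":
    demo = [
        "def f(a, b):\n    return a + b",
        "def f(a, b):\n        return a + b",  # same code, different indentation
        "def g(x):\n    return [i * 2 for i in x]",
    ]
    for sample in deduplicate(demo):
        print(repr(sample))
```

Banding keeps the comparison cost low: two samples are compared only if they agree on all rows of at least one band, so most dissimilar pairs are never examined. At the scale the description mentions, a dedicated library would likely be a better fit than this hand-rolled version.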
Provider:
crumb



