bigcode/starcoder2data-extras
Source: Hugging Face. Updated 2025-03-19; indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/bigcode/starcoder2data-extras
Description:
StarCoder2 extras dataset: supplementary data, spanning multiple programming languages and related documentation, used to train the StarCoder2 family of models. Subsets include Kaggle notebooks, StackOverflow conversations, processed GitHub issues, the Open-Web-Math dataset, a collection of high-quality code files, a subset of English Wikipedia, LaTeX source files of ArXiv papers, intermediate representations of several programming languages, and documentation for popular libraries.
Provider:
bigcode



