bigcode/the-stack-smol-xl
收藏数据集描述
这是一个来自the-stack数据集的小子集,包含87种编程语言,每种语言有10,000个随机样本。
语言
数据集包含87种编程语言:
ada, agda, alloy, antlr, applescript, assembly, augeas, awk, batchfile, bison, bluespec, c, c++, c-sharp, clojure, cmake, coffeescript, common-lisp, css, cuda, dart, dockerfile, elixir, elm, emacs-lisp,erlang, f-sharp, fortran, glsl, go, groovy, haskell,html, idris, isabelle, java, java-server-pages, javascript, julia, kotlin, lean, literate-agda, literate-coffeescript, literate-haskell, lua, makefile, maple, markdown, mathematica, matlab, ocaml, pascal, perl, php, powershell, prolog, protocol-buffer, python, r, racket, restructuredtext, rmarkdown, ruby, rust, sas, scala, scheme, shell, smalltalk, solidity, sparql, sql, stan, standard-ml, stata, systemverilog, tcl, tcsh, tex, thrift, typescript, verilog, vhdl, visual-basic, xslt, yacc, zig
数据集结构
python
加载数据集示例:
from datasets import load_dataset
load_dataset("bigcode/the-stack-smol-xl", data_dir="data/go")




