BEE-spoke-data/the-stack-smol-xs-all
Source: Hugging Face. Updated 2024-07-27; indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/BEE-spoke-data/the-stack-smol-xs-all
Description:
---
dataset_info:
  features:
  - name: content
    dtype: string
  - name: lang
    dtype: string
  - name: size
    dtype: int64
  - name: ext
    dtype: string
  - name: max_stars_count
    dtype: int64
  - name: avg_line_length
    dtype: float64
  - name: max_line_length
    dtype: int64
  - name: alphanum_fraction
    dtype: float64
  splits:
  - name: train
    num_bytes: 94824507
    num_examples: 8700
  download_size: 32608071
  dataset_size: 94824507
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: odc-by
task_categories:
- text-generation
- feature-extraction
- text-classification
size_categories:
- 1K<n<10K
source_datasets: bigcode/the-stack-smol-xs
---
# bigcode/the-stack-smol-xs - all configs
All configs from `bigcode/the-stack-smol-xs`, concatenated and shuffled into a single `train` split. It contains 100 examples for each of the following 87 configs (8,700 examples in total):
```py
['ada', 'agda', 'alloy', 'antlr', 'applescript', 'assembly', 'augeas', 'awk',
'batchfile', 'bison', 'bluespec', 'c', 'c++', 'c-sharp', 'clojure', 'cmake',
'coffeescript', 'common-lisp', 'css', 'cuda', 'dart', 'dockerfile', 'elixir',
'elm', 'emacs-lisp', 'erlang', 'f-sharp', 'fortran', 'glsl', 'go', 'groovy',
'haskell', 'html', 'idris', 'isabelle', 'java', 'java-server-pages',
'javascript', 'julia', 'kotlin', 'lean', 'literate-agda',
'literate-coffeescript', 'literate-haskell', 'lua', 'makefile', 'maple',
'markdown', 'mathematica', 'matlab', 'ocaml', 'pascal', 'perl', 'php',
'powershell', 'prolog', 'protocol-buffer', 'python', 'r', 'racket',
'restructuredtext', 'rmarkdown', 'ruby', 'rust', 'sas', 'scala', 'scheme',
'shell', 'smalltalk', 'solidity', 'sparql', 'sql', 'stan', 'standard-ml',
'stata', 'systemverilog', 'tcl', 'tcsh', 'tex', 'thrift', 'typescript',
'verilog', 'vhdl', 'visual-basic', 'xslt', 'yacc', 'zig']
```
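As a quick sanity check on the "100 examples each" claim, the split can be loaded and tallied by its `lang` column. The sketch below assumes the Hugging Face `datasets` package is installed (a version that provides `Dataset.select_columns`) and that the Hub is reachable; the helper names `count_by_lang` and `load_and_count` are illustrative and not part of the dataset. Whether the values in `lang` match the config names listed above exactly is also an assumption to verify.

```python
from collections import Counter
from typing import Iterable, Mapping

# Dataset ID taken from the card title.
DATASET_ID = "BEE-spoke-data/the-stack-smol-xs-all"


def count_by_lang(rows: Iterable[Mapping[str, object]]) -> Counter:
    """Count rows per value of the `lang` column."""
    return Counter(row["lang"] for row in rows)


def load_and_count() -> Counter:
    """Download the `train` split and tally examples per language.

    Assumption: `datasets` is installed and the Hub is reachable.
    """
    from datasets import load_dataset  # imported lazily; needs network access

    ds = load_dataset(DATASET_ID, split="train")
    # Keep only `lang` so iteration does not decode the large `content` field.
    counts = count_by_lang(ds.select_columns(["lang"]))
    print(f"{len(ds)} rows across {len(counts)} distinct `lang` values")
    return counts
```

Calling `load_and_count()` downloads the data (about 33 MB according to `download_size` in the metadata). If the description holds, every count in the returned `Counter` should be 100.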
Provider:
BEE-spoke-data



