leonli66/compression_mid_train
收藏Hugging Face2025-11-26 更新2025-12-20 收录
下载链接:
https://hf-mirror.com/datasets/leonli66/compression_mid_train
下载链接
链接失效反馈官方服务:
资源简介:
---
configs:
- config_name: sft_deepseek-r1
data_files:
- split: train
path: sft_DeepSeek-R1/**/*.parquet
- config_name: sft_deepseek-r1-0528
data_files:
- split: train
path: sft_DeepSeek-R1-0528/**/*.parquet
- config_name: sft_deepseek-v3_deepseek-v3-0324
data_files:
- split: train
path: sft_DeepSeek-V3_DeepSeek-V3-0324/**/*.parquet
- config_name: sft_deepseek-v3_mixtral_8x22b_qwen_2-5_72b
data_files:
- split: train
path: sft_DeepSeek-V3_Mixtral_8x22B_Qwen_2.5_72B/**/*.parquet
- config_name: sft_deepseek_r1_deepseek-v3_qwen_2-5_72b_deepseek-v3-0324
data_files:
- split: train
path: sft_Deepseek_R1_DeepSeek-V3_Qwen_2.5_72B_DeepSeek-V3-0324/**/*.parquet
- config_name: sft_llama-nemotron-post-training-dataset-chat
data_files:
- split: train
path: sft_Llama-Nemotron-Post-Training-Dataset-chat/**/*.parquet
- config_name: sft_llama-nemotron-post-training-dataset-code
data_files:
- split: train
path: sft_Llama-Nemotron-Post-Training-Dataset-code/**/*.parquet
- config_name: sft_llama-nemotron-post-training-dataset-math
data_files:
- split: train
path: sft_Llama-Nemotron-Post-Training-Dataset-math/**/*.parquet
- config_name: sft_llama-nemotron-post-training-dataset-science
data_files:
- split: train
path: sft_Llama-Nemotron-Post-Training-Dataset-science/**/*.parquet
- config_name: sft_magpie-qwen2-5-pro-1m-v0-1
data_files:
- split: train
path: sft_Magpie-Qwen2.5-Pro-1M-v0.1/**/*.parquet
- config_name: sft_mixtral-8x22b-v0-1
data_files:
- split: train
path: sft_Mixtral-8x22B-v0.1/**/*.parquet
- config_name: sft_mixtral-8x22b-v0-1_nemotron_4_340b
data_files:
- split: train
path: sft_Mixtral-8x22B-v0.1_Nemotron_4_340B/**/*.parquet
- config_name: sft_mixtral-8x22b-v0-1_code
data_files:
- split: train
path: sft_Mixtral-8x22B-v0.1_code/**/*.parquet
- config_name: sft_na
data_files:
- split: train
path: sft_NA/**/*.parquet
- config_name: sft_nemotron_4_340b
data_files:
- split: train
path: sft_Nemotron_4_340B/**/*.parquet
- config_name: sft_qwen2-5-72b-instruct
data_files:
- split: train
path: sft_Qwen2.5-72B-Instruct/**/*.parquet
- config_name: sft_qwen3
data_files:
- split: train
path: sft_Qwen3/**/*.parquet
- config_name: sft_qwen3-30b-a3b_qwen3-235b-a22b
data_files:
- split: train
path: sft_Qwen3-30B-A3B_Qwen3-235B-A22B/**/*.parquet
- config_name: sft_cosmopedia-v2
data_files:
- split: train
path: sft_cosmopedia-v2/**/*.parquet
- config_name: sft_ifeval-like-data
data_files:
- split: train
path: sft_ifeval-like-data/**/*.parquet
- config_name: sft_stem
data_files:
- split: train
path: sft_stem/**/*.parquet
- config_name: sft_thinking-code
data_files:
- split: train
path: sft_thinking-code/**/*.parquet
- config_name: sft_thinking-math
data_files:
- split: train
path: sft_thinking-math/**/*.parquet
- config_name: sft_thinking-sft
data_files:
- split: train
path: sft_thinking-sft/**/*.parquet
- config_name: dolmino_tulu
data_files:
- split: train
path: dolmino_tulu/**/*.parquet
- config_name: longmino_2e13
data_files:
- split: train
path: longmino_2e13/**/*.parquet
- config_name: longmino_2e15
data_files:
- split: train
path: longmino_2e15/**/*.parquet
- config_name: longmino_2e16
data_files:
- split: train
path: longmino_2e16/**/*.parquet
- config_name: recon_codeparrot_train
data_files:
- split: train
path: recon_codeparrot_train/**/*.parquet
- config_name: recon_finewiki_en
data_files:
- split: train
path: recon_finewiki_en/**/*.parquet
- config_name: recon_latex_formulas_en_10m
data_files:
- split: train
path: recon_latex_formulas_en_10m/**/*.parquet
- config_name: recon_nemotron_cc_v2_hq
data_files:
- split: train
path: recon_nemotron_cc_v2_hq/**/*.parquet
- config_name: recon_nemotron_cc_v2_hq_synth
data_files:
- split: train
path: recon_nemotron_cc_v2_hq_synth/**/*.parquet
- config_name: recon_nemotron_math_4plus
data_files:
- split: train
path: recon_nemotron_math_4plus/**/*.parquet
- config_name: recon_redpajama_github
data_files:
- split: train
path: recon_redpajama_github/**/*.parquet
- config_name: dolmino_code
data_files:
- split: train
path: dolmino_code/**/*.parquet
- config_name: dolmino_common_crawl
data_files:
- split: train
path: dolmino_common_crawl/**/*.parquet
- config_name: dolmino_cranecode
data_files:
- split: train
path: dolmino_cranecode/**/*.parquet
- config_name: dolmino_cranemath
data_files:
- split: train
path: dolmino_cranemath/**/*.parquet
- config_name: dolmino_dolmino
data_files:
- split: train
path: dolmino_dolmino/**/*.parquet
- config_name: dolmino_dolmino_1
data_files:
- split: train
path: dolmino_dolmino_1/**/*.parquet
- config_name: dolmino_gemini
data_files:
- split: train
path: dolmino_gemini/**/*.parquet
- config_name: dolmino_general_reasoning_mix
data_files:
- split: train
path: dolmino_general_reasoning_mix/**/*.parquet
- config_name: dolmino_llama_nemotron
data_files:
- split: train
path: dolmino_llama_nemotron/**/*.parquet
- config_name: dolmino_math
data_files:
- split: train
path: dolmino_math/**/*.parquet
- config_name: dolmino_megamatt
data_files:
- split: train
path: dolmino_megamatt/**/*.parquet
- config_name: dolmino_nemotron
data_files:
- split: train
path: dolmino_nemotron/**/*.parquet
- config_name: dolmino_olmocr_science_pdfs
data_files:
- split: train
path: dolmino_olmocr_science_pdfs/**/*.parquet
- config_name: dolmino_omr
data_files:
- split: train
path: dolmino_omr/**/*.parquet
- config_name: dolmino_openthoughts2
data_files:
- split: train
path: dolmino_openthoughts2/**/*.parquet
- config_name: dolmino_program_verifiable
data_files:
- split: train
path: dolmino_program_verifiable/**/*.parquet
- config_name: dolmino_qwq
data_files:
- split: train
path: dolmino_qwq/**/*.parquet
- config_name: dolmino_reddit_to_flashcards
data_files:
- split: train
path: dolmino_reddit_to_flashcards/**/*.parquet
- config_name: dolmino_stack_edu
data_files:
- split: train
path: dolmino_stack_edu/**/*.parquet
- config_name: dolmino_stem
data_files:
- split: train
path: dolmino_stem/**/*.parquet
- config_name: dolmino_tinymath
data_files:
- split: train
path: dolmino_tinymath/**/*.parquet
- config_name: dolmino_wiki_to_rcqa
data_files:
- split: train
path: dolmino_wiki_to_rcqa/**/*.parquet
- config_name: longmino_2e14
data_files:
- split: train
path: longmino_2e14/**/*.parquet
- config_name: longmino_2e17
data_files:
- split: train
path: longmino_2e17/**/*.parquet
---
# Dataset
Each example contains `prompt` (chat format) and `target` fields.
```python
from datasets import load_dataset
ds = load_dataset("leonli66/compression_mid_train", "<config_name>")
```
提供机构:
leonli66



