
leonli66/compression_mid_train

Hugging Face | Updated 2025-11-26 | Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/leonli66/compression_mid_train
Resource description:
---
configs:
- config_name: sft_deepseek-r1
  data_files:
  - split: train
    path: sft_DeepSeek-R1/**/*.parquet
- config_name: sft_deepseek-r1-0528
  data_files:
  - split: train
    path: sft_DeepSeek-R1-0528/**/*.parquet
- config_name: sft_deepseek-v3_deepseek-v3-0324
  data_files:
  - split: train
    path: sft_DeepSeek-V3_DeepSeek-V3-0324/**/*.parquet
- config_name: sft_deepseek-v3_mixtral_8x22b_qwen_2-5_72b
  data_files:
  - split: train
    path: sft_DeepSeek-V3_Mixtral_8x22B_Qwen_2.5_72B/**/*.parquet
- config_name: sft_deepseek_r1_deepseek-v3_qwen_2-5_72b_deepseek-v3-0324
  data_files:
  - split: train
    path: sft_Deepseek_R1_DeepSeek-V3_Qwen_2.5_72B_DeepSeek-V3-0324/**/*.parquet
- config_name: sft_llama-nemotron-post-training-dataset-chat
  data_files:
  - split: train
    path: sft_Llama-Nemotron-Post-Training-Dataset-chat/**/*.parquet
- config_name: sft_llama-nemotron-post-training-dataset-code
  data_files:
  - split: train
    path: sft_Llama-Nemotron-Post-Training-Dataset-code/**/*.parquet
- config_name: sft_llama-nemotron-post-training-dataset-math
  data_files:
  - split: train
    path: sft_Llama-Nemotron-Post-Training-Dataset-math/**/*.parquet
- config_name: sft_llama-nemotron-post-training-dataset-science
  data_files:
  - split: train
    path: sft_Llama-Nemotron-Post-Training-Dataset-science/**/*.parquet
- config_name: sft_magpie-qwen2-5-pro-1m-v0-1
  data_files:
  - split: train
    path: sft_Magpie-Qwen2.5-Pro-1M-v0.1/**/*.parquet
- config_name: sft_mixtral-8x22b-v0-1
  data_files:
  - split: train
    path: sft_Mixtral-8x22B-v0.1/**/*.parquet
- config_name: sft_mixtral-8x22b-v0-1_nemotron_4_340b
  data_files:
  - split: train
    path: sft_Mixtral-8x22B-v0.1_Nemotron_4_340B/**/*.parquet
- config_name: sft_mixtral-8x22b-v0-1_code
  data_files:
  - split: train
    path: sft_Mixtral-8x22B-v0.1_code/**/*.parquet
- config_name: sft_na
  data_files:
  - split: train
    path: sft_NA/**/*.parquet
- config_name: sft_nemotron_4_340b
  data_files:
  - split: train
    path: sft_Nemotron_4_340B/**/*.parquet
- config_name: sft_qwen2-5-72b-instruct
  data_files:
  - split: train
    path: sft_Qwen2.5-72B-Instruct/**/*.parquet
- config_name: sft_qwen3
  data_files:
  - split: train
    path: sft_Qwen3/**/*.parquet
- config_name: sft_qwen3-30b-a3b_qwen3-235b-a22b
  data_files:
  - split: train
    path: sft_Qwen3-30B-A3B_Qwen3-235B-A22B/**/*.parquet
- config_name: sft_cosmopedia-v2
  data_files:
  - split: train
    path: sft_cosmopedia-v2/**/*.parquet
- config_name: sft_ifeval-like-data
  data_files:
  - split: train
    path: sft_ifeval-like-data/**/*.parquet
- config_name: sft_stem
  data_files:
  - split: train
    path: sft_stem/**/*.parquet
- config_name: sft_thinking-code
  data_files:
  - split: train
    path: sft_thinking-code/**/*.parquet
- config_name: sft_thinking-math
  data_files:
  - split: train
    path: sft_thinking-math/**/*.parquet
- config_name: sft_thinking-sft
  data_files:
  - split: train
    path: sft_thinking-sft/**/*.parquet
- config_name: dolmino_tulu
  data_files:
  - split: train
    path: dolmino_tulu/**/*.parquet
- config_name: longmino_2e13
  data_files:
  - split: train
    path: longmino_2e13/**/*.parquet
- config_name: longmino_2e15
  data_files:
  - split: train
    path: longmino_2e15/**/*.parquet
- config_name: longmino_2e16
  data_files:
  - split: train
    path: longmino_2e16/**/*.parquet
- config_name: recon_codeparrot_train
  data_files:
  - split: train
    path: recon_codeparrot_train/**/*.parquet
- config_name: recon_finewiki_en
  data_files:
  - split: train
    path: recon_finewiki_en/**/*.parquet
- config_name: recon_latex_formulas_en_10m
  data_files:
  - split: train
    path: recon_latex_formulas_en_10m/**/*.parquet
- config_name: recon_nemotron_cc_v2_hq
  data_files:
  - split: train
    path: recon_nemotron_cc_v2_hq/**/*.parquet
- config_name: recon_nemotron_cc_v2_hq_synth
  data_files:
  - split: train
    path: recon_nemotron_cc_v2_hq_synth/**/*.parquet
- config_name: recon_nemotron_math_4plus
  data_files:
  - split: train
    path: recon_nemotron_math_4plus/**/*.parquet
- config_name: recon_redpajama_github
  data_files:
  - split: train
    path: recon_redpajama_github/**/*.parquet
- config_name: dolmino_code
  data_files:
  - split: train
    path: dolmino_code/**/*.parquet
- config_name: dolmino_common_crawl
  data_files:
  - split: train
    path: dolmino_common_crawl/**/*.parquet
- config_name: dolmino_cranecode
  data_files:
  - split: train
    path: dolmino_cranecode/**/*.parquet
- config_name: dolmino_cranemath
  data_files:
  - split: train
    path: dolmino_cranemath/**/*.parquet
- config_name: dolmino_dolmino
  data_files:
  - split: train
    path: dolmino_dolmino/**/*.parquet
- config_name: dolmino_dolmino_1
  data_files:
  - split: train
    path: dolmino_dolmino_1/**/*.parquet
- config_name: dolmino_gemini
  data_files:
  - split: train
    path: dolmino_gemini/**/*.parquet
- config_name: dolmino_general_reasoning_mix
  data_files:
  - split: train
    path: dolmino_general_reasoning_mix/**/*.parquet
- config_name: dolmino_llama_nemotron
  data_files:
  - split: train
    path: dolmino_llama_nemotron/**/*.parquet
- config_name: dolmino_math
  data_files:
  - split: train
    path: dolmino_math/**/*.parquet
- config_name: dolmino_megamatt
  data_files:
  - split: train
    path: dolmino_megamatt/**/*.parquet
- config_name: dolmino_nemotron
  data_files:
  - split: train
    path: dolmino_nemotron/**/*.parquet
- config_name: dolmino_olmocr_science_pdfs
  data_files:
  - split: train
    path: dolmino_olmocr_science_pdfs/**/*.parquet
- config_name: dolmino_omr
  data_files:
  - split: train
    path: dolmino_omr/**/*.parquet
- config_name: dolmino_openthoughts2
  data_files:
  - split: train
    path: dolmino_openthoughts2/**/*.parquet
- config_name: dolmino_program_verifiable
  data_files:
  - split: train
    path: dolmino_program_verifiable/**/*.parquet
- config_name: dolmino_qwq
  data_files:
  - split: train
    path: dolmino_qwq/**/*.parquet
- config_name: dolmino_reddit_to_flashcards
  data_files:
  - split: train
    path: dolmino_reddit_to_flashcards/**/*.parquet
- config_name: dolmino_stack_edu
  data_files:
  - split: train
    path: dolmino_stack_edu/**/*.parquet
- config_name: dolmino_stem
  data_files:
  - split: train
    path: dolmino_stem/**/*.parquet
- config_name: dolmino_tinymath
  data_files:
  - split: train
    path: dolmino_tinymath/**/*.parquet
- config_name: dolmino_wiki_to_rcqa
  data_files:
  - split: train
    path: dolmino_wiki_to_rcqa/**/*.parquet
- config_name: longmino_2e14
  data_files:
  - split: train
    path: longmino_2e14/**/*.parquet
- config_name: longmino_2e17
  data_files:
  - split: train
    path: longmino_2e17/**/*.parquet
---

# Dataset

Each example contains `prompt` (chat format) and `target` fields.

```python
from datasets import load_dataset

ds = load_dataset("leonli66/compression_mid_train", "<config_name>")
```
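To inspect a few records before committing to a full download, something like the following may help. It is only a sketch: the card says `prompt` is in "chat format" without spelling out the schema, so the code assumes the common layout of a list of `{"role": ..., "content": ...}` messages. The helper names `render_example` and `first_examples` are hypothetical and not part of the dataset or the `datasets` library.

```python
from typing import Any


def render_example(example: dict[str, Any]) -> str:
    """Flatten one record into plain text for quick inspection.

    Assumption: `prompt` is a list of {"role": ..., "content": ...} dicts.
    The dataset card only says "chat format", so adjust if the schema differs.
    """
    lines = [f"{msg['role']}: {msg['content']}" for msg in example["prompt"]]
    lines.append(f"target: {example['target']}")
    return "\n".join(lines)


def first_examples(config_name: str, n: int = 3) -> list[dict[str, Any]]:
    """Stream the first n records of one config instead of downloading it all."""
    # Imported lazily so render_example stays usable without `datasets` installed.
    from datasets import load_dataset

    stream = load_dataset(
        "leonli66/compression_mid_train",
        config_name,
        split="train",
        streaming=True,
    )
    return list(stream.take(n))
```

For example, `print(render_example(first_examples("sft_stem", n=1)[0]))` would print one record from the `sft_stem` config, provided the assumed message layout holds. Streaming is used here because several configs are likely large, though the card does not state their sizes.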
Provider:
leonli66