tytodd/qwen3.5-2b-v2-instructions-smoke-10k-smoke-mnbt16k
Source: Hugging Face | Updated: 2026-04-21 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/tytodd/qwen3.5-2b-v2-instructions-smoke-10k-smoke-mnbt16k
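A minimal sketch of how one config from this repository might be loaded and summarized with the Hugging Face `datasets` library. The config name `anli_r1` and the split `train` are taken from the metadata in this listing. The helper names `load_config` and `accuracy` are hypothetical. The sketch also assumes that `correct` is a top-level boolean column, which the flattened schema suggests but does not confirm.

```python
from typing import Iterable, Mapping

# Repository id as given in the page title.
REPO_ID = "tytodd/qwen3.5-2b-v2-instructions-smoke-10k-smoke-mnbt16k"


def load_config(config_name: str, split: str):
    """Load one config/split from the Hub (hypothetical helper).

    Needs the `datasets` package and network access. The import is done
    lazily so that `accuracy` below can be used without `datasets` installed.
    """
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split=split)


def accuracy(rows: Iterable[Mapping]) -> float:
    """Fraction of rows whose `correct` flag is true.

    Assumes each row exposes a top-level boolean `correct` field.
    Returns 0.0 for an empty input.
    """
    flags = [bool(row["correct"]) for row in rows]
    return sum(flags) / len(flags) if flags else 0.0


# Example usage (requires network access, so it is left commented out):
# ds = load_config("anli_r1", "train")
# print(len(ds), accuracy(ds))
```

Configs that list only an `ood` split in the metadata (for example `arc_challenge`) would need `split="ood"` instead of `train` or `val`.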
Resource description:
---
dataset_info:
- config_name: aes2_essay_scoring
features:
- name: input
struct:
- name: full_text
dtype: string
- name: prediction
struct:
- name: score
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 7809371
num_examples: 100
- name: val
num_bytes: 4765255
num_examples: 100
download_size: 9697124
dataset_size: 12574626
- config_name: anli_r1
features:
- name: input
struct:
- name: hypothesis
dtype: string
- name: premise
dtype: string
- name: prediction
struct:
- name: label
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 2154947
num_examples: 100
- name: val
num_bytes: 2393926
num_examples: 100
download_size: 4491857
dataset_size: 4548873
- config_name: anli_r2
features:
- name: input
struct:
- name: hypothesis
dtype: string
- name: premise
dtype: string
- name: prediction
struct:
- name: label
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 2163803
num_examples: 100
- name: val
num_bytes: 2883626
num_examples: 100
download_size: 4992395
dataset_size: 5047429
- config_name: anli_r3
features:
- name: input
struct:
- name: hypothesis
dtype: string
- name: premise
dtype: string
- name: prediction
struct:
- name: label
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 2715608
num_examples: 100
- name: val
num_bytes: 2788966
num_examples: 100
download_size: 5442192
dataset_size: 5504574
- config_name: arc_challenge
features:
- name: input
struct:
- name: choices
dtype: string
- name: question
dtype: string
- name: prediction
struct:
- name: choice
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: ood
num_bytes: 3228532
num_examples: 100
download_size: 2584395
dataset_size: 3228532
- config_name: argument_quality_ranking
features:
- name: input
struct:
- name: argument
dtype: string
- name: topic
dtype: string
- name: prediction
struct:
- name: quality_label
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: ood
num_bytes: 2204781
num_examples: 100
download_size: 2171661
dataset_size: 2204781
- config_name: big_patent_innovation
features:
- name: input
struct:
- name: description
dtype: string
- name: prediction
struct:
- name: innovation_score
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 2934204
num_examples: 100
- name: val
num_bytes: 2941424
num_examples: 100
download_size: 5814888
dataset_size: 5875628
- config_name: boardgame_qa
features:
- name: input
struct:
- name: question
dtype: string
- name: prediction
struct:
- name: answer
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 3998377
num_examples: 100
- name: val
num_bytes: 4241927
num_examples: 100
download_size: 8026326
dataset_size: 8240304
- config_name: chatbot_arena_conversations
features:
- name: input
struct:
- name: question
dtype: string
- name: response_A
dtype: string
- name: response_B
dtype: string
- name: prediction
struct:
- name: label
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 2434963
num_examples: 100
- name: val
num_bytes: 2475506
num_examples: 100
download_size: 4844374
dataset_size: 4910469
- config_name: civil_comments
features:
- name: input
struct:
- name: comment
dtype: string
- name: prediction
struct:
- name: toxicity_label
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 2166905
num_examples: 100
- name: val
num_bytes: 1865324
num_examples: 100
download_size: 3988831
dataset_size: 4032229
- config_name: code_judge_bench
features:
- name: input
struct:
- name: code_A
dtype: string
- name: code_B
dtype: string
- name: problem
dtype: string
- name: prediction
struct:
- name: label
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: ood
num_bytes: 12211560
num_examples: 100
download_size: 11574293
dataset_size: 12211560
- config_name: colbert_humor_detection
features:
- name: input
struct:
- name: text
dtype: string
- name: prediction
struct:
- name: humor_label
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 1731670
num_examples: 100
- name: val
num_bytes: 1690564
num_examples: 100
download_size: 3352998
dataset_size: 3422234
- config_name: customer_support_tickets_en
features:
- name: input
struct:
- name: body
dtype: string
- name: subject
dtype: string
- name: prediction
struct:
- name: queue
dtype: string
- name: type
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 3674659
num_examples: 100
- name: val
num_bytes: 5985332
num_examples: 100
download_size: 9570643
dataset_size: 9659991
- config_name: customer_support_tickets_gorkem
features:
- name: input
struct:
- name: ticket_text
dtype: string
- name: prediction
struct:
- name: ticket_subject
dtype: string
- name: ticket_type
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 2643598
num_examples: 100
- name: val
num_bytes: 2719781
num_examples: 100
download_size: 5275026
dataset_size: 5363379
- config_name: dbpedia_easy
features:
- name: input
struct:
- name: text
dtype: string
- name: prediction
struct:
- name: l1_class
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 1915754
num_examples: 100
- name: val
num_bytes: 1866020
num_examples: 100
download_size: 3712240
dataset_size: 3781774
- config_name: dbpedia_hard
features:
- name: input
struct:
- name: text
dtype: string
- name: prediction
struct:
- name: l1_class
dtype: string
- name: l2_class
dtype: string
- name: l3_class
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 7048122
num_examples: 100
- name: val
num_bytes: 8553680
num_examples: 100
download_size: 11695722
dataset_size: 15601802
- config_name: dbpedia_medium
features:
- name: input
struct:
- name: text
dtype: string
- name: prediction
struct:
- name: l1_class
dtype: string
- name: l2_class
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 3124086
num_examples: 100
- name: val
num_bytes: 2916983
num_examples: 100
download_size: 5948926
dataset_size: 6041069
- config_name: enron_email_quality
features:
- name: input
struct:
- name: body
dtype: string
- name: subject
dtype: string
- name: prediction
struct:
- name: quality_score
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 6752928
num_examples: 100
- name: val
num_bytes: 4839765
num_examples: 100
download_size: 8109553
dataset_size: 11592693
- config_name: enron_email_type
features:
- name: input
struct:
- name: body
dtype: string
- name: subject
dtype: string
- name: prediction
struct:
- name: email_type
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 4623789
num_examples: 100
- name: val
num_bytes: 4933389
num_examples: 100
download_size: 7001404
dataset_size: 9557178
- config_name: enron_reply_quality
features:
- name: input
struct:
- name: original_email
dtype: string
- name: reply
dtype: string
- name: prediction
struct:
- name: quality
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 6071896
num_examples: 100
- name: val
num_bytes: 7418905
num_examples: 100
download_size: 9410038
dataset_size: 13490801
- config_name: go_emotions
features:
- name: input
struct:
- name: text
dtype: string
- name: prediction
struct:
- name: labels
list: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 2153894
num_examples: 100
- name: val
num_bytes: 1973718
num_examples: 100
download_size: 4068183
dataset_size: 4127612
- config_name: gpqa_diamond
features:
- name: input
struct:
- name: question
dtype: string
- name: prediction
struct:
- name: choice
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: ood
num_bytes: 4905821
num_examples: 100
download_size: 4805964
dataset_size: 4905821
- config_name: halueval_dialogue
features:
- name: input
struct:
- name: dialogue_history
dtype: string
- name: knowledge
dtype: string
- name: response
dtype: string
- name: prediction
struct:
- name: hallucination
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 3151447
num_examples: 100
- name: val
num_bytes: 3750362
num_examples: 100
download_size: 6826287
dataset_size: 6901809
- config_name: halueval_qa
features:
- name: input
struct:
- name: answer
dtype: string
- name: knowledge
dtype: string
- name: question
dtype: string
- name: prediction
struct:
- name: hallucination
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 2347285
num_examples: 100
- name: val
num_bytes: 2440854
num_examples: 100
download_size: 4721417
dataset_size: 4788139
- config_name: halueval_summarization
features:
- name: input
struct:
- name: document
dtype: string
- name: summary
dtype: string
- name: prediction
struct:
- name: hallucination
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: ood
num_bytes: 4017194
num_examples: 100
download_size: 3960797
dataset_size: 4017194
- config_name: hh_rlhf
features:
- name: input
struct:
- name: question
dtype: string
- name: response_A
dtype: string
- name: response_B
dtype: string
- name: prediction
struct:
- name: label
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 4983590
num_examples: 100
- name: val
num_bytes: 2838192
num_examples: 100
download_size: 6925083
dataset_size: 7821782
- config_name: judge_bench
features:
- name: input
struct:
- name: question
dtype: string
- name: response_A
dtype: string
- name: response_B
dtype: string
- name: prediction
struct:
- name: label
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: ood
num_bytes: 5149121
num_examples: 100
download_size: 5076981
dataset_size: 5149121
- config_name: lex_glue_case_hold
features:
- name: input
struct:
- name: context
dtype: string
- name: option_a
dtype: string
- name: option_b
dtype: string
- name: option_c
dtype: string
- name: option_d
dtype: string
- name: option_e
dtype: string
- name: prediction
struct:
- name: selected_option
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 3173991
num_examples: 100
- name: val
num_bytes: 3194896
num_examples: 100
download_size: 6275803
dataset_size: 6368887
- config_name: lex_glue_ledgar
features:
- name: input
struct:
- name: provision_text
dtype: string
- name: prediction
struct:
- name: provision_type
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 2589838
num_examples: 100
- name: val
num_bytes: 3934676
num_examples: 100
download_size: 5796085
dataset_size: 6524514
- config_name: medical_abstracts
features:
- name: input
struct:
- name: medical_abstract
dtype: string
- name: prediction
struct:
- name: condition_label
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 2529179
num_examples: 100
- name: val
num_bytes: 2377820
num_examples: 100
download_size: 4829457
dataset_size: 4906999
- config_name: mfrc
features:
- name: input
struct:
- name: text
dtype: string
- name: prediction
struct:
- name: annotation
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 2388630
num_examples: 100
- name: val
num_bytes: 2431473
num_examples: 100
download_size: 4687037
dataset_size: 4820103
- config_name: mmlu
features:
- name: input
struct:
- name: choices
dtype: string
- name: question
dtype: string
- name: prediction
struct:
- name: choice
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 4032853
num_examples: 100
- name: val
num_bytes: 2562617
num_examples: 100
download_size: 6404362
dataset_size: 6595470
- config_name: mmlu_pro
features:
- name: input
struct:
- name: choices
dtype: string
- name: question
dtype: string
- name: prediction
struct:
- name: choice
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: ood
num_bytes: 3354328
num_examples: 100
download_size: 3298520
dataset_size: 3354328
- config_name: mt_bench_human_judgments
features:
- name: input
struct:
- name: question
dtype: string
- name: response_A
dtype: string
- name: response_B
dtype: string
- name: prediction
struct:
- name: label
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: ood
num_bytes: 6212911
num_examples: 100
download_size: 4989658
dataset_size: 6212911
- config_name: musr_murder_mysteries
features:
- name: input
struct:
- name: choices
dtype: string
- name: question
dtype: string
- name: prediction
struct:
- name: choice
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: ood
num_bytes: 7017877
num_examples: 100
download_size: 6085007
dataset_size: 7017877
- config_name: musr_object_placements
features:
- name: input
struct:
- name: choices
dtype: string
- name: question
dtype: string
- name: prediction
struct:
- name: choice
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: ood
num_bytes: 3935248
num_examples: 100
download_size: 3883366
dataset_size: 3935248
- config_name: musr_team_allocation
features:
- name: input
struct:
- name: choices
dtype: string
- name: question
dtype: string
- name: prediction
struct:
- name: choice
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: ood
num_bytes: 4583760
num_examples: 100
download_size: 4512223
dataset_size: 4583760
- config_name: or_bench_80k
features:
- name: input
struct:
- name: prompt
dtype: string
- name: prediction
struct:
- name: or_bench_category
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 4364834
num_examples: 100
- name: val
num_bytes: 5139503
num_examples: 100
download_size: 8400189
dataset_size: 9504337
- config_name: or_bench_hard_1k
features:
- name: input
struct:
- name: prompt
dtype: string
- name: prediction
struct:
- name: or_bench_category
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 2227235
num_examples: 100
- name: val
num_bytes: 2269499
num_examples: 100
download_size: 4417332
dataset_size: 4496734
- config_name: or_bench_toxic
features:
- name: input
struct:
- name: prompt
dtype: string
- name: prediction
struct:
- name: or_bench_category
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: ood
num_bytes: 2202925
num_examples: 100
download_size: 2163557
dataset_size: 2202925
- config_name: projudgebench
features:
- name: input
struct:
- name: correct_answer
dtype: string
- name: question
dtype: string
- name: step_to_evaluate
dtype: string
- name: steps
list: string
- name: prediction
struct:
- name: correct
dtype: bool
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 5950556
num_examples: 100
- name: val
num_bytes: 5012193
num_examples: 100
download_size: 10864136
dataset_size: 10962749
- config_name: reward_bench_2
features:
- name: input
struct:
- name: prompt
dtype: string
- name: response_A
dtype: string
- name: response_B
dtype: string
- name: prediction
struct:
- name: label
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 3250582
num_examples: 100
- name: val
num_bytes: 3060657
num_examples: 100
download_size: 6231878
dataset_size: 6311239
- config_name: rod101_essay_scoring
features:
- name: input
struct:
- name: text
dtype: string
- name: prediction
struct:
- name: score
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: ood
num_bytes: 3921962
num_examples: 100
download_size: 3893112
dataset_size: 3921962
- config_name: sem_eval_2010_task_8
features:
- name: input
struct:
- name: sentence
dtype: string
- name: prediction
struct:
- name: relation_label
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 5351969
num_examples: 100
- name: val
num_bytes: 3307090
num_examples: 100
download_size: 7559600
dataset_size: 8659059
- config_name: smollm_corpus
features:
- name: input
struct:
- name: text
dtype: string
- name: prediction
struct:
- name: audience
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 3035748
num_examples: 100
- name: val
num_bytes: 3148738
num_examples: 100
download_size: 6095256
dataset_size: 6184486
- config_name: snli
features:
- name: input
struct:
- name: hypothesis
dtype: string
- name: premise
dtype: string
- name: prediction
struct:
- name: label
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 1941060
num_examples: 100
- name: val
num_bytes: 1860534
num_examples: 100
download_size: 3725137
dataset_size: 3801594
- config_name: spartqa_mchoice
features:
- name: input
struct:
- name: choices
dtype: string
- name: question
dtype: string
- name: prediction
struct:
- name: choice
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 4613149
num_examples: 100
- name: val
num_bytes: 5412759
num_examples: 100
download_size: 9770310
dataset_size: 10025908
- config_name: toxigen_data
features:
- name: input
struct:
- name: text
dtype: string
- name: prediction
struct:
- name: toxicity_label
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 1595101
num_examples: 100
- name: val
num_bytes: 1626286
num_examples: 100
download_size: 3165970
dataset_size: 3221387
- config_name: tweet_eval_emotion
features:
- name: input
struct:
- name: tweet
dtype: string
- name: prediction
struct:
- name: emotion_label
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 1758641
num_examples: 100
- name: val
num_bytes: 1723751
num_examples: 100
download_size: 3426201
dataset_size: 3482392
- config_name: tweet_eval_hate
features:
- name: input
struct:
- name: tweet
dtype: string
- name: prediction
struct:
- name: hate_label
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 1822515
num_examples: 100
- name: val
num_bytes: 1860333
num_examples: 100
download_size: 3628885
dataset_size: 3682848
- config_name: tweet_eval_irony
features:
- name: input
struct:
- name: tweet
dtype: string
- name: prediction
struct:
- name: irony_label
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 2015402
num_examples: 100
- name: val
num_bytes: 2043762
num_examples: 100
download_size: 3990176
dataset_size: 4059164
- config_name: tweet_eval_offensive
features:
- name: input
struct:
- name: tweet
dtype: string
- name: prediction
struct:
- name: offensive_label
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 1848079
num_examples: 100
- name: val
num_bytes: 1819916
num_examples: 100
download_size: 3604479
dataset_size: 3667995
- config_name: tweet_eval_sentiment
features:
- name: input
struct:
- name: tweet
dtype: string
- name: prediction
struct:
- name: sentiment_label
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 1841636
num_examples: 100
- name: val
num_bytes: 1973167
num_examples: 100
download_size: 3773655
dataset_size: 3814803
- config_name: ultrafeedback
features:
- name: input
struct:
- name: prompt
dtype: string
- name: response
dtype: string
- name: prediction
struct:
- name: instruction_following
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 2841870
num_examples: 100
- name: val
num_bytes: 4827023
num_examples: 100
download_size: 6660942
dataset_size: 7668893
- config_name: writingprompts_quality
features:
- name: input
struct:
- name: prompt
dtype: string
- name: story
dtype: string
- name: prediction
struct:
- name: quality_score
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 6589886
num_examples: 100
- name: val
num_bytes: 5095939
num_examples: 100
download_size: 9113368
dataset_size: 11685825
- config_name: yahoo_answers_quality
features:
- name: input
struct:
- name: answer
dtype: string
- name: question
dtype: string
- name: prediction
struct:
- name: rating
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 2934671
num_examples: 100
- name: val
num_bytes: 6156380
num_examples: 100
download_size: 7182564
dataset_size: 9091051
- config_name: yelp
features:
- name: input
struct:
- name: text
dtype: string
- name: prediction
struct:
- name: rating
dtype: string
- name: reasoning
dtype: string
- name: messages
struct:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: outputs
struct:
- name: reasoning_content
dtype: string
- name: text
dtype: string
- name: correct
dtype: bool
splits:
- name: train
num_bytes: 1898974
num_examples: 100
- name: val
num_bytes: 6864022
num_examples: 100
download_size: 6688014
dataset_size: 8762996
configs:
- config_name: aes2_essay_scoring
data_files:
- split: train
path: aes2_essay_scoring/train-*
- split: val
path: aes2_essay_scoring/val-*
- config_name: anli_r1
data_files:
- split: train
path: anli_r1/train-*
- split: val
path: anli_r1/val-*
- config_name: anli_r2
data_files:
- split: train
path: anli_r2/train-*
- split: val
path: anli_r2/val-*
- config_name: anli_r3
data_files:
- split: train
path: anli_r3/train-*
- split: val
path: anli_r3/val-*
- config_name: arc_challenge
data_files:
- split: ood
path: arc_challenge/ood-*
- config_name: argument_quality_ranking
data_files:
- split: ood
path: argument_quality_ranking/ood-*
- config_name: big_patent_innovation
data_files:
- split: train
path: big_patent_innovation/train-*
- split: val
path: big_patent_innovation/val-*
- config_name: boardgame_qa
data_files:
- split: train
path: boardgame_qa/train-*
- split: val
path: boardgame_qa/val-*
- config_name: chatbot_arena_conversations
data_files:
- split: train
path: chatbot_arena_conversations/train-*
- split: val
path: chatbot_arena_conversations/val-*
- config_name: civil_comments
data_files:
- split: train
path: civil_comments/train-*
- split: val
path: civil_comments/val-*
- config_name: code_judge_bench
data_files:
- split: ood
path: code_judge_bench/ood-*
- config_name: colbert_humor_detection
data_files:
- split: train
path: colbert_humor_detection/train-*
- split: val
path: colbert_humor_detection/val-*
- config_name: customer_support_tickets_en
data_files:
- split: train
path: customer_support_tickets_en/train-*
- split: val
path: customer_support_tickets_en/val-*
- config_name: customer_support_tickets_gorkem
data_files:
- split: train
path: customer_support_tickets_gorkem/train-*
- split: val
path: customer_support_tickets_gorkem/val-*
- config_name: dbpedia_easy
data_files:
- split: train
path: dbpedia_easy/train-*
- split: val
path: dbpedia_easy/val-*
- config_name: dbpedia_hard
data_files:
- split: train
path: dbpedia_hard/train-*
- split: val
path: dbpedia_hard/val-*
- config_name: dbpedia_medium
data_files:
- split: train
path: dbpedia_medium/train-*
- split: val
path: dbpedia_medium/val-*
- config_name: enron_email_quality
data_files:
- split: train
path: enron_email_quality/train-*
- split: val
path: enron_email_quality/val-*
- config_name: enron_email_type
data_files:
- split: train
path: enron_email_type/train-*
- split: val
path: enron_email_type/val-*
- config_name: enron_reply_quality
data_files:
- split: train
path: enron_reply_quality/train-*
- split: val
path: enron_reply_quality/val-*
- config_name: go_emotions
data_files:
- split: train
path: go_emotions/train-*
- split: val
path: go_emotions/val-*
- config_name: gpqa_diamond
data_files:
- split: ood
path: gpqa_diamond/ood-*
- config_name: halueval_dialogue
data_files:
- split: train
path: halueval_dialogue/train-*
- split: val
path: halueval_dialogue/val-*
- config_name: halueval_qa
data_files:
- split: train
path: halueval_qa/train-*
- split: val
path: halueval_qa/val-*
- config_name: halueval_summarization
data_files:
- split: ood
path: halueval_summarization/ood-*
- config_name: hh_rlhf
data_files:
- split: train
path: hh_rlhf/train-*
- split: val
path: hh_rlhf/val-*
- config_name: judge_bench
data_files:
- split: ood
path: judge_bench/ood-*
- config_name: lex_glue_case_hold
data_files:
- split: train
path: lex_glue_case_hold/train-*
- split: val
path: lex_glue_case_hold/val-*
- config_name: lex_glue_ledgar
data_files:
- split: train
path: lex_glue_ledgar/train-*
- split: val
path: lex_glue_ledgar/val-*
- config_name: medical_abstracts
data_files:
- split: train
path: medical_abstracts/train-*
- split: val
path: medical_abstracts/val-*
- config_name: mfrc
data_files:
- split: train
path: mfrc/train-*
- split: val
path: mfrc/val-*
- config_name: mmlu
data_files:
- split: train
path: mmlu/train-*
- split: val
path: mmlu/val-*
- config_name: mmlu_pro
data_files:
- split: ood
path: mmlu_pro/ood-*
- config_name: mt_bench_human_judgments
data_files:
- split: ood
path: mt_bench_human_judgments/ood-*
- config_name: musr_murder_mysteries
data_files:
- split: ood
path: musr_murder_mysteries/ood-*
- config_name: musr_object_placements
data_files:
- split: ood
path: musr_object_placements/ood-*
- config_name: musr_team_allocation
data_files:
- split: ood
path: musr_team_allocation/ood-*
- config_name: or_bench_80k
data_files:
- split: train
path: or_bench_80k/train-*
- split: val
path: or_bench_80k/val-*
- config_name: or_bench_hard_1k
data_files:
- split: train
path: or_bench_hard_1k/train-*
- split: val
path: or_bench_hard_1k/val-*
- config_name: or_bench_toxic
data_files:
- split: ood
path: or_bench_toxic/ood-*
- config_name: projudgebench
data_files:
- split: train
path: projudgebench/train-*
- split: val
path: projudgebench/val-*
- config_name: reward_bench_2
data_files:
- split: train
path: reward_bench_2/train-*
- split: val
path: reward_bench_2/val-*
- config_name: rod101_essay_scoring
data_files:
- split: ood
path: rod101_essay_scoring/ood-*
- config_name: sem_eval_2010_task_8
data_files:
- split: train
path: sem_eval_2010_task_8/train-*
- split: val
path: sem_eval_2010_task_8/val-*
- config_name: smollm_corpus
data_files:
- split: train
path: smollm_corpus/train-*
- split: val
path: smollm_corpus/val-*
- config_name: snli
data_files:
- split: train
path: snli/train-*
- split: val
path: snli/val-*
- config_name: spartqa_mchoice
data_files:
- split: train
path: spartqa_mchoice/train-*
- split: val
path: spartqa_mchoice/val-*
- config_name: toxigen_data
data_files:
- split: train
path: toxigen_data/train-*
- split: val
path: toxigen_data/val-*
- config_name: tweet_eval_emotion
data_files:
- split: train
path: tweet_eval_emotion/train-*
- split: val
path: tweet_eval_emotion/val-*
- config_name: tweet_eval_hate
data_files:
- split: train
path: tweet_eval_hate/train-*
- split: val
path: tweet_eval_hate/val-*
- config_name: tweet_eval_irony
data_files:
- split: train
path: tweet_eval_irony/train-*
- split: val
path: tweet_eval_irony/val-*
- config_name: tweet_eval_offensive
data_files:
- split: train
path: tweet_eval_offensive/train-*
- split: val
path: tweet_eval_offensive/val-*
- config_name: tweet_eval_sentiment
data_files:
- split: train
path: tweet_eval_sentiment/train-*
- split: val
path: tweet_eval_sentiment/val-*
- config_name: ultrafeedback
data_files:
- split: train
path: ultrafeedback/train-*
- split: val
path: ultrafeedback/val-*
- config_name: writingprompts_quality
data_files:
- split: train
path: writingprompts_quality/train-*
- split: val
path: writingprompts_quality/val-*
- config_name: yahoo_answers_quality
data_files:
- split: train
path: yahoo_answers_quality/train-*
- split: val
path: yahoo_answers_quality/val-*
- config_name: yelp
data_files:
- split: train
path: yelp/train-*
- split: val
path: yelp/val-*
---
# qwen3.5-2b-v2-instructions-smoke-10k-smoke-mnbt16k
- Repo: `tytodd/qwen3.5-2b-v2-instructions-smoke-10k-smoke-mnbt16k`
- Model: `Qwen/Qwen3.5-2B`
- Config: `/tmp/v2-instructions-smoke-10k.yaml`

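
Each benchmark listed in the metadata above is a separate config of this repo, with `train`/`val` splits or a single `ood` split. The following is a minimal sketch of loading one config with the Hugging Face `datasets` library and recomputing its accuracy from the boolean `correct` field. The helper names (`load_split`, `is_correct`, `split_accuracy`) are hypothetical, and because the flattened metadata does not make the nesting clear, the sketch assumes `correct` is either a top-level column or a member of the `outputs` struct.

```python
REPO_ID = "tytodd/qwen3.5-2b-v2-instructions-smoke-10k-smoke-mnbt16k"


def load_split(config_name, split):
    """Load one config/split of the repo (needs `pip install datasets` and network access)."""
    from datasets import load_dataset  # imported lazily so the helpers below work offline

    return load_dataset(REPO_ID, config_name, split=split)


def is_correct(row):
    """Read the boolean `correct` flag.

    Assumption: the flag is a top-level column; if it is absent there,
    fall back to looking inside the `outputs` struct.
    """
    if "correct" in row:
        return bool(row["correct"])
    return bool(row["outputs"]["correct"])


def split_accuracy(rows):
    """Fraction of rows marked correct, or NaN for an empty split."""
    flags = [is_correct(row) for row in rows]
    return sum(flags) / len(flags) if flags else float("nan")


if __name__ == "__main__":
    # Config and split names are taken from the `configs:` metadata block.
    val_rows = load_split("yelp", "val")
    print(f"yelp/val accuracy: {split_accuracy(val_rows):.2%}")
```

Configs marked `ood` in the metadata (for example `arc_challenge`) expose only the `ood` split, so `split="ood"` would be passed instead of `train` or `val`.
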
| benchmark | train | val | ood | all |
| --- | --- | --- | --- | --- |
| chatbot_arena_conversations | 69.00% | 65.00% | | 67.00% |
| hh_rlhf | 49.00% | 54.00% | | 51.50% |
| ultrafeedback | 39.00% | 37.00% | | 38.00% |
| projudgebench | 87.00% | 83.00% | | 85.00% |
| reward_bench_2 | 70.00% | 81.00% | | 75.50% |
| aes2_essay_scoring | 24.00% | 27.00% | | 25.50% |
| halueval_qa | 82.00% | 80.00% | | 81.00% |
| halueval_dialogue | 62.00% | 64.00% | | 63.00% |
| or_bench_80k | 39.00% | 52.00% | | 45.50% |
| or_bench_hard_1k | 46.00% | 72.00% | | 59.00% |
| toxigen_data | 78.00% | 71.00% | | 74.50% |
| civil_comments | 93.00% | 79.00% | | 86.00% |
| boardgame_qa | 96.00% | 95.00% | | 95.50% |
| go_emotions | 16.00% | 23.00% | | 19.50% |
| mfrc | 1.00% | 0.00% | | 0.50% |
| tweet_eval_emotion | 73.00% | 81.00% | | 77.00% |
| yelp | 61.00% | 56.00% | | 58.50% |
| tweet_eval_sentiment | 60.00% | 70.00% | | 65.00% |
| mmlu | 39.00% | 76.00% | | 57.50% |
| spartqa_mchoice | 78.00% | 70.00% | | 74.00% |
| anli_r1 | 86.00% | 78.00% | | 82.00% |
| anli_r2 | 88.00% | 65.00% | | 76.50% |
| anli_r3 | 58.00% | 60.00% | | 59.00% |
| snli | 81.00% | 74.00% | | 77.50% |
| sem_eval_2010_task_8 | 44.00% | 31.00% | | 37.50% |
| smollm_corpus | 34.00% | 31.00% | | 32.50% |
| medical_abstracts | 59.00% | 72.00% | | 65.50% |
| lex_glue_case_hold | 75.00% | 56.00% | | 65.50% |
| lex_glue_ledgar | 76.00% | 64.00% | | 70.00% |
| dbpedia_easy | 92.00% | 84.00% | | 88.00% |
| dbpedia_medium | 46.00% | 49.00% | | 47.50% |
| dbpedia_hard | 34.00% | 29.00% | | 31.50% |
| colbert_humor_detection | 71.00% | 75.00% | | 73.00% |
| tweet_eval_irony | 62.00% | 57.00% | | 59.50% |
| tweet_eval_hate | 77.00% | 66.00% | | 71.50% |
| tweet_eval_offensive | 68.00% | 78.00% | | 73.00% |
| customer_support_tickets_en | 40.00% | 20.00% | | 30.00% |
| customer_support_tickets_gorkem | 1.00% | 1.00% | | 1.00% |
| yahoo_answers_quality | 18.00% | 37.00% | | 27.50% |
| big_patent_innovation | 6.00% | 11.00% | | 8.50% |
| writingprompts_quality | 37.00% | 45.00% | | 41.00% |
| enron_email_type | 71.00% | 68.00% | | 69.50% |
| enron_email_quality | 27.00% | 24.00% | | 25.50% |
| enron_reply_quality | 53.00% | 56.00% | | 54.50% |
| argument_quality_ranking | | | 26.00% | 26.00% |
| rod101_essay_scoring | | | 24.00% | 24.00% |
| or_bench_toxic | | | 24.00% | 24.00% |
| judge_bench | | | 61.00% | 61.00% |
| musr_team_allocation | | | 73.00% | 73.00% |
| musr_object_placements | | | 52.00% | 52.00% |
| musr_murder_mysteries | | | 61.00% | 61.00% |
| halueval_summarization | | | 58.00% | 58.00% |
| code_judge_bench | | | 65.00% | 65.00% |
| mmlu_pro | | | 73.00% | 73.00% |
| gpqa_diamond | | | 54.00% | 54.00% |
| arc_challenge | | | 84.00% | 84.00% |
| mt_bench_human_judgments | | | 67.00% | 67.00% |
| all | 56.05% | 56.07% | 55.54% | 55.99% |
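
The `all` column and the `all` row are not defined on the card, but the numbers are consistent with plain averaging: each row's `all` value matches the mean of its non-empty split cells, and the bottom row matches the mean of each column, with the final cell pooling all 101 non-empty cells. The sketch below checks this arithmetic. The column sums are derived from the table above, and the function names are illustrative rather than part of any evaluation code.

```python
# Sums of the percentage cells in each column of the table above, and the
# number of non-empty cells per column (44 train, 44 val, 13 ood).
COLUMN_SUMS = {"train": 2466.0, "val": 2467.0, "ood": 722.0}
COLUMN_COUNTS = {"train": 44, "val": 44, "ood": 13}


def row_all(split_scores):
    """Mean of the splits present in one row (None marks an empty cell)."""
    present = [score for score in split_scores if score is not None]
    return sum(present) / len(present)


def column_mean(split):
    """Unweighted mean over the benchmarks that report this split."""
    return COLUMN_SUMS[split] / COLUMN_COUNTS[split]


def overall_mean():
    """Mean over every non-empty (benchmark, split) cell in the table."""
    return sum(COLUMN_SUMS.values()) / sum(COLUMN_COUNTS.values())


if __name__ == "__main__":
    # Three rows copied from the table: (train, val, ood).
    examples = {
        "chatbot_arena_conversations": (69.0, 65.0, None),
        "mmlu": (39.0, 76.0, None),
        "arc_challenge": (None, None, 84.0),
    }
    for name, scores in examples.items():
        print(f"{name}: {row_all(scores):.2f}%")
    for split in COLUMN_SUMS:
        print(f"all/{split}: {column_mean(split):.2f}%")
    print(f"all/all: {overall_mean():.2f}%")
```

Because the splits shown in the metadata each hold 100 examples, averaging cells this way would coincide with averaging over individual examples. That equivalence is an assumption for any config whose split size differs.
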
Provider: tytodd



