Name: tytodd/qwen3.5-2b-v1
Creator: tytodd
Published: 2026-04-10 04:13:18
License: No description available

Download link:

https://hf-mirror.com/datasets/tytodd/qwen3.5-2b-v1

Resource description:

--- dataset_info: - config_name: aes2_essay_scoring features: - name: input struct: - name: full_text dtype: string - name: prediction struct: - name: score dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 545057951 num_examples: 10000 - name: val num_bytes: 55072649 num_examples: 1000 download_size: 454073152 dataset_size: 600130600 - config_name: arc_challenge features: - name: input struct: - name: choices dtype: string - name: question dtype: string - name: prediction struct: - name: choice dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: ood num_bytes: 49254830 num_examples: 1172 download_size: 36059088 dataset_size: 49254830 - config_name: argument_quality_ranking features: - name: input struct: - name: argument dtype: string - name: topic dtype: string - name: prediction struct: - name: quality_label dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: ood num_bytes: 110883510 num_examples: 2469 download_size: 78320499 dataset_size: 110883510 - config_name: bbeh features: - name: input struct: - name: question dtype: string - name: task dtype: string - name: prediction struct: - name: answer dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: 
outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: ood num_bytes: 182922405 num_examples: 2120 download_size: 131245828 dataset_size: 182922405 - config_name: bbh_causal_judgement features: - name: input struct: - name: question dtype: string - name: prediction struct: - name: answer dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: ood num_bytes: 7429515 num_examples: 149 download_size: 5426046 dataset_size: 7429515 - config_name: bbh_disambiguation_qa features: - name: input struct: - name: question dtype: string - name: prediction struct: - name: answer dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: ood num_bytes: 8743863 num_examples: 200 download_size: 6259765 dataset_size: 8743863 - config_name: bbh_geometric_shapes features: - name: input struct: - name: question dtype: string - name: prediction struct: - name: answer dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: ood num_bytes: 9649811 num_examples: 200 download_size: 7099846 dataset_size: 9649811 - config_name: bbh_movie_recommendation features: - name: input struct: - name: question dtype: string - name: prediction struct: - name: answer dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: 
- name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: ood num_bytes: 8778846 num_examples: 200 download_size: 6280040 dataset_size: 8778846 - config_name: bbh_reasoning_about_colored_objects features: - name: input struct: - name: question dtype: string - name: prediction struct: - name: answer dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: ood num_bytes: 4287578 num_examples: 200 download_size: 3153586 dataset_size: 4287578 - config_name: bbh_ruin_names features: - name: input struct: - name: question dtype: string - name: prediction struct: - name: answer dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: ood num_bytes: 6746441 num_examples: 200 download_size: 4920687 dataset_size: 6746441 - config_name: bbh_salient_translation_error_detection features: - name: input struct: - name: question dtype: string - name: prediction struct: - name: answer dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: ood num_bytes: 9526499 num_examples: 200 download_size: 6876220 dataset_size: 9526499 - config_name: bbh_snarks features: - name: input struct: - name: question dtype: string - name: prediction struct: - name: answer dtype: string - name: 
reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: ood num_bytes: 6009091 num_examples: 142 download_size: 4320370 dataset_size: 6009091 - config_name: bbh_sports_understanding features: - name: input struct: - name: question dtype: string - name: prediction struct: - name: answer dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: ood num_bytes: 6376275 num_examples: 200 download_size: 4570956 dataset_size: 6376275 - config_name: bbh_tracking_shuffled_objects_five_objects features: - name: input struct: - name: question dtype: string - name: prediction struct: - name: answer dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: ood num_bytes: 5800485 num_examples: 200 download_size: 4192184 dataset_size: 5800485 - config_name: bbh_web_of_lies features: - name: input struct: - name: question dtype: string - name: prediction struct: - name: answer dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: ood num_bytes: 7087308 num_examples: 200 download_size: 5015749 dataset_size: 7087308 - config_name: civil_comments features: - name: input struct: - name: comment dtype: 
string - name: prediction struct: - name: toxicity_label dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 385418759 num_examples: 10000 - name: val num_bytes: 37967014 num_examples: 1000 download_size: 301127727 dataset_size: 423385773 - config_name: code_judge_bench features: - name: input struct: - name: code_A dtype: string - name: code_B dtype: string - name: problem dtype: string - name: prediction struct: - name: label dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: ood num_bytes: 39778128 num_examples: 344 download_size: 30571424 dataset_size: 39778128 - config_name: colbert_humor_detection features: - name: input struct: - name: text dtype: string - name: prediction struct: - name: humor_label dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 421447123 num_examples: 10000 - name: val num_bytes: 42620370 num_examples: 1000 download_size: 327254593 dataset_size: 464067493 - config_name: customer_support_tickets_en features: - name: input struct: - name: body dtype: string - name: subject dtype: string - name: prediction struct: - name: queue dtype: string - name: type dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: 
outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 266959983 num_examples: 5570 - name: val num_bytes: 49577903 num_examples: 1000 download_size: 230028792 dataset_size: 316537886 - config_name: customer_support_tickets_gorkem features: - name: input struct: - name: ticket_text dtype: string - name: prediction struct: - name: ticket_subject dtype: string - name: ticket_type dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 343174222 num_examples: 6775 - name: val num_bytes: 50156096 num_examples: 1000 download_size: 277723949 dataset_size: 393330318 - config_name: go_emotions features: - name: input struct: - name: text dtype: string - name: prediction struct: - name: labels list: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 1974664153 num_examples: 43410 - name: val num_bytes: 204657118 num_examples: 4500 download_size: 1563304438 dataset_size: 2179321271 - config_name: gpqa_diamond features: - name: input struct: - name: question dtype: string - name: prediction struct: - name: choice dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: ood num_bytes: 16706771 num_examples: 198 download_size: 11849726 dataset_size: 16706771 - config_name: 
halueval_summarization features: - name: input struct: - name: document dtype: string - name: summary dtype: string - name: prediction struct: - name: hallucination dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: ood num_bytes: 684555651 num_examples: 10000 download_size: 494299736 dataset_size: 684555651 - config_name: hh_rlhf features: - name: input struct: - name: question dtype: string - name: response_A dtype: string - name: response_B dtype: string - name: prediction struct: - name: label dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 546155367 num_examples: 10000 - name: val num_bytes: 53767111 num_examples: 1000 download_size: 431379598 dataset_size: 599922478 - config_name: judge_bench features: - name: input struct: - name: question dtype: string - name: response_A dtype: string - name: response_B dtype: string - name: prediction struct: - name: label dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: ood num_bytes: 23341893 num_examples: 280 download_size: 17549850 dataset_size: 23341893 - config_name: lex_glue_case_hold features: - name: input struct: - name: context dtype: string - name: option_a dtype: string - name: option_b dtype: string - name: option_c dtype: string - name: option_d dtype: string - name: option_e dtype: string - 
name: prediction struct: - name: selected_option dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 504478961 num_examples: 10000 - name: val num_bytes: 50758127 num_examples: 1000 download_size: 403612465 dataset_size: 555237088 - config_name: lex_glue_scotus features: - name: input struct: - name: opinion_text dtype: string - name: prediction struct: - name: issue_id dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 470100934 num_examples: 5000 - name: val num_bytes: 111627205 num_examples: 1000 download_size: 527829121 dataset_size: 581728139 - config_name: medical_abstracts features: - name: input struct: - name: medical_abstract dtype: string - name: prediction struct: - name: condition_label dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 452164658 num_examples: 10000 - name: val num_bytes: 45718370 num_examples: 1000 download_size: 357787840 dataset_size: 497883028 - config_name: mfrc features: - name: input struct: - name: text dtype: string - name: prediction struct: - name: authority dtype: bool - name: care dtype: bool - name: fairness dtype: bool - name: loyalty dtype: bool - name: non_moral dtype: bool - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: 
string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 2167216273 num_examples: 55103 - name: val num_bytes: 214625927 num_examples: 5500 download_size: 2329024906 dataset_size: 2381842200 - config_name: mmlu features: - name: input struct: - name: choices dtype: string - name: question dtype: string - name: prediction struct: - name: choice dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: ood num_bytes: 706902361 num_examples: 14042 download_size: 516973828 dataset_size: 706902361 - config_name: mmlu_pro features: - name: input struct: - name: choices dtype: string - name: question dtype: string - name: prediction struct: - name: choice dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: ood num_bytes: 672402062 num_examples: 12032 download_size: 488338596 dataset_size: 672402062 - config_name: musr_murder_mysteries features: - name: input struct: - name: choices dtype: string - name: question dtype: string - name: prediction struct: - name: choice dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: ood num_bytes: 18986456 num_examples: 250 download_size: 13902082 dataset_size: 18986456 - config_name: musr_object_placements features: - name: 
input struct: - name: choices dtype: string - name: question dtype: string - name: prediction struct: - name: choice dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: ood num_bytes: 15816571 num_examples: 256 download_size: 11818369 dataset_size: 15816571 - config_name: musr_team_allocation features: - name: input struct: - name: choices dtype: string - name: question dtype: string - name: prediction struct: - name: choice dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: ood num_bytes: 17521186 num_examples: 250 download_size: 13111884 dataset_size: 17521186 - config_name: or_bench_80k features: - name: input struct: - name: prompt dtype: string - name: prediction struct: - name: or_bench_category dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 595824437 num_examples: 10000 - name: val num_bytes: 58618453 num_examples: 1000 download_size: 462015069 dataset_size: 654442890 - config_name: or_bench_hard_1k features: - name: input struct: - name: prompt dtype: string - name: prediction struct: - name: or_bench_category dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: 
string - name: correct dtype: bool splits: - name: train num_bytes: 48436940 num_examples: 1055 - name: val num_bytes: 12252996 num_examples: 264 download_size: 43302033 dataset_size: 60689936 - config_name: or_bench_toxic features: - name: input struct: - name: prompt dtype: string - name: prediction struct: - name: or_bench_category dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: ood num_bytes: 23239838 num_examples: 524 download_size: 16510017 dataset_size: 23239838 - config_name: projudgebench features: - name: input struct: - name: correct_answer dtype: string - name: question dtype: string - name: step_to_evaluate dtype: string - name: steps list: string - name: prediction struct: - name: correct dtype: bool - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 189075220 num_examples: 2160 - name: val num_bytes: 21906441 num_examples: 240 download_size: 153029795 dataset_size: 210981661 - config_name: reward_bench_2 features: - name: input struct: - name: prompt dtype: string - name: response_A dtype: string - name: response_B dtype: string - name: prediction struct: - name: label dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 79714700 num_examples: 1492 - name: val num_bytes: 19288626 num_examples: 373 download_size: 73686677 dataset_size: 
99003326 - config_name: rod101_essay_scoring features: - name: input struct: - name: text dtype: string - name: prediction struct: - name: score dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: ood num_bytes: 3147005 num_examples: 81 download_size: 3056366 dataset_size: 3147005 - config_name: seekbench features: - name: input struct: - name: current_trace dtype: string - name: previous_traces dtype: string - name: question dtype: string - name: prediction struct: - name: groundness dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 23323358 num_examples: 446 - name: val num_bytes: 9318096 num_examples: 184 download_size: 24664275 dataset_size: 32641454 - config_name: seekbench_evidence features: - name: input struct: - name: current_trace dtype: string - name: previous_traces dtype: string - name: question dtype: string - name: prediction struct: - name: clear dtype: string - name: sufficient dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 20092595 num_examples: 324 - name: val num_bytes: 8545000 num_examples: 143 download_size: 21651780 dataset_size: 28637595 - config_name: seekbench_full_trace features: - name: input struct: - name: final_answer dtype: string - name: question dtype: string - name: trace dtype: string - name: 
prediction struct: - name: correctness dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 7429989 num_examples: 133 - name: val num_bytes: 3361048 num_examples: 57 download_size: 8420291 dataset_size: 10791037 - config_name: sem_eval_2010_task_8 features: - name: input struct: - name: sentence dtype: string - name: prediction struct: - name: relation_label dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 516147603 num_examples: 8000 - name: val num_bytes: 170187724 num_examples: 2717 download_size: 481020081 dataset_size: 686335327 - config_name: smollm_corpus features: - name: input struct: - name: text dtype: string - name: prediction struct: - name: audience dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 568696818 num_examples: 10000 - name: val num_bytes: 57273807 num_examples: 1000 download_size: 475555394 dataset_size: 625970625 - config_name: snli features: - name: input struct: - name: hypothesis dtype: string - name: premise dtype: string - name: prediction struct: - name: label dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: 
string - name: correct dtype: bool splits: - name: train num_bytes: 435804515 num_examples: 10000 - name: val num_bytes: 43322708 num_examples: 1000 download_size: 340374654 dataset_size: 479127223 - config_name: support_tickets_alpha features: - name: input struct: - name: description dtype: string - name: subject dtype: string - name: prediction struct: - name: key_phrase dtype: string - name: support_class dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 32115815 num_examples: 813 - name: val num_bytes: 5445773 num_examples: 125 download_size: 26844090 dataset_size: 37561588 - config_name: toxigen_data features: - name: input struct: - name: text dtype: string - name: prediction struct: - name: toxicity_label dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 374413076 num_examples: 8960 - name: val num_bytes: 39858032 num_examples: 940 download_size: 288758310 dataset_size: 414271108 - config_name: tweet_eval_emotion features: - name: input struct: - name: tweet dtype: string - name: prediction struct: - name: emotion_label dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 126818292 num_examples: 3257 - name: val num_bytes: 14502622 num_examples: 374 download_size: 99122597 dataset_size: 141320914 - config_name: 
tweet_eval_hate features: - name: input struct: - name: tweet dtype: string - name: prediction struct: - name: hate_label dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 370046489 num_examples: 8993 - name: val num_bytes: 41449232 num_examples: 999 download_size: 288215633 dataset_size: 411495721 - config_name: tweet_eval_irony features: - name: input struct: - name: tweet dtype: string - name: prediction struct: - name: irony_label dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 114385831 num_examples: 2862 - name: val num_bytes: 38167261 num_examples: 955 download_size: 108609615 dataset_size: 152553092 - config_name: tweet_eval_offensive features: - name: input struct: - name: tweet dtype: string - name: prediction struct: - name: offensive_label dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 399364169 num_examples: 10000 - name: val num_bytes: 40221955 num_examples: 1000 download_size: 306654211 dataset_size: 439586124 - config_name: tweet_eval_sentiment features: - name: input struct: - name: tweet dtype: string - name: prediction struct: - name: sentiment_label dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: 
string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 402699401 num_examples: 10000 - name: val num_bytes: 40079108 num_examples: 1000 download_size: 309576448 dataset_size: 442778509 - config_name: tweet_eval_stance_abortion features: - name: input struct: - name: topic dtype: string - name: tweet dtype: string - name: prediction struct: - name: stance_label dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 28621932 num_examples: 587 - name: val num_bytes: 3442542 num_examples: 66 download_size: 22630215 dataset_size: 32064474 - config_name: tweet_eval_stance_atheism features: - name: input struct: - name: topic dtype: string - name: tweet dtype: string - name: prediction struct: - name: stance_label dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 22508555 num_examples: 461 - name: val num_bytes: 2730304 num_examples: 52 download_size: 17937319 dataset_size: 25238859 - config_name: tweet_eval_stance_climate features: - name: input struct: - name: topic dtype: string - name: tweet dtype: string - name: prediction struct: - name: stance_label dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 17516196 num_examples: 
355 - name: val num_bytes: 1927658 num_examples: 40 download_size: 13708511 dataset_size: 19443854 - config_name: tweet_eval_stance_feminist features: - name: input struct: - name: topic dtype: string - name: tweet dtype: string - name: prediction struct: - name: stance_label dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 28646047 num_examples: 597 - name: val num_bytes: 3330035 num_examples: 67 download_size: 22715503 dataset_size: 31976082 - config_name: tweet_eval_stance_hillary features: - name: input struct: - name: topic dtype: string - name: tweet dtype: string - name: prediction struct: - name: stance_label dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 27544670 num_examples: 620 - name: val num_bytes: 3114492 num_examples: 69 download_size: 21690218 dataset_size: 30659162 - config_name: ultrafeedback features: - name: input struct: - name: prompt dtype: string - name: response dtype: string - name: prediction struct: - name: instruction_following dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 468892737 num_examples: 10000 - name: val num_bytes: 47119619 num_examples: 1000 download_size: 372756384 dataset_size: 516012356 - config_name: yelp features: - name: input struct: - name: text dtype: string - 
name: prediction struct: - name: rating dtype: string - name: reasoning dtype: string - name: messages struct: - name: messages list: - name: content dtype: string - name: role dtype: string - name: outputs struct: - name: reasoning_content dtype: string - name: text dtype: string - name: correct dtype: bool splits: - name: train num_bytes: 417976335 num_examples: 10000 - name: val num_bytes: 40980000 num_examples: 1000 download_size: 328596831 dataset_size: 458956335 configs: - config_name: aes2_essay_scoring data_files: - split: train path: aes2_essay_scoring/train-* - split: val path: aes2_essay_scoring/val-* - config_name: arc_challenge data_files: - split: ood path: arc_challenge/ood-* - config_name: argument_quality_ranking data_files: - split: ood path: argument_quality_ranking/ood-* - config_name: bbeh data_files: - split: ood path: bbeh/ood-* - config_name: bbh_causal_judgement data_files: - split: ood path: bbh_causal_judgement/ood-* - config_name: bbh_disambiguation_qa data_files: - split: ood path: bbh_disambiguation_qa/ood-* - config_name: bbh_geometric_shapes data_files: - split: ood path: bbh_geometric_shapes/ood-* - config_name: bbh_movie_recommendation data_files: - split: ood path: bbh_movie_recommendation/ood-* - config_name: bbh_reasoning_about_colored_objects data_files: - split: ood path: bbh_reasoning_about_colored_objects/ood-* - config_name: bbh_ruin_names data_files: - split: ood path: bbh_ruin_names/ood-* - config_name: bbh_salient_translation_error_detection data_files: - split: ood path: bbh_salient_translation_error_detection/ood-* - config_name: bbh_snarks data_files: - split: ood path: bbh_snarks/ood-* - config_name: bbh_sports_understanding data_files: - split: ood path: bbh_sports_understanding/ood-* - config_name: bbh_tracking_shuffled_objects_five_objects data_files: - split: ood path: bbh_tracking_shuffled_objects_five_objects/ood-* - config_name: bbh_web_of_lies data_files: - split: ood path: bbh_web_of_lies/ood-* - 
config_name: civil_comments data_files: - split: train path: civil_comments/train-* - split: val path: civil_comments/val-* - config_name: code_judge_bench data_files: - split: ood path: code_judge_bench/ood-* - config_name: colbert_humor_detection data_files: - split: train path: colbert_humor_detection/train-* - split: val path: colbert_humor_detection/val-* - config_name: customer_support_tickets_en data_files: - split: train path: customer_support_tickets_en/train-* - split: val path: customer_support_tickets_en/val-* - config_name: customer_support_tickets_gorkem data_files: - split: train path: customer_support_tickets_gorkem/train-* - split: val path: customer_support_tickets_gorkem/val-* - config_name: go_emotions data_files: - split: train path: go_emotions/train-* - split: val path: go_emotions/val-* - config_name: gpqa_diamond data_files: - split: ood path: gpqa_diamond/ood-* - config_name: halueval_summarization data_files: - split: ood path: halueval_summarization/ood-* - config_name: hh_rlhf data_files: - split: train path: hh_rlhf/train-* - split: val path: hh_rlhf/val-* - config_name: judge_bench data_files: - split: ood path: judge_bench/ood-* - config_name: lex_glue_case_hold data_files: - split: train path: lex_glue_case_hold/train-* - split: val path: lex_glue_case_hold/val-* - config_name: lex_glue_scotus data_files: - split: train path: lex_glue_scotus/train-* - split: val path: lex_glue_scotus/val-* - config_name: medical_abstracts data_files: - split: train path: medical_abstracts/train-* - split: val path: medical_abstracts/val-* - config_name: mfrc data_files: - split: train path: mfrc/train-* - split: val path: mfrc/val-* - config_name: mmlu data_files: - split: ood path: mmlu/ood-* - config_name: mmlu_pro data_files: - split: ood path: mmlu_pro/ood-* - config_name: musr_murder_mysteries data_files: - split: ood path: musr_murder_mysteries/ood-* - config_name: musr_object_placements data_files: - split: ood path: 
musr_object_placements/ood-* - config_name: musr_team_allocation data_files: - split: ood path: musr_team_allocation/ood-* - config_name: or_bench_80k data_files: - split: train path: or_bench_80k/train-* - split: val path: or_bench_80k/val-* - config_name: or_bench_hard_1k data_files: - split: train path: or_bench_hard_1k/train-* - split: val path: or_bench_hard_1k/val-* - config_name: or_bench_toxic data_files: - split: ood path: or_bench_toxic/ood-* - config_name: projudgebench data_files: - split: train path: projudgebench/train-* - split: val path: projudgebench/val-* - config_name: reward_bench_2 data_files: - split: train path: reward_bench_2/train-* - split: val path: reward_bench_2/val-* - config_name: rod101_essay_scoring data_files: - split: ood path: rod101_essay_scoring/ood-* - config_name: seekbench data_files: - split: train path: seekbench/train-* - split: val path: seekbench/val-* - config_name: seekbench_evidence data_files: - split: train path: seekbench_evidence/train-* - split: val path: seekbench_evidence/val-* - config_name: seekbench_full_trace data_files: - split: train path: seekbench_full_trace/train-* - split: val path: seekbench_full_trace/val-* - config_name: sem_eval_2010_task_8 data_files: - split: train path: sem_eval_2010_task_8/train-* - split: val path: sem_eval_2010_task_8/val-* - config_name: smollm_corpus data_files: - split: train path: smollm_corpus/train-* - split: val path: smollm_corpus/val-* - config_name: snli data_files: - split: train path: snli/train-* - split: val path: snli/val-* - config_name: support_tickets_alpha data_files: - split: train path: support_tickets_alpha/train-* - split: val path: support_tickets_alpha/val-* - config_name: toxigen_data data_files: - split: train path: toxigen_data/train-* - split: val path: toxigen_data/val-* - config_name: tweet_eval_emotion data_files: - split: train path: tweet_eval_emotion/train-* - split: val path: tweet_eval_emotion/val-* - config_name: tweet_eval_hate 
data_files: - split: train path: tweet_eval_hate/train-* - split: val path: tweet_eval_hate/val-* - config_name: tweet_eval_irony data_files: - split: train path: tweet_eval_irony/train-* - split: val path: tweet_eval_irony/val-* - config_name: tweet_eval_offensive data_files: - split: train path: tweet_eval_offensive/train-* - split: val path: tweet_eval_offensive/val-* - config_name: tweet_eval_sentiment data_files: - split: train path: tweet_eval_sentiment/train-* - split: val path: tweet_eval_sentiment/val-* - config_name: tweet_eval_stance_abortion data_files: - split: train path: tweet_eval_stance_abortion/train-* - split: val path: tweet_eval_stance_abortion/val-* - config_name: tweet_eval_stance_atheism data_files: - split: train path: tweet_eval_stance_atheism/train-* - split: val path: tweet_eval_stance_atheism/val-* - config_name: tweet_eval_stance_climate data_files: - split: train path: tweet_eval_stance_climate/train-* - split: val path: tweet_eval_stance_climate/val-* - config_name: tweet_eval_stance_feminist data_files: - split: train path: tweet_eval_stance_feminist/train-* - split: val path: tweet_eval_stance_feminist/val-* - config_name: tweet_eval_stance_hillary data_files: - split: train path: tweet_eval_stance_hillary/train-* - split: val path: tweet_eval_stance_hillary/val-* - config_name: ultrafeedback data_files: - split: train path: ultrafeedback/train-* - split: val path: ultrafeedback/val-* - config_name: yelp data_files: - split: train path: yelp/train-* - split: val path: yelp/val-*
---

# qwen3.5-2b-v1

- Repo: `tytodd/qwen3.5-2b-v1`
- Config: `/Users/tytodd/Desktop/Modaic/code/core/probe-lab/configs/datasets/v1/v1.yaml`
- Model: `Qwen/Qwen3.5-2B`
- Runtime: `Modal` local vLLM on `localhost`

| benchmark | train | val | ood | all |
| --- | --- | --- | --- | --- |
| customer_support_tickets_gorkem | 1.40% | 1.10% | | 1.36% |
| mfrc | 0.00% | 0.00% | | 0.00% |
| go_emotions | 17.95% | 18.20% | | 17.98% |
| customer_support_tickets_en | 21.20% | 20.20% | | 21.05% |
| aes2_essay_scoring | 20.74% | 18.70% | | 20.55% |
| ultrafeedback | 40.67% | 41.30% | | 40.73% |
| smollm_corpus | 33.05% | 33.00% | | 33.05% |
| or_bench_80k | 26.73% | 44.60% | | 28.35% |
| lex_glue_scotus | 49.44% | 40.80% | | 48.00% |
| medical_abstracts | 63.47% | 63.70% | | 63.49% |
| seekbench_evidence | 60.80% | 69.23% | | 63.38% |
| yelp | 51.50% | 48.90% | | 51.26% |
| tweet_eval_sentiment | 58.48% | 60.70% | | 58.68% |
| hh_rlhf | 51.92% | 54.50% | | 52.15% |
| tweet_eval_stance_hillary | 60.32% | 56.52% | | 59.94% |
| tweet_eval_stance_atheism | 70.72% | 63.46% | | 69.98% |
| reward_bench_2 | 76.07% | 84.18% | | 77.69% |
| sem_eval_2010_task_8 | 37.76% | 38.90% | | 38.05% |
| seekbench | 71.52% | 70.65% | | 71.27% |
| tweet_eval_irony | 54.75% | 55.71% | | 54.99% |
| tweet_eval_offensive | 71.63% | 72.00% | | 71.66% |
| lex_glue_case_hold | 62.25% | 63.20% | | 62.34% |
| seekbench_full_trace | 77.44% | 84.21% | | 79.47% |
| tweet_eval_hate | 68.94% | 62.06% | | 68.25% |
| or_bench_hard_1k | 45.50% | 45.83% | | 45.56% |
| tweet_eval_stance_climate | 61.41% | 65.00% | | 61.77% |
| snli | 77.03% | 80.50% | | 77.35% |
| tweet_eval_stance_feminist | 60.64% | 55.22% | | 60.09% |
| tweet_eval_stance_abortion | 60.48% | 57.58% | | 60.18% |
| toxigen_data | 72.80% | 67.66% | | 72.31% |
| tweet_eval_emotion | 77.19% | 75.94% | | 77.06% |
| civil_comments | 83.96% | 80.00% | | 83.60% |
| projudgebench | 74.58% | 86.67% | | 75.79% |
| colbert_humor_detection | 68.02% | 69.50% | | 68.15% |
| support_tickets_alpha | 90.41% | 85.60% | | 89.77% |
| argument_quality_ranking | | | 50.06% | 50.06% |
| rod101_essay_scoring | | | 29.63% | 29.63% |
| or_bench_toxic | | | 52.29% | 52.29% |
| judge_bench | | | 60.36% | 60.36% |
| musr_team_allocation | | | 68.80% | 68.80% |
| musr_object_placements | | | 47.66% | 47.66% |
| bbh_disambiguation_qa | | | 59.00% | 59.00% |
| bbh_causal_judgement | | | 57.05% | 57.05% |
| musr_murder_mysteries | | | 50.40% | 50.40% |
| halueval_summarization | | | 58.87% | 58.87% |
| bbh_salient_translation_error_detection | | | 68.50% | 68.50% |
| bbh_movie_recommendation | | | 61.50% | 61.50% |
| bbh_sports_understanding | | | 63.00% | 63.00% |
| bbeh | | | 18.63% | 18.63% |
| bbh_geometric_shapes | | | 78.00% | 78.00% |
| code_judge_bench | | | 11.63% | 11.63% |
| mmlu_pro | | | 60.06% | 60.06% |
| bbh_ruin_names | | | 46.50% | 46.50% |
| bbh_snarks | | | 66.20% | 66.20% |
| gpqa_diamond | | | 40.40% | 40.40% |
| bbh_web_of_lies | | | 99.00% | 99.00% |
| mmlu | | | 69.20% | 69.20% |
| bbh_reasoning_about_colored_objects | | | 98.00% | 98.00% |
| bbh_tracking_shuffled_objects_five_objects | | | 99.50% | 99.50% |
| arc_challenge | | | 86.86% | 86.86% |
| all | 37.82% | 38.84% | 60.75% | 40.79% |
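
The card does not say how the `all` column is computed. The per-benchmark figures appear consistent with pooling the `correct` flags across splits, that is, total correct over total examples rather than a plain average of the split percentages. The sketch below checks that reading for two small configs, using the `num_examples` values from the metadata. The helper name `pooled_accuracy` is hypothetical, and the correct-prediction counts are an assumption recovered by rounding the reported percentages.

```python
def pooled_accuracy(splits):
    """Accuracy over several splits, weighted by split size.

    `splits` is a list of (num_correct, num_examples) pairs.
    """
    total_correct = sum(correct for correct, _ in splits)
    total_examples = sum(n for _, n in splits)
    if total_examples == 0:
        raise ValueError("no examples to pool")
    return total_correct / total_examples


def correct_count(reported_accuracy, num_examples):
    # Assumption: the reported percentage is num_correct / num_examples
    # rounded to two decimals, so rounding recovers the integer count.
    return round(reported_accuracy * num_examples)


# tweet_eval_stance_abortion: 587 train and 66 val examples (from the metadata).
abortion = [
    (correct_count(0.6048, 587), 587),  # train
    (correct_count(0.5758, 66), 66),    # val
]

# tweet_eval_stance_climate: 355 train and 40 val examples (from the metadata).
climate = [
    (correct_count(0.6141, 355), 355),  # train
    (correct_count(0.6500, 40), 40),    # val
]

print(f"{pooled_accuracy(abortion):.2%}")  # table reports 60.18%
print(f"{pooled_accuracy(climate):.2%}")   # table reports 61.77%
```

Because pooling weights each split by its size, a benchmark's `all` value sits much closer to its `train` accuracy whenever the train split is far larger than the val split.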

Use cases: