
tytodd/qwen3.5-2b-lmsys-arena

Source: Hugging Face. Updated 2026-04-10; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/tytodd/qwen3.5-2b-lmsys-arena
Resource description:
---
dataset_info:
- config_name: chatbot_arena_conversations
  features:
  - name: input
    struct:
    - name: question
      dtype: string
    - name: response_A
      dtype: string
    - name: response_B
      dtype: string
  - name: prediction
    struct:
    - name: label
      dtype: string
    - name: reasoning
      dtype: string
  - name: messages
    struct:
    - name: messages
      list:
      - name: content
        dtype: string
      - name: role
        dtype: string
    - name: outputs
      struct:
      - name: reasoning_content
        dtype: string
      - name: text
        dtype: string
  - name: correct
    dtype: bool
  splits:
  - name: train
    num_bytes: 48572696
    num_examples: 1000
  - name: val
    num_bytes: 13135090
    num_examples: 250
  download_size: 45223912
  dataset_size: 61707786
- config_name: mt_bench_human_judgments
  features:
  - name: input
    struct:
    - name: question
      dtype: string
    - name: response_A
      dtype: string
    - name: response_B
      dtype: string
  - name: prediction
    struct:
    - name: label
      dtype: string
    - name: reasoning
      dtype: string
  - name: messages
    struct:
    - name: messages
      list:
      - name: content
        dtype: string
      - name: role
        dtype: string
    - name: outputs
      struct:
      - name: reasoning_content
        dtype: string
      - name: text
        dtype: string
  - name: correct
    dtype: bool
  splits:
  - name: ood
    num_bytes: 66930133
    num_examples: 1000
  download_size: 47622543
  dataset_size: 66930133
configs:
- config_name: chatbot_arena_conversations
  data_files:
  - split: train
    path: chatbot_arena_conversations/train-*
  - split: val
    path: chatbot_arena_conversations/val-*
- config_name: mt_bench_human_judgments
  data_files:
  - split: ood
    path: mt_bench_human_judgments/ood-*
---

# qwen3.5-2b-lmsys-arena

- Repo: `tytodd/qwen3.5-2b-lmsys-arena`
- Config: `/Users/tytodd/Desktop/Modaic/code/core/probe-lab/configs/datasets/lmsys-arena/lmsys-arena.yaml`
- Model: `Qwen/Qwen3.5-2B`
- Runtime: `Modal` local vLLM on `localhost`

| benchmark | train | val | ood | all |
| --- | --- | --- | --- | --- |
| chatbot_arena_conversations | 74.00% | 72.00% | | 73.60% |
| mt_bench_human_judgments | | | 66.60% | 66.60% |
| all | 74.00% | 72.00% | 66.60% | 70.49% |
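The `all` column in the results table appears to be an example-weighted average of the per-split accuracies, not a plain mean of the percentages. The sketch below checks that reading using the `num_examples` values from the card metadata (1000 train, 250 val, 1000 ood). The `SPLITS` dictionary and the `pooled_accuracy` helper are illustrative names, not part of the dataset or any library.

```python
# Split sizes are the num_examples values from the card metadata;
# accuracies are the per-split figures from the results table.
SPLITS = {
    "train": {"n": 1000, "acc": 0.7400},
    "val": {"n": 250, "acc": 0.7200},
    "ood": {"n": 1000, "acc": 0.6660},
}


def pooled_accuracy(names):
    """Example-weighted accuracy over the named splits (hypothetical helper)."""
    correct = sum(SPLITS[s]["n"] * SPLITS[s]["acc"] for s in names)
    total = sum(SPLITS[s]["n"] for s in names)
    return correct / total


# chatbot_arena_conversations: (740 + 180) / 1250
arena_all = pooled_accuracy(["train", "val"])
# all benchmarks: (740 + 180 + 666) / 2250
overall = pooled_accuracy(["train", "val", "ood"])

print(f"{arena_all:.2%}")  # 73.60%
print(f"{overall:.2%}")    # 70.49%
```

Both values match the table, which is consistent with the `all` figures being pooled over individual examples.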
Provider:
tytodd