tytodd/qwen3.5-2b-lmsys-arena
Source: Hugging Face | Updated: 2026-04-10 | Indexed: 2026-04-12
Download link:
https://hf-mirror.com/datasets/tytodd/qwen3.5-2b-lmsys-arena
Description:
---
dataset_info:
- config_name: chatbot_arena_conversations
  features:
  - name: input
    struct:
    - name: question
      dtype: string
    - name: response_A
      dtype: string
    - name: response_B
      dtype: string
  - name: prediction
    struct:
    - name: label
      dtype: string
    - name: reasoning
      dtype: string
  - name: messages
    struct:
    - name: messages
      list:
      - name: content
        dtype: string
      - name: role
        dtype: string
  - name: outputs
    struct:
    - name: reasoning_content
      dtype: string
    - name: text
      dtype: string
  - name: correct
    dtype: bool
  splits:
  - name: train
    num_bytes: 48572696
    num_examples: 1000
  - name: val
    num_bytes: 13135090
    num_examples: 250
  download_size: 45223912
  dataset_size: 61707786
- config_name: mt_bench_human_judgments
  features:
  - name: input
    struct:
    - name: question
      dtype: string
    - name: response_A
      dtype: string
    - name: response_B
      dtype: string
  - name: prediction
    struct:
    - name: label
      dtype: string
    - name: reasoning
      dtype: string
  - name: messages
    struct:
    - name: messages
      list:
      - name: content
        dtype: string
      - name: role
        dtype: string
  - name: outputs
    struct:
    - name: reasoning_content
      dtype: string
    - name: text
      dtype: string
  - name: correct
    dtype: bool
  splits:
  - name: ood
    num_bytes: 66930133
    num_examples: 1000
  download_size: 47622543
  dataset_size: 66930133
configs:
- config_name: chatbot_arena_conversations
  data_files:
  - split: train
    path: chatbot_arena_conversations/train-*
  - split: val
    path: chatbot_arena_conversations/val-*
- config_name: mt_bench_human_judgments
  data_files:
  - split: ood
    path: mt_bench_human_judgments/ood-*
---
# qwen3.5-2b-lmsys-arena
- Repo: `tytodd/qwen3.5-2b-lmsys-arena`
- Config: `/Users/tytodd/Desktop/Modaic/code/core/probe-lab/configs/datasets/lmsys-arena/lmsys-arena.yaml`
- Model: `Qwen/Qwen3.5-2B`
- Runtime: `Modal` local vLLM on `localhost`
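
The configs and splits declared in the metadata could be read roughly as follows. This is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that `correct` is a top-level boolean on each row, as the feature list suggests. `load_split` and `accuracy` are hypothetical helper names used for illustration; they are not part of this repo.

```python
from typing import Any, Iterable, Mapping

REPO_ID = "tytodd/qwen3.5-2b-lmsys-arena"


def load_split(config: str, split: str):
    """Load one split of one config from the Hub (needs network access)."""
    # Imported lazily so `accuracy` stays usable without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config, split=split)


def accuracy(rows: Iterable[Mapping[str, Any]]) -> float:
    """Fraction of rows whose `correct` flag is true.

    Assumption: each row exposes a top-level boolean `correct` field.
    """
    flags = [bool(row["correct"]) for row in rows]
    if not flags:
        raise ValueError("no rows to score")
    return sum(flags) / len(flags)
```

For example, `accuracy(load_split("chatbot_arena_conversations", "val"))` would score the validation split, assuming the `correct` flag is what the results table summarizes.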
| benchmark | train | val | ood | all |
| --- | --- | --- | --- | --- |
| chatbot_arena_conversations | 74.00% | 72.00% | | 73.60% |
| mt_bench_human_judgments | | | 66.60% | 66.60% |
| all | 74.00% | 72.00% | 66.60% | 70.49% |
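
The `all` row and column appear to be example-weighted averages rather than plain means of the percentages. That is an inference from the numbers, not something the card states. A short sketch that recomputes them from the split sizes in the metadata and the per-split figures in the table (`pooled_accuracy` is a hypothetical helper name):

```python
# (num_examples, accuracy) per split: sizes come from the dataset metadata,
# accuracies from the results table.
SPLITS = {
    ("chatbot_arena_conversations", "train"): (1000, 0.7400),
    ("chatbot_arena_conversations", "val"): (250, 0.7200),
    ("mt_bench_human_judgments", "ood"): (1000, 0.6660),
}


def pooled_accuracy(cells):
    """Example-weighted mean of (num_examples, accuracy) pairs."""
    cells = list(cells)
    total = sum(n for n, _ in cells)
    correct = sum(n * acc for n, acc in cells)
    return correct / total


# Pool train and val for the chatbot_arena_conversations row.
arena = pooled_accuracy(
    cell for (config, _), cell in SPLITS.items()
    if config == "chatbot_arena_conversations"
)
# Pool every split for the bottom-right cell.
overall = pooled_accuracy(SPLITS.values())

print(f"chatbot_arena_conversations all: {arena:.2%}")  # 73.60%
print(f"overall all: {overall:.2%}")                    # 70.49%
```

Both values match the table, which supports the weighted-average reading: 920 of 1250 examples for the arena config and 1586 of 2250 overall.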
Provider:
tytodd



