when2rl/mt_bench_human_judgments_reformatted

Name: when2rl/mt_bench_human_judgments_reformatted
Creator: when2rl
Published: 2024-05-02 21:04:51
License: 暂无描述

Hugging Face2024-05-02 更新2024-06-12 收录

下载链接：

https://hf-mirror.com/datasets/when2rl/mt_bench_human_judgments_reformatted

下载链接

链接失效反馈

官方服务：

资源简介：

--- dataset_info: features: - name: prompt dtype: string - name: prompt_id dtype: string - name: chosen list: - name: content dtype: string - name: role dtype: string - name: rejected list: - name: content dtype: string - name: role dtype: string - name: messages list: - name: content dtype: string - name: role dtype: string - name: score_chosen dtype: float64 - name: score_rejected dtype: float64 - name: other_info struct: - name: judge dtype: string - name: model_a dtype: string - name: model_b dtype: string - name: question_id dtype: int64 - name: winner dtype: string splits: - name: train_human num_bytes: 12237070 num_examples: 2129 - name: test_human num_bytes: 12237070 num_examples: 2129 - name: train_gpt4_pair num_bytes: 13750311 num_examples: 2353 - name: test_gpt4_pair num_bytes: 13750311 num_examples: 2353 download_size: 4334198 dataset_size: 51974762 configs: - config_name: default data_files: - split: train_human path: data/train_human-* - split: test_human path: data/test_human-* - split: train_gpt4_pair path: data/train_gpt4_pair-* - split: test_gpt4_pair path: data/test_gpt4_pair-* --- # Dataset Card for when2rl/mt_bench_human_judgments_reformatted  Reformatted and deduped (e.g., alpaca13b vs gpt4 may have the same answer pair as alpaca13b vs gpt-3.5-turbo for some questions) from `lmsys/mt_bench_human_judgments`. This can be used as a quick evaluation metric to "measure" MT-bench performance during training. Note the split names are converted to `train_` and `test_`. Although the `train_` splits will NOT be used to train anything, this split name makes some data processing/scripts easier. ## Dataset Details ### Dataset Description  - **Curated by:** [More Information Needed] - **Funded by [optional]:** [More Information Needed] - **Shared by [optional]:** [More Information Needed] - **Language(s) (NLP):** [More Information Needed] - **License:** [More Information Needed] ### Dataset Sources [optional]  - **Repository:** [More Information Needed] - **Paper [optional]:** [More Information Needed] - **Demo [optional]:** [More Information Needed] ## Uses  ### Direct Use  [More Information Needed] ### Out-of-Scope Use  [More Information Needed] ## Dataset Structure  [More Information Needed] ## Dataset Creation ### Curation Rationale  [More Information Needed] ### Source Data  #### Data Collection and Processing  [More Information Needed] #### Who are the source data producers?  [More Information Needed] ### Annotations [optional]  #### Annotation process  [More Information Needed] #### Who are the annotators?  [More Information Needed] #### Personal and Sensitive Information  [More Information Needed] ## Bias, Risks, and Limitations  [More Information Needed] ### Recommendations  Users should be made aware of the risks, biases and limitations of the dataset. More information needed for further recommendations. ## Citation [optional]  **BibTeX:** [More Information Needed] **APA:** [More Information Needed] ## Glossary [optional]  [More Information Needed] ## More Information [optional] [More Information Needed] ## Dataset Card Authors [optional] [More Information Needed] ## Dataset Card Contact [More Information Needed]

提供机构：

when2rl

原始信息汇总

数据集概述

数据集名称

when2rl/mt_bench_human_judgments_reformatted

数据集特征

prompt (字符串)
prompt_id (字符串)
chosen (列表)
- content (字符串)
- role (字符串)
rejected (列表)
- content (字符串)
- role (字符串)
messages (列表)
- content (字符串)
- role (字符串)
score_chosen (浮点数)
score_rejected (浮点数)
other_info (结构体)
- judge (字符串)
- model_a (字符串)
- model_b (字符串)
- question_id (整数)
- winner (字符串)

数据集分割

train_human
- num_bytes: 12237070
- num_examples: 2129
test_human
- num_bytes: 12237070
- num_examples: 2129
train_gpt4_pair
- num_bytes: 13750311
- num_examples: 2353
test_gpt4_pair
- num_bytes: 13750311
- num_examples: 2353

数据集大小

download_size: 4334198
dataset_size: 51974762

配置

config_name: default
data_files:
- split: train_human, test_human, train_gpt4_pair, test_gpt4_pair
- path: data/train_human-, data/test_human-, data/train_gpt4_pair-, data/test_gpt4_pair-

5,000+

优质数据集

54 个

任务类型

进入经典数据集