rwq-elo/rwq-battle-records
Hugging Face · Updated 2024-03-06 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/rwq-elo/rwq-battle-records
Resource description:
---
language:
- en
license: cc-by-nc-4.0
---
# RWQ battle records dataset
This dataset stores the records of pairwise Elo battles among 24 popular LLMs on [RWQ questions](https://huggingface.co/datasets/rwq-elo/rwq-questions). GPT-4 acts as the judge and determines the winner of each QA round.
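To make the idea of an Elo pairwise battle concrete, here is a minimal sketch of the standard Elo update applied to a list of battle outcomes. It is not the authors' rating procedure: the card does not say how `elo_rating` was computed, so the K-factor of 32, the initial rating of 1000, the processing order, and the choice to score `tie(all bad)` like an ordinary tie are all assumptions. The function names are hypothetical.

```python
from collections import defaultdict


def expected_score(rating_a, rating_b):
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def compute_elo(battles, k=32.0, initial=1000.0):
    """Replay (model_a, model_b, winner) tuples in order and return ratings.

    k and initial are illustrative assumptions, not values from the dataset.
    """
    ratings = defaultdict(lambda: initial)
    for model_a, model_b, winner in battles:
        e_a = expected_score(ratings[model_a], ratings[model_b])
        if winner == "model_a":
            s_a = 1.0
        elif winner == "model_b":
            s_a = 0.0
        else:
            # Assumption: "tie" and "tie(all bad)" both count as half a win.
            s_a = 0.5
        delta = k * (s_a - e_a)
        ratings[model_a] += delta
        ratings[model_b] -= delta  # zero-sum update
    return dict(ratings)


if __name__ == "__main__":
    # Hypothetical model names, for illustration only.
    demo = [("llm-x", "llm-y", "model_a"), ("llm-y", "llm-x", "tie")]
    print(compute_elo(demo))
```

Because each update is zero-sum, the total rating across all models stays constant. The final ratings do depend on the order in which battles are replayed.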
## Columns
| Column Name | Data Type | Description |
| -------------- | --------- | ------------------------------------------------------------------------------------------------------------ |
| question       | string    | The question posed to the LLMs. |
| model          | string    | The ID or name of the LLM. |
| model_a        | string    | The ID or name of the first LLM in the pair battling on the same question. |
| model_b        | string    | The ID or name of the second LLM in the pair battling on the same question. |
| winner         | string    | The outcome of one pairwise battle, one of `model_a`, `model_b`, `tie`, or `tie(all bad)`. |
| judger         | string    | The name and version of the GPT judge, such as `gpt-4-turbo`. |
| tstamp         | string    | The time the battle took place, formatted as `2023-11-23 02:56:34.433226`. |
| answer_a       | string    | The answer given by model_a. |
| answer_b       | string    | The answer given by model_b. |
| gpt_4_response | string    | The response text in which GPT-4, as judge, evaluates the answers and scores the better LLM. |
| gpt_4_score    | string    | The scores of model_a and model_b as JSON text, e.g., `{'model_a': '0', 'model_b': '1'}`. |
| is_valid       | boolean   | Whether the row is valid. Set to false when GPT-4 refuses to evaluate for policy reasons. |
| elo_rating     | float     | The Elo rating of the LLM. |
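Several columns hold structured values as strings, so a short parsing sketch may help. It assumes that every row follows the two examples given in the table. Note that the `gpt_4_score` example uses single quotes, which `json.loads` would reject, so the sketch uses `ast.literal_eval` instead. The helper names are hypothetical and are not part of the dataset or any library.

```python
import ast
from datetime import datetime


def parse_score(text):
    """Turn a gpt_4_score string into a dict of floats.

    Assumes the single-quoted format shown in the table.
    """
    raw = ast.literal_eval(text)
    return {name: float(value) for name, value in raw.items()}


def parse_tstamp(text):
    """Parse a tstamp string such as '2023-11-23 02:56:34.433226'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")


def valid_battles(rows):
    """Keep only rows where is_valid is true (the judge did not refuse)."""
    return [row for row in rows if row["is_valid"]]


if __name__ == "__main__":
    print(parse_score("{'model_a': '0', 'model_b': '1'}"))
    print(parse_tstamp("2023-11-23 02:56:34.433226"))
```

Filtering on `is_valid` before any aggregation keeps rows without a usable verdict out of win counts and ratings.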
## Citation
TODO
Provider:
rwq-elo