RLHFlow/CodeUltraFeedback-standard

Name: RLHFlow/CodeUltraFeedback-standard
Creator: RLHFlow
Published: 2024-04-27 22:26:37
License: 暂无描述

Hugging Face2024-04-27 更新2024-06-12 收录

下载链接：

https://hf-mirror.com/datasets/RLHFlow/CodeUltraFeedback-standard

下载链接

链接失效反馈

官方服务：

资源简介：

--- dataset_info: features: - name: rejected list: - name: content dtype: string - name: role dtype: string - name: chosen list: - name: content dtype: string - name: role dtype: string - name: chosen_score dtype: string - name: rejected_score dtype: string splits: - name: train num_bytes: 245682332 num_examples: 50156 download_size: 47661064 dataset_size: 245682332 configs: - config_name: default data_files: - split: train path: data/train-* --- The original dataset is coseal/CodeUltraFeedback. We provide the processing script as follows. In summarization, for each prompt, we include all possible comparisons, except those with the same rating. ```python ds = load_dataset("coseal/CodeUltraFeedback", split="train") import itertools data = [] for example in ds: prompt = example['instruction'] responses = {} for row in example['responses']: model_name, response = row['model'], row['response'] responses[model_name] = response annotations = {} for row in example['annotations']: model_name, rating = row['model'], row['rating'] annotations[model_name] = rating preference = 'code-' + example['preference'] all_models = [] for model_name in responses.keys(): if rating == 'N/A': continue all_models.append(model_name) all_combinations = list(itertools.combinations(all_models, 2)) assert len(all_combinations) == len(all_models) * (len(all_models) - 1) / 2 for combination in all_combinations: response1 = responses[combination[0]] rating1 = annotations[combination[0]] response2 = responses[combination[1]] rating2 = annotations[combination[1]] if rating1 == rating2: continue if rating1 > rating2: chosen_message = [ {"content": prompt, "role": "user"}, {"content": response1, "role": "assistant"}, ] rejected_message = [ {"content": prompt, "role": "user"}, {"content": response2, "role": "assistant"}, ] chosen_rating = rating1 rejected_rating = rating2 elif rating1 < rating2: chosen_message = [ {"content": prompt, "role": "user"}, {"content": response2, "role": "assistant"}, ] rejected_message = [ {"content": prompt, "role": "user"}, {"content": response1, "role": "assistant"}, ] chosen_rating = rating2 rejected_rating = rating1 else: print("error") data.append({"rejected": rejected_message, "chosen": chosen_message, "rejected_score": rejected_rating, "chosen_score": chosen_rating}) ```

提供机构：

RLHFlow

原始信息汇总

数据集概述

数据集特征

rejected
- content: 数据类型为字符串
- role: 数据类型为字符串
chosen
- content: 数据类型为字符串
- role: 数据类型为字符串
chosen_score: 数据类型为字符串
rejected_score: 数据类型为字符串

数据集分割

train
- num_bytes: 245682332
- num_examples: 50156

数据集大小

download_size: 47661064
dataset_size: 245682332

配置

config_name: default
- data_files
  - split: train
    - path: data/train-*

5,000+

优质数据集

54 个

任务类型

进入经典数据集