when2rl/distilabel-capybara-dpo-7k-binarized_reformatted

Name: when2rl/distilabel-capybara-dpo-7k-binarized_reformatted
Creator: when2rl
Published: 2024-04-17 00:44:24
License: No description available

Hugging Face: updated 2024-04-17, indexed 2024-06-22

Download link:

https://hf-mirror.com/datasets/when2rl/distilabel-capybara-dpo-7k-binarized_reformatted

Resource description:

---
language:
- en
dataset_info:
  features:
  - name: prompt
    dtype: string
  - name: prompt_id
    dtype: string
  - name: chosen
    list:
    - name: content
      dtype: string
    - name: role
      dtype: string
  - name: rejected
    list:
    - name: content
      dtype: string
    - name: role
      dtype: string
  - name: messages
    list:
    - name: content
      dtype: string
    - name: role
      dtype: string
  - name: score_chosen
    dtype: float64
  - name: score_rejected
    dtype: float64
  - name: other_info
    struct:
    - name: chosen-model
      dtype: string
    - name: generation_prompt
      sequence: string
    - name: new_generations
      sequence: string
    - name: original_response
      dtype: string
    - name: rejected-model
      dtype: string
    - name: source
      dtype: string
  splits:
  - name: train
    num_bytes: 273625666.701309
    num_examples: 7562
  download_size: 117571506
  dataset_size: 273625666.701309
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# Dataset Card for distilabel-capybara-dpo-7k-binarized_reformatted

This dataset is derived from argilla/distilabel-capybara-dpo-7k-binarized, with the following changes:

1. Every rating was multiplied by 2, because the original ratings are in [1, 5], whereas other DPO pairs typically have ratings in [1, 10]. This makes future data preprocessing easier.
2. The dataset was reformatted to match the format of HuggingFaceH4/ultrafeedback_binarized, with additional annotations (e.g., source) stored under the `other_info` column.
3. *(new)* All rows where `chosen` is the same as `rejected` were removed. This removed 1 row from the training set.
## Dataset Details

### Dataset Description

- **Curated by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]

### Dataset Sources [optional]

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

### Direct Use

[More Information Needed]

### Out-of-Scope Use

[More Information Needed]

## Dataset Structure

[More Information Needed]

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Data Collection and Processing

[More Information Needed]

#### Who are the source data producers?

[More Information Needed]

### Annotations [optional]

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

#### Personal and Sensitive Information

[More Information Needed]

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users should be made aware of the risks, biases and limitations of the dataset. More information needed for further recommendations.

## Citation [optional]

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Dataset Card Authors [optional]

[More Information Needed]

## Dataset Card Contact

[More Information Needed]

Provider:

when2rl

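Summary of original information

Dataset Card for distilabel-capybara-dpo-7k-binarized_reformatted

Dataset Details

Dataset Description

Language: English
Dataset size: 273625666.701309 bytes
Download size: 117571506 bytes
Config: default
Data files:
- Split: train
- Path: data/train-*
- Number of examples: 7562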
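
As a minimal sketch of how the split listed above might be loaded, the snippet below uses the Hugging Face `datasets` library. The repository id and the row count come from the card metadata; the helper names `load_train_split` and `row_count_matches` are hypothetical, and the download step assumes network access and an installed `datasets` package.

```python
REPO_ID = "when2rl/distilabel-capybara-dpo-7k-binarized_reformatted"
EXPECTED_TRAIN_ROWS = 7562  # num_examples reported in the card metadata


def load_train_split():
    """Download the train split from the Hugging Face Hub (needs network access)."""
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split="train")


def row_count_matches(dataset) -> bool:
    """Check a loaded split against the row count listed in the card."""
    return len(dataset) == EXPECTED_TRAIN_ROWS
```

Calling `row_count_matches(load_train_split())` is one way to confirm that the downloaded copy still has the row count the card reports.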

Dataset features

prompt: string
prompt_id: string
chosen:
- content: string
- role: string
rejected:
- content: string
- role: string
messages:
- content: string
- role: string
score_chosen: float (float64)
score_rejected: float (float64)
other_info:
- chosen-model: string
- generation_prompt: sequence of strings
- new_generations: sequence of strings
- original_response: string
- rejected-model: string
- source: string
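
The feature list above can be turned into a simple structural check. The following is a sketch under stated assumptions, not part of the dataset's tooling: `schema_problems` and `is_message_list` are hypothetical helpers, and the check is deliberately strict (it treats every top-level field as required and non-null, which real rows may not satisfy, and it only checks the key names inside `other_info`).

```python
from typing import Any

MESSAGE_KEYS = {"content", "role"}
OTHER_INFO_KEYS = {
    "chosen-model", "generation_prompt", "new_generations",
    "original_response", "rejected-model", "source",
}


def is_message_list(value: Any) -> bool:
    """True if value is a list of {"content": str, "role": str} dicts."""
    return isinstance(value, list) and all(
        isinstance(m, dict)
        and set(m) == MESSAGE_KEYS
        and all(isinstance(m[k], str) for k in MESSAGE_KEYS)
        for m in value
    )


def schema_problems(record: dict) -> list[str]:
    """Return human-readable mismatches between a record and the listed features."""
    problems = []
    for key in ("prompt", "prompt_id"):
        if not isinstance(record.get(key), str):
            problems.append(f"{key} should be a string")
    for key in ("chosen", "rejected", "messages"):
        if not is_message_list(record.get(key)):
            problems.append(f"{key} should be a list of content/role messages")
    for key in ("score_chosen", "score_rejected"):
        if not isinstance(record.get(key), float):
            problems.append(f"{key} should be a float")
    info = record.get("other_info")
    # Only the key names are checked here; value types inside other_info are not.
    if not isinstance(info, dict) or set(info) != OTHER_INFO_KEYS:
        problems.append("other_info should hold exactly the six listed fields")
    return problems
```

An empty list from `schema_problems(row)` means the row has the shape described above; any strings returned name the fields that differ.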

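Dataset modifications

All ratings were multiplied by 2, because the original ratings are in [1, 5], whereas other DPO pairs typically have ratings in [1, 10]. This is to simplify future data preprocessing.
The dataset was reformatted to match the format of HuggingFaceH4/ultrafeedback_binarized, with additional annotations (e.g., source) stored in the other_info column.
All rows where chosen is the same as rejected were removed. This removed 1 row from the training set.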
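
The rating rescale and the duplicate-pair filter described above can be sketched in a few lines. The card does not publish the code that was actually used, so this is only an illustration: `rescale_rating` and `drop_identical_pairs` are hypothetical names, and rows are assumed to be plain dictionaries with `chosen` and `rejected` keys.

```python
def rescale_rating(rating: float, factor: float = 2.0) -> float:
    """Multiply a rating by a constant factor (2 in the card's description)."""
    return rating * factor


def drop_identical_pairs(rows: list[dict]) -> list[dict]:
    """Keep only rows whose chosen and rejected responses differ."""
    return [row for row in rows if row["chosen"] != row["rejected"]]
```

Note that multiplying by 2 maps the original endpoints 1 and 5 to 2 and 10, so rescaled ratings fall in [2, 10], a subset of the [1, 10] range the card mentions.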
