heegyu/UltraInteract_pair_longest_multiturn
Source: Hugging Face. Updated: 2024-04-15. Indexed: 2024-06-12.
Download link:
https://hf-mirror.com/datasets/heegyu/UltraInteract_pair_longest_multiturn
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
dataset_info:
  features:
  - name: task
    dtype: string
  - name: dataset
    dtype: string
  - name: trajectory
    list:
    - name: from
      dtype: string
    - name: value
      dtype: string
  - name: chosen
    dtype: string
  - name: rejected
    dtype: string
  - name: id
    dtype: string
  - name: parent_id
    dtype: string
  - name: __index_level_0__
    dtype: string
  - name: __index_level_1__
    dtype: int64
  splits:
  - name: train
    num_bytes: 196675986
    num_examples: 26658
  download_size: 78610248
  dataset_size: 196675986
---
# Dataset Card for "UltraInteract_pair_longest_multiturn"
- Original Dataset: [openbmb/UltraInteract_pair](https://huggingface.co/datasets/openbmb/UltraInteract_pair)
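- Keeps only multi-turn instances (trajectories with more than one turn) and, within each reasoning tree (rows sharing a `parent_id`), only the item or items with the longest trajectory.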
Data processing code:
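```python
from datasets import load_dataset, Dataset

dataset = load_dataset("openbmb/UltraInteract_pair")
df = dataset['train'].to_pandas()

# Number of turns in each trajectory.
df["turns"] = df["trajectory"].apply(lambda x: len(x))
# Keep only multi-turn instances.
df = df[df.turns > 1]
# Within each reasoning tree (same parent_id), keep the rows with the most turns (ties are kept).
# The result is indexed by (parent_id, original row index).
df = df.groupby("parent_id").apply(lambda x: x[x["turns"] == x["turns"].max()])
print(df)
print(df.shape)

df = df.drop(columns=["turns"])
# from_pandas stores the two index levels as __index_level_0__ and __index_level_1__.
dataset['train'] = Dataset.from_pandas(df)
dataset.push_to_hub("heegyu/UltraInteract_pair_longest_multiturn")
```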
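To see what the filter keeps without downloading the full dataset, here is a minimal sketch on a made-up toy frame. The rows, ids, and turn contents are invented for illustration. It uses `groupby(...).transform("max")` instead of the `groupby(...).apply(...)` call in the processing code: this selects the same rows but keeps the original flat index, so it would not produce the `__index_level_0__` and `__index_level_1__` columns.

```python
import pandas as pd

# Toy stand-in for the real data (values are invented): two reasoning trees, "a" and "b".
turn = {"from": "user", "value": "q"}
toy = pd.DataFrame(
    {
        "parent_id": ["a", "a", "a", "b", "b"],
        "id": ["a_0", "a_1", "a_2", "b_0", "b_1"],
        "trajectory": [[turn], [turn] * 3, [turn] * 3, [turn], [turn] * 5],
    }
)

# Count turns and drop single-turn rows, as in the processing code.
toy["turns"] = toy["trajectory"].apply(len)
multi = toy[toy["turns"] > 1]

# transform("max") broadcasts each tree's maximum turn count back onto its rows,
# so the comparison keeps every row that ties for the longest trajectory.
group_max = multi.groupby("parent_id")["turns"].transform("max")
longest = multi[multi["turns"] == group_max]

kept_ids = longest["id"].tolist()
print(kept_ids)
```

In this toy frame, tree "a" has two trajectories tied at three turns, so both survive, which may explain why a single `parent_id` can appear more than once in the published dataset.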
Provided by:
heegyu
Summary of original information
Dataset overview
Dataset name
UltraInteract_pair_longest_multiturn
Dataset configuration
- config_name: default
- data_files:
  - split: train
  - path: data/train-*
Dataset features
- task: string
- dataset: string
- trajectory:
  - from: string
  - value: string
- chosen: string
- rejected: string
- id: string
- parent_id: string
- `__index_level_0__`: string
- `__index_level_1__`: int64
Dataset splits
- name: train
- num_bytes: 196675986
- num_examples: 26658
Dataset size
- download_size: 78610248
- dataset_size: 196675986
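
As a rough illustration of how the features above might be consumed, the sketch below turns one row into a prompt plus a chosen/rejected pair. The helper `to_preference_pair`, the output keys, and the example record are all hypothetical. It also assumes that the `from` values can be used directly as chat role names, which the listing does not confirm.

```python
from typing import Any, Dict


def to_preference_pair(record: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: build prompt messages plus chosen/rejected completions."""
    # Map each {"from", "value"} turn to a {"role", "content"} message.
    # Assumption: the "from" field already holds a usable role name.
    messages = [
        {"role": t["from"], "content": t["value"]} for t in record["trajectory"]
    ]
    return {
        "prompt": messages,
        "chosen": record["chosen"],
        "rejected": record["rejected"],
    }


# Invented record that follows the listed schema (not taken from the dataset).
example = {
    "task": "Math_CoT",
    "dataset": "toy",
    "trajectory": [
        {"from": "user", "value": "What is 2 + 2?"},
        {"from": "assistant", "value": "5"},
        {"from": "user", "value": "That is wrong, try again."},
    ],
    "chosen": "2 + 2 = 4",
    "rejected": "2 + 2 = 5",
    "id": "toy_0",
    "parent_id": "toy",
}

pair = to_preference_pair(example)
print(len(pair["prompt"]), pair["chosen"])
```

Keeping the trajectory as a message list, rather than joining it into one string, leaves the choice of chat template to whichever training library consumes the pairs.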



