HAERAE-HUB/K2-Feedback

Name: HAERAE-HUB/K2-Feedback
Creator: HAERAE-HUB
Published: 2024-04-26 03:57:19
License: no description provided

Source: Hugging Face. Updated 2024-04-26; indexed 2024-06-12.

Download link:

https://hf-mirror.com/datasets/HAERAE-HUB/K2-Feedback

Resource description:

---
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: criteria
    dtype: string
  - name: score1
    dtype: string
  - name: score2
    dtype: string
  - name: score3
    dtype: string
  - name: score4
    dtype: string
  - name: score5
    dtype: string
  - name: reference
    dtype: string
  - name: category
    dtype: string
  - name: response
    dtype: string
  - name: feedback
    dtype: string
  - name: score
    dtype: int64
  - name: source
    dtype: string
  - name: __index_level_0__
    dtype: int64
  splits:
  - name: train
    num_bytes: 667967023
    num_examples: 99655
  download_size: 327826358
  dataset_size: 667967023
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

Research Paper coming soon!

# K^2-Feedback

K^2-Feedback is a dataset crafted to enhance fine-grained evaluation capabilities in Korean language models. Building upon the [Feedback-Collection](https://huggingface.co/datasets/kaist-ai/Feedback-Collection), K^2-Feedback incorporates instructions specific to Korean culture and linguistics.

# Dataset Overview

**K^2-Feedback** includes about 100,000 samples (the released train split contains 99,655), divided into two distinct subsets:

1. **Translated Samples (50,000 entries)**: This subset consists of samples translated directly from the Feedback-Collection using the [Seagull-13B translation model](https://huggingface.co/kuotient/Seagull-13b-translation). After translation, these samples undergo heuristic filtering (length checks and language detection), model-based filtering (cosine similarity), and semantic deduplication. To ensure cultural relevance and avoid Western bias, we use GPT-3.5-Turbo to remove instances related to Western culture.
2. **Original Samples (50,000 entries)**: The original samples are a collection of instructions deeply rooted in Korean culture. We first collect seed instructions that reflect Korean culture from HAERAE-Bench, CLIcK, and KMMLU. These instructions then serve as in-context exemplars to prompt GPT-4 to generate new instructions.
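The translated-sample filtering steps described above can be sketched as a single pass over the data. This is a minimal sketch under stated assumptions, not the authors' pipeline: the length-ratio bounds, the Hangul-ratio language check, both similarity thresholds, and the `Sample` class are hypothetical, and the embeddings are assumed to come from some multilingual sentence-embedding model that the card does not name. "Model-based filtering (cosine similarity)" is interpreted here as similarity between source and translation embeddings, and "semantic deduplication" as dropping any translation too close to one already kept. The GPT-3.5-Turbo step that removes Western-culture instances is not modeled.

```python
import math
from dataclasses import dataclass

# Hypothetical thresholds: the card names the filter types but not their settings.
MIN_LEN_RATIO = 0.3      # translated length / source length, lower bound
MAX_LEN_RATIO = 2.0      # translated length / source length, upper bound
MIN_HANGUL_RATIO = 0.3   # crude "is this Korean?" check
MIN_SIMILARITY = 0.8     # source vs. translation embedding similarity
DEDUP_SIMILARITY = 0.95  # translations at least this similar count as duplicates


@dataclass
class Sample:
    source_text: str       # original English text from Feedback-Collection
    translated_text: str   # Korean translation
    src_vec: list[float]   # embedding of source_text (assumed multilingual model)
    tgt_vec: list[float]   # embedding of translated_text


def hangul_ratio(text: str) -> float:
    """Fraction of non-space characters in the Hangul Syllables block (U+AC00..U+D7A3)."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum("\uac00" <= c <= "\ud7a3" for c in chars) / len(chars)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def passes_heuristics(s: Sample) -> bool:
    """Length check plus Hangul-ratio language check (both hypothetical heuristics)."""
    ratio = len(s.translated_text) / max(len(s.source_text), 1)
    return (MIN_LEN_RATIO <= ratio <= MAX_LEN_RATIO
            and hangul_ratio(s.translated_text) >= MIN_HANGUL_RATIO)


def filter_and_dedup(samples: list[Sample]) -> list[Sample]:
    """Apply heuristic filters, the similarity filter, then greedy semantic dedup."""
    kept: list[Sample] = []
    for s in samples:
        if not passes_heuristics(s):
            continue
        if cosine(s.src_vec, s.tgt_vec) < MIN_SIMILARITY:
            continue  # translation drifted too far from the source
        if any(cosine(s.tgt_vec, k.tgt_vec) >= DEDUP_SIMILARITY for k in kept):
            continue  # near-duplicate of a translation already kept
        kept.append(s)
    return kept
```

The greedy deduplication compares each sample against every kept one, which is quadratic; at 50,000 samples a real pipeline would more likely use an approximate nearest-neighbor index.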
### Ethical Considerations and Usage Recommendations

Because K^2-Feedback is constructed from seed questions taken from existing benchmarks, it raises potential concerns about contamination. Its primary aim is not to train models to excel on these benchmarks but to function as a robust evaluation resource. Including this dataset in training may therefore skew performance on the benchmarks from which the seed questions are derived. We strongly advise against using K^2-Feedback for direct training purposes to prevent biases in model performance.

### Point of Contact

For any questions, contact us at the following email:

```
spthsrbwls123@yonsei.ac.kr
```

Provider:

HAERAE-HUB

Summary of Original Information

K^2-Feedback Dataset Overview

Dataset Features

instruction: string
criteria: string
score1: string
score2: string
score3: string
score4: string
score5: string
reference: string
category: string
response: string
feedback: string
score: integer (int64)
source: string
__index_level_0__: integer (int64)
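The feature list above can be checked record by record. This is a minimal sketch, assuming each record is available as a plain `dict` keyed by the feature names listed here; `validate_record` is a hypothetical helper, and the 1 to 5 range for `score` is an assumption based on the five rubric fields `score1` to `score5`.

```python
# Feature names and dtypes as listed in the dataset card.
STRING_FIELDS = (
    "instruction", "criteria",
    "score1", "score2", "score3", "score4", "score5",
    "reference", "category", "response", "feedback", "source",
)
INT_FIELDS = ("score", "__index_level_0__")


def validate_record(record: dict) -> list[str]:
    """Return the problems found in one record; an empty list means it looks valid."""
    problems = []
    for name in STRING_FIELDS:
        if not isinstance(record.get(name), str):
            problems.append(f"{name}: expected str")
    for name in INT_FIELDS:
        value = record.get(name)
        # bool is a subclass of int in Python, so exclude it explicitly
        if not isinstance(value, int) or isinstance(value, bool):
            problems.append(f"{name}: expected int")
    # Assumption: score falls on the five-point rubric described by score1..score5.
    score = record.get("score")
    if isinstance(score, int) and not isinstance(score, bool) and not 1 <= score <= 5:
        problems.append("score: outside 1..5")
    return problems
```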

Dataset Splits

train: 99,655 examples, 667,967,023 bytes.

Dataset Size

Download size: 327,826,358 bytes
Dataset size: 667,967,023 bytes
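The size figures above imply a rough per-example footprint. A short sketch of the arithmetic, using only the numbers in this card; the interpretation that `download_size` is the size of the files fetched and `dataset_size` the size of the prepared dataset follows the usual Hugging Face dataset-card convention and is an assumption here.

```python
NUM_EXAMPLES = 99_655          # train split num_examples
DATASET_BYTES = 667_967_023    # dataset_size (equal to the train split's num_bytes)
DOWNLOAD_BYTES = 327_826_358   # download_size

avg_bytes = DATASET_BYTES / NUM_EXAMPLES   # average prepared size per example
ratio = DOWNLOAD_BYTES / DATASET_BYTES     # downloaded files relative to prepared data

print(f"average size per example: {avg_bytes:,.0f} bytes")  # average size per example: 6,703 bytes
print(f"download / dataset size: {ratio:.2f}")              # download / dataset size: 0.49
```

So each example averages about 6.7 KB, and the download is roughly half the size of the prepared dataset.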

Dataset Configuration

config_name: default
data_files:
- split: train
  path: data/train-*

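Dataset Contents

Translated Samples: 50,000 samples translated from Feedback-Collection, then passed through several filtering and deduplication steps.
Original Samples: 50,000 samples; seed instructions rooted in Korean culture are used to prompt GPT-4 to generate new instructions.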

Usage Recommendations

Using the K^2-Feedback dataset for direct training is not recommended, to avoid biasing model performance.
