HAERAE-HUB/K2-Feedback
Hugging Face | Updated: 2024-04-26 | Indexed: 2024-06-12
Download link:
https://hf-mirror.com/datasets/HAERAE-HUB/K2-Feedback
Resource description:
---
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: criteria
    dtype: string
  - name: score1
    dtype: string
  - name: score2
    dtype: string
  - name: score3
    dtype: string
  - name: score4
    dtype: string
  - name: score5
    dtype: string
  - name: reference
    dtype: string
  - name: category
    dtype: string
  - name: response
    dtype: string
  - name: feedback
    dtype: string
  - name: score
    dtype: int64
  - name: source
    dtype: string
  - name: __index_level_0__
    dtype: int64
  splits:
  - name: train
    num_bytes: 667967023
    num_examples: 99655
  download_size: 327826358
  dataset_size: 667967023
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
Research Paper coming soon!
# K^2-Feedback
K^2-Feedback is a dataset crafted to enhance fine-grained evaluation capabilities in Korean language models.
Building upon the [Feedback-Collection](https://huggingface.co/datasets/kaist-ai/Feedback-Collection), K^2-Feedback incorporates instructions specific to Korean culture and linguistics.
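The fields listed in the metadata (`instruction`, `criteria`, `score1` to `score5`, `reference`, `response`) suggest that each record can be rendered into a single grading prompt. The following is a minimal sketch under that assumption. The template wording, the 1-to-5 score range, and the names `RUBRIC_FIELDS`, `PROMPT_TEMPLATE`, and `build_eval_prompt` are hypothetical and are not taken from the authors' pipeline.

```python
from typing import Mapping

# Field names come from the dataset metadata; everything else is illustrative.
RUBRIC_FIELDS = (
    "instruction", "response", "reference", "criteria",
    "score1", "score2", "score3", "score4", "score5",
)

# Hypothetical template: the authors' actual prompt wording is not given.
PROMPT_TEMPLATE = (
    "Instruction:\n{instruction}\n\n"
    "Response to evaluate:\n{response}\n\n"
    "Reference answer:\n{reference}\n\n"
    "Criteria: {criteria}\n"
    "Score 1: {score1}\n"
    "Score 2: {score2}\n"
    "Score 3: {score3}\n"
    "Score 4: {score4}\n"
    "Score 5: {score5}\n\n"
    "Write feedback, then give an integer score from 1 to 5."
)


def build_eval_prompt(record: Mapping[str, str]) -> str:
    """Render one dataset record into a grading prompt."""
    missing = [name for name in RUBRIC_FIELDS if name not in record]
    if missing:
        raise KeyError(f"record is missing fields: {missing}")
    # Only the rubric fields are passed to the template; extra keys are ignored.
    return PROMPT_TEMPLATE.format(**{name: record[name] for name in RUBRIC_FIELDS})
```

In this layout the `feedback` and `score` columns would serve as the target output for the prompt, although that pairing is also an assumption.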
# Dataset Overview
**K^2-Feedback** includes roughly 100,000 samples (99,655 in the released `train` split) divided into two distinct subsets:
1. **Translated Samples (50,000 entries)**: This subset consists of samples directly translated from the Feedback-Collection using the [Seagull-13B translation model](https://huggingface.co/kuotient/Seagull-13b-translation).
After translation, these samples undergo heuristic filtering (length checks and language detection), model-based filtering (cosine similarity), and semantic deduplication.
To ensure cultural relevance and avoid Western bias, we use GPT-3.5-Turbo to remove instances related to Western culture.
2. **Original Samples (50,000 entries)**: The original samples are a collection of instructions deeply rooted in Korean culture. We first collect seed instructions that reflect Korean culture from HAERAE-Bench, CLIcK, and KMMLU. These seeds then serve as in-context exemplars that prompt GPT-4 to generate new instructions.
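The filtering steps named for the translated subset (length checks, language detection, cosine similarity, deduplication) can be sketched as follows. This is an illustration only: the thresholds, the Hangul-ratio stand-in for language detection, and all function names are assumptions. The embedding model and the exact way cosine similarity is applied are not specified in this card, so the sketch simply uses it for greedy near-duplicate removal over precomputed vectors.

```python
import math
from typing import List, Sequence


def hangul_ratio(text: str) -> float:
    """Fraction of alphabetic characters that are Hangul syllables (U+AC00..U+D7A3)."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    hangul = sum(1 for ch in letters if "\uac00" <= ch <= "\ud7a3")
    return hangul / len(letters)


def passes_heuristics(
    source: str,
    translation: str,
    min_len_ratio: float = 0.3,   # hypothetical threshold
    max_len_ratio: float = 3.0,   # hypothetical threshold
    min_hangul: float = 0.5,      # hypothetical threshold
) -> bool:
    """Length check plus a crude 'is this Korean?' check."""
    if not source or not translation:
        return False
    ratio = len(translation) / len(source)
    if not (min_len_ratio <= ratio <= max_len_ratio):
        return False
    return hangul_ratio(translation) >= min_hangul


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def deduplicate(embeddings: Sequence[Sequence[float]], threshold: float = 0.95) -> List[int]:
    """Greedy dedup: keep an item only if it is not too similar to any kept item."""
    kept: List[int] = []
    for i, emb in enumerate(embeddings):
        if all(cosine_similarity(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept
```

The greedy pass is quadratic in the number of kept items, so a real pipeline at this scale would likely rely on an approximate nearest-neighbor index instead.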
### Ethical Considerations and Usage Recommendations
Because K^2-Feedback is constructed from existing benchmarks, it raises potential contamination concerns. The primary aim of this dataset is not to train models to excel on these benchmarks but to support robust evaluation.
Consequently, including this dataset in training may skew performance on the benchmarks from which the seed questions are derived. We strongly advise against using K^2-Feedback for direct training purposes to prevent biases in model performance.
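One way to gauge the contamination risk described here is to look for benchmark questions that reappear verbatim among the instructions. The sketch below is a rough illustration, not a procedure from the authors: `normalize` and `find_overlaps` are hypothetical helpers, and exact matching after whitespace and case normalization only catches verbatim reuse, not paraphrases generated from the seeds.

```python
import re
from typing import Iterable, List


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivial differences do not hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_overlaps(instructions: Iterable[str], benchmark_questions: Iterable[str]) -> List[str]:
    """Return the instructions whose normalized text equals a benchmark question."""
    seen = {normalize(question) for question in benchmark_questions}
    return [ins for ins in instructions if normalize(ins) in seen]
```

An empty result does not rule out contamination, because generated instructions may still be close paraphrases of their seeds.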
### Point of Contact
For any questions, contact us at the following email address:
```
spthsrbwls123@yonsei.ac.kr
```
Provider:
HAERAE-HUB
Summary of original information
K^2-Feedback dataset overview
Dataset features
- `instruction`: string.
- `criteria`: string.
- `score1`: string.
- `score2`: string.
- `score3`: string.
- `score4`: string.
- `score5`: string.
- `reference`: string.
- `category`: string.
- `response`: string.
- `feedback`: string.
- `score`: integer.
- `source`: string.
- `__index_level_0__`: integer.
Dataset splits
- train: 99,655 examples, 667,967,023 bytes.
Dataset size
- Download size: 327,826,358 bytes
- Dataset size: 667,967,023 bytes
Dataset configuration
- config_name: default
- data_files:
  - split: train
  - path: data/train-*
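With a single `default` configuration and one `train` split, the dataset can presumably be loaded through the Hugging Face `datasets` library. The sketch below assumes that library is installed and that the repository ID `HAERAE-HUB/K2-Feedback` from the title is reachable. `load_train_split` and `score_distribution` are hypothetical helper names.

```python
from collections import Counter
from typing import Iterable, Mapping


def score_distribution(rows: Iterable[Mapping[str, object]]) -> Counter:
    """Count how many rows carry each integer value in the `score` column."""
    return Counter(int(row["score"]) for row in rows)


def load_train_split():
    """Download the train split; needs `pip install datasets` and network access."""
    # Imported lazily so score_distribution works without the library installed.
    from datasets import load_dataset

    return load_dataset("HAERAE-HUB/K2-Feedback", split="train")
```

Calling `score_distribution(load_train_split())` would then tally the `score` column across the split. The download is a few hundred megabytes according to the metadata above.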
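Dataset contents
- Translated Samples: 50,000 samples translated from Feedback-Collection, then filtered in multiple stages and deduplicated.
- Original Samples: 50,000 samples; new instructions generated by GPT-4 from seed instructions rooted in Korean culture.
Usage recommendations
- Using K^2-Feedback for direct training is not recommended, to avoid biasing model performance.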



