
werty1248/Magpie-Ko-Qwen2.5-Reasoning-Raw

Source: Hugging Face | Updated: 2024-11-19 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/werty1248/Magpie-Ko-Qwen2.5-Reasoning-Raw
Description:
This dataset is primarily intended for multi-step reasoning question generation in Korean. It consists of a single training split with 113,742 samples. Each sample has the features question, seed, temperature, top_p, answer, and min_neighbor_distance. Questions were generated with the Qwen/Qwen2.5-32B-Instruct model and answers with the Qwen/Qwen2.5-72B-Instruct model. Filtering removes question/answer pairs that contain Chinese characters or full English sentences and, based on the similarity distribution computed with an embedding model, removes samples whose min_neighbor_distance is less than 0.1. In addition, samples whose answer length is 4096 are removed to avoid repetition.
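The filtering criteria described above can be sketched roughly as follows. This is an illustrative approximation, not the author's actual pipeline: the helper names (`contains_chinese`, `is_all_english`, `keep_sample`), the Unicode ranges, and the "Latin letters but no Hangul" heuristic for English-only text are all assumptions, and the sample records are made up. The answer-length rule is left out because the description does not say in which unit the length of 4096 is measured.

```python
import re

# Assumption: "Chinese characters" means the basic CJK Unified Ideographs block.
HAN_RE = re.compile(r"[\u4e00-\u9fff]")
# Hangul syllables, Hangul Jamo, and Hangul Compatibility Jamo.
HANGUL_RE = re.compile(r"[\uac00-\ud7a3\u1100-\u11ff\u3130-\u318f]")
LATIN_RE = re.compile(r"[A-Za-z]")


def contains_chinese(text):
    """Return True if the text contains at least one Han character."""
    return bool(HAN_RE.search(text))


def is_all_english(text):
    """Hypothetical heuristic: Latin letters present and no Hangul at all."""
    return bool(LATIN_RE.search(text)) and not HANGUL_RE.search(text)


def keep_sample(sample, min_distance=0.1):
    """Apply the language filters and the min_neighbor_distance threshold."""
    texts = (sample["question"], sample["answer"])
    if any(contains_chinese(t) for t in texts):
        return False
    if any(is_all_english(t) for t in texts):
        return False
    # Samples with a distance below the threshold are dropped.
    return sample["min_neighbor_distance"] >= min_distance


# Made-up records that use the feature names listed in the description.
samples = [
    {"question": "철수는 사과 3개를 가지고 있습니다. 2개를 더 사면 몇 개입니까?",
     "answer": "3 + 2 = 5이므로 5개입니다.", "min_neighbor_distance": 0.35},
    {"question": "x + 1 = 4일 때 x의 값은?",
     "answer": "答案是 3", "min_neighbor_distance": 0.40},
    {"question": "x + 1 = 4일 때 x의 값은?",
     "answer": "The answer is 3.", "min_neighbor_distance": 0.40},
    {"question": "영희의 나이는 몇 살입니까?",
     "answer": "영희는 10살입니다.", "min_neighbor_distance": 0.05},
    {"question": "삼각형의 내각의 합은 얼마입니까?",
     "answer": "180도입니다.", "min_neighbor_distance": 0.1},
]

kept = [s for s in samples if keep_sample(s)]
```

With these records, only the first and last samples pass. The others are dropped for a Chinese answer, an English-only answer, and a min_neighbor_distance below 0.1.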
Provider:
werty1248