mimartin1234/uplimit-synthetic-data-week-1-filtered
Source: Hugging Face. Updated: 2025-04-04. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/mimartin1234/uplimit-synthetic-data-week-1-filtered
Description:
This dataset is part of the Uplimit course Synthetic Data Generation for Fine-Tuning. It is a cleaned and filtered subset of a real instruction-tuning dataset, intended for experiments with supervised fine-tuning (SFT). The goal was to produce a high-quality synthetic dataset by curating and filtering real-world instruction-response data with semantic deduplication and automated quality scoring. Each record contains the prompt text, the response text, and a predicted educational value score.
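The curation described above (semantic deduplication followed by quality filtering) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and not taken from the dataset card: the field names `prompt`, `embedding`, and `score`, the toy two-dimensional embeddings, the similarity threshold of 0.95, and the minimum score of 3.0. The actual pipeline used for this dataset may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedup_and_filter(rows, sim_threshold=0.95, min_score=3.0):
    """Keep rows that pass the score cutoff and are not near-duplicates
    of an already kept row. Thresholds are hypothetical defaults."""
    kept = []
    for row in rows:
        # Quality filter: drop rows below the assumed score cutoff.
        if row["score"] < min_score:
            continue
        # Semantic dedup: drop rows too similar to any kept row.
        if any(cosine(row["embedding"], k["embedding"]) >= sim_threshold
               for k in kept):
            continue
        kept.append(row)
    return kept


# Toy records with made-up embeddings and scores.
rows = [
    {"prompt": "Explain recursion.", "embedding": [1.0, 0.0], "score": 4.2},
    {"prompt": "What is recursion?", "embedding": [0.99, 0.05], "score": 3.9},
    {"prompt": "Say hi.", "embedding": [0.0, 1.0], "score": 1.1},
    {"prompt": "Describe a binary search.", "embedding": [0.6, 0.8], "score": 4.5},
]

filtered = dedup_and_filter(rows)
for row in filtered:
    print(row["prompt"])
```

In this toy run the second record is dropped as a near-duplicate of the first, and the third is dropped for its low score. In practice the embeddings would come from a sentence-embedding model and the pairwise comparison would be replaced by an approximate nearest-neighbour index, since the loop above is quadratic in the number of kept rows.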
Provider:
mimartin1234



