SlimOrca

ModelScope Community · Updated 2026-05-16 · Indexed 2024-06-08
Download link:
https://modelscope.cn/datasets/swift/SlimOrca
Resource description:
# Overview

This is a new curated subset of our OpenOrca data. This release provides an efficient way to reach performance on par with training on larger slices of our data, while including only ~500k GPT-4 completions.

The key change in this dataset is an additional filtering pass: we used GPT-4 to remove answers that appear wrong according to the human annotations in the FLAN dataset. This reduces the dataset to only ~500k entries and allows training to a quality level similar to our previous releases with 2/3 of the compute.

# Demo Models

* https://huggingface.co/openaccess-ai-collective/jackalope-7b
* https://huggingface.co/Open-Orca/Mistral-7B-SlimOrca

# Citation

```bibtex
@misc{SlimOrca,
  title     = {SlimOrca: An Open Dataset of GPT-4 Augmented FLAN Reasoning Traces, with Verification},
  author    = {Wing Lian and Guan Wang and Bleys Goodson and Eugene Pentland and Austin Cook and Chanvichet Vong and "Teknium"},
  year      = {2023},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/Open-Orca/SlimOrca}
}
```

```bibtex
@misc{mukherjee2023orca,
  title         = {Orca: Progressive Learning from Complex Explanation Traces of GPT-4},
  author        = {Subhabrata Mukherjee and Arindam Mitra and Ganesh Jawahar and Sahaj Agarwal and Hamid Palangi and Ahmed Awadallah},
  year          = {2023},
  eprint        = {2306.02707},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL}
}
```

```bibtex
@misc{longpre2023flan,
  title         = {The Flan Collection: Designing Data and Methods for Effective Instruction Tuning},
  author        = {Shayne Longpre and Le Hou and Tu Vu and Albert Webson and Hyung Won Chung and Yi Tay and Denny Zhou and Quoc V. Le and Barret Zoph and Jason Wei and Adam Roberts},
  year          = {2023},
  eprint        = {2301.13688},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI}
}
```
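The verification pass described in the Overview can be pictured as a filter over (question, reference answer, completion) triples. The following is a minimal sketch under stated assumptions, not the OpenOrca team's actual pipeline. The `Example` record, `verify_filter`, and `exact_match_judge` are hypothetical names. The judge is a crude normalized-substring check that stands in for the GPT-4 grader, which in the real pass compared GPT-4 completions against the FLAN human annotations.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Example:
    # Hypothetical record layout; not the published SlimOrca schema.
    question: str
    reference: str   # human-annotated FLAN answer
    completion: str  # GPT-4 completion to be verified


# A judge decides whether a completion agrees with the reference answer.
Judge = Callable[[str, str, str], bool]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so comparisons ignore formatting."""
    return " ".join(text.lower().split())


def exact_match_judge(question: str, reference: str, completion: str) -> bool:
    """Stand-in for the GPT-4 judge (assumption): accept the completion
    if the normalized reference answer appears inside it."""
    return normalize(reference) in normalize(completion)


def verify_filter(examples: Iterable[Example], judge: Judge) -> List[Example]:
    """Keep only the examples whose completion the judge accepts."""
    return [ex for ex in examples if judge(ex.question, ex.reference, ex.completion)]


# Usage: the second completion contradicts its reference and is dropped.
data = [
    Example("What is 2 + 3?", "5", "2 + 3 = 5, so the answer is 5."),
    Example("What is the capital of France?", "Paris", "The capital of France is Lyon."),
]
kept = verify_filter(data, exact_match_judge)
print(len(kept))  # 1
```

Because the judge is passed in as a parameter, replacing `exact_match_judge` with a function that prompts a grader model leaves `verify_filter` unchanged.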

Provider:
maas
Created:
2024-06-06