
argilla-warehouse/magpie-ultra-v0.2-test-refined

Source: Hugging Face. Updated: 2024-09-11. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/argilla-warehouse/magpie-ultra-v0.2-test-refined
Description:
---
size_categories: n<1K
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: response
    dtype: string
  - name: model_name
    dtype: string
  splits:
  - name: train
    num_bytes: 28266043
    num_examples: 10000
  download_size: 15311753
  dataset_size: 28266043
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
tags:
- synthetic
- distilabel
- rlaif
---

<p align="left">
  <a href="https://github.com/argilla-io/distilabel">
    <img src="https://raw.githubusercontent.com/argilla-io/distilabel/main/docs/assets/distilabel-badge-light.png" alt="Built with Distilabel" width="200" height="32"/>
  </a>
</p>

# Dataset Card for magpie-ultra-v0.2-test-refined

This dataset has been created with [distilabel](https://distilabel.argilla.io/).

## Dataset Summary

This dataset contains a `pipeline.yaml` file that can be used to reproduce the pipeline that generated it, using the `distilabel` CLI:

```console
distilabel pipeline run --config "https://huggingface.co/datasets/argilla-warehouse/magpie-ultra-v0.2-test-refined/raw/main/pipeline.yaml"
```

or to explore the configuration:

```console
distilabel pipeline info --config "https://huggingface.co/datasets/argilla-warehouse/magpie-ultra-v0.2-test-refined/raw/main/pipeline.yaml"
```

## Dataset structure

The examples have the following structure per configuration:

<details><summary> Configuration: default </summary><hr>

```json
{
    "instruction": "You are a 55-year-old male who has been trapped on a deserted island for nearly 20 years. Your hair is grey, your beard is long and wild, and your clothes are tattered and worn. You have built a life on this island, creating a small village of wooden huts, and finding food through fishing and farming the local wildlife. How do you react when you see a boat on the horizon?",
    "model_name": "meta-llama/Meta-Llama-3.1-405B-Instruct-FP8",
    "response": "(stunned, eyes fixed on the horizon) Oh... oh my... (pausing, taking a deep breath) It can\u0027t be. After all these years... (my voice cracks with emotion) I\u0027ve given up hope so many times, thought I\u0027d never see the day... (shaking my head, as if to clear the cobwebs) Wait, what if it\u0027s just a mirage? What if I\u0027m just seeing things? (squinting, straining my eyes to get a better look) No, no, I\u0027m sure of it. That\u0027s a boat, all right. (my eyes well up with tears) I\u0027ve got to make sure they see me! (rushing to gather materials) I need a fire, something to signal them! (starting to gather dry leaves, twigs, and other flammable materials) Oh, please, please, please... let them see me... (muttering to myself, as I work to start a fire) This could be my chance... my chance to finally go home... (pausing, looking up at the boat) What if they\u0027re not friendly? What if... (pushing the thought aside) No, no, I won\u0027t think about that now. I just need to get their attention... (focusing on starting the fire, my hands shaking with excitement and nervousness) Come on, come on... please see me..."
}
```

This subset can be loaded as:

```python
from datasets import load_dataset

ds = load_dataset("argilla-warehouse/magpie-ultra-v0.2-test-refined", "default")
```

Or simply as follows, since there is only one configuration and it is named `default`:

```python
from datasets import load_dataset

ds = load_dataset("argilla-warehouse/magpie-ultra-v0.2-test-refined")
```

</details>

## References

```
@misc{xu2024magpiealignmentdatasynthesis,
    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
    year={2024},
    eprint={2406.08464},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2406.08464},
}
```
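The card's metadata declares three string features per record: `instruction`, `response`, and `model_name`. As a minimal sketch of how a record could be checked against that declared structure, the snippet below uses a hypothetical helper, `validate_record`, which is not part of `distilabel` or `datasets`. The sample record is a shortened stand-in for a real row rather than data copied from the dataset.

```python
# Hypothetical helper for illustration; not part of distilabel or datasets.
# The expected fields mirror the `features` list in the card's metadata.
EXPECTED_FEATURES = {"instruction": str, "response": str, "model_name": str}


def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in one record (empty if it matches)."""
    problems = []
    # Every declared feature must be present and hold a string.
    for name, expected_type in EXPECTED_FEATURES.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"{name} should be {expected_type.__name__}")
    # Flag anything the metadata does not declare.
    for name in record:
        if name not in EXPECTED_FEATURES:
            problems.append(f"unexpected field: {name}")
    return problems


# Shortened stand-in record (assumption: real rows have the same three keys).
sample = {
    "instruction": "How do you react when you see a boat on the horizon?",
    "response": "(stunned, eyes fixed on the horizon) Oh... oh my...",
    "model_name": "meta-llama/Meta-Llama-3.1-405B-Instruct-FP8",
}

print(validate_record(sample))
```

After loading the dataset with `load_dataset`, the same function could presumably be applied to each row of the `train` split, for example `for row in ds["train"]: validate_record(row)`.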
Provider:
argilla-warehouse