five

ai2-adapt-dev/flan_v2_converted

收藏
Hugging Face2024-10-14 更新2025-04-12 收录
下载链接:
https://hf-mirror.com/datasets/ai2-adapt-dev/flan_v2_converted
下载链接
链接失效反馈
官方服务:
资源简介:
This is a converted version of the Flan dataset into Tulu SFT training format. The conversion script can be found in our [open-instruct](https://github.com/allenai/open-instruct/blob/main/scripts/data/sft/flan.py) repo. The conversion took the following parameters: - apply_keyword_filters: True - apply_empty_message_filters: True - push_to_hub: True - hf_entity: ai2-adapt-dev - converted_dataset_name: flan_v2_converted - local_save_dir: ./data/sft/flan The original FLAN dataset needs extensive efforts to be regenerated, so we are using [a reproduced version by the OpenOrca team](https://huggingface.co/datasets/Open-Orca/FLAN).More specifically, we only use their top level jsonl files, which is a subset of the original dataset.And by default, we only use the `cot_fsopt_data`, `cot_zsopt_data`, `niv2_fsopt_data`, `niv2_zsopt_data` `flan_fsopt_data`, `flan_zsopt_data`, `t0_fsopt_data` subsets.If you want to use more data, you can modify this script to load more data from their Huggingface repo.Please refer to their Huggingface repo [here](https://huggingface.co/datasets/Open-Orca/FLAN) and the [original FLAN v2 repo](https://github.com/google-research/FLAN/tree/main/flan/v2) for more information about this dataset and the license.
提供机构:
ai2-adapt-dev
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作