swype/instruct
Source: Hugging Face. Updated 2023-04-05; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/swype/instruct
Resource description:
---
license: mit
---
# A large instruct dataset
This dataset is a combination of multiple sources, including the GPT4All dataset, the Alpaca dataset from Stanford, custom generation using AllenAI augmentation, and some dataset augmentation from open-source Meta datasets. The dataset is split into 70% for training, 20% for validation, and 10% for testing.
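As an illustration of the 70/20/10 proportions described above, the following minimal sketch splits a list of records into training, validation, and test subsets. The function name `split_70_20_10`, the seed, and the toy records are assumptions made for this example; the source does not describe how the published split was actually produced.

```python
import random


def split_70_20_10(records, seed=0):
    """Shuffle a copy of `records` and split it 70% / 20% / 10%.

    Hypothetical helper for illustration; not the dataset author's procedure.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n = len(shuffled)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = n * 70 // 100
    n_val = n * 20 // 100
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder, about 10%
    return train, validation, test


# Toy records in the prompt/completion shape used by the dataset.
records = [
    {"prompt": f"Question {i}", "completion": f"Answer {i}"}
    for i in range(100)
]
train, validation, test = split_70_20_10(records)
print(len(train), len(validation), len(test))  # 70 20 10
```

Because the test subset takes whatever remains after the first two slices, every record lands in exactly one subset even when the total is not a multiple of ten.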
## Description
The Swype.com dataset contains prompt and completion pairs for various tasks. It's an augmented version of the following datasets:
- [GPT4All](https://github.com/nomic-ai/gpt4all): A dataset containing a wide range of tasks for training and evaluating general-purpose language models.
- [Alpaca dataset from Stanford](https://github.com/tatsu-lab/stanford_alpaca): A dataset containing prompts, completions, and annotations for controllable text generation.
- Custom generation using [AllenAI augmentation](https://allenai.org): Augmentation performed using the advanced NLP tools provided by AllenAI.
- Some dataset augmentation from open-source Meta datasets: Additional augmentation from various open-source Meta datasets.
The dataset is designed for training and evaluating language models on diverse tasks, with a focus on controllable and instruction-based text generation.
## Dataset Structure
The dataset contains the following columns:
- `prompt`: The input prompt string, representing a task or question.
- `completion`: The output completion string, representing the answer or generated text based on the prompt.
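To make the two-column structure concrete, here is a small sketch that parses records and checks that both `prompt` and `completion` are present as strings. It assumes, purely for illustration, that records are stored one JSON object per line; the source does not state the on-disk format, and `parse_record` is a hypothetical helper.

```python
import json

REQUIRED_COLUMNS = ("prompt", "completion")


def parse_record(line):
    """Parse one JSON line and return its prompt/completion pair.

    Raises ValueError if the line is not a JSON object or if either
    column is missing or not a string.
    """
    record = json.loads(line)
    if not isinstance(record, dict):
        raise ValueError("each line must be a JSON object")
    for column in REQUIRED_COLUMNS:
        if not isinstance(record.get(column), str):
            raise ValueError(f"column {column!r} must be a string")
    # Keep only the two documented columns.
    return {column: record[column] for column in REQUIRED_COLUMNS}


# Made-up sample lines, not taken from the dataset.
sample_lines = [
    '{"prompt": "Name a primary color.", "completion": "Red"}',
    '{"prompt": "What is 2 + 2?", "completion": "4"}',
]
pairs = [parse_record(line) for line in sample_lines]
for pair in pairs:
    print(pair["prompt"], "->", pair["completion"])
```

Rejecting malformed rows early keeps later training or evaluation code from failing on a missing column.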
## Citation
If you use this dataset in your research or work, please cite it as follows:
@misc{srikanth2023swypedataset,
author = {Srikanth Srinivas},
title = {Swype.com Dataset},
year = {2023},
publisher = {Swype.com},
howpublished = {\url{https://swype.com}},
email = {s@swype.com}
}
Provider:
swype
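Original information summary
Dataset overview
Dataset name
A large instruct dataset
Dataset sources
- GPT4All
- Alpaca dataset from Stanford
- Custom generation using AllenAI augmentation
- Some dataset augmentation from open-source Meta datasets
Dataset purpose
Training and evaluating language models, particularly for controllable and instruction-based text generation.
Dataset structure
Contains the following columns:
- prompt: The input prompt string, representing a task or question.
- completion: The output completion string, representing the answer or generated text based on the prompt.
Dataset splits
- Training set: 70%
- Validation set: 20%
- Test set: 10%
License
MIT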
Citation information
@misc{srikanth2023swypedataset,
author = {Srikanth Srinivas},
title = {Swype.com Dataset},
year = {2023},
publisher = {Swype.com},
howpublished = {\url{https://swype.com}},
email = {s@swype.com}
}



