swype/instruct

Name: swype/instruct
Creator: swype
Published: 2023-04-05 23:14:28
License: not listed in this field (the dataset card below declares MIT)

Source: Hugging Face. Updated 2023-04-05; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/swype/instruct

Resource description:

---
license: mit
---

# A large instruct dataset

This dataset is a combination of multiple sources, including the GPT4All dataset, the Alpaca dataset from Stanford, custom generation using AllenAI augmentation, and some dataset augmentation from open-source Meta datasets. The dataset is split into 70% for training, 20% for validation, and 10% for testing.

## Description

The Swype.com dataset contains prompt and completion pairs for various tasks. It's an augmented version of the following datasets:

- [GPT4All](https://github.com/nomic-ai/gpt4all): A dataset containing a wide range of tasks for training and evaluating general-purpose language models.
- [Alpaca dataset from Stanford](https://github.com/tatsu-lab/stanford_alpaca): A dataset containing prompts, completions, and annotations for controllable text generation.
- Custom generation using [AllenAI augmentation](https://allenai.org): Augmentation performed using the advanced NLP tools provided by AllenAI.
- Some dataset augmentation from open-source Meta datasets: Additional augmentation from various open-source Meta datasets.

The dataset is designed for training and evaluating language models on diverse tasks, with a focus on controllable and instruction-based text generation.

## Dataset Structure

The dataset contains the following columns:

- `prompt`: The input prompt string, representing a task or question.
- `completion`: The output completion string, representing the answer or generated text based on the prompt.

## Citation

If you use this dataset in your research or work, please cite it as follows:

    @misc{srikanth2023swypedataset,
      author = {Srikanth Srinivas},
      title = {Swype.com Dataset},
      year = {2023},
      publisher = {Swype.com},
      howpublished = {\url{https://swype.com}},
      email = {s@swype.com}
    }

Provider:

swype

Summary of original information

Dataset overview

Dataset name

A large instruct dataset

Dataset sources

GPT4All
Alpaca dataset from Stanford
Custom generation using AllenAI augmentation
Some dataset augmentation from open-source Meta datasets

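Intended use

Training and evaluating language models, particularly for controllable, instruction-based text generation.

Dataset structure

Contains the following columns:

prompt: The input prompt string, representing a task or question.
completion: The output completion string, representing the answer or the text generated from the prompt.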
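
The two-column layout above can be illustrated with a minimal Python sketch. It assumes each row arrives as a dictionary with string `prompt` and `completion` fields; the helper names `is_valid_record` and `to_training_text`, the blank-line separator, and the sample rows are all hypothetical and are not taken from the dataset itself.

```python
from typing import TypedDict


class InstructRecord(TypedDict):
    """One row of the dataset: the two columns named in the dataset card."""
    prompt: str
    completion: str


def is_valid_record(row: dict) -> bool:
    """Return True if the row has non-empty string `prompt` and `completion` fields."""
    return all(
        isinstance(row.get(key), str) and row[key].strip() != ""
        for key in ("prompt", "completion")
    )


def to_training_text(row: InstructRecord, separator: str = "\n\n") -> str:
    """Join prompt and completion into one string (the separator is an assumption)."""
    return f"{row['prompt'].strip()}{separator}{row['completion'].strip()}"


# Hypothetical rows for illustration only; they do not come from the dataset.
rows = [
    {"prompt": "Name a primary color.", "completion": "Red."},
    {"prompt": "Missing completion"},
]
valid_rows = [row for row in rows if is_valid_record(row)]
texts = [to_training_text(row) for row in valid_rows]
```

Filtering before formatting keeps malformed rows out of the training text. How prompts and completions are actually joined depends on the model being trained, so the separator is left as a parameter.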

Dataset splits

Training set: 70%
Validation set: 20%
Test set: 10%
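
As a rough sketch of how a 70/20/10 partition like the one above could be reproduced on a list of rows: the source does not say how the published splits were produced, so the seeded shuffle, the function name `split_70_20_10`, and the sample rows are assumptions made for illustration.

```python
import random


def split_70_20_10(rows, seed=0):
    """Shuffle rows deterministically and cut them into 70/20/10 partitions."""
    shuffled = list(rows)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle (an assumption, not from the source)
    n_train = len(shuffled) * 70 // 100    # integer arithmetic avoids float rounding
    n_val = len(shuffled) * 20 // 100
    return {
        "train": shuffled[:n_train],
        "validation": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # the remainder, about 10%
    }


# Hypothetical rows for illustration only.
rows = [{"prompt": f"question {i}", "completion": f"answer {i}"} for i in range(100)]
splits = split_70_20_10(rows)
```

Giving the remainder to the test partition means every row lands in exactly one split, even when the row count is not a multiple of ten.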

License

MIT

Citation

@misc{srikanth2023swypedataset,
  author = {Srikanth Srinivas},
  title = {Swype.com Dataset},
  year = {2023},
  publisher = {Swype.com},
  howpublished = {\url{https://swype.com}},
  email = {s@swype.com}
}
