
swype/instruct

Source: Hugging Face. Updated 2023-04-05; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/swype/instruct
Description:
---
license: mit
---

# A large instruct dataset

This dataset is a combination of multiple sources, including the GPT4All dataset, the Alpaca dataset from Stanford, custom generation using AllenAI augmentation, and some dataset augmentation from open-source Meta datasets. The dataset is split into 70% for training, 20% for validation, and 10% for testing.

## Description

The Swype.com dataset contains prompt and completion pairs for various tasks. It's an augmented version of the following datasets:

- [GPT4All](https://github.com/nomic-ai/gpt4all): A dataset containing a wide range of tasks for training and evaluating general-purpose language models.
- [Alpaca dataset from Stanford](https://github.com/tatsu-lab/stanford_alpaca): A dataset containing prompts, completions, and annotations for controllable text generation.
- Custom generation using [AllenAI augmentation](https://allenai.org): Augmentation performed using the advanced NLP tools provided by AllenAI.
- Some dataset augmentation from open-source Meta datasets: Additional augmentation from various open-source Meta datasets.

The dataset is designed for training and evaluating language models on diverse tasks, with a focus on controllable and instruction-based text generation.

## Dataset Structure

The dataset contains the following columns:

- `prompt`: The input prompt string, representing a task or question.
- `completion`: The output completion string, representing the answer or generated text based on the prompt.

## Citation

If you use this dataset in your research or work, please cite it as follows:

@misc{srikanth2023swypedataset,
  author = {Srikanth Srinivas},
  title = {Swype.com Dataset},
  year = {2023},
  publisher = {Swype.com},
  howpublished = {\url{https://swype.com}},
  email = {s@swype.com}
}
Provider:
swype
Summary of original information

Dataset overview

Dataset name

A large instruct dataset

Dataset sources

  • GPT4All
  • Alpaca dataset from Stanford
  • Custom generation using AllenAI augmentation
  • Some dataset augmentation from open-source Meta datasets

Intended use

Training and evaluating language models, particularly for controllable and instruction-based text generation.

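Dataset structure

Contains the following columns:

  • prompt: the input prompt string, representing a task or question.
  • completion: the output completion string, representing the answer or the text generated from the prompt.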
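The two-column schema above can be expressed as a small typed record. This is a minimal Python sketch. It assumes each row arrives as a mapping with string `prompt` and `completion` values. The names `InstructRecord` and `parse_record` are hypothetical and are not part of the dataset or any library.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class InstructRecord:
    """One row of the dataset (hypothetical wrapper): a task prompt and its completion."""
    prompt: str      # input prompt string: a task or question
    completion: str  # output string: the answer or text generated from the prompt


def parse_record(row: Mapping[str, Any]) -> InstructRecord:
    """Validate a raw row and wrap it in an InstructRecord.

    Extra columns are ignored. Raises KeyError if a required column is
    missing and TypeError if its value is not a string.
    """
    values = {}
    for column in ("prompt", "completion"):
        value = row[column]  # KeyError if the column is absent
        if not isinstance(value, str):
            raise TypeError(
                f"column {column!r} must be str, got {type(value).__name__}"
            )
        values[column] = value
    return InstructRecord(**values)


if __name__ == "__main__":
    # Hypothetical example row, not taken from the dataset.
    row = {"prompt": "Translate 'hello' into French.", "completion": "Bonjour"}
    record = parse_record(row)
    print(record.prompt, "->", record.completion)
```

Failing fast on a missing or non-string column catches schema drift before the data reaches a training loop.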

Dataset splits

  • Training: 70%
  • Validation: 20%
  • Test: 10%
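The card gives the split proportions but does not say how rows were assigned to each split. The sketch below shows one way to produce a 70/20/10 split with integer arithmetic. It assumes a single seeded random shuffle, which is an assumption and not the dataset author's documented method. `split_70_20_10` is a hypothetical helper.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_70_20_10(rows: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle rows and cut them into train/validation/test at roughly 70/20/10.

    Assumption: one seeded shuffle; the dataset card does not document
    how its own split was produced.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # deterministic for a given seed
    n = len(shuffled)
    n_train = n * 70 // 100  # floor of 70%
    n_val = n * 20 // 100    # floor of 20%
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder (about 10%), so no row is dropped
    return train, val, test


if __name__ == "__main__":
    # Hypothetical placeholder rows in the prompt/completion schema.
    rows = [{"prompt": f"task {i}", "completion": f"answer {i}"} for i in range(1000)]
    train, val, test = split_70_20_10(rows)
    print(len(train), len(val), len(test))  # 700 200 100
```

Floor division avoids floating-point rounding surprises. Giving the remainder to the test split guarantees that every row lands in exactly one split.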

License

MIT

Citation

@misc{srikanth2023swypedataset,
  author = {Srikanth Srinivas},
  title = {Swype.com Dataset},
  year = {2023},
  publisher = {Swype.com},
  howpublished = {\url{https://swype.com}},
  email = {s@swype.com}
}
