Nan-Do/SPP_30K_reasoning_tasks

Name: Nan-Do/SPP_30K_reasoning_tasks
Creator: Nan-Do
Published: 2023-08-22 07:09:57
License: No description available

Source: Hugging Face. Updated 2023-08-22; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/Nan-Do/SPP_30K_reasoning_tasks

Resource overview:

This dataset is an enhanced version of the Synthetic Python Problems (SPP) dataset, designed to improve how well large language models (LLMs) understand and reason about Python 3 code. It includes three task types: 1) given a code snippet, generate sample calls and their expected return values; 2) given a natural-language description and sample calls, write the corresponding function; 3) given a function and sample calls (without expected values), predict the values the function should return. Each record has four features: type, instruction, input, and output. All data is in English. The dataset ships a single train split; no separate test split is provided.
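Because only a train split is published, anyone who needs held-out data has to carve it out themselves. The sketch below is one possible approach, not something the dataset authors prescribe: it assigns each row to a split from a hash of its fields, so the assignment is reproducible across runs. The function name `assign_split` and the 10% default are assumptions made for illustration; the column names are the ones listed on this page.

```python
import hashlib


def assign_split(row: dict, test_fraction: float = 0.1) -> str:
    """Deterministically map a row to "train" or "test" (hypothetical helper)."""
    # Join the fields with a separator that is unlikely to occur in the text.
    key = "\x1f".join([str(row["type"]), row["instruction"], row["input"]])
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Compare integers rather than floats to avoid rounding at the boundaries.
    bucket = int.from_bytes(digest[:8], "big")
    threshold = int(test_fraction * 2**64)
    return "test" if bucket < threshold else "train"


# Invented row, used only to exercise the helper.
demo_row = {"type": 3, "instruction": "Predict the output.", "input": "add_two(1)"}
demo_split = assign_split(demo_row)
```

Note that the three task types may be built from the same underlying functions, so a row-level split like this one could leak a function across train and test. Grouping by function would need an identifier that this page does not list.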

Provider:

Nan-Do

Summary of original information

Dataset overview

Dataset information

Features

type: int64
instruction: string
input: string
output: string

Splits

train:
- Bytes: 44253001
- Examples: 89898

Download and dataset size

Download size: 10073876 bytes
Dataset size: 44253001 bytes
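A few derived figures help put these numbers in context. The snippet below only does arithmetic on the values listed above. The reading that each source problem contributes one row per task type is an inference from the even division by three and the "30K" in the dataset name, not something this page states.

```python
# Figures copied from the listing above.
DATASET_BYTES = 44_253_001
DOWNLOAD_BYTES = 10_073_876
NUM_EXAMPLES = 89_898
NUM_TASK_TYPES = 3

# Average uncompressed size of one record (about 492 bytes).
bytes_per_example = DATASET_BYTES / NUM_EXAMPLES

# Ratio of in-memory size to download size (about 4.4).
size_ratio = DATASET_BYTES / DOWNLOAD_BYTES

# Assumption: rows are spread evenly over the three task types.
examples_per_task, remainder = divmod(NUM_EXAMPLES, NUM_TASK_TYPES)

print(f"{bytes_per_example:.1f} bytes per example")
print(f"{size_ratio:.2f}x size ratio")
print(f"{examples_per_task} examples per task type, remainder {remainder}")
```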

Task categories

Text generation
Conversational
Text-to-text generation

Language

English

Dataset name

SPP python reasoning tasks

Dataset summary

This dataset is an enhanced version of the Synthetic Python Problems (SPP) Dataset. The original data was deduplicated and verified with the Python interpreter.
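The page does not describe how deduplication and verification were carried out. As a minimal sketch of what such a pass could look like, the code below drops exact duplicates (after trimming whitespace) and keeps only sources whose docstring examples pass when run through the standard `doctest` module. The helper names and the sample sources are invented for illustration, and `exec` on untrusted code should only ever run inside a sandbox.

```python
import doctest
import hashlib
import types


def passes_doctests(source: str) -> bool:
    """Run the doctest examples of every function defined in `source`."""
    namespace = {"__name__": "spp_candidate"}
    try:
        exec(compile(source, "<spp_candidate>", "exec"), namespace)
    except Exception:
        return False  # the source does not compile or fails at import time
    parser = doctest.DocTestParser()
    runner = doctest.DocTestRunner(verbose=False)
    attempted = 0
    for name, obj in list(namespace.items()):
        if isinstance(obj, types.FunctionType) and obj.__doc__:
            test = parser.get_doctest(obj.__doc__, namespace, name, None, 0)
            result = runner.run(test, out=lambda _text: None)  # silence reports
            if result.failed:
                return False
            attempted += result.attempted
    return attempted > 0  # reject sources that have no examples at all


def dedupe_and_verify(sources):
    """Keep the first copy of each source that passes its own examples."""
    seen, kept = set(), []
    for source in sources:
        key = hashlib.sha256(source.strip().encode("utf-8")).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        if passes_doctests(source):
            kept.append(source)
    return kept


# Invented samples: one correct function, one with a wrong expected value.
GOOD = 'def add_two(x):\n    """\n    >>> add_two(1)\n    3\n    """\n    return x + 2\n'
BAD = 'def add_two(x):\n    """\n    >>> add_two(1)\n    4\n    """\n    return x + 2\n'
kept_sources = dedupe_and_verify([GOOD, GOOD + "\n", BAD])
```

In this toy run the second entry is dropped as a duplicate and the third is dropped because its expected value does not match what the interpreter returns.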

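The dataset contains three different tasks:

Type 1: The input is code; the model must generate sample calls and their expected return values.
Type 2: The input is a description and sample calls; the model must write the function.
Type 3: The input is a function and sample calls (without expected values); the model must write what the function should return.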
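To make the three task types concrete, the sketch below builds one invented row per type using the four columns listed on this page and shows how rows might be selected by `type`. The instruction and input texts are placeholders written for this example, not quotes from the dataset, and the helper names are hypothetical.

```python
from collections import Counter

FUNCTION = "def add_two(x):\n    return x + 2"
CALLS = ">>> add_two(1)\n>>> add_two(-2)"
CALLS_WITH_VALUES = ">>> add_two(1)\n3\n>>> add_two(-2)\n0"

# Invented rows that mirror the column layout: type, instruction, input, output.
EXAMPLE_ROWS = [
    {"type": 1, "instruction": "Give sample calls with return values.",
     "input": FUNCTION, "output": CALLS_WITH_VALUES},
    {"type": 2, "instruction": "Write the described function.",
     "input": "Add two to a number.\n" + CALLS_WITH_VALUES, "output": FUNCTION},
    {"type": 3, "instruction": "Predict what each call returns.",
     "input": FUNCTION + "\n" + CALLS, "output": CALLS_WITH_VALUES},
]


def rows_of_type(rows, task_type: int):
    """Return the rows whose `type` column equals `task_type` (1 to 3)."""
    if task_type not in (1, 2, 3):
        raise ValueError("task_type must be 1, 2, or 3")
    return [row for row in rows if row["type"] == task_type]


def count_by_type(rows):
    """Count how many rows belong to each task type."""
    return dict(Counter(row["type"] for row in rows))


type_counts = count_by_type(EXAMPLE_ROWS)
```

With the Hugging Face `datasets` library, the real rows would typically be obtained with `load_dataset("Nan-Do/SPP_30K_reasoning_tasks", split="train")` and then filtered on the `type` column in the same way.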

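Dataset creation

Creation date

August 2023

Purpose

To improve the Python 3 reasoning and comprehension abilities of large language models (LLMs).

Source data

The source dataset is the Synthetic Python Problems (SPP) Dataset.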

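Annotations

The dataset contains the columns instruction, input, output, and type.
The type column indicates the task type (1 to 3).

Annotation process

The responses were generated by parsing the functions' docstrings.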
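The page gives no detail on the parsing itself. Assuming the docstrings follow the usual doctest convention (`>>> call` followed by the expected value), a response could be recovered roughly as follows, using only the standard `ast` and `doctest` modules. The function names, the sample source, and the output layout are assumptions for illustration.

```python
import ast
import doctest

SAMPLE = '''def add_two(x):
    """Return x plus two.

    >>> add_two(1)
    3
    >>> add_two(-2)
    0
    """
    return x + 2
'''


def docstring_examples(source: str):
    """Return (call, expected) pairs from the docstrings of all functions."""
    parser = doctest.DocTestParser()
    pairs = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            # get_docstring strips the common indentation for us.
            doc = ast.get_docstring(node) or ""
            for example in parser.get_examples(doc):
                pairs.append((example.source.strip(), example.want.strip()))
    return pairs


def format_response(pairs) -> str:
    """Render the pairs as calls followed by their values (hypothetical layout)."""
    return "\n".join(f">>> {call}\n{expected}" for call, expected in pairs)


pairs = docstring_examples(SAMPLE)
response = format_response(pairs)
```

Dropping the expected values from `pairs` would give the calls-only input of a type 3 task, while the full `response` corresponds to the kind of output a type 1 or type 3 task asks for.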
