syntaxsynth/instruct_code_cleaning

Name: syntaxsynth/instruct_code_cleaning
Creator: syntaxsynth
Published: 2024-03-25 04:24:53
License: No description available

Hugging Face, updated 2024-03-25, indexed 2024-06-11

Download link:

https://hf-mirror.com/datasets/syntaxsynth/instruct_code_cleaning

Resource description:

---
size_categories:
- 10K<n<100K
task_categories:
- text-generation
dataset_info:
  features:
  - name: source
    dtype: string
  - name: task
    dtype: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: train
    num_bytes: 333589265
    num_examples: 94562
  download_size: 104501596
  dataset_size: 333589265
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# SFT code dataset building

Contains a list of tasks useful when building an initial dataset source:

1. reverse_translation

   Given a history of conversations, what would the human ask next?

2. reverse_translation_first_round

   Suppose you already have a response; the LLM must predict what question the human asked.

3. clean_code

   Given a code snippet, determine whether it is useful and atomic enough to be used in a response by an LLM.

4. gen_code_question

   Generate a question given a code snippet.

Provided by:

syntaxsynth

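Summary of original information

Dataset overview

Basic information

Size range: 10K<n<100K
Task category: text-generation

Dataset features

Name: source
- Data type: string
Name: task
- Data type: string
Name: question
- Data type: string
Name: answer
- Data type: string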
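
A minimal sketch of how the four string features listed above might be checked on a single record. The `is_valid_record` helper and the sample record are hypothetical and only illustrate the schema; they are not part of the dataset or its tooling.

```python
# Column names as listed in the dataset features; every column is a string.
EXPECTED_FIELDS = ("source", "task", "question", "answer")


def is_valid_record(record: dict) -> bool:
    """Return True if the record has exactly the four expected string fields."""
    if set(record) != set(EXPECTED_FIELDS):
        return False
    return all(isinstance(record[name], str) for name in EXPECTED_FIELDS)


# Hypothetical record, invented for illustration only.
sample = {
    "source": "example-source",
    "task": "clean_code",
    "question": "Is this snippet usable?",
    "answer": "Yes",
}
print(is_valid_record(sample))
```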

Data splits

Split name: train
- Bytes: 333589265
- Examples: 94562

Download and dataset size

Download size: 104501596 bytes
Dataset size: 333589265 bytes
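
To put the figures above in perspective, the short calculation below derives the average size of one example and the ratio of download size to dataset size. It assumes both sizes are byte counts, as is conventional for Hugging Face dataset metadata; under that assumption an example averages roughly 3.5 KB and the download is about 31% of the dataset size.

```python
NUM_BYTES = 333_589_265      # dataset size of the train split, in bytes (assumed unit)
NUM_EXAMPLES = 94_562        # number of examples in the train split
DOWNLOAD_SIZE = 104_501_596  # download size, in bytes (assumed unit)

avg_bytes_per_example = NUM_BYTES / NUM_EXAMPLES
download_ratio = DOWNLOAD_SIZE / NUM_BYTES

print(f"average bytes per example: {avg_bytes_per_example:.1f}")
print(f"download size / dataset size: {download_ratio:.3f}")
```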

Configuration

Config name: default
- Data files:
  - Split: train
    - Path: data/train-*
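
Given the single `default` configuration and `train` split above, loading the data might look like the sketch below. It assumes the third-party `datasets` package is installed and that the Hugging Face Hub is reachable; `load_train_split` and `count_tasks` are hypothetical helper names, and the demo rows are invented placeholders whose `task` values are taken from the task names in the description, which may not match the stored values exactly.

```python
from collections import Counter

DATASET_ID = "syntaxsynth/instruct_code_cleaning"


def load_train_split():
    """Load the train split (needs the `datasets` package and network access)."""
    # Imported lazily so count_tasks() stays usable without the package.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split="train")


def count_tasks(rows) -> Counter:
    """Count how many rows carry each value of the `task` column."""
    return Counter(row["task"] for row in rows)


# Hypothetical rows, invented for illustration; not real dataset content.
demo_rows = [
    {"source": "s1", "task": "clean_code", "question": "q1", "answer": "a1"},
    {"source": "s2", "task": "gen_code_question", "question": "q2", "answer": "a2"},
    {"source": "s3", "task": "clean_code", "question": "q3", "answer": "a3"},
]
print(count_tasks(demo_rows))
```

With the real data, `count_tasks(load_train_split())` would give the per-task breakdown of the 94562 training examples.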
