DevShubham/python-text-training-instruct-ai

Name: DevShubham/python-text-training-instruct-ai
Creator: DevShubham
Published: 2024-05-13 17:45:02
License: No description available

Hugging Face · Updated 2024-05-13 · Indexed 2024-06-12

Download link:

https://hf-mirror.com/datasets/DevShubham/python-text-training-instruct-ai

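Overview:

This dataset is the matlok Python Copilot dataset updated on 2024-02-03. It is intended for building multimodal coding models that understand how to use open-source GitHub projects, in particular projects associated with the Agora Open Source AI Research Lab. It contains instructions and responses about Python code, formatted as Alpaca and YAML. The dataset is split into several configurations, each with a training set and a test set. It has 1182526 rows, is 2.1 GB in size, uses an instruct data format, and covers 1258 Python repositories.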

Provided by:

DevShubham

Summary of original information

Dataset overview

Basic information

License: other
Dataset name: 2024-02-03 - python copilot instructions on how to code using alpaca and yaml
Dataset size: 2.1 GB
Data type: instruct
Format: Introduction on code usage using alpaca and yaml response
Rows: 1182526
Tags: python-copilot, python-coding, python-architecture, knowledge-graphs, multimodal, text-image-audio, fine-tuning, training, question-answering, image-knowledge-graph, alpaca, mp3, png, text, instruct, coding, task, prompt, response, yaml
Task categories: text-generation, question-answering
Task ID: parsing

Dataset configurations

Configuration names: andromeda, swarms, swarms_pytorch, longnet, zeta
Data files: each configuration includes training and test data files
Splits: each configuration is divided into a training set and a test set

Dataset usage

Loading the dataset: use the load_dataset function to load the training and test sets of each configuration
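A minimal sketch of the loading step described above, assuming the Hugging Face `datasets` library is installed and that the repository id matches the name shown on this page. The helper `load_config` is hypothetical, and this listing does not say whether extra arguments (for example a data directory) are required, so treat the call as a starting point rather than the publisher's documented usage.

```python
from typing import Any

# Repository id and configuration names as listed on this page.
REPO_ID = "DevShubham/python-text-training-instruct-ai"
CONFIG_NAMES = ["andromeda", "swarms", "swarms_pytorch", "longnet", "zeta"]
SPLITS = ("train", "test")


def load_config(config: str, split: str = "train") -> Any:
    """Load one split of one configuration (hypothetical helper)."""
    if config not in CONFIG_NAMES:
        raise ValueError(f"unknown configuration: {config!r}")
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    # Imported lazily so the constants above can be used without the library.
    from datasets import load_dataset

    # Assumption: each configuration exposes "train" and "test" splits.
    return load_dataset(REPO_ID, config, split=split)
```

Calling `load_config("zeta", "test")` would download data from the hub, so it needs network access; the validation of the configuration and split names happens before any download starts.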

Dataset schema

Columns: active, args, args_len, audio_file, audio_path, class_bases, class_name, code, code_len, desc, desc_docstr, desc_docstr_len, desc_len, docstr, docstr_len, file_path, file_type, function_names, gen_bytes, gen_data_type, gen_mode, gen_size, gen_valid, height, image_file, image_path, method_names, name, num_all_bases, num_bases, num_classes, num_functions, num_imports, num_methods, prompts, raises, raises_len, recsize, repo, returns, returns_len, size, src_object, total_objects, usage, usages, width
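The column list above can serve as a quick sanity check after loading. The sketch below only compares names; column types are not given in this listing, and the helper `missing_columns` is a hypothetical name introduced for illustration.

```python
from typing import Iterable, List

# Column names copied from the schema listing above.
EXPECTED_COLUMNS: List[str] = [
    "active", "args", "args_len", "audio_file", "audio_path",
    "class_bases", "class_name", "code", "code_len", "desc",
    "desc_docstr", "desc_docstr_len", "desc_len", "docstr", "docstr_len",
    "file_path", "file_type", "function_names", "gen_bytes",
    "gen_data_type", "gen_mode", "gen_size", "gen_valid", "height",
    "image_file", "image_path", "method_names", "name", "num_all_bases",
    "num_bases", "num_classes", "num_functions", "num_imports",
    "num_methods", "prompts", "raises", "raises_len", "recsize", "repo",
    "returns", "returns_len", "size", "src_object", "total_objects",
    "usage", "usages", "width",
]


def missing_columns(actual: Iterable[str]) -> List[str]:
    """Return the expected column names that are absent from `actual`."""
    present = set(actual)
    return [name for name in EXPECTED_COLUMNS if name not in present]
```

With a loaded `datasets.Dataset` object `ds`, `missing_columns(ds.column_names)` would return an empty list if every listed column is present.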

Detailed dataset configuration

Configuration name	Data file paths
andromeda	train/train-0001-andromeda-andromeda_torch.parquet, test/train-0002-andromeda-tests.parquet
swarms	train/train-0004-swarms-swarms.parquet, test/train-0005-swarms-tests.parquet
swarms_pytorch	train/train-0006-swarms-pytorch-swarms_torch.parquet, test/train-0007-swarms-pytorch-tests.parquet
longnet	train/train-0009-longnet-long_net.parquet, test/train-0010-longnet-tests.parquet
zeta	train/train-0011-zeta-zeta.parquet, test/train-0012-zeta-tests.parquet
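The table above can be expressed as a mapping from configuration name to split files, which is the shape the `data_files` argument of `load_dataset` accepts. This is a sketch under the assumption that the paths are relative to the repository root, which this page does not confirm; `data_files_for` and `load_from_files` are hypothetical helpers.

```python
from typing import Any, Dict

REPO_ID = "DevShubham/python-text-training-instruct-ai"

# Paths copied from the table above; note that the test files also
# carry a "train-" prefix in their file names.
DATA_FILES: Dict[str, Dict[str, str]] = {
    "andromeda": {
        "train": "train/train-0001-andromeda-andromeda_torch.parquet",
        "test": "test/train-0002-andromeda-tests.parquet",
    },
    "swarms": {
        "train": "train/train-0004-swarms-swarms.parquet",
        "test": "test/train-0005-swarms-tests.parquet",
    },
    "swarms_pytorch": {
        "train": "train/train-0006-swarms-pytorch-swarms_torch.parquet",
        "test": "test/train-0007-swarms-pytorch-tests.parquet",
    },
    "longnet": {
        "train": "train/train-0009-longnet-long_net.parquet",
        "test": "test/train-0010-longnet-tests.parquet",
    },
    "zeta": {
        "train": "train/train-0011-zeta-zeta.parquet",
        "test": "test/train-0012-zeta-tests.parquet",
    },
}


def data_files_for(config: str) -> Dict[str, str]:
    """Return a copy of the split-to-path mapping for one configuration."""
    return dict(DATA_FILES[config])


def load_from_files(config: str) -> Any:
    """Load both splits by explicit file path (needs network access)."""
    from datasets import load_dataset  # lazy import, optional dependency

    # Assumption: the listed paths resolve from the repository root.
    return load_dataset(REPO_ID, data_files=data_files_for(config))
```

Passing explicit files is mainly useful when only one configuration is needed and the repository's own configuration metadata is unavailable or differs from this listing.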
