
matlok/python-text-copilot-training-instruct-ai-research-2024-02-03

Hugging Face, updated 2024-02-04, indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/matlok/python-text-copilot-training-instruct-ai-research-2024-02-03
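Overview:
This dataset, titled "2024-02-03 - python copilot instructions on how to code using alpaca and yaml", is intended for building multimodal coding models that understand how to use open-source GitHub projects, in particular the projects of the Agora open-source AI research lab. It contains Python code, covering class methods or global functions, imported modules, base classes, exceptions, return values, and arguments. The dataset is 2.1 GB in size and holds 1182526 rows drawn from 1258 Python repositories. Its format is code usage instructions written as alpaca prompts with yaml responses.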

Provider:
matlok
Summary of original information

Dataset overview

Dataset name

2024-02-03 - python copilot instructions on how to code using alpaca and yaml

License

Other

Dataset configurations

  • andromeda
    • Splits:
      • train
      • test
    • Data files:
      • train: train/train-0001-andromeda-andromeda_torch.parquet
      • test: test/train-0002-andromeda-tests.parquet
  • swarms
    • Splits:
      • train
      • test
    • Data files:
      • train: train/train-0004-swarms-swarms.parquet
      • test: test/train-0005-swarms-tests.parquet
  • swarms_pytorch
    • Splits:
      • train
      • test
    • Data files:
      • train: train/train-0006-swarms-pytorch-swarms_torch.parquet
      • test: test/train-0007-swarms-pytorch-tests.parquet
  • longnet
    • Splits:
      • train
      • test
    • Data files:
      • train: train/train-0009-longnet-long_net.parquet
      • test: test/train-0010-longnet-tests.parquet
  • zeta
    • Splits:
      • train
      • test
    • Data files:
      • train: train/train-0011-zeta-zeta.parquet
      • test: test/train-0012-zeta-tests.parquet
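The configuration list above pairs each config name with one train file and one test file. A minimal sketch of that mapping as plain Python follows. The names `DATA_FILES` and `data_files_for` are hypothetical, introduced here only for illustration; they are not part of the dataset or of the `datasets` library, and the paths are assumed to be relative to the repository root.

```python
# Relative Parquet paths copied from the configuration list on this page.
# DATA_FILES and data_files_for are hypothetical helper names.
DATA_FILES = {
    "andromeda": {
        "train": "train/train-0001-andromeda-andromeda_torch.parquet",
        "test": "test/train-0002-andromeda-tests.parquet",
    },
    "swarms": {
        "train": "train/train-0004-swarms-swarms.parquet",
        "test": "test/train-0005-swarms-tests.parquet",
    },
    "swarms_pytorch": {
        "train": "train/train-0006-swarms-pytorch-swarms_torch.parquet",
        "test": "test/train-0007-swarms-pytorch-tests.parquet",
    },
    "longnet": {
        "train": "train/train-0009-longnet-long_net.parquet",
        "test": "test/train-0010-longnet-tests.parquet",
    },
    "zeta": {
        "train": "train/train-0011-zeta-zeta.parquet",
        "test": "test/train-0012-zeta-tests.parquet",
    },
}


def data_files_for(config: str) -> dict[str, str]:
    """Return a copy of the split -> relative path mapping for one config."""
    try:
        return dict(DATA_FILES[config])
    except KeyError:
        known = ", ".join(sorted(DATA_FILES))
        raise ValueError(f"unknown config {config!r}; expected one of: {known}") from None


if __name__ == "__main__":
    for name in DATA_FILES:
        print(name, data_files_for(name))
```

If the Parquet files have been downloaded locally, a dict of this shape (with the paths adjusted to the local directory) is what `load_dataset("parquet", data_files=...)` accepts.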

Dataset size

1M < n < 10M

Tags

  • python-copilot
  • python-coding
  • python-architecture
  • knowledge-graphs
  • multimodal
  • text-image-audio
  • fine-tuning
  • training
  • question-answering
  • image-knowledge-graph
  • alpaca
  • mp3
  • png
  • text
  • instruct
  • coding
  • task
  • prompt
  • response
  • yaml

Supported task categories

  • text-generation
  • question-answering

Supported task IDs

  • parsing

Dataset details

  • Rows: 1182526
  • Size: 2.1 GB
  • Data type: instruct
  • Format: code usage instructions as alpaca prompts with yaml responses
  • Number of Python repositories: 1258

Dataset loading examples

  • Load the Andromeda train/test splits:

        from datasets import load_dataset

        ds = load_dataset("matlok/python-text-copilot-training-instruct-ai-research-2024-02-03", "andromeda", verification_mode="no_checks")

  • Load the Swarms train/test splits:

        from datasets import load_dataset

        ds = load_dataset("matlok/python-text-copilot-training-instruct-ai-research-2024-02-03", "swarms", verification_mode="no_checks")

  • Load the Swarms Pytorch train/test splits:

        from datasets import load_dataset

        ds = load_dataset("matlok/python-text-copilot-training-instruct-ai-research-2024-02-03", "swarms_pytorch", verification_mode="no_checks")

  • Load the LongNet train/test splits:

        from datasets import load_dataset

        ds = load_dataset("matlok/python-text-copilot-training-instruct-ai-research-2024-02-03", "longnet", verification_mode="no_checks")

  • Load the Zeta train/test splits:

        from datasets import load_dataset

        ds = load_dataset("matlok/python-text-copilot-training-instruct-ai-research-2024-02-03", "zeta", verification_mode="no_checks")
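The five loading examples above differ only in the config name, so they can be folded into one loop. The following is a sketch under stated assumptions, not the dataset author's code: `load_all_configs` and its `loader` parameter are hypothetical names introduced here, and actually calling the function without arguments requires the `datasets` package and network access.

```python
from typing import Any, Callable, Optional

REPO_ID = "matlok/python-text-copilot-training-instruct-ai-research-2024-02-03"
CONFIGS = ("andromeda", "swarms", "swarms_pytorch", "longnet", "zeta")


def load_all_configs(loader: Optional[Callable[..., Any]] = None) -> dict[str, Any]:
    """Load every config listed on this page, keyed by config name.

    `loader` is a hypothetical injection point (handy for dry runs);
    when omitted, `datasets.load_dataset` is used.
    """
    if loader is None:
        # Imported lazily so this module can be imported without `datasets`.
        from datasets import load_dataset as loader
    return {
        name: loader(REPO_ID, name, verification_mode="no_checks")
        for name in CONFIGS
    }


# Usage (downloads data from the Hugging Face Hub):
#   all_ds = load_all_configs()
#   print(all_ds["zeta"])
```

Passing a stand-in `loader` makes it possible to check which calls would be issued without downloading anything.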

Dataset schema

  • The desc column contains the alpaca instruction text and the yaml response:

        {
          "active": "bool",
          "args": "string",
          "args_len": "float64",
          "audio_file": "string",
          "audio_path": "string",
          "class_bases": "string",
          "class_name": "string",
          "code": "string",
          "code_len": "float64",
          "desc": "string",
          "desc_docstr": "string",
          "desc_docstr_len": "float64",
          "desc_len": "int64",
          "docstr": "string",
          "docstr_len": "int64",
          "file_path": "string",
          "file_type": "string",
          "function_names": "string",
          "gen_bytes": "int64",
          "gen_data_type": "string",
          "gen_mode": "string",
          "gen_size": "int64",
          "gen_valid": "bool",
          "height": "int64",
          "image_file": "string",
          "image_path": "string",
          "method_names": "string",
          "name": "string",
          "num_all_bases": "int64",
          "num_bases": "int64",
          "num_classes": "int64",
          "num_functions": "float64",
          "num_imports": "int64",
          "num_methods": "float64",
          "prompts": "string",
          "raises": "string",
          "raises_len": "float64",
          "recsize": "int64",
          "repo": "string",
          "returns": "string",
          "returns_len": "float64",
          "size": "int64",
          "src_object": "string",
          "total_objects": "int64",
          "usage": "string",
          "usages": "string",
          "width": "int64"
        }
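The schema above can be used to sanity-check rows after loading. The sketch below checks a row dict against a subset of the listed column types using only the standard library. `SCHEMA`, `matches`, and `schema_errors` are hypothetical names, the sample row is made up for illustration, and treating null values as valid is an assumption, since the page does not say whether columns are nullable.

```python
# Subset of the column types listed in the schema; extend as needed.
SCHEMA = {
    "active": "bool",
    "code": "string",
    "code_len": "float64",
    "desc": "string",
    "desc_len": "int64",
    "file_path": "string",
    "repo": "string",
}


def matches(value, type_name: str) -> bool:
    """Check one Python value against one of the schema's type names."""
    if value is None:
        return True  # assumption: any column may be null
    if type_name == "bool":
        return isinstance(value, bool)
    if type_name == "int64":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if type_name == "float64":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if type_name == "string":
        return isinstance(value, str)
    raise ValueError(f"unsupported type name: {type_name!r}")


def schema_errors(row: dict, schema: dict = SCHEMA) -> list[str]:
    """Return a list of human-readable problems; empty means the row fits."""
    errors = []
    for column, type_name in schema.items():
        if column not in row:
            errors.append(f"{column}: missing")
        elif not matches(row[column], type_name):
            got = type(row[column]).__name__
            errors.append(f"{column}: expected {type_name}, got {got}")
    return errors


if __name__ == "__main__":
    # Made-up row for illustration only; not taken from the dataset.
    sample = {
        "active": True,
        "code": "def f():\n    return 1\n",
        "code_len": 23.0,
        "desc": "example description",
        "desc_len": 19,
        "file_path": "example/module.py",
        "repo": "example-repo",
    }
    print(schema_errors(sample))
```

With the `datasets` library, the same check could be applied to individual rows of a loaded split, each of which is returned as a dict.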