five

MugiLab/reason_at_code

收藏
Hugging Face2025-01-21 更新2025-11-01 收录
下载链接:
https://hf-mirror.com/datasets/MugiLab/reason_at_code
下载链接
链接失效反馈
官方服务:
资源简介:
--- dataset_info: features: - name: system dtype: string - name: conversations list: - name: from dtype: string - name: value dtype: string splits: - name: train num_bytes: 103183504 num_examples: 4746 download_size: 39331254 dataset_size: 103183504 configs: - config_name: default data_files: - split: train path: data/train-* license: mit task_categories: - text-generation language: - en tags: - code size_categories: - 1K<n<10K --- # Reasoning Dataset for Code This repository contains a curated reasoning dataset specifically designed for coding-related problems, particularly in Python. The dataset was created by filtering non-code problems from the original NovaSky-AI/Sky-T1_data_17k dataset. The goal of this dataset is to facilitate fine-tuning models for reasoning tasks related to code understanding, problem-solving, and logical deduction in programming. ## Overview The dataset emphasizes high-quality, diverse problems requiring reasoning in coding contexts, with a particular focus on Python. It can be used to fine-tune and evaluate models for: - Code comprehension - Logical reasoning in coding scenarios - Problem-solving strategies - Debugging and code correction tasks ## Key Features - **Source**: Filtered from NovaSky-AI/Sky-T1_data_17k. - **Language**: Python-focused coding problems. - **Scope**: A broad range of problem-solving contexts, including algorithmic challenges, data structure manipulations, and logical puzzles. ## Dataset Structure The dataset is organized into the following format: - **`system`**: system prompt for potential model. - **`conversations`**: a list of conversations in alpine format. ## Contribution We welcome contributions to enhance this dataset, including: - Adding new coding problems - Improving existing solutions - Expanding to other programming languages Feel free to open an issue or submit a pull request! ## License This dataset is licensed under [MIT License](LICENSE). Please ensure proper attribution when using this dataset in your projects. ## Acknowledgments - Original dataset: [NovaSky-AI/Sky-T1_data_17k](https://huggingface.co/datasets/NovaSky-AI/Sky-T1_data_17k) ## Contact For questions or feedback open an issue in the repository.
提供机构:
MugiLab
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作