
KExercises

ModelScope community. Updated: 2025-12-05. Indexed: 2025-05-03.
Download link:
https://modelscope.cn/datasets/JetBrains/KExercises
Resource description:
# KExercises dataset

This dataset consists of a diverse set of 15K Kotlin code exercises (~3.5M tokens in total) taken from the [Code Exercises dataset](https://huggingface.co/datasets/jinaai/code_exercises) (generated by GPT-3.5) and translated to Kotlin, also by GPT-3.5.

The exercises follow the format of the HumanEval benchmark. Each training sample is split into a Kotlin function signature with a descriptive docstring, and a solution to the exercise.

For more details about the dataset, please see the technical report (coming soon!).

# Statistics

| **Attribute** | **Value** |
|:---------------------------:|:----------------------------------------:|
| `Language` | Kotlin, English |
| `Number of samples` | 15K |
| `Number of lines` | 335K |
| `Number of tokens` | 3.5M |

# Disclaimer

This dataset has been synthetically generated. You should make sure that your usage complies with the OpenAI Terms of Use, insofar as legally applicable. Because the dataset is synthetic, it may contain inaccuracies or errors that could affect the validity of research results or other uses. The dataset may be updated or modified periodically. It is the responsibility of the user to ensure they are working with the latest version of the dataset, where applicable.
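The sample layout described above (a Kotlin signature with docstring, plus a separate solution) can be sketched as follows. This is a minimal illustration, not the dataset's actual schema: the field names `problem` and `solution`, the helper `to_training_text`, and the example exercise are all assumptions made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Exercise:
    """One training sample; the field names here are hypothetical."""
    problem: str   # Kotlin docstring followed by the function signature
    solution: str  # function body that completes the signature


def to_training_text(sample: Exercise) -> str:
    """Join the prompt part and the solution part into one Kotlin source string."""
    return sample.problem.rstrip() + "\n" + sample.solution.rstrip() + "\n"


# Hand-written illustration, not an actual record from the dataset.
example = Exercise(
    problem=(
        "/**\n"
        " * Returns the sum of all even numbers in [numbers].\n"
        " */\n"
        "fun sumOfEvens(numbers: List<Int>): Int {"
    ),
    solution="    return numbers.filter { it % 2 == 0 }.sum()\n}",
)

text = to_training_text(example)
print(text)
```

Keeping the two parts separate mirrors the HumanEval-style setup, where the signature and docstring act as the prompt and the solution as the expected completion; joining them is only one possible way to build fine-tuning text.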

Provider: maas
Created: 2025-04-30