RoCode
Source: arXiv. Updated 2024-02-21. Indexed 2024-07-23.
Download link:
https://huggingface.co/datasets/cosmadrian/rocode
Overview:
RoCode is a dataset built to evaluate the code intelligence of language models on problems stated in Romanian. It contains 2,642 programming problems written in Romanian and 11,000 solutions in C, C++, and Python. It was created in collaboration with infoarena.ro, described as the most popular programming competition platform in Romania, to address the lack of code generation benchmarks for non-English languages. RoCode serves both as a benchmark for language models trained on Romanian or multilingual text and as a fine-tuning set for pre-trained Romanian models. Intended applications include improving the code generation capabilities of non-English language models and supporting the development of multilingual programming environments.
Provided by:
Politehnica University of Bucharest
Created:
2024-02-21
Summary of Original Information
RoCode: A Dataset for Measuring Code Intelligence from Problem Definitions in Romanian
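Dataset Overview
RoCode is a dataset of programming problems for evaluating and fine-tuning large language models. It contains 2,642 problems written in Romanian, 11,000 solutions written in C, C++, and Python, and comprehensive test suites. The dataset is intended to evaluate the code intelligence of language models trained on Romanian or multilingual text, and to provide a fine-tuning set for pre-trained Romanian models.
Dataset Details
Dataset Description
- Curated by: Adrian Cosma, Bogdan Iordache, Paolo Rosso
- Languages: Romanian, C++, Python
Dataset Sources
- Repository: https://github.com/cosmaadrian/rocode
- Paper: https://arxiv.org/abs/2402.13222
Uses
Fine-tuning and evaluating large language models on solving programming problems written in Romanian.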
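As an illustration of the evaluation use case, the sketch below checks a candidate Python solution against a list of input/output test cases. It is not the RoCode authors' evaluation harness. The function name `passes_tests`, the `(stdin, expected_output)` test format, and the use of standard input and output are all assumptions made for this example; the actual layout of the RoCode test suites may differ (for instance, problems may read from and write to files).

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def passes_tests(solution_source, tests, timeout_s=5.0):
    """Return True if the Python solution prints the expected output for every test.

    `tests` is a list of (stdin_text, expected_stdout) pairs. This format is
    hypothetical and chosen only for the example.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "solution.py"
        script.write_text(solution_source, encoding="utf-8")
        for stdin_text, expected in tests:
            try:
                result = subprocess.run(
                    [sys.executable, str(script)],
                    input=stdin_text,
                    capture_output=True,
                    text=True,
                    timeout=timeout_s,
                )
            except subprocess.TimeoutExpired:
                return False  # treat a timeout as a failed test
            if result.returncode != 0:
                return False  # runtime error in the candidate solution
            # Compare token by token so trailing whitespace is ignored.
            if result.stdout.split() != expected.split():
                return False
    return True


if __name__ == "__main__":
    candidate = "a, b = map(int, input().split())\nprint(a + b)\n"
    print(passes_tests(candidate, [("1 2\n", "3\n")]))
```

Running untrusted generated code this way offers no isolation; a real evaluation would normally execute solutions inside a sandbox with resource limits.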
Citation
@misc{cosma2024rocode,
  title={RoCode: A Dataset for Measuring Code Intelligence from Problem Definitions in Romanian},
  author={Adrian Cosma and Bogdan Iordache and Paolo Rosso},
  year={2024},
  eprint={2402.13222},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
Dataset Card Contact
For any inquiries, please contact Adrian Cosma (cosma.i.adrian@gmail.com).



