COST
Source: arXiv (indexed 2025-09-30)
Download link:
https://huggingface.co/datasets/shi-labs/cost
Description:
COST is a large, comprehensive dataset for evaluating program translation methods. It contains parallel data at both the code-snippet level and the program level across 7 programming languages. Contributors follow a template for comments and the corresponding code, which provides a large number of multilingual instances that can be used directly in translation tasks. The 7 languages give tasks for up to 42 language pairs (consistent with 7 × 6 ordered pairs, counting each translation direction separately), with a focus on multilingual program translation.
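The figure of 42 language pairs matches what one would get by treating each translation direction between two distinct languages as its own task. The sketch below checks that arithmetic; it assumes ordered pairs, and the language names are hypothetical placeholders, since the description gives only the count and not the languages themselves.

```python
from itertools import permutations

# Placeholder names (assumption): the description states there are 7
# languages but does not list them.
languages = [f"lang_{i}" for i in range(1, 8)]

# Each task is an ordered (source, target) pair with source != target.
pairs = list(permutations(languages, 2))

print(len(pairs))  # 42
```

If translation direction were ignored, the same 7 languages would give only 21 unordered pairs, so the stated 42 suggests that each direction is counted as a separate task.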
Provider:
GeeksForGeeks



