archit11/hyperswitch-token-aware-cpt
Source: Hugging Face. Updated 2025-11-07; indexed 2025-11-15.
Download link:
https://hf-mirror.com/datasets/archit11/hyperswitch-token-aware-cpt
Description:
The Hyperswitch Token-Aware CPT dataset contains 1,076 Rust code samples from the Hyperswitch payment router project, optimized for Continued Pre-Training (CPT) with the Kwaipilot/KAT-Dev tokenizer. Samples come in four types: single files, modules, combined files, and entire small crates. Each sample carries a unique identifier, its sample type, its level of code organization, the full code content, and metadata.
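The sample structure described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the field names (`id`, `type`, `granularity`, `content`, `metadata`), the type labels, and the demo records are hypothetical and are not taken from the dataset's actual schema, which should be checked on the dataset page.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any

# ASSUMPTION: these four labels mirror the sample types named in the
# description; the real dataset may spell them differently.
SAMPLE_TYPES = {"file", "module", "combined_files", "crate"}

# ASSUMPTION: hypothetical field names for the five pieces of information
# the description lists for each sample.
REQUIRED_FIELDS = {"id", "type", "granularity", "content"}


@dataclass
class Sample:
    id: str                # unique identifier
    type: str              # sample type (file, module, ...)
    granularity: str       # level of code organization
    content: str           # full Rust source text
    metadata: dict[str, Any] = field(default_factory=dict)


def parse_sample(record: dict[str, Any]) -> Sample:
    """Validate one raw record and convert it to a Sample."""
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"record is missing fields: {sorted(missing)}")
    if record["type"] not in SAMPLE_TYPES:
        raise ValueError(f"unknown sample type: {record['type']!r}")
    return Sample(
        id=record["id"],
        type=record["type"],
        granularity=record["granularity"],
        content=record["content"],
        metadata=dict(record.get("metadata") or {}),
    )


def count_by_type(samples: list[Sample]) -> Counter:
    """Count how many samples fall under each sample type."""
    return Counter(s.type for s in samples)


# Made-up records for illustration only; they are not real dataset rows.
DEMO_RECORDS = [
    {"id": "demo-1", "type": "file", "granularity": "file",
     "content": "pub fn a() {}", "metadata": {"path": "src/a.rs"}},
    {"id": "demo-2", "type": "file", "granularity": "file",
     "content": "pub fn b() {}"},
    {"id": "demo-3", "type": "crate", "granularity": "crate",
     "content": "// lib.rs\npub mod a;"},
]

if __name__ == "__main__":
    demo = [parse_sample(r) for r in DEMO_RECORDS]
    print(count_by_type(demo))
```

To work with the real data, the records would presumably be fetched first, for example with the Hugging Face `datasets` library (`load_dataset("archit11/hyperswitch-token-aware-cpt")`), and the field names above adjusted to whatever columns that call actually returns.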
Provider:
archit11



