archit11/hyperswitch-token-aware-cpt-fixed
Source: Hugging Face. Updated: 2025-11-07. Indexed: 2025-11-15.
Download link:
https://hf-mirror.com/datasets/archit11/hyperswitch-token-aware-cpt-fixed
Description:
The Hyperswitch Token-Aware CPT dataset contains 1,076 Rust code samples from the Hyperswitch payment router project, prepared for continued pre-training (CPT) with the Kwaipilot/KAT-Dev tokenizer. Samples vary in granularity, from single large files to entire small crates, and each one includes its path and metadata.
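A minimal sketch of how the dataset and its matching tokenizer might be loaded, assuming the Hugging Face `datasets` and `transformers` packages are installed. The split name `train` and the helper names `load_samples` and `load_tokenizer` are assumptions for illustration, not something this listing confirms. Column names are not documented here, so inspect them before relying on any field.

```python
# Identifiers taken from the listing above.
DATASET_ID = "archit11/hyperswitch-token-aware-cpt-fixed"
TOKENIZER_ID = "Kwaipilot/KAT-Dev"


def load_samples(split: str = "train"):
    """Download the dataset from the Hugging Face Hub.

    The default split name "train" is an assumption; check the dataset
    card if loading fails.
    """
    # Imported lazily so this module can be imported without the package.
    from datasets import load_dataset  # pip install datasets

    return load_dataset(DATASET_ID, split=split)


def load_tokenizer():
    """Load the tokenizer the samples were sized for."""
    from transformers import AutoTokenizer  # pip install transformers

    return AutoTokenizer.from_pretrained(TOKENIZER_ID)
```

Calling `load_samples()` and printing the result shows the real column names and row count, which is the safest first step before writing any code that depends on specific fields.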
Provided by:
archit11



