
cass

ModelScope community · updated 2025-12-05 · indexed 2025-05-17
Download link:
https://modelscope.cn/datasets/MBZUAI/cass
Description:
# 💻 CASS: CUDA–AMD Assembly and Source Mapping

[CASS](https://huggingface.co/datasets/MBZUAI/CASS) is the **first large-scale dataset** for cross-architecture GPU transpilation, providing semantically aligned CUDA–HIP source pairs and their corresponding host/device assemblies for **NVIDIA (SASS)** and **AMD (RDNA3)** platforms.

It enables research in:

* 🔁 Source-to-source translation (CUDA ↔ HIP)
* ⚙️ Assembly-level translation (SASS ↔ RDNA3)
* 🧠 LLM-guided GPU code transpilation

---

## 📚 Dataset Structure

Each sample contains the following fields:

| Field         | Description                                |
| ------------- | ------------------------------------------ |
| `filename`    | Sample ID or file name                     |
| `cuda_source` | Original CUDA source code                  |
| `cuda_host`   | Compiled x86 host-side assembly from CUDA  |
| `cuda_device` | Compiled SASS (Nvidia GPU) device assembly |
| `hip_source`  | Transpiled HIP source code (via HIPIFY)    |
| `hip_host`    | Compiled x86 host-side assembly from HIP   |
| `hip_device`  | Compiled RDNA3 (AMD GPU) device assembly   |

---

## 🔀 Dataset Splits

| Split   | Description                               | # Examples |
| ------- | ----------------------------------------- | ---------- |
| `train` | Union of `synth`, `stack`, and `opencl`   | 70,694     |
| `synth` | LLM-synthesized CUDA programs             | 40,591     |
| `stack` | Scraped and filtered CUDA from StackV2    | 24,170     |
| `bench` | 40 curated eval tasks from 16 GPU domains | 40         |

---

## 📦 How to Load

```python
from datasets import load_dataset

# 🧠 Load the full dataset (default config with all splits)
cass = load_dataset("MBZUAI/cass", name="default")

# Access a specific split
train_data = cass["train"]  # train = stack + synth + opencl
stack_data = cass["stack"]
synth_data = cass["synth"]
bench_data = cass["bench"]
```

---

## 📈 Benchmark and Evaluation

The `bench` split includes 40 samples across 16 domains like:

* 🧪 Physics Simulation
* 📊 Data Structures
* 📸 Image Processing
* 🧮 Linear Algebra

All samples have been manually verified for semantic equivalence across CUDA and HIP and come with executable device/host binaries.

---

## 📄 License

Released under the **MIT license**.

---

## 🔗 Useful Links

* 🤗 Hugging Face Collection: [CASS on Hugging Face](https://huggingface.co/collections/MBZUAI/cass-6825b5bf7414503cf16f87b2)
* 📂 Code & Tools: [GitHub Repository](https://github.com/GustavoStahl/CASS)
* Paper: [Arxiv CASS](https://arxiv.org/abs/2505.16968)
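
The field table in the dataset card suggests how a record could be turned into a translation pair for either research direction (source-level CUDA → HIP, or assembly-level SASS → RDNA3). The following is a minimal sketch under stated assumptions: the helper `make_pair` and the record `example` are hypothetical and not part of the dataset or its tooling; only the field names are taken from the card.

```python
from typing import Dict, Tuple

# Field names as listed in the dataset card's field table.
PAIR_FIELDS = {
    "source": ("cuda_source", "hip_source"),   # CUDA -> HIP source code
    "device": ("cuda_device", "hip_device"),   # SASS -> RDNA3 device assembly
}


def make_pair(sample: Dict[str, str], level: str = "source") -> Tuple[str, str]:
    """Return (NVIDIA-side text, AMD-side text) for one record.

    `make_pair` is a hypothetical helper for illustration only.
    """
    if level not in PAIR_FIELDS:
        raise ValueError(f"level must be one of {sorted(PAIR_FIELDS)}, got {level!r}")
    src_key, tgt_key = PAIR_FIELDS[level]
    return sample[src_key], sample[tgt_key]


# Hypothetical record with placeholder strings (not real dataset content).
example = {
    "filename": "vector_add.cu",
    "cuda_source": "cudaMalloc(&d_a, n);",
    "cuda_host": "<x86 host assembly from CUDA>",
    "cuda_device": "<SASS device assembly>",
    "hip_source": "hipMalloc(&d_a, n);",
    "hip_host": "<x86 host assembly from HIP>",
    "hip_device": "<RDNA3 device assembly>",
}

source_pair = make_pair(example, "source")
device_pair = make_pair(example, "device")
```

With a split loaded through `load_dataset`, the same helper could presumably be applied to each row, since rows are exposed as dictionaries keyed by field name.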

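The splits table describes `train` as the union of `synth`, `stack`, and `opencl`, but gives no count for `opencl`. As a rough sanity check, its size can be inferred from the listed numbers. This sketch assumes the three subsets are disjoint and that `train` contains nothing else; the card does not state either point, so the result is an estimate rather than a documented figure.

```python
# Example counts copied from the dataset card's splits table.
SPLIT_SIZES = {"train": 70_694, "synth": 40_591, "stack": 24_170, "bench": 40}


def implied_opencl_size(sizes: dict) -> int:
    """Size of the `opencl` subset implied by the table.

    Assumes train = synth + stack + opencl with no overlap (an assumption,
    not something the card confirms).
    """
    return sizes["train"] - sizes["synth"] - sizes["stack"]


opencl_estimate = implied_opencl_size(SPLIT_SIZES)
print(opencl_estimate)  # 5933
```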
Provider: maas
Created: 2025-05-16