cass
Source: ModelScope community. Updated 2025-12-05; indexed 2025-05-17.
Download link: https://modelscope.cn/datasets/MBZUAI/cass
# 💻 CASS: CUDA–AMD Assembly and Source Mapping
[CASS](https://huggingface.co/datasets/MBZUAI/CASS) is the **first large-scale dataset** for cross-architecture GPU transpilation, providing semantically aligned CUDA–HIP source pairs and their corresponding host/device assemblies for **NVIDIA (SASS)** and **AMD (RDNA3)** platforms. It enables research in:
* 🔁 Source-to-source translation (CUDA ↔ HIP)
* ⚙️ Assembly-level translation (SASS ↔ RDNA3)
* 🧠 LLM-guided GPU code transpilation
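
The translation directions listed above can be expressed as (input field, target field) pairs over a single record. The following is a minimal sketch, not the authors' pipeline: the task names in `TASKS`, the helper `make_pair`, and the `toy` record are hypothetical and exist only for illustration; the field names are the ones documented in this card.

```python
from typing import Dict, Tuple

# Hypothetical task labels mapped to (input field, target field).
# The field names are taken from the dataset card.
TASKS: Dict[str, Tuple[str, str]] = {
    "cuda_to_hip": ("cuda_source", "hip_source"),
    "hip_to_cuda": ("hip_source", "cuda_source"),
    "sass_to_rdna3": ("cuda_device", "hip_device"),
    "rdna3_to_sass": ("hip_device", "cuda_device"),
}


def make_pair(sample: dict, task: str) -> Tuple[str, str]:
    """Return (input_text, target_text) for one translation direction."""
    src_field, dst_field = TASKS[task]
    return sample[src_field], sample[dst_field]


# Placeholder record (not real dataset content) to show the call shape.
toy = {
    "filename": "toy.cu",
    "cuda_source": "<CUDA source>",
    "cuda_device": "<SASS assembly>",
    "hip_source": "<HIP source>",
    "hip_device": "<RDNA3 assembly>",
}

source_text, target_text = make_pair(toy, "cuda_to_hip")
print(source_text, "->", target_text)
```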
---
## 📚 Dataset Structure
Each sample contains the following fields:
| Field | Description |
| ------------- | ------------------------------------------ |
| `filename` | Sample ID or file name |
| `cuda_source` | Original CUDA source code |
| `cuda_host` | Compiled x86 host-side assembly from CUDA |
| `cuda_device` | Compiled SASS (NVIDIA GPU) device assembly |
| `hip_source` | Transpiled HIP source code (via HIPIFY) |
| `hip_host` | Compiled x86 host-side assembly from HIP |
| `hip_device` | Compiled RDNA3 (AMD GPU) device assembly |
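
Before using a record, it can help to confirm that all seven documented fields are present and non-empty. The sketch below assumes each record behaves like a plain `dict` of strings; `missing_fields` and the `example` record are hypothetical names introduced here, not part of the dataset tooling.

```python
# Field names as documented in the table above.
FIELDS = [
    "filename",
    "cuda_source",
    "cuda_host",
    "cuda_device",
    "hip_source",
    "hip_host",
    "hip_device",
]


def missing_fields(sample: dict) -> list:
    """Return the documented fields that are absent or empty in `sample`."""
    return [name for name in FIELDS if not sample.get(name)]


# Placeholder record (not real dataset content) with one empty field.
example = {name: "<placeholder>" for name in FIELDS}
example["hip_device"] = ""

print(missing_fields(example))
```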
---
## 🔀 Dataset Splits
| Split | Description | # Examples |
| ------- | ----------------------------------------- | ---------- |
| `train` | Union of `synth`, `stack`, and `opencl` | 70,694 |
| `synth` | LLM-synthesized CUDA programs | 40,591 |
| `stack` | Scraped and filtered CUDA from StackV2 | 24,170 |
| `bench` | 40 curated eval tasks from 16 GPU domains | 40 |
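
The table gives no row for the `opencl` subset that `train` is said to include. If `train` is exactly the disjoint union of `synth`, `stack`, and `opencl` (an assumption; the card does not state the `opencl` size), the remainder can be derived from the listed counts:

```python
# Example counts as listed in the table above.
split_sizes = {"train": 70_694, "synth": 40_591, "stack": 24_170, "bench": 40}

# Assumption: train = synth + stack + opencl, with no overlap between subsets.
implied_opencl = split_sizes["train"] - split_sizes["synth"] - split_sizes["stack"]

print(implied_opencl)  # 5933
```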
---
## 📦 How to Load
```python
from datasets import load_dataset
# 🧠 Load the full dataset (default config with all splits)
cass = load_dataset("MBZUAI/cass", name="default")
# Access a specific split
train_data = cass["train"] # train = stack + synth + opencl
stack_data = cass["stack"]
synth_data = cass["synth"]
bench_data = cass["bench"]
```
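
Once a split is loaded, a record's sources can be written to disk so they can be compiled or inspected with the usual toolchains. This is a minimal sketch: `dump_sources` is a hypothetical helper, and the `.cu` and `.hip` file extensions are an assumption about how one might name the output files, not something the card specifies.

```python
from pathlib import Path


def dump_sources(sample: dict, out_dir: str) -> tuple:
    """Write the CUDA and HIP sources of one record into `out_dir`.

    Returns the two paths that were written (CUDA first, HIP second).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Use the base name of `filename` without its extension; fall back to
    # a generic name if the field is empty.
    stem = Path(sample["filename"]).stem or "sample"

    cuda_path = out / f"{stem}.cu"   # assumed extension for CUDA
    hip_path = out / f"{stem}.hip"   # assumed extension for HIP
    cuda_path.write_text(sample["cuda_source"], encoding="utf-8")
    hip_path.write_text(sample["hip_source"], encoding="utf-8")
    return cuda_path, hip_path


# Usage with a loaded split (requires the download shown above):
# cuda_file, hip_file = dump_sources(bench_data[0], "cass_bench_sources")
```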
---
## 📈 Benchmark and Evaluation
The `bench` split contains 40 samples spanning 16 domains, including:
* 🧪 Physics Simulation
* 📊 Data Structures
* 📸 Image Processing
* 🧮 Linear Algebra
All samples have been manually verified for semantic equivalence across CUDA and HIP and come with executable device/host binaries.
---
## 📄 License
Released under the **MIT license**.
---
## 🔗 Useful Links
* 🤗 Hugging Face Collection: [CASS on Hugging Face](https://huggingface.co/collections/MBZUAI/cass-6825b5bf7414503cf16f87b2)
* 📂 Code & Tools: [GitHub Repository](https://github.com/GustavoStahl/CASS)
* 📄 Paper: [CASS on arXiv](https://arxiv.org/abs/2505.16968)
Provider: maas
Created: 2025-05-16



