
backendbench_tests

ModelScope · Updated 2025-12-05 · Indexed 2025-09-06
Download link:
https://modelscope.cn/datasets/GPUMODE/backendbench_tests
Description:
# TorchBench

The TorchBench suite of [BackendBench](https://github.com/meta-pytorch/BackendBench) is designed to mimic real-world use cases. It provides operators and inputs derived from 155 model traces found in [TIMM](https://huggingface.co/timm) (67), [Hugging Face Transformers](https://huggingface.co/docs/transformers/en/index) (45), and [TorchBench](https://github.com/pytorch/benchmark) (43). (These are also the models PyTorch developers use to [validate performance](https://hud.pytorch.org/benchmark/compilers).) You can view the origin of these traces by switching the subset in the dataset viewer to `ops_traces_models`, or to `torchbench` for the full dataset.

When running BackendBench, much of the extra information about what you are testing is abstracted away, so you can simply run `uv run python --suite torchbench ...`. Here, however, we provide the test suite as a dataset that can be explored directly. It includes details about why certain operations and arguments were included or excluded, reflecting the careful consideration behind curating the set.

You can download the dataset in either format:

- `backend_bench_problems.parquet` (default format on Hugging Face)
- `backend_bench_problems.json` (more human-readable)

### Fields

- **uuid** – Unique identifier for the `(op_name, args)` pair.
- **op_name** – Full name of the operator being tested.
- **args** – Serialized form of the inputs from the trace. [See details below](#serialized-arguments-in-backendbench).
- **runnable** – Whether the operator is runnable in BackendBench (some are not yet supported).
- **included_in_benchmark** – Whether this `(op_name, args)` pair is tested in the TorchBench suite.
- **why_excluded** – If not included, a list of reasons for exclusion (e.g., "BackendBench does not support correctness testing for random ops yet," "BackendBench does not support correctness testing for tensor creation and manipulation ops yet").
- **is_synthetic** – Marks synthetically generated inputs (e.g., very large tensors). These are currently excluded from the benchmark.
- **runtime_ms** – Execution time (ms) on our hardware (a single GPU from a machine with 8× H100s and an AMD EPYC 9654 96-core processor).
- **relative_runtime_to_kernel_launch** – `runtime_ms` divided by the runtime of a dummy CUDA op (`torch.empty(0, device="cuda")`), which stands in for kernel launch overhead.
- **is_overhead_dominated_op** – Flags operator/argument pairs that run close to CUDA overhead as "performance canaries." [Histogram analysis](https://github.com/meta-pytorch/BackendBench/issues/108) showed that a 1.3× threshold above CUDA overhead is a useful cutoff. These tests can be run for sanity-checking kernels with `uv run python --suite torchbench --check-overhead-dominated-ops ...`.
- **count** – Number of times this operator/input pair appeared in model traces.
- **in_models** – List of models (from real-world traces) where this operator/input pair appears.
- **in_models_count** – Number of distinct models in which this operator/input pair occurs.

# Serialized Arguments in BackendBench

Generally, arguments are serialized by storing each tensor's shape and dtype and preserving everything else as-is, so the format is fairly intuitive. For example:

`((T([8, 8, 8, 8, 8], f16), T([8, 8, 8, 8, 8], f16)), {})`

Below we go into detail about the format for rigor.
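As a quick illustration of how such a string could be turned back into Python objects, here is a minimal sketch. It is not BackendBench's own deserializer: `TensorSpec`, `parse_serialized_args`, and the `_TorchNames` stand-in are hypothetical names introduced for this example, and it records only tensor metadata rather than allocating real tensors, so it runs without PyTorch installed. It relies on `eval`, so it should only be used on trusted input.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TensorSpec:
    """Hypothetical container for the metadata in a T(...) expression."""
    shape: tuple
    dtype: str
    stride: Optional[tuple] = None


def T(shape, dtype, stride=None):
    # Mirrors the T([shape], dtype) and T([shape], dtype, [stride]) forms.
    return TensorSpec(tuple(shape), dtype,
                      tuple(stride) if stride is not None else None)


class _TorchNames:
    """Stand-in so that names like torch.float16 evaluate to plain strings."""
    def __getattr__(self, name):
        return f"torch.{name}"


# Dtype abbreviations used by the format, mapped to themselves as strings.
_DTYPES = {name: name for name in
           ("f16", "f32", "f64", "bf16", "i32", "i64", "b8")}


def parse_serialized_args(text):
    """Evaluate a '((args...), {kwargs})' string into (args, kwargs)."""
    namespace = {"__builtins__": {}, "T": T,
                 "torch": _TorchNames(), **_DTYPES}
    args, kwargs = eval(text, namespace)  # trusted input only
    return args, kwargs


args, kwargs = parse_serialized_args(
    "((T([8, 8, 8, 8, 8], f16), T([8, 8, 8, 8, 8], f16)), {})"
)
print(args[0].shape, args[0].dtype, kwargs)
```

To materialize real inputs, each `TensorSpec` could be swapped for an actual tensor of the recorded shape and dtype; that step is left out so the sketch stays dependency-free.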
## Format

BackendBench stores function arguments as strings containing all parameters needed to reproduce PyTorch operations:

```python
((arg1, arg2, ...), {'key1': val1, 'key2': val2})
```

```python
(([T([5, 5], f32), T([3, 3], i64), 42],), {'weight': T([3, 3], f32)})
```

## Tensor Representation

Tensors use the format `T([shape], dtype)` or `T([shape], dtype, [stride])`:

```python
T([10, 20], f32)       # 10×20 float32 tensor
T([1, 512, 768], f16)  # 1×512×768 float16 tensor
T([64], i32)           # 64-element int32 vector
```

**Data types**: `f16/f32/f64` (float), `bf16` (bfloat16), `i32/i64` (int), `b8` (bool)

## Examples

**Single tensor argument:**

```python
((T([48, 24, 28, 28], f16),), {})
```

A 48×24×28×28 float16 tensor, no keyword arguments.

**Multiple tensors:**

```python
((T([8, 8, 8, 8, 8], f16), T([8, 8, 8, 8, 8], f16)), {})
```

Two 5D tensors of identical shape.

**Mixed arguments:**

```python
((T([128, 256], f16), [1024, 249, 249]), {'dtype': torch.float16, 'device': 'cuda'})
```

The positional arguments are a tensor and a list; `dtype` and `device` are keyword arguments.

**Complex nested:**

```python
(([T([5, 5], f32), T([3, 3], i64), 42],), {'weight': T([3, 3], f32)})
```

A list containing tensors and a number, plus a tensor keyword argument.

## Argument Types

- **Tensors**: `T([shape], dtype)`
- **Lists**: `[item1, item2, ...]` (can contain tensors)
- **Primitives**: `42`, `'hello'`, `True`, `None`
- **PyTorch objects**: `torch.float16`, `torch.strided`

# Trace Files in BackendBench

This repository includes `.txt` trace files, which were the original output format of model traces and are used to compose the dataset. Here is their structure:

## Format

Trace files capture PyTorch operations and arguments from real model executions:

```
Operator: operation_name
cnt: count, serialized_arguments
cnt: count, serialized_arguments
...
```

## Structure

**Operator line**: Specifies the PyTorch operation

```
Operator: aten.add.Tensor
Operator: aten.relu.default
Operator: aten.linear.default
```

**Count lines**: Show how often each argument combination was used

```
cnt: 42, ((T([10, 20], f16), T([10, 20], f16)), {})
cnt: 0, ((T([5, 5], f32), T([5, 5], f32)), {})
```

## Reading Count Lines

- **Count `42`**: The argument combination appeared 42 times in traced models
- **`cnt: 0`**: Synthetic/generated arguments (not from real models)
- **`cnt: >0`**: Real usage frequency from model traces

**Arguments**: Same format as serialized arguments – `((args), {kwargs})`

## Example

```
Operator: aten.add.Tensor
cnt: 156, ((T([1, 512, 768], f16), T([1, 512, 768], f16)), {})
cnt: 89, ((T([32, 128], f32), T([32, 128], f32)), {})
cnt: 0, ((T([10, 10], f16), T([10, 10], f16)), {})
Operator: aten.relu.default
cnt: 234, ((T([64, 256], f16),), {})
```

This shows:

- `aten.add.Tensor` called 156 times with 1×512×768 tensors
- The same operation called 89 times with 32×128 tensors
- One synthetic test case (`cnt: 0`)
- `aten.relu.default` called 234 times with a 64×256 tensor

**Note: Traces may be deprecated in the future, but are described here as they are currently included in the dataset/codebase.**

# Acknowledgements

We are extremely grateful to the [TritonBench](https://github.com/pytorch-labs/tritonbench/tree/main) team for these traces and their intuitive format.
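Returning to the trace file format described earlier, the sketch below shows one way such a file could be read into a per-operator structure. It is an illustration based only on the `Operator:` / `cnt:` layout shown in this document, not the loader BackendBench itself uses; `parse_trace` and the dictionary keys are hypothetical names chosen for the example.

```python
from collections import defaultdict


def parse_trace(text):
    """Group 'cnt:' lines under the most recent 'Operator:' line."""
    ops = defaultdict(list)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Operator:"):
            current = line[len("Operator:"):].strip()
        elif line.startswith("cnt:") and current is not None:
            # Split only on the first comma: the count comes first,
            # the rest of the line is the serialized argument string.
            count_str, serialized = line[len("cnt:"):].split(",", 1)
            count = int(count_str)
            ops[current].append({
                "count": count,
                "args": serialized.strip(),
                "is_synthetic": count == 0,  # cnt: 0 marks synthetic inputs
            })
    return dict(ops)


sample = """\
Operator: aten.add.Tensor
cnt: 156, ((T([1, 512, 768], f16), T([1, 512, 768], f16)), {})
cnt: 89, ((T([32, 128], f32), T([32, 128], f32)), {})
cnt: 0, ((T([10, 10], f16), T([10, 10], f16)), {})
Operator: aten.relu.default
cnt: 234, ((T([64, 256], f16),), {})
"""

trace = parse_trace(sample)
for op, entries in trace.items():
    traced = sum(e["count"] for e in entries)
    synthetic = sum(e["is_synthetic"] for e in entries)
    print(f"{op}: {traced} traced calls, {synthetic} synthetic entries")
```

Keeping the argument string unparsed at this stage separates the line-level layout from the argument format, so each can be handled and tested on its own.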

Provider:
maas
Created:
2025-08-27