edteams/ref-ops-v2

Name: edteams/ref-ops-v2
Creator: edteams
Published: 2024-12-17 11:06:17
License: No description available

Source: Hugging Face. Updated 2024-12-17; indexed 2025-11-01.

Download link:

https://hf-mirror.com/datasets/edteams/ref-ops-v2


Description:

# Reference Operations V2

## Dot Product

- MXINT8 dot product

  ```python
  # sum(a .* b), dot product
  # a.shape: [m]
  # b.shape: [m]
  # output.shape: scalar
  output = dot_product(a, b)
  ```

- MXINT8 vector-matrix multiplication

  ```python
  # vector @ matrix, vector-matrix multiplication
  # vector.shape: [m]
  # matrix.shape: [m, k]
  # output.shape: [k]
  output = gevm(vector, matrix)
  ```

- MXINT8 matrix-matrix multiplication

  ```python
  # input @ other, matrix-matrix multiplication
  # input.shape: [m, k]
  # other.shape: [k, n]
  # output.shape: [m, n]
  output = gemm(input, other)
  ```

## RoPE constants

### BF16

Dumped tensors:

`rope.freq_bf16`:

- shape: `[1, 4096, 64]`, where 4096 is the maximum sequence length and 64 is half of the head dimension (128/2 = 64)
- each element is $m \theta_i$, where $m$ is the position id and $i$ is the head dimension index.

`rope.cos_bf16`: `cos_bf16 = cos(freq_bf16)`

`rope.sin_bf16`: `sin_bf16 = sin(freq_bf16)`

### FP32 for reference

HuggingFace forces the RoPE constants to be in FP32 format because both BF16 and FP16 introduce errors in the output cos and sin values. For example, BF16 cannot represent 4088, 4089, ..., 4095 exactly: it keeps only 8 significant bits, so representable values just below 4096 are 16 apart and all of these values are rounded to 4096. [This pull request](https://github.com/huggingface/transformers/pull/29285) discusses the problem. Therefore, we also provide the FP32 version of the RoPE constants for reference, with all constants computed in FP32 format.

- `rope.freq_fp32`
- `rope.cos_fp32`
- `rope.sin_fp32`

## Softmax

`softmaxed_v` is a function that computes the `matmul(softmax(qk_T), v)` operation in the attention layer.
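As a floating-point point of comparison, the operation can be sketched in plain NumPy. This is only an illustration under stated assumptions: the function names `softmax_then_matmul` and `matmul_then_normalize` are hypothetical, everything runs in float64, and no BF16 or MXINT8 quantization is modelled. The sketch also checks that normalising after the matrix product gives the same result as normalising before it.

```python
import numpy as np


def softmax_then_matmul(qk_T, v):
    """Float reference for matmul(softmax(qk_T), v), softmax over the last axis."""
    # Subtracting the row maximum leaves softmax unchanged but keeps exp() stable.
    shifted = qk_T - qk_T.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    probs = e / e.sum(axis=-1, keepdims=True)
    return probs @ v


def matmul_then_normalize(qk_T, v):
    """Same quantity, with the division by the exp-sum deferred until after the matmul."""
    e = np.exp(qk_T)
    return (e @ v) / e.sum(axis=-1, keepdims=True)


# Shapes follow the llama-2-7b decoding case: bs=1, 32 heads, one query, 4 cached keys.
rng = np.random.default_rng(0)
qk_T = rng.standard_normal((1, 32, 1, 4))
v = rng.standard_normal((1, 32, 4, 128))

ref = softmax_then_matmul(qk_T, v)
alt = matmul_then_normalize(qk_T, v)
print(ref.shape, np.allclose(ref, alt))
```

Deferring the division is what allows the matrix product to run on unnormalised exponentials, which is the part a quantized vector-matrix unit would handle.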
To match the HW behaviour, `softmaxed_v` is broken down into the following steps:

```python
def softmaxed_v(qk_T, v):
    """
    qk_T.shape: [bs, num_heads, seq_q_len, seq_kv_len]
    v.shape: [bs, num_heads, seq_kv_len, head_dim]
    """
    # elementwise exp
    exp_bf16 = row_exp(qk_T, dim=-1)  # shape: [bs, num_heads, seq_q_len, seq_kv_len]
    # sum exp along the key-value sequence dimension (the last dimension)
    exp_sum = reduce_sum(exp_bf16, dim=-1)  # shape: [bs, num_heads, seq_q_len, 1]
    # quantize to mxint8 for vector-matrix multiplication
    exp_mxint8 = quantize(exp_bf16)
    # vector-matrix multiplication between two mxint8 tensors
    scaled_v = mxint8_gevm(exp_mxint8, v)  # shape: [bs, num_heads, seq_q_len, head_dim]
    # inverse of the sum
    inv = 1.0 / exp_sum  # shape: [bs, num_heads, seq_q_len, 1]
    # adjust scaled_v by the inverse
    out = scaled_v * inv  # shape: [bs, num_heads, seq_q_len, head_dim]
    return out
```

| Notation | Description |
| --- | --- |
| `bs` | batch size, 1 in this case |
| `num_heads` | number of attention heads, 32 for llama-2-7b |
| `seq_q_len` | query sequence length, 1 in this case |
| `seq_kv_len` | key-value sequence length, which increments by 1 for each decoding step |
| `head_dim` | head dimension, 128 for llama-2-7b |

The folder name takes the form `kv-size-<seq_kv_len>_seed-<seed>`. For example, `kv-size-4_seed-0` means the key-value sequence length is 4 and the random seed is 0.

The following tensors are dumped:

| File Name | Description |
| --- | --- |
| `softmaxed_v.qk_T` | BF16 `qk_T` tensor |
| `softmaxed_v.exp-bf16` | BF16 `exp_bf16` tensor |
| `softmaxed_v.exp_sum` | BF16 `exp_sum` tensor |
| `softmaxed_v.v` | MXINT8 `v` tensor |
| `softmaxed_v.exp_mxint8` | MXINT8 `exp_mxint8` tensor |
| `softmaxed_v.scaled_v` | BF16 `scaled_v` tensor |
| `softmaxed_v.inv` | BF16 `inv` tensor |
| `softmaxed_v.out` | BF16 `out` tensor |
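The folder naming scheme and the notation table can be combined into a small helper, sketched below. The names `DumpFolder`, `parse_dump_folder` and `expected_shapes` are hypothetical and not part of the dataset. The sketch assumes the seed is a non-negative integer and reports only the logical tensor shapes implied by the notation table; the on-disk layout of the dumped files is not described here and is not modelled.

```python
import re
from typing import NamedTuple


class DumpFolder(NamedTuple):
    seq_kv_len: int
    seed: int


# Pattern for names of the form kv-size-<seq_kv_len>_seed-<seed>.
_FOLDER_RE = re.compile(r"kv-size-(\d+)_seed-(\d+)")


def parse_dump_folder(name: str) -> DumpFolder:
    """Extract the key-value sequence length and the seed from a folder name."""
    match = _FOLDER_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected folder name: {name!r}")
    return DumpFolder(seq_kv_len=int(match.group(1)), seed=int(match.group(2)))


def expected_shapes(seq_kv_len, bs=1, num_heads=32, seq_q_len=1, head_dim=128):
    """Logical shapes per the notation table (defaults are the llama-2-7b values)."""
    return {
        "qk_T": (bs, num_heads, seq_q_len, seq_kv_len),
        "v": (bs, num_heads, seq_kv_len, head_dim),
        "exp_sum": (bs, num_heads, seq_q_len, 1),
        "out": (bs, num_heads, seq_q_len, head_dim),
    }


folder = parse_dump_folder("kv-size-4_seed-0")
shapes = expected_shapes(folder.seq_kv_len)
print(folder, shapes["qk_T"])
```

Using `fullmatch` rather than `search` rejects names with extra prefixes or suffixes, which keeps a stray directory from being silently misread.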

Provided by:

edteams
