GSMA/3GPP
Source: Hugging Face. Updated 2026-04-15; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/GSMA/3GPP
Description:
---
license: other
license_name: 3gpp
license_link: https://www.3gpp.org/
language:
- en
tags:
- telecommunications
- 3gpp
- 5g
- 4g
- lte
- nr
- standards
size_categories:
- 10K<n<100K
pretty_name: 3GPP Specifications (Rel-8 to Rel-20)
---
# 3GPP Specifications — GSMA Mirror
Two-folder mirror of the 3GPP Technical Specification corpus, Releases 8
through 20, plus the full Datalab-converted Markdown rendering used for
downstream training and retrieval.
## Layout
```
GSMA/3GPP/
├── original/                 raw sources, as downloaded from 3gpp.org
│   └── Rel-{8..20}/
│       └── {NN}_series/
│           ├── *.docx
│           ├── *.doc         (legacy, mostly Rel-18 41-55 series)
│           └── *.zip         (Rel-20 paired packages)
└── marked/                   Datalab-converted Markdown + inline images
    └── Rel-{8..19}/
        └── {NN}_series/
            └── {spec_id}/
                ├── raw.md
                └── *.jpg     (figures extracted alongside the text)
```
`marked/` does not yet include Rel-20 (ingestion in progress).
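
The folder layout maps directly onto download URLs. The following is a minimal sketch, assuming the series folder is always the first two digits of the spec identifier (this holds for the `21101` example in the Quickstart, but the card does not state it as a rule). The helper names `series_of`, `original_url`, and `marked_url` are hypothetical.

```python
BASE = "https://huggingface.co/datasets/GSMA/3GPP/resolve/main"


def series_of(spec_id: str) -> str:
    """Series folder name, ASSUMING it is the first two digits of the spec id."""
    return f"{spec_id[:2]}_series"


def original_url(release: int, filename: str) -> str:
    """URL of a raw source file under original/ (e.g. 21101-j00.docx)."""
    return f"{BASE}/original/Rel-{release}/{series_of(filename)}/{filename}"


def marked_url(release: int, spec_id: str) -> str:
    """URL of the Markdown rendering under marked/ (Rel-8 through Rel-19 only)."""
    if not 8 <= release <= 19:
        raise ValueError("marked/ currently covers Rel-8 through Rel-19")
    return f"{BASE}/marked/Rel-{release}/{series_of(spec_id)}/{spec_id}/raw.md"


print(original_url(19, "21101-j00.docx"))
print(marked_url(19, "21101"))
```

Raising on Rel-20 for `marked_url` reflects the note above that the Markdown view of that release is not yet ingested.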
## Coverage
| Release | .docx | .doc | .zip | Total |
|---------|------:|-----:|-----:|------:|
| Rel-8 | 1145 | 0 | 0 | 1145 |
| Rel-9 | 1254 | 0 | 0 | 1254 |
| Rel-10 | 1132 | 0 | 0 | 1132 |
| Rel-11 | 1265 | 0 | 0 | 1265 |
| Rel-12 | 1316 | 0 | 0 | 1316 |
| Rel-13 | 1429 | 0 | 0 | 1429 |
| Rel-14 | 1393 | 0 | 0 | 1393 |
| Rel-15 | 1605 | 0 | 0 | 1605 |
| Rel-16 | 1713 | 0 | 0 | 1713 |
| Rel-17 | 1906 | 0 | 0 | 1906 |
| Rel-18 | 1051 | 564 | 0 | 1615 |
| Rel-19 | 1542 | 75 | 0 | 1617 |
| Rel-20 | 42 | 6 | 48 | 96 |
Grand total: 17,486 source files across 13 releases (the sum of the Total column).
Notes:
- Rel-14 is the first release with `38_series` (5G NR).
- Rel-19 omits `25_series` (UTRA legacy, deprecated).
- Rel-18 extends past 38 into the 41–55 range (mostly legacy `.doc`).
- Rel-20 mixes `.docx` specs with their paired `.zip` distributions.
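
The per-release totals can be re-derived from the three format columns. A small sketch, with the counts transcribed by hand from the Coverage table; the variable names are illustrative only:

```python
# (docx, doc, zip) counts per release, transcribed from the Coverage table
COUNTS = {
    "Rel-8": (1145, 0, 0),
    "Rel-9": (1254, 0, 0),
    "Rel-10": (1132, 0, 0),
    "Rel-11": (1265, 0, 0),
    "Rel-12": (1316, 0, 0),
    "Rel-13": (1429, 0, 0),
    "Rel-14": (1393, 0, 0),
    "Rel-15": (1605, 0, 0),
    "Rel-16": (1713, 0, 0),
    "Rel-17": (1906, 0, 0),
    "Rel-18": (1051, 564, 0),
    "Rel-19": (1542, 75, 0),
    "Rel-20": (42, 6, 48),
}

# Row totals: one number per release
row_totals = {rel: sum(parts) for rel, parts in COUNTS.items()}
# Column totals: one number per file format, in (docx, doc, zip) order
by_format = [sum(col) for col in zip(*COUNTS.values())]
grand_total = sum(row_totals.values())

for rel, total in row_totals.items():
    print(f"{rel:>6}: {total}")
print("by format (docx, doc, zip):", by_format)
print("grand total:", grand_total)
```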
## Quickstart
Pull a single spec:
```bash
curl -L -o 21101-j00.docx \
https://huggingface.co/datasets/GSMA/3GPP/resolve/main/original/Rel-19/21_series/21101-j00.docx
```
Pull the markdown view of the same spec:
```bash
curl -L -o raw.md \
https://huggingface.co/datasets/GSMA/3GPP/resolve/main/marked/Rel-19/21_series/21101/raw.md
```
Clone a single release folder with the HF CLI:
```bash
huggingface-cli download GSMA/3GPP \
--repo-type dataset \
--include 'marked/Rel-19/**' \
--local-dir ./rel-19
```
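
The same selective download can be done from Python. This is a sketch rather than an official recipe: `release_patterns` and `download_release` are hypothetical helper names, while `snapshot_download` and its `allow_patterns` argument come from the `huggingface_hub` library.

```python
def release_patterns(release: int, view: str = "marked") -> list[str]:
    """Glob patterns selecting one release folder in either view."""
    if view not in ("marked", "original"):
        raise ValueError("view must be 'marked' or 'original'")
    return [f"{view}/Rel-{release}/**"]


def download_release(release: int, local_dir: str, view: str = "marked") -> str:
    """Download one release folder and return the local directory path."""
    # Imported lazily so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="GSMA/3GPP",
        repo_type="dataset",
        allow_patterns=release_patterns(release, view),
        local_dir=local_dir,
    )
```

Calling `download_release(19, "./rel-19")` mirrors the CLI command above; passing `view="original"` would fetch the raw sources instead.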
## Provenance
- `original/` was downloaded from `3gpp.org` release archives.
- `marked/` is produced by running each document through
[Datalab's](https://datalab.to/) DOCX → Markdown converter, preserving
section structure, tables, and extracting embedded figures as JPGs.
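
Once `marked/` is available locally, each spec folder pairs one `raw.md` with its extracted figures. The sketch below walks such a copy; it assumes the folder layout shown in the Layout section, and `MarkedSpec` and `iter_marked` are hypothetical names.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class MarkedSpec(NamedTuple):
    release: str    # e.g. "Rel-19"
    series: str     # e.g. "21_series"
    spec_id: str    # e.g. "21101"
    markdown: Path  # path to raw.md
    figures: list   # sorted list of *.jpg paths next to raw.md


def iter_marked(root: Path) -> Iterator[MarkedSpec]:
    """Yield one entry per spec folder under a local copy of marked/."""
    for md in sorted(root.glob("Rel-*/*_series/*/raw.md")):
        spec_dir = md.parent
        yield MarkedSpec(
            release=spec_dir.parent.parent.name,
            series=spec_dir.parent.name,
            spec_id=spec_dir.name,
            markdown=md,
            figures=sorted(spec_dir.glob("*.jpg")),
        )
```

A typical use is `for spec in iter_marked(Path("./rel-19/marked")): ...`, reading `spec.markdown.read_text()` and resolving image references against `spec.figures`.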
## Licence
3GPP specifications are released under the 3GPP terms of use. Redistribution
here is limited to mirroring the public publications; consult the upstream
source for authoritative versions and for any commercial use.
- Upstream: https://www.3gpp.org/specifications
- 3GPP copyright: https://www.3gpp.org/about-us/ip-policy
---
language:
- en
license: other
license_name: 3GPP
license_link: https://www.3gpp.org/specifications-technologies/legal-matters
tags:
- telecommunications
- 3GPP
- 5G
- NR
- LTE
- standards
- specifications
pretty_name: 3GPP Technical Specifications
configs:
- config_name: rel-18
  default: true
  data_files:
  - split: 21_series
    path: data/rel-18/21_series-*.parquet
  - split: 22_series
    path: data/rel-18/22_series-*.parquet
  - split: 23_series
    path: data/rel-18/23_series-*.parquet
  - split: 24_series
    path: data/rel-18/24_series-*.parquet
  - split: 26_series
    path: data/rel-18/26_series-*.parquet
  - split: 27_series
    path: data/rel-18/27_series-*.parquet
  - split: 28_series
    path: data/rel-18/28_series-*.parquet
  - split: 29_series
    path: data/rel-18/29_series-*.parquet
  - split: 31_series
    path: data/rel-18/31_series-*.parquet
  - split: 32_series
    path: data/rel-18/32_series-*.parquet
  - split: 33_series
    path: data/rel-18/33_series-*.parquet
  - split: 36_series
    path: data/rel-18/36_series-*.parquet
  - split: 37_series
    path: data/rel-18/37_series-*.parquet
  - split: 38_series
    path: data/rel-18/38_series-*.parquet
- config_name: rel-19
  data_files:
  - split: 21_series
    path: data/rel-19/21_series-*.parquet
  - split: 22_series
    path: data/rel-19/22_series-*.parquet
  - split: 23_series
    path: data/rel-19/23_series-*.parquet
  - split: 24_series
    path: data/rel-19/24_series-*.parquet
  - split: 26_series
    path: data/rel-19/26_series-*.parquet
  - split: 27_series
    path: data/rel-19/27_series-*.parquet
  - split: 28_series
    path: data/rel-19/28_series-*.parquet
  - split: 29_series
    path: data/rel-19/29_series-*.parquet
  - split: 31_series
    path: data/rel-19/31_series-*.parquet
  - split: 32_series
    path: data/rel-19/32_series-*.parquet
  - split: 33_series
    path: data/rel-19/33_series-*.parquet
  - split: 34_series
    path: data/rel-19/34_series-*.parquet
  - split: 35_series
    path: data/rel-19/35_series-*.parquet
  - split: 36_series
    path: data/rel-19/36_series-*.parquet
  - split: 37_series
    path: data/rel-19/37_series-*.parquet
  - split: 38_series
    path: data/rel-19/38_series-*.parquet
---
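# 3GPP Technical Specifications
This dataset contains 938 3GPP technical specifications and technical reports, split into self-contained sections with their tables and figures inlined. The dataset is pre-validated.
Supported releases: Rel-18, Rel-19
## Usage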
```python
from datasets import load_dataset

# Load the NR (New Radio) specifications from the Rel-18 config
ds = load_dataset("GSMA/3GPP", "rel-18", split="38_series")

# Fetch a specific clause
section = ds.filter(lambda r: r["spec_id"] == "38331" and r["clause"] == "5.2.1")[0]
print(section["body"])  # each section is self-contained, with all tables and figures inlined

# Reconstruct the full specification in document order
spec = ds.filter(lambda r: r["spec_id"] == "38331").sort("document_order")
full_text = "\n\n".join(spec["body"])  # blank line between sections keeps Markdown blocks separate

# Load the Rel-19 config
ds19 = load_dataset("GSMA/3GPP", "rel-19", split="38_series")
```
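## Schema
| Column | Type | Description |
|--------|------|-------------|
| `spec_id` | string | Specification identifier, e.g. `38331` |
| `spec_number` | string | Dotted specification number, e.g. `38.331` |
| `spec_type` | string | Specification type: `TS` (Technical Specification) or `TR` (Technical Report) |
| `title` | string | Full specification title |
| `release` | string | Release: `Rel-18` or `Rel-19` |
| `clause` | string | Clause number, e.g. `5.2.1` |
| `section_title` | string | Section title |
| `parent_clause` | string | Parent clause number, e.g. `5.2` |
| `depth` | int32 | Heading depth (1 to 6) |
| `body` | string | Self-contained Markdown content (tables and figures inlined) |
| `body_chars` | int32 | Character count of `body` |
| `document_order` | int32 | Position of the section within the specification |
| `images` | list of Image | Images referenced by this section |
| `image_hashes` | list of string | MD5 hashes of the corresponding images |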
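
The `clause`, `parent_clause`, `depth`, and `document_order` columns are enough to rebuild a table of contents or navigate the clause tree. The sketch below works on plain dicts so it runs without downloading anything; the sample rows and their titles are invented, and the empty `parent_clause` for a top-level clause is an assumption about how the dataset encodes roots.

```python
def table_of_contents(rows):
    """Return indented 'clause title' lines for one spec, ordered by document_order."""
    ordered = sorted(rows, key=lambda r: r["document_order"])
    return [
        f"{'  ' * (r['depth'] - 1)}{r['clause']} {r['section_title']}"
        for r in ordered
    ]


def children_of(rows, clause):
    """Return clause numbers whose parent_clause equals `clause`, in document order."""
    ordered = sorted(rows, key=lambda r: r["document_order"])
    return [r["clause"] for r in ordered if r["parent_clause"] == clause]


# Hypothetical rows: only the fields used here are filled in, titles are invented,
# and "" as the parent of a top-level clause is an assumption.
rows = [
    {"clause": "5.2.1", "section_title": "Example C", "parent_clause": "5.2",
     "depth": 3, "document_order": 2},
    {"clause": "5", "section_title": "Example A", "parent_clause": "",
     "depth": 1, "document_order": 0},
    {"clause": "5.2", "section_title": "Example B", "parent_clause": "5",
     "depth": 2, "document_order": 1},
]
print("\n".join(table_of_contents(rows)))
```

Iterating over a filtered `datasets.Dataset` yields dict-like rows, so the same functions should accept the result of `ds.filter(...)` directly.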
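## Original DOCX files
The original 3GPP Word documents (549 files for Rel-18, 442 for Rel-19) are available under `original/`:
```
original/
  rel-18/
  rel-19/
```
File names follow the 3GPP naming convention: `{spec_id}-{version}[_{part}].docx`. Large specifications may be split across several files (for example `38101-1-j10_cover.docx` and `38101-1-j10_s00-0504.docx`).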
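
The naming pattern can be parsed with a regular expression. This is a sketch under two assumptions that the card does not spell out: the version is always a three-character code, and each character is a base-36 digit for the major, technical, and editorial fields (so `j10` reads as 19.1.0). `parse_filename` and `decode_version` are hypothetical helpers.

```python
import re
from typing import NamedTuple, Optional

# {spec_id}-{version}[_{part}].docx, ASSUMING a three-character version code
FILENAME_RE = re.compile(
    r"^(?P<spec_id>\d{5}(?:-\d+)*)-(?P<version>[0-9a-z]{3})(?:_(?P<part>.+))?\.docx$"
)


class SpecFile(NamedTuple):
    spec_id: str         # e.g. "38101-1" (multi-part specs keep their part suffix)
    version: str         # e.g. "j10"
    part: Optional[str]  # e.g. "cover", or None for single-file specs


def parse_filename(name: str) -> SpecFile:
    """Split a file name into spec id, version code, and optional part label."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized file name: {name!r}")
    return SpecFile(m["spec_id"], m["version"], m["part"])


def decode_version(code: str) -> str:
    """Read each character as a base-36 digit and join with dots (assumed encoding)."""
    return ".".join(str(int(ch, 36)) for ch in code)


for name in ("38331-i00.docx", "38101-1-j10_cover.docx", "38101-1-j10_s00-0504.docx"):
    parsed = parse_filename(name)
    print(parsed, decode_version(parsed.version))
```

Grouping files by `(spec_id, version)` then gives all parts of a split specification.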
### Download a single file
```python
from huggingface_hub import hf_hub_download

# Fetch one original Word document into the local Hugging Face cache
path = hf_hub_download(
    repo_id="GSMA/3GPP",
    filename="original/rel-18/38331-i00.docx",
    repo_type="dataset",
)
```
## License information
See [3GPP legal matters](https://www.3gpp.org/specifications-technologies/legal-matters).
Provided by:
GSMA



