
GSMA/3GPP

Hugging Face · Updated 2026-04-15 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/GSMA/3GPP
Resource description:
---
license: other
license_name: 3gpp
license_link: https://www.3gpp.org/
language:
- en
tags:
- telecommunications
- 3gpp
- 5g
- 4g
- lte
- nr
- standards
size_categories:
- 10K<n<100K
pretty_name: 3GPP Specifications (Rel-8 to Rel-20)
---

# 3GPP Specifications - GSMA Mirror

Two-folder mirror of the 3GPP Technical Specification corpus, Releases 8 through 20, plus the full Datalab-converted Markdown rendering used for downstream training and retrieval.

## Layout

```
GSMA/3GPP/
├── original/            raw sources, as downloaded from 3gpp.org
│   └── Rel-{8..20}/
│       └── {NN}_series/
│           ├── *.docx
│           ├── *.doc    (legacy, mostly Rel-18 41-55 series)
│           └── *.zip    (Rel-20 paired packages)
└── marked/              Datalab-converted Markdown + inline images
    └── Rel-{8..19}/
        └── {NN}_series/
            └── {spec_id}/
                ├── raw.md
                └── *.jpg  (figures extracted alongside the text)
```

`marked/` does not yet include Rel-20 (ingestion in progress).

## Coverage

| Release | .docx | .doc | .zip | Total |
|---------|------:|-----:|-----:|------:|
| Rel-8   |  1145 |    0 |    0 |  1145 |
| Rel-9   |  1254 |    0 |    0 |  1254 |
| Rel-10  |  1132 |    0 |    0 |  1132 |
| Rel-11  |  1265 |    0 |    0 |  1265 |
| Rel-12  |  1316 |    0 |    0 |  1316 |
| Rel-13  |  1429 |    0 |    0 |  1429 |
| Rel-14  |  1393 |    0 |    0 |  1393 |
| Rel-15  |  1605 |    0 |    0 |  1605 |
| Rel-16  |  1713 |    0 |    0 |  1713 |
| Rel-17  |  1906 |    0 |    0 |  1906 |
| Rel-18  |  1051 |  564 |    0 |  1615 |
| Rel-19  |  1542 |   75 |    0 |  1617 |
| Rel-20  |    42 |    6 |   48 |    96 |

Grand total: 17,486 source files across 13 releases (the sum of the Total column above).

Notes:

- Rel-14 is the first release with `38_series` (5G NR).
- Rel-19 omits `25_series` (UTRA legacy, deprecated).
- Rel-18 extends past 38 into the 41–55 range (mostly legacy `.doc`).
- Rel-20 mixes `.docx` specs with their paired `.zip` distributions.
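The per-release counts in the Coverage table can be re-derived from a local copy of `original/`. The following is a minimal sketch, assuming the `Rel-N/NN_series/<file>` layout shown above; the function names `count_sources` and `grand_total` are illustrative and not part of the dataset.

```python
from collections import Counter
from pathlib import Path


def count_sources(original_root):
    """Count files per release and per extension under an `original/` tree.

    Assumes the Rel-N/NN_series/<file> layout from the Layout section.
    Returns {release_name: Counter({extension: count})}.
    """
    counts = {}
    for path in Path(original_root).glob("Rel-*/*_series/*"):
        if not path.is_file():
            continue
        release = path.parts[-3]      # e.g. "Rel-19"
        ext = path.suffix.lower()     # e.g. ".docx"
        counts.setdefault(release, Counter())[ext] += 1
    return counts


def grand_total(counts):
    """Sum all per-extension counts across every release."""
    return sum(sum(by_ext.values()) for by_ext in counts.values())
```

Calling `count_sources("./original")` on a downloaded mirror and comparing each release's `.docx`, `.doc`, and `.zip` counts with the table is a quick way to confirm that a download is complete.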
## Quickstart

Pull a single spec:

```bash
curl -L -o 21101-j00.docx \
  https://huggingface.co/datasets/GSMA/3GPP/resolve/main/original/Rel-19/21_series/21101-j00.docx
```

Pull the markdown view of the same spec:

```bash
curl -L -o raw.md \
  https://huggingface.co/datasets/GSMA/3GPP/resolve/main/marked/Rel-19/21_series/21101/raw.md
```

Clone a single release folder with the HF CLI:

```bash
huggingface-cli download GSMA/3GPP \
  --repo-type dataset \
  --include 'marked/Rel-19/**' \
  --local-dir ./rel-19
```

## Provenance

- `original/` was downloaded from `3gpp.org` release archives.
- `marked/` is produced by running each document through [Datalab's](https://datalab.to/) DOCX → Markdown converter, preserving section structure, tables, and extracting embedded figures as JPGs.

## Licence

3GPP specifications are released under the 3GPP terms of use. Redistribution here is limited to mirroring the public publications; consult the upstream source for authoritative versions and for any commercial use.

- Upstream: https://www.3gpp.org/specifications
- 3GPP copyright: https://www.3gpp.org/about-us/ip-policy
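Returning to the Quickstart URLs: they follow directly from the Layout tree, so they can be composed programmatically. This is a sketch under one assumption not stated in the card, namely that the series folder is always the first two digits of the spec identifier (as it is for `21101` in `21_series`). The helper names `original_url` and `marked_url` are hypothetical.

```python
from urllib.parse import quote

BASE = "https://huggingface.co/datasets/GSMA/3GPP/resolve/main"


def original_url(release, filename):
    """URL of a raw source file, e.g. release=19, filename='21101-j00.docx'."""
    if not 8 <= release <= 20:
        raise ValueError("original/ covers Rel-8 through Rel-20")
    series = filename[:2]  # assumption: first two digits name the series folder
    return f"{BASE}/original/Rel-{release}/{series}_series/{quote(filename)}"


def marked_url(release, spec_id):
    """URL of the converted Markdown for a spec, e.g. release=19, spec_id='21101'."""
    if not 8 <= release <= 19:
        raise ValueError("marked/ covers Rel-8 through Rel-19 only")
    series = spec_id[:2]  # same two-digit assumption as above
    return f"{BASE}/marked/Rel-{release}/{series}_series/{quote(spec_id)}/raw.md"
```

For example, `original_url(19, "21101-j00.docx")` and `marked_url(19, "21101")` reproduce the two `curl` targets in the Quickstart, while `marked_url(20, ...)` raises because Rel-20 has not been converted yet.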

---
language:
- en
license: other
license_name: 3GPP
license_link: https://www.3gpp.org/specifications-technologies/legal-matters
tags:
- telecommunications
- 3GPP
- 5G
- NR
- LTE
- standards
- specifications
pretty_name: 3GPP Technical Specifications
configs:
- config_name: rel-18
  default: true
  data_files:
  - split: 21_series
    path: data/rel-18/21_series-*.parquet
  - split: 22_series
    path: data/rel-18/22_series-*.parquet
  - split: 23_series
    path: data/rel-18/23_series-*.parquet
  - split: 24_series
    path: data/rel-18/24_series-*.parquet
  - split: 26_series
    path: data/rel-18/26_series-*.parquet
  - split: 27_series
    path: data/rel-18/27_series-*.parquet
  - split: 28_series
    path: data/rel-18/28_series-*.parquet
  - split: 29_series
    path: data/rel-18/29_series-*.parquet
  - split: 31_series
    path: data/rel-18/31_series-*.parquet
  - split: 32_series
    path: data/rel-18/32_series-*.parquet
  - split: 33_series
    path: data/rel-18/33_series-*.parquet
  - split: 36_series
    path: data/rel-18/36_series-*.parquet
  - split: 37_series
    path: data/rel-18/37_series-*.parquet
  - split: 38_series
    path: data/rel-18/38_series-*.parquet
- config_name: rel-19
  data_files:
  - split: 21_series
    path: data/rel-19/21_series-*.parquet
  - split: 22_series
    path: data/rel-19/22_series-*.parquet
  - split: 23_series
    path: data/rel-19/23_series-*.parquet
  - split: 24_series
    path: data/rel-19/24_series-*.parquet
  - split: 26_series
    path: data/rel-19/26_series-*.parquet
  - split: 27_series
    path: data/rel-19/27_series-*.parquet
  - split: 28_series
    path: data/rel-19/28_series-*.parquet
  - split: 29_series
    path: data/rel-19/29_series-*.parquet
  - split: 31_series
    path: data/rel-19/31_series-*.parquet
  - split: 32_series
    path: data/rel-19/32_series-*.parquet
  - split: 33_series
    path: data/rel-19/33_series-*.parquet
  - split: 34_series
    path: data/rel-19/34_series-*.parquet
  - split: 35_series
    path: data/rel-19/35_series-*.parquet
  - split: 36_series
    path: data/rel-19/36_series-*.parquet
  - split: 37_series
    path: data/rel-19/37_series-*.parquet
  - split: 38_series
    path: data/rel-19/38_series-*.parquet
---

# 3GPP Technical Specifications

This dataset contains 938 3GPP Technical Specifications and Technical Reports, split into self-contained sections with their tables and figures inlined. The data is pre-validated.

Supported releases: Rel-18, Rel-19

## Usage
```python
from datasets import load_dataset

# Load the NR (New Radio) specifications from the Rel-18 config
ds = load_dataset("GSMA/3GPP", "rel-18", split="38_series")

# Fetch one specific section
section = ds.filter(lambda r: r["spec_id"] == "38331" and r["clause"] == "5.2.1")[0]
print(section["body"])  # each section is self-contained, with all tables and figures inlined

# Reconstruct a full specification in document order
spec = ds.filter(lambda r: r["spec_id"] == "38331").sort("document_order")
full_text = "\n\n".join(spec["body"])

# Load the Rel-19 config
ds19 = load_dataset("GSMA/3GPP", "rel-19", split="38_series")
```

## Schema

| Column | Type | Description |
|--------|------|-------------|
| `spec_id` | string | Specification identifier, e.g. `38331` |
| `spec_number` | string | Dotted specification number, e.g. `38.331` |
| `spec_type` | string | `TS` (Technical Specification) or `TR` (Technical Report) |
| `title` | string | Full specification title |
| `release` | string | `Rel-18` or `Rel-19` |
| `clause` | string | Clause number, e.g. `5.2.1` |
| `section_title` | string | Section heading |
| `parent_clause` | string | Parent clause number, e.g. `5.2` |
| `depth` | int32 | Heading depth (1 to 6) |
| `body` | string | Self-contained Markdown content (tables and figures inlined) |
| `body_chars` | int32 | Character count of `body` |
| `document_order` | int32 | Position of the section within the specification |
| `images` | list[Image] | Images referenced by this section |
| `image_hashes` | list[string] | MD5 hashes of the corresponding images |

## Original DOCX files

The original 3GPP Word documents (549 files for Rel-18, 442 files for Rel-19) are available under `original/`:

```
original/
  rel-18/
  rel-19/
```

File names follow the 3GPP naming convention `{spec_id}-{version}[_{part}].docx`. Large specifications may be split across several files (for example `38101-1-j10_cover.docx` and `38101-1-j10_s00-0504.docx`).

### Download a single file

```python
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="GSMA/3GPP",
    filename="original/rel-18/38331-i00.docx",
    repo_type="dataset",
)
```

## License

See [3GPP legal matters](https://www.3gpp.org/specifications-technologies/legal-matters).

Provider:
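The file-naming convention `{spec_id}-{version}[_{part}].docx` described under the original DOCX files can be parsed mechanically. The sketch below rests on two assumptions that the card does not state: that a multi-part number such as `38101-1` counts as the `spec_id`, and that the three version characters are base-36 digits for major, technical, and editorial fields (so `i00` reads as 18.0.0 and `j10` as 19.1.0), which is the usual 3GPP file-naming practice. `parse_filename` is an illustrative helper, not part of the dataset tooling.

```python
import re

# {spec_id}-{version}[_{part}].docx, with spec_id allowing numeric sub-parts ("38101-1")
_NAME = re.compile(
    r"^(?P<spec_id>\d{5}(?:-\d+)*)"
    r"-(?P<version>[0-9a-z]{3})"
    r"(?:_(?P<part>[^.]+))?"
    r"\.docx$"
)


def parse_filename(name):
    """Split a 3GPP DOCX file name into spec id, version, and optional part."""
    m = _NAME.match(name)
    if m is None:
        raise ValueError(f"not a recognised 3GPP file name: {name!r}")
    # Assumption: each version character is one base-36 digit (0-9, then a-z).
    major, technical, editorial = (int(c, 36) for c in m["version"])
    return {
        "spec_id": m["spec_id"],
        "version": m["version"],
        "version_tuple": (major, technical, editorial),
        "part": m["part"],  # None when the spec is a single file
    }
```

Grouping files by the returned `spec_id` and `version` brings the pieces of a split specification, such as the `cover` and `s00-0504` parts of `38101-1-j10`, back together.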
GSMA