
GSMA/ot-full

Source: Hugging Face | Updated 2026-03-26 | Indexed 2026-04-05
Download link:
https://hf-mirror.com/datasets/GSMA/ot-full
Resource description:
---
license: mit
task_categories:
- question-answering
- text-classification
language:
- en
tags:
- telecommunications
- telecom
- 3gpp
- 5g
- benchmarks
- evaluation
- llm
pretty_name: Open Telco Full Benchmarks
size_categories:
- 10K<n<100K
configs:
- config_name: teleqna
  data_files:
  - split: test
    path: teleqna/test-*
- config_name: teletables
  data_files:
  - split: test
    path: teletables/test-*
- config_name: telemath
  data_files:
  - split: test
    path: telemath/test-*
- config_name: telelogs
  data_files:
  - split: test
    path: telelogs/test-*
- config_name: 3gpp_tsg
  data_files:
  - split: test
    path: 3gpp_tsg/test-*
- config_name: oranbench
  data_files:
  - split: test
    path: oranbench/test-*
- config_name: srsranbench
  data_files:
  - split: test
    path: srsranbench/test-*
- config_name: sixg_bench
  data_files:
  - split: test
    path: sixg_bench/test-*
dataset_info:
- config_name: teleqna
  features:
  - name: question
    dtype: string
  - name: choices
    list: string
  - name: answer
    dtype: int64
  - name: subject
    dtype: string
  splits:
  - name: test
    num_examples: 10000
- config_name: teletables
  features:
  - name: question
    dtype: string
  - name: choices
    list: string
  - name: answer
    dtype: int64
  - name: explanation
    dtype: string
  - name: difficult
    dtype: bool
  - name: table_id
    dtype: string
  - name: table_title
    dtype: string
  - name: document_id
    dtype: string
  - name: document_title
    dtype: string
  - name: document_url
    dtype: string
  splits:
  - name: test
    num_examples: 500
- config_name: telemath
  features:
  - name: question
    dtype: string
  - name: answer
    dtype: float64
  - name: category
    dtype: string
  - name: tags
    list: string
  - name: difficulty
    dtype: string
  splits:
  - name: test
    num_examples: 500
- config_name: telelogs
  features:
  - name: question
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: test
    num_examples: 864
- config_name: 3gpp_tsg
  features:
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: file_name
    dtype: string
  splits:
  - name: test
    num_examples: 2000
- config_name: oranbench
  features:
  - name: question
    dtype: string
  - name: choices
    list: string
  - name: answer
    dtype: int64
  - name: difficulty
    dtype: string
  splits:
  - name: test
    num_examples: 1500
- config_name: srsranbench
  features:
  - name: question
    dtype: string
  - name: choices
    list: string
  - name: answer
    dtype: int64
  splits:
  - name: test
    num_examples: 1502
- config_name: sixg_bench
  features:
  - name: question
    dtype: string
  - name: choices
    list: string
  - name: answer
    dtype: int64
  - name: task_id
    dtype: string
  - name: task_name
    dtype: string
  - name: difficulty
    dtype: string
  - name: category
    dtype: string
  splits:
  - name: test
    num_bytes: 6317797
    num_examples: 3722
  download_size: 2646691
  dataset_size: 6317797
---

# Open Telco Full Benchmarks

**20,588 telecom-specific evaluation samples** across 8 benchmarks: the complete evaluation suite for measuring telecom AI performance.

Use this dataset for final, publishable results. For fast iteration during model development, use [`GSMA/ot-lite`](https://huggingface.co/datasets/GSMA/ot-lite).

[Eval Framework](https://github.com/gsma-labs/evals) | [Sample Data](https://huggingface.co/datasets/GSMA/ot-lite)

## Benchmarks

| Config | Samples | Task | Paper |
|--------|--------:|------|-------|
| `teleqna` | 10,000 | Multiple-choice Q&A on telecom standards | [arXiv](https://arxiv.org/abs/2310.15051) |
| `teletables` | 500 | Table interpretation from 3GPP specs | [arXiv](https://arxiv.org/abs/2601.04202) |
| `telemath` | 500 | Telecom mathematical reasoning | [arXiv](https://arxiv.org/abs/2506.10674) |
| `telelogs` | 864 | 5G network root cause analysis | [arXiv](https://arxiv.org/abs/2507.21974) |
| `3gpp_tsg` | 2,000 | 3GPP document classification by working group | [arXiv](https://arxiv.org/abs/2407.09424) |
| `oranbench` | 1,500 | Multiple-choice Q&A on O-RAN specifications | [arXiv](https://arxiv.org/abs/2407.06245) |
| `srsranbench` | 1,502 | Multiple-choice Q&A on srsRAN 5G codebase | [arXiv](https://arxiv.org/abs/2503.05200) |
| `sixg_bench` | 3,722 | AI-native 6G network reasoning | [arXiv](https://arxiv.org/abs/2602.08675) |

> For quick testing, use [`GSMA/ot-lite`](https://huggingface.co/datasets/GSMA/ot-lite).

## Quick Start

```python
from datasets import load_dataset

ds = load_dataset("GSMA/ot-full", "sixg_bench", split="test")
# Available configs: teleqna, teletables, telemath, telelogs, 3gpp_tsg, oranbench, srsranbench, sixg_bench
```

Or run evaluations with [Inspect AI](https://inspect.aisi.org.uk/):

```bash
uv run inspect eval src/evals/sixg_bench/sixg_bench.py --model openai/gpt-4o -T full=true
```

See [Running Evaluations](https://github.com/gsma-labs/evals/blob/main/docs/running-evaluations.md) for the full guide.

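
For readers who want a quick sanity check outside the eval framework, the multiple-choice configs (`teleqna`, `teletables`, `oranbench`, `srsranbench`, `sixg_bench`) share the `question`, `choices`, and `answer` fields listed in the metadata, so a plain accuracy score can be sketched as below. This is a minimal illustration, not the scoring used by the official framework. It assumes that `answer` is a zero-based index into `choices`, which the card does not state. The names `multiple_choice_accuracy`, `always_first`, and `sample_records` are hypothetical, and the sample records are placeholders rather than real dataset rows.

```python
from typing import Callable, Iterable, Mapping, Sequence


def multiple_choice_accuracy(
    records: Iterable[Mapping],
    predict: Callable[[str, Sequence[str]], int],
) -> float:
    """Return the fraction of records where predict() picks the labelled choice.

    Assumption (not confirmed by the dataset card): record["answer"] is a
    zero-based index into record["choices"].
    """
    total = 0
    correct = 0
    for record in records:
        predicted_index = predict(record["question"], record["choices"])
        correct += int(predicted_index == record["answer"])
        total += 1
    if total == 0:
        raise ValueError("no records to score")
    return correct / total


# Placeholder records that mimic the schema; these are NOT real dataset rows.
sample_records = [
    {"question": "Placeholder question A", "choices": ["w", "x", "y", "z"], "answer": 0},
    {"question": "Placeholder question B", "choices": ["w", "x", "y", "z"], "answer": 2},
]


def always_first(question: str, choices: Sequence[str]) -> int:
    """Trivial baseline predictor: always pick the first choice."""
    return 0


baseline_accuracy = multiple_choice_accuracy(sample_records, always_first)
print(f"baseline accuracy: {baseline_accuracy:.2f}")
```

The same function should accept a split loaded with `load_dataset`, since iterating a split yields dict-like rows; replace `always_first` with a callable that queries the model under test and maps its reply to a choice index.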
## Citation

```bibtex
@misc{maatouk2023teleqna,
  title={TeleQnA: A Benchmark Dataset to Assess Large Language Models Telecommunications Knowledge},
  author={Maatouk, Ali and Ayed, Fadhel and Piovesan, Nicola and De Domenico, Antonio and Debbah, Merouane and Luo, Zhi-Quan},
  year={2023},
  eprint={2310.15051},
  archivePrefix={arXiv}
}

@misc{ezzakri2025teletables,
  title={TeleTables: A Benchmark for Large Language Models in Telecom Table Interpretation},
  author={Ezzakri, Anas and Piovesan, Nicola and Sana, Mohamed and De Domenico, Antonio and Ayed, Fadhel and Zhang, Haozhe},
  year={2025},
  eprint={2601.04202},
  archivePrefix={arXiv}
}

@misc{colle2025telemath,
  title={TeleMath: A Benchmark for Large Language Models in Telecom Mathematical Problem Solving},
  author={Colle, Vincenzo and Sana, Mohamed and Piovesan, Nicola and De Domenico, Antonio and Ayed, Fadhel and Debbah, Merouane},
  year={2025},
  eprint={2506.10674},
  archivePrefix={arXiv}
}

@misc{sana2025telelogs,
  title={Reasoning Language Models for Root Cause Analysis in 5G Wireless Networks},
  author={Sana, Mohamed and Piovesan, Nicola and De Domenico, Antonio and Kang, Yibin and Zhang, Haozhe and Debbah, Merouane and Ayed, Fadhel},
  year={2025},
  eprint={2507.21974},
  archivePrefix={arXiv}
}

@misc{zou2024telecomgpt,
  title={TelecomGPT: A Framework to Build Telecom-Specific Large Language Models},
  author={Zou, Hang and Zhao, Qiyang and Tian, Yu and Bariah, Lina and Bader, Faouzi and Lestable, Thierry and Debbah, Merouane},
  year={2024},
  eprint={2407.09424},
  archivePrefix={arXiv}
}

@misc{gajjar2024oranbench,
  title={ORAN-Bench-13K: An Open Source Benchmark for Assessing LLMs in Open Radio Access Networks},
  author={Gajjar, Pranshav and Shah, Vijay K.},
  year={2024},
  eprint={2407.06245},
  archivePrefix={arXiv}
}

@misc{gajjar2025oransight2,
  title={ORANSight-2.0: Foundational LLMs for O-RAN},
  author={Gajjar, Pranshav and Shah, Vijay K.},
  year={2025},
  eprint={2503.05200},
  archivePrefix={arXiv}
}

@misc{ferrag2026sixgbench,
  title={6G-Bench: An Open Benchmark for Semantic Communication and Network-Level Reasoning with Foundation Models in AI-Native 6G Networks},
  author={Ferrag, Mohamed Amine and Lakas, Abderrahmane and Debbah, Merouane},
  year={2026},
  eprint={2602.08675},
  archivePrefix={arXiv}
}
```

Provider:
GSMA