prnshv/TeleResilienceBench
Source: Hugging Face; updated 2026-04-23; indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/prnshv/TeleResilienceBench
Description:
`TeleResilienceBench` is a telecommunications benchmark for **reasoning continuation resilience**: given a question and a partially completed (and potentially flawed) reasoning trace, a model must continue the reasoning and recover the correct final answer. The benchmark spans seven telecom sub-domains from the GSMA Open-Telco LLM suite. Instances are constructed by taking failed solutions from a weak generator, truncating each flawed trace at its midpoint, and using that partial trace as the continuation context for the target model. Performance is measured with recovery-focused metrics, primarily the **Correct Flip Rate (CFR)**, alongside the **Wrong Flip Rate (WFR)** and the **No Flip Rate (NFR)**. The dataset includes two main files: `TeleResilienceBench.csv` (main file, multiple-choice questions) and `Auxiliary.csv` (auxiliary file, telecom math problems).
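The card names the three metrics but does not define them, nor does it say how the "midpoint" of a trace is measured. The Python sketch below assumes one plausible reading: every instance starts from a weak-generator answer that is wrong; a *correct flip* means the target model's continuation ends at the gold answer, a *wrong flip* means it ends at a different wrong answer, and *no flip* means it keeps the weak generator's answer, so the three rates sum to 1. The midpoint cut is assumed to be over whitespace-separated tokens. The `Outcome` record and the `truncate_at_midpoint`, `classify`, and `flip_rates` helpers are hypothetical illustrations, not part of the dataset or its official evaluation code.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Outcome:
    # Hypothetical per-instance record (not the dataset's actual schema).
    gold: str          # correct final answer
    weak_answer: str   # wrong final answer reached by the weak generator's full trace
    model_answer: str  # final answer reached by the target model after continuing


def truncate_at_midpoint(trace: str) -> str:
    """Keep the first half of a reasoning trace.

    Assumption: the midpoint is taken over whitespace-separated tokens.
    """
    tokens = trace.split()
    return " ".join(tokens[: len(tokens) // 2])


def classify(o: Outcome) -> str:
    """Assign one of three assumed flip categories to an instance."""
    if o.model_answer == o.gold:
        return "correct_flip"  # recovered the gold answer
    if o.model_answer != o.weak_answer:
        return "wrong_flip"    # moved to a different wrong answer
    return "no_flip"           # stayed on the weak generator's answer


def flip_rates(outcomes: list[Outcome]) -> dict[str, float]:
    """Compute CFR, WFR, and NFR as fractions of all instances."""
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one outcome")
    counts = Counter(classify(o) for o in outcomes)
    return {
        "CFR": counts["correct_flip"] / n,
        "WFR": counts["wrong_flip"] / n,
        "NFR": counts["no_flip"] / n,
    }


if __name__ == "__main__":
    outcomes = [
        Outcome(gold="B", weak_answer="A", model_answer="B"),  # recovered
        Outcome(gold="C", weak_answer="A", model_answer="D"),  # new wrong answer
        Outcome(gold="C", weak_answer="A", model_answer="A"),  # stuck on weak answer
        Outcome(gold="D", weak_answer="B", model_answer="D"),  # recovered
    ]
    print(flip_rates(outcomes))  # {'CFR': 0.5, 'WFR': 0.25, 'NFR': 0.25}
```

Under this reading, a higher CFR means the model more often overrides the flawed prefix, while a high NFR suggests it tends to follow the injected error through to the end. If the official definitions differ (for example, if WFR is meant to count instances where a correct answer is flipped to a wrong one), only `classify` needs to change.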
Provided by:
prnshv



