SensorBench: Benchmarking LLMs in Coding-Based Sensor Processing

Name: SensorBench: Benchmarking LLMs in Coding-Based Sensor Processing
Creator: Pengrui Quan
License: No description available

IEEE, indexed 2026-04-17

Download link:

https://ieee-dataport.org/documents/sensorbench-benchmarking-llms-coding-based-sensor-processing

Description:

Effective sensor data processing is critical for cyber-physical and IoT systems but often requires specialized expertise. While Large Language Models (LLMs) and Large Reasoning Models (LRMs) show promise as autonomous copilots for sensor processing, their capabilities remain underexplored. We introduce SensorBench, the first comprehensive benchmark for evaluating LLMs across diverse real-world sensor datasets and tasks. SensorBench evaluates three paradigms for leveraging LLMs in sensing tasks: Coding with API access (CA), Coding without API access (CNA), and Direct Answering (DA). Our analysis reveals that: (1) CA significantly outperforms CNA and DA; (2) LLMs excel at simple tasks but consistently underperform domain experts on compositional tasks requiring parameter tuning and multi-step reasoning; and (3) the reasoning mechanism introduced in LRMs does not yield substantial performance gains. To improve performance, we explore four prompting strategies and fine-tuning approaches (using our newly released sensor-processing corpus). The results show that self-verification prompting is the most effective, outperforming all other methods in 48% of tasks, while fine-tuning yields only marginal gains. Our analysis suggests that more sophisticated interaction frameworks, such as signal-level self-verification, may bridge the gap to human expert-level performance. This benchmark provides a foundation for evaluating and improving LLMs in sensing applications.
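
The description mentions signal-level self-verification without showing what such a loop might look like. The following is a minimal sketch of one possible reading, not the SensorBench implementation: a proposer suggests a filter parameter, the result is checked directly on the signal, and the check is fed back as text for the next attempt. Every name here (`self_verify`, `make_stub_proposer`, `roughness`, `make_noisy_sine`) is hypothetical, the proposer is a stub standing in for an LLM call, and the roughness threshold is an assumed acceptance criterion.

```python
import math
import random
from typing import Callable, List, Optional, Tuple


def make_noisy_sine(n: int = 400, period: int = 200,
                    sigma: float = 0.3, seed: int = 0) -> List[float]:
    """Synthetic test signal (assumption): a sine wave plus Gaussian noise."""
    rng = random.Random(seed)
    return [math.sin(2 * math.pi * t / period) + rng.gauss(0.0, sigma)
            for t in range(n)]


def moving_average(signal: List[float], window: int) -> List[float]:
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def roughness(signal: List[float]) -> float:
    """Mean squared first difference: a crude, reference-free noise proxy.

    Assumes the signal has at least two samples.
    """
    diffs = [b - a for a, b in zip(signal, signal[1:])]
    return sum(d * d for d in diffs) / len(diffs)


def make_stub_proposer(start: int = 3, step: int = 4) -> Callable[[Optional[str]], int]:
    """Stand-in for an LLM: widens the window whenever it receives feedback."""
    state = {"window": start}

    def propose(feedback: Optional[str]) -> int:
        if feedback is not None:
            state["window"] += step
        return state["window"]

    return propose


def self_verify(signal: List[float],
                propose: Callable[[Optional[str]], int],
                max_rounds: int = 5,
                target_ratio: float = 0.05) -> Optional[Tuple[int, float, int]]:
    """Propose, check on the signal itself, and retry with textual feedback.

    Returns (window, roughness_ratio, rounds_used), or None if no proposal
    passes the check within max_rounds.
    """
    baseline = roughness(signal)
    feedback: Optional[str] = None
    for round_idx in range(max_rounds):
        window = propose(feedback)
        ratio = roughness(moving_average(signal, window)) / baseline
        if ratio <= target_ratio:
            return window, ratio, round_idx + 1
        feedback = (f"window={window} left roughness ratio {ratio:.3f}, "
                    f"above the target {target_ratio}; try a larger window")
    return None
```

Calling `self_verify(make_noisy_sine(), make_stub_proposer())` runs the loop end to end. In a real setting the stub would be replaced by a model call that receives the feedback string, and the verification metric would be chosen per task rather than fixed to roughness.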

Provided by:

Pengrui Quan
