houqiiiii/TopoSense-Bench

Name: houqiiiii/TopoSense-Bench
Creator: houqiiiii
Published: 2025-12-09 15:35:37
License: MIT

Source: Hugging Face. Updated 2025-12-09; indexed 2025-12-20.

Download link:

https://hf-mirror.com/datasets/houqiiiii/TopoSense-Bench

Resource description:

---
license: mit
task_categories:
- text-generation
- question-answering
- robotics
- reinforcement-learning
- graph-ml
language:
- en
tags:
- planning
- iot
- sensor-scheduling
- spatial-reasoning
- trajectory-planning
size_categories:
- 1K<n<10K
configs:
- config_name: queries
  data_files: "data/queries.jsonl"
- config_name: topology
  data_files: "data/topology.jsonl"
---

# TopoSense-Bench: A Campus-Scale Benchmark for Semantic-Spatial Sensor Scheduling

**TopoSense-Bench** is a large-scale, rigorous benchmark designed to evaluate Large Language Models (LLMs) and agents on the **Semantic-Spatial Sensor Scheduling (S³)** problem. It features a realistic digital twin of a university campus equipped with **2,510 cameras** and contains **5,250 natural language queries** grounded in physical topology.

This dataset is the official benchmark for the **ACM MobiCom 2026** paper: **"IoT-Brain: Grounding LLMs for Semantic-Spatial Sensor Scheduling"**.

## 🚀 Dataset Summary

Intelligent systems powered by large-scale sensor networks are shifting from predefined monitoring to intent-driven operation. However, bridging the gap between high-level human intent (e.g., *"Find my backpack lost between the library and the gym"*) and precise sensor actions requires complex spatial reasoning.

**TopoSense-Bench** addresses this by providing:

- **A Massive Topological Knowledge Base**: Covering **33 buildings**, **161 floor plans**, and **2,510 sensors**, built based on real-world OpenStreetMap (OSM) data and semi-automatically annotated with sensor attributes.
- **Hierarchical Query Complexity**: 5,250 queries organized into 3 tiers of difficulty, ranging from simple intra-zone perception to complex cross-building coordination.
- **Rigorous Grounding**: Every query is paired with a ground-truth topological answer, verified through a strict "Expert Template -> LLM Paraphrasing -> Manual Curation" pipeline to ensure logical soundness.

## 📂 Dataset Structure

The dataset allows loading two distinct configurations: `queries` and `topology`.

### 1. Configuration: `queries`

This subset contains the natural language instructions and their corresponding ground truth.

- **Data Fields**:
  - `category`: The complexity tier of the task (e.g., "T3.b Hybrid Indoor-Outdoor").
  - `query`: The natural language request from the user (e.g., *"Please check all cameras on the path from the library 4F to the hospital 1F..."*).
  - `answer`: The ground truth sensor ID or topological node info.
- **Taxonomy**:
  - **Tier 1 (Intra-Zone)**: Focal Scene Awareness, Panoramic Coverage.
  - **Tier 2 (Intra-Building)**: Intra-Building Coordination.
  - **Tier 3 (Inter-Building)**: Open-Space Coordination, Hybrid Indoor-Outdoor Navigation.

### 2. Configuration: `topology`

This subset contains the raw topological data used to construct the world model graph. It mimics the storage structure of a distributed knowledge base.

- **Data Fields**:
  - `building`: The name of the facility (e.g., "library", "information_building").
  - `floor`: The specific floor level (e.g., "1F", "4F").
  - `filename`: The original source filename.
  - `content`: The raw text content containing node coordinates, connections, and sensor tags.

## 💻 Quick Start

You can load the dataset directly using the Hugging Face `datasets` library.

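Before loading the real files, the two record schemas described under Dataset Structure can be made concrete with a minimal sketch on hand-written sample records. It assumes that `category` labels start with a tier code of the form `T<digit>.<letter>`, as in the card's "T3.b Hybrid Indoor-Outdoor" example. The helper names `parse_category` and `index_topology` are hypothetical and not part of the dataset's tooling, and the sample values are placeholders.

```python
import re
from collections import defaultdict

# Assumed shape of a category label, e.g. "T3.b Hybrid Indoor-Outdoor".
CATEGORY_RE = re.compile(r"^T(?P<tier>\d)\.(?P<sub>[a-z])\s+(?P<name>.+)$")


def parse_category(category):
    """Split a category label into (tier, sub-task letter, task name)."""
    match = CATEGORY_RE.match(category)
    if match is None:
        raise ValueError(f"unrecognized category label: {category!r}")
    return int(match["tier"]), match["sub"], match["name"]


def index_topology(records):
    """Group topology records by (building, floor) for quick lookup."""
    index = defaultdict(list)
    for record in records:
        index[(record["building"], record["floor"])].append(record)
    return dict(index)


# Illustrative records that follow the documented fields; values are made up.
sample_queries = [
    {"category": "T1.a Focal Scene Awareness", "query": "...", "answer": "..."},
    {"category": "T3.b Hybrid Indoor-Outdoor", "query": "...", "answer": "..."},
]
sample_topology = [
    {"building": "library", "floor": "4F", "filename": "a.txt", "content": "..."},
    {"building": "library", "floor": "1F", "filename": "b.txt", "content": "..."},
]

tiers = [parse_category(q["category"])[0] for q in sample_queries]
floors = index_topology(sample_topology)
print(tiers)           # [1, 3]
print(sorted(floors))  # [('library', '1F'), ('library', '4F')]
```

Keying the topology index on `(building, floor)` mirrors the way the `topology` records are described, one file per building floor; if the real labels deviate from the assumed pattern, `parse_category` raises instead of guessing.
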
### Loading Queries (Default)

```python
from datasets import load_dataset

# Load the benchmark queries
dataset = load_dataset("houqiiiii/TopoSense-Bench", "queries")

# Take the first available split; a config backed by a single data file
# is usually exposed as "train".
split = next(iter(dataset))

# Inspect the first example
print(dataset[split][0])
# Output:
# {
#   'category': 'T1.a Focal Scene Awareness',
#   'query': 'Please verify activity near the door to room 5F 1...',
#   'answer': 'Node(188, 303, Tags: ...)'
# }
```

### Loading Topology Data

```python
from datasets import load_dataset

# Load the topological knowledge base
topology = load_dataset("houqiiiii/TopoSense-Bench", "topology")

# Inspect a building file (first available split)
split = next(iter(topology))
print(topology[split][0])
```

## 🛠️ Data Construction & Privacy

### Construction Pipeline

To ensure the benchmark's quality and realism, we employed a three-stage pipeline:

1. **Expert Templating**: Domain experts authored ~200 base templates per tier to cover realistic operational scenarios.
2. **LLM Paraphrasing**: State-of-the-art LLMs (e.g., GPT-4o) were used to paraphrase the templates and introduce linguistic diversity.
3. **Manual Curation**: Every query underwent manual expert verification to guarantee logical soundness and topological solvability.

### Privacy & De-identification

The dataset is derived from a real-world university campus digital twin. To protect privacy:

- All sensitive location names have been de-identified (e.g., specific building names replaced with generic functional names like "Information Building").
- No real-world video feeds or images are included; the dataset operates entirely on the symbolic/topological level.

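Returning to the loading step in the Quick Start: the card's configuration maps each subset to a JSON Lines file (`data/queries.jsonl` and `data/topology.jsonl`). Assuming those files have been downloaded to a local directory, a sketch like the following reads them with the standard library only. The helper name `read_jsonl` is hypothetical, and the in-memory sample (with made-up values) stands in for a real file.

```python
import io
import json


def read_jsonl(stream):
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# In-memory stand-in for a local copy of data/queries.jsonl (values are made up).
sample = io.StringIO(
    '{"category": "T1.a Focal Scene Awareness", "query": "...", "answer": "..."}\n'
    "\n"
    '{"category": "T2.a Intra-Building Coordination", "query": "...", "answer": "..."}\n'
)
records = list(read_jsonl(sample))
print(len(records))            # 2
print(records[0]["category"])  # T1.a Focal Scene Awareness
```

For a real file, pass an open handle instead, for example `open("data/queries.jsonl", encoding="utf-8")` inside a `with` block.
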
## 📜 Citation

If you use **TopoSense-Bench** in your research, please cite our **MobiCom '26** paper:

```bibtex
@inproceedings{iotbrain2026,
  title={IoT-Brain: Grounding LLMs for Semantic-Spatial Sensor Scheduling},
  author={Anonymous Author(s)},
  booktitle={Proceedings of the 32nd Annual International Conference on Mobile Computing and Networking (MobiCom '26)},
  year={2026},
  publisher={ACM}
}
```

## 📄 License

This dataset is licensed under the [MIT License](https://opensource.org/licenses/MIT).
