Urban Wireless Sensor Network (WSN) Dataset for Environmental Monitoring, Communication Analysis, and Anomaly Detection
Source: DataCite Commons | Updated: 2025-05-12 | Indexed: 2025-04-15
Download link:
https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/CM5QX9
Description:
This synthetic dataset models the data that a real-world **urban Wireless Sensor Network (WSN)** would collect for environmental monitoring, communication analysis, and anomaly detection. It simulates the activity of **500 sensor nodes** distributed across an urban environment and contains **1,000,000 data points**. Each record holds environmental variables, communication metrics, and operational data for one sensor node, supporting research in fields such as IoT-based smart cities, sensor network optimization, anomaly detection, and urban environmental studies.
The dataset is available in **Parquet** and **CSV** formats; each file contains **1,000,000 rows** and **12 columns**. Parquet suits large-scale data processing because of its columnar storage and efficient compression, while CSV is compatible with a wide range of analysis tools and platforms.
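As a minimal sketch of reading the CSV variant, the Python snippet below streams rows with the standard library and converts each field to the type listed in the feature descriptions. It assumes the file has a header row using the 12 documented column names and that timestamps are ISO 8601 strings; neither detail is confirmed by the description. The in-memory sample row is made up for illustration.

```python
import csv
import io
from datetime import datetime

# Columns documented as Float in the feature list.
FLOAT_COLUMNS = (
    "temperature", "humidity", "ambient_light", "sensor_reading",
    "signal_strength", "battery_level", "latitude", "longitude",
    "packet_loss_rate",
)


def parse_row(raw):
    """Convert one CSV row (a dict of strings) to typed values."""
    row = {name: float(raw[name]) for name in FLOAT_COLUMNS}
    row["sensor_id"] = int(raw["sensor_id"])
    # Assumption: timestamps look like "2024-01-15 08:30:00" (ISO 8601).
    row["timestamp"] = datetime.fromisoformat(raw["timestamp"])
    # Assumption: the flag is stored as the text "0" or "1".
    row["anomalous_event"] = int(raw["anomalous_event"])
    return row


def iter_rows(handle):
    """Yield typed rows one at a time so the full file never sits in memory."""
    for raw in csv.DictReader(handle):
        yield parse_row(raw)


# Tiny in-memory sample standing in for the real file (values are made up).
SAMPLE = io.StringIO(
    "sensor_id,timestamp,temperature,humidity,ambient_light,sensor_reading,"
    "signal_strength,battery_level,latitude,longitude,packet_loss_rate,"
    "anomalous_event\n"
    "17,2024-01-15 08:30:00,21.5,55.0,420.0,37.2,-71.3,88.0,"
    "40.7301,-73.9902,1.2,0\n"
)

rows = list(iter_rows(SAMPLE))
print(rows[0]["sensor_id"], rows[0]["temperature"])  # 17 21.5
```

For the real file, pass an open file handle (opened with `newline=""`) to `iter_rows` instead of `SAMPLE`. Streaming keeps memory use flat across the 1,000,000 rows.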
Dataset Features:
1. `sensor_id` (Integer):
- Unique identifier for each of the 500 sensor nodes in the network.
- Range: **1 to 500**.
- Purpose: Distinguishes between different sensor nodes, allowing analysis of node-specific behavior.
2. `timestamp` (Datetime):
- The exact time at which each sensor reading was recorded.
- Range: Randomly generated timestamps spanning one year.
- Purpose: Enables time-series analysis, trend discovery, and temporal anomaly detection. Useful for studying patterns over time, such as seasonal environmental changes or sensor failures.
3. `temperature` (Float):
- The temperature reading recorded by each sensor node (in Celsius).
- Range: **10°C to 40°C**.
- Purpose: Captures temperature variations in the urban area, which could be used for climate studies, urban heat mapping, or environmental modeling.
4. `humidity` (Float):
- The relative humidity recorded by the sensor node (in percentage).
- Range: **20% to 90%**.
- Purpose: Useful for studying atmospheric conditions, correlating humidity with other environmental variables, or examining anomalies related to sensor faults or weather conditions.
5. `ambient_light` (Float):
- The level of ambient light measured by the sensor (in Lux).
- Range: **100 to 1000 Lux**.
- Purpose: Useful for urban lighting studies, detecting lighting failures in smart cities, or assessing sunlight exposure patterns in specific locations.
6. `sensor_reading` (Float):
- The general sensor data reading (arbitrary units).
- Range: **0 to 100**.
- Purpose: Represents operational sensor output. It could be an aggregation of different parameters, or abstract sensor readings used for system health analysis or anomaly detection.
7. `signal_strength` (Float):
- The strength of the signal transmitted by the sensor node, measured in decibel-milliwatts (dBm).
- Range: **-100 dBm to -30 dBm**.
- Purpose: Reflects communication performance and network reliability. Can be used to study signal attenuation in urban environments or evaluate network performance under different conditions.
8. `battery_level` (Float):
- The remaining battery level of the sensor node (in percentage).
- Range: **10% to 100%**.
- Purpose: Monitors sensor node power levels. It can be used to optimize sensor node maintenance, analyze power consumption patterns, or develop energy-efficient algorithms for IoT networks.
9. `latitude` (Float):
- The geographical latitude of the sensor node.
- Range: **40.7128 to 40.7484** (simulating a section of New York City).
- Purpose: Useful for geospatial analysis of sensor data, identifying patterns based on location, and integrating with mapping tools for visualization.
10. `longitude` (Float):
- The geographical longitude of the sensor node.
- Range: **-74.0060 to -73.9352**.
- Purpose: Paired with latitude, it allows spatial analysis of the network's behavior and performance. It can also be used for geolocation-based anomaly detection or optimization.
11. `packet_loss_rate` (Float):
- The percentage of data packets lost during communication between the sensor and the central network.
- Range: **0% to 5%**.
- Purpose: Assesses the reliability of sensor communication. Can be used to detect network issues, optimize routing protocols, or improve network robustness.
12. `anomalous_event` (Binary):
- A binary flag indicating whether an anomalous event occurred (0 = Normal, 1 = Anomaly).
- Range: **0 or 1** (5% of the data is labeled as anomalies).
- Purpose: Enables research on anomaly detection, failure prediction, and system reliability improvement. The anomalies could represent sensor malfunctions, environmental events, or communication failures.
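The documented ranges above can double as a sanity check after loading. The sketch below encodes them in a lookup table and reports which columns of a row fall outside their range. It treats both bounds as inclusive, which is an assumption because the description does not say, and the helper name `out_of_range` and the sample rows are hypothetical.

```python
# Documented (min, max) range per numeric column, copied from the feature list.
# Assumption: both bounds are inclusive.
EXPECTED_RANGES = {
    "sensor_id": (1, 500),
    "temperature": (10.0, 40.0),
    "humidity": (20.0, 90.0),
    "ambient_light": (100.0, 1000.0),
    "sensor_reading": (0.0, 100.0),
    "signal_strength": (-100.0, -30.0),
    "battery_level": (10.0, 100.0),
    "latitude": (40.7128, 40.7484),
    "longitude": (-74.0060, -73.9352),
    "packet_loss_rate": (0.0, 5.0),
    "anomalous_event": (0, 1),
}


def out_of_range(row):
    """Return the names of columns whose value lies outside its documented range."""
    violations = []
    for name, (low, high) in EXPECTED_RANGES.items():
        if not low <= row[name] <= high:
            violations.append(name)
    return violations


# Made-up rows for illustration: one valid, one with two bad values.
good = {
    "sensor_id": 17, "temperature": 21.5, "humidity": 55.0,
    "ambient_light": 420.0, "sensor_reading": 37.2,
    "signal_strength": -71.3, "battery_level": 88.0,
    "latitude": 40.7301, "longitude": -73.9902,
    "packet_loss_rate": 1.2, "anomalous_event": 0,
}
bad = dict(good, humidity=95.0, signal_strength=-20.0)

print(out_of_range(good))  # []
print(out_of_range(bad))   # ['humidity', 'signal_strength']
```

The `timestamp` column is left out because the description gives its span (one year) but not its start and end dates.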
Potential Applications of the Dataset:
1. Anomaly Detection:
- The dataset provides rich information on anomalous events, which can be used for developing, testing, and benchmarking anomaly detection algorithms. Researchers can use the data to identify patterns that signal failures in sensor nodes, such as rapid battery depletion or sudden changes in environmental conditions.
2. Time-Series Analysis:
- With timestamps available, the dataset is ideal for studying temporal trends and seasonality in environmental parameters like temperature, humidity, and ambient light. Time-series models such as ARIMA, LSTM, or Prophet can be applied for predictive analysis.
3. Network Performance Optimization:
- The signal strength, packet loss, and battery level data provide a basis for evaluating network performance. This dataset can be used to optimize communication protocols, enhance sensor node energy efficiency, and improve overall network reliability in IoT deployments.
4. Geospatial Analysis:
- With latitude and longitude provided for each sensor, the dataset can be integrated with geospatial tools like GIS for mapping and spatial analysis. This allows users to study spatial patterns, perform location-based clustering, or correlate environmental variables with geographical factors.
5. Energy Consumption Studies:
- Battery level data allows researchers to explore energy consumption patterns and identify factors that drain battery life. This is especially useful for developing energy-efficient IoT systems, optimizing sensor operation schedules, and improving sensor node design.
6. Environmental Monitoring and Prediction:
- The dataset’s environmental features (temperature, humidity, ambient light) make it suitable for urban climate modeling, air quality studies, and smart city planning. Machine learning models can be trained on this data to predict weather changes or air quality, or to optimize public lighting systems.
7. Urban IoT Systems:
- The dataset is a valuable resource for simulating and evaluating large-scale IoT systems deployed in smart cities. It allows for experimentation with sensor placements, network configuration, and smart city use cases, such as real-time monitoring and automated maintenance.
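To illustrate the anomaly detection use case, here is a minimal baseline sketch: it flags values whose z-score exceeds a threshold and scores the flags against the `anomalous_event` label with precision and recall. The z-score detector is an illustrative choice, not a method named by the dataset authors. The description also does not say how the anomaly label relates to the other columns, so a simple detector like this may or may not recover it on the real data. The toy readings and labels below are made up.

```python
from statistics import mean, pstdev


def zscore_flags(values, threshold):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: nothing stands out.
        return [0] * len(values)
    return [1 if abs(v - mu) / sigma > threshold else 0 for v in values]


def precision_recall(predicted, actual):
    """Compare predicted 0/1 flags with the ground-truth 0/1 labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy stand-ins for the `sensor_reading` and `anomalous_event` columns.
readings = [50.0, 51.0, 49.0, 50.0, 52.0, 48.0, 50.0, 51.0, 49.0, 95.0]
labels = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

flags = zscore_flags(readings, threshold=2.0)
print(precision_recall(flags, labels))  # (1.0, 1.0)
```

With roughly 5% of rows labeled anomalous, precision and recall are more informative than plain accuracy, since a detector that never flags anything would still be about 95% accurate.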
Dataset Summary:
- Rows: 1,000,000 (data points)
- Columns: 12 (features)
- File Formats: Parquet and CSV
- Size: Varies by format (Parquet is more space-efficient due to compression)
Conclusion:
This Urban WSN dataset is a multi-dimensional resource for researchers and practitioners working on IoT, anomaly detection, environmental studies, and communication networks. Its mix of environmental, communication, and operational features suits applications such as predictive modeling, spatial analysis, and system optimization. With 1,000,000 simulated records, it supports large-scale experiments in smart city technologies and sensor network management.
Provider:
Harvard Dataverse
Created:
2024-09-06



