Occupancy Rates of Different Freeway Lanes in the San Francisco Bay Area (Dataset)

Name: Occupancy Rates of Different Freeway Lanes in the San Francisco Bay Area (Dataset)
Creator: 帕依提提
License: No description available

Indexed by 帕依提提 on 2024-03-04

Download link:

https://www.payititi.com/opendatasets/show-26200.html

Resource description:

Data Set Information:

We downloaded 15 months' worth of daily data from the California Department of Transportation PEMS website [Web link]. The data describe the occupancy rate, between 0 and 1, of different car lanes of San Francisco Bay Area freeways. The measurements cover the period from Jan. 1st 2008 to Mar. 30th 2009 and are sampled every 10 minutes. We consider each day in this database as a single time series of dimension 963 (the number of sensors that functioned consistently throughout the studied period) and length 6 x 24 = 144. We remove public holidays from the dataset, as well as two days with anomalies (March 9th 2008 and March 8th 2009) on which all sensors were muted between 2:00 and 3:00 AM. This results in a database of 440 time series.

The task we propose on this dataset is to classify each observed day as the correct day of the week, from Monday to Sunday, i.e. to label it with an integer in {1,2,3,4,5,6,7}.

I will keep separate copies of this database on my website in Matlab format. If you use Matlab, it might be more convenient to work with these .mat files directly.

Data-Format
-------------

There are two files for each fold: the data file and the labels file. We have split the 440 time series between train and test folds, but you are of course free to merge them to consider a different cross-validation setting.

- The PEMS_train text file has 263 lines. Each line describes one time series, provided as a matrix. The matrix syntax is that of Matlab, e.g. [ a b ; c d] is the matrix with row vectors [a b] and [c d], in that order. Each matrix holds the occupancy rates (963 rows, one for each station/detector) sampled every 10 minutes during the day (144 columns).
- The PEMS_trainlabel text file gives, for each day of measurements described above, the day of the week on which the data was sampled, namely an integer between 1 (Mon.) and 7 (Sun.).
- PEMS_test and PEMS_testlabels are formatted in the same way, except that there are 173 test instances.
- The permutation that I used to shuffle the dataset is given in the randperm file. If you need to rearrange the data so that it follows calendar order, you should merge the train and test samples and reorder them using the inverse permutation of randperm.

Attribute Information:

Each attribute is the occupancy rate (between 0 and 1) measured at one sensor location, as recorded by a measuring station, at a given timestamp during the day. The ID of each station is given in the stations_list text file. For more information on the location (GPS, Highway, Direction) of each station, please refer to the PEMS website. There are 963 (stations) x 144 (timestamps) = 138,672 attributes for each record.

Relevant Papers:

M. Cuturi, Fast Global Alignment Kernels, Proceedings of the International Conference on Machine Learning, 2011.

Citation Request: Please refer to the Machine Learning Repository's citation policy.

Source: California Department of Transportation, www.pems.dot.ca.gov

Creator: Marco Cuturi, Kyoto University, mcuturi '@' i.kyoto-u.ac.jp
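The matrix-per-line layout described above can be read with a few lines of Python. What follows is only a minimal sketch under stated assumptions, not the author's loader: the helper names `parse_matlab_matrix`, `load_days`, and `flatten` are hypothetical, and the sketch assumes every line is a single bracketed literal whose rows are separated by `;` and whose entries are separated by spaces or commas.

```python
from itertools import chain


def parse_matlab_matrix(line):
    """Parse one Matlab-style literal such as "[1 2 ; 3 4]" into a list of rows."""
    body = line.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("expected a matrix literal wrapped in [ ... ]")
    rows = []
    for row_text in body[1:-1].split(";"):
        # Assumption: entries are separated by spaces or commas.
        fields = row_text.replace(",", " ").split()
        if fields:  # skip empty rows, e.g. after a trailing semicolon
            rows.append([float(value) for value in fields])
    if len({len(row) for row in rows}) > 1:
        raise ValueError("rows have different lengths")
    return rows


def load_days(path):
    """Read a text file holding one matrix literal per line (one line per day)."""
    with open(path, encoding="utf-8") as handle:
        return [parse_matlab_matrix(line) for line in handle if line.strip()]


def flatten(matrix):
    """Concatenate the rows into one flat attribute vector."""
    return list(chain.from_iterable(matrix))


if __name__ == "__main__":
    day = parse_matlab_matrix("[0.1 0.2 ; 0.3 0.4]")
    print(len(day), len(day[0]))  # number of rows, number of columns
    print(flatten(day))
```

For a real day from PEMS_train, the parsed matrix should have 963 rows and 144 columns, and `flatten` should return 963 x 144 = 138,672 values. If NumPy is available, `numpy.asarray(day)` converts the nested list into a 2-D array.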

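The description says that calendar order can be restored by merging the train and test samples and applying the inverse of the permutation stored in the randperm file. The sketch below illustrates that idea only. It rests on assumptions the description does not confirm: that the indices are 1-based (as in Matlab), that the shuffle was `shuffled[i] = calendar[perm[i]]`, and that the merged list is train followed by test. The function names are hypothetical.

```python
def invert_permutation(perm):
    """Return inv such that inv[perm[i]] == i, for a 0-based permutation."""
    inverse = [None] * len(perm)
    for position, target in enumerate(perm):
        if not 0 <= target < len(perm) or inverse[target] is not None:
            raise ValueError("not a valid permutation")
        inverse[target] = position
    return inverse


def restore_calendar_order(shuffled, perm_one_based):
    """Undo a shuffle defined by shuffled[i] = calendar[perm[i]].

    ASSUMPTIONS (not confirmed by the dataset description): indices are
    1-based, and `shuffled` is the train samples followed by the test samples.
    """
    perm = [index - 1 for index in perm_one_based]
    if len(perm) != len(shuffled):
        raise ValueError("permutation and data lengths differ")
    inverse = invert_permutation(perm)
    # calendar[k] is the shuffled item found at position inverse[k].
    return [shuffled[inverse[k]] for k in range(len(shuffled))]


if __name__ == "__main__":
    calendar = ["day1", "day2", "day3", "day4"]
    perm = [3, 1, 4, 2]  # hypothetical stand-in for the randperm contents
    shuffled = [calendar[p - 1] for p in perm]
    print(restore_calendar_order(shuffled, perm))
```

Labels have to be reordered with the same permutation as the data so that each day keeps its label. If the shuffle was in fact defined the other way around (`shuffled[perm[i]] = calendar[i]`), the forward permutation rather than its inverse restores the order, so it is worth checking the result against the known weekday sequence.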

Provided by:

帕依提提

Collected summary

Dataset introduction

Background and challenges

Background overview

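This dataset contains occupancy rates for different lanes of San Francisco Bay Area freeways, covering January 1, 2008 to March 30, 2009 and sampled every 10 minutes. It comprises 440 daily time series, each with 963 sensor dimensions and 144 time steps. It is intended mainly for classification: predicting the day of the week (Monday through Sunday) from a day's occupancy data.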

The above content was collected and summarized by 遇见数据集.