
Daily bus transportation demand - São Paulo-Brazil - Jan-2017 to Jun-2022

Source: Mendeley Data. Updated: 2024-05-17. Indexed: 2024-06-27.
Download link:
https://zenodo.org/records/7082473
Resource description:
The raw demand data was obtained from SPTRANS, the public company responsible for bus transportation in São Paulo. The bus network is São Paulo's largest public transportation network and is mostly managed by SPTRANS. SPTRANS provides bus services during business hours and night shifts, along with dedicated services for impaired and disabled citizens who need support traveling to hospitals and health centers. SPTRANS publishes daily mobility data online for all bus lines in São Paulo, a valuable resource for data mining, urban geography studies, and transportation planning. With this data, it is possible to study the impacts of pandemics on São Paulo's bus transportation network, including regional differences and whether different kinds of lines were affected differently.

The data is distributed in XLSX format and contains, for each line, the line name, the demand by kind of user (users who pay with cash, users who use travel cards, and elderly users, who do not need to pay for travel), and the total demand. The line name is a string containing a code made of letters and numbers. Demand data from January 2017 to June 2022 were downloaded automatically and read by a Python script, using the pandas library to build dataframes.

Using SPTRANS data posed significant challenges: row and column names were formatted irregularly over the years, and keywords such as terminal stations and metro stations were abbreviated inconsistently (e.g. "term.", "terminal", "metr", "m"). These issues were solved by extracting the line code and cross-referencing the data with General Transit Feed Specification (GTFS) data, which SPTRANS also provides on an almost weekly basis and which describes the routes and schedules of lines for a given time period. This step enriched the data with standardized names and route points.
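The code-based matching described above can be sketched as follows. This is a minimal illustration, not the authors' script: the code pattern in `CODE_RE`, the helper `extract_code`, the demand column names, and the sample rows are all assumptions made for the example. Only `route_id` and `route_long_name` are standard GTFS `routes.txt` fields, and using `route_id` as the join key is itself an assumption.

```python
import re
import pandas as pd

# Assumed code shape: four letters/digits, a hyphen, two digits, at the
# start of the line name. The real SPTRANS pattern may differ.
CODE_RE = re.compile(r"^\s*([A-Z0-9]{4}-[0-9]{2})")


def extract_code(line_name):
    """Return the line code found at the start of a raw line name, or None."""
    match = CODE_RE.match(str(line_name).upper())
    return match.group(1) if match else None


# Made-up demand rows with inconsistently abbreviated line names.
demand = pd.DataFrame({
    "line_name": [
        "1012-10 TERM. EXAMPLE A - EXAMPLE B",
        "n105-11 terminal example c / metr example d",
        "sem codigo",
    ],
    "total": [1200, 5400, 10],
})

# Made-up stand-in for a GTFS routes.txt table.
routes = pd.DataFrame({
    "route_id": ["1012-10", "N105-11"],
    "route_long_name": ["Standardized Name 1", "Standardized Name 2"],
})

# Join on the extracted code so the free-text abbreviations no longer matter.
demand["route_id"] = demand["line_name"].map(extract_code)
enriched = demand.merge(routes, on="route_id", how="left")
print(enriched[["route_id", "route_long_name", "total"]])
```

A left join keeps demand rows whose code could not be extracted or matched, so they can be inspected rather than silently dropped.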
Another difficulty in obtaining the data through automatic scripts is that the download links do not follow a regular naming pattern. A significant number of the downloaded XLSX files were also malformed, so that pandas read empty rows as if they contained content, triggering an error related to a prohibitively large number of rows. This was solved by opening each affected file individually and re-saving it. The authors do not understand the origin of this error. All bus lines were enriched with geospatial information extracted from GTFS, containing all points belonging to each route. After these steps, the data was organized in a matrix in which each row represents a day and each column represents a line. The geospatial information was stored separately. Because bus lines are created or deactivated with some regularity in São Paulo, only the lines that remained active throughout the observation period were kept in the sample.
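The day-by-line matrix and the filtering of lines can be sketched with pandas as below. The column names, the sample values, and the rule used for "active" (a line has a demand record on every day in the table) are assumptions for illustration; the source does not state the exact criterion the authors applied.

```python
import pandas as pd

# Made-up long-format records: one row per (day, line).
records = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-01-01", "2017-01-01",
        "2017-01-02", "2017-01-02",
        "2017-01-03",
    ]),
    "route_id": ["A", "B", "A", "B", "A"],
    "total": [100, 50, 110, 55, 120],
})

# Rows are days, columns are lines; missing (day, line) pairs become NaN.
matrix = records.pivot_table(
    index="date", columns="route_id", values="total", aggfunc="sum"
)

# Assumed criterion: keep only lines with a value on every day.
active = matrix.dropna(axis="columns", how="any")
print(active)
```

Here line `B` has no record on the third day, so only line `A` remains in `active`.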
Created:
2023-06-28