train.csv.zip
Source: DataCite Commons. Updated: 2025-06-01. Indexed: 2024-08-18.
Download link:
https://figshare.com/articles/dataset/train_csv_zip/24796053/1
Description:
We provide a dataset describing a complete year (from 01/07/2013 to 30/06/2014) of the trajectories of all 442 taxis running in the city of Porto, Portugal, as one CSV file named "train.csv". These taxis operate through a taxi dispatch central, using mobile data terminals installed in the vehicles. Each ride falls into one of three categories: A) taxi central based, B) stand-based, or C) non-taxi central based. For the first category, we provide an anonymized id when such information is available from the telephone call. The last two categories refer to services requested directly from taxi drivers, either B) at a taxi stand or C) on a random street.

Each data sample corresponds to one completed trip. It contains a total of 9 (nine) features, described as follows:

TRIP_ID (String): A unique identifier for each trip.
CALL_TYPE (char): The way the service was requested. It takes one of three possible values:
    'A' if the trip was dispatched from the central;
    'B' if the trip was requested directly from a taxi driver at a specific stand;
    'C' otherwise (i.e. a trip requested on a random street).
ORIGIN_CALL (integer): A unique identifier for each phone number that was used to request at least one service. It identifies the trip's customer if CALL_TYPE='A'. Otherwise, it takes a NULL value.
ORIGIN_STAND (integer): A unique identifier for the taxi stand. It identifies the starting point of the trip if CALL_TYPE='B'. Otherwise, it takes a NULL value.
TAXI_ID (integer): A unique identifier for the taxi driver who performed each trip.
TIMESTAMP (integer): Unix timestamp (in seconds). It identifies the trip's start.
DAYTYPE (char): The day type of the trip's start. It takes one of three possible values:
    'B' if the trip started on a holiday or any other special day (i.e. extending holidays, floating holidays, etc.);
    'C' if the trip started on a day before a type-B day;
    'A' otherwise (i.e. a normal day, workday or weekend).
MISSING_DATA (Boolean): FALSE when the GPS data stream is complete and TRUE whenever one or more locations are missing.
POLYLINE (String): A list of GPS coordinates (WGS84 format) encoded as a string. The beginning and the end of the string are marked with brackets ([ and ], respectively). Each pair of coordinates is enclosed in the same brackets as [LONGITUDE, LATITUDE]. The list contains one pair of coordinates for each 15 seconds of the trip. The first item is the trip's start and the last item is its destination.

The total travel time of the trip (the prediction target of this competition) is defined as (number of points - 1) x 15 seconds. For example, a trip with 101 data points in POLYLINE has a length of (101 - 1) x 15 = 1500 seconds. Some trips have missing data points in POLYLINE, as indicated by the MISSING_DATA column, and how to use this knowledge is part of the challenge.

Acknowledgements
Data from the ECML/PKDD 15: Taxi Trip Time Prediction (II) Competition.

Inspiration
This dataset was added because competition datasets do not appear in the dataset search, and it could help in learning basic methods in geo-spatial analysis and trajectory handling.
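The travel-time definition above can be sketched in Python as follows. This is a minimal illustration, not part of the dataset or the competition code: the function names are hypothetical, it assumes POLYLINE is valid JSON (which the bracketed [LONGITUDE, LATITUDE] format suggests), and treating an empty or single-point trip as 0 seconds is an assumption.

```python
import json
from datetime import datetime, timezone

# One GPS fix every 15 seconds, as stated in the POLYLINE description.
SAMPLE_INTERVAL_S = 15


def parse_polyline(polyline: str) -> list[tuple[float, float]]:
    """Parse a POLYLINE string into (longitude, latitude) tuples.

    Assumes the string is valid JSON, e.g. "[[-8.61, 41.14], [-8.62, 41.15]]".
    """
    return [(lon, lat) for lon, lat in json.loads(polyline)]


def travel_time_seconds(polyline: str) -> int:
    """Travel time = (number of points - 1) x 15 seconds.

    Assumption: trips with zero or one point are reported as 0 seconds.
    """
    n_points = len(parse_polyline(polyline))
    return max(n_points - 1, 0) * SAMPLE_INTERVAL_S


def trip_start_utc(timestamp: int) -> datetime:
    """Convert the TIMESTAMP column (Unix seconds) to a UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


if __name__ == "__main__":
    # Made-up three-point trajectory, used only for illustration.
    sample = "[[-8.618643, 41.141412], [-8.618499, 41.141376], [-8.620326, 41.14251]]"
    print(travel_time_seconds(sample))   # (3 - 1) x 15 seconds
    print(trip_start_utc(1372636800))    # first day covered by the dataset, in UTC
```

Rows with MISSING_DATA set to TRUE have gaps in POLYLINE, so the point count may understate the real duration of those trips; how to handle them is left open, as in the description.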
Provider:
figshare
Created:
2023-12-12
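Collected Summary
Dataset Introduction

Background and Challenges
Background Overview
This dataset contains the complete trajectories of 442 taxis in Porto, Portugal, over a full year from 1 July 2013 to 30 June 2014. Each sample is one trip with 9 features, including the trip ID, call type, and GPS trajectory. It is suited to learning methods for geo-spatial analysis and trajectory handling.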
The content above was collected and summarized by 遇见数据集.



