1 Taxi GPS data processing and visualization

Taxi GPS data processing

In this example, we show how to use the TransBigData package to process taxi GPS data efficiently.
First, import TransBigData and read the data with pandas:
[1]:
import transbigdata as tbd
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

# Read data
data = pd.read_csv('data/TaxiData-Sample.csv', header=None)
data.columns = ['VehicleNum', 'Time', 'Lng', 'Lat', 'OpenStatus', 'Speed']
data.head()
[1]:
VehicleNum Time Lng Lat OpenStatus Speed
0 34745 20:27:43 113.806847 22.623249 1 27
1 34745 20:24:07 113.809898 22.627399 0 0
2 34745 20:24:27 113.809898 22.627399 0 0
3 34745 20:22:07 113.811348 22.628067 0 0
4 34745 20:10:06 113.819885 22.647800 0 54
[2]:
# Read the GeoDataFrame of the study area
sz = gpd.read_file(r'data/sz.json')
sz.crs = None
sz.head()
[2]:
centroid_x centroid_y qh geometry
0 114.143157 22.577605 罗湖 POLYGON ((114.10006 22.53431, 114.10083 22.534...
1 114.041535 22.546180 福田 POLYGON ((113.98578 22.51348, 114.00553 22.513...
2 114.270206 22.596432 盐田 POLYGON ((114.19799 22.55673, 114.19817 22.556...
3 113.851387 22.679120 宝安 MULTIPOLYGON (((113.81831 22.54676, 113.81948 ...
4 113.926290 22.766157 光明 POLYGON ((113.99768 22.76643, 113.99704 22.766...
[3]:
fig = plt.figure(1, (8, 3), dpi=150)
ax1 = plt.subplot(111)
sz.plot(ax=ax1)
plt.xticks([], fontsize=10)
plt.yticks([], fontsize=10);
../_images/gallery_Example_1-Taxi_GPS_data_processing_5_0.png

Data preprocessing

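TransBigData integrates several common data preprocessing methods. Given the data and a GeoDataFrame of the study area, ``tbd.clean_outofshape`` removes the data that fall outside the study area. ``tbd.clean_taxi_status`` filters out records in which the passenger status (``OpenStatus``) changes only instantaneously. When using the preprocessing methods, pass the corresponding column names as parameters: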

[4]:
# Data Preprocessing
# Delete the data outside of the study area
data = tbd.clean_outofshape(data, sz, col=['Lng', 'Lat'], accuracy=500)

# Delete the data with instantaneous changes in passenger status
data = tbd.clean_taxi_status(data, col=['VehicleNum', 'Time', 'OpenStatus'])
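To make the idea behind the status filter concrete, here is a minimal pure-Python sketch. It is not the library's source. It assumes the records are sorted by vehicle and time. A record counts as an instantaneous status change when its ``OpenStatus`` differs from both the previous and the next record of the same vehicle, while those two neighbours agree with each other. ``drop_status_blips`` is a hypothetical name.

```python
from typing import Dict, List


def drop_status_blips(records: List[Dict]) -> List[Dict]:
    """Remove records whose OpenStatus flips for a single fix and flips straight back.

    Hypothetical helper; each record is a dict with 'VehicleNum', 'Time', 'OpenStatus'.
    """
    # Sort by vehicle, then time ('HH:MM:SS' strings sort chronologically)
    rows = sorted(records, key=lambda r: (r['VehicleNum'], r['Time']))
    kept = []
    for i, row in enumerate(rows):
        prev = rows[i - 1] if i > 0 else None
        nxt = rows[i + 1] if i + 1 < len(rows) else None
        # A blip: same vehicle on both sides, neighbours agree, current record disagrees
        is_blip = (
            prev is not None and nxt is not None
            and prev['VehicleNum'] == row['VehicleNum'] == nxt['VehicleNum']
            and prev['OpenStatus'] == nxt['OpenStatus'] != row['OpenStatus']
        )
        if not is_blip:
            kept.append(row)
    return kept


# Toy records for one vehicle: the 08:00:20 record is an isolated status flip
sample = [
    {'VehicleNum': 1, 'Time': '08:00:00', 'OpenStatus': 0},
    {'VehicleNum': 1, 'Time': '08:00:20', 'OpenStatus': 1},
    {'VehicleNum': 1, 'Time': '08:00:40', 'OpenStatus': 0},
    {'VehicleNum': 1, 'Time': '08:01:00', 'OpenStatus': 0},
    {'VehicleNum': 1, 'Time': '08:01:20', 'OpenStatus': 1},
    {'VehicleNum': 1, 'Time': '08:01:40', 'OpenStatus': 1},
]
print([r['Time'] for r in drop_status_blips(sample)])
# ['08:00:00', '08:00:40', '08:01:00', '08:01:20', '08:01:40']
```

The rule compares each record with its neighbours in the original sequence rather than removing records iteratively. That way, a genuine trip start such as the 08:01:20 record is kept.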

Data gridding

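The most basic way to represent the spatial distribution of data is a geographic grid: after gridding, each GPS point is mapped to its corresponding grid cell. Before gridding, you first need to determine the grid parameters (which can be understood as defining a grid coordinate system):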

[5]:
# Data gridding
# Define the bounds and generate gridding parameters
bounds = [113.6, 22.4, 114.8, 22.9]
params = tbd.area_to_params(bounds, accuracy=500)
print(params)
{'slon': 113.6, 'slat': 22.4, 'deltalon': 0.004872390756896538, 'deltalat': 0.004496605206422906, 'theta': 0, 'method': 'rect', 'gridsize': 500}
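The ``deltalon`` and ``deltalat`` values printed above can be reproduced by converting the 500 m cell size into degrees. Below is a minimal sketch under two assumptions: the Earth is a sphere with a radius of 6,371,004 m, and the longitude spacing is measured at the mean latitude of ``bounds``. ``grid_deltas`` is a hypothetical helper, not part of TransBigData.

```python
import math


def grid_deltas(bounds, accuracy):
    """Return (deltalon, deltalat) in degrees for square cells of `accuracy` metres."""
    lon1, lat1, lon2, lat2 = bounds
    earth_radius = 6371004  # metres; assumed spherical Earth
    # On a sphere, one degree of latitude spans the same distance everywhere
    metres_per_deg_lat = 2 * math.pi * earth_radius / 360
    # One degree of longitude shrinks with cos(latitude); use the mean latitude of the bounds
    mean_lat = math.radians((lat1 + lat2) / 2)
    metres_per_deg_lon = metres_per_deg_lat * math.cos(mean_lat)
    return accuracy / metres_per_deg_lon, accuracy / metres_per_deg_lat


deltalon, deltalat = grid_deltas([113.6, 22.4, 114.8, 22.9], accuracy=500)
print(deltalon, deltalat)  # close to the deltalon / deltalat printed above
```

Because a degree of longitude shrinks with latitude, ``deltalon`` is larger than ``deltalat`` for the same 500 m cell.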

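After obtaining the grid parameters, the next step is to map each GPS point to its grid cell. Use ``tbd.GPS_to_grid``, which returns the grid indices that are stored here in the ``LONCOL`` and ``LATCOL`` columns. Together, these two columns identify a grid cell: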

[6]:
# Mapping GPS data to grids
data['LONCOL'], data['LATCOL'] = tbd.GPS_to_grid(data['Lng'], data['Lat'], params)
data.head()
[6]:
VehicleNum Time Lng Lat OpenStatus Speed LONCOL LATCOL
0 34745 20:27:43 113.806847 22.623249 1 27 42 50
1 27368 09:08:53 113.805893 22.624996 0 49 42 50
2 22998 10:51:10 113.806931 22.624166 1 54 42 50
3 22998 10:11:50 113.805946 22.625433 0 43 42 50
4 22998 10:12:05 113.806381 22.623833 0 60 42 50
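The mapping itself is a floor division along each axis. The sketch below reproduces the indices in the table above by assuming that cell (0, 0) is centred on (``slon``, ``slat``). Under that assumption, each coordinate is shifted by half a cell before flooring. ``point_to_grid`` is a hypothetical helper, and it only covers the unrotated ``rect`` grid (``theta`` = 0) printed above.

```python
import math

# Grid parameters as printed by tbd.area_to_params above (rect method, theta = 0)
PARAMS = {'slon': 113.6, 'slat': 22.4,
          'deltalon': 0.004872390756896538, 'deltalat': 0.004496605206422906}


def point_to_grid(lon, lat, params=PARAMS):
    """Return (LONCOL, LATCOL) for one GPS point, assuming cell (0, 0) is centred on (slon, slat)."""
    # Shift by half a cell so that (slon, slat) is the centre, not the corner, of cell (0, 0)
    loncol = math.floor((lon - (params['slon'] - params['deltalon'] / 2)) / params['deltalon'])
    latcol = math.floor((lat - (params['slat'] - params['deltalat'] / 2)) / params['deltalat'])
    return loncol, latcol


print(point_to_grid(113.806847, 22.623249))  # (42, 50), matching row 0 above
```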

Count the number of data points in each grid cell:

[7]:
# Aggregate data into grids
datatest = data.groupby(['LONCOL', 'LATCOL'])['VehicleNum'].count().reset_index()
datatest.head()
[7]:
LONCOL LATCOL VehicleNum
0 36 63 2
1 36 66 1
2 36 67 8
3 37 62 9
4 37 63 8

Generate the geometry of each grid cell and convert the result to a GeoDataFrame:

[8]:
# Generate the geometry for grids
datatest['geometry'] = tbd.grid_to_polygon([datatest['LONCOL'], datatest['LATCOL']], params)

# Convert to a GeoDataFrame (geopandas is already imported as gpd)
datatest = gpd.GeoDataFrame(datatest)
datatest.head()
[8]:
LONCOL LATCOL VehicleNum geometry
0 36 63 2 POLYGON ((113.77297 22.68104, 113.77784 22.681...
1 36 66 1 POLYGON ((113.77297 22.69453, 113.77784 22.694...
2 36 67 8 POLYGON ((113.77297 22.69902, 113.77784 22.699...
3 37 62 9 POLYGON ((113.77784 22.67654, 113.78271 22.676...
4 37 63 8 POLYGON ((113.77784 22.68104, 113.78271 22.681...
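Each polygon is the rectangle around a cell centre. Keeping the assumption that cell (0, 0) is centred on (``slon``, ``slat``), the corner coordinates in the table above can be recovered from the cell indices alone. ``cell_corners`` is a hypothetical helper that returns plain coordinate tuples rather than shapely polygons.

```python
# Grid parameters as printed by tbd.area_to_params above (rect method, theta = 0)
PARAMS = {'slon': 113.6, 'slat': 22.4,
          'deltalon': 0.004872390756896538, 'deltalat': 0.004496605206422906}


def cell_corners(loncol, latcol, params=PARAMS):
    """Return the four corners of a cell, counter-clockwise starting at the lower-left."""
    # Cell centre: origin plus whole multiples of the cell size
    clon = params['slon'] + loncol * params['deltalon']
    clat = params['slat'] + latcol * params['deltalat']
    half_lon, half_lat = params['deltalon'] / 2, params['deltalat'] / 2
    return [
        (clon - half_lon, clat - half_lat),  # lower-left
        (clon + half_lon, clat - half_lat),  # lower-right
        (clon + half_lon, clat + half_lat),  # upper-right
        (clon - half_lon, clat + half_lat),  # upper-left
    ]


print(cell_corners(36, 63)[:2])  # first two vertices of the row-0 polygon above
```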

Plot the generated grid:

[9]:
# Plot the grids
fig = plt.figure(1, (16, 6), dpi=300)
ax1 = plt.subplot(111)

# tbd.plot_map(plt, bounds, zoom=10, style=4)
datatest.plot(ax=ax1, column='VehicleNum', legend=True)
plt.xticks([], fontsize=10)
plt.yticks([], fontsize=10)
plt.title('Counting of Taxi GPS Trajectory Points', fontsize=12);
../_images/gallery_Example_1-Taxi_GPS_data_processing_19_0.png
[10]:
# Plot the grids
fig = plt.figure(1, (16, 6), dpi=300)  # figure 16 in wide and 6 in high; dpi sets the resolution
ax1 = plt.subplot(111)
datatest.plot(ax=ax1, column='VehicleNum', legend=True, scheme='quantiles')
# plt.legend(fontsize=10)
plt.xticks([], fontsize=10)
plt.yticks([], fontsize=10)
plt.title('Counting of Taxi GPS Trajectory Points', fontsize=12);
../_images/gallery_Example_1-Taxi_GPS_data_processing_20_0.png
[11]:
# Plot the grids
fig = plt.figure(1, (16, 6), dpi=150)  # figure 16 in wide and 6 in high; dpi sets the resolution
ax1 = plt.subplot(111)
datatest.plot(ax=ax1, column='VehicleNum', legend=True, cmap='OrRd', scheme='quantiles')
# plt.legend(fontsize=10)
plt.xticks([], fontsize=10)
plt.yticks([], fontsize=10)
plt.title('Counting of Taxi GPS Trajectory Points', fontsize=12);
../_images/gallery_Example_1-Taxi_GPS_data_processing_21_0.png

Origin-destination (OD) extraction and aggregation of taxi trips

Use the ``tbd.taxigps_to_od`` method, passing in the corresponding column names, to extract the taxi trip ODs:

[12]:
# Extract taxi OD from GPS data
oddata = tbd.taxigps_to_od(data, col=['VehicleNum', 'Time', 'Lng', 'Lat', 'OpenStatus'])
oddata
[12]:
VehicleNum stime slon slat etime elon elat ID
427075 22396 00:19:41 114.013016 22.664818 00:23:01 114.021400 22.663918 0
131301 22396 00:41:51 114.021767 22.640200 00:43:44 114.026070 22.640266 1
417417 22396 00:45:44 114.028099 22.645082 00:47:44 114.030380 22.650017 2
376160 22396 01:08:26 114.034897 22.616301 01:16:34 114.035614 22.646717 3
21768 22396 01:26:06 114.046021 22.641251 01:34:48 114.066048 22.636183 4
... ... ... ... ... ... ... ... ...
57666 36805 22:37:42 114.113403 22.534767 22:48:01 114.114365 22.550632 5332
175519 36805 22:49:12 114.114365 22.550632 22:50:40 114.115501 22.557983 5333
212092 36805 22:52:07 114.115402 22.558083 23:03:27 114.118484 22.547867 5334
119041 36805 23:03:45 114.118484 22.547867 23:20:09 114.133286 22.617750 5335
224103 36805 23:36:19 114.112968 22.549601 23:43:12 114.089485 22.538918 5336

5337 rows × 8 columns
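Conceptually, a trip starts where ``OpenStatus`` switches from 0 to 1 and ends where it switches back to 0. The pure-Python sketch below applies that rule to one vehicle's time-sorted records. It follows the convention visible in this example: trip 0 ends at 00:23:01, which is also the first idle point (``OpenStatus`` 0) listed later. So the destination is taken to be the first vacant record after the trip. This is an illustration, not TransBigData's implementation; ``extract_od`` is a hypothetical name. Trips already under way at the first record, or still open at the last one, are simply ignored here.

```python
from typing import List, Tuple

# (time, lng, lat, open_status) for a single vehicle, sorted by time
Record = Tuple[str, float, float, int]


def extract_od(records: List[Record]) -> List[dict]:
    """Pair each 0->1 status switch (pick-up) with the next 1->0 switch (drop-off)."""
    trips = []
    start = None
    for prev, cur in zip(records, records[1:]):
        if prev[3] == 0 and cur[3] == 1:
            start = cur  # first occupied record: origin
        elif prev[3] == 1 and cur[3] == 0 and start is not None:
            # first vacant record after the trip: destination
            trips.append({'stime': start[0], 'slon': start[1], 'slat': start[2],
                          'etime': cur[0], 'elon': cur[1], 'elat': cur[2]})
            start = None
    return trips


# Toy records (not taken from the sample data)
records = [
    ('08:00:00', 114.010, 22.660, 0),
    ('08:00:30', 114.011, 22.661, 1),  # pick-up
    ('08:01:00', 114.015, 22.662, 1),
    ('08:01:30', 114.020, 22.663, 0),  # drop-off
    ('08:02:00', 114.021, 22.664, 0),
    ('08:02:30', 114.022, 22.665, 1),  # trip still open at the end: not emitted
]
print(extract_od(records))
```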

Aggregate the extracted ODs by grid to generate a LineString GeoDataFrame:

[13]:
# Map the ODs to grids and aggregate them
od_gdf = tbd.odagg_grid(oddata, params)
od_gdf.head()
/opt/anaconda3/lib/python3.8/site-packages/pandas/core/dtypes/cast.py:91: ShapelyDeprecationWarning: The array interface is deprecated and will no longer work in Shapely 2.0. Convert the '.coords' to a numpy array instead.
  values = construct_1d_object_array_from_listlike(values)
[13]:
SLONCOL SLATCOL ELONCOL ELATCOL count SHBLON SHBLAT EHBLON EHBLAT geometry
0 40 62 45 68 1 113.794896 22.678790 113.819258 22.705769 LINESTRING (113.79490 22.67879, 113.81926 22.7...
3331 101 36 86 29 1 114.092111 22.561878 114.019026 22.530402 LINESTRING (114.09211 22.56188, 114.01903 22.5...
3330 101 35 105 30 1 114.092111 22.557381 114.111601 22.534898 LINESTRING (114.09211 22.55738, 114.11160 22.5...
3329 101 34 109 34 1 114.092111 22.552885 114.131091 22.552885 LINESTRING (114.09211 22.55288, 114.13109 22.5...
3328 101 34 103 34 1 114.092111 22.552885 114.101856 22.552885 LINESTRING (114.09211 22.55288, 114.10186 22.5...
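Grid-based OD aggregation amounts to mapping both ends of every trip to a cell and counting identical (origin cell, destination cell) pairs. The sketch below keeps the earlier assumption that cell (0, 0) is centred on (``slon``, ``slat``). Under that assumption, ``SHBLON``/``SHBLAT`` and ``EHBLON``/``EHBLAT`` in the output above match cell centres. For example, ``SLONCOL`` 40 gives 113.6 + 40 × ``deltalon`` ≈ 113.794896. ``to_cell``, ``cell_centre`` and ``aggregate_od`` are hypothetical helpers that return plain dicts instead of a GeoDataFrame.

```python
import math
from collections import Counter

# Grid parameters as printed by tbd.area_to_params above (rect method, theta = 0)
PARAMS = {'slon': 113.6, 'slat': 22.4,
          'deltalon': 0.004872390756896538, 'deltalat': 0.004496605206422906}


def to_cell(lon, lat, p=PARAMS):
    # Half-cell shift before flooring: cell (0, 0) is centred on (slon, slat)
    return (math.floor((lon - p['slon']) / p['deltalon'] + 0.5),
            math.floor((lat - p['slat']) / p['deltalat'] + 0.5))


def cell_centre(col, row, p=PARAMS):
    return p['slon'] + col * p['deltalon'], p['slat'] + row * p['deltalat']


def aggregate_od(trips):
    """trips: iterable of (slon, slat, elon, elat); returns rows sorted by count, descending."""
    counts = Counter(to_cell(slon, slat) + to_cell(elon, elat)
                     for slon, slat, elon, elat in trips)
    rows = []
    for (sc, sr, ec, er), n in counts.most_common():
        rows.append({'SLONCOL': sc, 'SLATCOL': sr, 'ELONCOL': ec, 'ELATCOL': er,
                     'count': n,
                     'start_centre': cell_centre(sc, sr),
                     'end_centre': cell_centre(ec, er)})
    return rows


# Toy trips: the first two share origin and destination cells
trips = [
    (113.806847, 22.623249, 114.0214, 22.6639),
    (113.806381, 22.623833, 114.0214, 22.6639),
    (114.0130, 22.6648, 113.806847, 22.623249),
]
for row in aggregate_od(trips):
    print(row['SLONCOL'], row['SLATCOL'], row['ELONCOL'], row['ELATCOL'], row['count'])
```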
[14]:
# Plot the aggregated OD flows
fig = plt.figure(1, (16, 6), dpi=150)  # figure 16 in wide and 6 in high; dpi sets the resolution
ax1 = plt.subplot(111)

od_gdf.plot(ax=ax1, column='count', legend=True, scheme='quantiles')
plt.xticks([], fontsize=10)
plt.yticks([], fontsize=10)
plt.title('OD Trips', fontsize=12);
/opt/anaconda3/lib/python3.8/site-packages/mapclassify/classifiers.py:238: UserWarning: Warning: Not enough unique values in array to form k classes
  Warn(
/opt/anaconda3/lib/python3.8/site-packages/mapclassify/classifiers.py:241: UserWarning: Warning: setting k to 2
  Warn("Warning: setting k to %d" % k_q, UserWarning)
../_images/gallery_Example_1-Taxi_GPS_data_processing_27_1.png
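The mapclassify warnings above appear because most grid-to-grid OD pairs occur only once, so several quantile breaks coincide and fewer distinct classes remain. The standard-library sketch below illustrates the effect on a toy list of counts. It computes five quantile upper bounds and drops duplicates. It uses the nearest-rank percentile for simplicity, which is an assumption: it shows the idea behind ``scheme='quantiles'`` but does not reproduce mapclassify's exact break values. ``quantile_breaks`` is a hypothetical name.

```python
import math


def quantile_breaks(values, k=5):
    """Upper bounds of k quantile classes (nearest-rank method), duplicates removed."""
    ordered = sorted(values)
    n = len(ordered)
    breaks = []
    for i in range(1, k + 1):
        rank = max(1, math.ceil(i * n / k))  # 1-based nearest rank of the i-th quantile
        breaks.append(ordered[rank - 1])
    unique_breaks = sorted(set(breaks))
    if len(unique_breaks) < k:
        print(f'Not enough unique values to form {k} classes; using {len(unique_breaks)}')
    return unique_breaks


# Toy counts: most OD pairs appear once, a few appear more often
counts = [1] * 90 + [2] * 6 + [5, 8, 12, 30]
print(quantile_breaks(counts))  # [1, 30]
```

With 90 % of the values equal to 1, the first four quantile breaks all fall on 1, so only two classes survive. This matches the "setting k to 2" message in the warning.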

Aggregating ODs into polygons

``TransBigData`` also provides a method for aggregating ODs into polygons:

[15]:
# Aggregate OD data to polygons
# Without gridding parameters, the algorithm maps the data
# to polygons directly using their coordinates
od_gdf = tbd.odagg_shape(oddata, sz, round_accuracy=6)
fig = plt.figure(1, (16, 6), dpi=150)  # figure 16 in wide and 6 in high; dpi sets the resolution
ax1 = plt.subplot(111)
od_gdf.plot(ax=ax1, column='count')
plt.xticks([], fontsize=10)
plt.yticks([], fontsize=10)
plt.title('OD Trips', fontsize=12);
/opt/anaconda3/lib/python3.8/site-packages/pandas/core/dtypes/cast.py:91: ShapelyDeprecationWarning: The array interface is deprecated and will no longer work in Shapely 2.0. Convert the '.coords' to a numpy array instead.
  values = construct_1d_object_array_from_listlike(values)
../_images/gallery_Example_1-Taxi_GPS_data_processing_30_1.png
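Polygon-based aggregation replaces the grid lookup with a point-in-polygon test. Each trip's origin and destination are assigned to the district that contains them, and identical district pairs are counted. The standard-library sketch below does this with the even-odd (ray casting) rule on two toy square districts. It only handles simple polygons without holes. It is not how ``tbd.odagg_shape`` is implemented, since that function works on the GeoDataFrame ``sz``. ``point_in_polygon``, ``locate`` and ``aggregate_od_by_polygon`` are hypothetical names.

```python
from collections import Counter


def point_in_polygon(x, y, ring):
    """Even-odd rule: count crossings of a horizontal ray from (x, y) to the right."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the ray's y, and is the crossing to the right of x?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside


def locate(x, y, districts):
    """Return the name of the first district containing (x, y), or None."""
    for name, ring in districts.items():
        if point_in_polygon(x, y, ring):
            return name
    return None


def aggregate_od_by_polygon(trips, districts):
    """Count (origin district, destination district) pairs; trips outside all districts are dropped."""
    counts = Counter()
    for slon, slat, elon, elat in trips:
        o, d = locate(slon, slat, districts), locate(elon, elat, districts)
        if o is not None and d is not None:
            counts[(o, d)] += 1
    return counts


# Toy districts: two unit squares side by side (not real boundaries)
districts = {
    'A': [(0, 0), (1, 0), (1, 1), (0, 1)],
    'B': [(1, 0), (2, 0), (2, 1), (1, 1)],
}
trips = [(0.2, 0.5, 1.5, 0.5), (0.8, 0.3, 1.2, 0.9), (1.5, 0.5, 0.5, 0.5), (3.0, 0.5, 0.5, 0.5)]
# ('A', 'B') is counted twice and ('B', 'A') once; the last trip starts outside both squares
print(aggregate_od_by_polygon(trips, districts))
```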

Map plotting based on Matplotlib

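``TransBigData`` also supports loading basemaps in matplotlib. Before using this feature, you need to set your Mapbox token and the folder where basemap tiles are stored; see `this link <https://transbigdata.readthedocs.io/en/latest/plot_map.html>`__. ``tbd.plot_map`` adds the basemap, and ``tbd.plotscale`` adds a scale bar and compass: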

[16]:
# Create figure
fig = plt.figure(1, (10, 10), dpi=300)
ax = plt.subplot(111)
plt.sca(ax)

# Load basemap
tbd.plot_map(plt, bounds, zoom=12, style=4)

# Define an ax for colorbar
cax = plt.axes([0.05, 0.33, 0.02, 0.3])
plt.title('OD\nMatrix')
plt.sca(ax)

# Plot the OD
od_gdf.plot(ax=ax, vmax=100, column='count', cax=cax, legend=True)

# Plot the polygons
sz.plot(ax=ax, edgecolor=(0, 0, 0, 1), facecolor=(0, 0, 0, 0.2), linewidths=0.5)

# Add compass and scale
tbd.plotscale(ax, bounds=bounds, textsize=10, compasssize=1, accuracy=2000, rect=[0.06, 0.03], zorder=10)
plt.axis('off')
plt.xlim(bounds[0], bounds[2])
plt.ylim(bounds[1], bounds[3])
plt.show()
../_images/gallery_Example_1-Taxi_GPS_data_processing_33_0.png

Extracting taxi trajectories

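Use the ``tbd.taxigps_traj_point`` method with the GPS data and the OD data as input to extract the trajectory points of the delivery (passenger-carrying) and idle periods: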

[17]:
data_deliver, data_idle = tbd.taxigps_traj_point(data,oddata,col=['VehicleNum',
                                                                  'Time',
                                                                  'Lng',
                                                                  'Lat',
                                                                  'OpenStatus'])
[18]:
data_deliver.head()
[18]:
VehicleNum Time Lng Lat OpenStatus Speed LONCOL LATCOL ID flag
427075 22396 00:19:41 114.013016 22.664818 1 63.0 85.0 59.0 0.0 1.0
427085 22396 00:19:49 114.014030 22.665483 1 55.0 85.0 59.0 0.0 1.0
416622 22396 00:21:01 114.018898 22.662500 1 1.0 86.0 58.0 0.0 1.0
427480 22396 00:21:41 114.019348 22.662300 1 7.0 86.0 58.0 0.0 1.0
416623 22396 00:22:21 114.020615 22.663366 1 0.0 86.0 59.0 0.0 1.0
[19]:
data_idle.head()
[19]:
VehicleNum Time Lng Lat OpenStatus Speed LONCOL LATCOL ID flag
416628 22396 00:23:01 114.021400 22.663918 0 25.0 86.0 59.0 0.0 0.0
401744 22396 00:25:01 114.027115 22.662100 0 25.0 88.0 58.0 0.0 0.0
394630 22396 00:25:41 114.024551 22.659834 0 21.0 87.0 58.0 0.0 0.0
394671 22396 00:26:21 114.022797 22.658367 0 0.0 87.0 57.0 0.0 0.0
394672 22396 00:26:29 114.022797 22.658367 0 0.0 87.0 57.0 0.0 0.0
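The split into delivery and idle points can be understood as an interval lookup against the OD table. In the outputs above, the delivery points of trip 0 run from its ``stime`` (00:19:41) up to, but not including, its ``etime`` (00:23:01), and the idle points start at ``etime``. The sketch below assumes exactly that half-open rule for a single vehicle and labels each point with ``flag`` 1 (delivery) or 0 (idle). It is a hypothetical illustration (``split_points`` is not a TransBigData function). It ignores details such as how the library assigns the trip ``ID`` to idle segments.

```python
from bisect import bisect_right


def split_points(point_times, trips):
    """Label each point time as delivery (1) or idle (0).

    point_times: 'HH:MM:SS' strings for one vehicle.
    trips: (stime, etime) pairs for the same vehicle, sorted and non-overlapping.
    A point is a delivery point if stime <= t < etime for some trip.
    """
    starts = [s for s, _ in trips]
    flags = []
    for t in point_times:
        i = bisect_right(starts, t) - 1  # index of the last trip starting at or before t
        in_trip = i >= 0 and t < trips[i][1]
        flags.append(1 if in_trip else 0)
    return flags


# Trip times from the OD table above; point times are toy values
trips = [('00:19:41', '00:23:01'), ('00:41:51', '00:43:44')]
points = ['00:18:00', '00:19:41', '00:22:21', '00:23:01', '00:25:01', '00:42:00', '00:43:44']
print(split_points(points, trips))  # [0, 1, 1, 0, 0, 1, 0]
```

Binary search over the sorted trip start times keeps each lookup logarithmic in the number of trips, instead of scanning every trip for every point.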

Generate delivery and idle trajectories from the trajectory points:

[20]:
traj_deliver = tbd.points_to_traj(data_deliver)
traj_deliver.plot();
../_images/gallery_Example_1-Taxi_GPS_data_processing_40_0.png
[21]:
traj_idle = tbd.points_to_traj(data_idle[data_idle['OpenStatus'] == 0])
traj_idle.plot();
../_images/gallery_Example_1-Taxi_GPS_data_processing_41_1.png
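``tbd.points_to_traj`` turns the ordered points into line geometries. Conceptually, this amounts to grouping the points by the trip ``ID`` column and joining their coordinates in time order. The standard-library sketch below shows that grouping. As an assumption, it drops groups with fewer than two points, because a line needs at least two vertices. ``points_to_lines`` is hypothetical and returns coordinate lists rather than a GeoDataFrame.

```python
from collections import defaultdict


def points_to_lines(points):
    """points: iterable of (trip_id, time, lng, lat). Returns {trip_id: [(lng, lat), ...]}."""
    groups = defaultdict(list)
    for trip_id, time, lng, lat in points:
        groups[trip_id].append((time, lng, lat))
    lines = {}
    for trip_id, rows in groups.items():
        rows.sort()  # order vertices by time ('HH:MM:SS' strings sort chronologically)
        if len(rows) >= 2:  # a line needs at least two vertices
            lines[trip_id] = [(lng, lat) for _, lng, lat in rows]
    return lines


# Toy points: trip 0 is given out of order; trip 1 has a single point and is dropped
points = [
    (0, '00:21:01', 114.0189, 22.6625),
    (0, '00:19:41', 114.0130, 22.6648),
    (1, '00:41:51', 114.0218, 22.6402),
]
print(points_to_lines(points))
```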

Trajectory visualization

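The built-in visualization functions of ``TransBigData`` use the visualization package ``keplergl`` to display data interactively in Jupyter notebooks with just a few lines of code. To use these functions, install the ``keplergl`` package for Python:

pip install keplergl

For details, see `this link <https://transbigdata.readthedocs.io/en/latest/visualization.html>`__.

Visualize the trajectory data: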

[22]:
tbd.visualization_trip(data_deliver)
Processing trajectory data...
Generate visualization...
User Guide: https://docs.kepler.gl/docs/keplergl-jupyter