8 共享单车出行需求的社区发现

对于共享单车需求来说，每一次出行都可以看作是从起点到终点的一段过程。当我们把起点和终点看成节点，把它们之间的行程当作边时，就可以构建一个网络。通过分析该网络，我们可以得到有关城市空间连接结构或共享单车需求的宏观出行特征的信息。

社区检测，也称为图分区，帮助我们揭示网络中节点之间的隐藏关系。在这个例子中，我们将介绍如何从共享单车数据将TransBigData集成到社区检测的分析过程中。

要运行这个例子，你可能必须安装``igraph``和``seaborn``：

pip install igraph

pip install seaborn

数据预处理

[1]:

# Fristly, import packages.
import pandas as pd
import numpy as np
import geopandas as gpd
import transbigdata as tbd

[2]:

#Read bicycle sharing data
bikedata = pd.read_csv(r'data/bikedata-sample.csv')
bikedata.head(5)

[2]:

	BIKE_ID	DATA_TIME	LOCK_STATUS	LONGITUDE	LATITUDE
0	5	2018-09-01 0:00:36	1	121.363566	31.259615
1	6	2018-09-01 0:00:50	0	121.406226	31.214436
2	6	2018-09-01 0:03:01	1	121.409402	31.215259
3	6	2018-09-01 0:24:53	0	121.409228	31.214427
4	6	2018-09-01 0:26:38	1	121.409771	31.214406

[3]:

#Read the polygon of the study area
shanghai_admin = gpd.read_file(r'data/shanghai.json')
#delete the data outside of the study area
bikedata = tbd.clean_outofshape(bikedata, shanghai_admin, col=['LONGITUDE', 'LATITUDE'], accuracy=500)

使用``tbd.bikedata_to_od``识别共享单车行程信息

[4]:

move_data,stop_data = tbd.bikedata_to_od(bikedata,
                   col = ['BIKE_ID','DATA_TIME','LONGITUDE','LATITUDE','LOCK_STATUS'])
move_data.head(5)

[4]:

	BIKE_ID	stime	slon	slat	etime	elon	elat
96	6	2018-09-01 0:00:50	121.406226	31.214436	2018-09-01 0:03:01	121.409402	31.215259
561	6	2018-09-01 0:24:53	121.409228	31.214427	2018-09-01 0:26:38	121.409771	31.214406
564	6	2018-09-01 0:50:16	121.409727	31.214403	2018-09-01 0:52:14	121.412610	31.214905
784	6	2018-09-01 0:53:38	121.413333	31.214951	2018-09-01 0:55:38	121.412656	31.217051
1028	6	2018-09-01 11:35:01	121.419261	31.213414	2018-09-01 11:35:13	121.419518	31.213657

[5]:

#Calculate the travel distance
move_data['distance'] = tbd.getdistance(move_data['slon'],move_data['slat'],move_data['elon'],move_data['elat'])
#Remove too long and too short trips
move_data = move_data[(move_data['distance']>100)&(move_data['distance']<10000)]

执行数据栅格化：

[6]:

#obtain gridding params
bounds = (120.85, 30.67, 122.24, 31.87)
params = tbd.grid_params(bounds,accuracy = 500)
#aggregate the travel informations
od_gdf = tbd.odagg_grid(move_data, params, col=['slon', 'slat', 'elon', 'elat'])
od_gdf.head(5)

/opt/anaconda3/envs/transbigdata/lib/python3.9/site-packages/pandas/core/dtypes/cast.py:122: ShapelyDeprecationWarning: The array interface is deprecated and will no longer work in Shapely 2.0. Convert the '.coords' to a numpy array instead.
  arr = construct_1d_object_array_from_listlike(values)

[6]:

	SLONCOL	SLATCOL	ELONCOL	ELATCOL	count	SHBLON	SHBLAT	EHBLON	EHBLAT	geometry
0	26	95	26	96	1	120.986782	31.097177	120.986782	31.101674	LINESTRING (120.98678 31.09718, 120.98678 31.1...
40803	117	129	116	127	1	121.465519	31.250062	121.460258	31.241069	LINESTRING (121.46552 31.25006, 121.46026 31.2...
40807	117	129	117	128	1	121.465519	31.250062	121.465519	31.245565	LINESTRING (121.46552 31.25006, 121.46552 31.2...
40810	117	129	117	131	1	121.465519	31.250062	121.465519	31.259055	LINESTRING (121.46552 31.25006, 121.46552 31.2...
40811	117	129	118	126	1	121.465519	31.250062	121.470780	31.236572	LINESTRING (121.46552 31.25006, 121.47078 31.2...

可视化OD数据

[7]:

#Create figure
import matplotlib.pyplot as plt
fig =plt.figure(1,(8,8),dpi=300)
ax =plt.subplot(111)
plt.sca(ax)

#Load basemap
tbd.plot_map(plt,bounds,zoom = 11,style = 8)

#Create colorbar
cax = plt.axes([0.05, 0.33, 0.02, 0.3])
plt.title('Data count')
plt.sca(ax)

#Plot OD
od_gdf.plot(ax = ax,column = 'count',cmap = 'Blues_r',linewidth = 0.5,vmax = 10,cax = cax,legend = True)

#Plot compass and scale
tbd.plotscale(ax,bounds = bounds,textsize = 10,compasssize = 1,textcolor = 'white',accuracy = 2000,rect = [0.06,0.03],zorder = 10)
plt.axis('off')
plt.xlim(bounds[0],bounds[2])
plt.ylim(bounds[1],bounds[3])
plt.show()

../_images/gallery_Example_8-Community_detection_for_bikesharing_data_12_0.png

创建网络

提取节点数据

将“LONCOL”和“LATCOL”列合并到一个字段中并提取节点集

[8]:

#Combine the ``LONCOL`` and ``LATCOL`` columns into one field
od_gdf['S'] = od_gdf['SLONCOL'].astype(str) + ',' + od_gdf['SLATCOL'].astype(str)
od_gdf['E'] = od_gdf['ELONCOL'].astype(str) + ',' + od_gdf['ELATCOL'].astype(str)
#extract node set
node = set(od_gdf['S'])|set(od_gdf['E'])
node = pd.DataFrame(node)
#reindex the node
node['id'] = range(len(node))
node

[8]:

	0	id
0	164,81	0
1	71,125	1
2	102,118	2
3	125,115	3
4	143,76	4
...	...	...
9806	98,167	9806
9807	46,130	9807
9808	118,82	9808
9809	158,57	9809
9810	104,169	9810

9811 rows × 2 columns

提取边数据

[9]:

#Merge the node information to the OD data to extract edge data.
node.columns = ['S','S_id']
od_gdf = pd.merge(od_gdf,node,on = ['S'])
node.columns = ['E','E_id']
od_gdf = pd.merge(od_gdf,node,on = ['E'])
#Extract edge data
edge = od_gdf[['S_id','E_id','count']]
edge

[9]:

	S_id	E_id	count
0	6251	4211	1
1	5879	8676	1
2	8432	8676	3
3	5511	8676	1
4	3386	8676	1
...	...	...	...
68468	5663	5835	2
68469	7738	4266	2
68470	360	8003	2
68471	6759	601	3
68472	6081	3107	3

68473 rows × 3 columns

创建网络

[10]:

import igraph
#Create Network
g = igraph.Graph()
#Add node
g.add_vertices(len(node))
#Add edge
g.add_edges(edge[['S_id','E_id']].values)
#Add weight
edge_weights = edge[['count']].values
for i in range(len(edge_weights)):
    g.es[i]['weight'] = edge_weights[i]

社区检测

[11]:

#Community detection
g_clustered = g.community_multilevel(weights = edge_weights, return_levels=False)

[12]:

#Modularity
g_clustered.modularity

[12]:

0.8496074605497185

[13]:

#Assign the group result to the node
node['group'] = g_clustered.membership
#rename the columns
node.columns = ['grid','node_id','group']
node

[13]:

	grid	node_id	group
0	164,81	0	0
1	71,125	1	1
2	102,118	2	2
3	125,115	3	3
4	143,76	4	4
...	...	...	...
9806	98,167	9806	9
9807	46,130	9807	555
9808	118,82	9808	6
9809	158,57	9809	132
9810	104,169	9810	86

9811 rows × 3 columns

可视化社区

[14]:

#Count the number of grids per community
group = node['group'].value_counts()
#Extract communities with more than 10 grids
group = group[group>10]
#Retain only these community grids
node = node[node['group'].apply(lambda r:r in group.index)]

#Get the grid number
node['LONCOL'] = node['grid'].apply(lambda r:r.split(',')[0]).astype(int)
node['LATCOL'] = node['grid'].apply(lambda r:r.split(',')[1]).astype(int)
#Generate the geometry
node['geometry'] = tbd.gridid_to_polygon(node['LONCOL'],node['LATCOL'],params)
#Change it into GeoDataFrame
import geopandas as gpd
node = gpd.GeoDataFrame(node)
node

/var/folders/b0/q8rx9fj965b5p7yqq8zhvdx80000gn/T/ipykernel_30130/418053260.py:9: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  node['LONCOL'] = node['grid'].apply(lambda r:r.split(',')[0]).astype(int)
/var/folders/b0/q8rx9fj965b5p7yqq8zhvdx80000gn/T/ipykernel_30130/418053260.py:10: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  node['LATCOL'] = node['grid'].apply(lambda r:r.split(',')[1]).astype(int)
/var/folders/b0/q8rx9fj965b5p7yqq8zhvdx80000gn/T/ipykernel_30130/418053260.py:12: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  node['geometry'] = tbd.gridid_to_polygon(node['LONCOL'],node['LATCOL'],params)

[14]:

	grid	node_id	group	LONCOL	LATCOL	geometry
1	71,125	1	1	71	125	POLYGON ((121.22089 31.22983, 121.22615 31.229...
2	102,118	2	2	102	118	POLYGON ((121.38398 31.19835, 121.38924 31.198...
3	125,115	3	3	125	115	POLYGON ((121.50498 31.18486, 121.51024 31.184...
4	143,76	4	4	143	76	POLYGON ((121.59967 31.00949, 121.60493 31.009...
5	142,87	5	4	142	87	POLYGON ((121.59441 31.05896, 121.59967 31.058...
...	...	...	...	...	...	...
9802	103,103	9802	8	103	103	POLYGON ((121.38924 31.13090, 121.39450 31.130...
9803	162,133	9803	28	162	133	POLYGON ((121.69963 31.26580, 121.70489 31.265...
9804	107,130	9804	41	107	130	POLYGON ((121.41028 31.25231, 121.41554 31.252...
9806	98,167	9806	9	98	167	POLYGON ((121.36293 31.41868, 121.36819 31.418...
9808	118,82	9808	6	118	82	POLYGON ((121.46815 31.03647, 121.47341 31.036...

8522 rows × 6 columns

[15]:

node.plot('group')

[15]:

<AxesSubplot:>

../_images/gallery_Example_8-Community_detection_for_bikesharing_data_27_1.png

[16]:

#Use the group column to merge polygon
node_community = tbd.merge_polygon(node,'group')
#Input polygon GeoDataFrame data, take the exterior boundary of the polygon to form a new polygon
node_community = tbd.polyon_exterior(node_community,minarea = 0.000100)

/opt/anaconda3/envs/transbigdata/lib/python3.9/site-packages/transbigdata/gisprocess.py:205: ShapelyDeprecationWarning: Iteration over multi-part geometries is deprecated and will be removed in Shapely 2.0. Use the `geoms` property to access the constituent parts of a multi-part geometry.
  for i in p:

[17]:

#Generate palette
import seaborn as sns
## l: Luminance
## s: Saturation
cmap = sns.hls_palette(n_colors=len(node_community), l=.7, s=0.8)
sns.palplot(cmap)

../_images/gallery_Example_8-Community_detection_for_bikesharing_data_29_0.png

[19]:

#Create figure
import matplotlib.pyplot as plt
fig =plt.figure(1,(8,8),dpi=300)
ax =plt.subplot(111)
plt.sca(ax)
#Load basemap
tbd.plot_map(plt,bounds,zoom = 10,style = 6)
#Set colormap
from matplotlib.colors import ListedColormap
#Disrupting the order of the community
node_community = node_community.sample(frac=1)
#Plot community
node_community.plot(cmap = ListedColormap(cmap),ax = ax,edgecolor = '#333',alpha = 0.8)
#Add scale
tbd.plotscale(ax,bounds = bounds,textsize = 10,compasssize = 1,textcolor = 'k'
              ,accuracy = 2000,rect = [0.06,0.03],zorder = 10)
plt.axis('off')
plt.xlim(bounds[0],bounds[2])
plt.ylim(bounds[1],bounds[3])
plt.show()

../_images/gallery_Example_8-Community_detection_for_bikesharing_data_30_0.png