8 Community detection for bicycle-sharing demand

For bicycle-sharing demand, each trip can be seen as a movement from a start location to an end location. If we treat the start and end points as nodes and the trips between them as edges, we can construct a network. Analysing this network reveals the spatial connection structure of the city and the macro-level travel characteristics of bicycle-sharing demand.
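
As a minimal sketch of this idea, the snippet below turns a handful of made-up trips into nodes and weighted edges. The trip list and the location labels are hypothetical and are not part of the dataset used in this example.

```python
from collections import Counter

# Hypothetical trips as (start location, end location) pairs
trips = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "A"), ("B", "C"), ("A", "B")]

# Every distinct location becomes a node
nodes = sorted({loc for trip in trips for loc in trip})

# Every distinct (start, end) pair becomes an edge weighted by its trip count
edges = Counter(trips)

print(nodes)
for (start, end), count in sorted(edges.items()):
    print(start, "->", end, count)
```

The rest of this example does the same thing at scale, with grid cells playing the role of the locations.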

Community detection, which is closely related to graph partitioning, helps reveal hidden relations among the nodes of a network. This example shows how to integrate TransBigData into a community detection workflow for bicycle-sharing data.

To run this example, you may have to install igraph and seaborn:

pip install igraph

pip install seaborn

Data preprocessing

[1]:

# First, import the packages.
import pandas as pd
import numpy as np
import geopandas as gpd
import transbigdata as tbd

[2]:

#Read bicycle sharing data
bikedata = pd.read_csv(r'data/bikedata-sample.csv')
bikedata.head(5)

[2]:

	BIKE_ID	DATA_TIME	LOCK_STATUS	LONGITUDE	LATITUDE
0	5	2018-09-01 0:00:36	1	121.363566	31.259615
1	6	2018-09-01 0:00:50	0	121.406226	31.214436
2	6	2018-09-01 0:03:01	1	121.409402	31.215259
3	6	2018-09-01 0:24:53	0	121.409228	31.214427
4	6	2018-09-01 0:26:38	1	121.409771	31.214406

[3]:

#Read the polygon of the study area
shanghai_admin = gpd.read_file(r'data/shanghai.json')
#delete the data outside of the study area
bikedata = tbd.clean_outofshape(bikedata, shanghai_admin, col=['LONGITUDE', 'LATITUDE'], accuracy=500)

Identify bicycle-sharing trip information using tbd.bikedata_to_od

[4]:

move_data,stop_data = tbd.bikedata_to_od(bikedata,
                   col = ['BIKE_ID','DATA_TIME','LONGITUDE','LATITUDE','LOCK_STATUS'])
move_data.head(5)

[4]:

	BIKE_ID	stime	slon	slat	etime	elon	elat
96	6	2018-09-01 0:00:50	121.406226	31.214436	2018-09-01 0:03:01	121.409402	31.215259
561	6	2018-09-01 0:24:53	121.409228	31.214427	2018-09-01 0:26:38	121.409771	31.214406
564	6	2018-09-01 0:50:16	121.409727	31.214403	2018-09-01 0:52:14	121.412610	31.214905
784	6	2018-09-01 0:53:38	121.413333	31.214951	2018-09-01 0:55:38	121.412656	31.217051
1028	6	2018-09-01 11:35:01	121.419261	31.213414	2018-09-01 11:35:13	121.419518	31.213657

[5]:

#Calculate the travel distance
move_data['distance'] = tbd.getdistance(move_data['slon'],move_data['slat'],move_data['elon'],move_data['elat'])
#Remove trips that are too short (100 m or less) or too long (10 km or more)
move_data = move_data[(move_data['distance']>100)&(move_data['distance']<10000)]
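
The filter above works on straight-line distances in metres. As a rough, independent check, the haversine formula can be applied to a single trip. The function below is an illustrative sketch and not the implementation of tbd.getdistance; the function name and the Earth radius are assumptions.

```python
import math

EARTH_RADIUS_M = 6371000  # assumed mean Earth radius in metres

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# First trip shown in the move_data table above
d = haversine_m(121.406226, 31.214436, 121.409402, 31.215259)
keep = 100 < d < 10000  # same thresholds as the filter above
print(round(d), keep)
```

This trip is a few hundred metres long, so it passes the filter.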

Perform data gridding:

[6]:

#obtain gridding params
bounds = (120.85, 30.67, 122.24, 31.87)
params = tbd.grid_params(bounds,accuracy = 500)
#aggregate the travel information
od_gdf = tbd.odagg_grid(move_data, params, col=['slon', 'slat', 'elon', 'elat'])
od_gdf.head(5)

[6]:

	SLONCOL	SLATCOL	ELONCOL	ELATCOL	count	SHBLON	SHBLAT	EHBLON	EHBLAT	geometry
0	26	95	26	96	1	120.986782	31.097177	120.986782	31.101674	LINESTRING (120.98678 31.09718, 120.98678 31.1...
40803	117	129	116	127	1	121.465519	31.250062	121.460258	31.241069	LINESTRING (121.46552 31.25006, 121.46026 31.2...
40807	117	129	117	128	1	121.465519	31.250062	121.465519	31.245565	LINESTRING (121.46552 31.25006, 121.46552 31.2...
40810	117	129	117	131	1	121.465519	31.250062	121.465519	31.259055	LINESTRING (121.46552 31.25006, 121.46552 31.2...
40811	117	129	118	126	1	121.465519	31.250062	121.470780	31.236572	LINESTRING (121.46552 31.25006, 121.47078 31.2...
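
The column and row indices and the SHBLON/SHBLAT centres in the table above are consistent with a simple rectangular grid whose cell (0, 0) is centred on the lower-left corner of bounds. The sketch below reproduces that arithmetic. The function names and the Earth radius are assumptions made for illustration; this is not the TransBigData source code.

```python
import math

def grid_params_sketch(bounds, accuracy=500):
    """Return (lon_start, lat_start, delta_lon, delta_lat) for a rectangular grid."""
    lon1, lat1, lon2, lat2 = bounds
    metres_per_degree = 2 * math.pi * 6371004 / 360  # assumed Earth radius
    delta_lat = accuracy / metres_per_degree
    # Longitude degrees shrink with latitude, so scale by the cosine at mid-latitude
    delta_lon = delta_lat / math.cos(math.radians((lat1 + lat2) / 2))
    return lon1, lat1, delta_lon, delta_lat

def lonlat_to_grid(lon, lat, p):
    """Index of the cell containing the point; cell (0, 0) is centred on the origin."""
    lon_start, lat_start, delta_lon, delta_lat = p
    loncol = math.floor((lon - lon_start) / delta_lon + 0.5)
    latcol = math.floor((lat - lat_start) / delta_lat + 0.5)
    return loncol, latcol

def grid_center(loncol, latcol, p):
    """Centre coordinates of a cell given its column and row index."""
    lon_start, lat_start, delta_lon, delta_lat = p
    return lon_start + loncol * delta_lon, lat_start + latcol * delta_lat

p = grid_params_sketch((120.85, 30.67, 122.24, 31.87), accuracy=500)
print(grid_center(26, 95, p))                    # compare with SHBLON/SHBLAT of row 0
print(lonlat_to_grid(121.406226, 31.214436, p))  # cell of the first trip's start point
```

With accuracy = 500, each cell is about 500 m on a side, which is why nearby start points collapse into the same SLONCOL/SLATCOL pair.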

Visualize the OD data

[7]:

#Create figure
import matplotlib.pyplot as plt
fig =plt.figure(1,(8,8),dpi=300)
ax =plt.subplot(111)
plt.sca(ax)

#Load basemap
tbd.plot_map(plt,bounds,zoom = 11,style = 8)

#Create colorbar
cax = plt.axes([0.05, 0.33, 0.02, 0.3])
plt.title('Data count')
plt.sca(ax)

#Plot OD
od_gdf.plot(ax = ax,column = 'count',cmap = 'Blues_r',linewidth = 0.5,vmax = 10,cax = cax,legend = True)

#Plot compass and scale
tbd.plotscale(ax,bounds = bounds,textsize = 10,compasssize = 1,textcolor = 'white',accuracy = 2000,rect = [0.06,0.03],zorder = 10)
plt.axis('off')
plt.xlim(bounds[0],bounds[2])
plt.ylim(bounds[1],bounds[3])
plt.show()

../_images/gallery_Example_8-Community_detection_for_bikesharing_data_12_0.png

Create Network

Extract node data

Combine the LONCOL and LATCOL columns into one field and extract the node set

[8]:

#Combine the LONCOL and LATCOL columns into one field
od_gdf['S'] = od_gdf['SLONCOL'].astype(str) + ',' + od_gdf['SLATCOL'].astype(str)
od_gdf['E'] = od_gdf['ELONCOL'].astype(str) + ',' + od_gdf['ELATCOL'].astype(str)
#extract node set
node = set(od_gdf['S'])|set(od_gdf['E'])
node = pd.DataFrame(node)
#reindex the node
node['id'] = range(len(node))
node

[8]:

	0	id
0	164,81	0
1	71,125	1
2	102,118	2
3	125,115	3
4	143,76	4
...	...	...
9806	98,167	9806
9807	46,130	9807
9808	118,82	9808
9809	158,57	9809
9810	104,169	9810

9811 rows × 2 columns

Extract edge data

[9]:

#Merge the node information to the OD data to extract edge data.
node.columns = ['S','S_id']
od_gdf = pd.merge(od_gdf,node,on = ['S'])
node.columns = ['E','E_id']
od_gdf = pd.merge(od_gdf,node,on = ['E'])
#Extract edge data
edge = od_gdf[['S_id','E_id','count']]
edge

[9]:

	S_id	E_id	count
0	6251	4211	1
1	5879	8676	1
2	8432	8676	3
3	5511	8676	1
4	3386	8676	1
...	...	...	...
68468	5663	5835	2
68469	7738	4266	2
68470	360	8003	2
68471	6759	601	3
68472	6081	3107	3

68473 rows × 3 columns

Create Network

[10]:

import igraph
#Create an undirected network
g = igraph.Graph()
#Add nodes
g.add_vertices(len(node))
#Add edges
g.add_edges(edge[['S_id','E_id']].values.tolist())
#Add weights as a flat list, one value per edge
edge_weights = edge['count'].tolist()
g.es['weight'] = edge_weights
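
Note that igraph.Graph() is undirected by default, so the trips from grid A to grid B and from B to A end up as two parallel edges between the same pair of nodes. If a single edge per node pair is preferred, the counts can be summed before the graph is built. A minimal sketch with made-up edge tuples (the data below is hypothetical):

```python
from collections import defaultdict

# Hypothetical directed edges as (S_id, E_id, count)
directed_edges = [(0, 1, 3), (1, 0, 2), (1, 2, 4), (2, 2, 1)]

undirected = defaultdict(int)
for s, e, count in directed_edges:
    key = (min(s, e), max(s, e))  # same key for both directions
    undirected[key] += count

edge_list = sorted(undirected)
weights = [undirected[k] for k in edge_list]
print(edge_list, weights)
```

Trips that start and end in the same grid cell show up as self-loops, such as (2, 2) here; whether to keep them is a modelling choice.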

Community detection

[11]:

#Community detection
g_clustered = g.community_multilevel(weights = edge_weights, return_levels=False)

[12]:

#Modularity
g_clustered.modularity

[12]:

0.8496074605497185
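
Modularity compares the edge weight that falls inside communities with the weight expected if edges were placed at random while keeping node strengths fixed. Values close to 1 indicate strong community structure, so 0.85 is a high score. The sketch below computes the standard weighted modularity for a toy graph; the function, the toy edges and the membership are illustrative assumptions and are independent of igraph.

```python
from collections import defaultdict

def modularity(edges, membership):
    """Weighted modularity of an undirected graph without self-loops.

    edges: iterable of (u, v, weight); membership: dict node -> community.
    """
    total = 0.0                    # m: total edge weight
    inside = defaultdict(float)    # edge weight inside each community
    strength = defaultdict(float)  # summed node strength per community
    for u, v, w in edges:
        total += w
        strength[membership[u]] += w
        strength[membership[v]] += w
        if membership[u] == membership[v]:
            inside[membership[u]] += w
    # Q = sum over communities of L_c / m - (d_c / 2m)^2
    return sum(inside[c] / total - (strength[c] / (2 * total)) ** 2
               for c in strength)

# Two triangles joined by a single bridge edge (hypothetical toy graph)
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1),
         (3, 4, 1), (4, 5, 1), (3, 5, 1),
         (2, 3, 1)]
membership = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
q = modularity(edges, membership)
print(round(q, 4))
```

Putting every node into one community gives a modularity of 0 with this definition, which is a useful sanity check.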

[13]:

#Assign the group result to the node
node['group'] = g_clustered.membership
#rename the columns
node.columns = ['grid','node_id','group']
node

[13]:

	grid	node_id	group
0	164,81	0	0
1	71,125	1	1
2	102,118	2	2
3	125,115	3	3
4	143,76	4	4
...	...	...	...
9806	98,167	9806	9
9807	46,130	9807	555
9808	118,82	9808	6
9809	158,57	9809	132
9810	104,169	9810	86

9811 rows × 3 columns

Visualize the community

[14]:

#Count the number of grids per community
group = node['group'].value_counts()
#Extract communities with more than 10 grids
group = group[group>10]
#Retain only these community grids
#(take an explicit copy so that adding columns below does not raise SettingWithCopyWarning)
node = node[node['group'].isin(group.index)].copy()

#Get the grid number
node['LONCOL'] = node['grid'].apply(lambda r:r.split(',')[0]).astype(int)
node['LATCOL'] = node['grid'].apply(lambda r:r.split(',')[1]).astype(int)
#Generate the geometry
node['geometry'] = tbd.gridid_to_polygon(node['LONCOL'],node['LATCOL'],params)
#Change it into GeoDataFrame
import geopandas as gpd
node = gpd.GeoDataFrame(node)
node

[14]:

	grid	node_id	group	LONCOL	LATCOL	geometry
1	71,125	1	1	71	125	POLYGON ((121.22089 31.22983, 121.22615 31.229...
2	102,118	2	2	102	118	POLYGON ((121.38398 31.19835, 121.38924 31.198...
3	125,115	3	3	125	115	POLYGON ((121.50498 31.18486, 121.51024 31.184...
4	143,76	4	4	143	76	POLYGON ((121.59967 31.00949, 121.60493 31.009...
5	142,87	5	4	142	87	POLYGON ((121.59441 31.05896, 121.59967 31.058...
...	...	...	...	...	...	...
9802	103,103	9802	8	103	103	POLYGON ((121.38924 31.13090, 121.39450 31.130...
9803	162,133	9803	28	162	133	POLYGON ((121.69963 31.26580, 121.70489 31.265...
9804	107,130	9804	41	107	130	POLYGON ((121.41028 31.25231, 121.41554 31.252...
9806	98,167	9806	9	98	167	POLYGON ((121.36293 31.41868, 121.36819 31.418...
9808	118,82	9808	6	118	82	POLYGON ((121.46815 31.03647, 121.47341 31.036...

8522 rows × 6 columns
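
The size filter used in this cell can also be expressed without pandas. The sketch below uses a made-up membership list and a smaller threshold so the effect is visible; both are hypothetical.

```python
from collections import Counter

# Hypothetical membership list: position = node id, value = community id
membership = [0, 0, 1, 0, 2, 1, 0, 2, 0]
threshold = 3  # the notebook keeps communities with more than 10 grids

# Number of nodes per community
sizes = Counter(membership)
# Communities that are strictly larger than the threshold
large = {c for c, n in sizes.items() if n > threshold}
# Node ids that belong to one of the large communities
kept_nodes = [i for i, c in enumerate(membership) if c in large]
print(sizes.most_common(), kept_nodes)
```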

[15]:

node.plot('group')

[15]:

<AxesSubplot:>

../_images/gallery_Example_8-Community_detection_for_bikesharing_data_27_1.png

[16]:

#Use the group column to merge polygon
node_community = tbd.merge_polygon(node,'group')
#Take the exterior boundary of each merged polygon to form a new polygon
node_community = tbd.polyon_exterior(node_community,minarea = 0.000100)

[17]:

#Generate palette
import seaborn as sns
## l: Lightness
## s: Saturation
cmap = sns.hls_palette(n_colors=len(node_community), l=.7, s=0.8)
sns.palplot(cmap)

../_images/gallery_Example_8-Community_detection_for_bikesharing_data_29_0.png
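
An HLS palette assigns each community a colour with the same lightness and saturation but a different hue. The sketch below illustrates the idea with the standard-library colorsys module. It is an approximation for illustration only: the function name is hypothetical, and seaborn's hls_palette may start from a different hue.

```python
import colorsys

def hls_palette_sketch(n_colors, l=0.7, s=0.8):
    """Return n_colors RGB tuples with evenly spaced hues."""
    return [colorsys.hls_to_rgb(i / n_colors, l, s) for i in range(n_colors)]

palette = hls_palette_sketch(6)
for rgb in palette:
    print(tuple(round(c, 2) for c in rgb))
```

Because neighbouring hues are similar, the plotting step shuffles the communities so that adjacent areas are less likely to receive near-identical colours.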

[19]:

#Create figure
import matplotlib.pyplot as plt
fig =plt.figure(1,(8,8),dpi=300)
ax =plt.subplot(111)
plt.sca(ax)
#Load basemap
tbd.plot_map(plt,bounds,zoom = 10,style = 6)
#Set colormap
from matplotlib.colors import ListedColormap
#Shuffle the order of the communities so that neighbouring ones get distinct colors
node_community = node_community.sample(frac=1)
#Plot community
node_community.plot(cmap = ListedColormap(cmap),ax = ax,edgecolor = '#333',alpha = 0.8)
#Add scale
tbd.plotscale(ax,bounds = bounds,textsize = 10,compasssize = 1,textcolor = 'k'
              ,accuracy = 2000,rect = [0.06,0.03],zorder = 10)
plt.axis('off')
plt.xlim(bounds[0],bounds[2])
plt.ylim(bounds[1],bounds[3])
plt.show()

../_images/gallery_Example_8-Community_detection_for_bikesharing_data_30_0.png