6 Mobile phone data processing

This example shows how to use TransBigData to process mobile phone data.
First, import TransBigData and read the data with pandas:
[1]:
import pandas as pd
import transbigdata as tbd

data = pd.read_csv(r'data/mobiledata_sample.csv')
# Parse the time column as datetime so that records sort chronologically
data['stime'] = pd.to_datetime(data['stime'], format='%Y%m%d%H%M')
data = data.sort_values(by=['user_id', 'stime'])
data.head()
[1]:
       user_id                           stime                longitude  latitude  date
78668  00466ab30de56db7efbd04991b680ae1  2018-06-01 00:00:00  121.43     30.175    20180601
78669  00466ab30de56db7efbd04991b680ae1  2018-06-01 03:35:00  121.43     30.175    20180601
78670  00466ab30de56db7efbd04991b680ae1  2018-06-01 04:25:00  121.43     30.175    20180601
78671  00466ab30de56db7efbd04991b680ae1  2018-06-01 05:15:00  121.43     30.175    20180601
78289  00466ab30de56db7efbd04991b680ae1  2018-06-01 06:05:00  121.43     30.175    20180601

Identify stay and move information from mobile phone trajectory data

When processing mobile phone data, TransBigData first maps each record to a grid cell and treats all records within the same cell as being at the same location. This prevents positioning errors from splitting a single location into several.
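
The mapping from coordinates to grid cells can be sketched in plain Python. This is a minimal illustration, not TransBigData's implementation: it assumes a rectangular grid whose origin cell is centered on the lower-left corner of the bounds, and whose cell size is converted from meters to degrees at the center latitude. The helper names `grid_size_deg` and `to_grid` and the Earth radius constant are assumptions made for this example.

```python
import math

EARTH_RADIUS_M = 6371004  # assumed mean Earth radius in meters


def grid_size_deg(bounds, accuracy=500):
    """Convert a grid size in meters to (delta_lon, delta_lat) in degrees."""
    lon1, lat1, lon2, lat2 = bounds
    lat_center = (lat1 + lat2) / 2
    delta_lat = accuracy * 360 / (2 * math.pi * EARTH_RADIUS_M)
    # A degree of longitude shrinks with the cosine of the latitude
    delta_lon = delta_lat / math.cos(math.radians(lat_center))
    return delta_lon, delta_lat


def to_grid(lon, lat, bounds, accuracy=500):
    """Return the (column, row) index of the cell containing (lon, lat)."""
    delta_lon, delta_lat = grid_size_deg(bounds, accuracy)
    lon_start, lat_start = bounds[0], bounds[1]
    # The origin cell is assumed to be centered on (lon_start, lat_start)
    col = math.floor((lon - (lon_start - delta_lon / 2)) / delta_lon)
    row = math.floor((lat - (lat_start - delta_lat / 2)) / delta_lat)
    return col, row


bounds = [121.860, 29.295, 121.862, 29.301]
# Two slightly different position fixes fall into the same cell,
# so they are treated as one location
a = to_grid(121.4300, 30.1750, bounds)
b = to_grid(121.4310, 30.1755, bounds)
print(a, b, a == b)
```

Under these assumptions both fixes land in cell (-83, 196), so small positioning jitter does not create a second location.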

[3]:
# Obtain the gridding parameters (accuracy is the grid size in meters)
params = tbd.area_to_params([121.860, 29.295, 121.862, 29.301], accuracy=500)
# Identify stay and move information from the mobile phone trajectory data
stay, move = tbd.traj_stay_move(data, params, col=['user_id', 'stime', 'longitude', 'latitude'])
[4]:
stay.head()
[4]:
       user_id                           stime                LONCOL  LATCOL  etime                lon      lat     duration  stayid
78668  00466ab30de56db7efbd04991b680ae1  2018-06-01 00:00:00  -83     196     2018-06-01 07:21:00  121.430  30.175  26460.0   0
78303  00466ab30de56db7efbd04991b680ae1  2018-06-01 07:36:00  -81     191     2018-06-01 10:38:00  121.444  30.152  10920.0   1
78364  00466ab30de56db7efbd04991b680ae1  2018-06-01 10:38:00  -81     191     2018-06-01 12:02:00  121.444  30.152  5040.0    2
78399  00466ab30de56db7efbd04991b680ae1  2018-06-01 12:20:00  -83     196     2018-06-01 13:04:00  121.430  30.175  2640.0    3
78471  00466ab30de56db7efbd04991b680ae1  2018-06-01 14:34:00  -60     189     2018-06-01 16:06:00  121.551  30.143  5520.0    4
[5]:
move.head()
[5]:
       user_id                           SLONCOL  SLATCOL  stime                slon     slat    etime                elon     elat    ELONCOL  ELATCOL  duration  moveid
78668  00466ab30de56db7efbd04991b680ae1  -83      196      2018-06-01 00:00:00  121.430  30.175  2018-06-01 00:00:00  121.430  30.175  -83.0    196.0    0.0       0
78668  00466ab30de56db7efbd04991b680ae1  -83      196      2018-06-01 07:21:00  121.430  30.175  2018-06-01 07:36:00  121.444  30.152  -81.0    191.0    900.0     1
78303  00466ab30de56db7efbd04991b680ae1  -81      191      2018-06-01 10:38:00  121.444  30.152  2018-06-01 10:38:00  121.444  30.152  -81.0    191.0    0.0       2
78364  00466ab30de56db7efbd04991b680ae1  -81      191      2018-06-01 12:02:00  121.444  30.152  2018-06-01 12:20:00  121.430  30.175  -83.0    196.0    1080.0    3
78399  00466ab30de56db7efbd04991b680ae1  -83      196      2018-06-01 13:04:00  121.430  30.175  2018-06-01 14:34:00  121.551  30.143  -60.0    189.0    5400.0    4
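
The idea behind stay detection can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm used by `tbd.traj_stay_move`: consecutive records of one user in the same grid cell form a run, a run ends when the user is first observed in another cell, and runs shorter than a threshold are discarded. The function `find_stays`, the 1800-second threshold, and the intermediate cell (-82, 194) are hypothetical.

```python
from datetime import datetime
from itertools import groupby


def find_stays(records, min_duration=1800):
    """Group time-sorted (timestamp, grid) records of one user into stays.

    Returns a list of (grid, start, end, duration_seconds) tuples.
    """
    # Collapse consecutive records in the same grid cell into runs
    runs = [(grid, [t for t, _ in group])
            for grid, group in groupby(records, key=lambda r: r[1])]
    stays = []
    for i, (grid, times) in enumerate(runs):
        start = times[0]
        # A run ends when the user is first seen in another cell;
        # the last run ends at its own last record
        end = runs[i + 1][1][0] if i + 1 < len(runs) else times[-1]
        duration = (end - start).total_seconds()
        if duration >= min_duration:  # drop short pass-through visits
            stays.append((grid, start, end, duration))
    return stays


records = [
    (datetime(2018, 6, 1, 0, 0), (-83, 196)),
    (datetime(2018, 6, 1, 3, 35), (-83, 196)),
    (datetime(2018, 6, 1, 7, 21), (-82, 194)),   # hypothetical short visit
    (datetime(2018, 6, 1, 7, 36), (-81, 191)),
    (datetime(2018, 6, 1, 10, 38), (-81, 191)),
]
stays = find_stays(records)
for stay_record in stays:
    print(stay_record)
```

In this toy input, the 15-minute visit to the intermediate cell is too short to count as a stay, so it becomes part of the movement between the two remaining stays.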

Identify home and work places

[6]:
# Identify home location
home = tbd.mobile_identify_home(stay, col=['user_id', 'stime', 'etime', 'LONCOL', 'LATCOL', 'lon', 'lat'], start_hour=8, end_hour=20)
home.head()
[6]:
      user_id                           LONCOL  LATCOL  lon      lat
2036  fcc3a9e9df361667e00ee5c16cb08922  -147    292     121.103  30.610
2019  f71e9d7d78e6f5bc9539d141e3a5a1c4  -216    330     120.745  30.778
2001  f6b65495b63574c2eb73c7e63ae38252  -225    -286    120.699  28.011
1982  f1f4224a60da630a0b83b3a231022123  102     157     122.387  30.000
1942  e96739aedb70a8e5c4efe4c488934b43  -223    278     120.708  30.546
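
One plausible rule for home identification is to pick, for each user, the grid cell with the longest total stay duration at night. The sketch below illustrates that rule. It is an assumption about the approach rather than a description of `tbd.mobile_identify_home`, and it assumes that `start_hour` and `end_hour` delimit the daytime. `night_seconds` and `identify_home` are hypothetical helpers.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def night_seconds(start, end, day_start=8, day_end=20):
    """Seconds of the interval [start, end) that fall outside the daytime."""
    total = 0.0
    cur = start
    while cur < end:
        # Step to the end of the current clock hour, capped at the stay end
        next_hour = cur.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
        nxt = min(end, next_hour)
        if not (day_start <= cur.hour < day_end):
            total += (nxt - cur).total_seconds()
        cur = nxt
    return total


def identify_home(stays, day_start=8, day_end=20):
    """Map each user to the grid cell with the most night-time stay seconds."""
    totals = defaultdict(float)
    for user, grid, start, end in stays:
        totals[(user, grid)] += night_seconds(start, end, day_start, day_end)
    best = {}
    for (user, grid), seconds in totals.items():
        current = best.get(user)
        if seconds > 0 and (current is None or seconds > current[1]):
            best[user] = (grid, seconds)
    return {user: grid for user, (grid, _) in best.items()}


d = datetime
stays = [
    ('u1', (-83, 196), d(2018, 6, 1, 0, 0), d(2018, 6, 1, 7, 21)),
    ('u1', (-81, 191), d(2018, 6, 1, 7, 36), d(2018, 6, 1, 10, 38)),
    ('u1', (-83, 196), d(2018, 6, 1, 12, 20), d(2018, 6, 1, 13, 4)),
    ('u1', (-60, 189), d(2018, 6, 1, 14, 34), d(2018, 6, 1, 16, 6)),
]
home_cells = identify_home(stays)
print(home_cells)
```

A work-place rule could be sketched the same way by counting daytime seconds on working days instead of night-time seconds.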
[7]:
# Identify work location
work = tbd.mobile_identify_work(stay, col=['user_id', 'stime', 'etime', 'LONCOL', 'LATCOL', 'lon', 'lat'], minhour=3, start_hour=8, end_hour=20, workdaystart=0, workdayend=4)
work.head()
[7]:
   user_id                           LONCOL  LATCOL  lon      lat
0  fcc3a9e9df361667e00ee5c16cb08922  -148    292     121.097  30.610
1  f71e9d7d78e6f5bc9539d141e3a5a1c4  -219    325     120.732  30.757
3  f1f4224a60da630a0b83b3a231022123  103     153     122.390  29.982
5  e1a1dfb5a77578c889bd3368ffe1d30f  -62     138     121.540  29.915
6  e0e30d88fc4f4b8a1d649baf9dd1274e  -436    -35     119.614  29.137
[8]:
# Optional: drop work locations that coincide with the same user's home location
home['flag'] = 1
work = pd.merge(work,home,how='left')
home = home.drop(['flag'],axis = 1)
work = work[work['flag'].isnull()].drop(['flag'],axis = 1)
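
The same filter can also be written with the `indicator` option of `pandas.merge`, which avoids adding and removing a temporary `flag` column. A minimal sketch on made-up data follows; the two small DataFrames are illustrative and not taken from the sample dataset.

```python
import pandas as pd

# Illustrative data: user 'a' works in the home cell, user 'b' does not
home = pd.DataFrame({'user_id': ['a', 'b'], 'LONCOL': [1, 2], 'LATCOL': [1, 2]})
work = pd.DataFrame({'user_id': ['a', 'b'], 'LONCOL': [1, 5], 'LATCOL': [1, 7]})

# Left-join on all shared columns; '_merge' records whether a match was found
merged = work.merge(home, how='left', indicator=True)
# Keep only work locations with no identical home location
work_only = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
print(work_only)
```

Only the row for user `b` remains, because user `a` has identical home and work cells.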

Plot activity

[9]:
# Plot the user's activity; each color represents a different location
uid = 'fcc3a9e9df361667e00ee5c16cb08922'
stay['group'] = stay['LONCOL'].astype(str)+','+stay['LATCOL'].astype(str)
tbd.plot_activity(stay[stay['user_id']==uid],figsize = (20, 5))
[Figure: activity plot for the selected user, with each color marking a different location]