6 Mobile phone data processing
This example shows how to use TransBigData to process mobile phone data.
First, import TransBigData and read the data with pandas.
[1]:
import pandas as pd
import transbigdata as tbd
data = pd.read_csv(r'data/mobiledata_sample.csv')
#Parse the time column into datetime values
data['stime'] = pd.to_datetime(data['stime'], format='%Y%m%d%H%M')
data = data.sort_values(by = ['user_id','stime'])
data.head()
[1]:
| | user_id | stime | longitude | latitude | date |
---|---|---|---|---|---|
78668 | 00466ab30de56db7efbd04991b680ae1 | 2018-06-01 00:00:00 | 121.43 | 30.175 | 20180601 |
78669 | 00466ab30de56db7efbd04991b680ae1 | 2018-06-01 03:35:00 | 121.43 | 30.175 | 20180601 |
78670 | 00466ab30de56db7efbd04991b680ae1 | 2018-06-01 04:25:00 | 121.43 | 30.175 | 20180601 |
78671 | 00466ab30de56db7efbd04991b680ae1 | 2018-06-01 05:15:00 | 121.43 | 30.175 | 20180601 |
78289 | 00466ab30de56db7efbd04991b680ae1 | 2018-06-01 06:05:00 | 121.43 | 30.175 | 20180601 |
Identify stay and move information from mobile phone trajectory data
When processing mobile phone data, TransBigData first maps each record to a grid cell and treats all records within the same cell as being at the same location. This prevents positioning errors from causing a single location to be identified as several different ones.
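The gridding idea can be illustrated with a minimal sketch. It assumes a rectangular grid whose origin cell is centred on the lower-left corner of the bounds, and a cell size derived from a spherical Earth with a radius of 6,371 km. The helper names `grid_params` and `lonlat_to_cell` are hypothetical, and TransBigData's own implementation may differ in detail.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius, in metres

def grid_params(bounds, accuracy=500):
    """Return (lon0, lat0, dlon, dlat) for cells roughly `accuracy` metres wide."""
    lon1, lat1, lon2, lat2 = bounds
    lon0, lat0 = min(lon1, lon2), min(lat1, lat2)
    mid_lat = math.radians((lat1 + lat2) / 2)
    # Degrees of latitude and longitude spanned by `accuracy` metres
    dlat = math.degrees(accuracy / EARTH_RADIUS_M)
    dlon = dlat / math.cos(mid_lat)
    return lon0, lat0, dlon, dlat

def lonlat_to_cell(lon, lat, params):
    """Map a coordinate to integer (column, row) grid indices."""
    lon0, lat0, dlon, dlat = params
    # Cell (0, 0) is centred on (lon0, lat0), hence the half-cell shift
    col = math.floor((lon - lon0 + dlon / 2) / dlon)
    row = math.floor((lat - lat0 + dlat / 2) / dlat)
    return col, row

params_sketch = grid_params([121.860, 29.295, 121.862, 29.301], accuracy=500)
cell_a = lonlat_to_cell(121.430, 30.175, params_sketch)
cell_b = lonlat_to_cell(121.4301, 30.1752, params_sketch)  # slightly offset fix
print(cell_a, cell_b)
```

Under these assumptions, two fixes a few tens of metres apart land in the same cell, so they are treated as one location.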
[3]:
#Obtain gridding parameters
params = tbd.area_to_params([121.860, 29.295, 121.862, 29.301], accuracy=500)
#Identify stay and move information from mobile phone trajectory data
stay,move = tbd.traj_stay_move(data,params,col = ['user_id','stime','longitude', 'latitude'])
[4]:
stay.head()
[4]:
| | user_id | stime | LONCOL | LATCOL | etime | lon | lat | duration | stayid |
---|---|---|---|---|---|---|---|---|---|
78668 | 00466ab30de56db7efbd04991b680ae1 | 2018-06-01 00:00:00 | -83 | 196 | 2018-06-01 07:21:00 | 121.430 | 30.175 | 26460.0 | 0 |
78303 | 00466ab30de56db7efbd04991b680ae1 | 2018-06-01 07:36:00 | -81 | 191 | 2018-06-01 10:38:00 | 121.444 | 30.152 | 10920.0 | 1 |
78364 | 00466ab30de56db7efbd04991b680ae1 | 2018-06-01 10:38:00 | -81 | 191 | 2018-06-01 12:02:00 | 121.444 | 30.152 | 5040.0 | 2 |
78399 | 00466ab30de56db7efbd04991b680ae1 | 2018-06-01 12:20:00 | -83 | 196 | 2018-06-01 13:04:00 | 121.430 | 30.175 | 2640.0 | 3 |
78471 | 00466ab30de56db7efbd04991b680ae1 | 2018-06-01 14:34:00 | -60 | 189 | 2018-06-01 16:06:00 | 121.551 | 30.143 | 5520.0 | 4 |
[5]:
move.head()
[5]:
| | user_id | SLONCOL | SLATCOL | stime | slon | slat | etime | elon | elat | ELONCOL | ELATCOL | duration | moveid |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
78668 | 00466ab30de56db7efbd04991b680ae1 | -83 | 196 | 2018-06-01 00:00:00 | 121.430 | 30.175 | 2018-06-01 00:00:00 | 121.430 | 30.175 | -83.0 | 196.0 | 0.0 | 0 |
78668 | 00466ab30de56db7efbd04991b680ae1 | -83 | 196 | 2018-06-01 07:21:00 | 121.430 | 30.175 | 2018-06-01 07:36:00 | 121.444 | 30.152 | -81.0 | 191.0 | 900.0 | 1 |
78303 | 00466ab30de56db7efbd04991b680ae1 | -81 | 191 | 2018-06-01 10:38:00 | 121.444 | 30.152 | 2018-06-01 10:38:00 | 121.444 | 30.152 | -81.0 | 191.0 | 0.0 | 2 |
78364 | 00466ab30de56db7efbd04991b680ae1 | -81 | 191 | 2018-06-01 12:02:00 | 121.444 | 30.152 | 2018-06-01 12:20:00 | 121.430 | 30.175 | -83.0 | 196.0 | 1080.0 | 3 |
78399 | 00466ab30de56db7efbd04991b680ae1 | -83 | 196 | 2018-06-01 13:04:00 | 121.430 | 30.175 | 2018-06-01 14:34:00 | 121.551 | 30.143 | -60.0 | 189.0 | 5400.0 | 4 |
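
As a rough illustration of how stays could be derived from gridded records, the sketch below groups consecutive records of one user that fall in the same cell and keeps the groups that last at least a threshold. The function `find_stays`, the 1800-second threshold, and the rule that a stay ends at the first record in a different cell are assumptions made for illustration, not a description of the internals of `tbd.traj_stay_move`. The sample records are illustrative as well.

```python
from datetime import datetime
from itertools import groupby

def find_stays(records, min_duration=1800):
    """records: (time, cell) tuples for ONE user, sorted by time."""
    # Collapse consecutive records in the same cell into runs
    runs = [(cell, [t for t, _ in grp])
            for cell, grp in groupby(records, key=lambda r: r[1])]
    stays = []
    for i, (cell, times) in enumerate(runs):
        stime = times[0]
        # Assumed rule: a run lasts until the first record of the next run
        etime = runs[i + 1][1][0] if i + 1 < len(runs) else times[-1]
        duration = (etime - stime).total_seconds()
        if duration >= min_duration:
            stays.append({'cell': cell, 'stime': stime,
                          'etime': etime, 'duration': duration})
    return stays

def at(hhmm):
    """Build a timestamp on 2018-06-01 from an 'HH:MM' string."""
    return datetime.strptime('2018-06-01 ' + hhmm, '%Y-%m-%d %H:%M')

# Illustrative records: a long run, a brief pass-through cell, another long run
records = [(at('00:00'), (-83, 196)), (at('03:35'), (-83, 196)),
           (at('07:21'), (-82, 194)), (at('07:36'), (-81, 191)),
           (at('10:38'), (-81, 191))]
stays_sketch = find_stays(records)
for s in stays_sketch:
    print(s['cell'], s['stime'], s['etime'], s['duration'])
```

In this sample the 15-minute pass through cell (-82, 194) is too short to count as a stay, so it belongs to the movement between the two stays.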
Identify home and work places
[6]:
#Identify home location
home = tbd.mobile_identify_home(stay, col=['user_id','stime', 'etime','LONCOL', 'LATCOL','lon','lat'], start_hour=8, end_hour=20 )
home.head()
[6]:
| | user_id | LONCOL | LATCOL | lon | lat |
---|---|---|---|---|---|
2036 | fcc3a9e9df361667e00ee5c16cb08922 | -147 | 292 | 121.103 | 30.610 |
2019 | f71e9d7d78e6f5bc9539d141e3a5a1c4 | -216 | 330 | 120.745 | 30.778 |
2001 | f6b65495b63574c2eb73c7e63ae38252 | -225 | -286 | 120.699 | 28.011 |
1982 | f1f4224a60da630a0b83b3a231022123 | 102 | 157 | 122.387 | 30.000 |
1942 | e96739aedb70a8e5c4efe4c488934b43 | -223 | 278 | 120.708 | 30.546 |
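
One plausible reading of the `start_hour` and `end_hour` arguments is that they delimit daytime, with the home cell being the one where a user accumulates the most time outside that window. The sketch below implements that rule on three stays taken from the `stay` table above. `night_seconds` and `pick_home` are hypothetical helpers, and the rule itself is an assumption rather than a confirmed description of `tbd.mobile_identify_home`.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def night_seconds(stime, etime, start_hour=8, end_hour=20):
    """Seconds of [stime, etime) that fall outside start_hour..end_hour."""
    total = (etime - stime).total_seconds()
    day = stime.replace(hour=0, minute=0, second=0, microsecond=0)
    daytime = 0.0
    # Subtract the overlap with the daytime window of every calendar day touched
    while day < etime:
        d0 = day + timedelta(hours=start_hour)
        d1 = day + timedelta(hours=end_hour)
        overlap = (min(etime, d1) - max(stime, d0)).total_seconds()
        daytime += max(overlap, 0.0)
        day += timedelta(days=1)
    return total - daytime

def pick_home(stays):
    """stays: (cell, stime, etime) tuples for ONE user."""
    totals = defaultdict(float)
    for cell, stime, etime in stays:
        totals[cell] += night_seconds(stime, etime)
    return max(totals, key=totals.get)

def ts(text):
    return datetime.strptime(text, '%Y-%m-%d %H:%M')

user_stays = [
    ((-83, 196), ts('2018-06-01 00:00'), ts('2018-06-01 07:21')),
    ((-81, 191), ts('2018-06-01 07:36'), ts('2018-06-01 10:38')),
    ((-83, 196), ts('2018-06-01 12:20'), ts('2018-06-01 13:04')),
]
home_cell = pick_home(user_stays)
print(home_cell)
```

A work-place rule could be sketched the same way by counting time inside the window on working days instead of outside it.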
[7]:
#Identify work location
work = tbd.mobile_identify_work(stay, col=['user_id', 'stime', 'etime', 'LONCOL', 'LATCOL','lon','lat'], minhour=3, start_hour=8, end_hour=20,workdaystart=0, workdayend=4)
work.head()
[7]:
| | user_id | LONCOL | LATCOL | lon | lat |
---|---|---|---|---|---|
0 | fcc3a9e9df361667e00ee5c16cb08922 | -148 | 292 | 121.097 | 30.610 |
1 | f71e9d7d78e6f5bc9539d141e3a5a1c4 | -219 | 325 | 120.732 | 30.757 |
3 | f1f4224a60da630a0b83b3a231022123 | 103 | 153 | 122.390 | 29.982 |
5 | e1a1dfb5a77578c889bd3368ffe1d30f | -62 | 138 | 121.540 | 29.915 |
6 | e0e30d88fc4f4b8a1d649baf9dd1274e | -436 | -35 | 119.614 | 29.137 |
[8]:
# Optionally, drop work locations that coincide with the same user's home location
home['flag'] = 1
work = pd.merge(work,home,how='left')
home = home.drop(['flag'],axis = 1)
work = work[work['flag'].isnull()].drop(['flag'],axis = 1)
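The same filtering can be written without the temporary `flag` column by using the `indicator` option of `pd.merge`. The sketch below runs on two small made-up tables that carry only a subset of the columns of `home` and `work`; the names `home_demo`, `work_demo`, and `keys` are illustrative.

```python
import pandas as pd

# Small made-up tables with a subset of the columns of `home` and `work`
home_demo = pd.DataFrame({'user_id': ['a', 'b'],
                          'LONCOL': [1, 5], 'LATCOL': [2, 6]})
work_demo = pd.DataFrame({'user_id': ['a', 'b'],
                          'LONCOL': [1, 7], 'LATCOL': [2, 8]})

keys = ['user_id', 'LONCOL', 'LATCOL']
merged = pd.merge(work_demo, home_demo, how='left', on=keys, indicator=True)
# 'left_only' rows are work cells that do not coincide with the user's home cell
work_only = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
print(work_only)
```

Listing the join keys explicitly in `on` also makes it clear which columns must match for a work location to count as identical to the home location.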
Plot activity
[9]:
#Plot the activity of one user; different colors represent different locations
uid = 'fcc3a9e9df361667e00ee5c16cb08922'
stay['group'] = stay['LONCOL'].astype(str)+','+stay['LATCOL'].astype(str)
tbd.plot_activity(stay[stay['user_id']==uid],figsize = (20, 5))