Data Preprocessing
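transbigdata.clean_outofbounds(data, bounds[, col]) | Given the longitude and latitude of the lower-left and upper-right corners of the study area, exclude data that fall outside it |
transbigdata.clean_outofshape(data, shape[, col, accuracy]) | Given the GeoDataFrame of the study area, exclude data that fall outside it |
transbigdata.id_reindex(data, col[, new, timegap, timecol, suffix, sample]) | Renumber the ID column of the data |
transbigdata.id_reindex_disgap(data[, col, disgap, suffix]) | Renumber the ID column of the data; if two adjacent records are farther apart than a given distance, the later record starts a new ID |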
- transbigdata.clean_outofbounds(data, bounds, col=['Lng', 'Lat'])
Given the longitude and latitude of the lower-left and upper-right corners of the study area, exclude data that fall outside it
- Parameters:
data (DataFrame) – Data
bounds (List) – Longitude and latitude of the lower-left and upper-right corners of the study area, in the order [lon1, lat1, lon2, lat2]
col (List) – Column names of longitude and latitude
- Returns:
data1 – Data within the study area
- Return type:
DataFrame
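The bounding-box filter described above can be sketched in plain pandas. This is only an illustration of the described behaviour, not the library's implementation; the sample coordinates are made up, and how points lying exactly on the boundary are treated is an assumption (the sketch uses strict inequalities).

```python
import pandas as pd

# Hypothetical sample records (longitude, latitude)
data = pd.DataFrame({
    'Lng': [113.9, 115.2, 114.1, 114.3],
    'Lat': [22.5, 22.6, 23.4, 22.7],
})

# Study area as [lon1, lat1, lon2, lat2]: lower-left, then upper-right
bounds = [113.6, 22.4, 114.8, 22.9]
lon1, lat1, lon2, lat2 = bounds

# Keep only the records that lie inside the box
# (strict inequalities are an assumption about boundary handling)
inside = (
    (data['Lng'] > lon1) & (data['Lng'] < lon2)
    & (data['Lat'] > lat1) & (data['Lat'] < lat2)
)
data1 = data[inside]

# The documented library call for the same task would be:
#   import transbigdata as tbd
#   data1 = tbd.clean_outofbounds(data, bounds, col=['Lng', 'Lat'])
```

Here the second record is dropped because its longitude is too large, and the third because its latitude is too large.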
- transbigdata.clean_outofshape(data, shape, col=['Lng', 'Lat'], accuracy=500)
Given the GeoDataFrame of the study area, exclude data that fall outside it
- Parameters:
data (DataFrame) – Data
shape (GeoDataFrame) – The GeoDataFrame of the study area
col (List) – Column name of longitude and latitude
accuracy (number) – Size of the grid. The data are gridded first and then cleaned grid by grid; the smaller the grid size, the higher the accuracy
- Returns:
data1 – Data within the study area
- Return type:
DataFrame
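The "grid first, then clean" principle behind accuracy can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code: the study area is a hypothetical triangle given as a vertex list rather than a GeoDataFrame, the grid size `cell` is expressed in degrees, each cell is judged by its centre, and `point_in_polygon` is a hand-written helper.

```python
import pandas as pd

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical study area (a triangle) and hypothetical grid size in degrees
polygon = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
cell = 0.5

data = pd.DataFrame({
    'Lng': [1.1, 1.2, 3.6, 0.3],
    'Lat': [1.1, 1.3, 3.6, 3.1],
})

# Step 1: gridding, i.e. assign every record to a grid cell
data['col_idx'] = (data['Lng'] // cell).astype(int)
data['row_idx'] = (data['Lat'] // cell).astype(int)

# Step 2: test each distinct cell once, using its centre point
cells = data[['col_idx', 'row_idx']].drop_duplicates().copy()
cells['keep'] = [
    point_in_polygon((c + 0.5) * cell, (r + 0.5) * cell, polygon)
    for c, r in zip(cells['col_idx'], cells['row_idx'])
]

# Step 3: keep the records whose cell passed the test
data1 = data.merge(cells, on=['col_idx', 'row_idx'])
data1 = data1[data1['keep']].drop(columns=['col_idx', 'row_idx', 'keep'])
data1 = data1.reset_index(drop=True)
```

Because the polygon test runs once per cell instead of once per record, records near the boundary are classified by their cell's centre. That is why a smaller grid size gives higher accuracy at the cost of more cells to test.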
- transbigdata.id_reindex(data, col, new=False, timegap=None, timecol=None, suffix='_new', sample=None)
Renumber the ID columns of the data
- Parameters:
data (DataFrame) – Data
col (str) – Name of the ID column to be re-indexed
new (bool) – False: all records with the same original ID receive the same new index; True: following the order of the table, an original ID that appears again receives a different index
timegap (number) – Time threshold. If an individual does not appear for longer than timegap, it is numbered as a new individual. Must be set together with timecol to take effect
timecol (str) – Name of the time column. Must be set together with timegap to take effect
suffix (str) – Suffix of the new column. When set to False, the original column is replaced
sample (int (optional)) – Used to downsample the data
- Returns:
data1 – Renumbered data
- Return type:
DataFrame
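The three renumbering behaviours described by new, timegap and timecol can be sketched in plain pandas. This illustrates the described logic only and is not the library's implementation; the column names `uid_same`, `uid_run` and `uid_new`, the zero-based numbering, and the use of seconds for the time threshold are all assumptions.

```python
import pandas as pd

data = pd.DataFrame({'uid': ['a', 'a', 'b', 'a', 'b']})

# Like new=False: the same original ID always maps to the same new index
data['uid_same'] = pd.factorize(data['uid'])[0]

# Like new=True: in table order, an ID that appears again gets a fresh index
changed = data['uid'] != data['uid'].shift()
data['uid_run'] = changed.cumsum() - 1

# Like timegap/timecol: a long absence turns an individual into a new one
trips = pd.DataFrame({
    'uid': ['a', 'a', 'a', 'b', 'b'],
    'time': pd.to_datetime([
        '2023-01-01 08:00:00', '2023-01-01 08:10:00', '2023-01-01 10:00:00',
        '2023-01-01 08:00:00', '2023-01-01 08:05:00',
    ]),
})
timegap = 1800  # hypothetical threshold, assumed to be in seconds

trips = trips.sort_values(['uid', 'time']).reset_index(drop=True)
# Seconds since the same individual's previous record (NaN for its first record)
gap = trips.groupby('uid')['time'].diff().dt.total_seconds()
# Start a new individual when the ID changes or the gap exceeds the threshold
starts_new = (trips['uid'] != trips['uid'].shift()) | (gap > timegap)
trips['uid_new'] = starts_new.cumsum() - 1
```

In `trips`, individual `a` is absent for 110 minutes before its third record, so that record is numbered as a new individual.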
- transbigdata.id_reindex_disgap(data, col=['uid', 'lon', 'lat'], disgap=1000, suffix='_new')
Renumber the ID column of the data. If two adjacent records are farther apart than a given distance, the later record starts a new ID
- Parameters:
data (DataFrame) – Data
col (List) – Column names of the ID, longitude and latitude, in that order
disgap (number) – Distance threshold. If two adjacent records are farther apart than this distance, the later record starts a new ID
suffix (str) – Suffix of the new column. When set to False, the original column is replaced
- Returns:
data1 – Renumbered data
- Return type:
DataFrame
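The distance-based renumbering can be sketched in the same way. This is an illustration under stated assumptions rather than the library's implementation: distances are computed with a hand-written haversine helper, disgap is assumed to be in metres, and new IDs are numbered from zero.

```python
import numpy as np
import pandas as pd

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between points given in degrees."""
    r = 6371000.0  # mean Earth radius in metres
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * r * np.arcsin(np.sqrt(a))

# Hypothetical records, already ordered within each individual
data = pd.DataFrame({
    'uid': ['a', 'a', 'a', 'b', 'b'],
    'lon': [120.000, 120.001, 120.100, 120.000, 120.002],
    'lat': [30.0, 30.0, 30.0, 30.0, 30.0],
})
disgap = 1000  # distance threshold, assumed to be in metres

# Position of the same individual's previous record (NaN for its first record)
prev = data.groupby('uid')[['lon', 'lat']].shift()
dist = haversine_m(prev['lon'], prev['lat'], data['lon'], data['lat'])

# Start a new ID when the individual changes or the jump exceeds disgap
starts_new = (data['uid'] != data['uid'].shift()) | (dist > disgap)
data['uid_new'] = starts_new.cumsum() - 1
```

The third record of `a` lies several kilometres from the second, so it starts a new ID, while the short moves of roughly 100 to 200 metres stay within the same ID.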