Data preprocessing

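`clean_outofbounds`(data, bounds[, col])	Takes the longitude and latitude of the lower-left and upper-right corners of the study area and removes data outside that area
`clean_outofshape`(data, shape[, col, accuracy])	Takes the GeoDataFrame of the study area and removes data outside that area
`id_reindex`(data, col[, new, timegap, ...])	Renumbers the ID column of the data
`id_reindex_disgap`(data[, col, disgap, suffix])	Renumbers the ID column of the data; when two adjacent records are farther apart than the distance threshold, a new ID is assigned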

transbigdata.clean_outofbounds(data, bounds, col=['Lng', 'Lat'])

Takes the longitude and latitude of the lower-left and upper-right corners of the study area and removes data outside that area.

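Parameters:

data (DataFrame) – data
bounds (list) – longitude and latitude of the lower-left and upper-right corners of the study area, in the order [lon1, lat1, lon2, lat2]
col (list) – names of the longitude and latitude columns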

Returns:

data1 – data within the study area

Return type:

DataFrame
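
The filtering described for `clean_outofbounds` can be sketched with plain pandas. This is a minimal illustration of the idea, not the library's implementation: the helper name `filter_to_bounds`, the sample coordinates, the `[lon1, lat1, lon2, lat2]` ordering of `bounds`, and the inclusive treatment of points on the boundary are all assumptions of the sketch.

```python
import pandas as pd

def filter_to_bounds(data, bounds, col=("Lng", "Lat")):
    """Hypothetical helper: keep rows whose point lies inside the bounding box."""
    # Assumed order: lower-left corner first, then upper-right corner.
    lon1, lat1, lon2, lat2 = bounds
    lon, lat = col
    # Series.between is inclusive on both ends by default.
    inside = data[lon].between(lon1, lon2) & data[lat].between(lat1, lat2)
    return data[inside]

# Made-up sample points; the third one lies east of the box.
data = pd.DataFrame({
    "Lng": [113.90, 114.10, 115.50],
    "Lat": [22.50, 22.60, 22.70],
})
bounds = [113.75, 22.40, 114.62, 22.86]

data1 = filter_to_bounds(data, bounds)
print(data1)
```

With the library itself, the corresponding call would follow the signature above: `transbigdata.clean_outofbounds(data, bounds, col=['Lng', 'Lat'])`.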

transbigdata.clean_outofshape(data, shape, col=['Lng', 'Lat'], accuracy=500)

Takes the GeoDataFrame of the study area and removes data outside that area.

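Parameters:

data (DataFrame) – data
shape (GeoDataFrame) – GeoDataFrame of the study area
col (list) – names of the longitude and latitude columns
accuracy (number) – size of the grid. The data is gridded first and then cleaned, so a smaller grid gives higher accuracy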

Returns:

data1 – data within the study area

Return type:

DataFrame
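
One way to picture a grid-then-clean approach is the sketch below: points are assigned to grid cells, the centre of each occupied cell is tested once against the polygon, and rows in accepted cells are kept. It is a simplified stand-in, not the library's code. The helpers `point_in_polygon` and `filter_to_polygon` are hypothetical, the study area is a plain list of vertices rather than a GeoDataFrame, and the cell size is given in degrees purely to keep the example dependency-free.

```python
import pandas as pd

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def filter_to_polygon(data, polygon, col=("Lng", "Lat"), cell=0.01):
    """Hypothetical helper: grid the points, test each occupied cell once."""
    lon, lat = col
    out = data.copy()
    # Index of the grid cell each point falls into.
    out["_gx"] = (out[lon] // cell).astype(int)
    out["_gy"] = (out[lat] // cell).astype(int)
    cells = out[["_gx", "_gy"]].drop_duplicates()
    # Test the centre of each occupied cell against the polygon.
    keep = [
        point_in_polygon((gx + 0.5) * cell, (gy + 0.5) * cell, polygon)
        for gx, gy in zip(cells["_gx"], cells["_gy"])
    ]
    cells = cells.assign(_keep=keep)
    out = out.merge(cells, on=["_gx", "_gy"])
    return out.loc[out["_keep"], list(data.columns)].reset_index(drop=True)

# Made-up triangular study area and sample points.
polygon = [(114.0, 22.5), (114.2, 22.5), (114.1, 22.7)]
data = pd.DataFrame({
    "Lng": [114.103, 114.303, 114.013],
    "Lat": [22.553, 22.553, 22.683],
})

data1 = filter_to_polygon(data, polygon)
print(data1)
```

Because whole cells are accepted or rejected, points close to the boundary can be misclassified; shrinking the cell reduces that error at the cost of testing more cells, which matches the trade-off an `accuracy`-style grid size implies.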

transbigdata.id_reindex(data, col, new=False, timegap=None, timecol=None, suffix='_new', sample=None)

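Renumbers the ID column of the data.

Parameters:

data (DataFrame) – data
col (str) – name of the ID column to be re-indexed
new (bool) – False: rows with the same ID receive the same new index. True: indexing follows the order of the table, so an ID that reappears later receives a different index
timegap (number) – if an individual does not appear for longer than this time threshold, it is numbered as a new individual. Takes effect only when set together with timecol
timecol (str) – name of the time column. Takes effect only when set together with timegap
suffix (str) – suffix of the new column. When set to False, the original column is replaced
sample (int, optional) – downsamples the data

Returns:

data1 – the renumbered data

Return type: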

DataFrame
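
The three numbering behaviours described above can be compared on a small table. The sketch below reproduces them with plain pandas rather than calling the library, so it shows the described semantics only; the sample data, the variable names, and the choice of seconds as the unit of the time threshold are assumptions.

```python
import pandas as pd

# Made-up records: ID "a" appears, then "b", then "a" and "b" again much later.
data = pd.DataFrame({
    "uid": ["a", "a", "b", "a", "b"],
    "time": pd.to_datetime([
        "2024-01-01 08:00:00",
        "2024-01-01 08:01:00",
        "2024-01-01 08:02:00",
        "2024-01-01 09:30:00",
        "2024-01-01 09:31:00",
    ]),
})

# new=False as described: the same ID always maps to the same index.
same_index = pd.factorize(data["uid"])[0]
print(list(same_index))          # [0, 0, 1, 0, 1]

# new=True as described: walking the table in order, every change of ID
# starts a new index, so a reappearing ID is numbered again.
changed = data["uid"] != data["uid"].shift()
order_index = changed.cumsum() - 1
print(order_index.tolist())      # [0, 0, 1, 2, 3]

# timegap with timecol as described: within one ID, a silence longer than
# the threshold starts a new individual.
timegap = 1800  # assumed to be in seconds for this sketch
ordered = data.sort_values(["uid", "time"])
gap = ordered.groupby("uid")["time"].diff().dt.total_seconds()
new_individual = (ordered["uid"] != ordered["uid"].shift()) | (gap > timegap)
gap_index = new_individual.cumsum() - 1
print(gap_index.sort_index().tolist())  # [0, 0, 2, 1, 3]
```

In the last case both IDs are silent for 89 minutes, which exceeds the 30-minute threshold, so each later appearance is numbered as a separate individual.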

transbigdata.id_reindex_disgap(data, col=['uid', 'lon', 'lat'], disgap=1000, suffix='_new')

Renumbers the ID column of the data. When two adjacent records are farther apart than the distance threshold, a new ID is assigned.

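Parameters:

data (DataFrame) – data
col (list) – names of the ID, longitude, and latitude columns, in that order
disgap (number) – distance threshold. When two adjacent records are farther apart than this, a new ID is assigned
suffix (str) – suffix of the new column. When set to False, the original column is replaced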

Returns:

data1 – the renumbered data

Return type:

DataFrame
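
Distance-based renumbering can be sketched as follows: compute the distance between each record and the one before it, and start a new ID whenever the original ID changes or the distance exceeds the threshold. This is an illustration under stated assumptions, not the library's implementation. The helper `haversine_m`, the sample coordinates, and the interpretation of `disgap` as metres are all hypothetical.

```python
import numpy as np
import pandas as pd

def haversine_m(lon1, lat1, lon2, lat2):
    """Hypothetical helper: great-circle distance in metres."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * np.arcsin(np.sqrt(a))

# Made-up records: the third row jumps about 5 km from the second.
data = pd.DataFrame({
    "uid": [1, 1, 1, 2, 2],
    "lon": [114.000, 114.001, 114.050, 114.000, 114.002],
    "lat": [22.500, 22.500, 22.500, 22.600, 22.600],
})
disgap = 1000  # assumed to be in metres for this sketch

# Distance from each record to the previous one (NaN for the first row).
prev = data.shift()
dist = haversine_m(prev["lon"], prev["lat"], data["lon"], data["lat"])

# A new ID starts when the original ID changes or the jump is too long.
new_id = (data["uid"] != prev["uid"]) | (dist > disgap)
data["uid_new"] = new_id.cumsum() - 1
print(data["uid_new"].tolist())  # [0, 0, 1, 2, 2]
```

The new column name mirrors the default `suffix='_new'` in the signature above. The sketch assumes the records are already sorted so that rows belonging to one ID are adjacent.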