GIS Processing

`ckdnearest`(dfA_origin, dfB_origin[, Aname, ...])	Search dfB_origin for the nearest point to each point in dfA_origin, and calculate the distance
`ckdnearest_point`(gdA, gdB)	Match each point in gdA to its nearest point in gdB, and add a new column called dist
`ckdnearest_line`(gdfA, gdfB)	Search gdfB for the nearest line to each point in gdfA.
`splitline_with_length`(Centerline[, maxlength])	The input is a linestring GeoDataFrame. Split each line into sub-lines no longer than maxlength.
`merge_polygon`(data, col)	The input is a GeoDataFrame of polygon geometry and a column name. Merge the polygons by the category in that column.
`polyon_exterior`(data[, minarea])	The input is a GeoDataFrame of polygon geometry. Construct new polygons from the outer boundaries of the input polygons.

Nearest neighbor searches

transbigdata.ckdnearest(dfA_origin, dfB_origin, Aname=['lon', 'lat'], Bname=['lon', 'lat'])

Search dfB_origin for the nearest point to each point in dfA_origin, and calculate the distance.

Parameters:

dfA_origin (DataFrame) – DataFrame A
dfB_origin (DataFrame) – DataFrame B
Aname (List) – The longitude and latitude column names in DataFrame A
Bname (List) – The longitude and latitude column names in DataFrame B

Returns:

gdf – The output DataFrame

Return type:

DataFrame

transbigdata.ckdnearest_point(gdA, gdB)

This method matches each point in gdA to its nearest point in gdB, and adds a new column called dist.

Parameters:

gdA (GeoDataFrame) – GeoDataFrame A, point geometry
gdB (GeoDataFrame) – GeoDataFrame B, point geometry

Returns:

gdf – The output DataFrame

Return type:

DataFrame

transbigdata.ckdnearest_line(gdfA, gdfB)

This method searches gdfB for the nearest line to each point in gdfA.

Parameters:

gdfA (GeoDataFrame) – GeoDataFrame A, point geometry
gdfB (GeoDataFrame) – GeoDataFrame B, linestring geometry

Returns:

gdf – For each point in gdfA, the nearest linestring in gdfB

Return type:

DataFrame

The following examples show how to search for the nearest point to a point (point-to-point) and the nearest line to a point (point-to-line) with TransBigData. The method is based on the KD-tree algorithm: once the tree is built, each nearest-neighbour query takes O(log n) time on average. For more details, refer to Wikipedia: https://en.wikipedia.org/wiki/K-d_tree
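To make the idea concrete, here is a minimal sketch of KD-tree nearest-point matching with scipy.spatial.cKDTree. It assumes, as the name ckdnearest suggests but the source does not confirm, that the tree is built on raw longitude/latitude values and that the reported distance is a great-circle distance on a sphere of radius 6371 km. The helper nearest_points_sketch and its defaults are hypothetical, not TransBigData's API.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

EARTH_RADIUS_M = 6371000  # assumed mean Earth radius in metres


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between lon/lat arrays given in degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def nearest_points_sketch(dfA, dfB, Aname=('lon', 'lat'), Bname=('lon', 'lat')):
    """Hypothetical helper: match each row of dfA to its nearest row of dfB."""
    A = dfA[list(Aname)].to_numpy(dtype=float)
    B = dfB[list(Bname)].to_numpy(dtype=float)
    tree = cKDTree(B)             # built once over the points of B
    _, idx = tree.query(A, k=1)   # one nearest neighbour for each point of A
    # The tree ranks candidates by planar distance in degrees;
    # the reported distance is then recomputed on the sphere.
    dist = haversine_m(A[:, 0], A[:, 1], B[idx, 0], B[idx, 1])
    matched = dfB.iloc[idx].reset_index()  # dfB's original index becomes a column
    result = pd.concat([dfA.reset_index(drop=True), matched], axis=1)
    result['dist'] = dist
    return result


dfA = pd.DataFrame([[1, 2], [2, 4], [2, 6], [2, 10], [24, 6], [21, 6], [22, 6]],
                   columns=['lon1', 'lat1'])
dfB = pd.DataFrame([[1, 3], [2, 5], [2, 2]], columns=['lon', 'lat'])
out = nearest_points_sketch(dfA, dfB, Aname=['lon1', 'lat1'], Bname=['lon', 'lat'])
print(out)
```

Building the tree over the m points of B costs O(m log m), after which each query from A costs O(log m) on average. Because the tree ranks candidates by planar distance in degrees, the match can differ from the true great-circle nearest neighbour when points are far apart or at high latitudes, and ties such as (1, 2) being equidistant from (1, 3) and (2, 2) may be broken either way.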

Point to point matching (DataFrame and DataFrame)

In [1]: import transbigdata as tbd

In [2]: import pandas as pd

In [3]: import geopandas as gpd

In [4]: from shapely.geometry import LineString

In [5]: dfA = gpd.GeoDataFrame([[1,2],[2,4],[2,6],
   ...:                         [2,10],[24,6],[21,6],
   ...:                         [22,6]],columns = ['lon1','lat1'])
   ...: 

In [6]: dfA
Out[6]: 
   lon1  lat1
0     1     2
1     2     4
2     2     6
3     2    10
4    24     6
5    21     6
6    22     6

In [7]: dfB = gpd.GeoDataFrame([[1,3],[2,5],[2,2]],columns = ['lon','lat'])

In [8]: dfB
Out[8]: 
   lon  lat
0    1    3
1    2    5
2    2    2

Use transbigdata.ckdnearest() to match points to points. If the inputs are two DataFrames without geometry columns, specify the longitude and latitude columns with Aname and Bname.

In [9]: tbd.ckdnearest(dfA,dfB,Aname=['lon1','lat1'],Bname=['lon','lat'])
Out[9]: 
   lon1  lat1  index  lon  lat          dist
0     1     2      0    1    3  1.111949e+05
1     2     4      1    2    5  1.111949e+05
2     2     6      1    2    5  1.111949e+05
3     2    10      1    2    5  5.559746e+05
4    24     6      1    2    5  2.437393e+06
5    21     6      1    2    5  2.105798e+06
6    22     6      1    2    5  2.216318e+06
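The dist column is in metres. Its values are consistent with a great-circle (haversine) distance on a sphere of radius R = 6371 km; this radius is inferred from the numbers above, not stated in the source. For two points on the same meridian (Δλ = 0) the formula reduces to R·Δφ:

```latex
d = 2R\arcsin\sqrt{\sin^2\frac{\Delta\varphi}{2} + \cos\varphi_1\cos\varphi_2\,\sin^2\frac{\Delta\lambda}{2}}
\;\xrightarrow{\;\Delta\lambda = 0\;}\; d = R\,\lvert\Delta\varphi\rvert,
\qquad
R\,\frac{\pi}{180} = 6\,371\,000 \times 0.0174533 \approx 111\,194.9\ \text{m per degree}
```

This reproduces rows 0 to 2 (1° of latitude, 1.111949e+05 m) and row 3 (5°, 5 × 111 194.9 ≈ 5.559746e+05 m).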

Point to point searching

Convert the DataFrames to GeoDataFrames by adding a point geometry column

In [10]: dfA['geometry'] = gpd.points_from_xy(dfA['lon1'],dfA['lat1'])

In [11]: dfA
Out[11]: 
   lon1  lat1                  geometry
0     1     2   POINT (1.00000 2.00000)
1     2     4   POINT (2.00000 4.00000)
2     2     6   POINT (2.00000 6.00000)
3     2    10  POINT (2.00000 10.00000)
4    24     6  POINT (24.00000 6.00000)
5    21     6  POINT (21.00000 6.00000)
6    22     6  POINT (22.00000 6.00000)

In [12]: dfB['geometry'] = gpd.points_from_xy(dfB['lon'],dfB['lat'])

In [13]: dfB
Out[13]: 
   lon  lat                 geometry
0    1    3  POINT (1.00000 3.00000)
1    2    5  POINT (2.00000 5.00000)
2    2    2  POINT (2.00000 2.00000)

Use transbigdata.ckdnearest_point() to match points to points

In [14]: tbd.ckdnearest_point(dfA,dfB)
Out[14]: 
   lon1  lat1                geometry_x  ...  lon  lat               geometry_y
0     1     2   POINT (1.00000 2.00000)  ...    1    3  POINT (1.00000 3.00000)
1     2     4   POINT (2.00000 4.00000)  ...    2    5  POINT (2.00000 5.00000)
2     2     6   POINT (2.00000 6.00000)  ...    2    5  POINT (2.00000 5.00000)
3     2    10  POINT (2.00000 10.00000)  ...    2    5  POINT (2.00000 5.00000)
4    24     6  POINT (24.00000 6.00000)  ...    2    5  POINT (2.00000 5.00000)
5    21     6  POINT (21.00000 6.00000)  ...    2    5  POINT (2.00000 5.00000)
6    22     6  POINT (22.00000 6.00000)  ...    2    5  POINT (2.00000 5.00000)

[7 rows x 8 columns]

Point to Line searching (GeoDataFrame and GeoDataFrame)

In this case, table A still contains points, while table B contains linestrings.

In [15]: dfA['geometry'] = gpd.points_from_xy(dfA['lon1'],dfA['lat1'])

In [16]: dfB['geometry'] = [LineString([[1,1],[1.5,2.5],[3.2,4]]),
   ....:                    LineString([[1,0],[1.5,0],[4,0]]),
   ....:                    LineString([[1,-1],[1.5,-2],[4,-4]])]
   ....: 

In [17]: dfB
Out[17]: 
   lon  lat                                           geometry  index
0    1    3  LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...      0
1    2    5  LINESTRING (1.00000 0.00000, 1.50000 0.00000, ...      1
2    2    2  LINESTRING (1.00000 -1.00000, 1.50000 -2.00000...      2

In [18]: tbd.ckdnearest_line(dfA,dfB)
Out[18]: 
   lon1  lat1  ... lat                                         geometry_y
0     1     2  ...   3  LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
1     2     4  ...   3  LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
2     2     6  ...   3  LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
3     2    10  ...   3  LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
4    21     6  ...   3  LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
5    22     6  ...   3  LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
6    24     6  ...   5  LINESTRING (1.00000 0.00000, 1.50000 0.00000, ...

[7 rows x 8 columns]
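The source does not say how lines are indexed in the KD-tree. One common approach, assumed here, is to put every line vertex into a cKDTree and report the line that owns the nearest vertex. The helper nearest_line_sketch is hypothetical and takes plain coordinate lists (for a shapely LineString, list(line.coords) gives them). On the data above it assigns each point to the same line as in Out[18].

```python
import numpy as np
from scipy.spatial import cKDTree


def nearest_line_sketch(points, lines):
    """Hypothetical helper: index of the line owning the vertex nearest each point.

    points: sequence of (x, y); lines: sequence of vertex lists [(x, y), ...].
    """
    vertices, owner = [], []
    for line_id, coords in enumerate(lines):
        vertices.extend(coords)                 # every vertex of every line
        owner.extend([line_id] * len(coords))   # remember which line it came from
    tree = cKDTree(np.asarray(vertices, dtype=float))
    dist, idx = tree.query(np.asarray(points, dtype=float), k=1)
    return np.asarray(owner)[idx], dist


points = [(1, 2), (2, 4), (2, 6), (2, 10), (24, 6), (21, 6), (22, 6)]
lines = [[(1, 1), (1.5, 2.5), (3.2, 4)],
         [(1, 0), (1.5, 0), (4, 0)],
         [(1, -1), (1.5, -2), (4, -4)]]
line_ids, vertex_dist = nearest_line_sketch(points, lines)
print(line_ids)
```

Matching by nearest vertex is an approximation: a point close to the middle of a long segment can be matched to a different line whose vertex happens to be nearer. Densifying the lines first, or checking the few candidate lines with shapely's exact LineString.distance, reduces that error.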

Split the line

splitline_with_length splits each line into several sub-lines, none longer than a given maximum length.

transbigdata.splitline_with_length(Centerline, maxlength=100)

The input is a linestring GeoDataFrame. Each split line will be no longer than maxlength.

Parameters:

Centerline (GeoDataFrame) – Linestring geometry
maxlength (number) – The maximum length of each split line

Returns:

splitedline – The split lines

Return type:

GeoDataFrame
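As an illustration of the idea, not TransBigData's implementation, here is a minimal sketch that cuts a single shapely LineString every maxlength units from its start using shapely.ops.substring. The helper split_line_sketch is hypothetical. Lengths are measured in the units of the coordinate system, so the line should be in a projected CRS (as in the EPSG:4517 example) for maxlength to mean metres.

```python
import math

from shapely.geometry import LineString
from shapely.ops import substring


def split_line_sketch(line, maxlength=100):
    """Hypothetical helper: cut a LineString every maxlength units from its start."""
    n_pieces = max(1, math.ceil(line.length / maxlength))
    # Piece i covers the distance range [i * maxlength, (i + 1) * maxlength] along the line
    return [substring(line, i * maxlength, min((i + 1) * maxlength, line.length))
            for i in range(n_pieces)]


# An L-shaped line of length 250 in projected units (e.g. metres)
line = LineString([(0, 0), (150, 0), (150, 100)])
pieces = split_line_sketch(line, maxlength=100)
print([round(p.length, 6) for p in pieces])
```

Every piece except the last is exactly maxlength long, and the last one takes the remainder; the middle piece here bends around the corner at (150, 0). Applying the helper row by row to a GeoDataFrame's geometry column gives the pieces of each line.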

The following example shows how to split lines into segments of at most 100 m.

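# Read the line features
import geopandas as gpd
Centerline = gpd.read_file(r'test_lines.json')
Centerline.plot()

# Declare the coordinates as WGS84 (EPSG:4326), then project the lines to EPSG:4517 (units: metres)
Centerline = Centerline.set_crs(epsg=4326, allow_override=True)
Centerline = Centerline.to_crs(epsg=4517)
# Calculate the length of each line
Centerline['length'] = Centerline.length
Centerline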

	geometry	length
0	LINESTRING (29554925.232 4882800.694, 29554987...	285.503444
1	LINESTRING (29554682.635 4882450.554, 29554773...	185.482276
2	LINESTRING (29554987.079 4882521.969, 29555040...	291.399180
3	LINESTRING (29554987.079 4882521.969, 29555073...	248.881529
4	LINESTRING (29554987.079 4882521.969, 29554969...	207.571197
5	LINESTRING (29554773.177 4882288.671, 29554828...	406.251357
6	LINESTRING (29554773.177 4882288.671, 29554926...	158.114403
7	LINESTRING (29555060.286 4882205.456, 29555082...	107.426629
8	LINESTRING (29555040.278 4882235.468, 29555060...	36.069941
9	LINESTRING (29555060.286 4882205.456, 29555095...	176.695446

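# Split the lines into segments of at most 100 metres
import transbigdata as tbd
splitedline = tbd.splitline_with_length(Centerline, maxlength=100)

# The shape of the lines is unchanged after splitting
splitedline.plot()

# but each line is now stored as a series of segments
splitedline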

	geometry	id	length
0	LINESTRING (29554925.232 4882800.694, 29554927...	0	100.000000
1	LINESTRING (29554946.894 4882703.068, 29554949...	0	100.000000
2	LINESTRING (29554968.557 4882605.443, 29554970...	0	85.503444
0	LINESTRING (29554682.635 4882450.554, 29554688...	1	100.000000
1	LINESTRING (29554731.449 4882363.277, 29554736...	1	85.482276
0	LINESTRING (29554987.079 4882521.969, 29554989...	2	100.000000
1	LINESTRING (29555005.335 4882423.650, 29555007...	2	100.000000
2	LINESTRING (29555023.592 4882325.331, 29555025...	2	91.399180
0	LINESTRING (29554987.079 4882521.969, 29554993...	3	100.000000
1	LINESTRING (29555042.051 4882438.435, 29555048...	3	99.855617
2	LINESTRING (29555111.265 4882370.450, 29555116...	3	48.881529
0	LINESTRING (29554987.079 4882521.969, 29554985...	4	100.000000
1	LINESTRING (29554973.413 4882422.908, 29554971...	4	99.756943
2	LINESTRING (29554930.341 4882335.023, 29554929...	4	7.571197
0	LINESTRING (29554773.177 4882288.671, 29554777...	5	100.000000
1	LINESTRING (29554816.361 4882198.476, 29554821...	5	99.782969
2	LINESTRING (29554882.199 4882125.314, 29554891...	5	99.745378
3	LINESTRING (29554976.612 4882096.588, 29554987...	5	100.000000
4	LINESTRING (29555076.548 4882100.189, 29555077...	5	6.251357
0	LINESTRING (29554773.177 4882288.671, 29554783...	6	100.000000
1	LINESTRING (29554869.914 4882314.006, 29554876...	6	58.114403
0	LINESTRING (29555060.286 4882205.456, 29555062...	7	100.000000
1	LINESTRING (29555081.239 4882107.675, 29555081...	7	7.426629
0	LINESTRING (29555040.278 4882235.468, 29555042...	8	36.069941
0	LINESTRING (29555060.286 4882205.456, 29555064...	9	100.000000
1	LINESTRING (29555094.981 4882299.244, 29555100...	9	76.419694

Polygon processing

transbigdata.merge_polygon(data, col)

The input is a GeoDataFrame of polygon geometry and a column name. This function merges the polygons that share the same category in that column.

Parameters:

data (GeoDataFrame) – The polygon geometry
col (str) – The column name for indicating category

Returns:

data1 – The merged polygon

Return type:

GeoDataFrame
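The source does not show how merge_polygon works internally. A result of the same kind can be obtained with geopandas' GeoDataFrame.dissolve, shown here as an assumed stand-in rather than the library's code: all geometries sharing a value in the by column are unioned into one. The column name cat and the toy squares are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import box

# Two adjacent unit squares in category 'a' and one separate square in 'b'
data = gpd.GeoDataFrame(
    {'cat': ['a', 'a', 'b']},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(5, 5, 6, 6)],
)

# dissolve() unions all geometries that share a value in the 'by' column
merged = data.dissolve(by='cat').reset_index()
print(merged)
```

The two adjacent squares in category 'a' become a single 2 × 1 polygon, while category 'b' keeps its own square.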

transbigdata.polyon_exterior(data, minarea=0)

The input is a GeoDataFrame of polygon geometry. The method constructs new polygons from the outer boundary (exterior ring) of each input polygon.

Parameters:

data (GeoDataFrame) – The polygon geometry
minarea (number) – The minimum area. Polygons with a smaller area are removed

Returns:

data1 – The processed polygon

Return type:

GeoDataFrame
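The docstring says the new polygons are built from the outer boundaries. A minimal shapely sketch of that idea follows: rebuilding a polygon from its exterior ring drops any holes. The helper exterior_sketch is hypothetical, and applying minarea to each part of a MultiPolygon is an assumption; the docstring does not say whether the threshold applies per part or per row.

```python
from shapely.geometry import MultiPolygon, Polygon


def exterior_sketch(geom, minarea=0):
    """Hypothetical helper: rebuild polygons from their exterior rings only."""
    parts = list(geom.geoms) if isinstance(geom, MultiPolygon) else [geom]
    # Polygon(p.exterior) keeps the outer ring and discards interior rings (holes)
    kept = [Polygon(p.exterior) for p in parts]
    kept = [p for p in kept if p.area >= minarea]  # drop parts smaller than minarea
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else MultiPolygon(kept)


outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
donut = Polygon(outer, [hole])
print(donut.area, exterior_sketch(donut).area)  # area with and without the hole

islands = MultiPolygon([Polygon(outer),
                        Polygon([(20, 20), (21, 20), (21, 21), (20, 21)])])
print(exterior_sketch(islands, minarea=5).area)  # the 1 x 1 island is removed
```

On a GeoDataFrame the helper could be applied with data['geometry'].apply(lambda g: exterior_sketch(g, minarea=10)), then dropping rows whose result is None.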