GIS Processing

ckdnearest(dfA_origin, dfB_origin[, Aname, ...])

For each point in dfA_origin, find the nearest point in dfB_origin and calculate the distance between them

ckdnearest_point(gdA, gdB)

Match each point in gdA to its nearest point in gdB and add a new column called dist

ckdnearest_line(gdfA, gdfB)

Search gdfB for the line nearest to each point in gdfA.

splitline_with_length(Centerline[, maxlength])

The input is a linestring GeoDataFrame.

merge_polygon(data, col)

The input is a GeoDataFrame of polygon geometry and a column name.

polyon_exterior(data[, minarea])

The input is the GeoDataFrame of the polygon geometry.

Nearest neighbor searches

transbigdata.ckdnearest(dfA_origin, dfB_origin, Aname=['lon', 'lat'], Bname=['lon', 'lat'])

For each point in dfA_origin, find the nearest point in dfB_origin and calculate the distance between them

Parameters:
  • dfA_origin (DataFrame) – DataFrame A

  • dfB_origin (DataFrame) – DataFrame B

  • Aname (List) – The names of the longitude and latitude columns in DataFrame A

  • Bname (List) – The names of the longitude and latitude columns in DataFrame B

Returns:

gdf – The output DataFrame

Return type:

DataFrame

transbigdata.ckdnearest_point(gdA, gdB)

Match each point in gdA to its nearest point in gdB and add a new column called dist

Parameters:
  • gdA (GeoDataFrame) – GeoDataFrame A, point geometry

  • gdB (GeoDataFrame) – GeoDataFrame B, point geometry

Returns:

gdf – The output DataFrame

Return type:

DataFrame

transbigdata.ckdnearest_line(gdfA, gdfB)

Search gdfB for the line nearest to each point in gdfA.

Parameters:
  • gdfA (GeoDataFrame) – GeoDataFrame A, point geometry

  • gdfB (GeoDataFrame) – GeoDataFrame B, linestring geometry

Returns:

gdf – The nearest linestring in gdfB for each point in gdfA

Return type:

DataFrame

The following examples show how to run nearest point-to-point and point-to-line searches with TransBigData. These methods are based on the k-d tree algorithm, for which a single nearest-neighbor query takes O(log n) time on average. For more details, see https://en.wikipedia.org/wiki/K-d_tree
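
To make the k-d tree idea concrete, here is a minimal pure-Python sketch of a 2-D tree with a nearest-neighbor query. It is not the library's implementation: the helper names `build_kdtree` and `nearest` are hypothetical, and the coordinates (borrowed from the example tables) are treated as planar x/y values rather than longitude and latitude.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively build a 2-D k-d tree; each node is (point, index, left, right)."""
    if not points:
        return None
    axis = depth % 2                      # alternate between x and y
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2                # the median splits the set in half
    pt, idx = points[mid]
    return (pt, idx,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, target, depth=0, best=None):
    """Return (distance, index) of the stored point closest to target."""
    if node is None:
        return best
    pt, idx, left, right = node
    d = math.dist(pt, target)
    if best is None or d < best[0]:
        best = (d, idx)
    axis = depth % 2
    diff = target[axis] - pt[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, depth + 1, best)
    # Visit the far side only if the splitting line is closer than the best hit.
    if abs(diff) < best[0]:
        best = nearest(far, target, depth + 1, best)
    return best

# Coordinates from the example tables, treated as planar (an assumption).
points_b = [(1, 3), (2, 5), (2, 2)]
points_a = [(1, 2), (2, 4), (2, 6), (2, 10), (24, 6), (21, 6), (22, 6)]

tree = build_kdtree([(p, i) for i, p in enumerate(points_b)])
matches = [nearest(tree, a) for a in points_a]
```

Each entry of `matches` holds the planar distance and the index of the matched point in `points_b`. Pruning the far branch is what keeps the average query cost logarithmic instead of linear.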

Point to point matching (DataFrame and DataFrame)


In [1]: import transbigdata as tbd

In [2]: import pandas as pd

In [3]: import geopandas as gpd

In [4]: from shapely.geometry import LineString

In [5]: dfA = gpd.GeoDataFrame([[1,2],[2,4],[2,6],
   ...:                         [2,10],[24,6],[21,6],
   ...:                         [22,6]],columns = ['lon1','lat1'])
   ...: 

In [6]: dfA
Out[6]: 
   lon1  lat1
0     1     2
1     2     4
2     2     6
3     2    10
4    24     6
5    21     6
6    22     6

In [7]: dfB = gpd.GeoDataFrame([[1,3],[2,5],[2,2]],columns = ['lon','lat'])

In [8]: dfB
Out[8]: 
   lon  lat
0    1    3
1    2    5
2    2    2

Use transbigdata.ckdnearest() to match points to points. If the inputs are two DataFrames without geometry columns, specify the longitude and latitude columns through Aname and Bname.

In [9]: tbd.ckdnearest(dfA,dfB,Aname=['lon1','lat1'],Bname=['lon','lat'])
Out[9]: 
   lon1  lat1  index  lon  lat          dist
0     1     2      0    1    3  1.111949e+05
1     2     4      1    2    5  1.111949e+05
2     2     6      1    2    5  1.111949e+05
3     2    10      1    2    5  5.559746e+05
4    24     6      1    2    5  2.437393e+06
5    21     6      1    2    5  2.105798e+06
6    22     6      1    2    5  2.216318e+06
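
The dist values above are consistent with a great-circle distance in meters on a sphere of radius 6371 km. That reading is inferred from the numbers, not stated by the documentation. Under that assumption, the following sketch reproduces the first and fourth rows with the haversine formula. The function name `haversine_m` and the radius constant are illustrative.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius in meters

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

d_row0 = haversine_m(1, 2, 1, 3)    # row 0: (1, 2) matched to (1, 3)
d_row3 = haversine_m(2, 10, 2, 5)   # row 3: (2, 10) matched to (2, 5)
```

One degree of latitude is about 111.19 km on this sphere, which is why rows 0 to 2 share the same distance.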

Point to point searching

Transform DataFrame to GeoDataFrame

In [10]: dfA['geometry'] = gpd.points_from_xy(dfA['lon1'],dfA['lat1'])

In [11]: dfA
Out[11]: 
   lon1  lat1                  geometry
0     1     2   POINT (1.00000 2.00000)
1     2     4   POINT (2.00000 4.00000)
2     2     6   POINT (2.00000 6.00000)
3     2    10  POINT (2.00000 10.00000)
4    24     6  POINT (24.00000 6.00000)
5    21     6  POINT (21.00000 6.00000)
6    22     6  POINT (22.00000 6.00000)

In [12]: dfB['geometry'] = gpd.points_from_xy(dfB['lon'],dfB['lat'])

In [13]: dfB
Out[13]: 
   lon  lat                 geometry
0    1    3  POINT (1.00000 3.00000)
1    2    5  POINT (2.00000 5.00000)
2    2    2  POINT (2.00000 2.00000)

Use transbigdata.ckdnearest_point() to match points to points

In [14]: tbd.ckdnearest_point(dfA,dfB)
Out[14]: 
   lon1  lat1                geometry_x  ...  lon  lat               geometry_y
0     1     2   POINT (1.00000 2.00000)  ...    1    3  POINT (1.00000 3.00000)
1     2     4   POINT (2.00000 4.00000)  ...    2    5  POINT (2.00000 5.00000)
2     2     6   POINT (2.00000 6.00000)  ...    2    5  POINT (2.00000 5.00000)
3     2    10  POINT (2.00000 10.00000)  ...    2    5  POINT (2.00000 5.00000)
4    24     6  POINT (24.00000 6.00000)  ...    2    5  POINT (2.00000 5.00000)
5    21     6  POINT (21.00000 6.00000)  ...    2    5  POINT (2.00000 5.00000)
6    22     6  POINT (22.00000 6.00000)  ...    2    5  POINT (2.00000 5.00000)

[7 rows x 8 columns]

Point to Line searching (GeoDataFrame and GeoDataFrame)

In this case, Table A is still a set of points, while Table B contains linestrings.

In [15]: dfA['geometry'] = gpd.points_from_xy(dfA['lon1'],dfA['lat1'])

In [16]: dfB['geometry'] = [LineString([[1,1],[1.5,2.5],[3.2,4]]),
   ....:                    LineString([[1,0],[1.5,0],[4,0]]),
   ....:                    LineString([[1,-1],[1.5,-2],[4,-4]])]
   ....: 

In [17]: dfB
Out[17]: 
   lon  lat                                           geometry  index
0    1    3  LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...      0
1    2    5  LINESTRING (1.00000 0.00000, 1.50000 0.00000, ...      1
2    2    2  LINESTRING (1.00000 -1.00000, 1.50000 -2.00000...      2

In [18]: tbd.ckdnearest_line(dfA,dfB)
Out[18]: 
   lon1  lat1  ... lat                                         geometry_y
0     1     2  ...   3  LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
1     2     4  ...   3  LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
2     2     6  ...   3  LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
3     2    10  ...   3  LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
4    21     6  ...   3  LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
5    22     6  ...   3  LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
6    24     6  ...   5  LINESTRING (1.00000 0.00000, 1.50000 0.00000, ...

[7 rows x 8 columns]
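
For intuition about point-to-line matching, the sketch below computes the exact planar distance from a point to each polyline and picks the closest one. This is an illustrative alternative, not necessarily how ckdnearest_line works internally. The helper names are hypothetical, and the coordinates are treated as planar.

```python
import math

def point_segment_distance(p, a, b):
    """Planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:                    # degenerate segment: a == b
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the parameter to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def point_line_distance(p, line):
    """Distance from p to a polyline given as a list of vertices."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))

def nearest_line(p, lines):
    """Return (index, distance) of the polyline closest to p."""
    dists = [point_line_distance(p, line) for line in lines]
    best = min(range(len(lines)), key=dists.__getitem__)
    return best, dists[best]

# The three linestrings from the example above.
lines = [[(1, 1), (1.5, 2.5), (3.2, 4)],
         [(1, 0), (1.5, 0), (4, 0)],
         [(1, -1), (1.5, -2), (4, -4)]]

idx_a, dist_a = nearest_line((1, 2), lines)    # a point close to the first line
idx_b, dist_b = nearest_line((24, 6), lines)   # a point far to the east
```

With these inputs the point (1, 2) is assigned to the first line and (24, 6) to the second, which agrees with the matches shown in the output above.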

Split the line

splitline_with_length can be used to split a line into several sub-lines, each no longer than a maximum length threshold

transbigdata.splitline_with_length(Centerline, maxlength=100)

The input is a linestring GeoDataFrame. Each split line will be no longer than maxlength

Parameters:
  • Centerline (GeoDataFrame) – Linestring geometry

  • maxlength (number) – The maximum length of each split line

Returns:

splitedline – The split lines

Return type:

GeoDataFrame

The following example shows how to split lines into sub-lines no longer than 100 m

# Read the line features
import geopandas as gpd
Centerline = gpd.read_file(r'test_lines.json')
Centerline.plot()
[Figure: plot of the original lines]
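# Convert the lines to a projected coordinate system
# (set_crs replaces the deprecated {'init': 'epsg:4326'} syntax)
Centerline = Centerline.set_crs(epsg=4326, allow_override=True)
Centerline = Centerline.to_crs(epsg=4517)
# Calculate the length of each line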
Centerline['length'] = Centerline.length
Centerline
   Id                                           geometry      length
0   0  LINESTRING (29554925.232 4882800.694, 29554987...  285.503444
1   0  LINESTRING (29554682.635 4882450.554, 29554773...  185.482276
2   0  LINESTRING (29554987.079 4882521.969, 29555040...  291.399180
3   0  LINESTRING (29554987.079 4882521.969, 29555073...  248.881529
4   0  LINESTRING (29554987.079 4882521.969, 29554969...  207.571197
5   0  LINESTRING (29554773.177 4882288.671, 29554828...  406.251357
6   0  LINESTRING (29554773.177 4882288.671, 29554926...  158.114403
7   0  LINESTRING (29555060.286 4882205.456, 29555082...  107.426629
8   0  LINESTRING (29555040.278 4882235.468, 29555060...   36.069941
9   0  LINESTRING (29555060.286 4882205.456, 29555095...  176.695446
# Split the lines into segments of at most 100 m
import transbigdata as tbd
splitedline = tbd.splitline_with_length(Centerline,maxlength = 100)
# The overall shape of the lines is unchanged after splitting
splitedline.plot()
[Figure: plot of the split lines]
# But the data now consists of separate segments
splitedline
                                            geometry  id      length
0  LINESTRING (29554925.232 4882800.694, 29554927...   0  100.000000
1  LINESTRING (29554946.894 4882703.068, 29554949...   0  100.000000
2  LINESTRING (29554968.557 4882605.443, 29554970...   0   85.503444
0  LINESTRING (29554682.635 4882450.554, 29554688...   1  100.000000
1  LINESTRING (29554731.449 4882363.277, 29554736...   1   85.482276
0  LINESTRING (29554987.079 4882521.969, 29554989...   2  100.000000
1  LINESTRING (29555005.335 4882423.650, 29555007...   2  100.000000
2  LINESTRING (29555023.592 4882325.331, 29555025...   2   91.399180
0  LINESTRING (29554987.079 4882521.969, 29554993...   3  100.000000
1  LINESTRING (29555042.051 4882438.435, 29555048...   3   99.855617
2  LINESTRING (29555111.265 4882370.450, 29555116...   3   48.881529
0  LINESTRING (29554987.079 4882521.969, 29554985...   4  100.000000
1  LINESTRING (29554973.413 4882422.908, 29554971...   4   99.756943
2  LINESTRING (29554930.341 4882335.023, 29554929...   4    7.571197
0  LINESTRING (29554773.177 4882288.671, 29554777...   5  100.000000
1  LINESTRING (29554816.361 4882198.476, 29554821...   5   99.782969
2  LINESTRING (29554882.199 4882125.314, 29554891...   5   99.745378
3  LINESTRING (29554976.612 4882096.588, 29554987...   5  100.000000
4  LINESTRING (29555076.548 4882100.189, 29555077...   5    6.251357
0  LINESTRING (29554773.177 4882288.671, 29554783...   6  100.000000
1  LINESTRING (29554869.914 4882314.006, 29554876...   6   58.114403
0  LINESTRING (29555060.286 4882205.456, 29555062...   7  100.000000
1  LINESTRING (29555081.239 4882107.675, 29555081...   7    7.426629
0  LINESTRING (29555040.278 4882235.468, 29555042...   8   36.069941
0  LINESTRING (29555060.286 4882205.456, 29555064...   9  100.000000
1  LINESTRING (29555094.981 4882299.244, 29555100...   9   76.419694
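
The idea of cutting a line at a fixed maximum length can be sketched in pure Python as follows. This is a simplified illustration under stated assumptions, not the code of splitline_with_length: the helpers `split_by_length` and `polyline_length` are hypothetical, the coordinates are planar, and the sample line is invented so that its total length is 285.5 units, close to the first line in the table.

```python
import math

def split_by_length(coords, maxlength):
    """Cut a polyline (list of (x, y) vertices) into pieces no longer than maxlength."""
    pieces, current, room = [], [coords[0]], maxlength
    for end in coords[1:]:
        start = current[-1]
        seg = math.dist(start, end)
        # Cut the current segment as often as needed to respect maxlength.
        while seg > room:
            t = room / seg
            cut = (start[0] + t * (end[0] - start[0]),
                   start[1] + t * (end[1] - start[1]))
            current.append(cut)
            pieces.append(current)
            current, start = [cut], cut    # the next piece starts at the cut point
            seg -= room
            room = maxlength
        current.append(end)
        room -= seg
    if len(current) > 1:
        pieces.append(current)
    return pieces

def polyline_length(coords):
    """Sum of the segment lengths of a polyline."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))

line = [(0, 0), (150, 0), (150, 135.5)]    # hypothetical line, 285.5 units long
pieces = split_by_length(line, maxlength=100)
lengths = [polyline_length(p) for p in pieces]
```

Here the line is cut into three pieces of roughly 100, 100, and 85.5 units, the same pattern as the first three rows of the output above. Pieces keep the intermediate vertices of the original line, so the overall shape is preserved.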

Polygon processing

transbigdata.merge_polygon(data, col)

The input is a GeoDataFrame of polygon geometry and a column name. This function merges the polygons that share the same category in the specified column

Parameters:
  • data (GeoDataFrame) – The polygon geometry

  • col (str) – The column name for indicating category

Returns:

data1 – The merged polygon

Return type:

GeoDataFrame

transbigdata.polyon_exterior(data, minarea=0)

The input is a GeoDataFrame of polygon geometry. The method constructs new polygons from the outer boundaries of the input polygons

Parameters:
  • data (GeoDataFrame) – The polygon geometry

  • minarea (number) – The minimum area. Polygons with a smaller area are removed

Returns:

data1 – The processed polygon

Return type:

GeoDataFrame
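
As a rough illustration of what keeping only the outer boundary and filtering by minarea means, the sketch below represents each polygon as a plain dictionary with an exterior ring and a list of holes. The data layout, the helper names, and the choice to keep polygons whose area equals minarea are all assumptions made for this example. polyon_exterior itself operates on GeoDataFrames.

```python
def ring_area(ring):
    """Unsigned area of a ring (list of (x, y) vertices) via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def exterior_only(polygons, minarea=0):
    """Rebuild each polygon from its outer ring, dropping holes and small polygons."""
    result = []
    for poly in polygons:
        shell = poly["exterior"]
        # Assumption: polygons whose area equals minarea are kept.
        if ring_area(shell) >= minarea:
            result.append({"exterior": shell, "holes": []})
    return result

# Hypothetical input: a 10 x 10 square with a hole, and a 1 x 1 square.
polygons = [
    {"exterior": [(0, 0), (10, 0), (10, 10), (0, 10)],
     "holes": [[(4, 4), (6, 4), (6, 6), (4, 6)]]},
    {"exterior": [(20, 0), (21, 0), (21, 1), (20, 1)],
     "holes": []},
]

filled = exterior_only(polygons, minarea=5)
```

With minarea=5 only the large square survives, and its hole is gone because the new polygon is built from the exterior ring alone.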