11 Jan 2019
两千万级别的空间查询最近点
基础数据范围
边界方位 lat [18.2 - 53.6] 3541 y lon [73.4 - 135.1] 6165 x
time lat lon time,y,x
大概每隔 0.01° 一个点。
数据库初始化
CREATE TABLE "rgf" (
"lat" float4 NOT NULL,
"lon" float4 NOT NULL,
"x" int4 NOT NULL,
"y" int4 NOT NULL,
"geom" "geometry"
)
WITH (OIDS=FALSE)
;
- 空间索引
CREATE INDEX "geom_index" ON "rgf" USING gist ("geom");
ALTER TABLE rgf ADD PRIMARY KEY ("x", "y");
- 批量插入数据
省略….
select count(1) from rgf;
-- 21830265
-
添加 geometry 字段
alter table rgf add column geom geometry;
-
更新 geom 字段内容
update rgf set geom = ST_GeomFromText('POINT(' || lon ||' ' || lat || ')', 4326); select lat,lon,st_astext(geom) from rgf limit 1;
-
创建空间索引
create index geom_index on rgf using gist(geom);
-
查询最近点
select lat, lon, st_distance(st_geomfromtext('POINT(120.45 50.25)', 4326),geom) as dis from rgf order by dis limit 1; select lat,lon,st_distance(st_geometryfromtext('POINT(120.45 50.25)', 4326), geom) as dist from rgf limit 1;
– 性能很差。查询一次大概要20秒左右。后来发现把排序去掉效率很高。
解决方式
1、查询该点的一个缓冲区。缓冲区稍微大一些。
2、查询在该点缓冲区的所有点。
3、在通过第二步的结果之上,再查询最近点。
结果:发现性能之间达到毫秒级。(1000次循环查询用时大概1秒左右)
-
查询一个点的缓冲区
select st_astext(st_buffer(st_geometryfromtext('POINT(120.45 50.25)', 4326),0.2)); select st_buffer(st_geometryfromtext('POINT(120.45 50.25)', 4326),0.2);
-
查询在该点缓冲区的所有点,再查询最近点
select x, y, lat, lon, st_distance(st_geomfromtext('POINT(120.45 50.25)', 4326),geom) as dis from (select * from rgf where st_contains(st_buffer(st_geometryfromtext('POINT(120.45 50.25)', 4326),0.02), geom)) a order by dis limit 1; select x, y, lat, lon, st_distance(st_geomfromtext('POINT(120.45 50.25)', 4326),geom) as dis from (select x, y, lat, lon, geom from rgf where st_contains(st_buffer(st_geometryfromtext('POINT(120.45 50.25)', 4326),0.02), geom)) a order by dis limit 1;
环境:
1、windows环境(win10,16G,i7)
2、postgresql9.4
3、postgis2.3
4、rgf 表中共用 21830265 条数据
参考链接
业精于勤,荒于嬉; 行成于思,毁于随。
pnunu
at 14:18