Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


sql - Aggregating (x,y) coordinate point clouds in PostgreSQL

I have a PostgreSQL database table with the following simplified structure:

  • Device_Id (varchar)
  • Pos_X (int)
  • Pos_Y (int)

Basically, this table contains a lot of two-dimensional waypoint data for devices. Now I want to design a query that reduces the number of coordinates in the output. It should aggregate nearby coordinates (within a certain x,y threshold). An example:

row 1: DEVICE1;603;1205

row 2: DEVICE1;604;1204

If the threshold is 5, these two rows should be aggregated, since their coordinates differ by less than 5. Any idea how to do this in PostgreSQL or SQL in general?


1 Reply


Use the often overlooked built-in function width_bucket() in combination with your aggregation.

If your coordinates run from, say, 0 to 2000 and you want to consolidate everything within squares of 5 into single points, I would lay out a grid with a cell width of 10 (5*2). The fourth argument, 2000/10, is the number of buckets (200), so each bucket is 10 units wide:

SELECT device_id
     , width_bucket(pos_x, 0, 2000, 2000/10) * 10 AS pos_x
     , width_bucket(pos_y, 0, 2000, 2000/10) * 10 AS pos_y
     , count(*) AS ct -- or any other aggregate
FROM   tbl
GROUP  BY 1,2,3
ORDER  BY 1,2,3;
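
To make the grid query concrete, here is a minimal sketch that runs it against the two rows from the question plus one distant point. The temporary table and the third row are assumptions added for illustration; the names tbl, device_id, pos_x and pos_y follow the query above.

```sql
-- Hypothetical sample data: the first two rows come from the question,
-- the third is made up to show a separate grid cell.
CREATE TEMP TABLE tbl (
   device_id varchar
 , pos_x     int
 , pos_y     int
);

INSERT INTO tbl (device_id, pos_x, pos_y) VALUES
   ('DEVICE1', 603, 1205)
 , ('DEVICE1', 604, 1204)
 , ('DEVICE1', 650, 1300);

SELECT device_id
     , width_bucket(pos_x, 0, 2000, 2000/10) * 10 AS pos_x
     , width_bucket(pos_y, 0, 2000, 2000/10) * 10 AS pos_y
     , count(*) AS ct
FROM   tbl
GROUP  BY 1,2,3
ORDER  BY 1,2,3;
-- DEVICE1 | 610 | 1210 | 2
-- DEVICE1 | 660 | 1310 | 1
```

Buckets are numbered from 1, so 603 falls into bucket 61 and is reported as 610, the upper edge of the cell covering 600 to 609.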

To minimize the error you could GROUP BY the grid as demonstrated, but save actual average coordinates:

SELECT device_id
     , avg(pos_x)::int AS pos_x   -- save actual averages to minimize error
     , avg(pos_y)::int AS pos_y   -- cast if you need to
     , count(*)        AS ct      -- or any other aggregate
FROM   tbl
GROUP  BY
       device_id
     , width_bucket(pos_x, 0, 2000, 2000/10) * 10  -- aggregate by grid
     , width_bucket(pos_y, 0, 2000, 2000/10) * 10
ORDER  BY 1,2,3;
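
A small self-contained sketch of the averaging variant, using an inline VALUES list instead of a real table. The sample rows are assumptions chosen so that the averages come out as whole numbers.

```sql
SELECT device_id
     , avg(pos_x)::int AS pos_x   -- average of the points in each cell
     , avg(pos_y)::int AS pos_y
     , count(*)        AS ct
FROM  (
   VALUES ('DEVICE1', 603, 1205)  -- hypothetical sample rows
        , ('DEVICE1', 605, 1203)
        , ('DEVICE1', 650, 1300)
   ) AS tbl(device_id, pos_x, pos_y)
GROUP  BY
       device_id
     , width_bucket(pos_x, 0, 2000, 2000/10)   -- grid cell in x
     , width_bucket(pos_y, 0, 2000, 2000/10)   -- grid cell in y
ORDER  BY 1,2,3;
-- DEVICE1 | 604 | 1204 | 2
-- DEVICE1 | 650 | 1300 | 1
```

The multiplication by 10 is left out of the GROUP BY here because the bucket number alone already identifies the cell; the reported coordinates stay inside the cell instead of snapping to its edge.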

sqlfiddle demonstrating both alongside.

Well, this particular case could be simpler:

...
GROUP  BY
       device_id
     , (pos_x / 10) * 10          -- truncates last digit of an integer
     , (pos_y / 10) * 10
...

That shortcut relies on integer division, on non-negative values, and on a grid that starts at 0. width_bucket() is the more general tool: it also accepts floating-point or numeric input and arbitrary lower and upper bounds.
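
To compare the two grouping expressions side by side, the following sketch feeds a few hand-picked integers (assumed values, not from the question) through both:

```sql
SELECT pos_x
     , width_bucket(pos_x, 0, 2000, 2000/10) * 10 AS bucket_label  -- upper edge of the cell
     , (pos_x / 10) * 10                          AS trunc_label   -- lower edge of the cell
FROM  (VALUES (599), (600), (603), (609), (610)) AS t(pos_x)
ORDER  BY pos_x;
-- 599 | 600 | 590
-- 600 | 610 | 600
-- 603 | 610 | 600
-- 609 | 610 | 600
-- 610 | 620 | 610
```

Both expressions assign the same values to the same cell; only the label differs. The output also shows a general property of any fixed grid: 599 and 600 are only 1 apart but land in different cells, so points close to a cell border are not merged.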


Expand to timestamps

You can expand this approach to cover date and timestamp values by converting them to Unix epoch (the number of seconds since '1970-01-01 00:00:00 UTC') with extract():

SELECT extract(epoch FROM '2012-10-01 21:06:38+02'::timestamptz);

When you are done, convert the result back to timestamp with time zone:

SELECT timestamptz 'epoch' + 1349118398 * interval '1s';

Or simply to_timestamp():

SELECT to_timestamp(1349118398);
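
Putting the two conversions together, a rough sketch of grouping timestamps into 5-minute buckets could look like this. The bucket size of 300 seconds and the sample values are assumptions for illustration; plain floor() arithmetic stands in for the grid here.

```sql
SELECT (floor(extract(epoch FROM ts) / 300) * 300)::bigint     AS bucket_epoch  -- bucket start in seconds
     , to_timestamp(floor(extract(epoch FROM ts) / 300) * 300) AS bucket_start  -- same value as timestamptz
     , count(*)                                                AS ct
FROM  (
   VALUES ('2012-10-01 21:06:38+02'::timestamptz)  -- hypothetical sample values
        , ('2012-10-01 21:08:10+02')
        , ('2012-10-01 21:11:00+02')
   ) AS t(ts)
GROUP  BY 1, 2
ORDER  BY 1;
-- bucket_epoch 1349118300 -> ct 2
-- bucket_epoch 1349118600 -> ct 1
```

How bucket_start is displayed depends on the session's TimeZone setting, so only the epoch values are shown in the comments.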
