Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
366 views
in Technique[技术] by (71.8m points)

hierarchical clustering - Building a 2-D Range Tree with duplicate coordinates

I have an application where I have multiple (10k-200k) points on a 2D-plane. I want to (greedily?) cluster these with a logic of sorts:

1. Select a random point on the plane.
2. Check for other points in radius x.
3. Form a cluster (with either 1 or more points).

These steps can be repeated with the newly formed clusters with higher x, which would result in a somewhat hierarchial clustering (I found this clustering logic on this post and I am fairly sure it would satisfy my needs).

As was mentioned in the post, the algorithm works better with some spatial indexing. Since I don't need "closest point" query type and am only interested in range query, I figured a 2D Range Tree would suit my situation the best.

I can understand the logic of a range tree to some extent, but it all goes quite complex when I have a lot of x-coordinate and y-coordinate duplicates on my data set. What also causes confusion is the median split with even number set.

Example:

Consider the following point set:

(2,7)
(2,6)
(2,5)
(2,4)
(2,3)
(5,6)
(5,5)
(7,5)
(9,3)
(9,2)
(9,1)

What would be the best approach to constructing a range tree from a dataset like that? If I remove the duplicates, how does that affect the corresponding y-tree?

Also, removing duplicates from x-tree would leave me with 2,5,7, and 9. The median of that group is 6, right? Or should I pick a point in the set? If so, is it 5 or 7?


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)
等待大神答复

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...