Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
273 views
in Technique[技术] by (71.8m points)

php - Map Clustering Algorithm

My current code is pretty quick, but I need to make it even faster so we can accommodate even more markers. Any suggestions?

Notes:

  • The code runs fastest when the SQL statement is ordered by marker name - which itself does a very partial job of clustering the markers (the names of markers in the same location are often, but not always similar).
  • I can't pre-cluster the markers, because they can be dynamically searched and filtered.
  • I've tried grid-based clustering - but the results often aren't very nice.
  • I know that the clusters are slightly skewed on a Mercator projection.
  • I'm not interested in a commercial clustering service.

The code:

$singleMarkers = array();
$clusterMarkers = array();

while (count($markers)) {
    $marker  = array_pop($markers);
    $cluster = array();

    // Compare marker against all remaining markers.
    foreach ($markers as $key => $compareMarker) {
        // This function returns the distance between two markers, at a defined zoom level.
        $pixels = pixelDistance($marker['lat'], $marker['lng'], $compareMarker['lat'], $compareMarker['lng'], $zoomLevel);
        // If two markers are closer than defined distance, remove compareMarker from array and add to cluster.
        if ($pixels < $distance) {
            unset($markers[$key]);
            $cluster[] = $compareMarker;
        }
    }

    // If a marker was added to cluster, also add the marker we were comparing to.
    if (count($cluster) > 0) {
        $cluster[] = $marker;
        $clusterMarkers[] = $cluster;
    } else {
        $singleMarkers[] = $marker;
    }
}

function pixelDistance($lat1, $lon1, $lat2, $lon2, $zoom) {
    $x1 = $lon1*10000000; //This is what I did to compensate for using lat/lon values instead of pixels.
    $y1 = $lat1*10000000;
    $x2 = $lon2*10000000;
    $y2 = $lat2*10000000;

    return sqrt(pow(($x1-$x2),2) + pow(($y1-$y2),2)) >> (21 - $zoom); //21 is the max zoom level
}

UPDATE

Here's the current code:

$singleMarkers = array();
$clusterMarkers = array();

// Minimum distance between markers to be included in a cluster, at diff. zoom levels
$DISTANCE = (10000000 >> $ZOOM) / 100000;

// Loop until all markers have been compared.
while (count($markers)) {
    $marker  = array_pop($markers);
    $cluster = array();

    // Compare against all markers which are left.
    foreach ($markers as $key => $target) {
        $pixels = abs($marker['lat']-$target['lat']) + abs($marker['lng']-$target['lng']);

        // If the two markers are closer than given distance remove target marker from array and add it to cluster.
        if ($pixels < $DISTANCE) {
            unset($markers[$key]);
            $cluster[] = $target;
        }
    }

    // If a marker has been added to cluster, add also the one we were comparing to.
    if (count($cluster) > 0) {
        $cluster[] = $marker;
        $clusterMarkers[] = $cluster;
    } else {
        $singleMarkers[] = $marker;
    }
}
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Do you actually need to calculate the Euclidean distance? If you are just comparing relative magnitudes of distances, you can probably get away with using the Manhattan distance, which will save you two calls to pow() and one to sqrt():

function pixelDistance($lat1, $lon1, $lat2, $lon2, $zoom) {
    $x1 = $lon1*10000000; //This is what I did to compensate for using lat/lon values instead of pixels.
    $y1 = $lat1*10000000;
    $x2 = $lon2*10000000;
    $y2 = $lat2*10000000;

    return ($x1-$x2) + ($y1-$y2) >> (21 - $zoom);
}

Not sure if you need the >> (21 - $zoom) bit for your calculations, so I left it in. But unless you actually need to use the calculated distance values elsewhere, you can probably get away with just using the latitude/longitude directly (no need to multiply by anything) and taking the Manhattan distance, assuming you pre-calculate $distance to fit in with that measure, which will be a lot cheaper in computational terms than coercing all the distances to fit in with the units and magnitude of $distance.

EDIT: When I was researching this problem last year, I found some useful stuff on Wikipedia - yes, it can happen ;-)

I can also highly recommend the book Programming Collective Intelligence: Building Smart Web 2.0 Applications which goes into clustering in great depth, as applied not only to geographical data but also to other areas of data analysis.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...