I'm currently faced with the problem of finding a way to cluster around 500,000 latitude/longitude pairs in python. So far I've tried computing a distance matrix with numpy (to pass into the scikit-learn DBSCAN) but with such a large input it quickly spits out a Memory Error.
The points are stored in tuples containing the latitude, longitude, and the data value at that point.
In short, what is the most efficient way to spatially cluster a large number of latitude/longitude pairs in python? For this application, I'm willing to sacrifice some accuracy in the name of speed.
Edit:
The number of clusters for the algorithm to find is unknown ahead of time.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…