Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
582 views
in Technique[技术] by (71.8m points)

r - kmeans: Quick-TRANSfer stage steps exceeded maximum

I am running k-means clustering in R on a dataset with 636,688 rows and 7 columns using the standard stats package: kmeans(dataset, centers = 100, nstart = 25, iter.max = 20).

I get the following error: Quick-TRANSfer stage steps exceeded maximum (= 31834400), and although one can view the code at http://svn.r-project.org/R/trunk/src/library/stats/R/kmeans.R - I am unsure as to what is going wrong. I assume my problem has to do with the size of my dataset, but I would be grateful if someone could clarify once and for all what I can do to mitigate the issue.

question from:https://stackoverflow.com/questions/21382681/kmeans-quick-transfer-stage-steps-exceeded-maximum

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I just had the same issue.

See the documentation of kmeans in R via ?kmeans:

The Hartigan-Wong algorithm generally does a better job than either of those, but trying several random starts (‘nstart’> 1) is often recommended. In rare cases, when some of the points (rows of ‘x’) are extremely close, the algorithm may not converge in the “Quick-Transfer” stage, signalling a warning (and returning ‘ifault = 4’). Slight rounding of the data may be advisable in that case.

In these cases, you may need to switch to the Lloyd or MacQueen algorithms.

The nasty thing about R here is that it continues with a warning that may go unnoticed. For my benchmark purposes, I consider this to be a failed run, and thus I use:

if (kms$ifault==4) { stop("Failed in Quick-Transfer"); }

Depending on your use case, you may want to do something like

if (kms$ifault==4) { kms = kmeans(X, kms$centers, algorithm="MacQueen"); }

instead, to continue with a different algorithm.

If you are benchmarking K-means, note that R uses iter.max=10 per default. It may take much more than 10 iterations to converge.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...