Journal Home Online First Current Issue Archive For Authors Journal Information 中文版

Strategic Study of CAE >> 2007, Volume 9, Issue 11

A Hierarchical-Based Initialization Method for K-Means Algorithm

Department of Computer, Nanjing University of Science and Technology, Nanjing 210094, China

Funding project:湖南省教育厅优秀青年基金资助项目(07B022);湖南省教育厅基金资助项目(03C495) Received: 2006-07-14 Revised: 2006-09-13

Next Previous

Abstract

K-means algorithm is one of common clustering algorithms,  but the cluster center initialization is a hard problem.  In this paper,  a hierarchical-based initialization approach is proposed for K-Means algorithm.  The general clustering problem is treated as weighted clustering problem,  the original data is sampled level by level to reduce the data amount.  Then clustering is carried out at each level by top-down.  The initial center of each level is mapped from the clustering center of upper level and this procedure is repeated until the original data level is reached.  As a result,  the initial center for the original data is obtained.  Both the experimental results on simulated data and real data show that the proposed method has high converging speed,  high quality of clustering and is insensitive to noise,  which is superior to some existing clustering algorithms.

References

[ 1 ] Keim D A, Hinneburg A.Clustering techniques for large data sets—from the past to the future [ A] .In: Proc Tutorial Notes 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining [ C] .San Diego, 1999.141 ~181 link1

[ 2 ] Duda R O, Hart P E.Pattern Classification and Scene Analysis [ M] .New York: John Wiley and Sons, 1973

[ 3 ] Pena J M, Lozano J A, Larranaga P.An empirical comparison of four initialization methods for the K -means algorithm [ J ] . Pattern Recognition Letters, 1999 , 20 ( 10 ) : 27 ~40 link1

[ 4 ] Kaufman L, Rousseeuw P J. Finding Groups in Data: An Introduction to Cluster Analysis [ M] .Wiley, Canada, 1990

[ 5 ] Forgy E. Cluster analysis of multivariate data: efficiency vs interpretability of classifications [ J] .Biometrics 1965 , 21 : 768 link1

[ 6 ] MacQueen J B.Some methods for classification and analysis of multivariate observations [ A ] . In: Proc Symposium on Mathematics and Probability, 5th, Berkely, Vol 1 , AD 669871 [ M] .University of California Press, Berkeley, CA, 1967.281 ~297 link1

[ 7 ] Fayyad U M, Renia C A, Bradley P S.Initialization of iterative refinement clustering algorithm [ A ] . In: Proceedings of the International Conference on Knowledge Discovery and Data Mining ( KDD'98 ) [ C] .New York, 1998.94 ~98 link1

[ 8 ] Bradley P S, Fayyad U.Refining INITIAL Points for K -means clustering [ A] .In: Proc 5 th Int Conf Machine Learning [ C] . Morgan Kaumann, 1998 link1

[ 9 ] Khan S S, Ahmad A.Cluster center initialization algorithm for K -means clustering [ J] .Pattern Recognition Letters, 2004 , 25 ( 11 ) : 1293 ~1302 link1

[10] Du Q, Faber V, Gunzburger M.Centroidal voronoi tessellations, theory, applications and algorithms [ J ] .SIAM Review, 1999 , 41 ( 4 ) : 637 ~676 link1

Related Research