A Hierarchical-Based Initialization Method for K-Means Algorithm

Tang Jiubin, Lu Jianfeng, Tang Zhenmin, Yang Jingyu

中国工程科学, 2007, Vol. 9, Issue 11: 74-79.

Abstract

K-means is a widely used clustering algorithm, but initializing its cluster centers is difficult. This paper proposes an initialization method based on a hierarchical idea. A general clustering problem can be treated as a weighted clustering problem, so the original data are sampled level by level to reduce the amount of data. Clustering is then carried out top-down, from the level at which sampling ends to the original data level. At each level, the initial cluster centers are obtained by mapping the cluster centers of the level above. Repeating this process down to the original data level yields the initial cluster centers for the original data. Experiments on both simulated and real data show that K-means with hierarchical-sampling initialization converges quickly, produces high-quality clusterings, and is insensitive to noise, and that its performance is clearly superior to that of existing related algorithms.
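The level-by-level procedure described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how each level is sampled, how weights are assigned, how centers are mapped between levels, or how the coarsest level is seeded. The sketch assumes random-subset sampling where each point is merged into its nearest sampled representative (weights are accumulated counts), assumes that centers from the upper level are reused unchanged as the starting centers of the level below, and seeds the coarsest level with its `k` heaviest points. The names `weighted_kmeans`, `coarsen`, `hierarchical_init`, `ratio`, and `min_size` are hypothetical.

```python
import numpy as np


def weighted_kmeans(points, weights, centers, n_iter=100):
    """Lloyd iterations on weighted points, starting from the given centers."""
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            mask = labels == j
            if mask.any():  # an empty cluster keeps its previous center
                new_centers[j] = np.average(points[mask], axis=0, weights=weights[mask])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    return centers


def coarsen(points, weights, ratio, rng):
    """Assumed sampling step: pick a random subset of representatives and merge
    every point into its nearest representative, summing the weights."""
    n_keep = max(1, int(len(points) * ratio))
    reps = points[rng.choice(len(points), size=n_keep, replace=False)]
    dists = ((points[:, None, :] - reps[None, :, :]) ** 2).sum(axis=2)
    owner = dists.argmin(axis=1)
    new_w = np.bincount(owner, weights=weights, minlength=n_keep)
    new_p = np.zeros((n_keep, points.shape[1]))
    np.add.at(new_p, owner, points * weights[:, None])  # weighted sums per group
    keep = new_w > 0  # drop representatives that received no points
    return new_p[keep] / new_w[keep, None], new_w[keep]


def hierarchical_init(data, k, ratio=0.5, min_size=None, seed=0):
    """Return k initial centers for the original data level."""
    rng = np.random.default_rng(seed)
    min_size = min_size or 4 * k  # assumed stopping size for the sampling
    levels = [(np.asarray(data, dtype=float), np.ones(len(data)))]
    while len(levels[-1][0]) * ratio >= min_size:
        levels.append(coarsen(*levels[-1], ratio, rng))
    # Assumption: seed the coarsest level with its k heaviest points.
    top_p, top_w = levels[-1]
    centers = top_p[np.argsort(top_w)[-k:]]
    # Cluster each sampled level top-down; the centers found at one level
    # become the starting centers of the level below it.
    for pts, w in reversed(levels[1:]):
        centers = weighted_kmeans(pts, w, centers)
    return centers
```

The returned centers would then be passed as the starting point of an ordinary K-means run on the original data, for example `weighted_kmeans(data, np.ones(len(data)), hierarchical_init(data, k))`. The pairwise distance matrices make this sketch suitable only for small data sets.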

Keywords

hierarchical technique / initial cluster centers / weighted data / K-means clustering

Cite this article

Tang Jiubin, Lu Jianfeng, Tang Zhenmin, Yang Jingyu. A Hierarchical-Based Initialization Method for K-Means Algorithm. 中国工程科学. 2007, 9(11): 74-79

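Funding
Supported by the Outstanding Youth Fund of the Education Department of Hunan Province (07B022) and the Fund of the Education Department of Hunan Province (03C495)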