
Outlier detection algorithm based on nonlinear data transformation
Xu Xuesong (徐雪松), Zhang Xu (张谞), Song Dongming (宋东明), Zhang Hong (张宏), Liu Fengyu (刘凤玉)
Dimension reduction is the main way to improve the efficiency of outlier mining on high-dimensional data sets. After analyzing the advantages and disadvantages of classical outlier mining algorithms, this paper proposes a new outlier detection algorithm. The algorithm first converts the nonlinear problem into a linear problem in a high-dimensional feature space, then applies a nonlinear data transformation to reduce the dimension of the data. It then examines each projected component of the resulting data objects, one by one, to decide whether a data point is an outlier. Experiments show that the algorithm can detect outliers in linearly separable data sets and also in linearly inseparable ones, which demonstrates the advantage of the algorithm.
dimension reduction / kernel function / principal component / outliers
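The abstract gives only the outline of the method, so the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes the nonlinear transformation is kernel principal component analysis, as the keywords suggest. The Gaussian (RBF) kernel, the number of retained components, the median/MAD modified z-score, and the 3.5 threshold are all assumptions introduced for illustration, as are the function names.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix. The kernel choice is an assumption."""
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_pca_scores(X, n_components=2, gamma=0.05):
    """Project the rows of X onto the leading kernel principal components."""
    K = rbf_kernel(np.asarray(X, dtype=float), gamma)
    # Center the data in feature space (double centering of K).
    Kc = (K - K.mean(axis=0, keepdims=True)
            - K.mean(axis=1, keepdims=True) + K.mean())
    eigvals, eigvecs = np.linalg.eigh(Kc)          # eigenvalues in ascending order
    eigvals = eigvals[::-1][:n_components]
    eigvecs = eigvecs[:, ::-1][:, :n_components]
    # Projection of training point i on component k is sqrt(lambda_k) * v_k[i].
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))

def detect_outliers(X, n_components=2, gamma=0.05, threshold=3.5):
    """Flag a point if its modified z-score exceeds the threshold on any component.

    The median/MAD rule and the threshold are assumptions, not taken from the paper.
    """
    Z = kernel_pca_scores(X, n_components, gamma)
    med = np.median(Z, axis=0)
    mad = np.median(np.abs(Z - med), axis=0)
    mad = np.where(mad > 0, mad, np.finfo(float).eps)  # guard against zero MAD
    scores = 0.6745 * np.abs(Z - med) / mad            # shape (n_samples, n_components)
    flags = np.any(scores > threshold, axis=1)
    return flags, scores

# Toy data: a 5 x 5 grid of points in the unit square plus one distant point.
grid = np.linspace(0.0, 1.0, 5)
cluster = np.array([(x, y) for x in grid for y in grid])
X = np.vstack([cluster, [[5.0, 5.0]]])
flags, scores = detect_outliers(X)
print("distant point flagged:", bool(flags[-1]))
```

With an RBF kernel every point lies on the unit sphere in feature space, so the kernel width matters: if `gamma` is too large, an isolated point contributes little variance and may not appear among the leading components. The small `gamma` used here keeps the cluster tight in feature space so that the distant point dominates the first component.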