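Outlier Detection Algorithm Based on Nonlinear Data Transformation
School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094, China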
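Abstract
To improve the efficiency of outlier mining in high-dimensional data sets, this paper proposes an outlier detection algorithm based on an analysis of the strengths and weaknesses of traditional outlier mining algorithms. The algorithm first converts the nonlinear problem into a linear problem in a high-dimensional feature space, then applies a nonlinear data transformation to reduce dimensionality, and finally examines each projected component of the resulting data objects in turn to decide whether a data point is an outlier. Experiments show that the algorithm detects outliers not only in linearly separable data sets but also in linearly non-separable ones, which demonstrates its advantage.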
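The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not name the kernel, the number of retained components, or the per-component decision rule, so everything below is an assumption made for illustration: an RBF kernel, kernel PCA as the nonlinear transformation, and a median/MAD threshold applied to each projected component. The function names (`rbf_kernel_matrix`, `kernel_pca_projections`, `flag_outliers`) and the parameters `gamma` and `threshold` are hypothetical and are not taken from the paper.

```python
import numpy as np

def rbf_kernel_matrix(X, gamma):
    """Pairwise RBF kernel exp(-gamma * ||x - y||^2) (assumed kernel choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_pca_projections(X, n_components=2, gamma=1.0):
    """Project the rows of X onto the leading kernel principal components."""
    n = X.shape[0]
    K = rbf_kernel_matrix(X, gamma)
    # Center the kernel matrix, i.e. center the data in feature space.
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # For the training points, the projection onto component k is
    # sqrt(lambda_k) times the k-th unit eigenvector of Kc.
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))

def flag_outliers(Z, threshold=3.5):
    """Flag a point if any projected component deviates strongly.

    Assumed rule: robust z-score based on the median and the median
    absolute deviation (MAD), checked component by component.
    """
    med = np.median(Z, axis=0)
    mad = np.median(np.abs(Z - med), axis=0)
    mad = np.where(mad > 0, mad, 1e-12)  # guard against division by zero
    robust_z = 0.6745 * np.abs(Z - med) / mad
    return np.any(robust_z > threshold, axis=1)
```

To use the sketch, pass an `(n_samples, n_features)` array to `kernel_pca_projections` and feed the result to `flag_outliers`, which returns a boolean mask with one entry per data point. The value of `gamma` strongly affects which structure the leading components capture, so it would need tuning for any real data set.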