
Strategic Study of CAE >> 2008, Volume 10, Issue 9

Outliers detection algorithm based on nonlinear data transformation

Department of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094, China

Funding project: National Natural Science Foundation of China (60273035) Received: 2007-03-09 Revised: 2007-04-24 Available online: 2008-09-18 14:52:18.000


Abstract

Dimension reduction is the main way to improve the efficiency of outlier mining on high-dimensional data sets. After analyzing the advantages and disadvantages of classical outlier mining algorithms, this paper proposes a novel outlier detection algorithm. The algorithm maps large-scale nonlinear data into a feature space in which the data become linear, and uses this nonlinear transformation to reduce the data dimension. It then examines the resulting vectors one by one to decide whether each data point is an outlier. The paper shows that the algorithm can detect not only linearly separable outliers but also outliers that are not linearly separable, which indicates a clear advantage of the algorithm.
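The abstract does not name the kernel, the number of retained dimensions, or the decision rule, so the following is only a minimal sketch of the general idea under stated assumptions. It uses kernel PCA with an RBF kernel as a stand-in for the nonlinear transformation, and the feature-space reconstruction error with a mean-plus-three-standard-deviations threshold as a stand-in for the per-vector outlier test. The function names (`rbf_kernel`, `kpca_outlier_scores`, `flag_outliers`), the parameter values, and the toy ring data set are all hypothetical and are not taken from the paper.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Pairwise RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def kpca_outlier_scores(X, n_components, gamma):
    """Reduce X with kernel PCA and score each point by reconstruction error.

    Returns the reduced vectors Z (one row per point) and a score per point.
    The scoring rule is an assumption of this sketch, not the paper's rule.
    """
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    # Center the kernel matrix, i.e. center the data in feature space.
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    Kc = H @ K @ H
    Kc = 0.5 * (Kc + Kc.T)  # remove tiny asymmetries from rounding
    eigvals, eigvecs = np.linalg.eigh(Kc)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    lam = np.maximum(eigvals[top], 0.0)
    # Coordinates of each point on the leading feature-space components.
    Z = eigvecs[:, top] * np.sqrt(lam)
    # Squared feature-space norm minus the part captured by the projection.
    scores = np.diag(Kc) - np.sum(Z ** 2, axis=1)
    return Z, scores


def flag_outliers(scores, n_sigma=3.0):
    """Flag points whose score exceeds mean + n_sigma * std (assumed rule)."""
    return scores > scores.mean() + n_sigma * scores.std()


# Toy data: 40 points on a unit circle plus one point at the center.
# The center point sits at the data mean, so no linear boundary isolates it.
theta = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
ring = np.column_stack([np.cos(theta), np.sin(theta)])
X = np.vstack([ring, [[0.0, 0.0]]])

Z, scores = kpca_outlier_scores(X, n_components=4, gamma=2.0)
flags = flag_outliers(scores)
print("flagged indices:", np.flatnonzero(flags))
```

The ring example is chosen because the center point is not linearly separable from the rest of the data in the input space, while in the kernel feature space it lies far from the subspace spanned by the leading components. Both `gamma` and `n_components` strongly affect the result and would need tuning on real data.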

Figures

Fig. 1

Fig. 2

Fig. 3

