A disk-based algorithm for fast outlier detection in large datasets. (English) Zbl 1161.68432

Ma, Zongmin (ed.), Intelligent databases: Technologies and applications. Hershey, PA: Idea Group Publishing (IGI) (ISBN 1-59904-120-0/hbk). 29-43 (2007).
Summary: Outlier detection is an important research issue in data mining. In the cell-based disk algorithm, the number of cells increases exponentially with the dimensionality of the data, and performance degrades dramatically as the number of cells and data points grows. Further analysis shows that many cells are empty and therefore useless for outlier detection. This chapter proposes a novel index structure, called CD-Tree, in which only non-empty cells are stored, and adopts a clustering technique that stores the data objects of the same cell in linked disk pages. Experiments are carried out to test the performance of the proposed algorithms.
The experimental results show that the disk algorithm based on the CD-Tree structure and the clustering technique outperforms the cell-based disk algorithm, and that the proposed algorithm can handle data of higher dimensionality than the older one.
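The idea of keeping only non-empty cells can be illustrated with a minimal in-memory sketch. This is not the CD-Tree itself, whose layout is not described in the summary: a plain dictionary keyed by cell coordinates stands in for the index, and the names `cell_of`, `build_sparse_grid`, and the parameter `cell_width` are assumptions made for illustration only.

```python
from collections import defaultdict
from math import floor


def cell_of(point, cell_width):
    """Map a point to the integer coordinates of the grid cell containing it."""
    return tuple(floor(x / cell_width) for x in point)


def build_sparse_grid(points, cell_width):
    """Group points by cell, creating entries only for cells that hold data.

    Hypothetical stand-in for a non-empty-cell index: empty cells are never
    materialized, and the points of one cell are kept together in one list
    (loosely mirroring the idea of clustering a cell's objects on disk).
    """
    grid = defaultdict(list)
    for p in points:
        grid[cell_of(p, cell_width)].append(p)
    return dict(grid)


# Illustrative data, not taken from the chapter.
points = [(0.1, 0.2), (0.15, 0.25), (0.9, 0.95), (5.0, 5.0)]
grid = build_sparse_grid(points, cell_width=0.5)

# A dense grid covering [0, 5.5) in both dimensions would allocate 11 * 11 cells;
# the sparse grid stores only the cells that actually contain points.
dense_cell_count = 11 * 11
sparse_cell_count = len(grid)
```

Under these assumptions, the storage cost follows the number of occupied cells rather than the full grid size, which is the property the summary attributes to storing only non-empty cells.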
For the entire collection see [Zbl 1134.68007].

MSC:

68P15 Database theory
68T05 Learning and adaptive systems in artificial intelligence

Keywords:

data mining; CD-Tree