×

Asymptotic expansions of the \(k\) nearest neighbor risk. (English) Zbl 0929.62070

Summary: The finite-sample risk of the \(k\) nearest neighbor classifier that uses a weighted \(L^p\)-metric as a measure of class similarity is examined. For a family of classification problems with smooth distributions in \(\mathbb{R}^n\), an asymptotic expansion for the risk is obtained in decreasing fractional powers of the reference sample size. An analysis of the leading expansion coefficients reveals that the optimal weighted \(L^p\)-metric, that is, the metric that minimizes the finite-sample risk, tends to a weighted Euclidean (i.e., \(L^2)\) metric as the sample size is increased. Numerical simulations corroborate this finding for a pattern recognition problem with normal class-conditional densities.

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G20 Asymptotic properties of nonparametric inference
41A60 Asymptotic approximations, asymptotic expansions (steepest descent, etc.)
Full Text: DOI

References:

[1] BREIMAN, L., FRIEDMAN, J. H., OLSHEN, R. A. and STONE, C. J. 1984. Classification and Regression Trees. Wadsworth & Brooks Cole, Pacific Grove, CA. · Zbl 0541.62042
[2] COVER, T. M. 1968. Rates of convergence of nearest neighbor decision procedures. In Proceedings First Annual Hawaii Conference on Sy stems Theory 413 415.
[3] COVER, T. M. and HART, P. E. 1967. Nearest neighbor pattern classification. IEEE Trans. Inform. Theory 13 21 27. · Zbl 0154.44505 · doi:10.1109/TIT.1967.1053964
[4] DEVROy E, L. 1982. Any discrimination rule can have an arbitrarily bad probability of error for finite sample size. IEEE Trans. Pattern Anal. Machine Intelligence 4 154 157. · Zbl 0484.62072
[5] ERDELy I, A. 1956. Asy mptotic Expansions. Dover, New York. \'
[6] FIX, E. and HODGES, J. L., JR. 1951. Discriminatory analysis nonparametric discrimination: consistency properties. Project 21-49-004, Report No. 4. 261 279. USAF School of Aviation Medicine, Randolf Field, TX.
[7] FRIEDMAN, J. H., BENTLEY, J. L. and FINKEL, R. A. 1977. An algorithm for finding best matches in logarithmic expected time. ACM Trans. Math. Software 3 209 226. · Zbl 0364.68037 · doi:10.1145/355744.355745
[8] FUKUNAGA, K. and HUMMELS, D. M. 1987. Bias of nearest neighbor estimates. IEEE Trans. Pattern Anal. Machine Intelligence 9 103 112. · Zbl 0623.62055 · doi:10.1109/TPAMI.1987.4767875
[9] FUKUNAGA, K. and FLICK, T. E. 1984. An optimal global nearest neighbor metric. IEEE Trans. Pattern Anal. Machine Intelligence 6 314 318. · Zbl 0534.62041 · doi:10.1109/TPAMI.1984.4767523
[10] FULKS, W. and SATHER, J. O. 1961. Asy mptotics II: Laplace’s method for multiple integrals. Pacific J. Math. 11 185 192. · Zbl 0143.34804 · doi:10.2140/pjm.1961.11.185
[11] HELLMAN, M. E. 1970. The nearest-neighbor classification rule with a reject option. IEEE Trans. Sy stems Man Cy bernet. 6 179 185. · Zbl 0204.52201
[12] KNUTH, D. E. 1976. Big omicron and big omega and big theta. ACM SIGACT News 8 18 23.
[13] PSALTIS, D., SNAPP, R. R. and VENKATESH, S. S. 1994. On the finite sample performance of the nearest neighbor classifier. IEEE Trans. Inform. Theory 40 820 837. · Zbl 0820.62060 · doi:10.1109/18.335893
[14] SMITH, S. J., BOURGOIN, M. O., SIMS, K. and VOORHEES, H. L. 1994. Handwritten character classification using nearest neighbor in large databases. IEEE Trans. Pattern Anal. Machine Intelligence 16 915 919, 1994.
[15] SNAPP, R. R. and VENKATESH, S. S. 1994. Asy mptotic predictions of the finite-sample risk of the k-nearest-neighbor classifier. In Proceedings of the 12th International Conference on Pattern Recognition 2 1 7. IEEE Computer Society Press, Los Alamitos, CA.
[16] SNAPP, R. R. and VENKATESH, S. S. 1998. Asy mptotic derivation of the finite-sample risk of the k nearest neighbor classifier. Technical Report UVM-CS-1998-0101, Dept. Computer Science, Univ. Vermont.
[17] SNAPP, R. R. and XU, T. 1996. Estimating the Bay es risk from sample data. In Advances Z in Neural Information Processing Sy stems 8 D. S. Touretzky, M. C. Moser, and M. E.. Hasselmo, eds. MIT Press.
[18] STONE, C. J. 1977. Consistent nonparametric regression. Ann. Statist. 5 595 645. · Zbl 0366.62051 · doi:10.1214/aos/1176343886
[19] WATSON, G. N. 1918. The harmonic functions associated with the parabolic cy linder. Proc. London Math. Soc. 17 116 148. · JFM 46.0576.04
[20] BURLINGTON, VERMONT 05405 PHILADELPHIA, PENNSy LVANIA 19104 E-MAIL: snapp@cs.uvm.edu E-MAIL: venkatesh@ee.upenn.edu
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.