×

A constrained \(k\)-means clustering algorithm for classifying spatial units. (English) Zbl 1154.62348

Summary: In some classification problems it may be important to impose constraints on the set of allowable solutions. In particular, in regional taxonomy, urban and regional studies often try to segment a set of territorial data in homogeneous groups with respect to a set of socio-economic variables taking into account, at the same time, contiguous neighbourhoods. The objects in a class are thus required not only to be similar to one another but also to be part of a spatially contiguous set. The rationale behind this is that if a spatially varying phenomenon influences the objects, as could occur in the case of geographical units, and this spatial information were ignored in constructing the classes then it would be less likely to be detected. In this paper a constrained version of the\(k\)-means clustering method and a new algorithm for devising such a procedure are proposed; the latter is based on the efficient algorithm proposed by J. A. Hartigan and M. A. Wong [J. R. Stat. Soc., Ser. C 28, 100–108 (1979; Zbl 0447.62062)]. This algorithm has proved its usefulness in zoning two large regions in Italy (Calabria and Puglia).

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
65C60 Computational problems in statistics (MSC2010)

Citations:

Zbl 0447.62062

Software:

AS 136
Full Text: DOI

References:

[1] Anania G, Cersosimo D, Costanzo GD (2001) Le Calabrie contemporanee. Un’analisi delle caratteristiche degli ambiti economico produttivi sub-regionali. In: Scelte pubbliche, strategie private e sviluppo economico in Calabria. Conoscere per Decidere, Rubbettino, Soveria Mannelli, 333–380
[2] Ball GH, Hall DJ (1967) A clustering technique for summarizing multivariate data. Behavioural, Science12, 153–155 · doi:10.1002/bs.3830120210
[3] Batagelj V (1984) Agglomerative methods in clustering with constrains. Preprint Series Dept. Math. Univ. Ljublijana22 (102), 5–19
[4] Caliñski T, Harabasz J (1974) A dendrite method for cluster analysis. Communications in Statistics3, 1–27 · Zbl 0273.62010 · doi:10.1080/03610927408827101
[5] Christofides N (1975) Graph Theory. Academic Press, London. · Zbl 0321.94011
[6] Cressie NAC (1993) Statistics for spatial data. Wiley, New York
[7] De Soete G, DeSarbo WS, Furnas GW, Carrol JD (1984) The estimation of ultrametric and path trees from rectangular proximity data. Psychometrika49, 289–310 · doi:10.1007/BF02306021
[8] De Soete G, Carrol JD (1994)K-means clustering in a low-dimensional Euclidean space. In: Diday E et al. (eds.) New approaches in classification and data analysis, pp. 212–219. Springer, Berlin Heidelberg New York
[9] DeSarbo WS, Mahajan V (1984) Constrained classification: the use of a priori information in cluster analysis. Psychometrika49, 187–215 · Zbl 0554.62050 · doi:10.1007/BF02294172
[10] Ferligoj A, Batagelj V (1982) Clustering with relational constraint. Psychometrika47, 413–426 · Zbl 0568.62059 · doi:10.1007/BF02293706
[11] Ferligoj A, Batagelj V (1983) Some types of clustering with relational constraint. Psychometrika48, 541–522. · Zbl 0532.62038 · doi:10.1007/BF02293878
[12] Ferligoj A, Batagelj V (1992) Direct multicriteria clustering algorithms. Journal of Classification9 (1), 43–61 · Zbl 0756.92027 · doi:10.1007/BF02618467
[13] Ferligoj A, Batagelj, V (1998) Constrained clustering problems. In: Proceedings of IFCS ’98, Rome, 541–522 · Zbl 1051.91522
[14] Ferligoj A, Batagelj V (2000). Clustering relational data. In: Gaul W, Opitz O., Schader M (eds.) Data analysis, Springer, Berlin heidelberg New York, 3–15
[15] Gordon AD (1973) Classifications in the presence of constraints. Biometrics29, 821–827 · doi:10.2307/2529148
[16] Gordon AD (1980) Methods of constrained classification. In: Tomassone R (ed.) Analyse de données et informatique. (INRIA, Le Chesnay), 149–160.
[17] Gordon AD (1999) Classification. Chapmann & Hall, London
[18] Gordon AD (1987) Parsimonious trees. Journal of Classification4, 85–101 · Zbl 0612.62090 · doi:10.1007/BF01890077
[19] Gordon AD (1996) A survey of constrained classification. Computational Statistics & Data Analysis21, 17–29 · Zbl 0900.62313 · doi:10.1016/0167-9473(95)00005-4
[20] Gordon AD (1996) (a). How many clusters? An Investigation of five procedures for detecting nested cluster structure. In: Hayashi C et al. (eds.) Data science, classification, and related methods. Berlin Heidelberg New York, Springer, 109–116
[21] Gordon AD, Vichi M (2001) Fuzzy partition models for fitting a set of partitions.Psychometrika 66, 229–248 · Zbl 1293.62243 · doi:10.1007/BF02294837
[22] Harary F (1969) Graph theory Addison-Wesley, Reading, MA
[23] Hartigan JA (1975) Clustering algorithms Wiley, New York
[24] Hartigan JA, Wong MA (1979) Algorithm AS 136: Ak-means clustering algorithm. Applied Statistics28 (1), 100–108 · Zbl 0447.62062 · doi:10.2307/2346830
[25] Hubert LJ (1974) Some applications of graph theory to clustering. Psychometrika39 (3), 283–308 · Zbl 0317.62079 · doi:10.1007/BF02291704
[26] Lebart L (1978) Programme d’agrégation avec contraintes. Le Cahiers de l’Analyse des Données3, 275–287
[27] Lechevallier Y (1980) Classification sous contraintes. In: Diday E et al. (eds.) Optimisation en classification automatique INRIA, Paris, 677–696
[28] Lefkovitch LP (1980) Conditional clustering. Biometrics36, 43–58 · Zbl 0424.62041 · doi:10.2307/2530494
[29] Legendre P (1987) Constrained clustering. In: Legendre P et al. (eds.) Developments in numerical ecology. Springer, Berlin Heidelberg New York · Zbl 0633.92016
[30] MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: LeCam LM et al. (eds.) Proceedings of the Fifth Berkeley Symposium on Mathematic, Statistics and Probability, vol. 1, Statistics, University of California Press, Berkeley, CA, 281–298 · Zbl 0214.46201
[31] Maravalle M, Simeone B, Naldini, R (1997). Clustering on trees. Computational Statistics & Data Analysis24, 217–234 · Zbl 0900.62327 · doi:10.1016/S0167-9473(96)00062-X
[32] Milligan GW, Cooper MC (1985) An examination of procedures for determining the number of clusters in a data set. Psychometrika50, 159–179 · doi:10.1007/BF02294245
[33] Mills G (1967) The determination of local government boundaries. Operational Research Quarterly18, 243–255 · doi:10.1057/jors.1967.41
[34] Monestiez P (1977) Méthode de classification automatique sous contraintes spatiales. Statistique et Analyse des Données3, 75–84
[35] Murtagh F (1985) A survey of algorithms for contiguity-constrained clustering and related problems. Computer Journal28, 82–88 · doi:10.1093/comjnl/28.1.82
[36] Openshaw S (1977) A geographical solution to scale and aggregation problems in region-building, partitioning and spatial modelling. Transaction of the Institute of British Geographers52, 247–258
[37] Seber GAF (1984) Multivariate observations, Wiley, New York · Zbl 0627.62052
[38] Späth H (1980) Cluster analysis algorithms. Ellis Horwood, Chichester
[39] Taylor PJ (1973) Some implications of the spatial organizations of elections. Transaction of the Institute of British Geographers60, 121–136 · doi:10.2307/621509
[40] Upton G, Fingleton B (1985) Spatial data analysis by example, vol. 1, Wiley, New York · Zbl 0646.62085
[41] Vicari D (1990) Indici per la scelta del numero dei gruppi. Metron49, 473–492
[42] Webster R (1977) Quantitative and numerical methods in soil classification and survey. Clarendon Press, Oxford New York
[43] Wilson RJ (1996) Introduction to graph theory. Addison Wesley Longman, England
[44] Zani S (1993) Classificazione di unità territoriali e spaziali. In: Zani S (ed.) Metodi statistici per le analisi territoriali. Franco Angeli, Milano, 93–121
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.