Retrieving Chinese characters with similar appearance from a reorganized Chinese dictionary. (English) Zbl 0861.68025

Summary: Most Chinese characters are composed of primitive components (or key tokens) arranged in spatial relationships. If two Chinese characters contain the same primitive components with the same spatial relationship among these components, we say that they have a similar appearance. This paper presents a hashing-oriented scheme for retrieving Chinese characters with similar appearance from a computerized dictionary (a Chinese character database). Initially, each character in the dictionary is encoded into a set of triples (PC\(_i\), PC\(_j\), REL\(_{ij}\)), constructed from the primitive components PC\(_i\) and PC\(_j\), along with their spatial relationship REL\(_{ij}\) and key-in sequence. From these triples, a set of hashing functions is constructed, each corresponding to a predefined spatial relationship. Using the constructed hashing functions, one can efficiently retrieve from the dictionary the Chinese characters with an appearance similar to that of a referenced one. The potential extension of the proposed scheme to the Chinese key-in processing problem is also discussed.
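
The retrieval idea in the summary can be illustrated with a minimal sketch. It is not the paper's actual construction: the character decompositions, the relation labels `LR` (left-right) and `TB` (top-bottom), and the names `TOY_DICTIONARY`, `build_tables` and `retrieve_similar` are all assumptions made for illustration. Python's built-in dict hashing stands in for the constructed hashing functions, and the key-in sequence is omitted. One table is kept per spatial relationship, keyed by the pair (PC\(_i\), PC\(_j\)).

```python
from collections import defaultdict

# Hypothetical toy encoding (not from the paper): each character maps to a
# set of (PC_i, PC_j, REL_ij) triples. "LR" = left-right, "TB" = top-bottom.
TOY_DICTIONARY = {
    "明": {("日", "月", "LR")},
    "朋": {("月", "月", "LR")},
    "胡": {("古", "月", "LR")},
    "昌": {("日", "日", "TB")},
    "早": {("日", "十", "TB")},
    "晶": {("日", "日", "TB"), ("日", "日", "LR")},
}


def build_tables(dictionary):
    """Build one hash table per spatial relationship.

    Each table maps a component pair (PC_i, PC_j) to the set of characters
    whose encoding contains that pair under the given relationship.
    """
    tables = defaultdict(lambda: defaultdict(set))
    for char, triples in dictionary.items():
        for pc_i, pc_j, rel in triples:
            tables[rel][(pc_i, pc_j)].add(char)
    return tables


def retrieve_similar(reference, dictionary, tables):
    """Return characters sharing at least one triple with `reference`,
    ordered by the number of shared triples (most shared first)."""
    shared = defaultdict(int)
    for pc_i, pc_j, rel in dictionary[reference]:
        for char in tables[rel].get((pc_i, pc_j), ()):
            if char != reference:
                shared[char] += 1
    return sorted(shared, key=lambda c: (-shared[c], c))


TABLES = build_tables(TOY_DICTIONARY)
print(retrieve_similar("昌", TOY_DICTIONARY, TABLES))
```

In this sketch each lookup costs one table access per triple of the reference character, so retrieval does not need to scan the whole dictionary. Ranking by the count of shared triples is a design choice of the sketch; the summary itself only defines similarity as sharing the same components in the same spatial relationship.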

MSC:

68P20 Information storage and retrieval of data
68T50 Natural language processing
68T10 Pattern recognition, speech recognition
68U15 Computing methodologies for text processing; mathematical typography