Document Zbl 0861.68025

Retrieving Chinese characters with similar appearance from a reorganized Chinese dictionary. (English) Zbl 0861.68025

Int. J. Inf. Manage. Sci. 7, No. 2, 31-43 (1996).

Summary: Most Chinese characters are composed of primitive components (or key tokens) arranged in spatial relationships. If two Chinese characters contain the same primitive components with the same spatial relationships among these components, they are said to have similar appearance. This paper presents a hashing-oriented scheme for retrieving Chinese characters with similar appearance from a computerized dictionary (a Chinese character database). Initially, each character in the dictionary is encoded into a set of triples \((\mathrm{PC}_i, \mathrm{PC}_j, \mathrm{REL}_{ij})\), constructed from the primitive components \(\mathrm{PC}_i\) and \(\mathrm{PC}_j\) together with their spatial relationship \(\mathrm{REL}_{ij}\) and key-in sequence. From these triples, a set of hashing functions can be constructed, each corresponding to a predefined spatial relationship. Using the constructed hashing functions, one can efficiently retrieve from the dictionary the Chinese characters whose appearance is similar to that of a reference character. A potential extension of the proposed scheme to the Chinese key-in processing problem is also discussed.
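The retrieval idea described in the summary can be illustrated with a minimal sketch. It assumes one hash table per spatial relationship, keyed by the ordered component pair \((\mathrm{PC}_i, \mathrm{PC}_j)\), and treats a character as a match when its encoding contains every triple of the query. Python dicts stand in for the paper's hashing functions, the key-in sequence is ignored, and the class name `SimilarCharIndex`, the relation labels `LEFT_RIGHT`/`TOP_BOTTOM`, and the character decompositions are all hypothetical; this is not the published construction.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# A triple (PC_i, PC_j, REL_ij): two primitive components and the spatial
# relationship between them. Relation labels here are hypothetical.
Triple = Tuple[str, str, str]


class SimilarCharIndex:
    """Hypothetical index: one hash table per spatial relationship.

    Each table maps an ordered component pair (PC_i, PC_j) to the set of
    characters whose encoding contains that triple. Python dicts stand in
    for the paper's hashing functions (an assumption, not its method).
    """

    def __init__(self) -> None:
        # relation -> (PC_i, PC_j) -> characters containing that triple
        self.tables: Dict[str, Dict[Tuple[str, str], Set[str]]] = defaultdict(
            lambda: defaultdict(set)
        )

    def add(self, char: str, triples: Set[Triple]) -> None:
        """Register a character under every triple in its encoding."""
        for pc_i, pc_j, rel in triples:
            self.tables[rel][(pc_i, pc_j)].add(char)

    def similar(self, query: Set[Triple]) -> Set[str]:
        """Return characters whose encoding contains every query triple."""
        result: Set[str] = set()
        first = True
        for pc_i, pc_j, rel in query:
            # .get() avoids creating empty entries in the defaultdicts.
            hits = self.tables.get(rel, {}).get((pc_i, pc_j), set())
            result = set(hits) if first else result & hits
            first = False
            if not result:
                break  # no character can match all remaining triples
        return result


if __name__ == "__main__":
    index = SimilarCharIndex()
    # Illustrative decompositions only, not taken from the paper.
    index.add("林", {("木", "木", "LEFT_RIGHT")})
    index.add("森", {("木", "木", "LEFT_RIGHT"), ("木", "木", "TOP_BOTTOM")})
    index.add("杏", {("木", "口", "TOP_BOTTOM")})
    index.add("呆", {("口", "木", "TOP_BOTTOM")})
    print(sorted(index.similar({("木", "木", "LEFT_RIGHT")})))  # ['林', '森']
```

Because each table is keyed by an ordered pair within a single relationship, 杏 and 呆 (the same two components, stacked in opposite order) are kept apart, which reflects the summary's point that the spatial relationship, and not only the component set, determines similar appearance.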

MSC:

68P20	Information storage and retrieval of data
68T50	Natural language processing
68T10	Pattern recognition, speech recognition
68U15	Computing methodologies for text processing; mathematical typography

Keywords:
