Retrieving Chinese characters with similar appearance from a reorganized Chinese dictionary. (English) Zbl 0861.68025
Summary: Most Chinese characters are composed of primitive components (or key tokens) arranged in particular spatial relationships. If two Chinese characters contain the same primitive components with the same spatial relationships among these components, we say that they have a similar appearance. This paper presents a hashing-oriented scheme for retrieving Chinese characters with similar appearance from a computerized dictionary (a Chinese character database). Initially, each character in the dictionary is encoded into a set of triples (PC\(_i\), PC\(_j\), REL\(_{ij}\)), constructed from the primitive components PC\(_i\) and PC\(_j\), along with their spatial relationship REL\(_{ij}\) and key-in sequence. From these triples, we can construct a set of hashing functions, each corresponding to a predefined spatial relationship. Using the constructed hashing functions, one can efficiently retrieve from the dictionary the Chinese characters whose appearance is similar to that of a referenced one. The potential extension of the proposed scheme to the Chinese key-in processing problem is also discussed.
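The retrieval idea in the summary can be illustrated with a minimal sketch, which is not the paper's actual construction. It assumes one ordinary Python dictionary per spatial relationship in place of the paper's hashing functions, ignores the key-in sequence, and uses hypothetical relation labels (`left_right`, `top_bottom`) and a hand-made toy decomposition (`TOY`), none of which come from the source.

```python
from collections import defaultdict

# Hypothetical relation labels; the summary does not list the paper's
# predefined spatial relationships.
RELATIONS = ("left_right", "top_bottom")

# Toy, hand-made decompositions for illustration only:
# character -> set of (PC_i, PC_j, REL_ij) triples.
TOY = {
    "明": {("日", "月", "left_right")},
    "昌": {("日", "日", "top_bottom")},
    "晶": {("日", "日", "top_bottom"), ("日", "日", "left_right")},
    "林": {("木", "木", "left_right")},
    "森": {("木", "木", "top_bottom"), ("木", "木", "left_right")},
}

def build_index(dictionary):
    """Build one hash table per spatial relationship.

    Each table maps a component pair (PC_i, PC_j) to the set of
    characters containing that pair in that relationship.
    """
    index = {rel: defaultdict(set) for rel in RELATIONS}
    for char, triples in dictionary.items():
        for pc_i, pc_j, rel in triples:
            index[rel][(pc_i, pc_j)].add(char)
    return index

def similar(index, dictionary, query):
    """Return characters sharing at least one triple with `query`,
    ordered by number of shared triples (ties broken by character)."""
    counts = defaultdict(int)
    for pc_i, pc_j, rel in dictionary[query]:
        # .get avoids inserting empty buckets into the defaultdict.
        for char in index[rel].get((pc_i, pc_j), ()):
            if char != query:
                counts[char] += 1
    return sorted(counts, key=lambda c: (-counts[c], c))

index = build_index(TOY)
print(similar(index, TOY, "森"))
```

Each lookup touches only the buckets addressed by the query character's own triples, so the cost depends on the number of triples and bucket sizes rather than on the size of the whole dictionary.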
MSC:
68P20 | Information storage and retrieval of data |
68T50 | Natural language processing |
68T10 | Pattern recognition, speech recognition |
68U15 | Computing methodologies for text processing; mathematical typography |