A statistical domain terminology extraction method based on word length and grammatical feature. (Chinese. English summary) Zbl 1399.68121

Summary: A statistical method for domain terminology extraction based on word length and grammatical features is proposed to address the incorrect segmentation of long terms. When machine learning is used to extract candidate terms, constraint rules based on word length and grammatical features are added. When a statistical method is then used to determine the domain of each candidate term, the word length ratio serves as an important weight in the judgment. The experiment shows that long terms are extracted correctly by this method, and that its precision and recall are superior to those of traditional methods.
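
The summary gives no formulas, so the following Python sketch only illustrates the general idea of the two stages: filtering candidates with word length and grammatical constraints, then weighting a domain score by a word length ratio. The tag set, the length bounds, the function names `is_candidate` and `domain_scores`, the smoothed relevance measure, and the definition of the length ratio (candidate length divided by the longest candidate length) are all assumptions made for illustration, not the authors' actual rules or weights.

```python
# Hypothetical part-of-speech tags allowed inside a term (assumption).
ALLOWED_TAGS = {"n", "a", "v"}
# Assumed bounds on the candidate length, in characters.
MIN_LEN, MAX_LEN = 2, 30


def is_candidate(tokens):
    """Apply length and grammatical constraints to one candidate phrase.

    tokens: list of (word, pos_tag) pairs produced by a segmenter/tagger.
    """
    if not tokens:
        return False
    text = "".join(word for word, _ in tokens)
    if not (MIN_LEN <= len(text) <= MAX_LEN):
        return False
    # Assumed grammatical rule: every tag is allowed and the phrase ends in a noun.
    return all(tag in ALLOWED_TAGS for _, tag in tokens) and tokens[-1][1] == "n"


def domain_scores(candidates, domain_counts, background_counts):
    """Score candidates by domain relevance, weighted by word length ratio."""
    if not candidates:
        return {}
    longest = max(len(c) for c in candidates)
    scores = {}
    for c in candidates:
        d = domain_counts.get(c, 0)
        b = background_counts.get(c, 0)
        # Share of occurrences in the domain corpus, smoothed in the denominator.
        relevance = d / (d + b + 1)
        # Assumed definition of the word length ratio.
        length_ratio = len(c) / longest
        scores[c] = relevance * (1 + length_ratio)
    return scores
```

Under these assumptions, two candidates with identical corpus counts are ranked by length, so a long term is not outscored by its own shorter fragments, which is the effect the summary attributes to the word length ratio.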

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68T50 Natural language processing