Abstract
Evaluation in natural language processing guides and promotes research on models and methods. In recent years, new evaluation data sets and evaluation tasks have been continuously proposed. At the same time, a series of problems exposed by existing evaluations has restricted the progress of natural language processing technology. Starting from the concept, composition, development, and significance of natural language evaluation, this article classifies and summarizes the tasks and characteristics of mainstream natural language evaluation, and then summarizes the problems of natural language processing evaluation and their causes. Finally, drawing on standards for evaluating human language ability, this article puts forward the concept of human-like machine language ability evaluation and proposes a series of basic principles and implementation ideas for such evaluation from three aspects: reliability, difficulty, and validity.