Large language models based on self-attention mechanisms have achieved astonishing performance not only in natural language itself, but also across a variety of tasks of a different nature. However, the human brain may not process language by the same principle. This has prompted a debate on the connection between brain computation and the artificial self-supervision adopted in large language models. One of the most influential hypotheses in brain computation is the predictive coding framework, which proposes to minimize the prediction error by local learning. However, the role of predictive coding and the associated credit assignment in language processing remains unknown. Here, we propose a mean-field learning model within the predictive coding framework, assuming that the synaptic weight of each connection follows a spike and slab distribution, and that only the distribution is trained. This meta predictive learning is successfully validated on classifying handwritten digits, where pixels are input to the network in sequence, and on toy and real language corpora. Our model reveals that most of the connections become deterministic after learning, while the output connections have a higher level of variability. The performance of the resulting network ensemble changes continuously with data load, further improving with more training data, in analogy with the emergent behavior of large language models. Therefore, our model provides a starting point to investigate the physical and biological correspondences of language processing and the unexpected general intelligence.