Abstract
In the information age of the 21st century, the growth of big data technology has made effectively extracting valuable information from massive datasets a key problem. Traditional data mining methods struggle with large-scale, high-dimensional, and complex data, and their performance is especially limited when labeled data is scarce. This study optimizes data mining algorithms by introducing semi-supervised learning, with the aim of improving the algorithm's ability to exploit unlabeled data and thereby achieving more accurate data analysis and pattern recognition when few labels are available. Specifically, we adopt a self-training method and combine it with a convolutional neural network (CNN) for image feature extraction and classification, iteratively refining the model's predictions. Experimental results on the CIFAR-10 image classification dataset show that the proposed method significantly outperforms traditional machine learning techniques such as Support Vector Machine (SVM), XGBoost, and Multi-Layer Perceptron (MLP), with notable improvements in accuracy, recall, and F1 score. Furthermore, experiments under varying noise levels validate the robustness and noise resistance of the semi-supervised CNN model, supporting its practical applicability in real-world scenarios.
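To make the self-training idea concrete, the following is a minimal sketch of a generic pseudo-labeling loop, not the implementation used in this study. It assumes a nearest-centroid classifier as a lightweight stand-in for the CNN, and the names `fit_centroids`, `predict_proba`, and `self_train`, the confidence threshold of 0.9, the round limit, and the toy 2-D points are all hypothetical choices made for illustration.

```python
import math


def fit_centroids(X, y):
    """Fit a nearest-centroid model: one mean vector per class.

    This is a stand-in for training the CNN; it is an assumption of this sketch.
    """
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_proba(centroids, x):
    """Return class probabilities as a softmax over negative distances."""
    scores = {c: -math.dist(x, mu) for c, mu in centroids.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=10):
    """Iteratively pseudo-label confident unlabeled samples and retrain.

    threshold and max_rounds are illustrative values, not taken from the paper.
    Returns the final model and the samples that were never pseudo-labeled.
    """
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        model = fit_centroids(X_lab, y_lab)
        remaining, added = [], 0
        for x in pool:
            probs = predict_proba(model, x)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                # Confident prediction: move the sample into the labeled set.
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0 or not pool:
            break  # nothing new was learned, or no unlabeled data is left
    return fit_centroids(X_lab, y_lab), pool


# Toy data: two labeled points, five unlabeled points (one is ambiguous).
labeled_X = [(0.0, 0.0), (10.0, 10.0)]
labeled_y = [0, 1]
unlabeled_X = [(1.0, 0.0), (0.0, 1.0), (9.0, 10.0), (10.0, 9.0), (5.0, 5.0)]

model, leftover = self_train(labeled_X, labeled_y, unlabeled_X)
```

In this toy run, the four points near the labeled examples are pseudo-labeled in the first round, while the point midway between the two classes never reaches the confidence threshold and stays in `leftover`. In a CNN setting the same loop would apply with the softmax output of the network in place of `predict_proba`.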