Deep learning, which was developed by Hinton and Salakhutdinov [5], has become the key technology of big data intelligence [6] and has led to major breakthroughs in areas such as intelligent driving [7], smart cities [8], voice recognition [9], and information retrieval [10]. Compared with classical statistical machine-learning methods, deep learning, as the core of big data intelligence, has a relatively complex model structure, and the size and quality of the dataset significantly affect a deep-learning classifier. Large-scale annotated sample data are required to fully optimize the model parameters and obtain superior performance
[11]. In other words, under the existing framework, the performance of a deep-learning model is determined by the scale and quality of the annotated data; this dependence also constrains the development of the new generation of artificial intelligence. Nevertheless, labeled sample data are both difficult and expensive to obtain in many real-world applications. In biology, for example, generating a single training sample can require a series of long-term and expensive experiments [12] before classifiers with high accuracy can be trained. In the field of computerized numerical control (CNC) machine tools, it takes decades to accumulate annotated datasets of sufficient size, and data on certain specific cases of CNC are rare
[13]. Meanwhile, the implementation of big data methods in CNC can be even more difficult in China, where CNC is still at a developmental stage. In strategic intelligence analysis, the labeling of samples requires close and seamless cooperation among outstanding experts from multiple fields [14–20]; thus, it is extremely expensive to obtain datasets of sufficient size. In addition to the high cost, the data features are complex and high-dimensional: the dimensionality of the original feature space is approximately equal to or greater than the number of samples, a situation known as “the small sample size problem”
[21]. With a small sample size, deep learning struggles to achieve good generalization performance. Furthermore, the development of a new generation of artificial intelligence is limited because there are considerably more fields with small sample sizes than fields with a big data environment [22–26].
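To make the small-sample-size condition above explicit, it can be written as a simple inequality; the symbols $n$ (number of labeled samples) and $d$ (dimensionality of the original feature space) are introduced here only for clarity and are not taken from [21]:
$$
d \gtrsim n ,
$$
i.e., the feature dimensionality is approximately equal to or greater than the number of available training samples.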