On the Data Quality and Imbalance in Machine Learning-based Design and Manufacturing: A Systematic Review

Jiarui Xie; Lijun Sun; Yaoyao Fiona Zhao

doi:10.1016/j.eng.2024.04.024

Engineering, 2025, Vol. 45, Issue 2: 105-131. DOI: 10.1016/j.eng.2024.04.024

Review

On the Data Quality and Imbalance in Machine Learning-based Design and Manufacturing—A Systematic Review

Abstract

Machine learning (ML) has recently enabled many modeling tasks in design, manufacturing, and condition monitoring due to its unparalleled learning ability using existing data. Data have become the limiting factor when implementing ML in industry. However, there is no systematic investigation on how data quality can be assessed and improved for ML-based design and manufacturing. The aim of this survey is to uncover the data challenges in this domain and review the techniques used to resolve them. To establish the background for the subsequent analysis, crucial data terminologies in ML-based modeling are reviewed and categorized into data acquisition, management, analysis, and utilization. Thereafter, the concepts and frameworks established to evaluate data quality and imbalance, including data quality assessment, data readiness, information quality, data biases, fairness, and diversity, are further investigated. The root causes and types of data challenges, including human factors, complex systems, complicated relationships, lack of data quality, data heterogeneity, data imbalance, and data scarcity, are identified and summarized. Methods to improve data quality and mitigate data imbalance and their applications in this domain are reviewed. This literature review focuses on two promising methods: data augmentation and active learning. The strengths, limitations, and applicability of the surveyed techniques are illustrated. The trends of data augmentation and active learning are discussed with respect to their applications, data types, and approaches. Based on this discussion, future directions for data quality improvement and data imbalance mitigation in this domain are identified.

Keywords

Machine learning / Design and manufacturing / Data quality / Data augmentation / Active learning

Cite this article

Jiarui Xie, Lijun Sun, Yaoyao Fiona Zhao. On the Data Quality and Imbalance in Machine Learning-based Design and Manufacturing: A Systematic Review. Engineering. 2025, 45(2): 105-131. https://doi.org/10.1016/j.eng.2024.04.024

参考文献

原文顺序 | 文献年度倒序 | 文中引用次数倒序

[1]	Kumar P, Bhamu J, Sangwan KS.Analysis of barriers to Industry 4.0 adoption in manufacturing organizations: an ISM approach.Procedia CIRP 2021; 98:85-90.
[2]	Silva N, Barros J, Santos MY, Costa C, Cortez P, Carvalho MS, et al.Advancing logistics 4.0 with the implementation of a big data warehouse: a demonstration case for the automotive industry.Electronics 2021; 10(18):2221.
[3]	Carvalho TP, Soares FA, Vita R, Francisco RP, Basto JP, Alcalá SG.A systematic literature review of machine learning methods applied to predictive maintenance.Comput Ind Eng 2019; 137:106024.
[4]	Wilhelm Y, Reimann P, Gauchel W, Mitschang B.Overview on hybrid approaches to fault detection and diagnosis: combining data-driven, physics-based and knowledge-based models.Procedia CIRP 2021; 99:278-283.
[5]	Fentaye AD, Baheta AT, Gilani SI, Kyprianidis KG.A review on gas turbine gas-path diagnostics: state-of-the-art methods, challenges and opportunities.Aerospace 2019; 6(7):83.
[6]	Fan CM, Lu YP.A Bayesian framework to integrate knowledge-based and data-driven inference tools for reliable yield diagnoses. In: Proceedings of the 2008 Winter Simulation Conference; 2008 Dec 7–10; Miami, FL, USA. Piscataway: IEEE; 2008. p. 2323–9.
[7]	Xie J, Sage M, Zhao YF.Feature selection and feature learning in machine learning applications for gas turbines: a review.Eng Appl Artif Intl 2023; 117:105591.
[8]	Goodfellow I, Bengio Y, Courville A. Deep learning. Natrue, 521 (2015), pp. 436-444
[9]	Liu D, Wang Y.Multi-fidelity physics-constrained neural network and its application in materials modeling.J Mech Des 2019; 141(12):121403.
[10]	Kotsiopoulos T, Sarigiannidis P, Ioannidis D, Tzovaras D.Machine learning and deep learning in smart manufacturing: the smart grid paradigm.Comput Sci Rev 2021; 40:100341.
[11]	Wu J, Qian X, Wang MY.Advances in generative design.Comput Aided Des 2019; 116:102733.
[12]	Jang S, Yoo S, Kang N.Generative design by reinforcement learning: enhancing the diversity of topology optimization designs.Comput Aided Des 2022; 146:103225.
[13]	Zhang C, Xie J, Shanian A, Kibsey M, Zhao YF.A hybrid deep learning approach for the design of 2D low porosity auxetic metamaterials.Eng Appl Artif Intell 2023; 123:106413.
[14]	Xu H, Liu R, Choudhary A, Chen W.A machine learning-based design representation method for designing heterogeneous microstructures.J Mech Des 2015; 137(5):051403.
[15]	Ling C, Kuo W, Xie M.An overview of adaptive-surrogate-model-assisted methods for reliability-based design optimization.IEEE Trans Reliab 2023; 72(3):1243-1264.
[16]	Zhang C, Ridard A, Kibsey M, Zhao YF.Variant design generation and machine learning aided deformation prediction for auxetic metamaterials.Mech Mater 2023; 181:104642.
[17]	Edwards K.Design for manufacturing: a structured approach.Mater Des 2003; 24:157-158.
[18]	Xie J, Saluja A, Rahimizadeh A, Fayazbakhsh K.Development of automated feature extraction and convolutional neural network optimization for real-time warping monitoring in 3D printing.Int J Comput Integr Manuf 2022; 5(8):813-830.
[19]	Zhang Y, Safdar M, Xie J, Li J, Sage M, Zhao YF.A systematic review on data of additive manufacturing for machine learning applications: the data quality, type, preprocessing, and management.J Intell Manuf 2022; 34:3305-3340.
[20]	Yang M, Liu J.In situ monitoring of corrosion under insulation using electrochemical and mass loss measurements.Int J Corrosion 2022; 2022:6681008.
[21]	Yang S, Page T, Zhang Y, Zhao YF.Towards an automated decision support system for the identification of additive manufacturing part candidates.J Intell Manuf 2020; 31(8):1917-1933.
[22]	Saluja A, Xie J, Fayazbakhsh K.A closed-loop in-process warping detection system for fused filament fabrication using convolutional neural networks.J Manuf Process 2020; 58:407-415.
[23]	Yang M, Keshavarz MK, Vlasea M, Molavi-Kakhki A, Laher M.Supersolidus liquid phase sintering of water-atomized low-alloy steel in binder jetting additive manufacturing.Heliyon 2023; 9(3):e13882.
[24]	Chuo YS, Lee JW, Mun CH, Noh IW, Rezvani S, Kim DC, et al.Artificial intelligence enabled smart machining and machine tools.J Mech Sci Technol 2022; 36(1):1-23.
[25]	Xu J, Kovatsch M, Mattern D, Mazza F, Harasic M, Paschke A, et al.A review on AI for smart manufacturing: deep learning challenges and solutions.Appl Sci 2022; 12(16):8239.
[26]	Ito A, Hagström M, Bokrantz J, Skoogh A, Nawcki M, Gandhi K, et al.Improved root cause analysis supporting resilient production systems.J Manuf Syst 2022; 64:468-478.
[27]	Hagemann S, Sünnetcioglu A, Stark R.Hybrid artificial intelligence system for the design of highly-automated production systems.Procedia Manuf 2019; 28:160-166.
[28]	Apostolidis A, Pelt M, Stamoulis KP.Aviation data analytics in MRO operations: prospects and pitfalls. In: Proceedings of the 2020 Annual Reliability and Maintainability Symposium (RAMS); 2020 Jan 27–30; Palm Springs, CA, USA. Piscataway: IEEE; 2020. p. 1–7.
[29]	Williams G, Meisel NA, Simpson TW, McComb C.Design for artificial intelligence: proposing a conceptual framework grounded in data wrangling.J Comput Inf Sci Eng 2022; 22(6):060903.
[30]	Ehrlinger L, Wö Wß.A survey of data quality measurement and monitoring tools.Front Big Data 2022; 5:850611.
[31]	Chandran DR, Gupta V.A short review of the literature on automatic data quality.J Compu Commun 2022; 10(5):55-73.
[32]	Kamm S, Veekati SS, Müller T, Jazdi N, Weyrich M.A survey on machine learning based analysis of heterogeneous data in industrial automation.Comput Ind 2023; 149:103930.
[33]	Lee D, Chen W, Wang L, Chan Y, Chen W.Data-driven design for metamaterials and multiscale systems: a review.Adv Mater 2023; 36(8):2305254.
[34]	Kirianaki NV, Yurish SY, Shpak NO, Deynega VP.Data acquisition and signal processing for smart sensors. Hoboken: Wiley (2002)
[35]	Schmetz A, Lee TH, Zontar D, Brecher C.The time synchronization problem in data-intense manufacturing.Procedia CIRP 2022; 107:827-832.
[36]	Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton J, Axton M, Baak A, et al.The FAIR guiding principles for scientific data management and stewardship.Sci Data 2016; 3(1):160018.
[37]	Simmhan Y, Plale B, Gannon D.A survey of data provenance techniques [dissertation]. Indiana University, Bloomington (2005)
[38]	Askham N, Cook D, Doyle M, Fereday H, Gibson M, Landbeck U, et al.The six primary dimensions for data quality assessment. Report. Olympia: Washington State Board for Community and Technical Colleges. 2013.
[39]	Lawrence ND.Data readiness levels.2017. arXiv: 1705.02245.
[40]	Kenett RS, Shmueli G.Information quality: the potential of data and analytics to generate knowledge.Wiley, Hoboken (2017)
[41]	Gebru T, Morgenstern J, Vecchione B, Vaughan JW, Wallach H, Iii HD, et al.Datasheets for datasets.Commun ACM 2021; 64(12):86-92.
[42]	Bender EM, Friedman B.Data statements for natural language processing: toward mitigating system bias and enabling better science.Trans Assoc Comput Linguist 2018; 6:587-604.
[43]	Arnold M, Bellamy RKE, Hind M, Houde S, Mehta S, et al.FactSheets: increasing trust in AI services through supplier's declarations of conformity. IBM J Res Dev 2019;63:6:1–13.
[44]	Holland S, Hosny A, Newman S, Joseph J, Chmielinski K.The dataset nutrition label: a framework to drive higher data quality standards.2018. arXiv: 1805.03677.
[45]	Alhassan I, Sammon D, Daly M.Data governance activities: an analysis of the literature.J Decis Systems 2016; 25:64-75.
[46]	Lismont J, Vanthienen J, Baesens B, Lemahieu W.Defining analytics maturity indicators: a survey approach.Int J Inf Manage 2017; 37(3):114-124.
[47]	Gökalp MO, Gökalp E, Kayabay K, Ko Açyiğit, Eren PE.Data-driven manufacturing: an assessment model for data science maturity.J Manuf Syst 2021; 60:527-546.
[48]	Rosenbaum S.Data governance and stewardship: designing data stewardship entities and advancing data access.Health Serv Res 2010; 45:1442-1455.
[49]	Endel F, Piringer H.Data wrangling: making data useful again.IFAC-PapersOnLine 2015; 48(1):111-112.
[50]	Meng T, Jing X, Yan Z, Pedrycz W.A survey on machine learning for data fusion.Inform Fusion 2020; 57:115-129.
[51]	Ali H, Salleh M, Saedudin R, Hussain K, Mushtaq M.Imbalance class problems in data mining: a review.Indonesian J Electr Eng Comput Sci 2019; 14(3):1552-1563.
[52]	Mehrabi N, Morstatter F, Saxena N, Lerman K, Galstyan A.A survey on bias and fairness in machine learning.ACM Comput Surv 2021; 54(6):1-35.
[53]	Safdar M, Lamouche G, Paul PP, Wood G, Zhao YF.Feature engineering in additive manufacturing. In: Safdar M, Lamouche G, Paul PP, Wood G, Zhao Y, editors. Engineering of additive manufacturing features for data-driven solutions: sources, techniques, pipelines, and applications. Cham: Springer; 2023. p. 17–43.
[54]	Kim J, Yang Z, Ko H, Cho H, Lu Y.Deep learning-based data registration of melt-pool-monitoring images for laser powder bed fusion additive manufacturing.J Manuf Syst 2023; 68:117-129.
[55]	Shahbazi N, Lin Y, Asudeh A, Jagadish H.A survey on techniques for identifying and resolving representation bias in data.2022. arXiv: 2203.11852.
[56]	Mitchell M, Wu S, Zaldivar A, Barnes P, Vasserman L. Hutchinson B, et al.Model cards for model reporting. In: Proceedings of the FAT* '19: Conference on Fairness, Accountability, and Transparency; 2019 Jan 29–31; Atlanta, GA, USA. New York City: Association for Computing Machinery; 2019. p. 220–9.
[57]	Zaccaria V, Rahman M, Aslanidou I, Kyprianidis K.A review of information fusion methods for gas turbine diagnostics.Sustainability 2019; 11(22):6202.
[58]	Tan YT, Kunapareddy A, Kobilarov M.Gaussian process adaptive sampling using the cross-entropy method for environmental sensing and monitoring. In: Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA); 2018 May 21–25; Brisbane, QLD, Australia: Piscataway: IEEE; 2018. p. 6220–7.
[59]	Ngoc N, Lasa G, Lriarte L.Human-centred design in Industry 4.0: case study review and opportunities for future research.J Intell Manuf 2022; 33(1):35-76.
[60]	Robert M, Giuliani P, Gurau C.Implementing Industry 4.0 real-time performance management systems: the case of schneider electric.Prod Plan Control 2022; 33(2–3):244-260.
[61]	Leon-Urrutia M, Taibi D, Pospelova V, Splendore S, Urbsiene L, Marjanovic U.Data literacy: an essential skill for the industry. In: Lalic B, Gracanin D, Tasic N, Simeunović N, editors. Proceedings on 18th International Conference on Industrial Systems–IS’20. Cham: Springer; 2022. p. 326–31.
[62]	Verleysen M, François D.The curse of dimensionality in data mining and time series prediction. In: Cabestany J, Prieto A, Sandoval F, editors. Computational intelligence and bioinspired systems. Berlin: Springer; 2005. p. 758–70.
[63]	Lee D, Chan Y, Chen W, Wang L, Chen W.T-METASET: task-aware generation of metamaterial datasets by diversity-based active learning.2022. arXiv: 2202.10565.
[64]	Volponi AJ.Gas turbine engine health management: past, present, and future trends.J Eng Gas Turbines Power 2014; 136(5):051201.
[65]	Wang RY.A product perspective on total data quality management.Commun ACM 1998; 41(2):58-65.
[66]	Günther LC, Colangelo E, Wiendahl HH, Bauer C.Data quality assessment for improved decision-making: a methodology for small and medium-sized enterprises.Procedia Manuf 2019; 29:583-591.
[67]	Wiemer H, Dementyev A, Ihlenfeldt S.A holistic quality assurance approach for machine learning applications in cyber-physical production systems.Appl Sci 2021; 11(20):9590.
[68]	Liewald M, Bergs T, Groche P, Behrens BA, Briesenick D, Müller M, et al.Perspectives on data-driven models and its potentials in metal forming and blanking technologies.Prod Eng 2022; 16(5):607-625.
[69]	Schelter S, Lange D, Schmidt P, Celikel M, Biessmann F, Grafberger A.Automating large-scale data quality verification.Proc VLDB Endow 2018; 11(12):1781-1794.
[70]	Byabazaire J, O GMP’Hare, Delaney DT.End-to-end data quality assessment using trust for data shared IoT deployments.IEEE Sens J 2022; 22(20):19995-20009.
[71]	Zacarias AGV, Reimann P, Mitschang B.A framework to guide the selection and configuration of machine-learning-based data analytics solutions in manufacturing.Procedia CIRP 2018; 72:153-158.
[72]	Frye M, Robert H.Structured data preparation pipeline for machine learning-applications inpro-duction. In: Proceedings of the 17th IMEKO TC 10 and EUROLAB Virtual Conference; 2020 Oct 20–22; Aachen, Germany. London: IMEKO; 2020. p. 241–6.
[73]	Malik S, Rouf R, Mazur K, Kontsos A.The Industry Internet of Things (IIoT) as a methodology for autonomous diagnostics in aerospace structural health monitoring.Aerospace 2020; 7(5):64.
[74]	Bekar ET, Nyqvist P, Skoogh A.An intelligent approach for data pre-processing and analysis in predictive maintenance with an industrial case study. Adv Mech Eng 2020;12(5):1–14.
[75]	Frye M, Gyulai D, Bergmann J, Schmitt RH.Production rescheduling through product quality prediction.Procedia Manuf 2021; 54:142-147.
[76]	Chen Q, Liu Y, Hou S, Duan F, Cai Z.Data-driven methodology for state detection of gearbox in PHM context. In: Proceedings of the 2021 Global Reliability and Prognostics and Health Management (PHM-Nanjing); 2021 Oct 15–17; Nanjing, China. Piscataway: IEEE; 2021. p. 1–6.
[77]	Xie Q, Suvarna M, Li J, Zhu X, Cai J, Wang X.Online prediction of mechanical properties of hot rolled steel plate using machine learning.Mater Des 2021; 197:109201.
[78]	Guo S, Wang D, Feng Z, Guo W.UIR–NET: object detection in infrared imaging of thermomechanical processes in automotive manufacturing.IEEE Trans Autom Sci Eng 2022; 19(4):3276-3287.
[79]	Iantovics LB, En Căchescu.Method for data quality assessment of synthetic industrial data.Sensors 2022; 22(4):1608.
[80]	Segreto T, Teti R.Data quality evaluation for smart multi-sensor process monitoring using data fusion and machine learning algorithms.Prod Eng 2022; 19:197-210.
[81]	Klaproth T, Hornung M.Off-design mission performance prediction for unmanned aerial vehicles based on machine learning. In: Proceedings of the 2022 IEEE Aerospace Conference (AERO); 2022 Mar 5–12; Big Sky, MT, USA. Piscataway: IEEE; 2022. p. 1–13.
[82]	Sen S, Husom EJ, Goknil A, Politaki D, Tverdal S, Nguyen P, et al.Virtual sensors for erroneous data repair in manufacturing a machine learning pipeline.Comput Ind 2023; 149:103917.
[83]	Lee YW, Strong DM, Kahn BK, Wang RY.AIMQ: a methodology for information quality assessment.Inf Manag 2002; 40(2):133-146.
[84]	Kenett RS.Reviewing of applied research with an Industry 4.0 perspective. Report. Rochester: Social Science Research Network. 2020. SSRN scholarly paper ID 3591808.
[85]	Coleman SY, Kenett RS.The information quality framework for evaluating data science programs.Encycl Semant Comput Robot Intell 2018; 2(2):1730001.
[86]	Yang K, Stoyanovich J, Asudeh A, Howe B, Jagadish, HV, Miklau, G. A nutritional label for rankings. In: Proceedings of the 2018 International Conference on Management of Data; 2018 Jul 10–15; Houston, TX, USA. New York City: Association for Computing Machinery; 2018. p.1773–6.
[87]	Stoyanovich J, Howe B.Nutritional labels for data and models.IEEE Tech Comm Data Eng 2019; 42(3):13-23.
[88]	Chmielinski KS, Newman S, Taylor M, Joseph J, Thomas K, Yurkofsky J, et al.The dataset nutrition label (2nd Gen): leveraging context to mitigate harms in artificial intelligence.2022. arXiv: 2201.03954.
[89]	Sun C, Asudeh A, Jagadish HV, Howe B, Stoyanovich J.Mithralabel: flexible dataset nutritional labels for responsible data science. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management; 2019 Nov 3–7; Beijing; China. New York City: Association for Computing Machinery; 2019. p. 2893–6.
[90]	Catania B, Guerrini G, Accinelli C.Fairness & friends in the data science era.AI Soc 2023; 38:721-731.
[91]	Chan YC, Ahmed F, Wang L, Chen W.METASET: exploring shape and property spaces for data-driven metamaterials design.J Mech Des 2021; 143(3):031707.
[92]	Simpson T, Lin D, Chen W.Sampling strategies for computer experiments: design and analysis.International Journal of Reliability and applications 2001; 2(3):209-240.
[93]	Celis L, Vishnoi N.Data preprocessing to mitigate bias: a maximum entropy based approach. In: Proceedings of the 37th International Conference on Machine Learning; 2020 Jul 13–18; online. Cambridge: JMLR; 2020. p. 1349–59.
[94]	[94] Tea KH, Whang SE.Slice tuner: a selective data acquisition framework for accurate and fair machine learning models. In: Proceedings of the 2021 International Conference on Management of Data; 2021 Jun 20–25; Xi'an, China. New York City: Association for Computing Machinery; 2021. p. 1771–83.
[95]	Lin Y, Guan Y, Asudeh A, Jagadish HV.Identifying insufficient data coverage in databases with multiple relations.Proc VLDB Endow 2020; 13(12):2229-2242.
[96]	Asudeh A, Shahbazi N, Jin Z, Jagadish HV.Identifying insufficient data coverage for ordinal continuous-valued attributes. In: Proceedings of the 2021 International Conference on Management of Data; 2021 Jun 20–25; Xi'an, Chinsa. New York: Association for Computing Machinery; 2021. p. 129–41.
[97]	Asudeh A, Jin Z, Jagadish HV.Assessing and remedying coverage for a given dataset. In: Proceedings of the 2019 IEEE 35th International Conference on Data Engineering (ICDE); 2019 Apr 8–11; Macao, China. Piscataway: IEEE; 2019. p. 554–65.
[98]	Verma S, Rubin J.Fairness definitions explained. In: Proceedings of the International Workshop on Software Fairness; 2018 May 29; Gothenburg, Sweden. New York City: Association for Computing Machinery; 2018. p. 1–7.
[99]	Oneto L, Chiappa S.Fairness in machine learning. In: Oneto L, Navarin N, Sperduti A, Anguita D, editors. Recent trends in learning from data. Cham: Springer; 2020. p. 155–96.
[100]	Drosou M, Jagadish HV, Pitoura E, Stoyanovich J.Diversity in big data: a review.Big Data 2017; 5(2):73-84.
[101]	Wang L, Chan YC, Liu Z, Zhu P, Chen W.Data-driven metamaterial design with laplace-beltrami spectrum as “shape-DNA”.Struc Multidiscip Optim 2020; 61(6):2613-2628.
[102]	Brownlee J.Data preparation for machine learning: data cleaning, feature selection, and data transforms in Python. San Francisco: Machine Learning Mastery (2020)
[103]	Slater K, Li Y, Wang Y, Shan Y, Liu C.A generative adversarial network (GAN)-assisted data quality monitoring approach for out-of-distribution detection of high dimensional data.Report. Norcross: Institute of Industrial and Systems Engineers; 2023.
[104]	Chang KH.E-design: computer-aided engineering design. Academic Press, New York City (2015)
[105]	Chen W, Ahmed F.MO-PaDGAN: reparameterizing engineering designs for augmented multi-objective optimization.Appl Soft Comput 2021; 113:107909.
[106]	Guyon I, Gunn S, Nikravesh M, Zadeh L.Feature extraction: foundations and applications. Springer, Cham (2008)
[107]	Yazdi RM, Imani F, Yang H.A hybrid deep learning model of process-build interactions in additive manufacturing.J Manuf Syst 2020; 57:460-468.
[108]	Roach DJ, Rohskopf A, Hamel CM, Reinholtz WD, Bernstein R, Qi HJ, et al.Utilizing computer vision and artificial intelligence algorithms to predict and design the mechanical compression response of direct ink write 3D printed foam replacement structures.Addit Manuf 2021; 41:101950.
[109]	Lee H, Lee J.Neural network prediction of sound quality via domain knowledge-based data augmentation and bayesian approach with small data sets.Mech Syst Signal Process 2021; 157:107713.
[110]	De Santo A, Ferraro A, Galli A, Moscato V, Sperl Gì.Evaluating time series encoding techniques for predictive maintenance.Expert Syst Appl 2022; 210:118435.
[111]	Blum AL, Langley P.Selection of relevant features and examples in machine learning.Artif Intell 1997; 97(1–2):245-271.
[112]	Li J, Cheng K, Wang S, Morstatter F, Trevino RP, Tang J, et al.Feature selection: a data perspective.ACM Comput Surv 2017; 50(6):1-45.
[113]	Pfingsten T, Herrmann DJL, Schnitzler T, Feustel A, Scholkopf B.Feature selection for troubleshooting in complex assembly lines.IEEE Trans Automn Sci Eng 2007; 4(3):465-469.
[114]	Janssens O, Slavkovikj V, Vervisch B, Stockman K, Loccufier M, Verstockt S, et al.Convolutional neural network based fault detection for rotating machinery.J Sound Vib 2016; 377:331-345.
[115]	Bengio Y, Courville A, Vincent P.Representation learning: a review and new perspectives.IEEE Trans Pattern Anal Mach Intell 2013; 35(8):1798-1828.
[116]	Alasadi SA, Bhaya WS.Review of data preprocessing techniques in data mining.ARPN J Eng Appl Sci 2017; 12(16):4102-4417.
[117]	Chaki J, Dey N.A beginner’s guide to image preprocessing techniques. CRC Press, Boca Raton (2018)
[118]	Singh D, Singh B.Investigating the impact of data normalization on classification performance.Appl Soft Comput 2020; 97:105524.
[119]	Yu L, Zhu J, Zhao Q, Wang Z.An efficient YOLO algorithm with an attention mechanism for vision-based defect inspection deployed on FPGA.Micromachines 2022; 13(7):1058.
[120]	You Z, Gao H, Li S, Guo L, Liu Y, Li J.Multiple activation functions and data augmentation-based lightweight network for in situ tool condition monitoring.IEEE Trans Ind Electron 2022; 69(12):13656-13664.
[121]	Wang Y, Joseph J, Unni TPA, Yamakawa S, Farimani A, Shimada K.Three-dimensional ship hull encoding and optimization via deep neural networks.J Mech Des 2022; 144(10):101701.
[122]	Ruediger-Flore P, Glatt M, Hussong M, Aurich JC.CAD-based data augmentation and transfer learning empowers part classification in manufacturing.Int J Adv Manuf Technol 2023; 125:5065-5118.
[123]	De la Rosa FL, Gómez-Sirvent JL, Sánchez-Reolid R, Morales R, Fernández-Caballero A.Geometric transformation-based data augmentation on defect classification of segmented images of semiconductor materials using a ResNet50 convolutional neural network.Expert Syst Appl 2022; 206:117731.
[124]	Jain S, Seth G, Paruthi A, Soni U, Kumar G.Synthetic data augmentation for surface defect detection and classification using deep learning.J Intell Manuf 2022; 33(4):1007-1020.
[125]	Davtalab O, Kazemian A, Yuan X, Khoshnevis B.Automated inspection in robotic additive manufacturing using deep learning for layer deformation detection.J Intell Manuf 2022; 33(3):771-784.
[126]	Xie Y, Li S, Wu CT, Lai Z, Su M.A novel hypergraph convolution network for wafer defect patterns identification based on an unbalanced dataset.J Intell Manuf 2024; 35:633-646.
[127]	Molitor DA, Kubik C, Becker M, Hetfleisch RH, Lyu F, Groche P.Towards high-performance deep learning models in tool wear classification with generative adversarial networks.J Mater Process Technol 2022; 302:117484.
[128]	Zhang Z, Wen G, Chen S.Weld image deep learning-based on-line defects detection using convolutional neural networks for Al alloy in robotic arc welding.J Manuf Process 2019; 45:208-216.
[129]	Donda K, Zhu Y, Merkel A, Wan S, Assouar B.Deep learning approach for designing acoustic absorbing metasurfaces with high degrees of freedom.Extreme Mech Lett 2022; 56:101879.
[130]	Shi P, Qi Q, Qin Y, Scott PJ, Jiang X.A novel learning-based feature recognition method using multiple sectional view representation.J Intell Manuf 2020; 31(5):1291-1309.
[131]	Dai W, Li D, Tang D, Jiang Q, Wang D, Wang H, et al.Deep learning assisted vision inspection of resistance spot welds.J Manuf Process 2021; 62:262-274.
[132]	Singh SA, Desai KA.Automated surface defect detection framework using machine vision and convolutional neural networks.J Intell Manuf 2023; 34(4):1995-2011.
[133]	Ma G, Yu L, Yuan H, Xiao W, He Y.A vision-based method for lap weld defects monitoring of galvanized steel sheets using convolutional neural network.J Manuf Process 2021; 64:130-139.
[134]	Dong L, Chen W, Yang S, Yu H.A new machine vision–based intelligent detection method for gear grinding burn.Int J Adv Manuf Technol 2023; 125(9–10):4663-4677.
[135]	Tang J, Zhou H, Wang T, Jin Z, Wang Y, Wang X.Cascaded foreign object detection in manufacturing processes using convolutional neural networks and synthetic data generation methodology.J Intell Manuf 2022; 34:2925-2941.
[136]	Wong V, Ferguson M, Law K, Lee Y, Witherell P.Segmentation of additive manufacturing defects using U-Net.J Comput Inf Sci Eng 2022; 22(3):31005.
[137]	Kumaresan S, Aultrin K, Kumar S, Anand M.Deep learning-based weld defect classification using VGG16 transfer learning adaptive fine-tuning.Int J Interact Des Manuf 2023; 17:2999-3010.
[138]	Sha Y, Faber J, Gou S, Liu B, Li W, Schramm S, et al.A multi-task learning for cavitation detection and cavitation intensity recognition of valve acoustic signals.Eng Appl Artif Intell 2022; 113:104904.
[139]	Ye Y, Huang C, Zeng J, Zhou Y, Li F.Shock detection of rotating machinery based on activated time-domain images and deep learning: an application to railway wheel flat detection.Mech Syst Sig Process 2023; 186:109856.
[140]	Li X, Zhang W, Ding Q, Sun JQ.Intelligent rotating machinery fault diagnosis based on deep learning using data augmentation.J Intell Manuf 2020; 31:433-452.
[141]	Becker P, Roth C, Roennau A, Dillmann R.Acoustic anomaly detection in additive manufacturing with long short-term memory neural networks. In: Proceeding of the 2020 IEEE 7th International Conference on Industrial Engineering and Applications (ICIEA); 2020 Apr 16–21; Bangkok, Thailand. Piscataway: IEEE; 2020. p. 921–6.
[142]	Zhang W, Joseph J, Chen Q, Koz C, Xie L, Regmi A, et al.A data augmentation method for data-driven component segmentation of engineering drawings.J Comput Inf Sci Eng 2024; 14(1):011001.
[143]	Lyu Y, Yang Z, Liang H, Zhang B, Ge M, Liu R, et al.Artificial intelligence-assisted fatigue fracture recognition based on morphing and fully convolutional networks.Fatigue Fract Eng Mater Struct 2022; 45(6):1690-1702.
[144]	Martins D, Lima A, Pinto M, Hemerly D, Prego T, Silva F, et al.Hybrid data augmentation method for combined failure recognition in rotating machines.J Intell Manuf 2022; 34:1795-1813.
[145]	Fan SKS, Cheng CW, Tsai DM.Fault diagnosis of wafer acceptance test and chip probing between front-end-of-line and back-end-of-line processes.IEEE Trans Autom Sci Eng 2022; 19(4):3068-3082.
[146]	Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP.SMOTE: synthetic minority over-sampling technique.J Artif Intell Res 2002; 16:321-357.
[147]	Li Y, Shi Z, Liu C, Tian W, Kong Z, Williams CB.Augmented time regularized generative adversarial network (ATR–GAN) for data augmentation in online process anomaly detection.IEEE Trans Autom Sci Eng 2022; 19(4):3338-3355.
[148]	Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al.Generative adversarial networks.Commun ACM 2020; 63(11):139-144.
[149]	Chen W, Ahmed F.PaDGAN: learning to generate high-quality novel designs.J Mech Des 2021; 143(3):031703.
[150]	Nobari AH, Chen W, Ahmed F.PcDGAN: a continuous conditional diverse generative adversarial network for inverse design. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining; 2021 Aug 14–18; Singapore; online. New York City: Association for Computing Machinery; 2021 p. 606–16.
[151]	Yoo Y, Jung UJ, Han YH, Lee J.Data augmentation-based prediction of system level performance under model and parameter uncertainties: role of designable generative adversarial networks (DGAN).Reliab Eng Syst Saf 2021; 206:107316.
[152]	Wu H, Liu X, An W, Lyu H.A generative deep learning framework for airfoil flow field prediction with sparse data.Chinese J Aeronaut 2022; 35(1):470-484.
[153]	Wang J, Yang Z, Zhang J, Zhang Q, Chien WTK.AdaBalGAN: an improved generative adversarial network with imbalanced learning for wafer defective pattern recognition.IEEE Trans Semicond Manuf 2019; 32(3):310-319.
[154]	Alawieh MB, Boning D, Pan DZ.Wafer map defect patterns classification using deep selective learning. In: Proceedings of the 2020 57th ACM/IEEE Design Automation Conference (DAC); 2020 Jul 20–24; San Francisco, CA, USA. Piscataway: IEEE; 2020. p. 1–6.
[155]	Yun JP, Shin WC, Koo G, Kim MS, Lee C, Lee SJ.Automated defect inspection system for metal surfaces based on deep learning and data augmentation.J Manuf Syst 2020; 55:317-324.
[156]	Niu S, Li B, Wang X, Lin H.Defect image sample generation with GAN for improving defect recognition.IEEE Trans Autom Sci Eng 2020; 17(3):1611-1622.
[157]	Li H, Fan R, Shi Q.oversampling and deep forest based minorityclass sensitive fault diagnosis approach. In: Proceedings of the 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC); 2020 Oct 11–14; Toronto, ON, Canada. Piscataway: IEEE; 2020. p. 3629–36.
[158]	Li XY, Li J, Qu Y, He D.Semi-supervised gear fault diagnosis using raw vibration signal based on deep learning.Chinese J Aeronaut 2020; 33(2):418-426.
[159]	Behera S, Misra R.Generative adversarial networks based remaining useful life estimation for IIoT.Comput Electr Eng 2021; 92:107195.
[160]	Meister S, Möller N, Stüve J, Groves RM.Synthetic image data augmentation for fibre layup inspection processes: techniques to enhance the data set.J Intell Manuf 2021; 32:1767-1789.
[161]	Wiederkehr P, Finkeldey F, Merhofe T.Augmented semantic segmentation for the digitization of grinding tools based on deep learning.CIRP Annals 2021; 70(1):297-300.
[162]	Che C, Wang H, Fu Q, Ni X.Intelligent fault prediction of rolling bearing based on gate recurrent unit and hybrid autoencoder.Proc Inst Mech Eng C 2021; 235(6):1106-1114.
[163]	Zhou X, Hu Y, Wu J, Liang W, Ma J, Jin Q.Distribution bias aware collaborative generative adversarial network for imbalanced deep learning in industrial IOT.IEEE Trans Ind Inf 2023; 19(1):570-580.
[164]	Yang Z, Zhang M, Chen Y, Hu N, Gao L, Liu L, et al. Surface defect detection method for air rudder based on positive samples. J Intell Manuf 2022; 35(1):99-113.
[165]	Yang C, Liu J, Zhou K, Li X. Dynamic spatial–temporal graph-driven machine remaining useful life prediction method using graph data augmentation. J Intell Manuf 2022; 35:355-366.
[166]	Peng P, Lu J, Xie T, Tao S, Wang H, Zhang H. Open-set fault diagnosis via supervised contrastive learning with negative out-of-distribution data augmentation. IEEE Trans Ind Inf 2023; 19(3):2463-2473.
[167]	Farady I, Lin CY, Chang MC. PreAugNet: improve data augmentation for industrial defect classification with small-scale training data. J Intell Manuf 2024; 35:1233-1246.
[168]	Niu S, Peng Y, Li B, Qiu Y, Niu T, Li W. A novel deep learning motivated data augmentation system based on defect segmentation requirements. J Intell Manuf 2024; 35:687-701.
[169]	Nguyen T, Le T, Vu H, Phung D. Dual discriminator generative adversarial nets. 2017. arXiv: 1709.03831.
[170]	Zhu JY, Park T, Isola P, Efros AA. Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV); 2017 Oct 22–29; Venice, Italy. Piscataway: IEEE; 2017. p. 2242–51.
[171]	Figueira A, Vaz B. Survey on synthetic data generation, evaluation methods and GANs. Mathematics 2022; 10(15):2733.
[172]	Anscombe FJ. Graphs in statistical analysis. Am Stat 1973; 27(1):17-21.
[173]	Shmelkov K, Schmid C, Alahari K. How good is my GAN? In: Proceedings of Computer Vision–ECCV 2018; 2018 Sep 8–14; Munich, Germany. Berlin: Springer; 2018. p. 218–34.
[174]	Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X. Improved techniques for training GANs. In: Proceedings of the 30th International Conference on Neural Information Processing Systems; 2016 Dec 5–10; Barcelona, Spain. New York City: Curran Associates Inc.; 2016. p. 2234–42.
[175]	Heusel M, Ramsauer H, Unterthiner T, Nessler B, Hochreiter S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: Proceedings of the 31st International Conference on Neural Information Processing Systems; 2017 Dec 4–9; Long Beach, CA, USA. New York City: Curran Associates Inc.; 2017. p. 6629–40.
[176]	Karras T, Aila T, Laine S, Lehtinen J. Progressive growing of GANs for improved quality, stability, and variation. 2017. arXiv: 1710.10196.
[177]	Alaa A, van Breugel B, Saveliev E, van der Schaar M. How faithful is your synthetic data? Sample-level metrics for evaluating and auditing generative models. 2022. arXiv: 2102.08921.
[178]	Ho J, Jain A, Abbeel P. Denoising diffusion probabilistic models. In: Proceedings of the 34th International Conference on Neural Information Processing Systems; 2020 Dec 6–12; Vancouver, BC, Canada. New York City: Curran Associates Inc.; 2020. p. 6840–50.
[179]	Trabucco B, Doherty K, Gurinas M, Salakhutdinov R. Effective data augmentation with diffusion models. 2023. arXiv: 2302.07944.
[180]	Kebaili A, Lapuyade-Lahorgue J, Ruan S. Deep learning approaches for data augmentation in medical imaging: a review. J Imaging 2023; 9(4):81.
[181]	Xiao Z, Kreis K, Vahdat A. Tackling the generative learning trilemma with denoising diffusion GANs. 2021. arXiv: 2112.07804.
[182]	Chlap P, Min H, Vandenberg N, Dowling J, Holloway L, Haworth A. A review of medical image data augmentation techniques for deep learning applications. J Med Imaging Radiat Oncol 2021; 65(5):545-563.
[183]	Kapusuzoglu B, Mahadevan S, Matsumoto S, Miyagi Y, Watanabe D. Adaptive surrogate modeling for high-dimensional spatio-temporal output. Struct Multidiscip Optim 2022; 65(10):300.
[184]	Yang H, Li S, Tabery C, Lin B, Yu B. Bridging the gap between layout pattern sampling and hotspot detection via batch active learning. IEEE Trans Comput-Aided Des Integr Circuits Syst 2020; 40(7):1464-1475.
[185]	Rožanec JM, Bizjak L, Trajkova E, Zajec P, Keizer J, Fortuna B, et al. Active learning and novel model calibration measurements for automated visual inspection in manufacturing. J Intell Manuf 2023; 35:1963-1984.
[186]	Van Houtum GJJ, Vlasea ML. Active learning via adaptive weighted uncertainty sampling applied to additive manufacturing. Addit Manuf 2021; 48:102411.
[187]	Xiao Y, Su M, Yang H, Chen J, Yu J, Yu B. Low-cost lithography hotspot detection with active entropy sampling and model calibration. In: Proceedings of the 2021 58th ACM/IEEE Design Automation Conference (DAC); 2021 Dec 5–9; San Francisco, CA, USA. Piscataway: IEEE; 2021. p. 907–21.
[188]	Seung H, Opper M, Sompolinsky H. Query by committee. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory; 1992 Jul 27–29; Pittsburgh, PA, USA. New York City: Association for Computing Machinery; 1992. p. 287–94.
[189]	Settles B. Active learning literature survey [dissertation]. Madison: University of Wisconsin–Madison; 2009.
[190]	Borodin A. Determinantal point processes. 2009. arXiv: 0911.1153.
[191]	Samavatian V, Fotuhi-Firuzabad M, Samavatian M, Dehghanian P, Blaabjerg F. Iterative machine learning-aided framework bridges between fatigue and creep damages in solder interconnections. IEEE Trans Compon Packag Manuf Technol 2022; 12(2):349-358.
[192]	Xie J, Zhang C, Sun L, Zhao YF. Fairness- and uncertainty-aware data generation for data-driven design based on active learning. J Comput Inf Sci Eng 2024; 24(5):051004.
[193]	Zhang H, Chen WW, Rondinelli JM, Chen W. ET-AL: entropy-targeted active learning for bias mitigation in materials data. Appl Phys Rev 2023; 10(2):021403.
[194]	Lin Y, Li M, Watanabe Y, Kimura T, Matsunawa T, Nojima S, et al. Data efficient lithography modeling with transfer learning and active data selection. IEEE Trans Comput-Aided Des Integr Circuits Syst 2019; 38(10):1900-1913.
[195]	Shao H, Ping H, Chen K, Su W, Lin C, Fang S, et al. Keeping deep lithography simulators updated: global-local shape-based novelty detection and active learning. IEEE Trans Comput-Aided Des Integr Circuits Syst 2023; 42(3):1000-1014.
[196]	Bull LA, Worden K, Rogers TJ, Wickramarachchi C, Cross EJ, McLeay T, et al. A probabilistic framework for online structural health monitoring: active learning from machining data streams. J Phys Conf Ser 2019; 1264(1):012028.
[197]	Sarkar S, Mondal S, Joly M, Lynch ME, Bopardikar SD, Acharya R, et al. Multifidelity and multiscale Bayesian framework for high-dimensional engineering design and calibration. J Mech Des 2019; 141(12):121001.
[198]	Cui F, Ghosn M. Implementation of machine learning techniques into the subset simulation method. Struct Saf 2019; 79:12-25.
[199]	Shim J, Kang S, Cho S. Active learning of convolutional neural network for cost-effective wafer map pattern classification. IEEE Trans Semicond Manuf 2020; 33(2):258-266.
[200]	Wang Y, Franzon PD, Smart D, Swahn B. Multi-fidelity surrogate-based optimization for electromagnetic simulation acceleration. ACM Trans Des Autom Electron Syst 2020; 25(5):45.
[201]	Yue X, Wen Y, Hunt JH, Shi J. Active learning for Gaussian process considering uncertainties with application to shape control of composite fuselage. IEEE Trans Autom Sci Eng 2020; 18(1):36-46.
[202]	Sun Q, Bai C, Geng H, Yu B. Deep neural network hardware deployment optimization via advanced active learning. In: Proceedings of the 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE); 2021 Feb 1–5; Grenoble, France. Piscataway: IEEE; 2021. p. 1510–5.
[203]	Botcha B, Iquebal AS, Bukkapatnam STS. Efficient manufacturing processes and performance qualification via active learning: application to a cylindrical plunge grinding platform. Procedia Manuf 2021; 53:716-725.
[204]	Verduzco JC, Marinero EE, Strachan A. An active learning approach for the design of doped LLZO ceramic garnets for battery applications. Integr Mater Manuf Innov 2021; 10:299-310.
[205]	Cheng J, Jin H. An adaptive extreme learning machine based on an active learning method for structural reliability analysis. J Brazilian Soc Mech Sci Eng 2021; 43(12):546.
[206]	Owoyele O, Pal P. A novel active optimization approach for rapid and efficient design space exploration using ensemble machine learning. J Energy Resour Technol 2021; 143(3):032307.
[207]	Yang S, Lee S, Yee K. Inverse design optimization framework via a two-step deep learning approach: application to a wind turbine airfoil. Eng Comput 2022; 39:2239-2255.
[208]	Zhang Q, Wu Y, Lu L, Qiao P. An adaptive dendrite-HDMR metamodeling technique for high-dimensional problems. J Mech Des 2022; 144(8):081701.
[209]	Xu Y, Zheng Z, Arora K, Senesky D, Wang P. Hall effect sensor design optimization with multi-physics informed Gaussian process modeling. In: Proceedings of the International Design Engineering Technical Conferences and Computers and Information in Engineering Conference; 2022 Aug 14–17; St. Louis, MO, USA. New York City: ASME; 2022. p. V03BT03A028.
[210]	Liu Z, Renteria A, Zheng Z, Wang P, Li Y. Design of additively manufactured functionally graded cellular structures. In: Proceedings of the IISE Annual Conference and Expo 2022; 2022 May 21–24; Seattle, WA, USA. Montreal: IISE; 2022.
[211]	Hughes AJ, Bull LA, Gardner P, Barthorpe RJ, Dervilis N, Worden K. On risk-based active learning for structural health monitoring. Mech Syst Signal Process 2022; 167:108569.
[212]	Kolesnikov VI, Pashkov DM, Belyak OA, Guda AA, Danilchenko SA, Manturov DS, et al. Design of double layer protective coatings: finite element modeling and machine learning approximations. Acta Astronaut 2023; 204:869-877.
[213]	Zhu R, Peng W, Wang D, Huang CG. Bayesian transfer learning with active querying for intelligent cross-machine fault prognosis under limited data. Mech Syst Signal Process 2023; 183:109628.
[214]	Wan J, Che Y, Wang Z, Cheng C. Uncertainty quantification and optimal robust design for machining operations. J Comput Inf Sci Eng 2023; 23(1):011005.
[215]	Li Z, Segura LJ, Li Y, Zhou C, Sun H. Multiclass reinforced active learning for droplet pinch-off behaviors identification in inkjet printing. J Manuf Sci Eng 2023; 145(7):071002.
[216]	Hao P, Duan Y, Liu D, Yang H, Liu D, Wang B. Image-driven intelligent prediction of buckling behavior for geometrically imperfect cylindrical shells. AIAA J 2023; 61(5):2266-2280.
[217]	Farrokh M, Fallah MR. Flutter instability boundary determination of composite wings using adaptive support vector machines and optimization. J Brazilian Soc Mech Sci Eng 2023; 45(3):181.
[218]	Luo J, Fu Z, Zhang Y, Fu W, Chen J. Aerodynamic optimization of a transonic fan rotor by blade sweeping using adaptive Gaussian process. Aerosp Sci Technol 2023; 137:108255.
[219]	Pidaparthi B, Missoum S. A multi-fidelity approach for reliability assessment based on the probability of classification inconsistency. J Comput Inf Sci Eng 2023; 23(1):011008.
[220]	Xie J, Zhang C, Sun L, Zhao Y. Fairness- and uncertainty-aware data generation for data-driven design. 2023. arXiv: 2309.05842.
[221]	Shorten C, Khoshgoftaar TM. A survey on image data augmentation for deep learning. J Big Data 2019; 6(1):60.
[222]	Niu Z, Yu K, Wu X. LSTM-based VAE–GAN for time-series anomaly detection. Sensors 2020; 20(13):3738.
[223]	Zhang C, Sedal A, Zhao YF. Differentiable surrogate models for design and trajectory optimization of auxetic soft robots. In: Proceedings of the 2023 IEEE International Conference on Soft Robotics (RoboSoft); 2023 Apr 3–7; Singapore. Piscataway: IEEE; 2023. p. 1–8.