1. Introduction
New-generation intelligent manufacturing is characterized by the synergistic integration of next-generation artificial intelligence (the enabling technology) with advanced manufacturing systems (the foundational technology), a paradigm shift that has positioned the technology as the cornerstone of an ongoing industrial revolution [1]. The accelerated maturation of large language models (LLMs) is augmenting this transformation, establishing LLMs as dual-function catalysts that simultaneously reinforce core technological capabilities and unlock emergent innovation pathways within human–cyber–physical systems (HCPS). LLMs have demonstrated strong performance in language understanding, logical reasoning, content generation, and human–computer interaction, and they have shown great application potential in industrial scenarios. Nevertheless, applying LLMs in engineering scenarios raises several challenges. Industrial applications demand a high level of reliability, for which errors and hallucinations are not acceptable. In addition, the integration of domain-specific knowledge such as technical documentation, operational data, forward design principles, and expert experience remains a complex task [2]. These problems are especially prominent in fault diagnosis: faults are usually highly dynamic, with diverse and complex causes, and new fault cases constantly appear [3,4]. Therefore, effectively integrating domain-specific knowledge, complex fault scenarios, continuously generated new fault cases, and dynamic feedback information into LLMs poses a major challenge for the application of these models to fault diagnosis [5].
Fault diagnosis is critical to improving the operational reliability and efficiency of computer numerical control (CNC) machine tools, implementing predictive maintenance to extend the life of CNC systems, and ensuring the smooth operation of production lines [5–7]. Traditional fault diagnosis of CNC systems relies mainly on experienced engineers. Although engineer-based diagnosis is direct, it is time-consuming and dependent on personal experience, with diagnostic effectiveness limited by an engineer’s expertise [8,9]. When a system fault occurs, a long diagnostic cycle is often required for maintenance, which can lead to production halts and significant economic losses. The advent of expert systems enhanced fault diagnosis capabilities, providing near-human expert-level problem-solving in specific domains and effective handling of knowledge reasoning [10,11]. However, expert systems are highly dependent on expert knowledge input, and a lack of updates or an insufficient knowledge base can degrade the accuracy of reasoning [12,13]. In addition, expert systems are often not user-friendly in terms of interaction design, making them difficult for non-experts to understand and operate. The ongoing development of CNC systems and artificial intelligence lays a solid foundation for intelligent fault diagnosis in CNC systems. Fault diagnosis should be based on real-time monitoring, and advancements in modern sensor technology have enabled CNC systems to collect critical performance indicators such as vibration and current signals in real time [14–16]. Furthermore, when a fault occurs in a CNC system, the corresponding alarm information is automatically recorded and transmitted to the backend, providing a solid premise for fault information collection.
The rapid development of LLMs offers new methods and perspectives for fault diagnosis in CNC systems. LLMs are powerful tools for processing vast amounts of data, and their advanced language understanding and generation capabilities enable them to perform language-related tasks with high efficiency and accuracy [17]. We believe that retrieval-augmented generation (RAG) is an effective approach for applying LLMs to vertical domains. In particular, pairing RAG with a domain-specific knowledge base is a mainstream method because it mitigates the hallucination problem in LLMs to some extent and improves the accuracy of question-answering in vertical domains [18]. Microsoft’s release of GraphRAG supports this design direction, although general-purpose graphs cannot fully address the specific challenges of our domain. These capabilities are especially important for understanding and responding to the complex queries posed during fault diagnosis.
In this study, first, we designed a domain-specific LLM benchmark to address the issue of model adaptation to the domain. Second, to overcome the challenges of low accuracy in typical RAG responses as well as the difficulty in handling complex tasks, we developed a knowledge graph-based RAG framework. This framework not only resolved the inherent issues in traditional RAG frameworks but also better supported multi-turn dialogues by mapping these dialogues to traversals within the knowledge graph (KG). To implement the knowledge graph-based RAG, we constructed a KG tailored to fault diagnosis scenarios that integrated multi-source data to support fault diagnosis. Then, to meet the demands of different tasks, we designed customized prompt engineering to enhance the model’s ability to accurately respond to diverse fault scenarios based on its understanding of user intent. Based on this foundation, to continuously improve the system’s applicability in vertical domains, we designed a dynamic learning mechanism based on LLMs and expert input. This mechanism allowed the system to continuously learn and conduct optimization through user interactions during actual use, enabling it to update its KG and enhance its diagnostic capabilities. Finally, we integrated the entire system into a remote operation and maintenance system to achieve intelligent fault diagnosis and maintenance support.
2. Literature review
2.1. Applications of LLMs in vertical domains
In vertical domains, the primary task of LLMs is language understanding, which supports tasks such as entity extraction, relationship extraction, and document classification [19]. These models have found widespread application in fields such as healthcare and industry [18,20]. Currently, prompt-based methods are the dominant approach. For example, in the healthcare field, models such as ERQA [21], MentaLLaMA [22], ArgMed-Agents [23], and diagnostic reasoning prompts [24] are prominent. In addition, studies have combined LLMs with KGs; these KGs help organize complex information, thereby enhancing the applicability of LLMs [25–27].
Building on prompt-based methods, fine-tuning has also become a common way to apply LLMs to vertical domains. Fine-tuning can deepen the integration of domain knowledge with LLMs, as seen in biomedical domain studies [28]. Notably, in a study by Li et al. [29], the key challenge was determining how to effectively leverage domain-specific datasets for fine-tuning. Liu et al. [20] proposed a typical application in manufacturing by combining KGs with LLM fine-tuning, with the KG providing high-quality fine-tuning data to support this process.
Some studies have explored the enhancement of domain-specific capabilities through pre-training. Pre-training was more common in earlier research [30,31] and has been shown to improve the adaptability of models in specific domains. However, with the increasing scale of models, the cost of pre-training has risen significantly. As a result, researchers focusing on vertical applications currently tend to adapt large models to domain-specific tasks through fine-tuning or RAG methods built on well-established, high-performance open-source benchmark models [32]. This approach improves the performance of large models on domain-specific problems while mitigating the high costs of pre-training.
2.2. Existing intelligent fault diagnosis systems
Early fault diagnosis systems were primarily based on expert systems [33,34]. However, expert systems have inherent limitations, particularly in terms of knowledge updating, which is often challenging. Consequently, data-driven and signal analysis-driven methods have become essential tools for fault diagnosis [5,35]. With the popularity of machine learning and deep learning, these approaches have also been applied to fault diagnosis across various industrial scenarios [36,37].
As KG technology has matured, numerous studies have begun to explore knowledge graph-based approaches to fault diagnosis, with typical applications including knowledge graph-based fault diagnosis question-answering systems [38–40]. With the advent of LLMs, researchers have also attempted to combine KGs with LLMs to enhance the intelligence of fault diagnosis systems. For example, in Liu et al.’s study [41], filtered information from KGs was fed into LLMs to mitigate the “hallucination” issue. Guo et al.’s study [42] further considered multi-hop path-finding within KGs to support optimal retrieval paths, thereby improving the accuracy of fault diagnosis. These studies provide valuable inspiration and support for implementing an intelligent fault diagnosis system in the CNC domain based on LLMs and KGs.
3. Methodology
3.1. Framework
The intelligent fault diagnosis decision support system proposed in this study is depicted in Fig. 1. The system is divided into three main parts. First, the data foundation stage involved the construction of a comprehensive KG for CNC systems to collect and integrate knowledge relevant to fault diagnosis; concurrently, advanced data processing techniques were employed to extract features from engineering data, enhancing data usability and diagnostic accuracy. Second, diagnosis based on the KG included benchmark testing and fine-tuning of LLMs to ensure optimal performance, as well as local deployment of the model to guarantee real-time responses and data security. Building on this, the system employed knowledge graph-based retrieval augmentation to support multi-turn question answering, enabling it to understand and respond accurately to complex diagnostic queries and thereby improve the efficiency and quality of interactions. Finally, the evaluation system and continuous evolution were addressed: we provided a CNC system evaluation dataset to assess the fault diagnosis system and utilized two learning mechanisms for ongoing evolution, namely interaction content review based on LLMs and knowledge updates based on expert input.
3.2. Fault diagnosis KG
3.2.1. Ontology design for fault diagnosis KG
The construction of the ontology for a KG relies on the integration of domain expert knowledge with data analysis, employing both top-down and bottom-up approaches [43,44]. The top-down method utilizes expert systems and existing data schemas to guide the construction of a KG, whereas the bottom-up approach identifies and integrates relevant knowledge from semi-structured or unstructured data through information extraction techniques. These strategies not only fully leverage in-depth domain knowledge but also reflect the importance of data-driven methods in the discovery and verification of new knowledge. In our CNC system fault diagnosis KG project, the design of the ontology likewise stemmed from the general standards of fault diagnosis and the experience of experts [45]. We defined seven key entity categories: equipment (machine tool), equipment modules, parameters, alarm numbers/information, phenomena (symptoms), causes, and solutions. The relationships between these entities are shown in Fig. 2. The structure of the ontology maintained a moderate level of complexity, ensuring basic fault diagnosis capabilities while remaining sufficiently generalizable for broader applications. Some equipment modules incorporated engineering data as features, providing robust support for in-depth fault analysis.
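To make the ontology concrete, the seven entity categories and their pairwise relations can be sketched as a small schema. This is a minimal illustration: the relation labels (`HAS_MODULE`, `INDICATES`, and so on) are assumed names for exposition, since the actual relationships are defined in Fig. 2.

```python
# Illustrative sketch of the seven-entity fault diagnosis ontology.
# Entity category and relation names are assumptions, not the paper's schema.

ENTITY_TYPES = [
    "Equipment", "EquipmentModule", "Parameter",
    "Alarm", "Phenomenon", "Cause", "Solution",
]

# (head_type, relation, tail_type) triples forming the ontology schema.
ONTOLOGY_SCHEMA = [
    ("Equipment", "HAS_MODULE", "EquipmentModule"),
    ("EquipmentModule", "HAS_PARAMETER", "Parameter"),
    ("EquipmentModule", "RAISES", "Alarm"),
    ("Alarm", "INDICATES", "Phenomenon"),
    ("Phenomenon", "CAUSED_BY", "Cause"),
    ("Cause", "RESOLVED_BY", "Solution"),
]

def validate_triple(head_type, relation, tail_type):
    """Check that a candidate triple conforms to the ontology schema."""
    return (head_type, relation, tail_type) in ONTOLOGY_SCHEMA
```

Keeping the schema this small is what makes the ontology "moderately complex": every extracted fact must fit one of a handful of typed edges, which simplifies both validation and traversal.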
3.2.2. Knowledge extraction from PLC ladder diagrams
In a CNC system controlled by a programmable logic controller (PLC), faults can be divided into hardware and software categories [46]. By leveraging the PLC alarms specific to CNC systems, engineers can rapidly pinpoint and address these faults. The ladder diagram, with its intuitive and debug-friendly design, serves as an effective tool for representing discrete points [47]. It not only simplifies fault diagnosis by providing structured and visual information but also significantly enhances the management of diagnostic processes and the capability of decision support.

However, the specialized nature of PLC ladder diagrams can present challenges for engineers during fault diagnosis. To address this, this paper proposes a novel method of converting ladder diagrams into textual information and extracting knowledge to construct a KG [48]. By analyzing the switch states and their corresponding serial and parallel logic within the PLC ladder diagrams, faults were categorized into the operational actions and the conditions under which they occurred, which were then extracted as nodes in the KG. By integrating the fault alarm data with the corresponding solutions, a comprehensive KG pathway was constructed. This approach not only facilitated the rapid and accurate localization of faults but also enhanced the effectiveness of subsequent interventions.
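The conversion step can be illustrated with a toy rung. The rung encoding and signal names below are simplified assumptions rather than the paper's actual PLC data format; the point is that serial (AND) and parallel (OR) contact logic is rendered as readable condition text before becoming KG nodes.

```python
# Minimal sketch of converting a PLC ladder rung into textual fault-condition
# knowledge. The rung representation and bit names are invented for illustration.

def rung_to_text(rung):
    """Render a rung as a readable condition string.

    A rung is {"coil": alarm_bit, "logic": expr}, where expr is either a
    contact name (str) or a tuple ("AND"|"OR", [sub-expressions]) encoding
    serial/parallel branches.
    """
    def render(expr):
        if isinstance(expr, str):
            return expr
        op, parts = expr
        joiner = " AND " if op == "AND" else " OR "
        return "(" + joiner.join(render(p) for p in parts) + ")"
    return f"{rung['coil']} is set when {render(rung['logic'])}"

# Example: an alarm fires when the door is open while either limit switch
# trips (a serial condition containing a parallel branch).
rung = {"coil": "ALARM_A1234",
        "logic": ("AND", ["DOOR_OPEN", ("OR", ["LIMIT_X", "LIMIT_Y"])])}
```

The resulting condition string, together with the alarm's documented solution, forms one "condition → alarm → solution" pathway in the graph.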
3.2.3. Knowledge extraction from historical work orders
In the operation of a CNC system, faults must be handled on-site by engineers, who then fill out detailed fault work orders. These work orders document important fault cases, describing the fault phenomena and the engineers’ resolution strategies, and serve as a valuable resource for enriching the fault diagnosis knowledge base. However, these work orders are often unstructured, voluminous, and variable in content quality, posing significant challenges for data processing.

Leveraging the powerful deep semantic understanding capabilities of LLMs, this study employed LLMs and prompt engineering techniques to guide model behavior, effectively extracting key information such as “fault phenomena,” “fault causes,” and “solutions” from historical work orders. By designing templates to guide the model’s output, we ensured that the extracted information aligned with the needs of the KG. Subsequently, through a series of data cleaning and expert review steps, the extracted data were transformed into high-quality structured information to construct and expand the existing fault diagnosis KG.
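A minimal sketch of this extraction step, assuming a JSON-only reply format; the template wording and field names are illustrative, and replies that fail to parse are routed to the review step described above rather than being ingested.

```python
# Sketch of template-guided work-order extraction. The template text and
# required keys are assumptions for illustration, not the production prompts.
import json

EXTRACTION_TEMPLATE = (
    "Extract the following fields from the maintenance work order and reply "
    'with JSON only, using the keys "phenomenon", "cause", "solution".\n'
    "Work order:\n{work_order}"
)

def build_prompt(work_order):
    return EXTRACTION_TEMPLATE.format(work_order=work_order)

def parse_reply(reply):
    """Parse the model reply; unusable replies are flagged for manual review."""
    try:
        record = json.loads(reply)
    except json.JSONDecodeError:
        return {"needs_review": True, "raw": reply}
    if not {"phenomenon", "cause", "solution"}.issubset(record):
        return {"needs_review": True, "raw": reply}
    return record
```

Constraining the output to a fixed key set is what lets the downstream cleaning step map each record directly onto phenomenon, cause, and solution entities in the KG.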
3.2.4. Feature extraction from engineering data
In research on CNC system fault diagnosis, feature extraction from engineering data is a crucial step [49]. Deep learning techniques have become the mainstream method for vibration signal processing and fault analysis, improving the accuracy and efficiency of diagnosis. With the introduction of new technologies such as graph neural networks and physics-based machine learning, this field continues to show immense development potential.

A three-stage engineering data feature extraction strategy was proposed to provide more refined features for CNC system fault diagnosis; the feature extraction framework is shown in Fig. 3. First, single-channel feature extraction focused on extracting features from a single sensor signal, which served as the basis for identifying key fault indicators [50,51]. Second, multi-channel feature extraction integrated data from different sensors, enabling a broader analysis of the equipment’s status and improving fault prediction accuracy [52]. Finally, instruction-based multi-channel feature extraction leveraged machine learning algorithms to deeply analyze multi-source data. This not only captured time dependencies but also predicted potential anomalies based on changes in operational instructions, thereby providing more sophisticated fault diagnosis capabilities [53].
Based on this, we built the associations between engineering data features and device module faults. A training set was created based on historical data for several typical faults, and a fault classification model was trained. In practical fault diagnosis, engineering data features can help more accurately and efficiently pinpoint faulty device modules or alarm information, leading to more precise fault diagnoses.
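As a concrete illustration, the single-channel stage might compute time-domain statistics of the following kind. The specific features shown (RMS, peak, kurtosis) are common vibration-signal fault indicators and stand in for, rather than reproduce, the paper's actual feature set.

```python
# Hedged sketch of single-channel time-domain feature extraction from one
# sensor signal. Feature choice is illustrative, not the paper's exact set.
import math

def channel_features(signal):
    n = len(signal)
    mean = sum(signal) / n
    var = sum((x - mean) ** 2 for x in signal) / n
    rms = math.sqrt(sum(x * x for x in signal) / n)   # overall energy level
    peak = max(abs(x) for x in signal)                # largest excursion
    # Kurtosis is sensitive to the impulsive spikes typical of bearing faults.
    kurt = (sum((x - mean) ** 4 for x in signal) / n) / (var ** 2) if var else 0.0
    return {"rms": rms, "peak": peak, "kurtosis": kurt}
```

Feature vectors of this kind, concatenated across channels, would then feed the fault classification model described above.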
3.3. LLM-based diagnosis system
3.3.1. KG-based RAG system
In practical industrial applications, conventional RAG models combine external knowledge retrieval with LLMs to improve the accuracy of LLM responses, which partially mitigates the hallucination problem. However, RAG still faces limitations in specific fault diagnosis tasks such as those in CNC systems [54–56]. In CNC system fault diagnosis, the responses from large models must be not only accurate but also concise so that engineers can quickly use the information for troubleshooting, which necessitates effective organization of the knowledge base. Moreover, due to the complexity of CNC systems, users often require multi-turn interactions to accurately diagnose faults, which demands strong management of related information in the database to support prolonged, high-quality interactions.

To address these limitations, this study developed an innovative knowledge graph-based RAG method. In this approach, fault diagnosis elements such as “fault symptoms,” “fault causes,” and “solutions” were treated as entities within a KG. When the system received a user alert or descriptive information, it quickly identified the relevant entities and associated subgraphs within the KG through entity and semantic recognition. During multi-turn interactions with the user, the system continuously recognized user intent and feedback, optimizing the traversal direction of paths within the KG to provide more accurate diagnostic support [57,58].
Utilizing this method, the system narrowed the scope of the knowledge it retrieved by identifying entities and semantic information from the user input, effectively reducing hallucinations and enhancing the focus and accuracy of the responses. In addition, the system supported user interactions to guide the traversal direction within the KG, better meeting user needs. This improvement significantly enhanced the efficiency and reliability of the model in the domain of fault diagnosis.
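The narrowing step can be sketched as entity matching followed by one-hop subgraph extraction. The triples and the substring-based matching rule below are toy assumptions; the deployed system performs entity and semantic recognition over the full Neo4j graph.

```python
# Sketch of retrieval narrowing: match entities mentioned in the query, then
# keep only triples within one hop of those entities as the LLM's context.
# Graph contents are invented examples.

TRIPLES = [
    ("ALARM_A1234", "INDICATES", "spindle overheating"),
    ("spindle overheating", "CAUSED_BY", "cooling fan failure"),
    ("cooling fan failure", "RESOLVED_BY", "replace cooling fan"),
    ("ALARM_B2001", "INDICATES", "axis following error"),
]

def match_entities(query):
    """Naive matcher: an entity is relevant if its name appears in the query."""
    names = {h for h, _, _ in TRIPLES} | {t for _, _, t in TRIPLES}
    return {n for n in names if n.lower() in query.lower()}

def one_hop_subgraph(seeds):
    """Keep only triples touching a matched entity."""
    return [t for t in TRIPLES if t[0] in seeds or t[2] in seeds]
```

Handing the LLM only this subgraph, rather than the whole knowledge base, is what reduces hallucination and keeps answers focused.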
3.3.2. Prompt engineering for KG-based RAG
A prompt provides text or instructions that guide a model to generate a specific output; the more specific the prompt, the more closely the LLM’s response aligns with user needs [59]. In this research, we categorized prompts into role prompts and task prompts. The role prompts guided the LLMs to communicate in a specific role by constructing a virtual character with a unique perspective, professional knowledge, and behavioral patterns. For instance, in this project, the artificial intelligence (AI) played the role of a CNC fault diagnosis assistant, and we continually added instructions to deepen the LLM’s understanding of its responsibilities. The task prompts, in contrast, were text inputs used to guide or instruct the model to perform a specific task. When generating diagnostic results, we particularly emphasized precise task prompt settings, especially for KG-directed walks [60]. These carefully crafted prompts were crucial for ensuring the accuracy of diagnostic outputs: we constructed task prompts to constrain the paths of the KG walks and produce answers highly relevant to the field of CNC system fault diagnosis.
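A minimal sketch of how the two prompt types might be composed; the wording, intent labels, and slot names are illustrative assumptions rather than the production prompts.

```python
# Sketch of role-prompt + task-prompt composition. All text and intent labels
# are invented for illustration.

ROLE_PROMPT = ("You are a CNC fault diagnosis assistant. Answer concisely, "
               "only about CNC system faults, and ground every answer in the "
               "retrieved knowledge-graph paths.")

TASK_PROMPTS = {
    "alarm_query": ("Given alarm code {alarm}, list the most likely causes "
                    "and solutions found on the retrieved paths."),
    "phenomenon_query": ("Given the symptom '{symptom}', walk the retrieved "
                         "paths from phenomenon to cause to solution."),
}

def compose_prompt(intent, **slots):
    """Prepend the fixed role prompt to the intent-specific task prompt."""
    return ROLE_PROMPT + "\n\n" + TASK_PROMPTS[intent].format(**slots)
```

Selecting the task prompt by recognized intent is what lets one system serve alarm-code lookups, symptom descriptions, and feedback with differently constrained outputs.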
3.4. Learning mechanisms
A learning mechanism is a crucial component of AI development that enables systems to continuously learn and improve [61]. In this project, we enhanced the KG by expanding edge attributes and introducing the concept of path weights, thereby allowing the system to identify potential errors or deficiencies during interactions. By analyzing user satisfaction with current answers, the system adjusted the weights of the KG paths accordingly.

The framework of the learning mechanism is shown in Fig. 4. Specifically, for answers without feedback, we increased the weight of the corresponding path to ensure that paths with higher weights were prioritized in future retrievals. For answers that included feedback, we extracted the paths using the LLM and added these paths to the feedback database [57,58]. Engineers regularly reviewed the paths in the feedback database, and those that passed review were incorporated into the KG. This approach enabled continuous growth of the knowledge base, further enhancing the system’s fault diagnosis capabilities.
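The weighting rule can be sketched as follows, assuming that an answer drawing no feedback counts as implicitly accepted; the path strings, initial weights, and 0.1 increment are illustrative values, not the system's actual parameters.

```python
# Sketch of the path-weighting scheme: unchallenged paths are reinforced,
# while paths that draw feedback are queued for expert review instead of
# being updated directly. All values are illustrative.

path_weights = {"A1234->fan_failure->replace_fan": 1.0,
                "A1234->sensor_drift->recalibrate": 1.0}
review_queue = []

def record_outcome(path, feedback=None):
    """Reinforce implicitly accepted paths; route contested paths to review."""
    if feedback is None:
        path_weights[path] = path_weights.get(path, 1.0) + 0.1
    else:
        review_queue.append((path, feedback))

def ranked_paths():
    """Higher-weight paths are retrieved first in later diagnoses."""
    return sorted(path_weights, key=path_weights.get, reverse=True)
```

Keeping contested paths out of the automatic update loop mirrors the expert-review gate described above: only reviewed paths change the KG itself.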
4. Results and discussion
4.1. LLM benchmark for CNC system
Because of differences in their pre-training processes, LLMs vary in their adaptability to different domains. To assess the application effectiveness of LLMs in the CNC system domain, we constructed a multiple-choice question dataset following the massive multitask language understanding (MMLU) [62] method. This dataset, named CNC language understanding (CNCLU), contained 200 questions covering essential knowledge points, including basic terminology definitions, functional introductions, and operational processes of CNC systems. Each question provided a correct answer along with three distractors simulating common misunderstandings and errors, allowing for a more accurate evaluation and comparison of different LLMs in terms of their understanding and application of professional knowledge in CNC systems. We also introduced the AI2 Reasoning Challenge (AI2-ARC) dataset to evaluate the basic reasoning capabilities of large models, serving as an additional reference for model selection.

In selecting LLMs for comparison, we focused on evaluating ChatGLM3-6b, GLM4-9b-Chat [63], Qwen1.5-7b-Chat, Qwen2-7b-Instruct, and Qwen2.5-7b-Instruct [64] (b: billion), all of which had previously shown strong performance in Chinese language tasks. Balancing accuracy, concurrency resources, and response speed, we chose models with sizes ranging from 6b to 9b for testing. This size range ensured that the system could meet model concurrency and response time requirements while maximizing diagnostic accuracy. Quantization is an important consideration for the eventual practical deployment of LLMs, as quantization techniques can significantly reduce the storage and computational requirements of models, making them more suitable for operation in resource-constrained environments. In this study, we evaluated two quantization methods, accurate post-training quantization for generative pre-trained transformers (GPTQ) [65] and GPT-Generated Unified Format (GGUF), to analyze the impact of quantization on model accuracy and efficiency.
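A CNCLU-style item can be scored as a simple exact match over option letters, as sketched below; the two questions are invented examples in the spirit of the benchmark, not items from the actual 200-question dataset.

```python
# Sketch of MMLU-style multiple-choice scoring: the model's predicted option
# letter is compared against the keyed answer. Items are invented examples.

def score(items, predict):
    """items: (question, options, answer_key) tuples; predict returns a letter."""
    correct = sum(predict(q, opts) == key for q, opts, key in items)
    return correct / len(items)

items = [
    ("Which G-code commands linear interpolation?",
     {"A": "G00", "B": "G01", "C": "G04", "D": "M05"}, "B"),
    ("What does an emergency stop do?",
     {"A": "Pauses the spindle only", "B": "Saves the part program",
      "C": "Halts all machine motion", "D": "Resets active alarms"}, "C"),
]
```

In practice, `predict` would wrap a quantized model's constrained generation; the accuracy returned here is the per-model figure compared across benchmarks.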
By comparing the performance of these methods on MMLU, AI2-ARC, and CNC system fault diagnosis scenarios, we obtained the results shown in Table 1 and Fig. 5. Regarding quantization methods, we observed that although GPTQ reduced the model size, it did not reduce the inference time. By contrast, GGUF achieved higher compression rates and significantly reduced the inference time. From the perspective of accuracy, the GGUF-quantized models performed better (of 12 comparable datasets, 11 showed better performance with GGUF). For model selection, we observed that among the unquantized models, GLM4-9b-Chat performed best on both AI2-ARC and CNCLU, whereas the newly released Qwen2.5-7b-Instruct performed better on MMLU. Among the GGUF-quantized models, Qwen2.5-7b-Instruct and GLM4-9b-Chat performed similarly on AI2-ARC and CNCLU. Since Qwen2.5-7b-Instruct was smaller and more efficient at inference, we selected the GGUF-quantized Qwen2.5-7b-Instruct as the baseline model.
4.2. Prompt engineering
The role prompts ensured that the system provided generative conversational services as an AI fault diagnosis assistant, thereby enhancing the user experience and improving system security by rejecting questions unrelated to fault diagnosis. The task prompts further refined the user’s queries, enabling the LLM to generate more accurate responses. Upon receiving a query, intent recognition was performed, and the corresponding task prompts were designed based on the identified intent, which might have included fault code and information inquiries, phenomenon inquiries, and feedback. Fault code and phenomena inquiries proceeded to the RAG stage based on the KG, while feedback activated the system’s learning mechanism. This approach allowed the system to more effectively understand and address user needs while continuously learning and performing optimization in real-world applications, thereby enhancing the overall accuracy and efficiency of fault diagnosis.
4.3. KG for RAG
The fault diagnosis decision support system utilized four types of data: fault diagnosis cases, historical work orders, equipment information, and engineering data. Each type of data played a crucial role in constructing the KG and supporting fault diagnosis.
Fault diagnosis cases: The fault diagnosis cases were provided by the Huazhong CNC Division and encompassed a variety of typical fault modes. This dataset included approximately 500 fault diagnosis cases, covering mechanical, electrical, and software-related failures. Each case contained key information such as fault descriptions, diagnosis results, fault types, affected components, and the time of occurrence.
Work orders: The work orders came from the internal records of the company, including equipment maintenance and fault troubleshooting reports. This dataset comprised approximately 5000 work orders, detailing work order numbers, equipment types, maintenance dates, maintenance contents, fault causes, handling procedures, and the personnel involved.
Equipment information: The equipment information was sourced from machine tool manufacturers and their subsequent updates in maintenance logs. The data included equipment models, production dates, operation logs, equipment configurations, maintenance records, and operational status.
Engineering data: The engineering data came from the CNC system’s drive recorder, specifically from the “black box” data recorded in the 10 s preceding an alarm. These data included sensor measurements such as temperature, vibration frequency, current-voltage fluctuations, temperature change rates, and motion speed. These data were critical for fault diagnosis because they provided real-time operational insights that could help detect potential issues and prevent failures.
Based on the fault data provided by the Huazhong CNC Division, we constructed a total of 1549 entities with 1334 relationships among them; the resulting KG, which contains both complex subgraphs and simple paths, is shown in Fig. 6. These data covered a wide range of fault scenarios, providing a solid foundation for subsequent fault diagnosis and knowledge inference. We used Neo4j as the database for storing and querying the KG, ensuring efficient retrieval and updates.
In this study, to address the limitations exhibited by traditional RAG models in handling user contexts and complex requirements, we proposed an improved RAG model. Traditional RAG models often struggle with accurately diagnosing faults in complex dialogues, particularly in multi-turn interactions and intricate fault scenarios, due to their inability to effectively leverage contextual information. To overcome these challenges, we integrated deep learning techniques with KG technology, proposing an enhanced RAG model based on dynamic subgraph partitioning and a multi-turn dialogue mechanism. Specifically, we used fault phenomenon descriptions and alarm codes as the partitioning criteria to divide the KG into multiple subgraphs, each corresponding to a set of solution paths associated with specific fault causes. This subgraph partitioning not only helped narrow the retrieval scope but also improved the system’s ability to handle complex fault scenarios.
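The partitioning criterion described above can be sketched as an index from alarm code to candidate solution paths, so that a query only searches its matching partition; the paths shown are invented examples.

```python
# Sketch of subgraph partitioning keyed by alarm code: each partition holds
# the solution paths for one set of fault causes. Data are invented examples.
from collections import defaultdict

PATHS = [
    {"alarm": "A1234", "cause": "cooling fan failure", "solution": "replace fan"},
    {"alarm": "A1234", "cause": "blocked air duct", "solution": "clean duct"},
    {"alarm": "B2001", "cause": "encoder fault", "solution": "replace encoder"},
]

def partition_by_alarm(paths):
    """Group solution paths so each alarm code addresses only its own subgraph."""
    subgraphs = defaultdict(list)
    for p in paths:
        subgraphs[p["alarm"]].append(p)
    return dict(subgraphs)
```

A phenomenon description would be handled the same way, with the partition key coming from the matched phenomenon entity instead of the alarm code.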
During operation, the system first identified key entities and semantic information in the user’s input to quickly locate the relevant subgraph and provide the most likely fault causes and corresponding solutions, as shown in Fig. 7(a); the multi-turn fault diagnosis process based on the KG is shown in Fig. 7(b). As the dialogue progressed, the system dynamically adjusted and optimized the solution paths within the subgraph based on user feedback. This approach not only enhanced the coherence of the dialogue but also significantly improved the accuracy of fault diagnosis and repair recommendations through iterative optimization, ensuring that the system better met user needs. In addition, this multi-turn interaction strategy made the system’s performance more stable and reliable in complex scenarios, enabling it to handle a broader range of real-world applications.
Building on this process, engineering data could be used to more accurately and efficiently pinpoint faulty device modules or alarm information, leading to more precise fault diagnosis. Specifically, during system operations, engineering data such as sensor readings and environmental conditions were continuously monitored. Key features were extracted from these data and classified through a fault classification model. The results might point to specific device modules or alarm information, which could then be used to filter the subgraph, improving the efficiency and accuracy of the diagnostic process.
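This filtering step can be sketched as follows: the fault classifier's predicted module is used to keep only consistent candidate paths before the dialogue continues. Module labels and paths are invented examples.

```python
# Sketch of classifier-guided subgraph filtering: candidate paths that do not
# match the predicted faulty module are discarded. Data are invented examples.

def filter_paths(paths, predicted_module):
    """Keep only candidate paths consistent with the classifier's prediction."""
    return [p for p in paths if p["module"] == predicted_module]

candidate_paths = [
    {"module": "spindle", "cause": "bearing wear", "solution": "replace bearing"},
    {"module": "feed_axis", "cause": "encoder fault", "solution": "replace encoder"},
]
```

Because the classifier runs on continuously monitored engineering-data features, this filter narrows the subgraph before the user has answered a single clarifying question.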
Moreover, the KG continuously integrated new data and feedback from the diagnostic process to refine the association between data features and fault categories. This enabled the system to adapt to changing operational conditions and improve its diagnostic accuracy over time. As a result, the system not only enhanced real-time fault detection capabilities but also gained a deeper understanding of fault patterns, supporting more proactive and preventive long-term maintenance strategies. The fusion of engineering data with KG-based fault diagnosis provided a powerful framework for tackling complex real-world challenges in industrial environments.
4.4. Learning mechanisms
To endow the generative fault diagnosis system based on LLMs with self-learning capabilities and growth potential, this study introduced a learning mechanism designed to continuously optimize and enhance the system’s performance. The system was equipped with a processing workflow that extracted knowledge from the continuously generated fault diagnosis work orders and dynamically integrated new knowledge into the KG, thereby enriching the fault diagnosis case library. This process significantly improved the growth and scalability of the system’s knowledge base. Moreover, we optimized the structure of the KG by expanding its connection attributes and introducing the concept of path weighting. Specifically, based on feedback from engineers and users, the weights of knowledge paths that effectively resolved issues were increased, while paths identified as problematic underwent manual review and weight adjustment. This strategy ensured that the knowledge base more accurately reflected real-world conditions and enabled continuous learning and optimization based on user feedback.
We constructed a test dataset consisting of 41 common fault scenarios, with answers generated by the fault diagnosis system. These answers were then rated by experienced engineers and converted into a percentage score. The performance of the system over time is illustrated in Fig. 8. With the introduction of this learning mechanism, the iterative supplementation and optimization of knowledge allowed the system’s diagnostic capability to surpass that of an engineer with two years of experience. This demonstrated the system’s potential for continuous self-optimization in real-world applications.
In real industrial environments, the fault types and diagnostic requirements may change rapidly, and conflicting information may arise from different user feedback. To address this issue, a panel of experts was assembled in the feedback review section to regularly evaluate user-generated feedback. Once sufficient data were accumulated, a large model-based review filter could be trained to assist experts in the initial screening process.
4.5. Fault diagnosis system
In this research, we developed a generative fault diagnosis decision support system based on LLMs, with the goal of providing users with causes and solutions for CNC system faults. This system was integrated into the CNC Cloud Manager APP by Huazhong CNC, and a connection was established with the CNC system through quick response (QR) code scanning to obtain basic information and the current fault alarm code. When interacting with the system, users could pose specific questions based on actual situations. The system performed an analysis and generated responses using prompt engineering and the RAG method based on the KG. In addition, the system incorporated a learning mechanism that captured and processed effective user feedback, which resulted in the continuous optimization and updating of its KG to improve diagnostic accuracy and efficiency.
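The request flow just described — alarm code in, KG facts retrieved, prompt assembled — can be sketched as below. The function names, KG layout, and prompt wording are illustrative assumptions, not the system's actual API; the point is how role and task prompts are combined with the retrieved facts.

```python
def retrieve(kg, alarm_code):
    """Look up causes and solutions for an alarm code in the KG."""
    entry = kg.get(alarm_code, {})
    return entry.get("causes", []), entry.get("solutions", [])

def build_prompt(question, alarm_code, causes, solutions):
    """Assemble role prompt, retrieved KG facts, and task prompt."""
    context = "; ".join(causes + solutions) or "no matching KG entry"
    return (
        "You are a CNC fault-diagnosis assistant.\n"  # role prompt
        f"Alarm code: {alarm_code}\n"
        f"Known causes and solutions: {context}\n"    # retrieved facts
        f"User question: {question}\n"
        "Answer using only the retrieved facts."      # task prompt
    )

kg = {"ALM2001": {"causes": ["encoder cable loose"],
                  "solutions": ["reseat encoder connector"]}}
prompt = build_prompt("Why does the spindle stop?", "ALM2001",
                      *retrieve(kg, "ALM2001"))
```

The assembled prompt is what would be sent to the LLM; grounding the answer in retrieved KG facts is what keeps the response specific to the alarm at hand.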
The system supported the real-time acquisition of sensor signals and used these signals to assess the operating status of the CNC system, enabling real-time monitoring and maintenance. The system also combined single-channel feature extraction with multi-channel feature extraction, first extracting features from individual sensors through single-channel analysis, and then integrating data from multiple sensors through multi-channel analysis. Finally, the system conducted a comprehensive analysis of the data across multiple time segments using instruction-domain-based multi-channel feature extraction. These features were used as attributes in the machine tool module of the KG, providing critical support for fault diagnosis analysis.
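The two-stage feature extraction above can be sketched as follows, with illustrative sensor names and statistics (the actual feature set and fusion rules of the system are not specified here): per-sensor statistics are computed first, then fused across sensors for one time segment.

```python
import math

def single_channel_features(signal):
    """Per-sensor (single-channel) summary statistics."""
    n = len(signal)
    return {
        "mean": sum(signal) / n,
        "rms": math.sqrt(sum(x * x for x in signal) / n),
    }

def multi_channel_features(channels):
    """Fuse per-sensor features across sensors for one time segment.

    channels: dict of sensor name -> signal samples.
    """
    per_sensor = {name: single_channel_features(sig)
                  for name, sig in channels.items()}
    fused = {
        "max_rms": max(f["rms"] for f in per_sensor.values()),
        "mean_of_means": sum(f["mean"] for f in per_sensor.values()) / len(per_sensor),
    }
    return per_sensor, fused

segment = {"spindle_current": [1.0, 1.2, 0.8],
           "x_axis_vibration": [0.1, -0.1, 0.0]}
per_sensor, fused = multi_channel_features(segment)
```

Repeating this over multiple time segments, as the text describes, would yield the instruction-domain-based multi-channel features attached to the machine tool module of the KG.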
The generative fault diagnosis system based on LLMs also incorporated faults that were caused by abnormal CNC system parameters. The constructed KG included 37 cases of CNC system faults caused by system parameters, with a total of 63 parameters. After providing the cause and solution for these faults, the system allowed users to establish a connection with the CNC system via mobile QR code scanning and download the potentially faulty parameters to the CNC system. The CNC system could then check and modify these parameters, thereby assisting users in locating and resolving faults online.
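The parameter check can be sketched as a comparison between the values downloaded from the CNC system and the expected values stored in the KG fault case. Parameter names and values below are invented for illustration.

```python
def check_parameters(downloaded, expected):
    """Return parameters whose current value deviates from the KG value."""
    return {
        name: {"current": downloaded[name], "expected": value}
        for name, value in expected.items()
        if name in downloaded and downloaded[name] != value
    }

expected = {"P3001": 1, "P4012": 2000}           # values from the KG fault case
downloaded = {"P3001": 0, "P4012": 2000}         # values read from the CNC system
faulty = check_parameters(downloaded, expected)  # flags P3001 only
```

Presenting the deviating parameters to the user, alongside the cause and solution from the KG, is what allows faults to be located and resolved online.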
4.6. Discussion
This study explored the adaptability of different LLMs to the CNC system domain and proposed a simple and effective evaluation method. We constructed a multiple-choice question dataset based on the MMLU method to test the abilities of the models to understand and apply domain-specific knowledge. The evaluation results showed that different models had varying levels of adaptability to the domain, with the GLM4-9b model exhibiting the highest suitability for the CNC system domain. In addition, we conducted quantization tests to enhance model efficiency and reduce computational resource consumption while maintaining model performance, which is crucial for applications in production processes. Effective quantization meant that with the same computational resources, the system could connect to more devices and provide faster fault diagnosis support.
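The scoring logic of an MMLU-style evaluation is simple: each model answer is compared against the keyed option, and accuracy over the question set measures domain adaptability. The sketch below uses invented answer data; the actual question set and model outputs are not reproduced here.

```python
def score_mcq(answers, key):
    """answers/key: lists of option letters, e.g. 'A'-'D'.

    Returns the fraction of questions answered correctly.
    """
    correct = sum(a == k for a, k in zip(answers, key))
    return correct / len(key)

key = ["B", "D", "A", "C"]          # keyed answers for four questions
model_answers = ["B", "D", "A", "A"]
accuracy = score_mcq(model_answers, key)  # 3 of 4 correct -> 0.75
```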
In this study, the use of KGs as a structured form of knowledge representation significantly enhanced the system’s fault diagnosis capabilities, particularly in accurately identifying and responding to complex fault patterns. By integrating role prompts and task prompts into the design strategy, we further improved the overall performance of the system. Building on this foundation, the RAG model based on the KG provided distinct advantages in diagnosing CNC system faults. First, the system was able to deliver concise and accurate answers to specific alarm codes, alarm information, or fault phenomena, thereby eliminating the need for users to sift through lengthy texts to find the “right answer.” Second, the system’s ability to support multi-turn dialogues enabled it to more deeply understand and resolve user issues, significantly enhancing the system’s usability.
The learning mechanism continuously optimized the diagnostic process by analyzing user feedback in real time, demonstrating strong adaptability and steady improvement. This mechanism not only enabled the system to gradually reduce errors during routine operations but also enhanced its ability to diagnose new types of faults. However, the feedback-based learning process could sometimes be influenced by erroneous feedback, presenting a challenge for the system. To address this, we implemented a two-stage feedback review mechanism. In the first stage, the LLM filtered out a significant amount of non-valuable feedback, aiding engineers in the review process. In the second stage, domain experts further refined this selection, incorporating valuable insights into the KG. Compared with traditional review mechanisms, the introduction of the LLM significantly improved review efficiency, which played a crucial role in the continuous optimization of the KG and the overall improvement of the fault diagnosis system.
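The two-stage review can be sketched as a pipeline: an LLM-based pre-filter (stubbed here as a simple length rule, since the actual prompt and model are not specified) screens out low-value feedback, and experts approve which of the remaining items enter the KG. All names and data are hypothetical.

```python
def llm_prefilter(feedback_items, is_valuable):
    """Stage 1: keep only feedback the (stubbed) LLM judges as valuable."""
    return [f for f in feedback_items if is_valuable(f)]

def expert_review(candidates, approved_ids):
    """Stage 2: experts decide which candidates enter the KG."""
    return [f for f in candidates if f["id"] in approved_ids]

feedback = [
    {"id": 1, "text": "Alarm 2001 cleared after reseating the encoder cable."},
    {"id": 2, "text": "ok"},
]
# Stand-in for the LLM judgment: treat very short feedback as non-valuable.
stage1 = llm_prefilter(feedback, lambda f: len(f["text"]) > 10)
to_kg = expert_review(stage1, approved_ids={1})
```

Because stage 1 removes the bulk of non-valuable items before any human looks at them, expert effort concentrates on the feedback most likely to improve the KG.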
5. Conclusions
In this study, an intelligent fault diagnosis system based on LLMs and KGs was successfully developed, validating its effectiveness in CNC systems and demonstrating the feasibility of this approach in practical industrial environments. First, the system integrated multi-source data through KGs, forming a robust data foundation that not only encompassed a wide range of fault cases and related information but also effectively organized multi-dimensional data from CNC systems, providing strong support for efficient fault diagnosis. Second, the system was designed with targeted prompt engineering and employed a knowledge graph-based RAG framework, enabling it to respond to user fault diagnosis requests with high efficiency and accuracy. Third, the introduction of multi-turn dialogue and interactive query capabilities further enhanced the system’s usability and user experience. Building on this foundation, the system incorporated a learning mechanism that continuously optimized its performance by analyzing user feedback, thus demonstrating outstanding learning capabilities and ensuring reliability and effectiveness over long-term use.
This research provides a template and standardized framework for the application of LLMs in the industrial domain, offering significant practical value and laying the groundwork for future similar applications. As discussed earlier, addressing the “hallucination” problem of large models, particularly through the use of the RAG framework, is crucial for the application of large models to specialized domains. Similar applications have been observed in the biomedical field. Building on this foundation, KGs offer distinct advantages. First, many industries already have well-established KGs, and the processed data from these graphs can provide high-density information to enhance the performance of large language models. Second, the linking capability of KGs allows for the retrieval of both broader and more detailed information. Finally, user feedback can be effectively recorded within a KG, leading to continuous improvements in system accuracy and performance throughout a user’s interaction with the system.
Although this study showcased the great potential of LLMs and KGs in industrial applications, there remains room for improvement in addressing more complex industrial scenarios and a broader range of fault types. Future research will focus on further exploring prompt design, fine-tuning strategies, and even the pre-training process of LLMs to achieve broader industrial applications and higher system performance.
CRediT authorship contribution statement
Yuhan Liu: Writing – original draft, Validation, Software, Methodology, Data curation, Conceptualization. Yuan Zhou: Supervision, Resources, Methodology, Conceptualization. Yufei Liu: Writing – review & editing, Supervision, Resources, Project administration, Methodology, Conceptualization. Zhen Xu: Writing – original draft, Validation, Software, Methodology, Data curation. Yixin He: Writing – original draft, Validation, Software, Methodology, Data curation.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This research was funded by the National Natural Science Foundation of China (72104224, L2424237, 71974107, L2224059, L2124002, and 91646102), the Beijing Natural Science Foundation (9232015), the Beijing Social Science Foundation (24GLC058), the Construction Project of China Knowledge Center for Engineering Sciences and Technology (CKCEST-2023-1-7), the MOE (Ministry of Education in China) Project of Humanities and Social Sciences (16JDGC011), the Tsinghua University Initiative Scientific Research Program (2019Z02CAU), and the Tsinghua University Project of Volvo-Supported Green Economy and Sustainable Development (20183910020). We thank LetPub for its linguistic assistance during the preparation of this manuscript.