1 Introduction

The 13th Five-Year Development Plan for National Strategic Emerging Industries states that China should place strategic emerging industries in a more prominent position in economic and social development, vigorously build a new system of modern industries, and promote the sustainable and healthy development of the economy and society. Understanding the laws of industrial development and assessing its direction are therefore essential for promoting the healthy development of strategic emerging industries [1]. Emerging technologies, which are in the early stages of technological development and exhibit outstanding potential economic benefits, are closely related to strategic emerging industries. Emerging technologies can be generated in a variety of ways; among these, technology convergence has gained widespread attention [2,3]. Some emerging technologies in strategic emerging industries are created through technology convergence. For example, the core technologies of the biomass energy industry are mainly derived from the convergence of technologies in two different fields: biology and energy.

One of the prerequisites for technology convergence is knowledge convergence [4,5], an evolutionary process in which two or more different subject areas communicate with each other to generate knowledge spillovers. As the knowledge flow intensifies, a knowledge spillover creates new research models and knowledge that differ from the existing knowledge in the original scientific field. The spillover finally leads to technology convergence [6–8]. Some scholars have studied technology convergence from the perspective of technology trajectories [9–11], which are used to explain the dynamic changes of technology and the process of innovation evolution [12]. The cross-infiltration of knowledge across different technology trajectories forms a new paradigm, which generates technological innovation. This is the origin of the emerging technology paradigm and gives rise to emerging industries. It is generally believed that analyzing the development of emerging technologies from the perspective of technology convergence trajectories is an effective way to capture the early trends in emerging industries and understand the process of industrial development. Such research can play a positive role in the development of strategic emerging industries.

Studies on technology convergence and knowledge convergence use both qualitative and quantitative methods, and the latter are currently the mainstream. Quantitative methods are based on data such as publications, patents, and news [13–15]. (1) Using citation information in the scientific and technical literature as the main analysis data, Miao et al. [16] constructed a knowledge flow network based on patent citation information and the International Patent Classification to analyze technology convergence trends; No et al. [17] proposed an index for the degree of technology convergence based on a patent citation network. (2) Using text data, Qiu et al. [18] took keywords in publications as the main data and applied word frequency analysis and co-occurrence analysis to examine the development trend of technology convergence. Kim et al. [19] used the co-occurrence information of enterprises in different fields to measure the degree of industrial convergence. (3) Combining the citation information and text information of scientific and technical literature, Kim et al. [20] used citation analysis and latent semantic analysis to analyze the citation networks, titles, abstracts, and claims of patents.

The process of knowledge convergence reflects an active interdisciplinary knowledge flow. As an important carrier of scientific knowledge, a publication contains rich text information and citation information and is therefore an important data source for researching knowledge convergence. Most of the existing studies on technology convergence and knowledge convergence use only text information or only citation networks. A small number of studies utilize both types of information. However, these studies mostly combine the two at the process level and do not integrate them at the algorithm level, which may lead to certain biases in the research results. Given this limitation, this study uses both citation network and text information to conduct a trajectory analysis of knowledge convergence and strives for more comprehensive and valid results. We use a graph neural network (GNN) [21–23] model to encode the citation network, title, and abstract information into vectors for clustering.

Analyzing the technology convergence process helps one understand how emerging industries among the strategic emerging industries are generated, the laws by which they develop, and the direction in which they are heading. We conduct a multi-case study on the following strategic emerging industries: high-end equipment manufacturing, new-generation information technology (IT), new medicine, and new energy. To provide theoretical support for industrial policy research in China, we assess the convergence trends of these four industries.

2 Methodology

Using citation network and text information, we establish a research framework for a trajectory analysis of knowledge convergence, including the three main steps of citation network construction, scientific publication clustering, and convergence trajectory identification. As shown in Fig. 1, this research framework involves methods such as citation network visualization, GNN models, K-means, similarity calculations, and topic models. In this process, we perform clustering and association operations on the publications accumulated over time and then identify trajectories of knowledge convergence.

《Fig. 1》

Fig. 1. Research framework for trajectory analysis of knowledge convergence based on citation network and text information. WOS: Web of Science.

2.1 Citation network construction and visualization

The data on scientific publications are collected from the Core Collection of the Web of Science database. The search query is created according to professional knowledge and the analysis requirements. Then, the title, abstract, publication date, references, and other information are extracted for subsequent analysis. Using the reference data, this study constructs and visualizes the citation network with the Citation-Network Data Analyzer (CDA) software [9].

2.2 Scientific publication clustering based on a GNN

2.2.1 Graph auto-encoder (GAE)

A graph is a data structure composed of nodes and edges, and nodes often carry features. In a citation network of publications, nodes represent publications and edges represent citation relationships. Node features include the text, publication date, and journal of the publication. To facilitate the analysis and calculations, the nodes of the graph are represented by low-dimensional dense vectors through graph embedding. Traditional graph-embedding algorithms tend to retain the structure information of the graph as much as possible but ignore the information of the nodes [24]. As a deep learning model, a GNN can use the structural and node information of the graph simultaneously in its calculations. Therefore, a GNN is advantageous for graph embedding [25].

In this study, a GAE [26] based on a GNN is used to obtain embedded node vectors. The data are sliced cumulatively by year. For example, the slice for 2015 contains all publications from 2015 and earlier as one group of data. For each group of data, the adjacency matrix of the citation network and the feature matrix of the text information are constructed and used as inputs to the GAE to obtain the node vectors (the publication vectors). Therefore, the publication vectors contain both the structural information and the node information of the citation network.
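The cumulative slicing and the adjacency matrix construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the record layout (`id`, `year`, `refs`), the function name `build_slice`, and the choice of a symmetric (undirected) adjacency matrix are hypothetical, and the feature matrix built from titles and abstracts is omitted.

```python
def build_slice(records, year):
    """Return (ids, adjacency) for all publications published in `year` or earlier."""
    # Cumulative slice: keep every publication up to and including `year`.
    kept = sorted(r["id"] for r in records if r["year"] <= year)
    index = {pid: i for i, pid in enumerate(kept)}
    n = len(kept)
    adjacency = [[0] * n for _ in range(n)]
    for r in records:
        if r["id"] not in index:
            continue
        for ref in r["refs"]:
            # Keep only citations whose target is also inside the slice.
            if ref in index:
                i, j = index[r["id"]], index[ref]
                adjacency[i][j] = 1
                adjacency[j][i] = 1  # assumption: citations treated as undirected
    return kept, adjacency


# Hypothetical toy records (not data from this study).
records = [
    {"id": "p1", "year": 2013, "refs": []},
    {"id": "p2", "year": 2015, "refs": ["p1"]},
    {"id": "p3", "year": 2016, "refs": ["p1", "p2"]},
]
ids_2015, adj_2015 = build_slice(records, 2015)
ids_2016, adj_2016 = build_slice(records, 2016)
```

In this toy example, the 2015 slice contains `p1` and `p2`, whereas the 2016 slice contains all three publications, so each later slice extends the earlier network rather than replacing it.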

2.2.2 K-means clustering

As a classic unsupervised clustering algorithm, K-means can divide given samples into K clusters, where K is a specifiable parameter [27]. We use K-means to cluster the publications, with the publication vectors as input. The obtained clusters represent the research fields on a convergence trajectory. After the proportion of publications of each scientific field in each cluster is calculated, the clusters with relatively uniform publication proportions in each field are identified as convergence clusters.
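To make the clustering step concrete, the following is a minimal pure-Python sketch of K-means (Lloyd's algorithm). It is illustrative only: the function names, the toy two-dimensional vectors, and the deterministic farthest-first initialization are assumptions introduced here, and in practice the inputs would be the publication vectors produced by the GAE.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(vectors, k):
    """Deterministic initialization (an assumption, used so the sketch is reproducible)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors, key=lambda v: min(squared_distance(v, c) for c in centroids)
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(vectors, k, max_iterations=100):
    """Lloyd's algorithm: returns (labels, centroids)."""
    centroids = farthest_first(vectors, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy vectors forming two well-separated groups.
vectors = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
           [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
labels, centroids = kmeans(vectors, k=2)
```

The result of K-means depends on the initialization; the deterministic scheme here is chosen only to keep the example reproducible.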

2.3 Convergence trajectory identification

2.3.1 Cluster association

Through data slicing and clustering, K clusters are obtained for each year and serve as the nodes of the convergence trajectory for that year. The mean of the publication vectors in a cluster is the cluster vector, which jointly represents the text and citation information of the cluster. The transformation of clusters between adjacent years creates the convergence trajectory, and the use of an accumulated citation network ensures that such transformations can be traced [9,28]. The similarity between cluster vectors is used to define the transformation relationships between clusters. The clusters with the highest similarity in adjacent years are connected to form the knowledge convergence trajectory. In this study, Euclidean distance is used to measure the similarity between clusters: the smaller the distance, the higher the similarity.
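The cluster association step can be sketched as follows. This is a minimal example under stated assumptions: the function names and toy vectors are hypothetical, and linking each later-year cluster to its nearest earlier-year cluster is one possible reading of "highest similarity in adjacent years."

```python
import math


def cluster_vector(vectors):
    """Mean of the publication vectors in one cluster."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def link_adjacent_years(prev_clusters, next_clusters):
    """Map each later-year cluster id to the closest earlier-year cluster id."""
    links = {}
    for next_id, next_vec in next_clusters.items():
        # Smaller Euclidean distance means higher similarity.
        links[next_id] = min(
            prev_clusters, key=lambda pid: math.dist(prev_clusters[pid], next_vec)
        )
    return links


# Hypothetical toy clusters for two adjacent years.
clusters_2018 = {
    0: cluster_vector([[0.0, 0.0], [0.2, 0.0]]),
    1: cluster_vector([[5.0, 5.0], [5.0, 5.4]]),
}
clusters_2019 = {0: [4.8, 5.0], 1: [0.0, 0.3]}
links = link_adjacent_years(clusters_2018, clusters_2019)
```

Chaining such links across all pairs of adjacent years yields one path per cluster, which corresponds to a trajectory in the sense used here.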

2.3.2 Keywords of clusters

To identify the research topic represented by each cluster, we adopt a latent Dirichlet allocation (LDA) topic model [29] to extract keywords. The publications included in a cluster are divided into four parts according to the field to which each publication belongs, and the keywords of these parts together constitute the keywords of the cluster.

3 Results

We focus on the four industries of high-end equipment manufacturing, new-generation IT, new medicine, and new energy, and specifically investigate numerical control equipment (NCE), IT, biomedicine (BM), and solar photovoltaics (SP), which are the core technology fields of these industries. We use a quantitative method to study the trajectory and degree of convergence of these four technologies.

3.1 Data collection

We selected search keywords and searched for publications in the Core Collection of the WOS database to cover the aforementioned technology fields and focus on developments in the strategic emerging industries. The search queries and the number of publications retrieved are shown in Table 1. The search period is from 1997 to 2019, and the search was conducted on November 5, 2019.

《Table 1》

Table 1. Search queries on the four technology fields.

From 1997 to 2019, the publication trends in the four fields of NCE, IT, BM, and SP are as follows (Fig. 2): In NCE, the number of publications has increased steadily. In IT, the number and growth rate of publications have been much higher than those in the other fields. In BM, the growth rate has increased significantly since 2009. Finally, in SP, the number of publications was relatively small before 2008 but increased rapidly after 2010.

《Fig. 2》

Fig. 2. Trends in the number of publications.

3.2 Trends in citations

The citation network constructed based on the reference information was visualized using CDA software (Fig. 3). All four fields generated sub-networks of a certain size with a certain number of interdisciplinary citations, which reflected the frequent cross-domain knowledge flow.

《Fig. 3》

Fig. 3. Citation network visualization.

Note: Yellow, blue, green, and red represent intra-disciplinary citation links in the fields of numerical control equipment, information technology, biomedicine, and solar photovoltaics, respectively. White is used for interdisciplinary citation links.

To further analyze the citation information, the interdisciplinary citations were counted (Fig. 4). The symbol “<” refers to the direction of knowledge flow in Fig. 4. NCE < IT and IT < NCE were the two cases with the largest numbers of interdisciplinary citations. These numbers were significantly larger than those in the other cases, which reflected the frequent flow of knowledge between these two fields in both directions. There was a relatively large number of citations in the four cases of SP < NCE, SP < BM, IT < SP, and SP < IT, all of which involve SP. This indicated that the cross-domain knowledge flows involving SP were complex and diverse. NCE < BM and BM < NCE exhibited the smallest numbers of citations, which showed that there was almost no knowledge flow between these two fields.
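The counting of interdisciplinary citations can be illustrated with a short sketch. The function name, the `(citing, cited)` pair layout, and the toy data are assumptions for illustration; the key format follows the “<” notation, in which the field of the cited publication is written first.

```python
from collections import Counter


def count_cross_field_citations(citations, field_of):
    """Count citations whose citing and cited publications belong to different fields.

    Keys use the 'cited field < citing field' notation.
    """
    counts = Counter()
    for citing, cited in citations:
        source, target = field_of[cited], field_of[citing]
        if source != target:  # skip intra-disciplinary citations
            counts[f"{source} < {target}"] += 1
    return counts


# Hypothetical toy data: publication -> field, and (citing, cited) pairs.
field_of = {"a": "NCE", "b": "IT", "c": "IT", "d": "SP"}
citations = [("b", "a"), ("c", "a"), ("a", "b"), ("c", "b"), ("d", "c")]
counts = count_cross_field_citations(citations, field_of)
```

Here two IT publications cite an NCE publication, so `counts["NCE < IT"]` is 2, while the citation between the two IT publications is not counted.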

《Fig. 4》

Fig. 4. Trends in the numbers of interdisciplinary citations.

Note: NCE: numerical control equipment; IT: information technology; BM: biomedicine; SP: solar photovoltaics. The symbol “<” refers to the direction of knowledge flow; for example, NCE < IT refers to a publication in the field of numerical control equipment that is cited in the field of information technology.

3.3 Clustering publications

The aforementioned GAE was used to aggregate the structure information and text information of the citation network, and each publication in the citation network was represented by low-dimensional dense vectors. These vectors were clustered using K-means.

3.3.1 GAE training

The adjacency matrix and feature matrix constructed from the citation network of each year were used as input to the GAE. We used the following training strategies and parameter settings [26]: 200 iterations, a learning rate of 0.01, a 32-dimensional hidden layer and 16-dimensional output vectors. The validation and test sets contained 5% and 10% of the citation relationships (edges), respectively. Each publication (node) was represented as a 16-dimensional vector, which includes both citation and text information. The trained GAE was used to embed the citation network of each year into a low-dimensional space, creating multiple groups of publication vectors.
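For reference, the dimensions above are consistent with a commonly used GAE formulation that combines a two-layer graph convolutional encoder with an inner-product decoder. The equations below are a sketch under that assumption and are not taken from the source; here A is the adjacency matrix, X is the feature matrix with F columns, Z holds the 16-dimensional publication vectors, and the unweighted reconstruction loss is a simplification.

```latex
\begin{aligned}
\tilde{A} &= D^{-1/2}(A + I)\,D^{-1/2}, \qquad D_{ii} = \sum_{j}(A + I)_{ij} \\
Z &= \tilde{A}\,\mathrm{ReLU}\!\left(\tilde{A} X W_0\right) W_1, \qquad
     W_0 \in \mathbb{R}^{F \times 32},\; W_1 \in \mathbb{R}^{32 \times 16} \\
\hat{A} &= \sigma\!\left(Z Z^{\top}\right) \\
\mathcal{L} &= -\sum_{i,j}\left[A_{ij}\log\hat{A}_{ij}
     + (1 - A_{ij})\log\left(1 - \hat{A}_{ij}\right)\right]
\end{aligned}
```

Under this formulation, each row of Z depends on both the node's own features and those of its citation neighbors, which is why the resulting vectors carry citation and text information together.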

3.3.2 Clustering

Each group of publication vectors was used as the input to K-means for clustering. Clustering experiments with different numbers of clusters showed that when the number of clusters exceeded 25, multiple clusters each containing less than 1% of the publications were generated. Therefore, the number of clusters used in the analysis was set to 25.
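The criterion used to choose the number of clusters can be checked for a given clustering result with a short sketch. The function name and the toy label lists are hypothetical; only the 1% threshold comes from the text.

```python
from collections import Counter


def small_cluster_count(labels, min_share=0.01):
    """Number of clusters that hold less than `min_share` of all publications."""
    sizes = Counter(labels)
    total = len(labels)
    return sum(1 for size in sizes.values() if size / total < min_share)


# Hypothetical cluster labels for 300 publications.
balanced = [0] * 150 + [1] * 150
with_tiny_cluster = [0] * 150 + [1] * 149 + [2] * 1

n_small_balanced = small_cluster_count(balanced)
n_small_tiny = small_cluster_count(with_tiny_cluster)
```

Running this check over a range of candidate cluster numbers and keeping the largest one that does not yield multiple tiny clusters is one way to apply the rule described above.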

3.4 Identification of knowledge convergence trajectories and analysis

In the clustering results, clusters in which the numbers of publications in each field were relatively uniform (i.e., at least two fields had more than 15% of the publications in the cluster) were defined as convergence clusters. Based on this definition, in 2019, clusters 4, 10, and 20 were identified as convergence clusters. The knowledge convergence trajectories corresponding to the convergence clusters and the change in the proportion of the publications in each cluster of the trajectories are shown in Figs. 5 and 6. Subsequently, to determine the research topic and its evolution, we used the LDA algorithm to extract the keywords of each cluster. These were subdivided into multiple sub-fields. The order of the keywords in each sub-field depended on the distribution probability of the keywords. In other words, the higher a keyword’s position is, the greater its probability and importance. The keywords for some nodes of the five convergence trajectories in 2016 and 2019 are shown in Table 2.
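The definition of a convergence cluster given above (at least two fields each accounting for more than 15% of the publications in the cluster) can be expressed as a small check. The function names and the toy clusters are assumptions for illustration.

```python
from collections import Counter

FIELDS = ("NCE", "IT", "BM", "SP")


def field_shares(cluster_fields):
    """Share of each field among the publications of one cluster."""
    counts = Counter(cluster_fields)
    total = len(cluster_fields)
    return {field: counts[field] / total for field in FIELDS}


def is_convergence_cluster(cluster_fields, threshold=0.15, min_fields=2):
    """True if at least `min_fields` fields each exceed `threshold` of the cluster."""
    shares = field_shares(cluster_fields)
    return sum(1 for share in shares.values() if share > threshold) >= min_fields


# Hypothetical clusters, each given as the list of its publications' fields.
mixed = ["NCE"] * 45 + ["IT"] * 50 + ["SP"] * 5
dominated = ["SP"] * 90 + ["IT"] * 10

mixed_is_convergent = is_convergence_cluster(mixed)
dominated_is_convergent = is_convergence_cluster(dominated)
```

In the toy data, the first cluster qualifies because NCE and IT both exceed 15%, whereas the second does not because only SP exceeds the threshold.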

《Fig. 5》

Fig. 5. Knowledge convergence trajectories.

Note: The nodes are the clusters of a given year, and the node number is based on the results in Table 2.

3.4.1 Evolution of trajectories 1, 2, and 3

Trajectories 1, 2, and 3 began to emerge in approximately 2007 and gradually evolved and merged to form cluster 10 in 2019 (Fig. 5). These three trajectories evolved as follows: (1) In trajectory 1, the topic of the IT sub-field was relatively independent, and no convergence had yet occurred from 2006 to 2008 (Fig. 6). Since 2009, publications in the NCE and IT sub-fields have co-occurred and formed the main part of trajectory 1. (2) Before merging with trajectory 3, trajectory 2 was dominated by publications of the NCE sub-field. After the merging, convergence clusters with uniform proportions of publications in the two sub-fields of NCE and IT were formed. (3) Since the emergence of trajectory 3, the proportions of publications in these two sub-fields have been relatively uniform.

《Fig. 6》

Fig. 6. Changes in the proportions of publications from each field in the convergence trajectories.

As can be seen from Table 2, the main keywords of the IT sub-field in these three trajectories were “intelligence,” “data,” “network,” and “algorithm.” Further, the main keywords of the NCE sub-field were “cutting,” “control,” and “error.” However, the keywords of the three trajectories still had different emphases: the keyword “spindle” was relatively important in trajectory 1, “error” in trajectory 2, and “surface” and “milling” in trajectory 3. Based on the meaning of these keywords and an analysis of the text information of the publications included in the clusters, the main research topics on which the two sub-fields of NCE and IT converged were applications of intelligent algorithms in motion control, error compensation, and process planning.

《Table 2》

Table 2. Cluster keywords.

3.4.2 Evolution of trajectory 4

Trajectory 4 began to emerge in 2008, and the proportions of publications in the four sub-fields in the early stage were relatively even. However, the impact of data noise was clear. Since 2013, publications in BM and SP have gradually become the main component. As shown in Table 2, the keywords appearing in BM were “synthesis” and “materials,” and those in SP were “energy,” “power,” “cells,” and “storage.” This suggests that the research topics on which the two sub-fields of BM and SP converged were synthetic materials for solar energy conversion and energy storage.

3.4.3 Evolution of trajectory 5

Trajectory 5 had already appeared in 1998, but the number of publications was small and grew slowly. Since 2015, the number of publications has increased rapidly, and the SP sub-field has come to account for the largest proportion. Further, the proportions of publications in the two sub-fields of NCE and IT have increased in the past two years. As seen from Table 2, the keywords appearing in the NCE sub-field were “energy” and “consumption”; in the IT sub-field, they were “data,” “algorithm,” “neural,” and “network”; and in the SP sub-field, they were “photovoltaic,” “energy,” and “power.”

Based on the analysis of the publications containing the keywords in trajectory 5, the research topic on which the two sub-fields of NCE and IT converged was the application of intelligent algorithms in optimizing the energy efficiency of machine tools. The research topic on which the two sub-fields of SP and IT converged was the application of intelligent algorithms in the prediction of photovoltaic power generation. Finally, the research on which the two sub-fields of SP and NCE converged focused on the conversion and control methods of current and voltage, and the related component design.

4 Discussion

4.1 Convergence of the fields of NCE and IT

The convergence of the fields of NCE and IT was relatively strong. There were four related convergence trajectories and many interdisciplinary citations. The scope of convergence of the two fields was also wide, involving multiple sub-fields of NCE, such as motion control, error compensation, process planning, and energy efficiency optimization. This result is consistent with the current overall development trend in the high-end equipment manufacturing industry and the new-generation IT industry. The fourth industrial revolution, with intelligent technology at its core, has coincided with the urgent need for the transformation and upgrading of China’s manufacturing industry [30]. In the past decade, China has issued many technological innovation policies aimed at promoting the transformation and upgrading of the manufacturing industry, thereby promoting the rapid development of the intelligent manufacturing industry. Therefore, as a typical case of the strong convergence of NCE and IT, the rapid development of intelligent manufacturing is consistent with our results and indirectly suggests that the proposed method is reasonable.

4.2 Convergence of the fields of IT and SP

The convergence of the fields of IT and SP was relatively weak in strength and breadth. Trajectory 5 was dominated by the field of SP: convergence clusters with IT publications accounting for more than 15% of the total have only emerged recently. The convergence of the two sub-fields began to emerge in 2014, and the growth rate of interdisciplinary citations was accelerating yearly. Benefiting from the new-generation IT’s demand for a distributed power supply for communication equipment in the era of global industrial transition to green technologies, this type of technology convergence has much room for development and application potential.

4.3 Convergence of the fields of NCE and SP

The convergence of the fields of NCE and SP was similar to that of the fields of IT and SP. In trajectory 5, which was dominated by the field of SP, convergence clusters with NCE publications accounting for more than 15% of the total have only emerged recently. Driven by global industrial greening, these two fields have great potential for a strong convergence.

4.4 Convergence of the fields of BM and SP

The convergence of the fields of BM and SP has been developing steadily. In the convergence trajectory of the two fields (trajectory 4), the increase in the number of publications and the fluctuation in their proportions were relatively small. In 2019, the publication proportions for BM and SP were 30% and 52%, respectively, and the convergence was quite strong. Therefore, the convergence trend is expected to continue along its current course, with the degree of convergence increasing steadily.

4.5 Convergence of the fields of NCE and BM

The degree of convergence of the fields of NCE and BM was relatively low. The number of interdisciplinary citations between NCE and BM did not exceed 30 in 2019, and the year-on-year growth rate was also low. In the experimental results, no stable convergence trajectories were identified for these two fields. This indicated that the two fields have not yet shown signs of convergence at the level of scientific knowledge. In practice, however, the two fields have extensive industrial cross-applications, such as the large demand for NCE in the pharmaceutical industry, involving automated pharmacies, disinfection, storage, and so on. Therefore, when the knowledge bases of two technical fields differ significantly, cross-field market applications may not necessarily lead to the convergence of the technical fields.

5 Conclusion

This study used a trajectory analysis method for knowledge convergence, based on citation networks and text information, to conduct a multi-case analysis of the four strategic emerging industries of high-end equipment manufacturing, new-generation IT, new medicine, and new energy. Utilizing a GNN model, we clustered publications in various fields and identified five technology convergence trajectories. We showed that the convergence of the NCE and IT pair and that of the BM and SP pair were relatively strong. Further, the NCE and SP pair and the IT and SP pair showed a certain degree of convergence, but the NCE and BM pair has not yet converged.

The development of China’s strategic emerging industries is inseparable from the further deepening of industrial policy research, effective policy formulation, and timely optimization. Therefore, based on the theoretical research results of this study, the following suggestions are proposed for the relevant industries:

5.1 Encourage the widespread use and networked transformation of NCE

Strongly promoting the intelligent transformation and upgrading, and the long-term high-quality development of the manufacturing industry has become the foundation of the current era. This study confirmed the rationality and necessity of intelligent manufacturing as the main direction to follow for building a powerful China with strong manufacturing. The competent industry authorities should continue to encourage and support the manufacturing enterprises in the wide use of NCE according to their industrial characteristics and complete the transformation to the digital manufacturing paradigm as soon as possible. Enterprises should be encouraged to combine NCE with new-generation IT and accelerate the transformation from the digital to the networked paradigm.

5.2 Strengthen the application of new energy technologies

The SP and IT pair and the SP and NCE pair showed a certain degree of convergence, but the convergence has developed slowly. Against the background of strongly promoting industrial energy saving and emission reduction and strengthening environmental protection, it is of great significance to accelerate the convergence and development of new energy technologies such as SP and related application technologies in China. We recommend introducing corresponding preferential policies to encourage the application of SP and other new energy technologies in the manufacturing and communications industries.

5.3 Enhance the development of digital equipment used in BM

The degree of convergence of the fields of NCE and BM was relatively low. In practice, however, China’s biopharmaceutical industry is growing steadily, and NCE is used in the R&D and production of pharmaceuticals to improve production efficiency and create a sterile production environment. Therefore, it is necessary to promote the technology convergence of the fields of BM and NCE. We recommend that the competent industry authorities focus on practical targets, start from both the supply and demand sides, and encourage enterprises in the fields of BM and NCE to cooperate closely to promote technology convergence. We also recommend introducing corresponding incentive policies to accelerate the development of China’s BM industry through innovation and progress in special digital equipment for this industry.