Intelligent analysis for software data: research and applications
Over the last few decades, software has been one of the primary drivers of economic growth in the world. Human life depends on reliable software; therefore, the software production process (i.e., software design, development, testing, and maintenance) is one of the most important factors in ensuring software quality. During the production process, large amounts of software data (e.g., source code, bug reports, logs, and user reviews) are generated.

As software grows more complex, how to use software data to improve the performance and efficiency of software production has become a challenge for software developers and researchers. To address this challenge, researchers have used information retrieval, data mining, and machine learning technologies to implement a series of automated tools that improve the efficiency of important software engineering tasks, such as code search, code summarization, severity/priority prediction, bug localization, and program repair. However, these traditional approaches cannot deeply capture the semantic relations of contextual information and usually ignore the structural information of source code. Therefore, there is still room to improve the performance of these automated software engineering tasks.

The word “intelligent” means that we can use a new generation of artificial intelligence (AI) technologies (e.g., deep learning) to design a series of “smart” automated tools that improve the effectiveness and efficiency of software engineering tasks, so that developers’ workloads are dramatically reduced.

A new generation of AI approaches, which are well suited to software engineering problems, has already brought progress. We highlight two classic and popular automated software engineering tasks that apply “intelligent” analysis technology to software data:

1. Intelligent software development

Code search and summarization can help developers build quality software and work more efficiently. Code search is a frequent activity in software development that helps developers find suitable code snippets to complete software projects. Developers usually enter descriptions of these snippets as queries. However, it is extremely challenging to design a practically useful code search tool. Earlier approaches based on information retrieval ignored the semantic relationship between high-level descriptions expressed in natural language and low-level source code, which hurt the performance of code search. Unlike information retrieval based methods, deep learning technologies can automatically learn feature representations and build mapping relationships between inputs and outputs, which improves the performance of code search. Code summarization is the task of automatically generating natural language descriptions of source code, which can help developers understand and maintain software. Traditional automated code summarization work tends to use summary templates to extract keywords from source code, which ignores the grammatical information of the code. With the rapid development of neural network technology, convolutional neural networks (CNNs), recurrent neural networks (RNNs), transformers, and other deep learning networks have been applied to automated code summarization.

2. Intelligent software maintenance

Severity/priority prediction can automatically recommend appropriate labels, reducing developers’ workload in labeling severity and priority levels, which are important features of bug reports. Severity indicates how serious a reported bug is, while priority indicates which bugs should be fixed first.
The prediction task can help developers quickly assign important bugs to appropriate developers for fixing, so that the efficiency of software maintenance is improved. Traditional approaches usually adopt machine learning technologies such as support vector machine (SVM) and naive Bayes (NB) to predict the severity/priority level. However, these approaches cannot overcome the problem of data imbalance, which limits their prediction accuracy. Some deep learning technologies, such as CNNs and graph convolutional networks (GCNs), can effectively address this problem and capture the contextual semantic information of bug reports, so that the prediction performance is improved.

In this context, we organize a special feature in the journal on intelligent analysis for software data. This special feature covers software architecture recovery, app review analysis, integration testing, software project management, defect prediction, and method renaming, as well as related applications. After a rigorous review process, six papers were selected.