Over the last few decades, software has been one of the primary drivers of economic growth in the world. Human life depends on reliable software; therefore, the software production process (i.e., software design, development, testing, and maintenance) has become one of the most important factors in ensuring software quality. During the production process, large amounts of software data (e.g., source code, bug reports, logs, and user reviews) are generated.

As software grows more complex, how to use software data to improve the performance and efficiency of software production has become a challenge for software developers and researchers. To address this challenge, researchers have used information retrieval, data mining, and machine learning technologies to implement a series of automated tools that improve the efficiency of important software engineering tasks, such as code search, code summarization, severity/priority prediction, bug localization, and program repair. However, these traditional approaches cannot deeply capture the semantic relations of contextual information and usually ignore the structural information of source code. Therefore, there is still room to improve the performance of these automated software engineering tasks.

The word “intelligent” means that we can use a new generation of artificial intelligence (AI) technologies (e.g., deep learning) to design a series of “smart” automated tools that improve the effectiveness and efficiency of software engineering tasks, so that developers’ workloads are dramatically reduced.

Recent advances have come from a new generation of AI approaches, which are well suited to software engineering problems. We describe two classical and popular automated software engineering tasks that use “intelligent” analysis of software data:

1. Intelligent software development

Code search and summarization can help developers build quality software more efficiently. Code search is a frequent activity in software development that helps developers find suitable code snippets to complete software projects. Developers usually input descriptions of these snippets as queries. However, it is extremely challenging to design a practically useful code search tool. Previous information retrieval based approaches ignored the semantic relationship between high-level descriptions expressed in natural language and low-level source code, which limits the performance of code search. Unlike information retrieval based methods, deep learning technologies can automatically learn feature representations and build mapping relationships between inputs and outputs, which improves the performance of code search.

Code summarization is the task of automatically generating natural language descriptions of source code, which helps developers understand and maintain software. In traditional automated code summarization work, researchers tend to use summary templates to extract keywords from source code, which ignores the grammatical information of the code. With the rapid development of neural network technology, convolutional neural networks (CNNs), recurrent neural networks (RNNs), transformers, and other deep learning networks have been applied to automated code summarization.

2. Intelligent software maintenance

Severity/priority prediction automatically recommends appropriate labels, reducing developers' workload in labeling severity and priority levels, which are important features of bug reports. Severity indicates how serious a reported bug is, while priority indicates which bugs should be fixed first. The prediction task helps quickly assign important bugs to appropriate developers for fixing, so that the efficiency of software maintenance is improved. Traditional approaches usually adopt machine learning technologies such as support vector machine (SVM) and naive Bayes (NB) to predict the severity/priority level. However, these approaches cannot overcome the problem of data imbalance, so their prediction accuracy is limited. Some deep learning technologies, such as CNNs and graph convolutional networks (GCNs), can effectively address this problem and capture the contextual semantic information of bug reports, so that prediction performance is improved.

In this context, we organize a special feature in the journal on intelligent analysis for software data. This special feature covers software architecture recovery, app review analysis, integration testing, software project management, defect prediction, and method rename, as well as related applications. After a rigorous review process, six papers were selected.

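During software evolution, limits on development capability and invested resources often make it hard to keep the software architecture updated in step with the code. The architecture design and the code then become inconsistent, which can affect software maintenance and related work. To address this problem, this paper proposes an incremental software architecture recovery technique, ISAR. The technique first extracts dependency information from changed code snippets, then analyzes the relationships between modules according to dependency strength, and finally designs a two-layer classifier, based on the association between code changes and architecture updates, to recover the architecture. Validation experiments on 10 open-source projects show that ISAR outperforms traditional techniques in both the accuracy and the efficiency of architecture recovery. The paper also finds that the quality of architecture design documentation has some influence on ISAR's recovery accuracy, but that this influence gradually stabilizes as versions iterate.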
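
To make the idea of module-level dependency strength in the abstract above concrete, here is a minimal sketch in Python. It is not ISAR's implementation: the abstract does not define how strength is measured, so this sketch assumes it is simply the number of file-level dependencies that cross a module boundary. The function name `module_dependency_strength`, the input format, and the sample data are all hypothetical.

```python
from collections import Counter

def module_dependency_strength(file_deps, module_of):
    """Count dependencies that cross module boundaries.

    file_deps: iterable of (source_file, target_file) pairs, e.g. imports or
               calls found in changed code (hypothetical input format).
    module_of: dict mapping each file to the module it belongs to.
    Assumption: "strength" is a plain count; the paper may weight it differently.
    """
    strength = Counter()
    for src, dst in file_deps:
        src_mod, dst_mod = module_of[src], module_of[dst]
        if src_mod != dst_mod:  # dependencies inside one module are ignored
            strength[(src_mod, dst_mod)] += 1
    return strength

# Hypothetical dependencies extracted from one code change.
deps = [
    ("ui/a.py", "core/x.py"),
    ("ui/b.py", "core/x.py"),
    ("ui/a.py", "ui/b.py"),
    ("core/x.py", "db/s.py"),
]
modules = {"ui/a.py": "ui", "ui/b.py": "ui", "core/x.py": "core", "db/s.py": "db"}
strength = module_dependency_strength(deps, modules)
```

Counts like these could serve as input features for a classifier that decides whether a code change implies an architecture update, but how ISAR builds its two-layer classifier is not described in the abstract.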

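Emerging topics in app reviews highlight what users are concerned about during a given period (e.g., software bugs). Identifying emerging topics accurately and promptly helps developers update their apps more effectively. Existing work identifies emerging topics in app reviews with topic models or clustering methods. However, because review texts are short and carry limited information, the accuracy of emerging topic identification is low. To address this problem, an improved emerging topic identification method (IETI) is proposed. It first uses natural language processing techniques to reduce noisy data in review texts, then uses an adaptive online biterm topic model to identify emerging topics in the reviews. Finally, it uses relevant phrases and sentences from the emerging topics to interpret their meaning. IETI is evaluated on six common apps, with official changelogs as the evaluation standard for emerging topics. Experimental results show that IETI outperforms traditional methods in identifying emerging topics, with an F1 improvement of 0.126 for phrase labels and 0.061 for sentence labels. The code of IETI is released on GitHub (https://github.com/wanizhou/IETI).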
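
Biterm topic models cope with short texts by modeling unordered word pairs (biterms) that co-occur in the same text, rather than per-document word counts. The sketch below shows only that biterm extraction step on tokenized reviews. It is an illustration, not the IETI code: the function name `extract_biterms` and the sample reviews are hypothetical, and the adaptive online inference described in the abstract is not modeled.

```python
from collections import Counter
from itertools import combinations

def extract_biterms(tokens):
    """Return every unordered word pair (biterm) from one short text.

    Each pair is sorted so that ("login", "crashes") and ("crashes", "login")
    count as the same biterm.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]

# Hypothetical reviews, already tokenized and cleaned of noise words.
reviews = [
    ["app", "crashes", "login"],
    ["login", "crashes"],
]

# Corpus-level biterm counts, the raw statistics a biterm topic model learns from.
biterm_counts = Counter(b for review in reviews for b in extract_biterms(review))
```

Because every review contributes all of its word pairs, even two-word reviews add co-occurrence evidence, which is why this representation suits short review texts.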

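Integration testing is an important part of object-oriented software testing. Most traditional strategies for class-level integration test ordering focus on reducing test cost and do not give higher test priority to nodes with greater reliability risk, which hurts testing efficiency. This paper proposes a method for generating integration test sequences that balances test cost and test efficiency. According to the software's running states in different scenarios, the software is mapped to a multi-layer dynamic execution network (MDEN). With this network model and a probabilistic risk assessment method, every class in the software is assigned a risk weight. Using cost-benefit analysis, two principles are maintained while generating test cases: assign higher weights to high-risk classes, and minimize test stub complexity. On this basis, the impact of a test sequence on the overall operational risk of the software system is analyzed, and a metric scheme for evaluating test sequences is proposed. Experimental comparison with existing algorithms shows that the class-level integration test sequences generated by the proposed algorithm effectively reduce test cost. Finally, the proposed algorithm is implemented as ITOsolution, an open-source tool that automatically generates integration test sequences.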
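
The trade-off between testing risky classes early and keeping stub cost low can be illustrated with a small sketch. This is not the paper's algorithm or the ITOsolution tool: it assumes risk weights are already given (the MDEN and probabilistic risk assessment steps are not modeled), it simplifies stub complexity to a plain stub count, and the greedy rule, the `stub_penalty` parameter, and the sample classes are all hypothetical.

```python
def count_stubs(order, deps):
    """Stubs needed by a test order: one per dependency on a class not yet tested."""
    tested, stubs = set(), 0
    for cls in order:
        stubs += sum(1 for d in deps.get(cls, ()) if d not in tested)
        tested.add(cls)
    return stubs

def greedy_order(deps, risk, stub_penalty=1.0):
    """Greedy ordering: repeatedly pick the class with the best
    (risk weight - stub_penalty * stubs it would need now) score.

    deps must have one key per class; risk maps each class to its weight.
    """
    remaining, tested, order = set(deps), set(), []
    while remaining:
        def score(cls):
            missing = sum(1 for d in deps[cls] if d not in tested)
            return risk[cls] - stub_penalty * missing
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        order.append(best)
        tested.add(best)
        remaining.remove(best)
    return order

# Hypothetical system: A depends on B and C, B depends on C.
deps = {"A": ["B", "C"], "B": ["C"], "C": []}
risk = {"A": 0.9, "B": 0.5, "C": 0.2}

cost_first = greedy_order(deps, risk, stub_penalty=1.0)  # stubs dominate the score
risk_first = greedy_order(deps, risk, stub_penalty=0.1)  # risk dominates the score
```

With a high penalty the order follows the dependencies bottom-up and needs no stubs; with a low penalty the riskiest class is tested first at the price of more stubs. A metric for comparing test sequences, as the abstract describes, would have to weigh both effects.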