
Frontiers of Information Technology & Electronic Engineering >> 2022, Volume 23, Issue 8 doi: 10.1631/FITEE.2100360

Focused crawling strategies based on ontologies and simulated annealing methods for rainstorm disaster domain knowledge

Affiliation(s): Guangzhou Key Laboratory of Multilingual Intelligent Processing, Guangdong University of Foreign Studies, Guangzhou 510006, China; School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou 510006, China; School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing 210044, China; Faculty of Science, University of Alberta, Edmonton T6G2H6, Canada

Received: 2021-07-23 Accepted: 2022-08-22 Available online: 2022-08-22


Abstract

At present, focused crawling is a crucial method for obtaining effective domain knowledge from massive heterogeneous networks. Most current focused crawling technologies have difficulty obtaining high-quality crawling results. The main difficulties are establishing topic benchmark models, assessing the topic relevance of hyperlinks, and designing crawling strategies. In this paper, we use a domain ontology to build a topic benchmark model for a specific topic, and propose a novel multiple-filtering strategy based on local and global ontologies (MFSLG). A comprehensive priority evaluation method (CPEM) based on the web text and link structure is introduced to improve the computation precision of topic relevance for unvisited hyperlinks, and a simulated annealing (SA) method is used to keep the search from falling into local optima. By incorporating SA into the focused crawler with MFSLG and CPEM for the first time, two novel focused crawling strategies based on ontology and SA (FCOSA), including FCOSA with only the global ontology (FCOSA_G) and FCOSA with both local and global ontologies (FCOSA_LG), are proposed to obtain topic-relevant webpages about rainstorm disasters from the network. Experimental results show that the proposed crawlers outperform the other focused crawling strategies on different performance metric indices.
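To illustrate how simulated annealing can keep a crawler from always following the greedily best link, the following is a minimal Python sketch of a Metropolis-style acceptance step over a crawl frontier. It is not the FCOSA algorithm from the paper: the function names `sa_select` and `cool`, the `(url, priority)` frontier layout, and the geometric cooling rate are all assumptions made for illustration, and the priority scores stand in for whatever relevance measure a real crawler would compute.

```python
import math
import random


def sa_select(frontier, temperature, rng=random):
    """Remove and return the next (url, priority) pair from `frontier`.

    Hypothetical sketch: the highest-priority link is the greedy choice.
    A randomly drawn candidate replaces it with probability
    exp(-(best - candidate) / temperature), so lower-priority links are
    explored often while the temperature is high and rarely once it drops.
    """
    if not frontier:
        raise ValueError("frontier is empty")
    best = max(frontier, key=lambda item: item[1])
    candidate = rng.choice(frontier)
    delta = best[1] - candidate[1]  # never negative, since best is the maximum
    accept = delta == 0 or (
        temperature > 0 and rng.random() < math.exp(-delta / temperature)
    )
    chosen = candidate if accept else best
    frontier.remove(chosen)
    return chosen


def cool(temperature, rate=0.95):
    """Geometric cooling schedule (the rate is an assumed value)."""
    return temperature * rate
```

A crawl loop under these assumptions would call `sa_select` once per step and then `cool` the temperature, so the selection moves gradually from exploratory to greedy as the crawl proceeds.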

Related Research