
Frontiers of Information Technology & Electronic Engineering >> 2021, Volume 22, Issue 6 doi: 10.1631/FITEE.2000429

Video summarization with a graph convolutional attention network

Affiliation(s): School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China; The State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China

Received: 2020-08-25 Accepted: 2021-07-12 Available online: 2021-07-12


Abstract

Video summarization has established itself as a fundamental technique for generating compact and concise videos, easing the management and browsing of large-scale video data. Existing methods fail to fully consider the local and global relations among the frames of a video, which degrades summarization performance. To address this problem, we propose a graph convolutional attention network (GCAN) for video summarization. GCAN consists of two parts, embedding learning and context fusion, where embedding learning comprises a temporal branch and a graph branch. Specifically, GCAN uses dilated temporal convolution to model local cues and temporal self-attention to exploit global cues among video frames. It learns graph embedding via a multi-layer graph convolutional network to reveal the intrinsic structure of the frame samples. The context fusion part combines the output streams of the temporal branch and the graph branch into a context-aware representation of the frames, on which importance scores are evaluated to select representative frames for the video summary. Experiments on two benchmark datasets, SumMe and TVSum, show that the proposed GCAN approach outperforms several state-of-the-art alternatives in three evaluation settings.
