BIROn - Birkbeck Institutional Research Online

    STA-CNN: convolutional spatial-temporal attention learning for action recognition

    Yang, H. and Yuan, C. and Zhang, L. and Sun, Y. and Hu, W. and Maybank, Stephen J. (2020) STA-CNN: convolutional spatial-temporal attention learning for action recognition. IEEE Transactions on Image Processing 29, pp. 5783-5793. ISSN 1057-7149.

    STA-CNN-TIP-21229-2019.pdf - Author's Accepted Manuscript

    Abstract

    Convolutional neural networks have been highly successful at object recognition in still images. However, their improvement over traditional methods for recognizing actions in videos is less pronounced, because raw videos usually contain far more redundant or irrelevant information than still images. In this paper, we propose a Spatial-Temporal Attentive Convolutional Neural Network (STA-CNN) that automatically selects the discriminative temporal segments and focuses on the informative spatial regions. The STA-CNN model incorporates a Temporal Attention Mechanism and a Spatial Attention Mechanism into a unified convolutional network to recognize actions in videos. The novel Temporal Attention Mechanism automatically mines the discriminative temporal segments from long and noisy videos. The Spatial Attention Mechanism first exploits the instantaneous motion information in optical flow features to locate the motion-salient regions; it is then trained with an auxiliary classification loss and a Global Average Pooling layer to focus on the discriminative non-motion regions in the video frame. The STA-CNN model achieves state-of-the-art performance on two of the most challenging datasets, UCF-101 (95.8%) and HMDB-51 (71.5%).
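
The abstract describes two attention steps: weighting temporal segments and weighting spatial regions before pooling. The sketch below illustrates the general idea with generic soft attention in NumPy. It is not the paper's formulation: the scoring vector `w`, the `mask_logits` input, and both function names are hypothetical, and the optical-flow guidance and auxiliary classification loss mentioned in the abstract are not modelled here.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def temporal_attention_pool(segment_features, w):
    """Weight T temporal segments and pool them into one video feature.

    segment_features: array of shape (T, D), one feature per segment.
    w: array of shape (D,), a hypothetical learned scoring vector.
    Returns the pooled feature (D,) and the attention weights (T,).
    """
    scores = segment_features @ w               # one scalar score per segment
    weights = softmax(scores)                   # non-negative, sums to 1
    video_feature = weights @ segment_features  # attention-weighted average
    return video_feature, weights


def spatial_attention_pool(feature_map, mask_logits):
    """Weight spatial positions of a feature map and pool over them.

    feature_map: array of shape (C, H, W).
    mask_logits: array of shape (H, W), hypothetical saliency scores
                 (assumed here to be given, e.g. derived from motion cues).
    Returns the pooled feature (C,) and the normalized mask (H, W).
    """
    _, height, width = feature_map.shape
    mask = softmax(mask_logits.reshape(-1)).reshape(height, width)
    pooled = (feature_map * mask).sum(axis=(1, 2))  # mask broadcasts over C
    return pooled, mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    segments = rng.normal(size=(8, 16))   # 8 segments, 16-dim features
    video_feature, seg_weights = temporal_attention_pool(
        segments, rng.normal(size=16)
    )
    frame_map = rng.normal(size=(16, 7, 7))
    frame_feature, mask = spatial_attention_pool(
        frame_map, rng.normal(size=(7, 7))
    )
    print(video_feature.shape, seg_weights.shape, frame_feature.shape, mask.shape)
```

With all-zero scores both functions reduce to plain averaging, which corresponds to pooling without attention; in this simplified picture, non-uniform weights are what allow a model to emphasize some segments or regions over others.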

    Metadata

    Item Type: Article
    Additional Information: (c) 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
    Keyword(s) / Subject(s): Temporal Attention, Spatial Attention, Convolutional Neural Network, Action Recognition
    School: Birkbeck Faculties and Schools > Faculty of Science > School of Computing and Mathematical Sciences
    Depositing User: Stephen Maybank
    Date Deposited: 23 Mar 2020 16:17
    Last Modified: 09 Aug 2023 12:47
    URI: https://eprints.bbk.ac.uk/id/eprint/31357
