BIROn - Birkbeck Institutional Research Online

    Interaction-aware spatio-temporal pyramid attention networks for action classification

    Hu, W. and Liu, H. and Du, Y. and Yuan, C. and Li, B. and Maybank, Stephen (2022) Interaction-aware spatio-temporal pyramid attention networks for action classification. IEEE Transactions on Pattern Analysis and Machine Intelligence 44 (10), pp. 7010-7028. ISSN 0162-8828.

    InteractionAwareSpatioTemporal.pdf - Author's Accepted Manuscript

    Abstract

    In CNN-based visual action recognition, accuracy can improve when the network focuses on the local regions that are key to the action. Self-attention focuses on key features and ignores irrelevant information, which makes it useful for action recognition. However, current self-attention methods usually ignore the correlations among local feature vectors at spatial positions in CNN feature maps. In this paper, we propose an effective interaction-aware self-attention model that extracts information about the interactions between feature vectors to learn attention maps. Since different layers in a network capture feature maps at different scales, we build a spatial pyramid from the feature maps at different layers and use it for attention modeling. The multi-scale information is used to obtain more accurate attention scores. These attention scores are used to weight the local feature vectors and the feature maps and then calculate the attention feature maps. Since the number of feature maps input to the spatial pyramid attention layer is unrestricted, the attention layer extends easily to a spatio-temporal version. Our model can be embedded into any general CNN to form a video-level end-to-end attention network for action recognition. Besides using the RGB stream alone, we investigate several methods for combining the RGB and flow streams for the final prediction of human action classes. Experimental results show that our method achieves state-of-the-art results on the UCF101, HMDB51, Kinetics-400 and untrimmed Charades datasets.
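
The weighting idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' model, which this record does not specify. It assumes that interactions are scaled dot products between local feature vectors, that each position's score is a softmax over its mean interaction, and that all feature maps share the same channel count. The function name `pyramid_attention_pool` and all shapes are hypothetical.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()


def pyramid_attention_pool(feature_maps):
    """Attention-pool a list of feature maps of shape (H_i, W_i, C).

    Assumption: every map has the same channel count C. The maps may come
    from different layers (scales) or different frames (time).
    """
    # Flatten each map into its local feature vectors and stack them: (N, C).
    vectors = np.concatenate(
        [f.reshape(-1, f.shape[-1]) for f in feature_maps], axis=0
    )
    channels = vectors.shape[1]

    # Pairwise interactions between all local feature vectors: (N, N).
    # Scaled dot products are an assumed stand-in for the paper's interaction term.
    interactions = vectors @ vectors.T / np.sqrt(channels)

    # One attention score per position, normalized over all positions: (N,).
    scores = softmax(interactions.mean(axis=1))

    # Attention-weighted sum of the local feature vectors: (C,).
    attended = scores @ vectors
    return scores, attended


# Two scales of one frame: a 4x4 map and a 2x2 map, both with 8 channels.
rng = np.random.default_rng(0)
maps = [rng.standard_normal((4, 4, 8)), rng.standard_normal((2, 2, 8))]
scores, attended = pyramid_attention_pool(maps)
print(scores.shape, attended.shape)
```

Because the function accepts a list of any length, maps from further frames can be appended to the same list. This mirrors the abstract's remark that an unrestricted number of input feature maps allows a spatio-temporal extension.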

    Metadata

    Item Type: Article
    Additional Information: (c) 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
    Keyword(s) / Subject(s): Action recognition, Attention networks, Interaction-aware, Spatio-temporal pyramid
    School: Birkbeck Faculties and Schools > Faculty of Science > School of Computing and Mathematical Sciences
    Depositing User: Steve Maybank
    Date Deposited: 13 Oct 2021 18:26
    Last Modified: 09 Aug 2023 12:51
    URI: https://eprints.bbk.ac.uk/id/eprint/45279
