BIROn - Birkbeck Institutional Research Online

    Asymmetric 3D Convolutional Neural Networks for Action Recognition

    Yang, H. and Yuan, C. and Li, B. and Du, Y. and Xing, J. and Hu, W. and Maybank, Stephen (2018) Asymmetric 3D Convolutional Neural Networks for Action Recognition. Pattern Recognition 85, pp. 1-12. ISSN 0031-3203.

    Asymmetric_3DCNN_PatternRecognition.pdf - Author's Accepted Manuscript
    Available under License Creative Commons Attribution Non-commercial No Derivatives.

    Abstract

    Action recognition methods based on Convolutional Neural Networks have achieved significant improvements in recent years. The 3D convolution extends the 2D convolution from operating on a single frame to operating on a video clip, so it can extract effective spatial-temporal features for better analysis of human activities in videos. The 3D convolution, however, involves many more parameters than the 2D convolution. It is therefore computationally expensive, costly to store, and difficult to train. In this work, we propose efficient asymmetric one-directional 3D convolutions to approximate the traditional 3D convolution. To improve the feature learning capacity of asymmetric 3D convolutions, we design a set of local 3D convolutional networks, i.e. MicroNets, which incorporate multi-scale 3D convolution branches. We then design an asymmetric 3D-CNN deep model, constructed from MicroNets, for the action recognition task. Moreover, to avoid training two networks separately on RGB frames and optical flow fields, as most works do, we propose a simple but effective multi-source enhanced input, which fuses the useful information of the RGB frame and the optical flow field at the pre-processing stage. We evaluate our asymmetric 3D-CNN models on two of the most challenging action recognition benchmarks, UCF-101 and HMDB-51. Our model outperforms all the traditional 3D-CNN models in both effectiveness and efficiency, and is comparable with recent state-of-the-art action recognition methods on both benchmarks.
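
    The parameter saving that motivates the asymmetric convolutions can be illustrated with a small weight count. This is a minimal sketch, not the authors' implementation: it assumes that a k x k x k kernel is approximated by three one-directional kernels (k x 1 x 1, 1 x k x 1, 1 x 1 x k) applied in sequence, a factorization the abstract does not spell out. The function names and the channel sizes are hypothetical, and biases are ignored.

```python
def conv3d_weights(c_in, c_out, kt, kh, kw):
    """Weight count (bias omitted) of a 3D convolution with a kt x kh x kw kernel."""
    return c_in * c_out * kt * kh * kw


def full_3d_weights(c_in, c_out, k):
    """Weights of a traditional 3D convolution with a cubic k x k x k kernel."""
    return conv3d_weights(c_in, c_out, k, k, k)


def asymmetric_3d_weights(c_in, c_out, k):
    """Weights of three one-directional convolutions applied in sequence.

    Assumption for illustration only: one kernel per axis (time, height,
    width); the first maps c_in -> c_out, the other two map c_out -> c_out.
    """
    temporal = conv3d_weights(c_in, c_out, k, 1, 1)
    vertical = conv3d_weights(c_out, c_out, 1, k, 1)
    horizontal = conv3d_weights(c_out, c_out, 1, 1, k)
    return temporal + vertical + horizontal


if __name__ == "__main__":
    c_in, c_out, k = 64, 64, 3  # hypothetical layer sizes
    full = full_3d_weights(c_in, c_out, k)
    asym = asymmetric_3d_weights(c_in, c_out, k)
    print(f"full 3D: {full} weights, asymmetric: {asym} weights, "
          f"ratio: {full / asym:.1f}x")
```

    Under these assumptions, and with equal input and output channel counts, the full kernel needs k^3 weights per channel pair while the three one-directional kernels need 3k in total, so the gap widens as k grows.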

    Metadata

    Item Type: Article
    Keyword(s) / Subject(s): Asymmetric 3D Convolution, MicroNets, 3D-CNN, Action Recognition
    School: Birkbeck Schools and Departments > School of Business, Economics & Informatics > Computer Science and Information Systems
    Depositing User: Stephen Maybank
    Date Deposited: 27 Jul 2018 13:49
    Last Modified: 30 Jul 2019 02:33
    URI: http://eprints.bbk.ac.uk/id/eprint/23342
