BIROn - Birkbeck Institutional Research Online

    Fusing R features and local features with context-aware kernels for action recognition

    Yuan, C. and Wu, B. and Li, X. and Hu, W. and Maybank, Stephen J. and Wang, F. (2016) Fusing R features and local features with context-aware kernels for action recognition. International Journal of Computer Vision 118 (2), pp. 151-171. ISSN 0920-5691.

    Text: 13284.pdf - Author's Accepted Manuscript (PDF, 11MB)

    Abstract

    The performance of action recognition in video sequences depends significantly on the representation of actions and on the similarity measure between representations. In this paper, we combine two kinds of features extracted from spatio-temporal interest points with context-aware kernels for action recognition. For action representation, local cuboid features extracted around interest points are commonly encoded with a Bag of Visual Words (BOVW) model. Such representations, however, ignore potentially valuable information about the global spatio-temporal distribution of interest points. We propose a new global feature that captures the detailed geometrical distribution of interest points. It is computed with the 3D R transform, defined as an extension of the 3D discrete Radon transform, followed by two-directional two-dimensional principal component analysis. For similarity measurement, we model a video set as an optimized probabilistic hypergraph and propose a context-aware kernel that measures high-order relationships among videos. The context-aware kernel is more robust to noise and outliers in the data than the traditional context-free kernel, which considers only pairwise relationships between videos. The hyperedges of the hypergraph are constructed using a learnt Mahalanobis distance metric, and disturbing information from other classes is excluded from each hyperedge. Finally, a multiple kernel learning algorithm is designed by integrating l2-norm regularization into a linear SVM classifier, fusing the R feature and the BOVW representation for action recognition. Experimental results on several datasets demonstrate the effectiveness of the proposed approach.
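    The abstract describes the R feature only at a high level. The following is a rough, hypothetical NumPy sketch of a discrete 3D R transform over a cloud of (x, y, t) interest points. It assumes the usual definition of the R transform as the squared Radon transform integrated over the plane offset rho, with planes parameterised by normal angles (theta, phi). The function name r_transform_3d, the angular grids, the bin count and the centring step are all illustrative assumptions, not the authors' implementation. The 2D2DPCA compression step is not shown.

```python
import numpy as np


def r_transform_3d(points, n_theta=12, n_phi=6, n_bins=32):
    """Hypothetical sketch of a discrete 3D R transform of a point cloud.

    points: array of shape (N, 3) with (x, y, t) interest-point coordinates.
    Returns an (n_theta, n_phi) array. Each entry is the sum over plane
    offsets rho of the squared number of points lying in that plane slab.
    """
    pts = np.asarray(points, dtype=float)
    # Centre on the centroid so a shift in space or time does not matter.
    pts = pts - pts.mean(axis=0)
    radius = np.linalg.norm(pts, axis=1).max()
    if radius == 0.0:
        radius = 1.0  # degenerate case: all points coincide
    # Shared bins for the plane offset rho; |rho| <= radius for unit normals.
    edges = np.linspace(-radius, radius, n_bins + 1)
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    phis = np.linspace(0.0, np.pi, n_phi, endpoint=False)
    out = np.empty((n_theta, n_phi))
    for i, theta in enumerate(thetas):
        for j, phi in enumerate(phis):
            # Unit normal of one family of parallel planes.
            normal = np.array([np.sin(phi) * np.cos(theta),
                               np.sin(phi) * np.sin(theta),
                               np.cos(phi)])
            # Signed distance of each point from the plane through the centroid
            # (clipped to guard against round-off just outside the bin range).
            rho = np.clip(pts @ normal, -radius, radius)
            # Discrete Radon transform: number of points in each plane slab.
            counts, _ = np.histogram(rho, bins=edges)
            # R transform: integrate the squared Radon transform over rho.
            out[i, j] = np.sum(counts.astype(float) ** 2)
    return out


rng = np.random.default_rng(0)
cloud = rng.normal(size=(200, 3))  # synthetic stand-in for interest points
feature = r_transform_3d(cloud)
print(feature.shape)  # (12, 6)
```

    Because the points are centred first, translating the whole cloud leaves the output essentially unchanged. The resulting 2D array is the kind of matrix a 2D2DPCA step could then compress. Whether the paper discretises or normalises the transform this way is not stated on this page.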

    Metadata

    Item Type: Article
    Additional Information: The final publication is available at Springer via http://dx.doi.org/10.1007/s11263-015-0867-0
    Keyword(s) / Subject(s): Action recognition, Spatio-temporal interest points, 3D R transform, Hypergraph, Context-aware kernel
    School: Birkbeck Faculties and Schools > Faculty of Science > School of Computing and Mathematical Sciences
    Depositing User: Administrator
    Date Deposited: 02 Nov 2015 13:33
    Last Modified: 09 Aug 2023 12:37
    URI: https://eprints.bbk.ac.uk/id/eprint/13284
