BIROn - Birkbeck Institutional Research Online

    A statistical significance testing approach to mining the most informative set of patterns

    Lijffijt, J. and Papapetrou, Panagiotis and Puolamäki, K. (2014) A statistical significance testing approach to mining the most informative set of patterns. Data Mining and Knowledge Discovery 28 (1), pp. 238-263. ISSN 1384-5810.

    Full text not available from this repository.

    Abstract

    Hypothesis testing using constrained null models can be used to compute the significance of data mining results given what is already known about the data. We study the novel problem of finding the smallest set of patterns that explains most about the data in terms of a global p-value. The resulting set of patterns, such as frequent patterns or clusterings, is the smallest set that statistically explains the data. We show that the newly formulated problem is, in its general form, NP-hard and that there exists no efficient algorithm with a finite approximation ratio. However, we show that in a special case a solution can be computed efficiently with a provable approximation ratio. We find that a greedy algorithm gives good results on real data and that, using our approach, we can formulate and solve many known data mining tasks. We demonstrate our method on several data mining tasks. We conclude that our framework is able to identify, in various settings, a small set of patterns that statistically explains the data and to formulate data mining problems in terms of statistical significance.

    Metadata

    Item Type: Article
    Keyword(s) / Subject(s): Data mining algorithms, Pattern mining, Statistical significance testing
    School: Birkbeck Faculties and Schools > Faculty of Science > School of Computing and Mathematical Sciences
    Depositing User: Administrator
    Date Deposited: 11 Jun 2013 11:25
    Last Modified: 09 Aug 2023 12:33
    URI: https://eprints.bbk.ac.uk/id/eprint/7439
