BIROn - Birkbeck Institutional Research Online

    Estimating the number of clusters using diversity

    Kingrani, Suneel Kumar and Levene, Mark and Zhang, Dell (2018) Estimating the number of clusters using diversity. Artificial Intelligence Research 7 (1), pp. 15-22. ISSN 1927-6974.

    clu_num_paper.pdf - Author's Accepted Manuscript
    Restricted to Repository staff only

    20714a.pdf - Published Version of Record
    Available under License Creative Commons Attribution.


    Estimating the number of clusters in a dataset is an important and challenging problem in unsupervised learning. Knowing the number of clusters is a prerequisite for many commonly used clustering algorithms, such as k-means. In this paper, we propose a novel diversity-based approach to this problem. Specifically, we show that the difference between the global diversity of the clusters and the sum of the local diversities of each cluster's members can be used as an effective indicator of the optimality of the number of clusters, where diversity is measured by Rao's quadratic entropy. A notable advantage of the proposed method is that it encourages balanced clustering by taking into account both the sizes of the clusters and the distances between them. In other words, it is less prone to producing very small "outlier" clusters than existing methods. Our extensive experiments on both synthetic and real-world datasets (with known ground-truth clusterings) demonstrate that the proposed method is robust to clusters of different sizes, variances, and shapes, and that it is more accurate than existing methods (including the elbow, Calinski-Harabasz, silhouette, and gap-statistic methods) at finding the optimal number of clusters.
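
The abstract describes the indicator only in words, so the following is a minimal Python sketch of the idea and not the authors' published formulation. Rao's quadratic entropy itself is standard: for weights p and pairwise dissimilarities d it is the sum over i and j of d_ij * p_i * p_j. Everything else here is an assumption made for illustration: the function names (`rao_quadratic_entropy`, `pairwise_euclidean`, `diversity_gap`), the use of Euclidean distance, the representation of each cluster by its centroid weighted by relative size, and uniform weights over the members within each cluster.

```python
import numpy as np


def rao_quadratic_entropy(dist, p):
    """Rao's quadratic entropy: sum_i sum_j dist[i, j] * p[i] * p[j]."""
    dist = np.asarray(dist, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(p @ dist @ p)


def pairwise_euclidean(X):
    """Dense matrix of Euclidean distances between the rows of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def diversity_gap(X, labels):
    """Global diversity of clusters minus summed local diversities.

    ASSUMPTION (not taken from the paper): clusters are represented by
    their centroids, weighted by relative size, for the global term;
    members are weighted uniformly within each cluster for the local terms.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    n = len(X)

    # Global term: diversity among cluster centroids, weighted by cluster size.
    centroids = np.array([X[labels == k].mean(axis=0) for k in ids])
    weights = np.array([(labels == k).sum() / n for k in ids])
    global_div = rao_quadratic_entropy(pairwise_euclidean(centroids), weights)

    # Local term: diversity among the members of each cluster, summed.
    local_div = 0.0
    for k in ids:
        members = X[labels == k]
        uniform = np.full(len(members), 1.0 / len(members))
        local_div += rao_quadratic_entropy(pairwise_euclidean(members), uniform)

    return global_div - local_div
```

Under these assumptions, one would compute `diversity_gap(X, labels)` for the labelings produced by a clustering algorithm such as k-means over a range of candidate k and compare the resulting values. Whether the paper selects the maximizing k, or applies a different weighting or normalization, cannot be determined from the abstract alone.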


    Item Type: Article
    Keyword(s) / Subject(s): clustering, diversity
    School: School of Business, Economics & Informatics > Computer Science and Information Systems
    Research Centres and Institutes: Data Analytics, Birkbeck Institute for
    Depositing User: Dell Zhang
    Date Deposited: 04 Jan 2018 06:54
    Last Modified: 16 Jun 2021 16:42

