A Soft Subspace Clustering Algorithm with Log-Transformed Distances

  • Received: 01 May 2015 Revised: 01 August 2015 Published: 01 January 2016
  • The entropy weighting used in some soft subspace clustering algorithms is sensitive to the scaling parameter. In this paper, we propose a novel soft subspace clustering algorithm that uses log-transformed distances in the objective function. Because the entropy weighting in the proposed algorithm is less sensitive to the scaling parameter, users can choose a value for that parameter more easily. The proposed algorithm is also less sensitive to noise, because a point far from its cluster center receives a small weight in the cluster center calculation. Experiments on both synthetic and real datasets demonstrate the performance of the proposed algorithm.

    Citation: Guojun Gan, Kun Chen. A Soft Subspace Clustering Algorithm with Log-Transformed Distances[J]. Big Data and Information Analytics, 2016, 1(1): 93-109. doi: 10.3934/bdia.2016.1.93
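
    The two claims in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the weight formula is the softmax-style entropy weighting commonly used in entropy weighting k-means, and the log transform ln(1 + d^2) and the resulting point weight 1 / (1 + d^2) are assumptions chosen to match the abstract's description. All function names are hypothetical.

```python
import numpy as np

def entropy_weights(dispersion, gamma):
    """Entropy-style feature weights: w_j proportional to exp(-D_j / gamma)."""
    scaled = -np.asarray(dispersion, dtype=float) / gamma
    scaled -= scaled.max()  # shift for numerical stability; ratios are unchanged
    w = np.exp(scaled)
    return w / w.sum()

def raw_dispersion(X, center):
    """Per-feature sum of squared distances to the center."""
    return ((X - center) ** 2).sum(axis=0)

def log_dispersion(X, center):
    """Per-feature sum of log-transformed squared distances (assumed form)."""
    return np.log1p((X - center) ** 2).sum(axis=0)

def robust_center_step(X, center):
    """One fixed-point update of the center under the assumed log objective.

    Setting the derivative of sum_i ln(1 + (x_i - z)^2) to zero gives a
    weighted mean with point weights 1 / (1 + (x_i - z)^2), so distant
    points contribute little.
    """
    w = 1.0 / (1.0 + (X - center) ** 2)
    return (w * X).sum(axis=0) / w.sum(axis=0)

# Feature 0 is tight around the center, feature 1 is widely spread.
X = np.array([[0.1, 4.0], [-0.1, -5.0], [0.2, 6.0], [-0.2, -3.0]])
center = np.zeros(2)

for gamma in (1.0, 10.0, 100.0):
    w_raw = entropy_weights(raw_dispersion(X, center), gamma)
    w_log = entropy_weights(log_dispersion(X, center), gamma)
    print(gamma, w_raw, w_log)

# A single far-away point barely moves the re-weighted center.
noisy = np.array([[0.0], [0.1], [-0.1], [100.0]])
print(noisy.mean(axis=0), robust_center_step(noisy, np.zeros(1)))
```

    In this sketch the log transform compresses large dispersions, so the weights change more gradually as the scaling parameter gamma varies, and the outlier at 100 has almost no influence on the updated center.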

  • © 2016 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)