
A Soft Subspace Clustering Algorithm with Log-Transformed Distances

1. Department of Mathematics, University of Connecticut, 196 Auditorium Rd, Storrs, CT 06269-3009, USA;
2. Department of Statistics, University of Connecticut, 215 Glenbrook Road, Storrs, CT 06269, USA

The entropy weighting used in some soft subspace clustering algorithms is sensitive to the scaling parameter. In this paper, we propose a novel soft subspace clustering algorithm that uses log-transformed distances in the objective function. Because the entropy weighting in the proposed algorithm is less sensitive to the scaling parameter, users can choose a value for it easily. In addition, the proposed algorithm is less sensitive to noise, because a point far away from its cluster center is given a small weight in the cluster center calculation. Experiments on both synthetic and real datasets demonstrate the performance of the proposed algorithm.
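The abstract describes three ingredients: log-transformed per-feature distances in the objective, entropy-regularized feature weights per cluster, and a damping weight for points far from their center. The sketch below illustrates that general scheme in Python; it is an assumption-laden illustration based only on the abstract, not a reproduction of the paper's exact objective function or update rules. The function name `lekm`, the `log1p` transform, and the `1/(1 + d^2)` damping weight are all illustrative choices.

```python
import numpy as np

def lekm(X, k, lam=1.0, n_iter=50, seed=0):
    """Illustrative soft subspace clustering with log-transformed
    distances and entropy-weighted features (sketch, not the paper's
    exact algorithm)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    centers = X[rng.choice(n, size=k, replace=False)].astype(float)
    weights = np.full((k, d), 1.0 / d)   # per-cluster feature weights
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # log-transformed per-feature squared distances, shape (n, k, d)
        dist = np.log1p((X[:, None, :] - centers[None, :, :]) ** 2)
        # assign each point to the cluster with smallest weighted cost
        cost = (dist * weights[None, :, :]).sum(axis=2)
        labels = cost.argmin(axis=1)
        for l in range(k):
            pts = X[labels == l]
            if len(pts) == 0:
                continue
            # center update: points far from the center get a small
            # weight, damping the influence of noise points
            w = 1.0 / (1.0 + ((pts - centers[l]) ** 2).sum(axis=1))
            centers[l] = (w[:, None] * pts).sum(axis=0) / w.sum()
            # entropy-regularized feature weights: a softmax of the
            # (negative) accumulated log-distances, scaled by lam
            D = np.log1p((pts - centers[l]) ** 2).sum(axis=0)
            e = np.exp(-D / lam)
            weights[l] = e / e.sum()
    return labels, centers, weights
```

Under this formulation, the feature weights are a softmax, so varying the scaling parameter `lam` changes them smoothly rather than abruptly, which is consistent with the reduced sensitivity the abstract claims.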



Copyright Info: © 2016, Guojun Gan, et al., licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
