
Clustering is an unsupervised learning technique that divides data into similar groups/clusters. It has real applications in areas such as biology, agriculture, economics, intelligent systems, medical data and imaging [1,2,3,4]. It is a branch of multivariate analysis and is broadly divided into two categories: non-parametric approaches and (probability) model-based clustering [5]. Among non-parametric approaches, prototype-based clustering algorithms, such as k-means [6], fuzzy c-means [7,8] and possibilistic c-means [9,10], are the most widely used methods. In 1977, Dempster et al. [11] first proposed a probability mixture-model likelihood approach to clustering via the expectation-maximization (EM) algorithm. To incorporate variable selection, Pan and Shen [12] combined EM [11] with the idea of the least absolute shrinkage and selection operator (Lasso) [13].
In 1993, Banfield and Raftery [14] first proposed so-called model-based Gaussian clustering to overcome the drawbacks of existing classification maximum likelihood approaches [15,16]. They used an eigenvalue decomposition of the covariance matrices so that some geometric features can be held common to all clusters while others differ between clusters. Model-based Gaussian clustering has been widely applied in various areas, such as image segmentation [17], gene expression data [18], and background subtraction [19]. Recently, Yang et al. [20] proposed a fuzzy model-based Gaussian (F-MB-Gauss) clustering that combines the model-based Gaussian [14] with fuzzy membership functions [21,22]. However, F-MB-Gauss [20] treats all feature (variable) components of data points as equally important, and so it cannot distinguish irrelevant feature components. In general, a data set may contain irrelevant features that degrade the performance of clustering algorithms. In this paper, we further consider F-MB-Gauss clustering with a Lasso penalty term and propose a fuzzy Gaussian Lasso (FG-Lasso) clustering algorithm. The proposed FG-Lasso algorithm is a clustering algorithm well suited for feature selection.
Medical data with gene expression is an emerging subject of systematic biological study in bioinformatics. Bioinformatics is a discipline that combines biology, computer science, information engineering, mathematics and statistics to better interpret data [23,24]. It is closely related to computational molecular biology; in a broad sense, computational biology covers all scientific work related to biology that involves mathematics, computation, statistics, and algorithmic methods [25,26]. Gene/feature selection is a significant task in bioinformatics because many genes/features are irrelevant, and discarding these irrelevant genes/features may greatly improve clustering results and make the data more suitable for further statistical/mathematical treatment. Cancer, as reported by the WHO, is the first or second leading cause of death, and cancer data are important medical data. Thus, retaining only relevant features is one of the significant tasks and issues for researchers, especially for cancer data.
Since the proposed FG-Lasso algorithm is well suited for feature selection, we apply it to cancer data, especially for feature selection. It is seen that the proposed FG-Lasso can perform both feature selection and regularization to increase the accuracy and interpretability of clustering. It is also a good choice for high-dimensional data sets. The rest of the paper is organized as follows. In Section 2, we briefly review the F-MB-Gauss clustering and then propose the FG-Lasso algorithm. In Section 3, we present numerical results of the FG-Lasso clustering algorithm. In Section 4, we apply FG-Lasso to cancer data with feature selection. Conclusions are stated in Section 5.
Model-based clustering is an essential technique for partitioning data into similar and dissimilar groups/clusters by using mixtures of probability distributions. Model-based Gaussian clustering was initially proposed by Banfield and Raftery [14] to extend the classification maximum likelihood of Scott and Symons [15] and Symons [16]. Let a data set $ X = \{ {x_1}, ..., {x_n}\} $ be a random sample from a d-variate Gaussian mixture with Gaussian densities $ N\left({x; {\mu _k}, {\Sigma _k}} \right){\rm{ = }}{(2\pi)^{ - d/2}}|{\Sigma _k}{|^{ - 1/2}}\exp (- (1/2){(x - {\mu _k})^T}\Sigma _k^{ - 1}(x - {\mu _k})). $ Let $ P = \left\{P_{1}, \ldots, P_{c}\right\} $ be a hard c-partition of $ X $, where $ \left\{ {{P_1}, ..., {P_c}} \right\} $ is equivalent to the indicator functions $ \left\{ {{z_1}, ..., {z_c}} \right\} $ with $ {z_k}(x) = 1 $ if $ x \in {P_k} $, and $ {z_k}(x) = 0 $ otherwise. The objective function of the model-based Gaussian is given by $ J(z, \theta) = \sum\limits_{i = 1}^n {\sum\limits_{k = 1}^c {{z_{ki}}} } \ln \, {f_k}({x_i}; {\theta _k}) $, where $ {f_k}({x_i}; {\theta _k}){\rm{ = }}N\left({{x_i}; {\mu _k}, {\Sigma _k}} \right), $ and $ {z_{ki}}{\rm{ = }}{z_k}({x_i}) $ is the membership function with $ {z_{ki}} \in {\rm{\{ }}0, 1\} $ . The model-based Gaussian clustering algorithm iterates the necessary conditions for maximizing the objective function $ J(z, \theta) $, namely $ {\hat \mu _k} = {{\sum\limits_{i = 1}^n {{z_{ki}}{x_i}} } \mathord{\left/ {\vphantom {{\sum\limits_{i = 1}^n {{z_{ki}}{x_i}} } {\sum\limits_{i = 1}^n {{z_{ki}}} }}} \right. } {\sum\limits_{i = 1}^n {{z_{ki}}} }} $ and $ {\hat \Sigma _k} = {{\sum\limits_{i = 1}^n {{z_{ki}}({x_i} - {{\hat \mu }_k}){{({x_i} - {{\hat \mu }_k})}^{\rm{T}}}} } \mathord{\left/ {\vphantom {{\sum\limits_{i = 1}^n {{z_{ki}}({x_i} - {{\hat \mu }_k}){{({x_i} - {{\hat \mu }_k})}^{\rm{T}}}} } {\sum\limits_{i = 1}^n {{z_{ki}}} }}} \right. } {\sum\limits_{i = 1}^n {{z_{ki}}} }}. $
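The hard-assignment updates above are simple weighted means and covariances. The following minimal Python sketch illustrates one update step; the data matrix `X`, the indicator matrix `z` and the function name are illustrative assumptions, not part of the original paper.

```python
import numpy as np

def mb_gauss_updates(X, z):
    """One parameter-update step of model-based Gaussian clustering.

    X : (n, d) data matrix.
    z : (c, n) binary indicator matrix, z[k, i] = 1 when x_i is in cluster k.
    Returns the cluster means and covariance matrices maximizing J(z, theta).
    """
    c, n = z.shape
    d = X.shape[1]
    mu = np.zeros((c, d))
    Sigma = np.zeros((c, d, d))
    for k in range(c):
        nk = z[k].sum()
        mu[k] = z[k] @ X / nk                            # weighted cluster mean
        diff = X - mu[k]
        Sigma[k] = (z[k][:, None] * diff).T @ diff / nk  # weighted cluster covariance
    return mu, Sigma
```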
Zadeh [21] proposed fuzzy sets in 1965. Afterwards, Ruspini [27] extended the indicator functions $ \left\{ {{z_1}, ..., {z_C}} \right\} $ to allow the membership $ {z_k}(x) $ to be in the interval [0, 1] with $ \sum\nolimits_{k = 1}^c {{z_k}(x) = 1} $ for all $ x \in X $ . These extended membership functions $ \left\{ {{z_1}, ..., {z_C}} \right\} $ are called fuzzy c-partition. Recently, Yang et al. [20] proposed a fuzzy model-based Gaussian (F-MB-Gauss) clustering by combining the model-based Gaussian with fuzzy membership functions. The F-MB-Gauss objective function is as follows [20]:
$ J(z, \theta ) = \sum\limits_{i = 1}^n {\sum\limits_{k = 1}^c {z_{ki}^m} } \ln \, {f_k}({x_i};{\theta _k}) = \sum\limits_{i = 1}^n {\sum\limits_{k = 1}^c {z_{ki}^m} } \ln \, N\left( {{x_i};{\mu _k}, {\Sigma _k}} \right) $ |
where $ {z_{ki}} $ is a fuzzy c-partition with the condition $ \sum\nolimits_{k = 1}^c {{z_{ki}} = 1}, \forall i $, and m is a fuzziness index with $ m{\rm{ > }}1 $ that determines the fuzziness level of clusters. However, the fuzziness index m may influence clustering results. To avoid m, the entropy term $ - w\sum\limits_{i = 1}^n {\sum\limits_{k = 1}^c {{z_{ki}}\ln {z_{ki}}} } $ of the membership functions is added instead. Thus, the objective function becomes
$ J(z, \mu , \sum ) = \sum\limits_{i = 1}^n {\sum\limits_{k = 1}^c {{z_{ki}}} } \ln \, N\left( {{x_i};{\mu _k}, {\Sigma _k}} \right) - w\sum\limits_{i = 1}^n {\sum\limits_{k = 1}^c {{z_{ki}}\ln {z_{ki}}} } $ |
where $ w \geqslant 0 $ is a parameter whose value is determined by a suitable decreasing learning rate, such as $ {0.999^t} $, $ {e^{ - t/100}} $, $ {e^{ - t/10}} $, or $ {e^{ - t}} $ . Yang et al. [20] considered the decreasing learning rate $ {w^{(t)}} = {0.999^t} $, which we also adopt in this paper.
Although F-MB-Gauss [20] gives good clustering results, it always treats the feature components of data points with equal importance. Most data sets contain some irrelevant features that degrade the performance of clustering algorithms, and F-MB-Gauss cannot distinguish these irrelevant feature components. In this paper, we further study F-MB-Gauss so that the algorithm can identify these irrelevant feature components. We use the idea of Lasso (least absolute shrinkage and selection operator), which was first proposed by Tibshirani [13] for variable selection in regression models. Note that Witten and Tibshirani [28] first proposed a feature selection framework using sparse clustering, in which Lasso constraints on feature weights shrink features toward 0 for feature selection. Witten and Tibshirani [28] defined sparse clustering as the optimization of $ \mathop {\max }\limits_{w, \Theta } \sum\nolimits_{j = 1}^d {{w_j}{f_j}({X_j};\Theta)}~{\rm{subject}}\;\;{\rm{to}}~{\left\| w \right\|_1} \leqslant s, {\rm{ }}{\left\| w \right\|^2} \leqslant 1, {w_j} \geqslant 0, \forall j $, where $ {f_j}({X_j};\Theta) $ is the contribution of the jth feature to the clustering objective, $ w = \left({{w_1}, {w_2}, \cdots, {w_d}} \right) \in {R^d} $ are feature weights, and $ s $ is the $ {L_1} $ bound on $ w $ . They obtained sparse k-means clustering by applying this optimization to the k-means objective function. Castro and Pu [29] further proposed a simple approach to sparse k-means clustering based on the framework of Witten and Tibshirani [28]. Qiu et al. [30] extended sparse k-means clustering to a sparse fuzzy c-means algorithm, and more recently, Chang et al. [31] proposed another sparse fuzzy c-means algorithm by extending the framework of Witten and Tibshirani [28] to $ {L_q} $ $ (0 < q \leqslant 1) $-norm regularization for shrinking irrelevant feature weights to 0. However, all of these clustering algorithms for feature selection are based on Lasso constraints on feature weights. For F-MB-Gauss clustering with Gaussian mixture distributions, there is no natural way to introduce feature weights. Instead, we can penalize the mean components $ {\mu _{kp}} $ with the term $ \lambda \sum\nolimits_{k = 1}^c {\sum\nolimits_{p = 1}^d {\left| {{\mu _{kp}}} \right|} } $ . Thus, we consider F-MB-Gauss with a Lasso penalty term, and then propose a fuzzy Gaussian Lasso (FG-Lasso) clustering algorithm. The FG-Lasso objective function is as follows:
$ {J_{{\rm{FG - Lasso}}}}(z, \mu , \sum ) = \sum\limits_{i = 1}^n {\sum\limits_{k = 1}^c {{z_{ki}}} } \ln \, N\left( {{x_i};{\mu _k}, {\Sigma _k}} \right) - w\sum\limits_{i = 1}^n {\sum\limits_{k = 1}^c {{z_{ki}}\ln {z_{ki}}} } - \lambda \sum\limits_{k = 1}^c {\sum\limits_{p = 1}^d {\left| {{\mu _{kp}}} \right|} } $ | (1) |
where $ \lambda \geqslant 0 $ is the regularization parameter that controls the amount of shrinkage, $ \mu _k^T = ({\mu _{k1}}, \cdots, {\mu _{kd}}), $ and $ N\left({{x_i}; {\mu _k}, {\Sigma _k}} \right){\rm{ = }}{(2\pi)^{ - d/2}}|{\Sigma _k}{|^{ - 1/2}}\exp (- (1/2){({x_i} - {\mu _k})^T}\Sigma _k^{ - 1}({x_i} - {\mu _k})). $ The parameter $ \lambda $ can be used as a feature selection threshold: as $ \lambda $ increases, more irrelevant features are discarded. Of course, when $ \lambda = 0 $, the FG-Lasso reduces to the F-MB-Gauss.
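To make the objective concrete, the following sketch evaluates Eq (1) for a common diagonal covariance $ \Sigma = diag(\sigma_p^2) $; the function name and the use of SciPy's log-density are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fg_lasso_objective(X, z, mu, sigma2, w, lam):
    """Evaluate the FG-Lasso objective of Eq (1).

    X: (n, d) data, z: (c, n) fuzzy memberships, mu: (c, d) means,
    sigma2: (d,) common diagonal variances, w: entropy weight, lam: lambda.
    """
    c = mu.shape[0]
    log_lik = 0.0
    for k in range(c):
        logN = multivariate_normal.logpdf(X, mean=mu[k], cov=np.diag(sigma2))
        log_lik += np.sum(z[k] * logN)                    # sum_i z_ki ln N(x_i; mu_k, Sigma)
    entropy = w * np.sum(z * np.log(np.clip(z, 1e-300, None)))   # w * sum z ln z
    lasso = lam * np.abs(mu).sum()                        # Lasso penalty on mean components
    return log_lik - entropy - lasso
```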
To obtain the necessary conditions for minimizing the FG-Lasso objective function $ {J_{{\rm{FG - Lasso}}}}(z, \mu, \sum) $, we use the Lagrangian as follows:
$ {\tilde J_{{\rm{FG - Lasso}}}}(z, \mu , \sum ) = \sum\limits_{i = 1}^n {\sum\limits_{k = 1}^c {{z_{ki}}} } \ln \, N\left( {{x_i};{\mu _k}, {\Sigma _k}} \right) - w\sum\limits_{i = 1}^n {\sum\limits_{k = 1}^c {{z_{ki}}\ln {z_{ki}}} } - \lambda \sum\limits_{k = 1}^c {\sum\limits_{p = 1}^d {\left| {{\mu _{kp}}} \right|} } - \sum\limits_{i = 1}^n {{\tau _i}\left( {\sum\limits_{k = 1}^c {{z_{ki}} - 1} } \right)} $ |
Differentiating $ {\tilde J_{{\rm{FG - Lasso}}}}(z, \mu, \sum) $ with respect to the fuzzy membership function $ {z_{ki}} $ and setting it to be zero, we get the updating equation for $ {z_{ki}} $ as follows:
$ {\hat z_{ki}} = \frac{{{{\left[ {N({x_i};{\mu _k}, {\Sigma _k})} \right]}^{1/w}}}}{{\sum\limits_{s = 1}^c {{{\left[ {N({x_i};{\mu _s}, {\Sigma _s})} \right]}^{1/w}}} }} $ | (2) |
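Eq (2) normalizes the cluster densities raised to the power 1/w. A minimal sketch follows; working in log space for numerical stability is an implementation choice, not part of the original derivation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def update_memberships(X, mu, sigma2, w):
    """Fuzzy membership update of Eq (2) for a common diagonal covariance.

    Returns z of shape (c, n), the normalized [N(x_i; mu_k, Sigma)]^(1/w).
    """
    c = mu.shape[0]
    logN = np.stack([multivariate_normal.logpdf(X, mean=mu[k], cov=np.diag(sigma2))
                     for k in range(c)])                  # (c, n) log-densities
    scores = logN / w
    scores -= scores.max(axis=0, keepdims=True)           # guard against overflow
    z = np.exp(scores)
    return z / z.sum(axis=0, keepdims=True)
```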
To obtain the updating equation of $ {\mu _{kp}} $ by differentiating $ {\tilde J_{{\rm{FG - Lasso}}}}(z, \mu, \sum) $ with respect to $ {\mu _{kp}} $, we only consider the case of $ {\Sigma _k}{\rm{ = }}\Sigma {\rm{ = }}diag(\sigma _p^2), p = 1, \cdots, d. $ Thus, we obtain $ \frac{{\partial {{\tilde J}_{{\rm{FG - Lasso}}}}(z, \mu, \sum)}}{{\partial {\mu _{kp}}}} = \frac{{\sum\nolimits_{i = 1}^n {{z_{ki}}({x_{ip}} - {\mu _{kp}})} }}{{\sigma _p^2}} - \lambda \, sign({\mu _{kp}}) $ . By direct inspection of the FG-Lasso objective function $ {J_{{\rm{FG - Lasso}}}}(z, \mu, \sum) $, we can get the following solution for $ {\hat \mu _{kp}} $ :
$ {\hat \mu _{kp}} = \left\{ \begin{gathered} {{\tilde \mu }_{kp}} + \frac{{\lambda \hat \sigma _p^2}}{{\sum\nolimits_{i = 1}^n {{{\hat z}_{ki}}} }}, \, \, if~{{\tilde \mu }_{kp}} \lt - \frac{{\lambda \hat \sigma _p^2}}{{\sum\nolimits_{i = 1}^n {{{\hat z}_{ki}}} }} \\ 0, \, \, if~\left| {{{\tilde \mu }_{kp}}} \right| \leqslant \frac{{\lambda \hat \sigma _p^2}}{{\sum\nolimits_{i = 1}^n {{{\hat z}_{ki}}} }} \\ {{\tilde \mu }_{kp}} - \frac{{\lambda \hat \sigma _p^2}}{{\sum\nolimits_{i = 1}^n {{{\hat z}_{ki}}} }}, \, \, if~{{\tilde \mu }_{kp}} \gt \frac{{\lambda \hat \sigma _p^2}}{{\sum\nolimits_{i = 1}^n {{{\hat z}_{ki}}} }} \\ \end{gathered} \right. $ | (3) |
with
$ {\tilde \mu _{kp}} = \frac{{\sum\nolimits_{i = 1}^n {{{\hat z}_{ki}}{x_{ip}}} }}{{\sum\nolimits_{i = 1}^n {{{\hat z}_{ki}}} }} $ | (4) |
where $ {\tilde \mu _{kp}} = {{\sum\nolimits_{i = 1}^n {{{\hat z}_{ki}}{x_{ip}}} } \mathord{\left/ {\vphantom {{\sum\nolimits_{i = 1}^n {{{\hat z}_{ki}}{x_{ip}}} } {\sum\nolimits_{i = 1}^n {{{\hat z}_{ki}}} }}} \right. } {\sum\nolimits_{i = 1}^n {{{\hat z}_{ki}}} }} $ is the maximum likelihood estimator (MLE) of the Gaussian mean $ {\mu _{kp}} $ . As the value of $ \lambda $ in Eq (3) increases, some $ {\hat \mu _{kp}} $ become exactly 0, while the remaining ones are shrunk by the amount $ {{\lambda \hat \sigma _p^2} \mathord{\left/ {\vphantom {{\lambda \hat \sigma _p^2} {\sum\nolimits_{i = 1}^n {{{\hat z}_{ki}}} }}} \right. } {\sum\nolimits_{i = 1}^n {{{\hat z}_{ki}}} }} $ . Thus, during the clustering process, if $ \left| {{{\tilde \mu }_{kp}}} \right|\, \, \leqslant \, {{\lambda \hat \sigma _p^2} \mathord{\left/ {\vphantom {{\lambda \hat \sigma _p^2} {\sum\nolimits_{i = 1}^n {{{\hat z}_{ki}}} }}} \right. } {\sum\nolimits_{i = 1}^n {{{\hat z}_{ki}}} }} $, then $ {\hat \mu _{kp}} $ = 0; otherwise $ {\hat \mu _{kp}} $ = $ {\tilde \mu _{kp}} - {{\lambda \hat \sigma _p^2} \mathord{\left/ {\vphantom {{\lambda \hat \sigma _p^2} {\sum\nolimits_{i = 1}^n {{{\hat z}_{ki}}} }}} \right. } {\sum\nolimits_{i = 1}^n {{{\hat z}_{ki}}} }} $ . The updating Eq (3) for $ {\hat \mu _{kp}} $ therefore reflects the contribution of the pth feature to cluster k through the regularization parameter $ \lambda $ . If $ {\hat \mu _{kp}} = 0 $ for all k, then the pth feature contributes nothing to the clustering, is non-informative, and can be discarded. This is why we add the Lasso penalty $ \lambda \sum\nolimits_{k = 1}^c {\sum\nolimits_{p = 1}^d {\left| {{\mu _{kp}}} \right|} } $ to the F-MB-Gauss objective function to obtain the FG-Lasso objective function (1). In this way, the FG-Lasso algorithm exhibits feature selection behavior.
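Eqs (3) and (4) together amount to a soft-thresholding of the MLE means. A minimal vectorized sketch follows; the function name and array shapes are illustrative assumptions.

```python
import numpy as np

def update_means_lasso(X, z, sigma2, lam):
    """Soft-thresholded mean update of Eqs (3)-(4).

    X: (n, d) data, z: (c, n) fuzzy memberships, sigma2: (d,) variances,
    lam: regularization parameter lambda >= 0.  A feature whose shrunken
    mean is zero in every cluster can be discarded.
    """
    nk = z.sum(axis=1)                                    # sum_i z_ki, per cluster
    mu_tilde = (z @ X) / nk[:, None]                      # MLE means, Eq (4)
    thresh = lam * sigma2[None, :] / nk[:, None]          # lambda * sigma_p^2 / sum_i z_ki
    mu_hat = np.sign(mu_tilde) * np.maximum(np.abs(mu_tilde) - thresh, 0.0)  # Eq (3)
    return mu_hat, mu_tilde
```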
To derive the updating Eq (3) for $ {\hat \mu _{kp}} $, we use the FG-Lasso objective function $ {J_{{\rm{FG - Lasso}}}}(z, \mu, \sum). $ Taking the derivative of $ {J_{{\rm{FG - Lasso}}}}(z, \mu, \sum) $ with respect to $ {\mu _{kp}} $, we get $ \frac{{\partial {J_{{\rm{FG - Lasso}}}}(z, \mu, \sum)}}{{\partial {\mu _{kp}}}} = \frac{{\sum\nolimits_{i = 1}^n {{z_{ki}}({x_{ip}} - {\mu _{kp}})} }}{{\sigma _p^2}} - \lambda \, sign({\mu _{kp}}) $ . Setting this to 0 and simplifying, we get $ \sum\limits_{i = 1}^n {\hat z_{ki}^{}\sigma _{_p}^{ - 2}} ({x_{ip}} - {\mu _{kp}}) - \lambda \, sign({\mu _{kp}}) = 0 $, and hence $ \sum\nolimits_{i = 1}^n {\hat z_{ki}^{}} {x_{ip}} = \lambda \sigma _p^2 sign({\mu _{kp}}) + \sum\nolimits_{i = 1}^n {\hat z_{ki}^{}} {\mu _{kp}} $ . Dividing both sides by $ \sum\nolimits_{i = 1}^n {\hat z_{ki}^{}} $, we obtain $ {\hat \mu _{kp}} = \frac{{\sum\nolimits_{i = 1}^n {\hat z_{ki}^{}} {x_{ip}}}}{{\sum\nolimits_{i = 1}^n {\hat z_{ki}^{}} }} - \frac{{\lambda \sigma _p^2 sign({{\hat \mu }_{kp}})}}{{\sum\nolimits_{i = 1}^n {\hat z_{ki}^{}} }} $ , that is, $ {\hat \mu _{kp}} = {\tilde \mu _{kp}} - \frac{{\lambda \sigma _p^2 sign({{\hat \mu }_{kp}})}}{{\sum\nolimits_{i = 1}^n {\hat z_{ki}^{}} }} $ . Since $ \left| {{{\hat \mu }_{kp}}} \right| $ is not differentiable at $ {\hat \mu _{kp}} = 0 $, we handle this point by using the subderivative (subgradient) of a convex function. For the absolute value function $ f(x) = \lambda \left| x \right| $, the subgradient is defined by $ \partial f(x) = \left\{ \begin{gathered} \{ - \lambda \} , \, \, if~x \lt 0 \\ [ - \lambda , \lambda ], \, \, if~x = 0 \\ \{ + \lambda \} , \, \, if~x \gt 0 \\ \end{gathered} \right. $ . Substituting this subgradient into the stationarity condition and examining the three cases for the sign of $ {\hat \mu _{kp}} $ yields the soft-thresholding solution given in Eq (3).
Similarly, for the common diagonal covariance matrix $ {\Sigma _k}{\rm{ = }}\Sigma {\rm{ = }}diag(\sigma _p^2), p = 1, \cdots, d $, differentiating $ {\tilde J_{{\rm{FG - Lasso}}}}(z, \mu, \sum) $ with respect to $ \sigma _p^2, p = 1, \cdots, d, $ we get the following updating equation:
$ \hat \sigma _p^2 = \frac{{\sum\nolimits_{k = 1}^c {\sum\nolimits_{i = 1}^n {{{\hat z}_{ki}}{{({x_{ip}} - {{\tilde \mu }_{kp}})}^2}} } }}{{\sum\nolimits_{k = 1}^c {\sum\nolimits_{i = 1}^n {{{\hat z}_{ki}}} } }} $ | (5) |
Because the covariance matrix may become singular when the cluster number is large, we regularize the variance estimate in the FG-Lasso algorithm as follows:
$ \tilde \sigma _p^2 = (1 - \gamma )\sigma _p^2 + \gamma \omega $ | (6) |
where $ \gamma $ is a small positive number and $ \omega $ is a small positive constant. Here we use $ \gamma $ = 0.0001 and $ \omega {\rm{ = }}d_{\min }^2 $ with $ d_{\min }^2 = {\rm{min}}\, {\rm{\{ }}\, d_{ij}^2 = {\left\| {{x_i} - {x_j}} \right\|^2} > 0, \, 1 \leqslant i, j \leqslant n\}. $ For $ w $, we use the same decreasing learning rate as Yang et al. [20] with
$ {w^{(t)}} = {0.999^t} $ | (7) |
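A minimal sketch of the variance update of Eqs (5) and (6), together with the learning-rate schedule of Eq (7), is given below; taking omega as the smallest positive squared pairwise distance follows the text, while the function names are illustrative assumptions.

```python
import numpy as np

def update_variances(X, z, mu_tilde, gamma=1e-4):
    """Common diagonal variance update of Eq (5) with the regularization of Eq (6)."""
    d = X.shape[1]
    num = np.zeros(d)
    for k in range(mu_tilde.shape[0]):
        num += (z[k][:, None] * (X - mu_tilde[k]) ** 2).sum(axis=0)
    sigma2 = num / z.sum()                                # Eq (5)
    # Smallest positive squared pairwise distance d_min^2; O(n^2 d) memory, fine for a sketch.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    omega = sq[sq > 0].min()
    return (1 - gamma) * sigma2 + gamma * omega           # Eq (6)

def learning_rate(t):
    """Decreasing learning rate w^(t) = 0.999^t of Eq (7)."""
    return 0.999 ** t
```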
Thus, the proposed FG-Lasso clustering algorithm can be summarized as follows.
FG-Lasso Algorithm
Step 1: Fix $ \varepsilon > 0 $ . Give initial values for $ {\mu ^{(0)}} $ and $ \sigma _p^{2, (0)} $ . Set $ \lambda $ = 1 and $ t = 1 $ .
Step 2: Compute $ \hat z_{kj}^{(0)} $ by using Eq (2).
Step 3: Compute $ \tilde \mu _{_{kp}}^{(t)} $ by using Eq (4).
Step 4: Compute $ {w^{(t)}} $ by using Eq (7).
Step 5: Update $ \hat \sigma _p^{2, (t)} $ with $ \tilde \mu _{_{kp}}^{(t)} $ and $ \hat z_{kj}^{(t - 1)} $ by Eqs (5) and (6).
Step 6: Update $ \hat z_{kj}^{(t)} $ with $ \tilde \mu _{kp}^{(t)}, $ $ {w^{(t)}} $ and $ \hat \sigma _p^{2, (t)} $ by using Eq (2).
Step 7: Update $ \tilde \mu _{kp}^{(t + 1)} $ with $ \hat z_{kj}^{(t)} $ by using Eq (4).
If $ max\left\| {{{\tilde \mu }_{kp}}^{(t + 1)} - {{\tilde \mu }_{kp}}^{(t)}} \right\| < \varepsilon $, go to Step 8.
Else let $ t = t + 1 $ and return to Step 3.
Step 8: Update $ \hat \sigma _p^{2, (t + 1)} $ with $ \tilde \mu _{kp}^{(t + 1)} $ and $ \hat z_{kj}^{(t)} $ by using Eqs (5) and (6).
Step 9: Update $ \hat \mu _{kp}^{(t)} $ with $ \hat z_{kj}^{(t)} $, $ \tilde \mu _{kp}^{(t + 1)} $ and $ \hat \sigma _p^{2, (t + 1)} $ by using Eq (3), that is,
If $ \left| {\tilde \mu _{kp}^{(t + 1)}} \right|\, \, \leqslant \, \frac{{\lambda \hat \sigma _p^{2, (t + 1)}}}{{\sum\nolimits_{j = 1}^n {\hat z_{kj}^{(t)}} }}, $ then let $ \hat \mu _{kp}^{(t)} $ = 0.
Else $ \hat \mu _{kp}^{(t)}\, = \, \tilde \mu _{kp}^{(t + 1)} - \frac{{\lambda \hat \sigma _p^{2, (t + 1)}}}{{\sum\nolimits_{j = 1}^n {\hat z_{kj}^{(t)}} }}\, \, . $
Step 10: Increase $ \lambda $ and return to Step 3, or output results.
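Putting Steps 1-9 together, the following compact sketch iterates the updates for a single value of λ. It relies on the helper functions sketched earlier in this section (update_memberships, update_variances, learning_rate, update_means_lasso); the initialization and stopping details are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np

def fg_lasso(X, c, lam=1.0, eps=1e-4, max_iter=500, seed=None):
    """Minimal FG-Lasso sketch for one lambda value.

    Returns the shrunken means mu_hat (an all-zero column marks a discarded
    feature), the fuzzy memberships z and the common diagonal variances.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu_tilde = X[rng.choice(n, c, replace=False)]         # Step 1: initial means
    sigma2 = X.var(axis=0) + 1e-6                         # Step 1: initial variances
    z = update_memberships(X, mu_tilde, sigma2, w=1.0)    # Step 2
    for t in range(1, max_iter + 1):
        mu_tilde = (z @ X) / z.sum(axis=1)[:, None]       # Step 3, Eq (4)
        w = learning_rate(t)                              # Step 4, Eq (7)
        sigma2 = update_variances(X, z, mu_tilde)         # Step 5, Eqs (5)-(6)
        z = update_memberships(X, mu_tilde, sigma2, w)    # Step 6, Eq (2)
        mu_next = (z @ X) / z.sum(axis=1)[:, None]        # Step 7, Eq (4)
        converged = np.max(np.abs(mu_next - mu_tilde)) < eps
        mu_tilde = mu_next
        if converged:                                     # means have stabilized
            break
    sigma2 = update_variances(X, z, mu_tilde)             # Step 8
    mu_hat, _ = update_means_lasso(X, z, sigma2, lam)     # Step 9, Eq (3)
    return mu_hat, z, sigma2
```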
In this section, we demonstrate the performance of the proposed FG-Lasso clustering algorithm. Several synthetic and real data sets are used to give more insight into the feature selection behavior of the FG-Lasso algorithm. We also compare the proposed FG-Lasso with F-MB-Gauss [20]. The accuracy rate (AR) is used as a criterion for evaluating the performance of a clustering algorithm. AR is the percentage of data points that are correctly identified by the clustering algorithm, defined as $ A R = \sum_{i = 1}^{c} r_{i} / n $, where $ r_{i} $ is the number of points in $ {C_i}^\prime $ that are also in $ {C_i} $, $ C = \left\{C_{1}, C_{2}, \cdots, C_{c}\right\} $ is the set of c true clusters of the given data set, and $ C^{\prime} = \left\{C_{1}^{\prime}, C_{2}^{\prime}, \cdots, C_{c}^{\prime}\right\} $ is the set of c clusters generated by the clustering algorithm.
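The AR criterion requires matching each generated cluster to a true cluster. The paper does not spell out the matching rule, so the sketch below assumes an optimal one-to-one assignment computed with the Hungarian algorithm.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def accuracy_rate(true_labels, pred_labels):
    """Accuracy rate AR = sum_i r_i / n, with clusters matched to true classes."""
    classes = np.unique(true_labels)
    clusters = np.unique(pred_labels)
    overlap = np.zeros((len(classes), len(clusters)), dtype=int)
    for i, ci in enumerate(classes):
        for j, cj in enumerate(clusters):
            overlap[i, j] = np.sum((true_labels == ci) & (pred_labels == cj))
    rows, cols = linear_sum_assignment(-overlap)           # maximize total overlap
    return overlap[rows, cols].sum() / len(true_labels)
```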
Example 1. In this example, a simulated data set is used to demonstrate the significance of the FG-Lasso algorithm, in particular the usefulness of $ \lambda $ for feature selection and for removing irrelevant features. A data set, called the 3-cluster Gaussian data, with 1800 points is generated from a Gaussian mixture with $ {\alpha _1} = {\alpha _2}{\rm{ = }}{\alpha _3}{\rm{ = 1/3}} $, where 600 points are drawn from the normal distribution with $ {\mu _1} = (1\, \, \, 1) $ and $ {\sum _1} = (1\, \, \, \, \, 0;\, 0\, \, \, \, \, 1) $, 600 points from the normal distribution with $ {\mu _2} = (3\, \, \, 5) $ and $ {\sum _2} = (1\, \, \, \, \, 0;\, 0\, \, \, \, \, 1) $, and the same number from the third cluster with $ {\mu _3} = (5\, \, \, 1.5) $ and $ {\sum _3} = (1\, \, \, \, \, 0;\, 0\, \, \, \, \, 1) $ . We consider the two features, named $ featur{e_1} $ and $ featur{e_2} $, as informative. We then add two irrelevant features generated from uniform distributions over the intervals [-5, 5] and [-10, 10], named $ featur{e_3} $ and $ featur{e_4} $, which are considered non-informative features (a data-generation sketch is given after Table 2). The original data set with the two informative features $ featur{e_1} $ and $ featur{e_2} $ is shown in Figure 1(a). Figure 1(b) shows the clustering results of the F-MB-Gauss algorithm with the 3 cluster centers after 45 iterations. The final clustering results of the FG-Lasso algorithm with the 3 clusters after 13 iterations are shown in Figure 1(c). The ARs of the FG-Lasso and F-MB-Gauss algorithms are shown in Table 1. It is seen that the proposed FG-Lasso is effective for feature selection, with a final feature number of d* = 2. The F-MB-Gauss algorithm, in contrast, has no feature selection capability and therefore gives the final feature number d = 4. Clearly, the irrelevant features $ featur{e_3} $ and $ featur{e_4} $ distort the final clustering results of the F-MB-Gauss algorithm, with d = 4 and average AR = 0.3501, whereas the proposed FG-Lasso discards these non-informative features, with d* = 2 and a high average AR = 0.961. The details of the features discarded as the value of $ \lambda $ increases are shown in Table 2. When $ \lambda $ is increased to 10, we obtain $ {\hat \mu _{13}} = {\hat \mu _{23}} = {\hat \mu _{33}} = 0 $ and so $ featur{e_3} $ is discarded. Similarly, when $ \lambda $ is increased to 15, we obtain $ {\hat \mu _{14}} = {\hat \mu _{24}} = {\hat \mu _{34}} = 0 $ and $ featur{e_4} $ is discarded. Thus, all irrelevant features $ featur{e_3} $ and $ featur{e_4} $ are successfully discarded when $ \lambda $ reaches 15. From Table 2, we find that features $ featur{e_1} $ and $ featur{e_2} $ are informative, while $ featur{e_3} $ and $ featur{e_4} $ are non-informative and are therefore discarded.
c | F-MB-Gauss | FG-Lasso | ||
d | AR | d* | AR | |
3 | 4 | 0.3501 | 2 | 0.961 |
Features | $ \lambda $ | |
10 | 15 | |
$ featur{e_1} $ | - | - |
$ featur{e_2} $ | - | - |
$ featur{e_3} $ | $ \hat{\mu}_{13}=\hat{\mu}_{23}=\hat{\mu}_{33}=0 $ | $ \times $ |
$ featur{e_4} $ | - | $ \hat{\mu}_{14}=\hat{\mu}_{24}=\hat{\mu}_{34}=0 $ |
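For reproducibility, the 3-cluster Gaussian data of Example 1 can be generated as in the following sketch; the random seed is arbitrary and not specified in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two informative features from the 3-component Gaussian mixture of Example 1
means = np.array([[1.0, 1.0], [3.0, 5.0], [5.0, 1.5]])
informative = np.vstack([rng.multivariate_normal(m, np.eye(2), 600) for m in means])
labels = np.repeat([0, 1, 2], 600)
# Two non-informative features from uniform distributions
feature3 = rng.uniform(-5, 5, 1800)
feature4 = rng.uniform(-10, 10, 1800)
X = np.column_stack([informative, feature3, feature4])    # (1800, 4) data set
```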
Example 2. We also use a simulated data set to demonstrate, for the proposed FG-Lasso algorithm, the significance of $ \lambda $ for feature selection and for removing irrelevant features. A data set, called the 5-cluster Gaussian data, with 800 points is generated from a Gaussian mixture with $ {\alpha _1} = {\alpha _2}{\rm{ = }}{\alpha _3}{\rm{ = }}{\alpha _4} = {\alpha _5} = 1/5 $, $ {\mu _1} = (8\, \, \, 10) $, $ {\mu _2} = (8\, \, \, 20) $, $ {\mu _3} = (15\, \, \, 10) $, $ {\mu _4} = (15\, \, \, 20) $, and $ {\mu _5} = (11\, \, \, 16) $ with $ {\sum _1} = {\sum _2} = {\sum _3} = {\sum _4} = {\sum _5} = (1\, \, \, \, \, 0;\, 0\, \, \, \, \, 1) $ . We consider the two features, named $ featur{e_1} $ and $ featur{e_2} $, as informative. We then add one irrelevant feature generated from a uniform distribution over the interval [0, 65], named $ featur{e_3} $, which is considered a non-informative feature. The original data set with the two informative features $ featur{e_1} $ and $ featur{e_2} $ is shown in Figure 2(a). Figure 2(b) shows the final clustering results of the F-MB-Gauss algorithm with 5 cluster centers after 1270 iterations. The final clustering results of the proposed FG-Lasso algorithm with 5 clusters after 15 iterations are shown in Figure 2(c). The average ARs of the FG-Lasso and F-MB-Gauss algorithms over 50 different initializations are shown in Table 3. It is seen that the proposed FG-Lasso is effective for feature selection, with a final feature number of d* = 2. Clearly, the irrelevant feature distorts the final clustering results of the F-MB-Gauss algorithm, which has no feature selection behavior, giving the final feature number d = 3 and average AR = 0.320. In contrast, the proposed FG-Lasso discards the non-informative feature $ featur{e_3} $ and achieves a high average AR = 0.810. The details of the discarded feature as the value of $ \lambda $ increases are shown in Table 4. When $ \lambda $ is increased to 80, we obtain $ {\hat \mu _{{\rm{13}}}} = {\hat \mu _{{\rm{43}}}} = {\hat \mu _{{\rm{53}}}} = 0 $, and when $ \lambda $ is further increased to 105, we obtain $ {\hat \mu _{{\rm{23}}}} = {\hat \mu _{{\rm{33}}}} = 0 $ . Thus, the irrelevant feature $ featur{e_3} $ is successfully discarded as $ \lambda $ increases from 80 to 105. From Table 4, we find that features $ featur{e_1} $ and $ featur{e_2} $ are informative, while $ featur{e_3} $ is non-informative and is therefore discarded.
c | F-MB-Gauss | FG-Lasso | |
d | AR | d* | AR | |
5 | 3 | 0.320 | 2 | 0.810 |
Features | $ \lambda $ | |
80 | 105 | |
$ featur{e_1} $ | - | - |
$ featur{e_2} $ | - | - |
$ featur{e_3} $ | $ \hat{\mu}_{13}=\hat{\mu}_{43}=\hat{\mu}_{53}=0 $ | $ {{\hat \mu }_{23}} = {{\hat \mu }_{33}} = 0 $ |
In addition to the above two synthetic data sets, we also use a real data set, Pima Indian, from the UCI machine learning repository [33].
Example 3 (Pima Indian [33]). In this example, we consider the real Pima Indian data set [33]. This data set consists of 8 features, namely the number of pregnancies, plasma glucose concentration at 2 hours in an oral glucose tolerance test, diastolic blood pressure (mm Hg), triceps skin fold thickness (mm), 2-hour serum insulin (mu U/ml), body mass index (weight in kg/(height in m)^2), diabetes pedigree function, and age (years), together with a class variable (Outcome). The data set has two classes. Using F-MB-Gauss, we obtain an average AR = 0.45 over 30 different initializations. When we increase $ \lambda $ to 50, the proposed FG-Lasso algorithm discards the features of plasma glucose concentration at 2 hours in an oral glucose tolerance test, diastolic blood pressure, and 2-hour serum insulin, and achieves a higher average AR = 0.67. This shows the merit of the proposed FG-Lasso clustering algorithm on the Pima Indian data set.
Cancer is a group of diseases characterized by the uncontrolled growth of abnormal cells in the body [34]. According to WHO (World Health Organization) estimates, it is the first or second leading cause of death before the age of 70 in 91 of 172 countries [35]. It spreads to and affects other parts of the body if it is not properly diagnosed. This severe disease has many symptoms, such as tumors, abnormal bleeding, long-term cough, and severe weight loss. According to the Global Cancer Incidence, Mortality and Prevalence (GLOBOCAN) statistics, which cover 36 cancer types, lung cancer is the most common (11.6% of total cases in males and females) [36]. Other alarming leading cancers are breast cancer (11.6%), prostate cancer (7.1%), colorectal cancer (6.1%), stomach cancer (8.2%) and liver cancer (8.2%) [35,36]. Breast cancer is the most commonly diagnosed cancer among females [36,37,38] and the leading cancer among women in 154 out of 185 countries [38]. According to WHO findings, 2.1 million women were newly diagnosed with breast cancer in 2018. The countries with the highest rates are Australia/New Zealand, the United Kingdom, Sweden, Finland, Denmark, Belgium (the highest rate), the Netherlands and France. There are many risk factors for breast cancer, such as family history, physical activity, breast feeding, hormone intake, alcohol intake, greater weight and body fat [36,39]. There are three important techniques to diagnose breast cancer, namely mammography, fine needle aspirate (FNA) biopsy and surgical biopsy. To demonstrate the applicability of the proposed FG-Lasso clustering algorithm, we use three real cancer data sets: Breast Cancer Wisconsin, Colon tissues, and Leukemia. We implement the FG-Lasso and F-MB-Gauss algorithms on the three cancer data sets and compare their results.
Example 4 (Breast Cancer Wisconsin). The Breast Cancer Wisconsin data set was created by Street et al. [40]. It consists of 569 FNA samples with 212 malignant (patient samples) and 357 benign (healthy samples) cases [36,40]. The 30 attributes are computed from a digitized image of an FNA of a breast mass and describe characteristics of the cell nuclei present in the image. Ten real-valued features are computed for each cell nucleus, namely radius (from the center to points on the perimeter), texture (gray-scale values), perimeter, area, smoothness (local variation in radius lengths), compactness (perimeter^2/area − 1.0), concavity (severity of concave portions of the contour), concave points (number of concave portions of the contour), symmetry, and fractal dimension ("coastline approximation" − 1), where each of the 10 features has three components (mean, standard error and worst), giving 30 features in total. The FG-Lasso and F-MB-Gauss algorithms are implemented on the Breast Cancer Wisconsin data, and the clustering results are shown in Table 5 (a usage sketch with this data set follows Table 5). From Table 5, it is seen that the F-MB-Gauss algorithm obtains a good accuracy rate, with d = 30 and AR = 0.905, over 30 different initializations. Using the same 30 initializations, we also implement the proposed FG-Lasso algorithm on this data set. With the $ \lambda $ value set to 12, 27 of the 30 features are selected (d* = 27) with a slightly better accuracy rate of AR = 0.925, where the features area mean, area standard error and area worst are considered unimportant and are discarded.
c | F-MB-Gauss | FG-Lasso | ||
d | AR | d* | AR | |
2 | 30 | 0.905 | 27 | 0.925 |
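The Breast Cancer Wisconsin data of Example 4 is available, for instance, through scikit-learn. The following usage sketch ties together the helper functions sketched in Section 2; the standardization step and the handling of $ \lambda $ are assumptions for illustration, not the authors' exact protocol.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()                               # 569 samples, 30 features
X = (data.data - data.data.mean(axis=0)) / data.data.std(axis=0)  # standardize (assumption)

mu_hat, z, sigma2 = fg_lasso(X, c=2, lam=12.0, seed=0)     # lambda = 12 as in Example 4
pred = z.argmax(axis=0)                                    # hardened cluster labels
discarded = np.where(~np.any(mu_hat != 0, axis=0))[0]      # features shrunk to zero in all clusters
print("discarded features:", [data.feature_names[j] for j in discarded])
print("AR =", accuracy_rate(data.target, pred))
```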
Next, we apply the FG-Lasso and F-MB-Gauss algorithms to the Colon tissue data set [41].
Example 5 (Colon Tissues). The Colon tissue data set contains 62 samples from microarray experiments on colon tissue samples, with 2000 genes and two classes (40 tumor tissues and 22 normal tissues) [41]. Colorectal cancer is the kind of cancer that starts as tissue or tumor growth on the inner lining of the colon. It ranks third in incidence and second in mortality worldwide [41]. The countries most affected by colon cancer are Hungary, Slovenia, the Netherlands, Norway, Australia/New Zealand, North America, Japan, the Republic of Korea, and Singapore (in females). Among these countries, Hungary ranks first for males and Norway ranks first for females, while the highest colon cancer incidence rates are found in the Republic of Korea among males and Macedonia among females [36,42]. We first standardize the data set and then apply the FG-Lasso and F-MB-Gauss algorithms to the Colon tissue data; the clustering results are shown in Table 6. From Table 6, it is seen that the F-MB-Gauss algorithm obtains an average accuracy rate AR = 0.601 over 30 different initializations. When the proposed FG-Lasso algorithm is implemented on this data set with the same 30 initializations, 1384 of the 2000 features are selected (d* = 1384) when the $ \lambda $ value is increased to 90, with a better average accuracy rate AR = 0.653, as shown in Table 6. This demonstrates that the proposed FG-Lasso algorithm is effective for feature selection on the Colon tissue data set.
c | F-MB-Gauss | FG-Lasso | ||
d | AR | d* | AR | |
2 | 2000 | 0.601 | 1384 | 0.653 |
Finally, we implement the FG-Lasso and F-MB-Gauss algorithms on the Leukemia cancer data set. Leukemia is a type of cancer in which uncontrolled growth of hematopoietic stem cells occurs in the bone marrow. It is most common in white males, and its incidence increases with age [43,44]. Broadly, there are four subtypes of leukemia: acute lymphoblastic, acute myelogenous, chronic lymphocytic, and chronic myelogenous. Acute lymphoblastic leukemia (ALL) is most commonly found in children, while the other three types occur in adults. ALL causes fever, lethargy, bleeding, and musculoskeletal pain or dysfunction. On the other hand, fever, fatigue, weight loss, and bleeding or bruising are the most common symptoms of acute myelogenous leukemia (AML) [44]. According to the global cancer statistics [36], leukemia accounted for 2.4% of new cancer cases and 3.2% of cancer deaths worldwide in 2018.
Example 6 (Leukemia Data). The Leukemia data set was originally considered by Golub et al. [45]. The data consist of 38 leukemia patients, each represented by a biological sample array, with 7129 genes as features. Among these samples, 27 are acute lymphoblastic leukemia (ALL) and 11 are acute myelogenous leukemia (AML). Golub et al. [45] distinguished the two types of patients because they require separate clinical treatments. We keep only the 2000 genes with the largest variances and standardize the data so that each attribute has mean 0 and variance 1. We implement the FG-Lasso and F-MB-Gauss algorithms on the Leukemia data set, and the clustering results are shown in Table 7. From Table 7, it is seen that the F-MB-Gauss algorithm obtains a low average accuracy rate AR = 0.393 over 30 different initializations. This shows that the irrelevant features in the Leukemia data set strongly affect the clustering results. When the proposed FG-Lasso algorithm is applied to this data set, 654 of the 2000 features are selected (d* = 654) when the $ \lambda $ value is increased to 350, which improves the accuracy rate to AR = 0.615. That is, the proposed FG-Lasso algorithm achieves significant feature selection on the Leukemia data set by selecting 654 of the 2000 features.
c | F-MB-Gauss | FG-Lasso | ||
d | AR | d* | AR | |
2 | 2000 | 0.393 | 654 | 0.615 |
The F-MB-Gauss clustering proposed by Yang et al. [20] always treats the feature components of data points with equal importance, and so it has no feature selection behavior. However, data generally contain irrelevant features that may badly affect the performance of clustering algorithms. In this paper, we extended F-MB-Gauss clustering to fuzzy Gaussian Lasso (FG-Lasso) clustering by adding a Lasso penalty term on the Gaussian mean components. The FG-Lasso algorithm was then proposed for clustering data sets with feature selection. The proposed FG-Lasso behaves well and is a good choice for feature selection. Several experimental results and comparisons have demonstrated the feature selection capability of the proposed FG-Lasso algorithm. According to WHO estimates, cancer is the first or second leading cause of death. This severe disease has many symptoms, such as tumors, abnormal bleeding, long-term cough, and severe weight loss. Cancer data are important medical data that are high-dimensional and contain many irrelevant features. In this paper, we also applied the proposed FG-Lasso algorithm to three cancer data sets: Breast Cancer Wisconsin, Colon tissues, and Leukemia. According to the clustering results, the proposed FG-Lasso can select the important features with higher accuracy as the threshold $ \lambda $ increases. However, an open question is what value of the threshold $ \lambda $ yields an optimal number of features in the FG-Lasso clustering algorithm. Finding a good estimate of the threshold parameter $ \lambda $ is therefore important and will be a topic of our future research. On the other hand, considering a full covariance matrix, not only a diagonal one, is another open problem for FG-Lasso and will also be part of our future work.
The authors would like to thank the anonymous referees for their helpful comments in improving the presentation of this paper.
The authors declare there is no conflict of interest.