1. Introduction
Clustering is a cornerstone of data analysis, essential across various domains including data mining [1], pattern recognition [2], information retrieval [3], and engineering optimization [4,5]. Among the plethora of clustering algorithms, K-means is renowned for its simplicity [6], computational efficiency, and scalability [7]. It divides a dataset into a predetermined number of clusters [8,9], each anchored by a centroid that represents the average of the data points assigned to it. The method is widely adopted because it is straightforward to implement and handles large datasets efficiently. In this paper, we focus on addressing key challenges faced by the K-means algorithm, with the aim of enhancing its robustness and applicability in practical scenarios. We identify three primary challenges that affect its performance and propose comprehensive solutions to overcome them [10,11].

The first challenge is the sensitivity of K-means to outliers. Outliers, data points that deviate significantly from the majority, can distort centroid computation and undermine clustering accuracy. To mitigate this, we employ outlier detection techniques and apply winsorization, which replaces values below the lower threshold with the lower threshold value and values above the upper threshold with the upper threshold value, thus mitigating the impact of outliers without data loss [12,13].

The second challenge arises when datasets contain clusters with non-spherical shapes. Traditional K-means operates under the assumption of spherical and isotropic clusters, which may not hold for datasets containing elongated, irregular, or overlapping clusters. To address this, we employ the KROMD method, which combines the rank order distance (ROD) technique with Gaussian kernels to transform non-spherical data into a representation better suited to K-means clustering. This transformation enhances the algorithm's ability to accurately capture the underlying structure of the data [14,15].

The third challenge is determining the optimal number of clusters (k). Selecting an appropriate value for k is pivotal in obtaining meaningful and interpretable clustering results, yet this task often relies on domain knowledge or heuristic methods that are prone to uncertainty and computational inefficiency. We propose an enhanced approach based on the gap statistic, incorporating an exponential distribution to automatically determine the optimal number of clusters. This approach provides a more accurate and effective means of selecting the number of clusters than traditional methods [16,17,18,19].
The following sections are organized as follows: Section 2 reviews the relevant literature. Section 3 describes our methodology, including outlier detection and handling outliers by using the winsorization method (Section 3.1), the transformation of non-spherical data (Section 3.2), and the enhanced gap statistic for optimal clustering (Section 3.3). Section 4 presents the experimental setup and descriptive statistics of the dataset. Section 5 discusses the results, covering outlier mitigation (Section 5.1), data transformation accuracy (Section 5.2), optimal cluster selection (Section 5.3), and research contributions (Section 5.4). Finally, Section 6 provides the conclusion of the research.
2. Related work
This section of the paper discusses various literature reviews and related methods for outlier detection, transforming non-spherical data into spherical form, and selecting the optimal number of clusters in K-means.
2.1. Impact of outliers in K-means
The literature emphasizes the significance of outlier detection, particularly in contexts such as fraud and fault detection. The density peak clustering algorithm suffers from manual parameter setting and high time complexity; a proposed remedy substitutes density peaks with k-nearest neighbors clustering and automates the selection of clustering centers based on density and distance [20]. Outlier detection in batch and streaming data plays a crucial role in data mining, but existing algorithms have shortcomings. For batch data, accuracy suffers because histogram-based feature vectors label only a limited number of data points; streaming algorithms are hindered by sensitivity to data distance and lengthy parameter tuning. To overcome these challenges, the authors introduced PDC (probability density-based clustering), which leverages probability density for lightweight outlier detection, ensuring accuracy and insensitivity to data distance [21,22]. Seismic clustering serves as a vital technique in seismology for identifying patterns in seismic events and offering insights into geological processes, yet its application to ongoing landslide-induced signals and the impact of outliers have received little research attention. One study presented a novel consensus clustering strategy with outlier removal for landslide-induced seismic signals; the approach incorporated a parameter setting method to improve clustering accuracy and robustness, and experimental results showed that it outperformed state-of-the-art clustering methods [23].
2.2. Impact of non-spherical data in K-means: K-nearest neighbors (KNN)
The data is transformed to a more spherical shape using a whitening transformation or principal component analysis (PCA) followed by normalization. This standardizes the data to zero mean and unit variance, removes correlations with PCA, and scales the dataset for uniform variance via $z_i = (x_i - \mu)/\sigma$, where $x_i$ is the original data point, $\mu$ is the mean, and $\sigma$ is the standard deviation. PCA and normalization are then applied as $w_i = \mathrm{PCA}(z_i)/\sqrt{\lambda}$, where $\mathrm{PCA}(z_i)$ is the principal component transformation and $\lambda$ is its eigenvalue. Transforming non-spherical data into a spherical form enhances algorithms such as K-nearest neighbors (KNN) by improving distance measurements and overall accuracy across applications [24].
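As a minimal illustration of this preprocessing step, the sketch below standardizes a synthetic non-spherical dataset and applies PCA with whitening. The use of scikit-learn's StandardScaler and PCA(whiten=True), and the synthetic data itself, are our own assumptions for the example rather than part of the cited work.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic elongated (non-spherical) data, used only for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.5, 0.5]])

# Step 1: z-score standardization, z_i = (x_i - mu) / sigma.
Z = StandardScaler().fit_transform(X)

# Step 2: PCA with whitening, w_i = PCA(z_i) / sqrt(lambda), which decorrelates
# the components and rescales them to unit variance (a spherical cloud).
W = PCA(whiten=True).fit_transform(Z)

print(np.cov(W, rowvar=False).round(2))  # close to the identity matrix
```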
Spectral clustering: The process begins with constructing an $n \times n$ affinity matrix $A$, where each element $A_{ij}$ quantifies the similarity between data points $s_i$ and $s_j$ using a Gaussian kernel, $A_{ij} = \exp\left(-\lVert s_i - s_j \rVert^2 / 2\sigma^2\right)$ for $i \neq j$, with the diagonal entries $A_{ii}$ set to zero (self-similarity). Next, a diagonal matrix $D$ is defined, with $D_{ii}$ representing the sum of similarities in the $i$-th row of $A$. The normalized matrix $L = D^{-1/2} A D^{-1/2}$ is then computed to emphasize relationships between data points [14]. The eigenvectors $x_1, x_2, x_3, \ldots, x_k$ of $L$ corresponding to the largest eigenvalues are found (orthogonal to each other) and arranged into a matrix $X$. Each row $X_i$ of $X$ is normalized so that $Y_{ij} = X_{ij} / \sqrt{\sum_j X_{ij}^2}$.
In reduced dimensionality Rk, each row of matrix Y is treated as a point and clustered using algorithms like K-means to minimize intra-cluster and maximize inter-cluster distances.
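The sketch below follows these steps with NumPy and scikit-learn. The function name, the bandwidth parameter sigma, and the small regularization constant are our own illustrative choices.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def spectral_clustering(S, k, sigma=1.0):
    """Spectral clustering on points S (n x d), following the steps above."""
    # Affinity matrix with a Gaussian kernel; self-similarity set to zero.
    A = np.exp(-cdist(S, S, metric="sqeuclidean") / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)

    # Normalized matrix L = D^{-1/2} A D^{-1/2}.
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    L = D_inv_sqrt @ A @ D_inv_sqrt

    # Top-k eigenvectors of L (columns of X), row-normalized to Y.
    _, eigvecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    X = eigvecs[:, -k:]
    Y = X / np.linalg.norm(X, axis=1, keepdims=True)

    # Treat each row of Y as a point in R^k and cluster with K-means.
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Y)
```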
Principal component analysis (PCA): PCA was a method used to reduce the dimensionality of data while maintaining its essential structure. This was achieved by transforming the data into a new coordinate system defined by its principal components [25].
Mean calculation: Calculate the mean vector $\mu$ of the data points: $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$.
Covariance matrix: Compute the covariance matrix $\Sigma$ to understand the relationships between the dimensions of the data: $\Sigma = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)(x_i - \mu)^T$.
Eigen decomposition: Perform eigen decomposition on $\Sigma$ to obtain its eigenvalues and eigenvectors: $\Sigma = Q \Lambda Q^T$, where $Q$ is a matrix containing orthogonal eigenvectors and $\Lambda$ is a diagonal matrix of eigenvalues.
Select principal components: Choose the top k eigenvectors associated with the largest eigenvalues to form the projection matrix W.
Data projection: Project the original data $x_1, x_2, x_3, \ldots, x_n$ onto the principal components $W$: $z_i = W^T(x_i - \mu)$, where $z_i$ represents the data point in the reduced-dimensional space. PCA is extensively used in various fields for dimensionality reduction and data preprocessing. It converts non-spherical data into a spherical form, making it suitable for clustering algorithms that assume spherical clusters, such as K-means.
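A compact NumPy sketch of these five steps is given below; the function name and interface are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def pca_project(X, k):
    """Project X (n x d) onto its top-k principal components."""
    mu = X.mean(axis=0)                        # mean vector
    Xc = X - mu
    Sigma = Xc.T @ Xc / X.shape[0]             # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigen decomposition of Sigma
    order = np.argsort(eigvals)[::-1]          # eigenvalues in descending order
    W = eigvecs[:, order[:k]]                  # projection matrix of top-k eigenvectors
    return Xc @ W                              # z_i = W^T (x_i - mu)
```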
Rank order distance: Euclidean distance is a commonly used similarity measure in clustering, with a long history of application. It is calculated between two samples, $a$ and $b$, as the square root of the sum of the squared differences between their respective features: $d(a,b) = \sqrt{\sum_i (a_i - b_i)^2}$, where $a_i$ and $b_i$ are the $i$-th features of $a$ and $b$. Traditional clustering methods, such as K-means and single-link, rely on this metric. However, when dealing with non-spherical data characterized by high-level noise, the effectiveness of mining data clusters with the Euclidean distance is limited. Consequently, researchers have explored alternative similarity measures, such as the Gaussian kernel and rank order distance (ROD), to address these challenges [26]. Unlike the Euclidean distance, these measures involve a structural alignment process on the samples, enhancing the ability to uncover true data structures for clustering purposes. The rank order distance from $a$ to $b$ is calculated as $R(a,b) = \sum_{k=0}^{O_a(b)} O_b(f_a(k))$.
In this context, $f_a(k)$ denotes the element ranked $k$-th in $a$'s distance list, while $O_a(b)$ represents the rank of $b$ in $a$'s list. The rank order distance (ROD) between $a$ and $b$ is then $\mathrm{ROD}(a,b) = \dfrac{R(a,b) + R(b,a)}{\min\left(O_a(b), O_b(a)\right)}$. In scenarios involving non-spherical data and noise, ROD excels over the Euclidean distance in assessing sample similarities because it incorporates neighborhood structures.
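The following sketch computes pairwise ROD values directly from this definition; the function name and the use of Euclidean base distances via SciPy are our assumptions for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist

def rank_order_distance(X):
    """Pairwise rank order distance (ROD) matrix for points X (n x d)."""
    n = X.shape[0]
    D = cdist(X, X)                    # Euclidean base distances
    order = np.argsort(D, axis=1)      # order[a, k] = f_a(k), the k-th neighbor of a
    rank = np.argsort(order, axis=1)   # rank[a, b]  = O_a(b), rank of b in a's list

    def R(a, b):
        # R(a, b) = sum_{k=0}^{O_a(b)} O_b(f_a(k))
        return rank[b, order[a, : rank[a, b] + 1]].sum()

    ROD = np.zeros((n, n))
    for a in range(n):
        for b in range(a + 1, n):
            ROD[a, b] = ROD[b, a] = (R(a, b) + R(b, a)) / min(rank[a, b], rank[b, a])
    return ROD
```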
Clustering, especially K-means, has been widely used in signal processing for its efficiency and simplicity. However, K-means clustering struggles with non-spherical clusters. To address this, the self-weighted Euler K-means (SWEKM) model was proposed, integrating clustering and feature selection while using a Euler kernel to manage noisy points and outliers. Experiments on UCI datasets demonstrated that SWEKM outperformed state-of-the-art kernel K-means in handling non-spherical clusters in signal-processing tasks [27]. The resting dynamics of non-spherical particles were studied using a sharp interface–immersed boundary method and a kinematic-based collision model. Simulations showed that hydrodynamic moments, influenced by Reynolds number (Re), affected angular velocities but not trajectories. Using the shape factor K-n, the best scaling was achieved with the projected area of non-spherical particles. A linear relationship between mean K-n and Re was found, highlighting the effectiveness of particle-resolved simulations for modeling non-spherical particles [28].
2.3. Optimal number of clusters in K-means: Davies-Bouldin index (DBI)
This index helps to determine the optimal number of clusters in a dataset by evaluating the similarity of each point to every cluster. It considers both the dispersion and dissimilarity within clusters. The index aims to find clusters that are compact and well-separated. The optimal number of clusters, identified by the minimum value of the index, represents the configuration where clusters are maximally distinct and internally cohesive [29].
where $c$ represents the total number of clusters, $n_i$ represents the number of points in cluster $i$, and $c_i$ is the centroid of cluster $C_i$. This index measures the within-cluster dispersion relative to the separation between clusters.
Calinski-Harabasz index: This index calculates the average sum of squared distances between clusters (inter-cluster) and within clusters (intra-cluster). It provides a faster approach for determining the optimal number of clusters compared with other indices. The index aims to maximize the dispersion between clusters while minimizing it within clusters. The optimal number of clusters (ONC) is represented by the maximum value of this index, indicating that the clusters are both compact and well-separated [30].
In the above Eq (2), $B_m$ denotes the between-cluster scatter matrix, $W_m$ denotes the internal (within-cluster) scatter matrix, $N$ is the total number of clustered samples, and $c$ indicates the number of clusters, where $W_m = \sum_{i=1}^{c} \sum_{x \in C_i} (x - c_i)(x - c_i)^T$ and $B_m = \sum_i n_i (c_i - k)(c_i - k)^T$. $C_i$ denotes the points that comprise cluster $i$, $n_i$ represents the number of points in cluster $C_i$, and $k$ is the center of the entire dataset.
Silhouette score: The silhouette method evaluates clustering performance by considering two key factors: cohesion and separation. Cohesion measures how similar an object is to its own cluster, while separation assesses how distinct a cluster is from the others. This evaluation is quantified using the silhouette score, which ranges from -1 to 1. A score close to 1 indicates a strong association between an object and its cluster, suggesting effective clustering, while a low score indicates poor clustering quality [17]. A high average silhouette score across a dataset suggests that the clustering model is both appropriate and reliable. The silhouette score is calculated as follows:
In Eq (3), $S(i)$ is the silhouette score for the $i$-th data point, indicating its clustering quality. $A(i)$ is the average distance from the $i$-th data point to the other points within the same cluster. $B(i)$ is the smallest average distance from the $i$-th data point to the points in any other cluster.
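As a brief usage sketch (assuming scikit-learn and synthetic blob data, neither of which is part of the paper), the average silhouette score can be compared across candidate values of k as follows.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# Average silhouette score for each candidate k; the highest score indicates
# the most cohesive and well-separated clustering.
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))
```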
Elbow method: The elbow method determines the optimal number of clusters (K) by calculating the squared Euclidean distances between data points and their cluster centroids for a series of K values. The sum of squared errors (SSE) measures clustering performance, with lower SSE indicating tighter clustering. As K increases, the SSE decreases sharply until reaching an "elbow" point, which suggests the optimal cluster number [31]. However, this point can be subjective, and adding more clusters beyond it does not significantly improve clustering performance.
In Eq (4), after reaching the true cluster count, the SSE still decreases, but the rate of reduction slows significantly, indicating diminishing returns from adding more clusters.
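A minimal sketch of the elbow procedure is shown below, assuming scikit-learn's KMeans (whose inertia_ attribute is the within-cluster SSE) and synthetic data chosen for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# SSE (inertia_) for each candidate k; the "elbow" in this sequence,
# where the decrease slows sharply, suggests the cluster count.
for k in range(1, 11):
    sse = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    print(k, round(sse, 1))
```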
Gap statistic analysis: The gap statistic, developed by Tibshirani, determines the optimal number of clusters in datasets with unknown classifications. It uses Monte Carlo sampling to create reference distributions, which benchmark the sum of squared Euclidean distances within clusters [32,33]. By comparing these results to a zero-mean reference distribution, the optimal number of clusters is identified. The calculation is as follows:
In the above Eq (5), $E[\log(W_k)]$ represents the expected value of the logarithm of the within-cluster dispersion $W_k$ for $k$ clusters. The within-cluster dispersion $W_k$ measures how compact the clusters are, typically quantified by the sum of squared distances of points within each cluster to their centroid, and $\log(W_k)$ is the logarithm of the actual within-cluster dispersion for $k$ clusters. To use the gap statistic in practice, $W_k$ is calculated for a range of $k$ values, $E[\log(W_k)]$ is estimated (often using a Monte Carlo simulation approach), and then $\mathrm{Gap}(k)$ is computed for each $k$. The value of $k$ where $\mathrm{Gap}(k)$ is maximized or shows a clear peak is chosen as the optimal number of clusters for the dataset.
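The sketch below estimates Gap(k) with a uniform reference distribution over the bounding box of the data; the choice of reference distribution, the number of reference sets B, and the helper names are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def log_wk(data, k):
    """log of the within-cluster sum of squares W_k from a K-means fit."""
    return np.log(KMeans(n_clusters=k, n_init=10, random_state=0).fit(data).inertia_)

def gap_statistic(X, k, B=10, seed=0):
    """Gap(k) = E[log(W_k*)] - log(W_k), with the expectation estimated from
    B reference datasets drawn uniformly over the bounding box of X."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    ref = [log_wk(rng.uniform(lo, hi, size=X.shape), k) for _ in range(B)]
    return float(np.mean(ref) - log_wk(X, k))
```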
This comprehensive review, summarized in Table 1, examines the prevalent methods used in the K-means algorithm, highlighting the inherent limitations of each method and their suitability for specific dataset types. It is evident that traditional methodologies often fail when applied to different datasets.
3. Methodology
This study addresses three primary challenges associated with the K-means clustering algorithm: the detection and handling of outliers, the transformation of non-spherical data into spherical form, and the selection of the optimal number of clusters. The methodologies developed to tackle these challenges are detailed as follows:
Detection and handling of outliers: One of the major limitations of the K-means algorithm is its sensitivity to outliers. To address this issue, this paper employs the winsorization method, as discussed in Section 3.1.
Conversion of non-spherical data to spherical form using KROMD method: K-means performs poorly in the presence of non-spherical data. To mitigate this problem, this paper introduces the KROMD method, which combines Manhattan distance with a Gaussian kernel. The details of this approach are explained in Section 3.2.
Selection of the optimal number of clusters by enhancing the gap statistic: Selecting the optimal number of clusters is a challenging task in K-means clustering. This paper enhances the gap statistic by standardizing expected reference data to overcome this limitation. The detailed methodology is provided in Section 3.3.
Figure 1(a) illustrates the process of detecting outliers and handling them by applying the winsorization method. Figure 1(b) depicts the process of converting non-spherical data into a spherical form using the KROMD method, which combines ROD and Gaussian kernel techniques. Figure 1(c) outlines the method for selecting the optimal number of clusters in K-means by enhancing the gap statistic through the standardization of reference data. The method calculates the within-cluster sum of squares for both the original and the reference data; in place of expected values, the algorithm applies a standardization that clearly and effectively selects the optimal number of clusters, as discussed mathematically in Section 3.3.6.
Sequential breakdown of the flowchart in Figure 1:
1) Winsorization of outliers
Input: dataset; load X from the data
Output: winsorized dataset
1: Choose the detection method (z-score or IQR)
2: Initialize outliers = []
3: Set the threshold values
4: Compute the lower and upper bounds:
5: IQR = Q3 − Q1
6: Lower bound = Q1 − 1.5 × IQR
Upper bound = Q3 + 1.5 × IQR
7: Transform each data point: for each value x, apply the following transformation
8: x_winsorized =
9: lower bound if x < lower bound; upper bound if x > upper bound; x otherwise
10: Return the winsorized dataset
2) Conversion to spherical data
Input: Irregular dataset containing a high level of noise.
Output: A cluster set 'C' and an "un-grouped" cluster 'Cun'.
1: Initialize clusters 'C' as {C1, C2, ..., CN} according to the similarity of the data, ranking the dataset in ascending or descending order.
2: Repeat the following steps:
3: For all pairs (Ci, Cj) in 'C', do the following:
4: Calculate the ranking of the dataset
5: Calculate the rank order Manhattan distance
6: Identify ⟨Ci, Cj⟩ as a candidate merging pair
7: Apply the Gaussian kernel
8: End the conditional statement.
9: The process stops once all the data has been converted from non-spherical to spherical form.
3) Optimal number of clusters
Input: dataset; load X from the data
Output: K (the optimal number of clusters for K-means)
1: define SampleNum, P, MaxK, u, Sigma
2: SampleSet = []
3: Size(u) = [Um, ]
4: for i = 1 : Um do
5: SampleSet = [SampleSet; mvnrnd(u(i, :), Sigma, fix(SampleNum/Um))]
6: Wk = log(computeWk(SampleSet, MaxK))
7: for b = 1 : P do
8: Wkb = log(computeWk(RefSet(:, :, b), MaxK))
9: for k = 1 : MaxK, OptimumK = 1
10: EGS_k = [log(W*_kb) − (−γ − log(λ))] / (π/√6 · √(1 + 1/B))
11: Select the value of k with the optimal EGS_k.
3.1. Outlier detection and handling
Outliers are data points that differ significantly from most observations, arising from natural variability or data collection errors. It is essential to identify and manage outliers, as they can greatly impact analysis results, potentially causing misleading conclusions. Winsorization is a statistical technique that reduces the influence of extreme values by adjusting outliers to the nearest specified percentile values. This method helps to lessen the effect of potentially erroneous outliers [46]. Winsorization of outliers: Outliers are identified within the dataset using the interquartile range (IQR) method, which allows for the detection of data points lying outside the typical range. Subsequently, the winsorization technique is employed to handle these outliers, whereby extreme values are replaced with the nearest value within a specified percentile range, ensuring a more balanced and representative dataset for analysis [47].
where $Q_1$ is the first quartile (25th percentile) and $Q_3$ is the third quartile (75th percentile). The winsorization upper and lower bounds are defined as follows:
Each data point is transformed for each value of X by applying the following transformation:
In the above Eq (7), to mitigate the impact of outliers on the dataset, values below the lower bound are set to the lower bound and values above the upper bound are set to the upper bound, while values within the bounds remain unchanged. Outliers are identified by calculating the interquartile range (IQR) to gauge the data spread; data points lying beyond 1.5 times the IQR are flagged as outliers, and their values are adjusted to the nearest bound before being reintroduced into the dataset. This treatment significantly influences the clustering process, enabling a robust evaluation of winsorization combined with the K-means method.
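A minimal sketch of this IQR-based winsorization is given below; the 1.5 factor is the standard fence used above, and the function name is an illustrative choice. For a multi-feature table such as the well log data, the same function can be applied column by column.

```python
import numpy as np

def winsorize_iqr(x, factor=1.5):
    """Detect outliers with the IQR rule and clip them to the bounds (Eqs (6)-(7))."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower = q1 - factor * iqr
    upper = q3 + factor * iqr
    # Values below the lower bound become the lower bound, values above the
    # upper bound become the upper bound, and everything else is unchanged.
    return np.clip(x, lower, upper)
```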
3.2. Conversion of non-spherical data to spherical form
In the presence of non-spherical data, the K-means algorithm often underperforms. To address this challenge, this paper proposes a novel approach called KROMD, which combines the ROD (rank order distance) technique with a Gaussian kernel method. The performance of KROMD is evaluated by comparing it with established methods such as KNN (K-nearest neighbors), spectral clustering, PCA, and ROD.
3.2.1. Kernelized rank order Manhattan distance (KROMD)
To effectively cluster non-spherical data with high noise levels, we introduce a novel similarity measure called Kernelized rank order Manhattan distance (KROMD). This measure integrates rank order distance (ROD) with a Gaussian kernel. Non-spherical data refers to clusters that deviate from spherical shapes, while high noise indicates numerous data points between clusters, causing them to overlap [14].
3.2.2. ROD for noise tolerance
Traditional ROD has limited capability in handling high noise levels. We enhance ROD by selectively considering only two distance ranks for each sample pair, thus reducing the influence of noise-related ranks [48]. The refined ROD between samples a and b is defined as:
In the above Eq (8), the rank order distances $R_a(b)$ and $R_b(a)$ quantify the dissimilarity between points $a$ and $b$ based on their ranks within the dataset. In this form, ROD demonstrates greater resilience to noise, enabling more effective structure detection in non-spherical, noisy data.
3.2.3. Gaussian kernel
The enhanced ROD demonstrates improved resilience to high noise levels, effectively capturing structures in noisy, non-spherical data. To reinforce cluster structures, KROMD integrates this improved ROD with a Gaussian kernel (Eq (9)), which efficiently brings samples within the same cluster closer together. This kernel is commonly used in clustering methods for non-spherical data. The Gaussian kernel between points a and b is computed as follows:
In the above Eq (9), the Gaussian kernel $k(a,b)$ measures the similarity between two points, $a$ and $b$, based on their distance $d(a,b)$. The term $e^{-d(a,b)^2/\mu^2}$ ensures that closer points [small $d(a,b)$] receive higher similarity values, approaching 1 for small distances, while points that are farther apart [large $d(a,b)$] receive lower similarity values, approaching 0. $\mu$ is a tunable parameter, often referred to as the bandwidth or scale parameter; it controls the width of the Gaussian kernel and significantly impacts the transformation of the data. This $\mu$ is essential for controlling the extent of influence between data points, smoothing out noise, and converting non-spherical data into a more spherical form, thus enhancing the performance of clustering algorithms like K-means.
3.2.4. KROMD calculation
KROMD combines the ROD and Gaussian kernel to provide a robust similarity measure. The KROMD between samples a and b is calculated as:
In the above Eq (10), the rank order distances $R_a(b)$ and $R_b(a)$ quantify the dissimilarity between points $a$ and $b$ based on their ranks within the dataset. The Gaussian kernel term $e^{-d(a,b)^2/\mu^2}$ modifies the contribution of $R_b(a)$ based on the distance $d(a,b)$ between points $a$ and $b$: it emphasizes similarity when $d(a,b)$ is small (points are close) and reduces it when $d(a,b)$ is large (points are far apart).
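The sketch below shows one plausible reading of this combination, in which the Gaussian kernel weights the $R_b(a)$ term and the base distance is Manhattan. The exact form should follow Eq (10); the function name, the weighting scheme, and the bandwidth default are our assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist

def kromd(X, mu=1.0):
    """KROMD-style similarity matrix (illustrative): rank order distances on a
    Manhattan base distance, with R_b(a) weighted by a Gaussian kernel term.
    The exact combination should follow Eq (10); this weighting is our reading."""
    n = X.shape[0]
    D = cdist(X, X, metric="cityblock")   # Manhattan base distance d(a, b)
    order = np.argsort(D, axis=1)         # order[a, k] = f_a(k)
    rank = np.argsort(order, axis=1)      # rank[a, b]  = O_a(b)

    def R(a, b):
        # R(a, b) = sum_{k=0}^{O_a(b)} O_b(f_a(k))
        return rank[b, order[a, : rank[a, b] + 1]].sum()

    kernel = np.exp(-(D ** 2) / mu ** 2)  # Gaussian kernel term e^{-d^2/mu^2}
    out = np.zeros((n, n))
    for a in range(n):
        for b in range(n):
            if a != b:
                out[a, b] = R(a, b) + kernel[a, b] * R(b, a)
    return out
```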
3.3. Optimal number of cluster selection
Determining the optimal number of clusters in K-means is challenging for researchers. Therefore, this paper enhanced the gap statistic by standardizing the expected value of the reference dataset using an exponential distribution.
The enhanced gap statistic (EGS) builds upon the traditional gap statistic [44,49] by incorporating an adjustment factor that considers the standard deviation of the within-cluster sum of squares from the reference dataset. This factor addresses variations within the reference dataset, resulting in a more accurate determination of the optimal number of clusters (ONC). By applying an exponential distribution to the standardization process, EGS enhances the robustness, efficiency, and accuracy of clustering results, particularly in the presence of outliers. In the above Eq (5), we standardize $E[\log(W_k)]$ using an exponential distribution. The probability density function (PDF) of an exponential distribution is $f(w^*) = \lambda e^{-\lambda w^*}$ for $w^* \geq 0$. The probability density $f(w^*)$ of an exponential random variable $w^*$ with rate parameter $\lambda$ is therefore as given in Eq (11).
Let $u = \lambda w^*$; then $w^* = u/\lambda$ and $dw^* = du/\lambda$. Substituting this into the integral, Eq (12) is obtained.
In Eq (12), $\int_0^\infty \log(u)\, e^{-u}\, du$ is known to be the derivative of the gamma function $\Gamma(s)$ evaluated at $s = 1$, and $\Gamma'(1) = -\gamma$, where $\gamma$ is the Euler-Mascheroni constant.
Substituting Eqs (13) and (14) into Eq (12), and using $\int_0^\infty e^{-u}\, du = 1$,
Substituting $u = \lambda w^*$, $dw^* = du/\lambda$, and $w^* = u/\lambda$, so that $\log(w^*) = \log(u/\lambda) = \log(u) - \log(\lambda)$, into Eq (16), then
Expanding and integrating Eq (17),
The integration becomes
In the Eq (18),
Substituting Eqs (19)–(21) into Eq (18) will obtain Eq (22) as follows:
To calculate variance, subtracting Eq (21) with the square of Eq (15) and obtaining Eq (23) as follows:
Taking square roots of Eq (23) to find the standard deviation as Eq (24):
Now, subtracting Eq (15) from $\log(w^*)$ and dividing by Eq (24) yields the standardization in Eq (25).
Substituting Eq (25) in the place of E[log(wk)] in Eq (5), we get standardizations of reference data in gap statistic in the form of Eq (26),
The scaled gap statistic $\mathrm{EGS}_k$ evaluates cluster validity by comparing the within-cluster sum of squares for cluster count $k$ in the bootstrap dataset, $\log(W^*_{kb})$, to that of the original dataset, $\log(W_k)$. It includes adjustments using the Euler-Mascheroni constant $\gamma$ and a scaling parameter $\lambda$ from the exponential distribution. The standard deviation $\sigma$ is estimated as $\pi/\sqrt{6}$ and scaled by $\sqrt{1 + 1/B}$ to account for variability, where $B$ is the number of bootstrap samples. This statistic measures the standardized difference between the log-transformed within-cluster sums of squares from the bootstrap and original datasets, providing a robust measure of cluster consistency and distinctiveness. Table 1 below summarizes the discussed methods and their limitations in K-means clustering.
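The sketch below illustrates our reading of this standardization: bootstrap log-dispersions are centered at $-\gamma - \log(\lambda)$, scaled by $(\pi/\sqrt{6})\sqrt{1 + 1/B}$, and compared with $\log(W_k)$ of the original data. The uniform reference sampling, the default $\lambda = 1$, and the final comparison step are assumptions; the authoritative form is Eqs (25) and (26).

```python
import numpy as np
from sklearn.cluster import KMeans

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def log_wk(data, k):
    """log of the within-cluster sum of squares W_k from a K-means fit."""
    return np.log(KMeans(n_clusters=k, n_init=10, random_state=0).fit(data).inertia_)

def enhanced_gap(X, k, lam=1.0, B=10, seed=0):
    """Illustrative EGS_k: bootstrap log-dispersions log(W*_kb) are standardized
    by the exponential-based mean (-gamma - log(lambda)) and the scaled standard
    deviation (pi/sqrt(6)) * sqrt(1 + 1/B), then compared with log(W_k) of the
    original data, following our reading of Eqs (25)-(26)."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    log_wkb = np.array([log_wk(rng.uniform(lo, hi, size=X.shape), k) for _ in range(B)])
    sd = (np.pi / np.sqrt(6.0)) * np.sqrt(1.0 + 1.0 / B)
    standardized = (log_wkb - (-EULER_GAMMA - np.log(lam))) / sd
    return float(standardized.mean() - log_wk(X, k))
```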
4. Experimental setup
In this section, the limitations of K-means clustering are addressed and the experimental setup is described using a well log dataset consisting of 13 features: borehole size (BS), caliper log (CALI), corrected caliper log (CALS), density correction (DRHO), sonic travel time (DT), gamma ray (GR), deep laterolog resistivity (LLD), shallow laterolog resistivity (LLS), micro spherical focused log (MSFL), neutron porosity (NPHI), photoelectric effect (PEF), bulk density (RHOB), and spontaneous potential (SP). Each feature contains 2435 observations, sourced from Kaggle (https://kaggle.com/search?q=well+logs). The overall statistical summary is presented in Table 3.
Outlier detection and handling: Outliers are detected using the interquartile range (IQR) method. These outliers are then handled using the winsorization technique to mitigate their impact on the clustering results.
Non-spherical data: To address this issue, an enhanced version of the rank order distance (ROD) method, termed kernelized rank order Manhattan distance (KROMD), is used. This new method combines ROD with a Gaussian kernel, effectively transforming non-spherical data into a spherical form suitable for K-means clustering.
Optimal cluster selection: The paper also enhances the gap statistic method to better determine the optimal number of clusters. The enhanced gap statistic (EGS) standardizes the reference data using an exponential distribution. This standardized approach more effectively identifies the optimal number of clusters for K-means clustering. After addressing these issues, the enhanced methods are applied to a well log dataset for lithology identification.
Table 2 utilizes descriptive statistics to effectively summarize the dataset and identify key characteristics. By presenting the mean, standard deviation, kurtosis, skewness, and count, researchers can clearly communicate their findings, setting the stage for more advanced analysis and interpretation.
In Table 2, statistical measures are computed using 13 different well log data parameters: BS, CALI, CALS, DRHO, DT, GR, LLD, LLS, MSFL, NPHI, PEF, RHOB, and SP. Detailed data analysis for each parameter is provided in Table 3.
Variables show a mix of positive and negative skewness, mostly right skewed. Some variables (e.g., LLD, MSFL) exhibit high variability and heavy tails. BS shows no variability, while others range from low to extremely high variability.
5. Results and discussion
In this section, we initially identify outliers within the dataset. Subsequently, we address these outliers through the implementation of the winsorization technique. Following winsorization, we employ the KROMD method to transform non-spherical data into a spherical form, as outlined in the methodology section. Next, we determine the optimal number of clusters using the enhanced gap statistic method. Finally, the paper delves into a comprehensive discussion of the results obtained.
5.1. Outlier detection and handling by using the winsorization method
Outlier detection using the interquartile range (IQR) identifies extreme values based on the spread of the dataset. Winsorization handles outliers by replacing extreme values with the nearest non-outlier values within a specified range, helping to maintain the overall distribution of the data.
In Figure 2,
● Subplots (a), (k), and (m) illustrate the datasets for BS (elapsed time = 0.0582), PEF (running time = 0.0095), and SP (running time = 0.0067), respectively, without any identified outliers.
● Subplots (b) and (c) highlight the datasets for CALI (running time = 0.0070) and CALS (running time = 0.0073), where 7 outliers are observed in each dataset.
● Subplot (d) shows the dataset for DRHO (running time = 0.0073), with 78 outliers detected.
● Subplot (e) displays the dataset for DT (running time = 0.0068), revealing 12 identified outliers.
● Subplot (f) presents the dataset for GR (running time = 0.0078), indicating the presence of 104 outliers.
● Subplots (g), (h), and (l) represent the datasets for LLD (running time = 0.0080), LLS (running time = 0.0097), and RHOB (running time = 0.0096), respectively, exhibiting the highest numbers of outliers among all variables, with 307, 284, and 147 outliers, respectively.
● Subplot (j) denotes the dataset for NPHI (running time = 0.0075), indicating the presence of 20 outliers.
After detecting these outliers, the winsorization technique was applied to handle them. Figure 2 represents the winsorization of outliers using IQR mentioned in Eqs (6) and (7).
5.2. Transformation of non-spherical data to spherical form
The transformation of non-spherical data into spherical form is achieved using the kernelized rank order Manhattan distance (KROMD) method, which integrates the Gaussian kernel with the rank order distance (ROD) method. The accompanying graph compares various methods: K-nearest neighbors (KNN), spectral clustering, PCA, ROD, and the newly developed KROMD, for converting non-spherical data into a spherical form.
Figure 3 provides a comparative analysis of different methods for transforming non-spherical data into spherical form. Each subfigure highlights a specific method:
● Original data (a): Serves as a baseline, showcasing the non-spherical nature of the dataset.
● KNN (b): Converts the data into a spherical form but may not capture the intrinsic structure as effectively as other methods.
● Spectral clustering (c): Demonstrates a more sophisticated approach, capturing the underlying patterns in the data better than KNN.
● PCA (d): Reduces dimensionality while attempting to preserve the data's variance, transforming it into a spherical form.
● ROD (e): Uses rank order distance to achieve the spherical transformation, providing an improved structure over traditional methods.
● KROMD (f): The novel KROMD method integrates Gaussian kernel and rank order distance, offering a clear and effective spherical transformation, outperforming other methods in preserving the dataset's intrinsic characteristics.
Figure 4 illustrates the accuracy levels and execution times for converting non-spherical data into spherical form using KNN, spectral clustering, PCA, ROD, and KROMD methods.
● Subfigure (a) shows the accuracy levels of each method.
● Subfigure (b) presents the execution times for each method.
The results demonstrate that the newly developed KROMD method excels in both accuracy and efficiency, as summarized in Figure 4.
KNN has an accuracy of 67%, making it the second-best performer in terms of accuracy. Its execution time is moderate at 0.52 seconds, slower than KROMD but faster than spectral clustering. Spectral clustering achieves a moderate accuracy of 60%. However, it has the longest execution time at 0.91 seconds, indicating lower efficiency compared to the other methods. PCA is the least accurate method, with an accuracy of 49%. Its execution time is relatively efficient at 0.42 seconds, but still not as efficient as KROMD or ROD. ROD provides an accuracy of 58%, slightly better than spectral clustering and PCA. With an execution time of 0.39 seconds, it is the second fastest method after KROMD, showing good efficiency. KROMD achieves the highest accuracy at 83%, significantly outperforming all other methods. It is also the most efficient, with the shortest execution time of 0.14 seconds. In conclusion, KROMD demonstrates a significant advancement over existing methods for converting non-spherical data into spherical form, offering the best combination of high accuracy and low execution time. This makes KROMD the most effective and efficient method among those compared.
5.3. Optimal number of clusters
In Figure 5, several methods are used to determine the optimal number of clusters for K-means clustering of well log datasets: (a) Davies-Bouldin index, (b) Calinski-Harabasz index, (c) silhouette plot, (d) elbow method, (e) GS, and (f) EGS. Subfigures (g) and (h) show the performance of K-means clustering after the optimal number of clusters has been selected. The results reveal that the enhanced gap statistic (EGS) method performs better than the other methods.
Figure 6 provides an analysis of different methods for selecting the optimal number of clusters in K-means clustering:
● Figure 6 (a) displays various methods for determining the optimal number of clusters.
● Figure 6 (b) shows the execution times of these methods.
Figure 6 highlights the performance of various methods used to determine the optimal number of clusters in K-means clustering.
Davies-Bouldin (D-B) index. Accuracy: High at 91.88%, making it a robust method for cluster analysis. Execution time: 0.1431 seconds, demonstrating quick computational performance.
Calinski-Harabasz (C-H) index. Accuracy: Moderate at 65.39%, suitable for cluster evaluation. Execution time: 0.0936 seconds, the fastest among the methods.
Silhouette plot. Accuracy: Respectable at 84.63%, showing reliable cluster assessment. Execution time: 0.1943 seconds, moderately efficient.
Elbow method. Accuracy: Lowest at 27.86%, suggesting limited effectiveness in cluster identification. Execution time: 0.6150 seconds, slower compared to other methods.
Gap statistic (GS). Accuracy: Relatively lower at 52.55%, indicating less optimal cluster determination. Execution time: 1.1805 seconds, the slowest among all methods.
Enhanced gap statistic (EGS). Accuracy: Highest at 93.35%, demonstrating superior performance in cluster selection. Execution time: Efficient at 0.1433 seconds, indicating fast processing speed.
In conclusion, the enhanced gap statistic (EGS) method emerges as the most effective choice for selecting the optimal number of clusters in K-means clustering, offering both high accuracy and efficient execution time.
5.4. Research contributions
This research significantly advances the K-means clustering algorithm by addressing three primary limitations. First, outlier detection and management were tackled through the implementation of the winsorization method. Second, a novel approach was introduced that combined the rank order distance (ROD) technique with a Gaussian kernel to effectively transform non-spherical data into a spherical form. Last, a gap statistic method was defined for determining the optimal number of clusters in K-means by standardizing reference data using an exponential distribution. These enhancements have demonstrated superior performance compared to conventional methods, making a substantial contribution to the field of clustering algorithms.
6. Conclusions
This research focused on enhancing the foundational task of clustering, with a particular emphasis on K-means clustering. This paper identified three critical limitations of the K-means algorithm: sensitivity to outliers, difficulties with non-spherical data, and challenges in selecting the optimal number of clusters. To address these issues, this paper proposed innovative solutions:
Mitigating outliers: Winsorization was employed to effectively manage the influence of outliers on the clustering process.
Handling non-spherical data: Kernelized rank order distance (KROMD) was introduced to transform non-spherical data into a spherical form, enhancing clustering accuracy.
Determining optimal clusters: The gap statistic method was improved to provide a more reliable approach for selecting the optimal number of clusters.
Extensive experimentation demonstrated that our proposed methods outperformed traditional approaches, showing superior performance in handling outliers, non-spherical data, and determining the number of clusters. By addressing these critical challenges, this research significantly advances the effectiveness and applicability of K-means clustering across various domains. The paper offers practical solutions to enhance clustering performance, providing a more robust framework for data analysis and decision-making processes. For future work, we encourage the application of these algorithms to different datasets, focusing on each limitation individually, and comparing their performance with other methods in terms of accuracy and execution time.
Author contributions
Iliyas Karim Khan, Hanita Binti Daud, Nooraini Binti Zainuddin, Rajalingam Sokkalingam, Abdussamad, Abdul Museeb, and Agha Inayat: The responsibilities included algorithm development, software creation, numerical example preparation, original draft writing, and review and editing of the manuscript. All authors contributed equally to this work and have read and approved the final version of the manuscript for publication.
Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgments
This project received funding from National Collaborative Research Fund with cost center 015MC0-036 and OPEX Incentive for Center of Intelligent Asset Reliability (IAR) with cost center 015LB0-117 at Universiti Teknologi PETRONAS, Malaysia. The authors wish to thank the anonymous reviewers and the editor for their detailed evaluation of this paper and their valuable suggestions and insights.
Conflicts of interest
The authors declare no conflicts of interest.