Intelligent breast cancer diagnostic system empowered by deep extreme gradient descent optimization

Muhammad Bilal Shoaib Khan; Atta-ur-Rahman; Muhammad Saqib Nawaz; Rashad Ahmed; Muhammad Adnan Khan; Amir Mosavi

doi:10.3934/mbe.2022373

Mathematical Biosciences and Engineering

2022, Volume 19, Issue 8: 7978-8002. doi: 10.3934/mbe.2022373


Research article


1. Department of Information Technology, Akhuwat College University, Lahore 54000, Pakistan
2. Department of Computer Science, College of Computer Science and Information Technology (CCSIT), Imam Abdulrahman Bin Faisal University (IAU), P.O. Box 1982, Dammam 31441, Saudi Arabia
3. Department of Computer Science & IT, Minhaj University Lahore, Lahore 54000, Pakistan
4. ICS Department, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia
5. Department of Software, Gachon University, Seongnam 13120, Korea
6. John von Neumann Faculty of Informatics, Obuda University, Budapest, Hungary
7. Institute of Information Engineering, Automation and Mathematics, Slovak University of Technology in Bratislava, Bratislava, Slovakia
8. Institute of Information Society, University of Public Service, 1083 Budapest, Hungary

Academic Editor: Soheila Borhani

Received: 02 April 2022 Revised: 25 April 2022 Accepted: 09 May 2022 Published: 30 May 2022

Cancer is a manifestation of disorders caused by changes in the body's cells that go far beyond healthy development and stabilization. Breast cancer, a malignant tumor that normally develops from cells in the breast, is a common disease: according to statistics from the World Health Organization (WHO), 7.8 million women have been diagnosed with breast cancer. Machine learning (ML) approaches provide a variety of probabilistic and statistical ways for intelligent systems to learn from prior experience and recognize patterns in a dataset that can later be used for decision making. This work aims to build a deep learning-based model that predicts breast cancer with improved accuracy. A novel deep extreme gradient descent optimization (DEGDO) approach has been developed for breast cancer detection. The proposed model consists of two phases, training and validation. The training phase, in turn, consists of three major layers: a data acquisition layer, a preprocessing layer, and an application layer. The data acquisition layer takes the data and passes it to the preprocessing layer. In the preprocessing layer, noise and missing values are handled and the data are normalized before being fed to the application layer. In the application layer, the model is trained with the deep extreme gradient descent optimization technique. The trained model is stored on the server. In the validation phase, it is imported to process actual data for diagnosis. This study used the Wisconsin Breast Cancer Diagnostic dataset to train and test the model. The proposed model outperforms many other approaches, attaining 98.73% accuracy, 99.60% specificity, 99.43% sensitivity, and 99.48% precision.

Keywords:

Citation: Muhammad Bilal Shoaib Khan, Atta-ur-Rahman, Muhammad Saqib Nawaz, Rashad Ahmed, Muhammad Adnan Khan, Amir Mosavi. Intelligent breast cancer diagnostic system empowered by deep extreme gradient descent optimization[J]. Mathematical Biosciences and Engineering, 2022, 19(8): 7978-8002. doi: 10.3934/mbe.2022373



1. Introduction

Human mitochondrial DNA (mtDNA) has proven to be a useful tool for a variety of anthropological investigations such as forensic genetics, human evolutionary history, migration patterns, and population studies ^[1,2,3,4,5]. Sequence diversity within the mitochondrial D-loop hypervariable regions (HVR1 and HVR2) has been applied for this purpose since the level of polymorphism in these regions is high enough to make them an important tool in population diversity studies ^[6,7]. However, most of these studies are based on an analysis of a controlled cohort of individuals who are randomly selected to be representative of the population of the geographical region of interest ^[8,9]. In this study, we present an alternative approach for the analysis of population diversity by targeting human mtDNA directly from environmental waters impacted by human contamination.

DNA is naturally shed into the environment by virtually all animal species through feces, urine, exudates, or tissue residues ^[10,11]. There are numerous sources of human mtDNA in environmental waters. These include fecal waste from combined sewer overflows (CSO), sanitary sewer overflows, household sewage treatment systems, and agricultural/urban runoff ^[12,13]. Human fecal waste has been shown to contain large amounts of exfoliated epithelial cells, each harboring thousands of mitochondrial copies, which makes mtDNA an adequate molecular target in environmental studies. Recently, several studies have taken advantage of human-specific mtDNA signature sequences to implicate human feces as the primary source of contamination in fecally contaminated effluents ^[14,15,16]. Consequently, human mtDNA sequences obtained from environmental waters may serve as reliable, quantitative, and real-time indicators of the diversity of the contributing populations.

With the exception of one study, the aforementioned studies have focused on the detection of mtDNA using qPCR assays to detect fecal pollution sources. Recently, Kapoor et al. ^[13] demonstrated the use of mtDNA sequence analysis to determine both the importance of specific human fecal pollution sources in an urban watershed (Cincinnati, OH) and the relative abundance of population haplogroups associated with the contributing populations. We hypothesize that human mtDNA sequences in sewage are reliable, quantitative, and real-time indicators of population diversity in a community. To describe, characterize, and track human population diversity in a watershed region, we used high-throughput DNA sequencing technology to profile the HVR2 sequences in water samples taken from a tropical watershed (Río Grande de Arecibo (RGA), Puerto Rico) impacted by human sewage. As in previous controlled population studies ^[6,9], the single-nucleotide polymorphisms (SNPs) present in HVR2 were used to differentiate populations on the basis of their frequencies of occurrence. Furthermore, we extracted haplotypes and assigned mitochondrial haplogroups to identify the mtDNA biological ancestry of the populations impacting the watershed. We demonstrate the potential of these data for surveying the distribution of population diversity in this region and their intersection with orthogonal data such as U.S. Census data. These data establish a regional-scale, baseline population profile, which represents a unique metagenomics tool for studying population diversity, regional migration, and other anthropological questions.

2. Materials and Methods

2.1. Study area and sampling sites

The Río Grande de Arecibo (RGA) watershed is located along the western-central part of Puerto Rico and has a catchment area of approximately 769 km², with water flowing northward from the central mountain range into a coastal valley before discharging into the Atlantic Ocean. Multiple point sources, including leaking septic and sewer systems and discharge from wastewater treatment plants (WWTPs) contribute to human fecal pollution in the watershed, in addition to several nonpoint sources associated with recreational activities. Three secondary sewage treatment plants discharge disinfected secondary effluents into the watershed: two drain into Río Cidra and Río Caunillas, tributaries of the RGA, while the third drains directly into the RGA. The water quality of the RGA watershed is a major concern as it is an important drinking water reservoir and some sections are used in recreational activities. Thus, fecal contamination of the RGA is a significant public health concern and has a negative economic impact. Most of the population in the RGA watershed is located in the coastal alluvial plain near the municipality of Arecibo ^[17]. The upper watershed is mostly forested, undeveloped land.

The sampling sites (Figure 1) were identified and assessed for the presence of human fecal contamination through PCR-based detection of human fecal markers as described in a previous study ^[18]. These sites had a high human density based on previously recorded fecal pollution levels and potential impact from human fecal pollution via sewage overflow and watershed runoff ^[18]. Three sites (4, 7, and 10) were located downstream of a wastewater treatment plant (WWTP) for the municipalities of Adjuntas, Utuado, and Jayuya, respectively. Sites 6 and 7 represent sites before and after a WWTP. Site 6 is located approximately 1.62 km upstream from site 7, and site 7 is located 120 m downstream from the sewage treatment plant. Site 8 is located at the mouth of the watershed right before the RGA drains into the Atlantic Ocean and close to the town center of Arecibo.

Figure 1. Location of sampling sites in Puerto Rico. Sites 4, 6, 7, 8 and 10 were used for sampling based on high levels of human fecal contamination. Wastewater treatment plants (WWTPs) are shown as black triangles and the major urban areas are highlighted by red stars.


2.2. Sample collection and DNA extraction

Ten samples (Table 1) were chosen from the water samples collected within the RGA watershed sites. These water samples represented different degrees of human contamination. Water sample collection and DNA extraction were performed as described earlier ^[18,19]. Briefly, all samples were collected using sterile bottles and transported on ice to the laboratory at the University of Puerto Rico, Río Piedras Campus, where the samples (100 mL) were filtered through polycarbonate membranes (0.4-µm pore size, 47-mm diameter; GE Water and Process Technologies, Trevose, PA) and stored at −80 ℃ until DNA extraction. The membranes were shipped overnight on dry ice to the EPA laboratory (Cincinnati, OH) for DNA extraction. Total DNA was extracted from the filter samples using the PowerSoil DNA isolation kit (Mo Bio Laboratories, Inc.), following the manufacturer's instructions. DNA extracts were stored at −20 ℃ until further processing.

Table 1. Description of samples collected in this study.

Sample	Site	Sampling Date	Location	Presumed human contamination source
1	7	6/10/2010	Downstream from Utuado WWTP	Sewage
2	7	10/28/2010	Downstream from Utuado WWTP	Sewage
3	7	5/27/2010	Downstream from Utuado WWTP	Sewage
4	8	11/12/2009	Mouth of Arecibo River	Urban runoff, recreation
5	8	9/23/2010	Mouth of Arecibo River	Urban runoff, recreation
6	8	10/24/2010	Mouth of Arecibo River	Urban runoff, recreation
7	4	11/23/2009	Downstream from Adjuntas WWTP, Cidra River	Sewage
8	4	5/27/2010	Downstream from Adjuntas WWTP, Cidra River	Sewage
9	6	10/28/2010	Upstream from Utuado WWTP	Septic tanks
10	10	11/12/2009	Downstream from Jayuya WWTP	Sewage


2.3. High throughput sequencing

The human mitochondrial hypervariable region II (HVR2) sequences were determined via Illumina sequencing of HVR2 libraries generated with DNA extracts and barcoded primers HVR2-F (5′-GGTCTATCACCCTATTAACCAC-3′) and HVR2-R (5′-CTGTTAAAAGTGCATACCGCC-3′) ^[13]. We generated a PCR amplicon library for each water DNA extract. PCR reactions were performed in 25 μL volumes using the Ex Taq kit (Takara) with 200 nM each of the forward and reverse primers and 2 μL of template DNA. Cycling conditions involved an initial 5 min denaturing step at 94 ℃, followed by 35 cycles of 45 s at 94 ℃, 60 s at 56 ℃, and 90 s at 72 ℃, and a final elongation step of 10 min at 72 ℃. Prior to multiplexed sequencing, PCR products were visualized on an agarose gel to confirm product sizes. Sequencing of the pooled library was performed on an Illumina MiSeq benchtop sequencer using paired-end 250 bp kits at the Cincinnati Children's Hospital DNA Core facility. The HVR2 sequence of the operator was also determined through Sanger sequencing to confirm that it did not contribute to the experimental data.

2.4. Bioinformatics analyses

All HVR2 sequences were sorted according to barcodes and grouped under their respective sampling event. The sequences were processed and cleaned using the software MOTHUR v1.25.1 ^[20]. Briefly, fastq files for forward and reverse reads were used to form contigs, which were first screened for sequence length (no greater than 420 bp). To compensate for potential sequencing errors, sequences with an average quality score below 20, with ambiguous bases (Ns), or shorter than 300 bp were discarded. The quality-filtered sequences were then aligned to the revised Cambridge Reference Sequence (rCRS) ^[21] for human mitochondrial DNA (NC_012920.1| Homo sapiens mitochondrion, complete genome) and analyzed using custom scripts to detect the SNPs present in the sequences. All SNPs with a frequency greater than 5% were used for further analyses. Additionally, the sequences were exported to CLC Genomics Workbench Version 6.5 (CLC Bio, Cambridge, MA) and aligned to the rCRS, after which Quality-based Variant Detection was run to detect insertions and deletions (indels) as well as SNPs with reference to the rCRS, as described previously ^[13]. Mitochondrial genome databases, including MITOMAP ^[22], mtDB ^[23] and Phylotree ^[24], were consulted to validate the occurrence of detected variants. Haplotypes were extracted and submitted to MITOMASTER version Beta 1 ^[25] to assign mitochondrial haplogroups based on variants present in HVR2.
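The filtering thresholds and the 5% frequency cutoff described above can be illustrated with a minimal Python sketch. This is not the authors' custom script, which is not given in the text: the function names `passes_quality` and `snp_frequencies` are hypothetical, and the sketch assumes reads that are already aligned to the reference base-for-base (no indels), which is a simplification of what MOTHUR and CLC Genomics Workbench do.

```python
from collections import Counter


def passes_quality(seq, quals, min_len=300, max_len=420, min_avg_q=20):
    """Apply the length, ambiguity, and average-quality criteria from the text.

    Assumes `quals` holds one Phred score per base of `seq`.
    """
    if not (min_len <= len(seq) <= max_len):
        return False          # shorter than 300 bp or longer than 420 bp
    if "N" in seq.upper():
        return False          # ambiguous base present
    return sum(quals) / len(quals) >= min_avg_q


def snp_frequencies(aligned_reads, reference, min_freq=0.05):
    """Return {"<position><base>": frequency} for substitutions above min_freq.

    Hypothetical simplification: every read is assumed to be aligned to the
    reference position by position, so only substitutions are counted.
    Positions are 1-based, as in the rCRS numbering.
    """
    counts = Counter()
    for read in aligned_reads:
        for pos, (ref_base, base) in enumerate(zip(reference, read), start=1):
            if base != ref_base and base not in "-N":
                counts[(pos, base)] += 1
    total = len(aligned_reads)
    return {
        f"{pos}{base}": n / total
        for (pos, base), n in sorted(counts.items())
        if n / total > min_freq
    }


# Toy data only: a 4-base "reference" and 40 aligned reads.
toy_reference = "ACGT"
toy_reads = ["ACGT"] * 35 + ["AGGT"] * 4 + ["ACGA"]
print(snp_frequencies(toy_reads, toy_reference))
```

In the toy data, the substitution at position 2 occurs in 4 of 40 reads and is kept, while the substitution at position 4 occurs in 1 of 40 reads and falls below the 5% cutoff.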

3. Results and Discussion

3.1. Variant detection and frequency

In total, more than 100,000 sequence reads were retrieved, with a mean output exceeding 20,000 per barcoded sample; these were then filtered and grouped according to their respective sampling events. HVR2 DNA from ten samples was sequenced and screened, producing an average read length of approximately 423 bp. Of this, a 300 bp portion (i.e., from base position 50 to 350) was used for variant detection since SNPs in this region have been well documented ^[22]. A total of 19 distinct variants were detected with a frequency greater than 5% of the total number of unique reads, all of which are present in MITOMAP, a database of mtDNA Control Region Sequence Variants ^[22]. We observed some SNPs that were common to all samples with varying frequencies, while other SNPs were sample-specific, giving each sample a unique human mtDNA signature in the form of SNP allelic frequencies (Figure 2). The variation in SNP frequencies could be the result of several factors, including limited sample size, population changes related to migration, changes in sampling time and storm runoff volumes during wet weather events, or a combination of these. Variants 73G and 263G were detected in all samples with high frequency (> 90%), while variant 150T was detected in all samples except those belonging to sites 4 and 8. Interestingly, variant 263G has been observed in mitochondrial genomes from European populations ^[26], and is compatible with the European ancestry that arrived on the island over five centuries ago. All the variants detected at site 6 were also detected in samples belonging to site 7, except for 232G, which was detected at site 6 with relatively low frequency. This is expected since sites 6 and 7 are located in close proximity to each other. Site 8 had two unique variants (67T, 81T), which may be attributed to the influx of water from several different tributaries, since site 8 is located right before the RGA drains into the Atlantic Ocean at sea level and is the most downstream of all sampling sites.

Figure 2. Heat map demonstrating the occurrence and frequency of variants detected in all samples (n = 10) in the human mitochondrial HVR2 region (positions 50–350 bp relative to the rCRS). Sampling sites are denoted within brackets next to the sample number. Variants are identified based on the revised Cambridge Reference Sequence (NC_012920.1| Homo sapiens mitochondrion, complete genome). Single nucleotide polymorphisms (SNPs) are denoted by ">" (73A>G means A is replaced by G at position 73). Insertions are denoted by "." followed by the number of nucleotides inserted at that position (309.1C means insertion of one C at position 309). Deletions are denoted by "del" followed by the nucleotides deleted (248delA means deletion of A from position 248).
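The variant notation described in this caption is regular enough to parse mechanically. The following Python sketch is an illustration only: the `Variant` type and `parse_variant` function are hypothetical names, not part of any tool used in the study. It accepts both the long SNP form (73A>G) and the short form used in Table 2 (73G).

```python
import re
from typing import NamedTuple


class Variant(NamedTuple):
    position: int   # 1-based position relative to the rCRS
    kind: str       # "snp", "insertion", or "deletion"
    bases: str      # substituted, inserted, or deleted nucleotide(s)


# One pattern per notation described in the caption.
_PATTERNS = [
    ("snp", re.compile(r"^(\d+)(?:[ACGT]>)?([ACGT])$")),      # 73A>G or 73G
    ("insertion", re.compile(r"^(\d+)\.\d+([ACGT]+)$")),      # 309.1C
    ("deletion", re.compile(r"^(\d+)del([ACGT]+)$")),         # 248delA
]


def parse_variant(label):
    """Parse one variant label; the number after '.' in insertions is not retained."""
    compact = label.replace(" ", "")
    for kind, pattern in _PATTERNS:
        match = pattern.match(compact)
        if match:
            return Variant(int(match.group(1)), kind, match.group(2))
    raise ValueError(f"unrecognized variant label: {label!r}")


for label in ["73A > G", "73G", "309.1C", "248delA", "286delAA"]:
    print(label, "->", parse_variant(label))
```

Stripping spaces first lets the same function handle labels written as "73A > G" and "73A>G".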


3.2. Haplotypes and haplogroup classification

Mitochondrial haplogroups have arisen from mutation and migration during human evolution and largely correspond to the geographic regions of their origin ^[23,24]. These mitochondrial haplogroups can be used to define ancestry based on the frequency of observation as a means of investigating population diversity ^[4,27]. For instance, there is broad correspondence between the L haplogroups and African ethnicity assignments, while the H haplogroups are most common among the Europeans. Consequently, we sought to use our human mtDNA sequences to extract haplotypes and classify them into haplogroups by comparing them to the Phylotree database ^[24]. We observed abundant diversity of haplotypes from HVR2 amplicons for all samples, which is consistent with the clear indication of human-associated pollution in the watershed ^[18]. The major haplotypes obtained from each sample are presented in Table 2.

Table 2. Major haplotypes detected in the samples.

Sample	Haplotype
1	73G, 150T, 263G, 315.1C; 73G, 150T, 189G, 263G, 315.1C; 73G, 150T, 176C, 263G, 315.1C; 73G, 150T, 263G; 73G, 150T, 176C, 189G, 263G, 315.1C; 73G, 150T, 315.1C
2	73G, 150T, 189G, 263G, 315.1C; 73G, 95G, 150T, 189G, 263G, 315.1C; 73G, 95G, 150T, 189G, 263G; 73G, 150T, 189G, 263G
3	73G, 263G, 315.1C; 73G, 150T, 263G, 315.1C; 73G, 263G, 309.1C, 310C; 73G, 263G, 310C; 73G, 143A, 195delT, 248delA, 263G, 286delAA, 309.2C, 310C
4	73G, 150T, 263G, 315.1C; 73G, 150T, 189G, 263G, 315.1C; 73G, 150T, 173C, 263G, 315.1C; 73G, 150T, 176C, 263G, 315.1C; 73G, 150T, 176C, 189G, 263G, 315.1C; 73G, 150T, 263G; 73G, 150T, 189G, 263G
5	73G, 263G, 309.1C, 310C; 73G, 263G, 310C; 73G, 263G, 315.1C; 73G, 81T, 263G, 309.1C, 310C; 73G, 263G, 309.1C; 73G, 263G; 73G, 309.1C, 310C
6	73G, 263G, 315.1C; 73G, 263G, 388G; 67T, 73G, 263G, 315.1C; 73G, 176C, 263G, 315.1C; 73G, 263G
7	73G, 263G, 315.1C; 73G, 263G, 309.1C, 310C; 73G, 263G, 308A, 315.1C; 73G, 263G, 310C; 73G, 263G
8	73G, 263G, 315.1C; 73G, 263G
9	73G, 150T, 263G, 315.1C; 73G, 150T, 189G, 263G, 315.1C; 73G, 150T, 232G, 263G, 315.1C; 73G, 150T, 263G; 73G, 150T, 189G, 263G
10	73G, 150T, 263G, 315.1C; 73G, 150T, 189G, 263G, 315.1C; 73G, 150T, 176C, 263G, 315.1C; 73G, 150T, 176C, 189G, 263G, 315.1C; 73G, 150T, 263G; 73G, 150T, 189G, 263G


The mitochondrial sequences were compared and assigned to haplogroups based on the differences in HVR2 sequence mutations with respect to the rCRS. Since the most accurate haplogroup prediction is based on full mtDNA sequences, sequences were assigned to the closest haplogroup for which the HVR2 sequence contained all mutations that define the haplogroup. The most salient feature of the haplogroup distribution (Figure 3) in the clustered sequences was the relatively high frequency of haplogroup H (32%). This haplogroup is very common in Europe ^[28,29], and its presence in the mtDNA sequences from our study agrees with the presence of a population of European ancestry on the island. Other dominant haplogroups were T (25%), L (24%) and B (11%). As an additional verification step, HVR2 PCR products were cloned (TOPO TA Cloning Kit for Sequencing, Invitrogen, Carlsbad, CA), and 90 colonies were randomly picked and sent for Sanger sequencing. Nucleotide sequences were assembled and edited using Sequencher 4.7 software (Gene Codes, Ann Arbor, MI) and analyzed for haplogroup prediction using MITOMASTER. Most of the sequences belonged to haplogroup H (40%), followed by T (20%), L (12%) and B (10%). The results obtained with Sanger sequencing corresponded well with those from Illumina high-throughput sequencing, supporting the reproducibility of the results by alternative sequencing methods.
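The assignment rule stated above (choose the closest haplogroup whose defining HVR2 mutations are all present in the sequence) amounts to a subset test. The Python sketch below is an assumption-laden illustration, not the MITOMASTER algorithm: the function names are hypothetical, "closest" is interpreted here as the matching haplogroup with the largest defining motif, and the motif table uses placeholder groups rather than real Phylotree definitions.

```python
def parse_haplotype(text):
    """Turn a Table 2 style string such as '73G, 150T, 263G' into a set of variants."""
    return frozenset(v.strip() for v in text.split(",") if v.strip())


def assign_haplogroup(haplotype, motifs):
    """Return the group whose defining motif is fully contained in the haplotype.

    When several motifs match, the one with the most defining variants wins
    (an assumed reading of "closest"). Returns None if no motif matches.
    """
    best_group, best_size = None, 0
    for group, motif in motifs.items():
        if motif <= haplotype and len(motif) > best_size:
            best_group, best_size = group, len(motif)
    return best_group


# PLACEHOLDER motifs for illustration only; these are NOT real haplogroup
# definitions and must be replaced with motifs from a curated source.
EXAMPLE_MOTIFS = {
    "groupA": frozenset({"73G", "263G"}),
    "groupB": frozenset({"73G", "150T", "263G"}),
}

haplotype = parse_haplotype("73G, 150T, 263G, 315.1C")
print(assign_haplogroup(haplotype, EXAMPLE_MOTIFS))
```

Because only HVR2 variants are compared, distinct haplogroups that share the same HVR2 motif cannot be separated by this rule, which is consistent with the coarse assignments acknowledged in the text.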

Figure 3. Pie chart showing the haplogroup distribution derived from the sequencing of HVR2 amplicons obtained from the water samples (n = 10) collected in the Río Grande de Arecibo Watershed in Puerto Rico.


3.3. Population diversity

To further explore the applicability of our HVR2-derived haplogroup data to local population diversity, several mitochondrial databases and studies were consulted to assign haplogroups to the general population groups found in Puerto Rico. We assigned haplogroups H, T and J to "West Eurasians"; haplogroup L to "Sub-Saharan African"; and haplogroup B to "American Indian" according to Martínez‐Cruzado et al. ^[30]. Based on the average distribution of HVR2-derived population groups, most mtDNA haplogroups were identified as being of West Eurasian ancestry (57.6%), followed by those of African (23.9%) and American Indian (11%) ancestries (Figure 4). According to U.S. census data for 2010 ^[31], populations belonging to these groups live in and around the study area. Figure 5 compares the population data obtained through the two strategies: census data for population (by race) vis-à-vis the HVR2-derived population groups for three different locations in the watershed. There was a strong correlation between the federal census data and the mitochondrial haplogroups as an indicator of population composition (Pearson product-moment correlation coefficient, r = 0.9), demonstrating the suitability of human mitochondrial sequences to infer the population structure of the neighborhoods impacting the watershed.
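The grouping and correlation steps can be sketched in a few lines of Python. The haplogroup-to-population mapping is the one stated above; the function names are hypothetical, and the input percentages are the rounded values quoted in Section 3.2 (haplogroup J is not reported there), so the West Eurasian total comes out at 57 rather than the 57.6% reported. No census figures are included because they are not given in the text; `pearson_r` is shown only as the standard formula a reader could apply to their own paired values.

```python
import math

# Mapping stated in the text (after Martínez-Cruzado et al.).
HAPLOGROUP_TO_POPULATION = {
    "H": "West Eurasian",
    "T": "West Eurasian",
    "J": "West Eurasian",
    "L": "Sub-Saharan African",
    "B": "American Indian",
}


def population_shares(haplogroup_pct):
    """Sum haplogroup percentages into population-group percentages."""
    shares = {}
    for haplogroup, pct in haplogroup_pct.items():
        group = HAPLOGROUP_TO_POPULATION.get(haplogroup, "Unassigned")
        shares[group] = shares.get(group, 0.0) + pct
    return shares


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Rounded haplogroup percentages quoted in Section 3.2.
print(population_shares({"H": 32, "T": 25, "L": 24, "B": 11}))
```

To reproduce a comparison like the one in Figure 5, the reader would pass the HVR2-derived shares and the corresponding census shares, in the same group order, to `pearson_r`.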

Figure 4. Pie chart showing the average population diversity of the sampling region (Río Grande de Arecibo Watershed in Puerto Rico) determined through the HVR2 derived haplogroups (n = 10).


Figure 5. Pie charts demonstrating the population racial diversity in three different municipalities in the Río Grande de Arecibo Watershed obtained through (a) 2010 population census data (by race), and (b) annotation of HVRII sequences (obtained during 2009-10) into haplogroups. Site 8 is located in Arecibo; sites 6 and 7 are at Utuado; and site 4 is at Adjuntas.


While their relative abundances differ, the average census abundance patterns (White > African American > American Indian) are similar to our findings, suggesting that the results correspond with the census data for population (by race). The mtDNA sequencing analysis suggests that American Indian ancestry is more prevalent than the census data report. Similarly, results from studies using HVR1 and other mtDNA-restriction profiles have also suggested that the presence of American Indian signals in Puerto Rico is more prevalent than previously considered ^[30], which is in agreement with our findings. The HVR2 motifs that are characteristic of the 'American Indian' haplogroups detected in this study are 73G, 143A and 263G ^[32], whereas Martínez‐Cruzado et al. ^[30] used predetermined restriction motifs as defining markers, along with HVR1 sequences to resolve inconclusive results. The latter approach to defining haplogroups is more exact since it is based on haplogroup-defining markers for the entire mtDNA and not only HVR2. However, classification of haplogroups based on analysis of small mtDNA regions with maximal discriminative power is useful for environmental studies due to concerns related to DNA damage in the environment. This approach has proven useful in past anthropological studies of the Neanderthal type specimen, in which small regions (300–350 bp) of Neanderthal mtDNA were sequenced ^[33,34]. Deducing population diversity from mtDNA sequences retrieved from waste streams may be more accurate than census data, since census data are limited to people who respond to surveys and are subject to misclassification of self-declared racial/ethnic background, whereas waste streams are impacted by everyone connected to the public sewer system. However, it is also possible that certain groups are overrepresented using the current approach, either because they disproportionately use the water resources, are not connected to the sewer pipelines (e.g., use of septic tanks) and/or live in closer proximity to sampling sites than others. Signature sequences from areas impacted by leaky septic tanks and combined sewer overflows will also be reflected in these types of molecular surveys. While further studies are needed to better understand how all these different sources may impact haplogroup distribution, we suggest that the use of these methods could provide complementary information in epidemiological studies.

The overall bioinformatics strategy in this study included the following steps: (ⅰ) trim/clean sequencing reads and group them according to sites, (ⅱ) map sample-specific reads to the rCRS, (ⅲ) annotate the mapped sequences to detect variants in the HVR2 region, and (ⅳ) extract haplotypes from individual reads and assign haplogroups based on HVR2 sequence motifs. As reported here, next-generation sequencing of the mitochondrial hypervariable sequences enabled the identification of a large number of mtDNA variants at varied allele frequencies. The haplogrouping methods used in our study rely only on HVR2 sequences, which may result in coarse haplogroup assignments. Since the most accurate haplogroup prediction is based on full mtDNA sequences, sequences were assigned to the closest haplogroup for which the HVR2 sequences contain all SNPs that define the haplogroup. We believe that future global sequencing efforts associated with distinct populations will provide improved phylogenetic resolution of the human mtDNA hypervariable regions as a tool for defining genetic ancestry. It has not escaped our attention that extending our technique to include other mtDNA regions and/or assembling full mitochondrial genomes through metagenomics approaches on a massively parallel scale would allow for tracking humans through public waste streams; thus, ethical concerns remain an important consideration in future work.

Mitochondrial DNA analysis has been applied in several biomedical investigations of human evolution, for example, studies tracing the origin of modern humans or of certain human populations. In addition, mtDNA analysis is extremely effective in a forensic setting for the identification of criminals and victims of crimes or accidents. Although our study was confined to analysis of the HVR2 region of human mtDNA for samples collected in a limited number of geographic locations, the results suggest that targeting specific regions of mtDNA could make it possible to estimate cancer rates, occurrence of diseases, and population diversity in watershed regions impacted by human contamination. Moreover, we envision that a similar approach could be used to study the population diversity of different animal species in natural settings, such as local versus migratory birds.

4. Conclusions

We investigated the occurrence of HVR2 allelic frequencies of human mtDNA derived from water samples taken within a fecally impacted tropical watershed. The SNPs within the human HVR2 sequences represented a unique molecular signature for evaluating anthropogenic site-specific inputs. We observed several HVR2 haplotypes linked to these samples, and used these haplogroup data to derive human population diversity within the different sites of the watershed. There was a strong correspondence between the demographic census data and the population composition based on mitochondrial haplogroups, demonstrating the suitability of human mitochondrial sequences to infer the population structure of the neighborhoods impacting the watershed. As the levels of human mtDNA are high in point and non-point sources of fecal pollution, detecting mtDNA allelic signatures in environmental waters provides a unique approach for simultaneously studying fecal waste source tracking, human population diversity, and many other anthropological questions.

Acknowledgements

We would like to thank Mehdi Keddache for help in data analysis, and David Wendell for providing access to the CLC Genomics program. VK was supported by the U.S. Environmental Protection Agency (EPA) via a post-doctoral appointment administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and the U.S. EPA. The manuscript has been subjected to the EPA's peer review and has been approved as an EPA publication. Mention of trade names or commercial products does not constitute endorsement or recommendation by the EPA for use. The views expressed in this article are those of the authors and do not necessarily represent the views or policies of the U.S. EPA.

Conflict of interest

All authors declare no conflicts of interest in this paper.

References

[1]	E. Aličković, A. Subasi, Breast cancer diagnosis using GA feature selection and Rotation Forest, Neural Comput. Appl., 28 (2015), 753–763. https://doi.org/10.1007/s00521-015-2103-9 doi: 10.1007/s00521-015-2103-9
[2]	World Health Organization, Breast cancer 2021, 2021. Available from: https://www.who.int/news-room/fact-sheets/detail/breast-cancer.
[3]	Y. S. Sun, Z. Zhao, Z. N. Yang, F. Xu, H. J. Lu, Z. Y. Zhu, et al., Risk factors and preventions of breast cancer, Int. J. Biol. Sci., 13 (2017), 1387–1397. https://doi.org/10.7150/ijbs.21635 doi: 10.7150/ijbs.21635
[4]	J. B. Harford, Breast-cancer early detection in low-income and middle-income countries: Do what you can versus one size fits all, Lancet Oncol., 12 (2011), 306–312. https://doi.org/10.1016/s1470-2045(10)70273-4 doi: 10.1016/s1470-2045(10)70273-4
[5]	C. Lerman, M. Daly, C. Sands, A. Balshem, E. Lustbader, T. Heggan, et al., Mammography adherence and psychological distress among women at risk for breast cancer, J. Natl. Cancer Inst., 85 (1993), 1074–1080. https://doi.org/10.1093/jnci/85.13.1074 doi: 10.1093/jnci/85.13.1074
[6]	P. T. Huynh, A. M. Jarolimek, S. Daye, The false-negative mammogram, Radiographics, 18 (1998), 1137–1154. https://doi.org/10.1148/radiographics.18.5.9747612 doi: 10.1148/radiographics.18.5.9747612
[7]	M. G. Ertosun, D. L. Rubin, Probabilistic Visual Search for Masses within mammography images using Deep Learning, in 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), (2015), 1310–1315. https://doi.org/10.1109/bibm.2015.7359868
[8]	Y. Lu, J. Y. Li, Y. T. Su, A. A. Liu, A review of breast cancer detection in medical images, in 2018 IEEE Visual Communications and Image Processing, (2018), 1–4. https://doi.org/10.1109/vcip.2018.8698732
[9]	J. Ferlay, I. Soerjomataram, R. Dikshit, S. Eser, C. Mathers, M. Rebelo, et al., Cancer incidence and mortality worldwide: Sources, methods and major patterns in Globocan 2012, Int. J. Cancer, 136 (2014), E359–E386. https://doi.org/10.1002/ijc.29210 doi: 10.1002/ijc.29210
[10]	N. Mao, P. Yin, Q. Wang, M. Liu, J. Dong, X. Zhang, et al., Added value of Radiomics on mammography for breast cancer diagnosis: A feasibility study, J. Am. Coll. Radiol., 16 (2019), 485–491. https://doi.org/10.1016/j.jacr.2018.09.041 doi: 10.1016/j.jacr.2018.09.041
[11]	H. Wang, J. Feng, Q. Bu, F. Liu, M. Zhang, Y. Ren, et al., Breast mass detection in digital mammogram based on Gestalt Psychology, J. Healthc. Eng., 2018 (2018), 1–13. https://doi.org/10.1155/2018/4015613 doi: 10.1155/2018/4015613
[12]	S. McGuire, World cancer report 2014, Switzerland: World Health Organization, international agency for research on cancer, Adv. Nutrit. Int. Rev., 7 (2016), 418–419. https://doi.org/10.3945/an.116.012211 doi: 10.3945/an.116.012211
[13]	M. K. Gupta, P. Chandra, A comprehensive survey of Data Mining, Int. J. Comput. Technol., 12 (2020), 1243–1257. https://doi.org/10.1007/s41870-020-00427-7 doi: 10.1007/s41870-020-00427-7
[14]	T. Zou, T. Sugihara, Fast identification of a human skeleton-marker model for motion capture system using stochastic gradient descent method, in 2020 8th IEEE RAS/EMBS International Conference for Biomedical Robotics and Biomechatronics (BioRob)., (2020), 181–186. https://doi.org/10.1109/biorob49111.2020.9224442
[15]	A. Reisizadeh, A. Mokhtari, H. Hassani, R. Pedarsani, An exact quantized decentralized gradient descent algorithm, IEEE Trans. Signal Process., 67 (2019), 4934–4947. https://doi.org/10.1109/tsp.2019.2932876 doi: 10.1109/tsp.2019.2932876
[16]	D. Maulud, A. M. Abdulazeez, A review on linear regression comprehensive in machine learning, J. Appl. Sci. Technol. Trends, 1 (2020), 140–147. https://doi.org/10.38094/jastt1457 doi: 10.38094/jastt1457
[17]	D. R. Wilson, T. R. Martinez, The general inefficiency of batch training for gradient descent learning, Neural Networks, 16 (2003) 1429–1451. https://doi.org/10.1016/s0893-6080(03)00138-2 doi: 10.1016/s0893-6080(03)00138-2
[18]	D. Yi, S. Ji, S. Bu, An enhanced optimization scheme based on gradient descent methods for machine learning, Symmetry, 11 (2019), 942. https://doi.org/10.3390/sym11070942 doi: 10.3390/sym11070942
[19]	D. A. Zebari, D. Q. Zeebaree, A. M. Abdulazeez, H. Haron, H. N. Hamed, Improved threshold based and trainable fully automated segmentation for breast cancer boundary and pectoral muscle in mammogram images, IEEE Access, 8 (2020), 203097–203116. https://doi.org/10.1109/access.2020.3036072 doi: 10.1109/access.2020.3036072
[20]	D. Q. Zeebaree, H. Haron, A. M. Abdulazeez, D. A. Zebari, Trainable model based on new uniform LBP feature to identify the risk of the breast cancer, in 2019 International Conference on Advanced Science and Engineering (ICOASE), 2019. https://doi.org/10.1109/icoase.2019.8723827
[21]	D. Q. Zeebaree, A. M. Abdulazeez, L. M. Abdullrhman, D. A. Hasan, O. S. Kareem, The prediction process based on deep recurrent neural networks: A Review, Asian J. Comput. Inf. Syst., 10 (2021), 29–45. https://doi.org/10.9734/ajrcos/2021/v11i230259 doi: 10.9734/ajrcos/2021/v11i230259
[22]	D. Q. Zeebaree, A. M. Abdulazeez, D. A. Zebari, H. Haron, H. N. A. Hamed, Multi-level fusion in ultrasound for cancer detection based on uniform LBP features, Comput. Matern. Contin., 66 (2021), 3363–3382. https://doi.org/10.32604/cmc.2021.013314 doi: 10.32604/cmc.2021.013314
[23]	M. Muhammad, D. Zeebaree, A. M. Brifcani, J. Saeed, D. A. Zebari, A review on region of interest segmentation based on clustering techniques for breast cancer ultrasound images, J. Appl. Sci. Technol. Trends, 1 (2020), 78–91. https://doi.org/10.38094/jastt1328 doi: 10.38094/jastt1328
[24]	P. Kamsing, P. Torteeka, S. Yooyen, An enhanced learning algorithm with a particle filter-based gradient descent optimizer method, Neural Comput. Appl., 32 (2020), 12789–12800. https://doi.org/10.1007/s00521-020-04726-9 doi: 10.1007/s00521-020-04726-9
[25]	Y. Hamid, L. Journaux, J. A. Lee, M. Sugumaran, A novel method for network intrusion detection based on nonlinear SNE and SVM, J. Artif. Intell. Soft Comput. Res., 6 (2018), 265. https://doi.org/10.1504/ijaisc.2018.097280 doi: 10.1504/ijaisc.2018.097280
[26]	H. Sadeeq, A. M. Abdulazeez, Hardware implementation of firefly optimization algorithm using fpgas, in 2018 International Conference on Advanced Science and Engineering, (2018), 30–35. https://doi.org/10.1109/icoase.2018.8548822
[27]	D. P. Hapsari, I. Utoyo, S. W. Purnami, Fractional gradient descent optimizer for linear classifier support vector machine, in 2020 Third International Conference on Vocational Education and Electrical Engineering (ICVEE), (2020), 1–5.
[28]	M. S. Nawaz, B. Shoaib, M. A. Ashraf, Intelligent cardiovascular disease prediction empowered with gradient descent optimization, Heliyon, 7 (2021), 1–10. https://doi.org/10.1016/j.heliyon.2021.e06948 doi: 10.1016/j.heliyon.2021.e06948
[29]	Y. Qian, Exploration of machine algorithms based on deep learning model and feature extraction, J. Math. Biosci. Eng., 18 (2021), 7602–7618. https://doi.org/10.3934/mbe.2021376 doi: 10.3934/mbe.2021376
[30]	Z. Wang, M. Li, H. Wang, H. Jiang, Y. Yao, H. Zhang, et al., Breast cancer detection using extreme learning machine based on feature fusion with CNN deep features, IEEE Access, 7 (2019), 105146–105158. https://doi.org/10.1109/access.2019.2892795 doi: 10.1109/access.2019.2892795
[31]	UCI Machine Learning Repository, Breast Cancer Wisconsin (Diagnostic) Data Set. Available from: https://archive.ics.uci.edu/ml/datasets/Breast+ Cancer + Wisconsin + (Diagnostic).
[32]	R. V. Anji, B. Soni, R. K. Sudheer, Breast cancer detection by leveraging machine learning, ICT Express, 6 (2020), 320–324. https://doi.org/10.1016/j.icte.2020.04.009 doi: 10.1016/j.icte.2020.04.009
[33]	Z. Salod, Y. Singh, Comparison of the performance of machine learning algorithms in breast cancer screening and detection: A Protocol, J. Public Health Res., 8 (2019). https://doi.org/10.4081/jphr.2019.1677 doi: 10.4081/jphr.2019.1677
[34]	Y. Lin, H. Luo, D. Wang, H. Guo, K. Zhu, An ensemble model based on machine learning methods and data preprocessing for short-term electric load forecasting, Energies, 10 (2017), 1186. https://doi.org/10.3390/en10081186 doi: 10.3390/en10081186
[35]	M. Amrane, S. Oukid, I. Gagaoua, T. Ensari, Breast cancer classification using machine learning, in 2018 Electric Electronics, Computer Science, Biomedical Engineerings' Meeting (EBBT), (2018), 1–4. https://doi.org/10.1109/ebbt.2018.8391453
[36]	R. Sumbaly, N. Vishnusri, S. Jeyalatha, Diagnosis of breast cancer using decision tree data mining technique, Int. J. Comput. Appl., 98 (2014), 16–24. https://doi.org/10.5120/17219-7456 doi: 10.5120/17219-7456
[37]	B. Zheng, S. W. Yoon, S. S. Lam, Breast cancer diagnosis based on feature extraction using a hybrid of k-means and support vector machine algorithms, Expert Syst. Appl., 41 (2014), 1476–1482. https://doi.org/10.1016/j.eswa.2013.08.044 doi: 10.1016/j.eswa.2013.08.044
[38]	T. Araújo, G. Aresta, E. Castro, J. Rouco, P. Aguiar, C. Eloy, et al., Classification of breast cancer histology images using convolutional neural networks, Plos One, 12 (2017), e0177544. https://doi.org/10.1371/journal.pone.0177544 doi: 10.1371/journal.pone.0177544
[39]	S. P. Rajamohana, A. Dharani, P. Anushree, B. Santhiya, K. Umamaheswari, Machine learning techniques for healthcare applications: early autism detection using ensemble approach and breast cancer prediction using SMO and IBK, in Cognitive Social Mining Applications in Data Analytics and Forensics, (2019), 236–251. https://doi.org/10.4018/978-1-5225-7522-1.ch012
[40]	L. G. Ahmad, Using three machine learning techniques for predicting breast cancer recurrence, J. Health Med. Inf., 4 (2013), 10–15. https://doi.org/10.4172/2157-7420.1000124 doi: 10.4172/2157-7420.1000124
[41]	B. Padmapriya, T. Velmurugan, Classification algorithm based analysis of breast cancer data, Int. J. Data Min. Tech. Appl., 5 (2016), 43–49. https://doi.org/10.20894/ijdmta.102.005.001.010 doi: 10.20894/ijdmta.102.005.001.010
[42]	S. Bharati, M. A. Rahman, P. Podder, Breast cancer prediction applying different classification algorithm with comparative analysis using Weka, in 2018 4th International Conference on Electrical Engineering and Information & Communication Technology (ICEEiCT), (2018), 581–584. https://doi.org/10.1109/ceeict.2018.8628084
[43]	K. Williams, P. A. Idowu, J. A. Balogun, A. I. Oluwaranti, Breast cancer risk prediction using data mining classification techniques, Trans. Networks Commun., 3 (2015), 17–23. https://doi.org/10.14738/tnc.32.662 doi: 10.14738/tnc.32.662
[44]	P. Mekha, N. Teeyasuksaet, Deep learning algorithms for predicting breast cancer based on tumor cells, in 2019 Joint International Conference on Digital Arts, Media and Technology with ECTI Northern Section Conference on Electrical, Electronics, Computer and Telecommunications Engineering (ECTI DAMT-NCON), 2019. https://doi.org/10.1109/ecti-ncon.2019.8692297
[45]	C. Shah, A. G. Jivani, Comparison of data mining classification algorithms for breast cancer prediction, in 2013 Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT), 2013. https://doi.org/10.1109/icccnt.2013.6726477
[46]	A. A. Bataineh, A comparative analysis of nonlinear machine learning algorithms for breast cancer detection, Int. J. Mach. Learn. Comput., 9 (2019), 248–254. https://doi.org/10.18178/ijmlc.2019.9.3.794 doi: 10.18178/ijmlc.2019.9.3.794
[47]	M. S. M. Prince, A. Hasan, F. M. Shah, An efficient ensemble method for cancer detection, in 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT), 2019. https://doi.org/10.1109/icasert.2019.8934817
[48]	S. Aruna, A novel SVM based CSSFFS feature selection algorithm for Detecting Breast Cancer, Int. J. Comput., 31 (2011), 14–20. https://doi.org/10.5120/3844-5346 doi: 10.5120/3844-5346
[49]	G. Carneiro, J. Nascimento, A. P. Bradley, Automated analysis of unregistered Multi-View Mammograms with deep learning, IEEE Trans. Med. Imaging, 36 (2017), 2355–2365. https://doi.org/10.1109/tmi.2017.2751523 doi: 10.1109/tmi.2017.2751523
[50]	Z. Sha, L. Hu, B. D. Rouyendegh, Deep learning and optimization algorithms for Automatic Breast Cancer Detection, Int. J. Imaging Syst. Technol., 30 (2020), 495–506. https://doi.org/10.1002/ima.22400 doi: 10.1002/ima.22400
[51]	M. Mahmoud, Breast cancer classification in histopathological images using convolutional neural network, Int. J. Comput. Sci. Appl., 9 (2018), 12–15. https://doi.org/10.14569/ijacsa.2018.090310 doi: 10.14569/ijacsa.2018.090310
[52]	Z. Jiao, X. Gao, Y. Wang, J. Li, A deep feature based framework for Breast Masses classification, Neurocomputing, 197 (2016), 221–231. https://doi.org/10.1016/j.neucom.2016.02.060 doi: 10.1016/j.neucom.2016.02.060
[53]	M. H. Yap, G. Pons, J. Marti, S. Ganau, M. Sentis, R. Zwiggelaar, et al., Automated breast ultrasound lesions detection using convolutional neural networks, IEEE. J. Biomed. Health Inf., 22 (2018), 1218–1226. https://doi.org/10.1109/jbhi.2017.2731873 doi: 10.1109/jbhi.2017.2731873
[54]	N. Wahab, A. Khan, Y. S. Lee, Transfer learning based deep CNN for segmentation and detection of mitoses in breast cancer histopathological images, Microscopy, 68 (2019), 216–233. https://doi.org/10.1093/jmicro/dfz002 doi: 10.1093/jmicro/dfz002
[55]	Z. Wang, G. Yu, Y. Kang, Y. Zhao, Q. Qu, Breast tumor detection in digital mammography based on Extreme Learning Machine, Neurocomputing, 128 (2014), 175–184. https://doi.org/10.1016/j.neucom.2013.05.053 doi: 10.1016/j.neucom.2013.05.053
[56]	Y. Qiu, Y. Wang, S. Yan, M. Tan, S. Cheng, H. Liu, et al., An initial investigation on developing a new method to predict short-term breast cancer risk based on Deep Learning Technology, Comput. Aided. Des., 2016. https://doi.org/10.1117/12.2216275 doi: 10.1117/12.2216275
[57]	X. W. Chen, X. Lin, Big data deep learning: Challenges and perspectives, IEEE Access, 2 (2014), 514–525. https://doi.org/10.1109/access.2014.2325029 doi: 10.1109/access.2014.2325029
[58]	J. Arevalo, F. A. González, R. R. Pollán, J. L. Oliveira, M. A. G. Lopez, Representation learning for mammography mass lesion classification with convolutional neural networks, Comput. Methods Programs Biomed., 127 (2016), 248–257. https://doi.org/10.1016/j.cmpb.2015.12.014 doi: 10.1016/j.cmpb.2015.12.014
[59]	Y. Kumar, A. Aggarwal, S. Tiwari, K. Singh, An efficient and robust approach for biomedical image retrieval using zernike moments, Biomed. Signal Process. Control, 39 (2018), 459–473. https://doi.org/10.1016/j.bspc.2017.08.018 doi: 10.1016/j.bspc.2017.08.018
[60]	K. Kalaiarasi, R. Soundaria, N. Kausar, P. Agarwal, H. Aydi, H. Alsamir, Optimization of the average monthly cost of an EOQ inventory model for deteriorating items in machine learning using Python, Therm. Sci., 25 (2021), 347–358. https://doi.org/10.2298/tsci21s2347k doi: 10.2298/tsci21s2347k
[61]	M. Franulović, K. Marković, A. Trajkovski, Calibration of material models for the human cervical spine ligament behaviour using a genetic algorithm, Facta Univ., Series: Mechan. Eng., 19 (2021) 751. https://doi.org/10.22190/fume201029023f doi: 10.22190/fume201029023f
[62]	M. Fayaz, D. H. Kim, A prediction methodology of energy consumption based on Deep Extreme Learning Machine and comparative analysis in residential buildings, Electronics, 7 (2018), 222. https://doi.org/10.3390/electronics7100222 doi: 10.3390/electronics7100222
[63]	G. B. Huang, D. H. Wang, Y. Lan, Extreme learning machines: A survey, Int. J. Mach. Learn. Cybern., 2 (2011), 107–122. https://doi.org/10.1007/s13042-011-0019-y doi: 10.1007/s13042-011-0019-y
[64]	H. Tang, S. Gao, L. Wang, X. Li, B. Li, S. Pang, A novel intelligent fault diagnosis method for rolling bearings based on Wasserstein generative adversarial network and Convolutional Neural Network under Unbalanced Dataset, Sensors, 21 (2021), 6754. https://doi.org/10.3390/s21206754 doi: 10.3390/s21206754
[65]	J. Wei, H. Liu, G. Yan, F. Sun, Multi-modal deep extreme learning machine for robotic grasping recognition, Proceed. Adapt., Learn. Optim., (2016), 223–233. https://doi.org/10.1007/978-3-319-28373-9_19 doi: 10.1007/978-3-319-28373-9_19
[66]	N. S. Naz, M. A. Khan, S. Abbas, A. Ather, S. Saqib, Intelligent routing between capsules empowered with deep extreme machine learning technique, SN Appl. Sci., 2 (2019), 1–14. https://doi.org/10.1007/s42452-019-1873-6 doi: 10.1007/s42452-019-1873-6
[67]	J. Cai, J. Luo, S. Wang, S. Yang, Feature selection in Machine Learning: A new perspective, Neurocomputing, 300 (2018), 70–79. https://doi.org/10.1016/j.neucom.2017.11.077 doi: 10.1016/j.neucom.2017.11.077
[68]	L. M. Abualigah, A. T. Khader, E. S. Hanandeh, A new feature selection method to improve the document clustering using particle swarm optimization algorithm, J. Comput. Sci., 25 (2018), 456–466. https://doi.org/10.1016/j.jocs.2017.07.018 doi: 10.1016/j.jocs.2017.07.018
[69]	P. A. Flach, ROC analysis, encyclopedia of machine learning and data mining, Encycl. Mach. Learn. Data Min., (2016), 1–8. https://doi.org/10.1007/978-1-4899-7502-7_739-1 doi: 10.1007/978-1-4899-7502-7_739-1
[70]	Q. Wuniri, W. Huangfu, Y. Liu, X. Lin, L. Liu, Z. Yu, A generic-driven wrapper embedded with feature-type-aware hybrid bayesian classifier for breast cancer classification, IEEE Access, 7 (2019), 119931–119942. https://doi.org/10.1109/access.2019.2932505 doi: 10.1109/access.2019.2932505
[71]	J. Zheng, D. Lin, Z. Gao, S. Wang, M. He, J. Fan, Deep Learning assisted efficient ADABOOST algorithm for breast cancer detection and early diagnosis, IEEE Access, 8 (2020), 96946–96954. https://doi.org/10.1109/access.2020.2993536 doi: 10.1109/access.2020.2993536
[72]	X. Zhang, D. He, Y. Zheng, H. Huo, S. Li, R. Chai, et al., Deep learning based analysis of breast cancer using advanced ensemble classifier and linear discriminant analysis, IEEE Access, 8 (2020), 120208–120217. https://doi.org/10.1109/access.2020.3005228 doi: 10.1109/access.2020.3005228
[73]	Y. Yari, T. V. Nguyen, H. T. Nguyen, Deep learning applied for histological diagnosis of breast cancer, IEEE Access, 8 (2020), 162432–162448. https://doi.org/10.1109/access.2020.3021557 doi: 10.1109/access.2020.3021557
[74]	A. H. Osman, H. M. Aljahdali, An effective of ensemble boosting learning method for breast cancer virtual screening using neural network model, IEEE Access, 8 (2020), 39165–39174. https://doi.org/10.1109/access.2020.2976149 doi: 10.1109/access.2020.2976149
[75]	Y. Li, J. Wu, Q. Wu, Classification of breast cancer histology images using multi-size and discriminative patches based on Deep Learning, IEEE Access, 7 (2019), 21400–21408. https://doi.org/10.1109/access.2019.2898044 doi: 10.1109/access.2019.2898044
[76]	D. M. Vo, N. Q. Nguyen, S. W. Lee, Classification of breast cancer histology images using incremental boosting convolution networks, Inf. Sci., 482 (2019), 123–138. https://doi.org/10.1016/j.ins.2018.12.089 doi: 10.1016/j.ins.2018.12.089
[77]	S. Y. Siddiqui, M. A. Khan, S. Abbas, F. Khan, Smart occupancy detection for road traffic parking using deep extreme learning machine, J. K.S.U. Comput. Inf. Sci., 34 (2022), 727–733. https://doi.org/10.1016/j.jksuci.2020.01.016 doi: 10.1016/j.jksuci.2020.01.016
[78]	M. A. Khan, S. Abbas, K. M. Khan, M. A. A. Ghamdi, A. Rehman, Intelligent forecasting model of covid-19 novel coronavirus outbreak empowered with deep extreme learning machine, Comput. Matern. Contin., 64 (2020), 1329–1342. https://doi.org/10.32604/cmc.2020.011155 doi: 10.32604/cmc.2020.011155
[79]	S. Abbas, M. A. Khan, L. E. F. Morales, A. Rehman, Y. Saeed, Modelling, simulation and optimization of power plant energy sustainability for IoT enabled smart cities empowered with deep extreme learning machine, IEEE Access, 8 (2020), 39982–39997. https://doi.org/10.1109/ACCESS.2020.2976452 doi: 10.1109/ACCESS.2020.2976452
[80]	A. Rehman, A. Athar, M. A. Khan, S. Abbas, A. Fatima, M. Zareei, et al., Modelling, simulation, and optimization of diabetes type ii prediction using deep extreme learning machine, J. Ambient Intell. Smart Environ., 12 (2020), 125–138. https://doi.org/10.3233/AIS-200554 doi: 10.3233/AIS-200554
[81]	A. Haider, M. A. Khan, A. Rehman, H. S. Kim, A real-time sequential deep extreme learning machine cybersecurity intrusion detection system, Comput. Matern. Contin., 66 (2021), 1785–1798. https://doi.org/10.32604/cmc.2020.013910 doi: 10.32604/cmc.2020.013910
[82]	M. A. Khan, A. Rehman, K. M. Khan, M. A. A. Ghamdi, S. H. Almotiri, Enhance intrusion detection in computer networks based on deep extreme learning machine, Comput. Matern. Contin., 66 (2021), 467–480. https://doi.org/10.32604/cmc.2020.013121 doi: 10.32604/cmc.2020.013121
[83]	U. Ahmed, G. F. Issa, M. A. Khan, S. Aftab, M. F. Khan, R. A. T. Said, et al., Prediction of diabetes empowered with fused machine learning, IEEE Access, 10 (2022), 8529–8538. https://doi.org/10.1109/ACCESS.2022.3142097 doi: 10.1109/ACCESS.2022.3142097
[84]	S. Y. Siddiqui, A. Haider, T. M. Ghazal, M. A. Khan, I. Naseer, S. Abbas, et al., IoMT cloud-based intelligent prediction of breast cancer stages empowered with deep learning, IEEE Access, 9 (2021), 146478–146491. https://doi.org/10.1109/ACCESS.2021.3123472 doi: 10.1109/ACCESS.2021.3123472
[85]	M. Ahmad, M. Alfayad, S. Aftab, M. A. Khan, A. Fatima, B. Shoaib, et.al., Data and machine learning fusion architecture for cardiovascular disease prediction, Comput. Matern. Contin., 69 (2021), 2717–2731. https://doi.org/10.32604/cmc.2021.019013 doi: 10.32604/cmc.2021.019013

© 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)

