Research article

A new approach to generating virtual samples to enhance classification accuracy with small data—a case of bladder cancer


  • In the medical field, researchers are often unable to obtain, within a short period of time, the samples needed to build a stable data-driven forecasting model for classifying a new disease. To address this small-data learning problem, many studies have shown that generating virtual samples to augment the training data is an effective way to improve forecasting models built on small datasets. One of the most popular methods in these studies is the mega-trend-diffusion (MTD) technique, which is widely used in various fields. The effectiveness of the MTD technique depends on the degree of data diffusion, which is seriously affected by extreme values. In addition, the MTD method only considers data fitted using a unimodal triangular membership function, whereas real-world data may come from multiple distributions. Therefore, this paper proposes a distance-based mega-trend-diffusion (DB-MTD) technique that estimates the degree of data diffusion with less impact from extreme values. The proposed method assumes that the data are fitted by triangular and trapezoidal membership functions to generate virtual samples. In addition, a possibility evaluation mechanism is proposed to measure the applicability of the virtual samples. In our experiment, two bladder cancer datasets are used to verify the effectiveness of the proposed DB-MTD method. The experimental results demonstrate that the proposed method outperforms other virtual sample generation (VSG) techniques in classification and regression for small bladder cancer datasets.

    Citation: Liang-Sian Lin, Susan C Hu, Yao-San Lin, Der-Chiang Li, Liang-Ren Siao. A new approach to generating virtual samples to enhance classification accuracy with small data—a case of bladder cancer[J]. Mathematical Biosciences and Engineering, 2022, 19(6): 6204-6233. doi: 10.3934/mbe.2022290




    Angiogenesis is a multicellular phenomenon by which new blood vessels emerge from an existing vascular system. In particular, tip cell selection plays an important role not only in lateral inhibition but also in elongation, branching, anastomosis, vessel stabilization, lumen formation, and so on [1], [2]. Studies of these events have elucidated the roles of endothelial cell (EC) signaling through the vascular endothelial growth factor (VEGF)–receptor, angiopoietin–Tie2, and ephrin–Eph pathways [1], [3], [4]. Moreover, it has become increasingly clear that the interaction between ECs and mural cells, including vascular smooth muscle cells and pericytes, is involved in maintaining the angiogenic process for appropriate organization [3]–[7].

    Mathematical modeling of vascular growth has attracted considerable interest in the last few years, along with elucidation of the mechanisms of development and of the proliferative and quiescent features of both normal and tumor cells. As a result, a wide range of biochemical phenomena involving proteins, capillaries, and tumors has become subject to mathematical modeling. Prime examples include the work of Frieboes et al., who coupled phase fields for tumor growth with discrete random walks to model angiogenesis [8]–[10]. Also, Lima et al. [11] proposed a ten-species model, and Perfahl et al. [12] considered multiscale modeling of vascular tumor growth.

    Cellular automata are a type of mathematical model that is useful for simulating systems and diseases at the cellular level. They are simple, discrete systems in which a limited number of states are defined for each cell. Cellular automata can generate complicated behavior from simple, local rules that depend only on the states of a cell and its neighbors, whereas distant cells exert no influence [13]. To date, many cellular automaton models have been presented to elucidate cancer behavior, especially tumor growth and the factors affecting it [12], [14]–[16]. Competition for nutrients among normal and cancerous cells has been analyzed, as has the influence of the immune system response on tumor growth [17].

    Despite numerous earlier studies, how the spatiotemporal regulation of molecules affects morphogenetic cell movement, and what type of collective cell movement is involved in angiogenesis, have remained elusive, largely because of the lack of a stable methodology for visualizing and assessing EC movement during angiogenesis. To clarify the relation between individual cell movements and angiogenesis morphology, and to dissect the underlying molecular and cellular mechanisms, Arima et al. [18] first established a system in which dynamic cell behavior is visualized using time-lapse microscopy. They set out to identify patterns of cellular behavior in an angiogenesis model through computational data processing. They found that cell movements were quite dynamic: the ECs moved while changing their relative positions at the tip of elongating branches. Takubo et al. [19] analyzed EC behaviors in an in vitro angiogenic sprouting assay using mouse aortic explants in combination with mathematical modeling. In their experimentally validated mathematical model, cohesive movements with anisotropic cell-to-cell interactions characterized the EC motility, which may drive branch elongation depending on a constant cell supply.

    In light of the findings described above, we suggest a multilevel model to simulate dynamic cell movement affected by VEGF. Specifically, a multi-agent model is applied to describe cell movement: particles, each with a position vector and a polarity, interact with one another continually under chemotaxis deriving from VEGF. Such cell movement leads to blood vessel formation, which is described by a phase-field equation. We adopt the Cahn–Hilliard type, a representative fourth-order partial differential equation. The two mathematical models are coupled to simulate angiogenesis, and a numerical scheme is then prepared.

    Thermodynamically consistent phase-field models are a class of models that satisfy the second law of thermodynamics. These models have been used to describe numerous non-equilibrium multi-phase thermodynamical processes, in fields ranging from materials science and fluid science to the life sciences. For this study, we specifically examine a phase-field model for the time-dependent dynamics of a binary material system in which phase A is represented by the phase variable φ = 1 and phase B by φ = 0. We choose the phase variable φ ∈ [0,1], with φ identified as the volume fraction of material A and 1 − φ as the volume fraction of material B. The interface between the two phases, known as the transition layer in the phase-field model, is given by 0 < φ < 1 and is defined mathematically at φ = 1/2.

    The Cahn–Hilliard equation was introduced by J. Cahn and J. Hilliard [20] to describe phase separation, in which the components of a binary fluid separate and form pure domains of each component. It can be interpreted as the $H^{-1}$ gradient flow of the Cahn–Hilliard energy functional

    $$F(\varphi) = \int_\Omega \left[ \frac{\epsilon_1}{2} |\nabla \varphi|^2 + f(\varphi) \right] dx,$$
    where $\epsilon_1$ is a parameter expressing the strength of the conformational entropy. For immiscible binary materials, we choose a double-well potential as the bulk energy density in this study,
    $$f(\varphi) = \epsilon_2 \varphi^2 (1 - \varphi)^2,$$
    where $\epsilon_2$ represents the strength of the bulk mixing free energy.
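    As a rough illustration of the energy functional, the following Python sketch approximates F(φ) on a uniform one-dimensional grid, reading the integrand as (ε1/2)|∇φ|² + ε2 φ²(1 − φ)². The grid, the tanh test profile, and the values ε1 = ε2 = 1 are assumptions made only for this example and are not taken from the paper. The sketch is meant to show that the pure phases φ ≡ 0 and φ ≡ 1 carry zero energy, whereas a diffuse interface carries a positive energy.

```python
import math


def double_well(phi, eps2=1.0):
    """Bulk energy density f(phi) = eps2 * phi^2 * (1 - phi)^2."""
    return eps2 * phi ** 2 * (1.0 - phi) ** 2


def ch_energy_1d(phi, dx, eps1=1.0, eps2=1.0):
    """Approximate F(phi) on a uniform 1D grid.

    The gradient uses a difference between neighbouring nodes and the bulk
    term is evaluated at the midpoint value of each grid interval.
    """
    energy = 0.0
    for k in range(len(phi) - 1):
        grad = (phi[k + 1] - phi[k]) / dx
        mid = 0.5 * (phi[k] + phi[k + 1])
        energy += (0.5 * eps1 * grad ** 2 + double_well(mid, eps2)) * dx
    return energy


# Hypothetical test profiles on the interval [-10, 10].
n_nodes, dx = 201, 0.1
xs = [-10.0 + k * dx for k in range(n_nodes)]
pure_a = [1.0] * n_nodes                      # material A everywhere
pure_b = [0.0] * n_nodes                      # material B everywhere
interface = [0.5 * (1.0 + math.tanh(x)) for x in xs]  # smooth A/B transition

energy_pure_a = ch_energy_1d(pure_a, dx)
energy_pure_b = ch_energy_1d(pure_b, dx)
energy_interface = ch_energy_1d(interface, dx)
```

    In this sketch only the transition layer 0 < φ < 1 contributes to the energy, which is the sense in which the double-well potential favors the two pure phases.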

    The Cahn–Hilliard equation is a typical fourth-order partial differential equation, for which it is very difficult to obtain a stable numerical solution. A structure-preserving scheme based on the discrete variational derivative method (DVDM) proposed by Furihata and Matsuo [21] has succeeded in solving the Cahn–Hilliard equation stably. The discontinuous Galerkin method has also been used to trace the interface between material A and material B with high efficiency [22], [23]. Nevertheless, these numerical techniques involve quite cumbersome procedures, and we often face difficulties in implementing them in a mathematical model with fourth-order partial differential equations. For the study described herein, we chose the Morley finite element method discussed in [24], [25]. The Morley element is implemented in Freefem++ [26], which we use for the numerical calculations.

    This paper is organized as follows. Section 2 introduces the governing equations, which represent cell movement as a multi-agent system and vessel development with a phase-field model, namely the Cahn–Hilliard equation. Numerical procedures used to calculate the coupled multi-agent and phase-field model are presented in Section 3. Numerical results are described in Section 4. Finally, Section 5 concludes this paper.

    In this section, we present some basic concepts related to the suggested mathematical multicellular model for angiogenesis. To that end, we first introduce some notation.

    Let $\Omega \subset \mathbb{R}^d$ ($d = 1, 2, 3$) be a bounded domain, where a position vector is expressed as $x \in \Omega$. Cell $i$ is located at a position vector $x_i$ and moves with a velocity vector (polarity) $q_i$ at a time $0 < t \le T$ for a positive constant $T$, where $0 < i \le N_{\mathrm{cell}}$ is an index distinguishing each cell. VEGF $c$ attracts such cells. Consequently, a new blood vessel (capillary) is formed after each cell degrades fibronectin $f$ and passes through. In the next section, the position vector $x_i(t)$, the velocity vector (polarity) $q_i(t)$, and the fibronectin concentration $f(t, x)$ are constructed under a given VEGF $c(t, x)$.

    As discussed above, we suggest three kinds of equations for the mathematical multicellular model of angiogenesis: for the position vector of cells, for the polarity of cells, and for the fibronectin concentration. Hereinafter, it is assumed that VEGF is given. The following arbitrary parameters are prepared: for the position vector $x_i$, the cell mobility $a_1$ and the repulsive force between cells and fibronectin $a_2$; for the polarity $q_i$, the coefficient $M_q$, the chemotaxis to VEGF concentration $a_3$, the cell alignment effect $a_4$, and the reference size of cell polarity $q_0$; and for the fibronectin concentration $f$, the square of the transition area width of the fibronectin phase variable $D$, the mobility of fibronectin regions $M$, the radius of influence of cells $a_5$, and the fibronectin enzyme $\mu$. Let $\Delta t$ represent a time increment, with $n$ and $N$ respectively denoting a time step for a time $t = n\Delta t$ and the maximum time step for $T = N\Delta t$. These equations are expressed as described hereinafter.

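    From the discussion presented in the section above, the governing equation of the position vector $x_i$ of cell $i$ at a time $t$ is defined as

    $$\frac{dx_i}{dt} = \phi(f)\left\{ \hat{q}_i + \sum_{j \ne i,\, j=1}^{N_{\mathrm{cell}}} g_1(r_{i,j}) \frac{x_j - x_i}{r_{i,j}} \right\} - a_2 \nabla f \quad \text{in } \Omega \times (0,T),$$
    $$x_i = x_i^0 \quad \text{at } t = 0.$$
    The first term on the right-hand side expresses that a particle can move inside a vessel, where $f = 0$, but not outside a vessel, where $f = 1$. Here $\phi(f)$ is expressed as
    $$\phi(f) = a_1 \exp(-4f),$$
    and $\hat{q}_i$ represents the unit vector of $q_i$, expressed as
    $$\hat{q}_i = \frac{q_i}{|q_i|}.$$
    The second term on the right-hand side describes collisions, that is, interactions between particles. Here $r_{i,j}$ and $g_1(r_{i,j})$, studied in [19], [27], are the distance and a restitution coefficient between $x_i$ and $x_j$:
    $$r_{i,j} = |x_j - x_i|,$$
    $$g_1(r_{i,j}) = \begin{cases}
    F_{\mathrm{rep}} \dfrac{r_{i,j} - d_{\mathrm{core}}}{d_{\mathrm{core}}} & \text{if } r_{i,j} < d_{\mathrm{core}}, \\
    0 & \text{if } d_{\mathrm{core}} \le r_{i,j} < d_{\mathrm{neutral}}, \\
    F_{\mathrm{att}} \dfrac{r_{i,j} - d_{\mathrm{neutral}}}{d_{\mathrm{adh}} - d_{\mathrm{neutral}}} & \text{if } d_{\mathrm{neutral}} \le r_{i,j} < d_{\mathrm{adh}}, \\
    F_{\mathrm{att}} \dfrac{d_{\mathrm{reach}} - r_{i,j}}{d_{\mathrm{reach}} - d_{\mathrm{adh}}} & \text{if } d_{\mathrm{adh}} \le r_{i,j} < d_{\mathrm{reach}}, \\
    0 & \text{if } d_{\mathrm{reach}} \le r_{i,j}.
    \end{cases}$$
    The third term on the right-hand side describes cells reaching and bouncing off the vessel wall. Also, $x_i^0$ is the initial condition of $x_i$.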
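    To make the piecewise interaction coefficient concrete, here is a minimal Python sketch of g1 as a function of the interparticle distance. It assumes the reading in which g1 is negative (repulsive) below d_core, zero in the neutral band, rises linearly to F_att at d_adh, and falls back to zero at d_reach. The constants are the Table 1 values; the function and constant names are hypothetical.

```python
# Table 1 values; the names are chosen for this example only.
D_CORE = 12.0
D_NEUTRAL = 1.5 * D_CORE
D_ADH = 1.534 * D_CORE
D_REACH = 1.834 * D_CORE
F_REP = 5.0
F_ATT = 1.0


def g1(r):
    """Interaction coefficient between two cells at distance r.

    Negative values push the cells apart and positive values pull them
    together, because g1 multiplies the unit vector pointing from cell i
    to cell j.
    """
    if r < D_CORE:
        return F_REP * (r - D_CORE) / D_CORE          # repulsive core
    if r < D_NEUTRAL:
        return 0.0                                    # neutral band
    if r < D_ADH:
        return F_ATT * (r - D_NEUTRAL) / (D_ADH - D_NEUTRAL)  # adhesion ramps up
    if r < D_REACH:
        return F_ATT * (D_REACH - r) / (D_REACH - D_ADH)      # adhesion fades out
    return 0.0                                        # out of reach


# Sample the coefficient at a few hypothetical distances.
samples = {r: g1(r) for r in (0.0, 6.0, 15.0, D_ADH, 30.0)}
```

    Under this reading the coefficient is continuous at every breakpoint, so the force between two cells changes smoothly as they approach or separate.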

    The governing equation of the polarity $q_i$ of cell $i$ at a time $t$ is defined as

    $$\frac{dq_i}{dt} = M_q \left( q_0^2 - |q_i|^2 \right) q_i + a_3 \nabla c + \zeta_i + a_4 \frac{dx_i}{dt} \quad \text{in } \Omega \times (0,T),$$
    $$q_i = q_i^0 \quad \text{at } t = 0.$$
    The first term on the right-hand side acts as an adjustment that drives the polarity $q_i$ toward $|q_i|^2 = q_0^2$, where $q_0$ is chosen arbitrarily. Chemotaxis is modeled by $\nabla c$, and $\zeta_i$ represents a random number drawn from a stochastic distribution. A fictitious force is given by $dx_i/dt$. Also, $q_i^0$ is the initial condition of $q_i$.

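    Finally, for real-valued parameters $D$, $M$, $\mu$, the governing equation for the fibronectin concentration $f$ is defined as

    $$\frac{df}{dt} = \left[ \nabla \cdot M \nabla \left\{ -D \Delta f + w \right\} - f \mu \right] H \quad \text{in } \Omega \times (0,T),$$
    $$n \cdot \nabla f = 0 \quad \text{on } \partial\Omega \times (0,T),$$
    $$n \cdot \nabla \left( -D \Delta f + w \right) = 0 \quad \text{on } \partial\Omega \times (0,T),$$
    $$f = f_0 \quad \text{at } t = 0,$$
    where $f_0$ is the initial condition for $f$, and
    $$H(x, x_i, t) = \sum_{i=1}^{N_{\mathrm{cell}}} \exp\left( -\frac{|x - x_i|}{a_5} \right),$$
    $$w(f) = 2f(f-1)(2f-1).$$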
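    The two auxiliary functions can be checked with a few lines of Python. The sketch below assumes that H is a sum of exponentials decaying with the distance from each cell, exp(−|x − x_i|/a5), and that w(f) = 2f(f − 1)(2f − 1); the function names are hypothetical. It also compares w against a finite-difference derivative of the double-well potential f²(1 − f)², of which w is the derivative.

```python
import math

A5 = 0.1  # radius of influence of cells (Table 1)


def w(f):
    """Nonlinear term w(f) = 2 f (f - 1) (2 f - 1)."""
    return 2.0 * f * (f - 1.0) * (2.0 * f - 1.0)


def double_well(f):
    """Double-well potential f^2 (1 - f)^2 with unit strength."""
    return f ** 2 * (1.0 - f) ** 2


def cell_indicator(x, cells, a5=A5):
    """H(x): sum over cells of exp(-|x - x_i| / a5)."""
    return sum(math.exp(-math.dist(x, xi) / a5) for xi in cells)


# w vanishes at the two pure phases and at the interface value 1/2.
roots = [w(0.0), w(0.5), w(1.0)]

# Central finite difference of the double well at a hypothetical test point.
f_test, h = 0.3, 1e-6
fd_derivative = (double_well(f_test + h) - double_well(f_test - h)) / (2.0 * h)

# H equals 1 at the position of an isolated cell and decays quickly away from it.
cells = [(25.0, 0.0)]
h_at_cell = cell_indicator((25.0, 0.0), cells)
h_far_away = cell_indicator((26.0, 0.0), cells)
```

    With a5 = 0.1 the indicator H is already negligible one length unit away from a cell, so in this reading fibronectin is modified only in the immediate neighborhood of the cells.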

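    Letting $t$ and $\Delta t$ respectively stand for time and a time increment, and discretizing the governing equations in the time direction at a time step $0 < n \le N$ with $t = n\Delta t$, one can deduce

    $$\frac{x_i^{n+1} - x_i^n}{\Delta t} = \phi(f^n)\left\{ \hat{q}_i^n + \sum_{j \ne i,\, j=1}^{N_{\mathrm{cell}}} g_1(r_{i,j}^n) \frac{x_j^n - x_i^n}{r_{i,j}^n} \right\} - a_2 \nabla f^n,$$
    $$\frac{q_i^{n+1} - q_i^n}{\Delta t} = M_q \left( q_0^2 - |q_i^n|^2 \right) q_i^n + a_3 \nabla c + \zeta_i^n + a_4 \frac{x_i^{n+1} - x_i^n}{\Delta t},$$
    $$H^{n+1} = \sum_{i=1}^{N_{\mathrm{cell}}} \exp\left( -\frac{|x - x_i^{n+1}|}{a_5} \right),$$
    $$\frac{f^{n+1} - f^n}{\Delta t} = \left[ \nabla \cdot M \nabla \left\{ -D \Delta f^{n+1} + w(f^n) \right\} - f^{n+1} \mu \right] H^{n+1}.$$
    The random number $\zeta_i^n$ is defined as
    $$\zeta_i^n = \zeta_i^{n-1} - \alpha \zeta_i^{n-1} + \mathcal{N} \Delta t,$$
    where $\mathcal{N} \in \mathbb{R}^d$ is a vector-valued function, every element of which follows a standard normal distribution with mean 0 and variance 1. All numerical simulations are performed with Freefem++ [26]. In particular, the fibronectin concentration is governed by a fourth-order partial differential equation; this paper uses the Morley element implemented in Freefem++ [26] to compute the weak form of the Cahn–Hilliard equation.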
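    The cell part of this time stepping can be sketched in plain Python. The code below is only an illustration under stated assumptions, not the Freefem++ implementation used in the paper: the fields f, ∇f and ∇c are passed in as callables, the mobility is taken as a1 exp(−4f), the repulsion from fibronectin as −a2 ∇f, the chemotaxis and alignment coefficients as a3 and a4, and the noise recursion is read literally with the factor Δt. The value Mq = 1 is hypothetical because Table 1 does not list it, and all function names are invented for the example.

```python
import math
import random


def step_cells(xs, qs, zetas, f, grad_f, grad_c, g1, p, rng):
    """One explicit time step for all cells; returns (xs, qs, zetas) at n+1."""
    dt = p["dt"]
    new_xs, new_qs, new_zetas = [], [], []
    for i, (x, q, z) in enumerate(zip(xs, qs, zetas)):
        # Unit polarity; a zero polarity contributes no self-propulsion.
        q_norm = math.hypot(q[0], q[1])
        drive = [q[0] / q_norm, q[1] / q_norm] if q_norm > 0.0 else [0.0, 0.0]
        # Pairwise interaction along the unit vector from cell i to cell j.
        for j, xj in enumerate(xs):
            r = math.dist(x, xj)
            if j != i and r > 0.0:
                drive[0] += g1(r) * (xj[0] - x[0]) / r
                drive[1] += g1(r) * (xj[1] - x[1]) / r
        mobility = p["a1"] * math.exp(-4.0 * f(x))
        gf = grad_f(x)
        vel = [mobility * drive[k] - p["a2"] * gf[k] for k in range(2)]
        # Noise recursion zeta^n = zeta^(n-1) - alpha * zeta^(n-1) + N * dt.
        z_new = [z[k] - p["alpha"] * z[k] + rng.gauss(0.0, 1.0) * dt
                 for k in range(2)]
        gc = grad_c(x)
        relax = p["Mq"] * (p["q0"] ** 2 - q_norm ** 2)
        q_new = [q[k] + dt * (relax * q[k] + p["a3"] * gc[k]
                              + z_new[k] + p["a4"] * vel[k])
                 for k in range(2)]
        new_xs.append([x[k] + dt * vel[k] for k in range(2)])
        new_qs.append(q_new)
        new_zetas.append(z_new)
    return new_xs, new_qs, new_zetas


# Table 1 values, plus the hypothetical Mq = 1.
params = {"a1": 10.0, "a2": 1.0, "a3": 1.0, "a4": 0.1,
          "q0": 10.0, "alpha": 0.1, "dt": 0.002, "Mq": 1.0}


def repulsion_only(r):
    """Repulsive branch of g1 only, enough for two nearby cells."""
    return 5.0 * (r - 12.0) / 12.0 if r < 12.0 else 0.0


def zero_field(x):
    return 0.0


def zero_grad(x):
    return (0.0, 0.0)


# Two cells one length unit apart, no fibronectin and no VEGF.
xs0 = [[0.0, 0.0], [1.0, 0.0]]
qs0 = [[0.0, 0.0], [0.0, 0.0]]
zs0 = [[0.0, 0.0], [0.0, 0.0]]
xs1, qs1, zs1 = step_cells(xs0, qs0, zs0, zero_field, zero_grad, zero_grad,
                           repulsion_only, params, random.Random(1))
gap_before = math.dist(xs0[0], xs0[1])
gap_after = math.dist(xs1[0], xs1[1])
```

    In this small usage example the two cells start inside each other's repulsive core, so one step moves them apart along the line joining them.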

    A domain Ω is defined as

    $$\Omega = \left\{ x = [x, y]^T ;\ 0 \le x \le 50,\ -25 \le y \le 25 \right\}.$$

    Before applying and assessing our suggested mathematical model, we confirm basic features of the model for particle collision and chemotaxis. For all cases, 10 particles are prepared initially. The position vector xi and polarity qi of each cell are expressed as

    xi=[25+N(0,1),N(0,1)],

    qi=[N(0,1),N(0,1)],

    where all the parameters are listed in Table 1.

    Table 1.  Parameters.
    | Symbol | Parameter | Value |
    |---|---|---|
    | $a_1$ | Cell mobility | 10 |
    | $a_2$ | Repulsive force between cells and fibronectin | 1 |
    | $a_3$ | Chemotaxis to VEGF concentration | 1 |
    | $a_4$ | Cell alignment effect | 0.1 |
    | $a_5$ | Radius of influence of cells | 0.1 |
    | $D$ | Square of transition area width of fibronectin phase variable | 1 |
    | $M$ | Mobility of fibronectin regions | 1 |
    | $\mu$ | Fibronectin enzyme | 1, 10, 100 |
    | $q_0$ | Reference size of cell polarity | 10 |
    | $\alpha$ | Persistence of cell polarity | 0.1 |
    | $d_{\mathrm{core}}$ | Diameter of the cell (repulsive area) | 12 |
    | $d_{\mathrm{neutral}}$ | Diameter of the cell (neutral area) | 1.5 $d_{\mathrm{core}}$ |
    | $d_{\mathrm{adh}}$ | Diameter of the cell (adhesive area) | 1.534 $d_{\mathrm{core}}$ |
    | $d_{\mathrm{reach}}$ | Diameter of the cell (interactable area) | 1.834 $d_{\mathrm{core}}$ |
    | $F_{\mathrm{rep}}$ | Intercellular repulsive force | 5 |
    | $F_{\mathrm{att}}$ | Intercellular attraction force | 1 |
    | $\Delta t$ | Time increment | 0.002 |


    This paper is our first attempt to simulate the dynamics of angiogenesis by coupling a multi-agent model with a phase-field model. Because there is little previous research of this kind, we set most of the parameters arbitrarily in both temporal and spatial scales. As a result, the numerical results may help in understanding the dynamics of angiogenesis qualitatively, but not quantitatively.

    To investigate particle collisions, VEGF is ignored by setting c = 0. The interparticle distance $r_{i,j}$ is much less than $d_{\mathrm{core}} = 12$ because the initial position vector is $x_i = [25 + N(0,1), N(0,1)]$. Therefore, all particles are mutually repulsive, and there is no chemotaxis because c = 0. Figures 1 and 2 show f and H to visualize the time evolution of vessel development and cell behavior. In the early stages of the calculation, the repulsive force spreads all cells apart. Subsequently, the cells, unaffected by chemotaxis, move slowly and separate in the radial direction.
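    The claim that every pair of cells starts inside the repulsive core can be checked with a short sampling sketch. It draws 10 initial positions from [25 + N(0,1), N(0,1)] and polarities from [N(0,1), N(0,1)], then compares the largest pairwise distance with d_core = 12. The seed and the function name are arbitrary choices for this example.

```python
import math
import random

D_CORE = 12.0   # diameter of the repulsive area (Table 1)
N_CELLS = 10


def initial_cells(n_cells, rng):
    """Sample initial positions and polarities of the collision test."""
    xs = [(25.0 + rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
          for _ in range(n_cells)]
    qs = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
          for _ in range(n_cells)]
    return xs, qs


rng = random.Random(0)  # arbitrary seed, only for reproducibility
xs, qs = initial_cells(N_CELLS, rng)

# Distances between all unordered pairs of cells.
pair_distances = [math.dist(xs[i], xs[j])
                  for i in range(N_CELLS) for j in range(i + 1, N_CELLS)]
max_distance = max(pair_distances)
all_in_core = max_distance < D_CORE
```

    Because the positions have unit standard deviation, pairwise distances of a few length units are typical, well below d_core, so only the repulsive branch of the interaction is active at the start.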

    Figure 1.  Time evolution of f without chemotaxis for c = 0.
    Figure 2.  Time evolution of H without chemotaxis for c = 0.

    Next, chemotaxis generated by VEGF is considered numerically with c = 100x. In the early stages of the calculation, the repulsive force spreads all cells apart, as discussed in the section above, because the repulsive force is stronger than chemotaxis. The interparticle distance $r_{i,j}$ also increases. After the repulsive force has weakened enough to become weaker than chemotaxis, all particles move to the right, where VEGF c has a larger value, as portrayed in Figures 3 and 4.

    Figure 3.  Time evolution of f with chemotaxis for c = 100x.
    Figure 4.  Time evolution of H with chemotaxis for c = 100x.

    Based on the suggested mathematical model and the numerical techniques constructed in the sections above, we perform numerical simulations not only of lateral inhibition but also of elongation and branching, which are basic features of angiogenesis. The initial conditions of the position vector $x_i$ and polarity $q_i$ of each cell are expressed as

    xi=[0.05+N(0,1),N(0,1)],

    qi=[1,0],

    where the number of cells is 20. We set the x coordinate of a cell to 0.05 whenever it is negative; also, c = 100x is used as VEGF. As the governing equation of the fibronectin concentration f, the Cahn–Hilliard equation is a representative PDE model, but it reduces to an ODE model by substituting M = 0. We calculate both to compare the results.
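    The reduction to an ODE can be written out explicitly. Assuming the fibronectin equation has the form df/dt = [∇·M∇(−DΔf + w) − fμ]H, setting M = 0 leaves df/dt = −μ f H at every point, and a time discretization that is implicit in f gives f^{n+1} = f^n / (1 + Δt μ H^{n+1}). The Python sketch below applies this pointwise update at a hypothetical location where H is held at 1, for the three values of μ in Table 1; the function names and the number of steps are assumptions for illustration.

```python
def ode_fibronectin_step(f_old, h_new, mu, dt=0.002):
    """Implicit update of f at one point when M = 0.

    Solves (f_new - f_old) / dt = -mu * f_new * h_new for f_new.
    """
    return f_old / (1.0 + dt * mu * h_new)


def decay_at_point(mu, h=1.0, f0=1.0, n_steps=1000, dt=0.002):
    """Fibronectin left after n_steps at a point with constant H = h."""
    f = f0
    for _ in range(n_steps):
        f = ode_fibronectin_step(f, h, mu, dt)
    return f


# Remaining fibronectin for the three enzyme strengths of Table 1.
remaining = {mu: decay_at_point(mu) for mu in (1.0, 10.0, 100.0)}
```

    The update keeps f positive for any step size, and a larger μ degrades fibronectin faster. Without the diffusion term there is no coupling between neighboring points, which is one way to read the difference between the ODE and PDE results.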

    Figure 5 portrays the time evolution of fibronectin f obtained with the ODE model (M = 0). The model exhibits lateral inhibition, elongation, branching, and anastomosis. By contrast, Figure 6 portrays results obtained with the PDE model. The Cahn–Hilliard equation yields a fibronectin f distributed from 0 to 1, so the structure of the vessel is visualized more clearly than with the ODE model. Next, Figure 7 shows the x coordinates of all cells. As Figure 7 shows, all cells move quickly in the early stage because of mutual repulsion. Thereafter, chemotaxis ∇c attracts them, and they move slowly in the positive-x direction. Figure 8 compares the same particles between the ODE model and the PDE model. Some particles move very differently because the distribution of the fibronectin concentration f in the ODE model differs from that in the PDE model.

    Figure 5.  Time evolution of f with chemotaxis in the case of c = 100x for the ODE model.
    Figure 6.  Time evolution of f with chemotaxis for c = 100x for the PDE model.
    Figure 7.  x coordinates of all cells with time steps up to t = 20.
    Figure 8.  Particle difference between M = 0 and M = 1.

    Finally, on the assumption that tumor cells express vascular endothelial growth factor receptors and respond to autocrine and paracrine VEGF signals, this section defines VEGF as

    $$c = 1000 \exp\left[ -0.001 \left\{ (x - 50)^2 + y^2 \right\} \right],$$

    where Figure 9 depicts the distribution of this c. An initial condition of position vector xi and polarity qi of each cell is expressed as

    xi=[0.05+N(0,1),N(0,6)],

    qi=[1,0],

    where the number of cells is 20. We set the x coordinate of a cell to 0.05 whenever it is negative. Note that N(0,1) is used in Sections 4.1 and 4.2. With N(0,1), the effects of interactions among the particles themselves appear to be strong in the early stage. Therefore, we set N(0,6) in Section 4.3 to focus on the effects of a tumor cell.
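    The tumor-centered VEGF field and the chemotactic direction it induces can be sketched as follows. The code assumes the Gaussian form c = 1000 exp[−0.001{(x − 50)² + y²}], which peaks at (50, 0), and differentiates it analytically; the function names and the test point are hypothetical.

```python
import math


def vegf(x, y):
    """VEGF concentration centered on the tumor at (50, 0)."""
    return 1000.0 * math.exp(-0.001 * ((x - 50.0) ** 2 + y ** 2))


def grad_vegf(x, y):
    """Analytic gradient of the VEGF field, the direction of chemotaxis."""
    c = vegf(x, y)
    return (-0.002 * (x - 50.0) * c, -0.002 * y * c)


# The field is largest at the tumor, where the gradient vanishes.
c_peak = vegf(50.0, 0.0)
g_peak = grad_vegf(50.0, 0.0)

# At a hypothetical cell position the gradient points toward the tumor.
gx, gy = grad_vegf(10.0, 5.0)

# Central finite-difference check of the x component.
h = 1e-5
gx_fd = (vegf(10.0 + h, 5.0) - vegf(10.0 - h, 5.0)) / (2.0 * h)
```

    Unlike the linear field c = 100x, whose gradient is the constant vector (100, 0), this gradient has a y component, so cells that start off the axis y = 0 are steered toward it.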

    Figure 10 shows the time evolution of f with chemotaxis at t = 62. Unlike in Figures 5 and 6, not all particles move in the x direction; they head toward the coordinate (x, y) = (50, 0), where c takes its maximum value. However, some particles apparently head to the lower right. We suspect that this phenomenon is attributable to cell adhesion effects. The domain used for this simulation is a 50 × 50 square, and the cell diameter is $d_{\mathrm{adh}} \approx 18$ from Table 1. Therefore, once the leading cell affected by VEGF moves to the lower right, subsequent cells follow it because of cell adhesion.

    Next, in the case of the PDE model, we observe how vessel development, and the corresponding x coordinate of each cell, depends on μ, which represents the strength of fibronectin degradation. As presented in Figures 11–13, a small μ, as in Figure 11 with μ = 1, suppresses vessel development more than a large μ, as in Figure 13 with μ = 100, because a cell can move much more easily in a region where fibronectin f = 0 than where f = 1. Consequently, the numerical simulations of Figures 11–13 give biochemically reasonable results.

    Figure 9.  Distribution of c.
    Figure 10.  Time evolution of f.
    Figure 11.  (a) Fibronectin f at t = 30 and (b) x coordinates of each cell up to t = 20 for μ = 1.
    Figure 12.  (a) Fibronectin f at t = 30 and (b) x coordinates of each cell up to t = 20 for μ = 10.
    Figure 13.  (a) Fibronectin f at t = 30 and (b) x coordinates of each cell up to t = 20 for μ = 100.

    We demonstrated the characteristic features of angiogenic endothelial cell (EC) behaviors. Taken together with findings from earlier experimental studies such as that of Arima et al. [18], we suggest a multi-agent model coupled with a phase-field model for angiogenic morphogenesis. Such a model enables us to dissect systematically the cellular mechanisms regulating dynamic and complex multicellular processes. The Cahn–Hilliard equation is used as the phase-field model, and the ODE and PDE versions are obtained by setting M = 0 and M = 1, respectively.

    Our first numerical result is that multi-agent-based EC movements with the ODE model, represented as a process driven by a simple stochastic rule, can adequately describe branch elongation, which plays an important role in angiogenesis. However, EC behaviors were insufficiently explained by cell processes alone because the ODE model cannot account for the elasticity of the vessel wall. This leads to our second result, which suggests that the PDE model might be necessary to describe fibronectin degradation over the whole cell. This prediction was verified biologically. Consequently, this study advances our mathematical understanding of angiogenic morphogenesis as coordinated multicellular processes.

    As described herein, a small domain was adopted to keep the calculation cost as low as possible. As a result, the effects of cell adhesion are amplified, and it is difficult to survey some biochemical phenomena precisely. Future work should carry out numerical calculations on a larger domain than in this study to investigate in more detail the relations with chemotaxis ∇c and the fibronectin enzyme μ.

    We also need to analyze the correlation between chemotaxis and heterogeneous directionality, cell mixing, cell overtaking, and branching morphogenesis. However, the effects of chemotaxis appear to be much stronger than those of the others, and we could not observe the details of EC movement in our numerical simulations.

    In the future, we expect the multicellular model suggested here to support quantitative life science research that connects the molecular and cellular levels to macroscopic histogenesis and vital functions.



    [1] P. Gontero, A. Tizzani, G. H. Muir, E. Caldarera, M. Pavone Macaluso, The genetic alterations in the oncogenic pathway of transitional cell carcinoma of the bladder and its prognostic value, Urol. Res., 29 (2001), 377–387. https://doi.org/10.1007/s002400100216
    [2] V. Tut, K. Braithwaite, B. Angus, D. Neal, J. Lunec, J. Mellon, Cyclin D1 expression in transitional cell carcinoma of the bladder: correlation with p53, waf1, pRb and Ki67, Br. J. Cancer, 84 (2001), 270–275. https://doi.org/10.1054/bjoc.2000.1557
    [3] A. Colquhoun, S. Sundar, P. Rajjayabun, T. Griffiths, R. Symonds, J. Mellon, Epidermal growth factor receptor status predicts local response to radical radiotherapy in muscle-invasive bladder cancer, Clin. Oncol., 18 (2006), 702–709. https://doi.org/10.1016/j.clon.2006.08.003
    [4] P. Luukka, Similarity classifier in diagnosis of bladder cancer, Comput. Methods Programs Biomed., 89 (2008), 43–49. https://doi.org/10.1016/j.cmpb.2007.10.001
    [5] G. Y. Chao, T. I. Tsai, T. J. Lu, H. C. Hsu, B. Y. Bao, W. Y. Wu, et al., A new approach to prediction of radiotherapy of bladder cancer cells in small dataset analysis, Expert Syst. Appl., 38 (2011), 7963–7969. https://doi.org/10.1016/j.eswa.2010.12.035
    [6] T. W. Liao, Diagnosis of bladder cancers with small sample size via feature selection, Expert Syst. Appl., 38 (2011), 4649–4654. https://doi.org/10.1016/j.eswa.2010.09.135
    [7] T. I. Tsai, Y. Zhang, Z. Zhang, G. Y. Chao, C. C. Tsai, Considering relationship of proteins for radiotherapy prognosis of bladder cancer cells in small data set, Methods Inf. Med., 57 (2018), 220–229. https://doi.org/10.3414/ME17-02-0003
    [8] M. D. Robinson, G. K. Smyth, Small-sample estimation of negative binomial dispersion, with applications to SAGE data, Biostatistics, 9 (2008), 321–332. https://doi.org/10.1093/biostatistics/kxm030
    [9] S. Lee, M. J. Emond, M. J. Bamshad, K. C. Barnes, M. J. Rieder, D. A. Nickerson, et al., Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies, Am. J. Hum. Genet., 91 (2012), 224–237. https://doi.org/10.1016/j.ajhg.2012.06.007
    [10] Y. Zhao, N. J. Fesharaki, H. Liu, J. Luo, Using data-driven sublanguage pattern mining to induce knowledge models: application in medical image reports knowledge representation, BMC Med. Inf. Decis. Making, 18 (2018), 1–13. https://doi.org/10.1186/s12911-018-0645-3 doi: 10.1186/s12911-017-0580-8
    [11] L. Stainier, A. Leygue, M. Ortiz, Model-free data-driven methods in mechanics: material data identification and solvers, Comput. Mech., 64 (2019), 381–393. https://doi.org/10.1007/s00466-019-01731-1
    [12] E. Ntoutsi, P. Fafalios, U. Gadiraju, V. Iosifidis, W. Nejdl, M. E. Vidal, et al., Bias in data‐driven artificial intelligence systems—An introductory survey, Wiley Interdiscip. Rev.: Data Min. Knowl. Discovery, 10 (2020), e1356. https://doi.org/10.1002/widm.1356
    [13] T. Mao, L. Yu, Y. Zhang, L. Zhou, Modified Mahalanobis-Taguchi System based on proper orthogonal decomposition for high-dimensional-small-sample-size data classification, Math. Biosci. Eng., 18 (2020), 426–444. https://doi.org/10.3934/mbe.2021023
    [14] I. Izonin, R. Tkachenko, I. Dronyuk, P. Tkachenko, M. Gregus, M. Rashkevych, Predictive modeling based on small data in clinical medicine: RBF-based additive input-doubling method, Math. Biosci. Eng., 18 (2021), 2599–2613. https://doi.org/10.3934/mbe.2020392 doi: 10.3934/mbe.2021132
    [15] Y. Liu, Y. Zhou, X. Liu, F. Dong, C. Wang, Z. Wang, Wasserstein GAN-based small-sample augmentation for new-generation artificial intelligence: a case study of cancer-staging data in biology, Engineering, 5 (2019), 156–163. https://doi.org/10.1016/j.eng.2018.11.018
    [16] H. Han, M. Zhou, Y. Zhang, Can virtual samples solve small sample size problem of KISSME in pedestrian re-identification of smart transportation?, IEEE Trans. Intell. Transp. Syst., 21 (2020), 3766–3776. https://doi.org/10.1109/TITS.2019.2933509
    [17] Z. Liu, Y. Li, Small data-driven modeling of forming force in single point incremental forming using neural networks, Eng. Comput., 36 (2020), 1589–1597. https://doi.org/10.1007/s00366-019-00781-6
    [18] Q. X. Zhu, Z. S. Chen, X. H. Zhang, A. Rajabifard, Y. Xu, Y. Q. Chen, Dealing with small sample size problems in process industry using virtual sample generation: a Kriging-based approach, Soft Comput., 24 (2020), 6889–6902. https://doi.org/10.1007/s00500-019-04326-3
    [19] N. V. Chawla, K. W. Bowyer, L. O. Hall, W. P. Kegelmeyer, SMOTE: synthetic minority over-sampling technique, J. Artif. Intell. Res., 16 (2002), 321–357. https://doi.org/10.1613/jair.953
    [20] B. Efron, R. LePage, Introduction to Bootstrap, Wiley & Sons, New York, 1992.
    [21] S. Lee, A. Ahmad, G. Jeon, Combining bootstrap aggregation with support vector regression for small blood pressure measurement, J. Med. Syst., 42 (2018), 1–7. https://doi.org/10.1007/s10916-018-0913-x doi: 10.1007/s10916-017-0844-y
    [22] M. F. Ijaz, M. Attique, Y. Son, Data-driven cervical cancer prediction model with outlier detection and over-sampling methods, Sensors, 20 (2020), 2809. https://doi.org/10.3390/s20102809 doi: 10.3390/s20102809
    [23] M. La Rocca, C. Perna, Nonlinear autoregressive sieve bootstrap based on extreme learning machines, Math. Biosci. Eng., 17 (2020), 636–653. https://doi.org/10.3934/mbe.2020033 doi: 10.3934/mbe.2020033
    [24] S. Cho, M. Jang, S. Chang, Virtual sample generation using a population of networks, Neural Process Lett., 5 (1997), 21–27. https://doi.org/10.1023/A:1009653706403 doi: 10.1023/A:1009653706403
    [25] C. Huang, C. Moraga, A diffusion-neural-network for learning from small samples, Int. J. Approx. Reasoning, 35 (2004), 137–161. https://doi.org/10.1016/j.ijar.2003.06.001 doi: 10.1016/j.ijar.2003.06.001
    [26] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, et al., Generative adversarial nets, in Proceedings of the International Conference on Neural Information Processing Systems (NIPS), (2014), 2672–2680.
    [27] X. H. Zhang, Y. Xu, Y. L. He, Q. X. Zhu, Novel manifold learning based virtual sample generation for optimizing soft sensor with small data, ISA Trans., 109 (2021), 229–241. https://doi.org/10.1016/j.isatra.2020.10.006 doi: 10.1016/j.isatra.2020.10.006
    [28] D. C. Li, C. S. Wu, T. I. Tsai, Y. S. Lin, Using mega-trend-diffusion and artificial samples in small data set learning for early flexible manufacturing system scheduling knowledge, Comput. Oper. Res., 34 (2007), 966–982. https://doi.org/10.1016/j.cor.2005.05.019 doi: 10.1016/j.cor.2005.05.019
    [29] M. R. Rahimi, H. Karimi, F. Yousefi, Prediction of carbon dioxide diffusivity in biodegradable polymers using diffusion neural network, Heat Mass Transfer, 48 (2012), 1357–1365. https://doi.org/10.1007/s00231-012-0982-1 doi: 10.1007/s00231-012-0982-1
    [30] A. Majid, S. Ali, M. Iqbal, N. Kausar, Prediction of human breast and colon cancers from imbalanced data using nearest neighbor and support vector machines, Comput. Methods Programs Biomed., 113 (2014), 792–808. https://doi.org/10.1016/j.cmpb.2014.01.001 doi: 10.1016/j.cmpb.2014.01.001
    [31] B. Zhu, Z. Chen, L. Yu, A novel mega-trend-diffusion for small sample, CIESC J., 67 (2016), 820–826. https://doi.org/10.11949/j.issn.0438-1157.20151921 doi: 10.11949/j.issn.0438-1157.20151921
    [32] L. Yu, X. Zhang, Can small sample dataset be used for efficient internet loan credit risk assessment? Evidence from online peer to peer lending, Finance Res. Lett., 38 (2021), 101521. https://doi.org/10.1016/j.frl.2020.101521 doi: 10.1016/j.frl.2020.101521
    [33] J. Yang, X. Yu, Z. Q. Xie, J. P. Zhang, A novel virtual sample generation method based on Gaussian distribution, Knowl. Based. Syst., 24 (2011), 740–748. https://doi.org/10.1016/j.knosys.2010.12.010 doi: 10.1016/j.knosys.2010.12.010
    [34] K. Wang, J. Li, F. Tsung, Distribution inference from early-stage stationary data streams by transfer learning, IISE Trans., (2021), 1–25. https://doi.org/10.1080/24725854.2021.1875520 doi: 10.1080/24725854.2021.1875520
    [35] O. Troyanskaya, M. Cantor, G. Sherlock, P. Brown, T. Hastie, R. Tibshirani, et al., Missing value estimation methods for DNA microarrays, Bioinformatics, 17 (2001), 520–525. https://doi.org/10.1093/bioinformatics/17.6.520 doi: 10.1093/bioinformatics/17.6.520
    [36] G. E. Batista, M. C. Monard, An analysis of four missing data treatment methods for supervised learning, Appl. Artif. Intell., 17 (2003), 519–533. https://doi.org/10.1080/713827181 doi: 10.1080/713827181
    [37] D. V. Nguyen, N. Wang, R. J. Carroll, Evaluation of missing value estimation for microarray data, Data Sci. J., 2 (2004), 347–370. https://doi.org/10.6339/JDS.2004.02(4).170 doi: 10.6339/JDS.2004.02(4).170
    [38] A. Jadhav, D. Pramod, K. Ramanathan, Comparison of performance of data imputation methods for numeric dataset, Appl. Artif. Intell., 33 (2019), 913–933. https://doi.org/10.1080/08839514.2019.1637138 doi: 10.1080/08839514.2019.1637138
    [39] T. Cover, P. Hart, Nearest neighbor pattern classification, IEEE Trans. Inf. Theory, 13 (1967), 21–27. https://doi.org/10.1109/TIT.1967.1053964 doi: 10.1109/TIT.1967.1053964
    [40] G. H. Cha, Non-metric similarity ranking for image retrieval, in International Conference on Database and Expert Systems Applications: Springer, (2006), 853–862. https://doi.org/10.1007/11827405_83
  • © 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)