
Citation: Vladimir Donskoy. BOMD: Building Optimization Models from Data (Neural Networks based Approach)[J]. Quantitative Finance and Economics, 2019, 3(4): 608-623. doi: 10.3934/QFE.2019.4.608
This paper shows how models of optimal control or planning can be synthesized from samples (precedents). The proposed BOMD (Building Optimization Models from Data) approach is based on empirical induction and is aimed at obtaining regularities in the form of empirical optimization models synthesized in analytical form. We follow Kolmogorov's idea of regularity as non-randomness. This allows us to estimate the probability that a model is selected non-randomly from the set of admissible models consistent with the sample or with the given initial data. The proposed methods and algorithms can be applied to a wide range of intelligent control and planning tasks, in particular production planning and robotics. Although the control models derived from data are approximate, their application can be more successful than the use of less realistic models, inconsistent with the objects, that are chosen on the basis of subjective considerations.
Unfortunately, very few scientific papers are devoted to the theory of extracting optimization models from data. The main results in this direction are presented mainly in Mazurov (1971); Eremin and Mazurov (1979); Donskoy (1993, 2017, 2018). In broad terms, the problems considered in this paper belong to the scope of acquiring optimization models from data and decision making with incomplete initial information. Such problems were first investigated by Vl.D. Mazurov (1971) on the basis of pattern recognition theory, which is an essential element of the BOMD paradigm Donskoy (1993); Antonova (2011). In recent years, interest in the subject has increased due to the development of cognitive technologies and big data technology, especially in the interests of business Barton and Court (2013); MathWorks (2017). Previously, the author obtained results in the BOMD domain for the case of linear Donskoy (1993) and pseudo-Boolean models Donskoy (2018).
This paper continues the research within the paradigm of extracting, or building, optimization models from data for intelligent control systems. The results obtained concern nonlinear models with real variables of, generally speaking, arbitrary functional complexity, in the class of functions of an arbitrary degree of smoothness, with constraints represented by piecewise linear approximation. This is achieved through the use of neural networks as the main mathematical apparatus. If the learning sample simultaneously contains the values of the objective function and the values of the characteristic function of the constraints, an approach based on training two neural networks is proposed: NN1 for the synthesis of the objective function and NN2 for the synthesis of the approximating characteristic function of the constraints. Unfortunately, solving the problem defined by such a two-network synthesized model may, generally speaking, yield only a local conditional extremum. To find the global extremum of the multiextremal neural objective function, a heuristic algorithm based on a preliminary classification of the search area by a decision tree is developed. The approach presented in the paper to extracting a conditional optimization model from data in the case when there is no information on points not belonging to the set of admissible solutions is fundamentally novel. For this case, a heuristic algorithm for approximating the region of admissible solutions is developed, based on the allocation of regular (non-random) empty segments of the search area. When this approach is used in practice in intelligent control systems, it is necessary to additionally apply human-machine procedures for the verification and correction of the synthesized models.
Let $X^n = X_1 \times \cdots \times X_i \times \cdots \times X_n$ be a bounded region in $\mathbb{R}^n$, called the space of variables (features) of dimension $n$; $\tilde{x}=(x_1,\dots,x_i,\dots,x_n)$ is an arbitrary point of this space, which is a description of an admissible object.
Each coordinate $x_i$, $i=1,\dots,n$, of the object description $\tilde{x}$ belongs to a fixed bounded set of admissible values $X_i \subset \mathbb{R}$, $m_i \le x_i \le M_i$.
$T_{Opt}=\{(\tilde{a}_j,\gamma_j,y_j)\}_{j=1}^{l}$ is a reliable empirical sample for a special machine learning problem: model extraction based on partial information about an existing but unknown scalar criterion $F: X^n\to\mathbb{R}$ and unknown constraints that can formally be represented as $\Omega(\tilde{x})=1$.
In the problem under consideration, denoted below $Z_{\Omega,F}$, it is assumed that the set $X^n$ is divided into two classes: the class $L_1$, consisting of points that satisfy the system of constraints of the best-choice problem (termed admissible points), and the class $L_0$, containing the points that knowingly do not satisfy the system of constraints. We denote by $\Omega: X^n\to\{0;1\}$ the characteristic function of the constraints, which is partially defined by the learning sample; $\Omega(\tilde{x})=1 \Leftrightarrow \tilde{x}\in L_1$; $\Omega(\tilde{x})=0 \Leftrightarrow \tilde{x}\in L_0$. We assume that the standard learning sample $T_{Opt}$ contains reliable information: $\gamma_j=\Omega(\tilde{a}_j)$, $\gamma_j\in\{0;1\}$, $y_j=F(\tilde{a}_j)$, $\tilde{a}_j=(a_{j1},\dots,a_{jn})\in\mathbb{R}^n$.
In the learning process, we should build a rule (algorithm) that allows choosing a solution $\hat{\tilde{x}}^*$ that satisfies the constraints ($\Omega(\hat{\tilde{x}}^*)=1$) and for which the value $F(\hat{\tilde{x}}^*)$ is as large as possible (or as small as possible, depending on the assigned problem).
In the problems under consideration, the criterion $F$ and the constraints (the characteristic function $\Omega$) are not given exactly, neither analytically, nor as a complete table, nor by means of any formal system. They are "reflected" only by the values of the learning sample $T_{Opt}$ and are only partially defined.
The statement of the problem solved in this paper is as follows. Using the partial initial information $T_{Opt}$, it is required to choose a solution $\hat{\tilde{x}}^*$ that is as close as possible to the optimal solution $\tilde{x}^*$ determined by the unknown but truly existing objects $F$ and $\Omega$, which are approximated by neural networks*. If the scalar criterion and the characteristic function of the constraints are approximated independently by separate algorithms that compute approximations $\hat{F}$ and $\hat{\Omega}$ that are as accurate as possible in some sense, then the problem of finding the best solution restored from the learning sample (the extracted mathematical model) is as follows:
$$\max(\min)\ \hat{F}(\tilde{x}):\ \hat{\Omega}(\tilde{x})=1 \wedge \tilde{x}\in X^n.$$
* In some cases, classifying trees will be used to approximate the Ω domain.
A pair of functions $\langle\hat{F},\hat{\Omega}\rangle$ resulting from machine learning is called an empirical information model.
To facilitate understanding, the idea of the BOMD paradigm can be illustrated by the following simple scheme (Figure 1). In the process of observing the object and its environment, numerical data on its current characteristics (parameters, features), as well as the values characterizing the quality of the object's operation, must be recorded in the computer's memory. In addition, the facts of the object being in unacceptable (wrong) states are recorded. Such wrong object states are defined by expert observers. The data obtained are grouped into training samples that reflect the aggregate of valid and invalid object states and the quality values for each fixed set of parameters. Then, by performing empirical generalization using machine learning, the synthesis of the empirical information model is carried out.
We emphasize the main advantage of the BOMD paradigm: the empirical optimization model is derived from real data about the control object, rather than being subjectively proposed by some expert. In contrast, a subjective model that is not consistent with real information can lead to decisions that do not correspond to reality, even though they may be computed with mathematical precision.
Applications of BOMD technology are quite diverse. Mathematical optimization models are widely used in economics and engineering, because every production or technical system aims to maximize quality or profit (or to minimize costs).
Having an empirical optimization model, it is possible to implement a controller of any technical device. The control process is realized as shown in Figure 2. The main element of the controller is the optimization model synthesized from data. The coordinates of the vector $\hat{\tilde{x}}^*$ are the control actions $x_1,\dots,x_n$.
It should be noted that for some production systems it is extremely difficult, or even impossible, to write out a mathematical control model subjectively, for example, for multi-machine production with a large number of products and processes. BOMD technology may be successfully applied in such cases as well.
One example of the practical implementation of the proposed approach is presented below (see Section 5, Numerical experiment).
The use of BOMD technology involves a lot of preparatory work on collecting the initial information and forming the training samples. Despite the importance of these preparatory actions, their description is not the purpose of this article or the main problem on which it focuses. The principles of partitioning the initial data (the percentages used for training, validation, etc.) also remain outside the scope of this paper. These issues are widely covered in the literature, for example, Roh et al. (2019); Zhang et al. (2003).
We assume that, using the precedents describing the unknown objective function $F$, we synthesize an approximating neural network† NN1 implementing the function $\hat{F}_{NN1}(\tilde{x})\approx F(\tilde{x})$, i.e., $\hat{F}_{NN1}$ is an approximation of $F$. Using the precedents describing the constraints, a classifying neural network denoted NN2 is trained to approximate the characteristic function of the constraints $\Omega(\tilde{x})$ by $\hat{\Omega}_{NN2}(\tilde{x})$. For the convenience of the following calculations, we present a multilayer fully connected approximating neural network NN1 with an additional "hidden" layer numbered zero (Figure 3).
† Neural networks are now so widely used in various fields of economics and technology that one can even speak of an excessive "boom" in this direction. Therefore, it seems unnecessary to include a section on neural networks in this article. Readers unfamiliar with the subject can be referred to the survey Abiodun et al. (2018).
"Activation functions" of the layer 0 of the form φo(z)=z are added for the convenience computations.
The activation functions in all other layers will be considered logistic:
$$\varphi(z)=\frac{1}{1+e^{-z}};\qquad \varphi'(z)=\varphi(z)\bigl(1-\varphi(z)\bigr). \tag{1}$$
We will use the following notation:
$v_j$ – the weighted sum of all inputs of neuron $j$, called its induced local field Haykin (2008);
$l$ – the layer number, $0\le l\le L$; layer number 0 is a special input layer; the output layer has number $L$;
$m_1,m_2,\dots,m_{L-1}$ – the numbers of neurons in the hidden layers $1,2,\dots,L-1$;
$\tilde{x}=(x_1,\dots,x_i,\dots,x_n)$ – the input of the neural network;
$y_j=\varphi(v_j)$ – the output of neuron $j$; $Y=Y(\tilde{x})$ – the network output.
To find the gradient of the function implemented by the trained neural network shown in Figure 3, we use a recursive scheme that underpins the backpropagation learning algorithm Haykin (2008).
The local gradient of the output neuron is determined by the expression
$$\delta^{(L)}_L=\frac{\partial Y}{\partial v_L}=\varphi'(v_L)=\varphi(v_L)\bigl(1-\varphi(v_L)\bigr)=Y(1-Y), \tag{2}$$
where the superscript in parentheses indicates the network layer number.
The local gradient of neuron $j$ of the hidden layer with number $l$ is
$$\delta^{(l)}_j=\varphi'(v_j)\sum_k \delta^{(l+1)}_k\,\omega_{jk}=y^{(l)}_j\bigl(1-y^{(l)}_j\bigr)\sum_k \delta^{(l+1)}_k\,\omega_{jk}, \tag{3}$$
where the sum is taken over all neurons of the layer immediately following the layer containing neuron $j$.
$$\delta^{(1)}_j=\varphi'\Bigl(\sum_{i=1}^{n}x_i\,\omega_{ij}\Bigr)\sum_k \delta^{(2)}_k\,\omega_{jk}=y^{(1)}_j\bigl(1-y^{(1)}_j\bigr)\sum_k \delta^{(2)}_k\,\omega_{jk}. \tag{4}$$
$$\delta^{(0)}_i=\varphi_0'(x_i)\sum_k \delta^{(1)}_k\,\omega_{ik}=\sum_k \delta^{(1)}_k\,\omega_{ik}. \tag{5}$$
$$\frac{\partial Y}{\partial x_i}=\delta^{(0)}_i; \qquad \operatorname{grad} Y=\bigl(\delta^{(0)}_1,\dots,\delta^{(0)}_n\bigr). \tag{6}$$
As in backpropagation, for a given input $\tilde{x}$ the local fields and outputs of all neurons are first computed in the forward direction, from the input to the output of the network. Then, in the opposite direction, starting from equation (2), the local gradients are computed recursively by formula (3), and the calculation is completed with formulas (5) and (6).
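To make the recursion (2)-(6) concrete, the following minimal NumPy sketch computes the gradient of the network output with respect to the input by a forward pass followed by the backward recursion. The layer sizes, the random weights, and the function names are illustrative assumptions, not part of the paper's implementation.

```python
# A minimal NumPy sketch of formulas (2)-(6): the gradient of the network
# output Y with respect to the input x, computed by the same backward
# recursion that backpropagation uses. All sizes and weights are made up.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_wrt_input(weights, x):
    """weights[k] maps layer k to layer k+1; layer 0 is the identity input layer."""
    # Forward pass: store the outputs y^{(l)} of every layer.
    outputs = [np.asarray(x, dtype=float)]          # y^{(0)} = x (identity activations)
    for W in weights:
        outputs.append(sigmoid(W @ outputs[-1]))    # y^{(l)} = phi(W y^{(l-1)})

    # Backward pass: local gradients, formulas (2)-(5).
    Y = outputs[-1]
    delta = Y * (1.0 - Y)                           # formula (2), output layer
    for W, y in zip(reversed(weights[1:]), reversed(outputs[1:-1])):
        delta = y * (1.0 - y) * (W.T @ delta)       # formulas (3)-(4)
    delta0 = weights[0].T @ delta                   # formula (5): identity layer 0
    return delta0                                   # formula (6): grad Y

# Usage on a random 5-4-3-1 network (assumed sizes):
rng = np.random.default_rng(0)
weights = [rng.normal(size=s) for s in [(4, 5), (3, 4), (1, 3)]]
print(grad_wrt_input(weights, rng.normal(size=5)))
```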
The extremum of the empirical information model $\langle\hat{F}_{NN1},\hat{\Omega}_{NN2}\rangle$ can be found by the gradient Algorithm 1 presented below. In general, since the approximation $\hat{F}_{NN1}$ may be multiextremal, this algorithm will find only a local extremum.
Algorithm 1. Search for a conditional (local) minimum of $\hat{F}_{NN1}$ subject to $\hat{\Omega}_{NN2}(\tilde{x})=1$.
Input: the learning sample $T_{Opt}=\{(\tilde{a}_j,\gamma_j,y_j)\}_{j=1}^{l}$ and the neural approximations $\hat{F}_{NN1}$ and $\hat{\Omega}_{NN2}$.
Output: $\tilde{x}^*$ such that $\hat{F}_{NN1}(\tilde{x}^*)\approx\min\hat{F}_{NN1}(\tilde{x})$ subject to $\hat{\Omega}_{NN2}(\tilde{x})=1$ – the extremum point of the empirical information model – and the value $y^*$ of the function $\hat{F}_{NN1}$ at this point.
1: Take from the training sample the point $\tilde{x}_0=\tilde{a}_{j^*}$ with $y_{j^*}=\min_j y_j$ as an initial approximation and compute $\hat{F}_{NN1}(\tilde{x}_0)$.
2: Choose the initial learning rate $\eta_0$ and the approximation accuracy $\varepsilon$.
3: Choose a learning-rate reduction coefficient $\delta$: $0.8<\delta<1$.
4: for $t:=1,2,3,\dots$ do
5: $\tilde{x}_t:=\tilde{x}_{t-1}-\eta_{t-1}\,\operatorname{grad}\hat{F}_{NN1}(\tilde{x}_{t-1})$;
6: $\eta_t:=\eta_{t-1}\cdot\delta$;
7: if $\hat{\Omega}_{NN2}(\tilde{x}_t)=0$ then
8: { if $\rho(\tilde{x}_t,\tilde{x}_{t-1})<\varepsilon$ then goto 12 else $\eta_t:=\eta_{t-1}\cdot\delta$ }
9: else
10: if $|\hat{F}_{NN1}(\tilde{x}_t)-\hat{F}_{NN1}(\tilde{x}_{t-1})|<\varepsilon$ then goto 13;
11: end for;
12: $\tilde{x}^*:=\tilde{x}_{t-1}$; $y^*:=\hat{F}_{NN1}(\tilde{x}_{t-1})$; stop.
13: $\tilde{x}^*:=\tilde{x}_t$; $y^*:=\hat{F}_{NN1}(\tilde{x}_t)$; stop.
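For illustration, here is a minimal Python sketch of Algorithm 1. It assumes that f_nn1(x) returns the scalar output of NN1, grad_f_nn1(x) its gradient with respect to x (for example, via the recursion (2)-(6)), and omega_nn2(x) the 0/1 output of NN2; the parameter defaults and the handling of an infeasible step (retrying from the last feasible point with a reduced rate) are one reasonable reading of steps 7-8, not a prescription of the paper.

```python
import numpy as np

def local_conditional_minimum(x0, f_nn1, grad_f_nn1, omega_nn2,
                              eta0=0.5, delta=0.9, eps=1e-4, max_iter=1000):
    x_prev = np.asarray(x0, dtype=float)
    eta = eta0
    for _ in range(max_iter):
        x_new = x_prev - eta * grad_f_nn1(x_prev)        # step 5: gradient step
        eta *= delta                                     # step 6: reduce the rate
        if omega_nn2(x_new) == 0:                        # step 7: the step left the feasible set
            if np.linalg.norm(x_new - x_prev) < eps:
                break                                    # step 12: keep the last feasible point
            eta *= delta                                 # step 8: reduce further and retry
            continue
        if abs(f_nn1(x_new) - f_nn1(x_prev)) < eps:      # step 10: objective change small enough
            return x_new, f_nn1(x_new)                   # step 13
        x_prev = x_new
    return x_prev, f_nn1(x_prev)                         # step 12 / iteration limit
```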
A search for the global extremum of the problem under consideration is based on repeated application of Algorithm 1 for finding a local extremum, starting from different initial points of the domain of admissible solutions.
The first, traditional approach, based on genetic algorithms, which are described in an extensive scientific and technical literature El-Sawi et al. (2014), is not adopted for the problem. A genetic algorithm is a heuristic search algorithm that solves optimization problems by random selection, combination, and variation of the desired parameters using mechanisms resembling natural selection Malhotra et al. (2011). Genetic algorithms belong to the class of evolutionary algorithms Corne and Lones (2018), which are based on the principles of population genetics. To approximately find the global extremum of the model $\langle\hat{F}_{NN1},\hat{\Omega}_{NN2}\rangle$, a different approach, described below, can be used; it makes better use of the regularities reflected in the training sample and does not rely on random selection, combination, and variation of local solutions.
This second, new approach is based on a preliminary clustering of the points of the sample $T_{Opt}=\{(\tilde{a}_j,\gamma_j,y_j)\}_{j=1}^{l}$ by the values of the objective function $y_j$, $j=\overline{1,l}$. For this purpose, an algorithm that builds a decision tree defining a partition of the region of admissible values of the variables into hyperparallelepipeds is used Breiman et al. (1984). In the region corresponding to each such hyperparallelepiped, the values of the objective function $y_j$ belong to a fixed half-interval. The number of such half-intervals is the number of classes in the preliminary clustering problem. After the classifying tree is constructed, each class receives a logical description in terms of threshold predicates of the form $[x_i\le b]$, where $b$ is a real constant and $i\in\{1,\dots,n\}$.
It is necessary to determine two constants $m$ and $M$ such that $m<y_j<M$ for all $j=\overline{1,l}$, based on considerations of the problem domain. Then the segment $[m,M]$ is divided into $k$ equal half-intervals $[m,m+\Delta),[m+\Delta,m+2\Delta),\dots,[m+(k-1)\Delta,M]$, where $\Delta=(M-m)/k$. If a point $\tilde{a}_j$ of the learning sample falls into the half-interval with number $q$, $q=\overline{1,k}$, it is considered to belong to the class $K_q$. Thus, in each of the obtained classes, the values of the objective function at its points differ by no more than $\Delta$.
Denote by $\bar{y}_q$ the mid-value of the $q$-th interval, $\bar{y}_q=m+q\Delta-\Delta/2$. Each terminal vertex of the tree (leaf) contains the value $\bar{y}_q$ corresponding to its class number $q$ and a special label (flag) used to mark the leaves already viewed during the search for the global extremum.
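The preliminary clustering step can be sketched as follows, with scikit-learn's DecisionTreeClassifier standing in for the CART procedure of Breiman et al. (1984); the constants m, M, k and the tree hyperparameters are illustrative assumptions.

```python
# A minimal sketch: objective values y_j are binned into k equal half-intervals
# of [m, M], and a classification tree over the features assigns each region a
# class K_q; the tree leaves play the role of the hyperparallelepipeds.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def cluster_by_objective(A, y, m, M, k):
    """A: (l, n) array of sample points; y: (l,) objective values."""
    delta = (M - m) / k
    q = np.clip(((y - m) // delta).astype(int) + 1, 1, k)   # class numbers 1..k
    y_bar = m + q * delta - delta / 2                        # mid-values \bar{y}_q
    tree = DecisionTreeClassifier(criterion="gini", min_samples_leaf=2)
    tree.fit(A, q)
    return tree, q, y_bar
```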
Algorithm 2. Heuristic search for a conditional global minimum of the empirical information model $\langle\hat{F}_{NN1},\hat{\Omega}_{NN2}\rangle$.
Input: the learning sample $T_{Opt}=\{(\tilde{a}_j,\gamma_j,y_j)\}_{j=1}^{l}$; the neural approximations $\hat{F}_{NN1}$ and $\hat{\Omega}_{NN2}$; Algorithm 1 for finding a local minimum, used as an internal procedure; the tree clustering the sample points by the values of the objective function, used as a second internal procedure; the maximum allowed number of local search runs $S$.
Output: $\tilde{x}^*$ such that $\hat{F}_{NN1}(\tilde{x}^*)\approx\min\hat{F}_{NN1}(\tilde{x})$ subject to $\hat{\Omega}_{NN2}(\tilde{x})=1$ – the intended global extremum point of the empirical information model – and the value $y^*$ of the function $\hat{F}_{NN1}$ at this point.
1: Clear the labels in all leaves of the tree (all labels get null values).
2: Take the point $\tilde{x}_0=\tilde{a}_{j^*}$ with $y_{j^*}=\min_j y_j$ from the training sample as the initial value and execute Algorithm 1.
3: Remember the found point as $\tilde{x}^*$ and the value of the local minimum at it in the variable $y^*$.
4: Mark the leaf of the tree into which the point $\tilde{x}^*$ falls (set its label to one).
5: Select the unmarked leaf of the tree with the lowest value $\bar{y}_q$ and, within it, the training sample point with the smallest value of the objective function.
6: Apply Algorithm 1 to the selected point to obtain the extremum $y^\diamond$ at the point $\tilde{x}^\diamond$.
7: Mark the selected leaf as viewed (set its label to one).
8: if $y^\diamond<y^*$ then $\{y^*:=y^\diamond;\ \tilde{x}^*:=\tilde{x}^\diamond\}$;
9: If unmarked leaves of the tree still exist and the number of leaf viewings (applications of Algorithm 1) has not exceeded the given number $S$, then goto 5;
10: End of search.
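A compact sketch of Algorithm 2 is given below. It reuses local_conditional_minimum from the sketch after Algorithm 1 and assumes that the tree leaves are available as pairs (mid-value, indices of the sample points in the leaf); the marking of visited leaves is simplified here to a single pass over the leaves in order of increasing mid-value.

```python
import numpy as np

def global_conditional_minimum(A, y, leaves, f_nn1, grad_f_nn1, omega_nn2, S=20):
    """A: (l, n) sample points; y: (l,) objective values; leaves: list of (y_bar, idx)."""
    # Steps 2-3: initial local search from the sample point with the smallest y_j.
    x_best, y_best = local_conditional_minimum(A[np.argmin(y)], f_nn1,
                                               grad_f_nn1, omega_nn2)
    # Steps 5-9: visit leaves by increasing mid-value, at most S local searches.
    for y_bar, idx in sorted(leaves, key=lambda leaf: leaf[0])[:S]:
        start = A[min(idx, key=lambda j: y[j])]          # smallest objective value in the leaf
        x_loc, y_loc = local_conditional_minimum(start, f_nn1, grad_f_nn1, omega_nn2)
        if y_loc < y_best:                               # step 8: keep the best extremum found
            x_best, y_best = x_loc, y_loc
    return x_best, y_best
```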
The substantiation for using the classifying algorithm in the global extremum search of Algorithm 2 is as follows.
1. The number $l$ of points in the training sample is many times larger than the number of leaves in the classification tree. Therefore, if instead of iterating over all $l$ points to initialize and run the local search we use the classifying algorithm described above, the local search is repeated no more than $\mu\ll l$ times, each time with a starting point taken from the segment corresponding to the selected tree leaf.
2. The leaves of the tree correspond to different segments‡ of the global search area that are remote from each other, from which the initial points for finding the extremum are chosen, whereas other ways of choosing points from the whole training sample to initialize the local search (for example, at random) may result in repeated calculations within a sufficiently narrow neighborhood of the same local minimum.
‡ The distance between two regions $W_1$ and $W_2$ is understood as
$$\inf_{\tilde{x}\in W_1,\ \tilde{y}\in W_2}\rho(\tilde{x},\tilde{y}),$$
where $\rho$ is the Euclidean distance between points in $\mathbb{R}^n$.
Collecting the initial training information for control problems in large systems is no less difficult than developing the necessary software, especially since in recent years many systems and programming environments have been equipped with powerful libraries that implement various methods of machine learning and decision making.
It is particularly difficult to obtain data on points, or states, of the optimized or controlled object that are not admissible, i.e., that do not satisfy the system of constraints. In our case, this is the part of the standard learning information $T_{Opt}=\{(\tilde{a}_j,\gamma_j,y_j)\}_{j=1}^{l}$ consisting of points $\tilde{a}_j$ for which $\gamma_j=\Omega(\tilde{a}_j)=0$. If data about the optimized object are accumulated in the process of its functioning, usually only admissible states of its operation are observed; the other states are understood as "breakdown" states and are not admissible. The occurrence of such a state can be considered a rare event.
The above considerations make it expedient to consider the case when the training information has the form $T^{-}_{Opt}=\{(\tilde{a}_j,y_j)\}_{j=1}^{l}$ and contains only admissible points, for which $\gamma_j=\Omega(\tilde{a}_j)=1$.
The idea of the approach to constructing the constraints for such a case is to allocate, in the global extremum search area, so-called empty segments that contain no sample points from $T^{-}_{Opt}$ belonging to the region of admissible solutions. Figure 4 explains this approach: search areas are shown schematically, in which asterisks denote admissible points and empty segments are denoted $E_1,\dots,E_6$.
Recall that in this paper we consider regular processes and objects. In this case we do not assume any probability distributions, but it is still possible to assess the selected empty segments on the basis of the Kolmogorov approach, which treats regularity as non-randomness.
A. N. Kolmogorov emphasized the need to distinguish between actual randomness as the absence of regularity and stochastic randomness as the subject of probability theory (Kolmogorov, 1991, p. 42). In the empirical extraction of regularities, the Kolmogorov approach allows us to estimate the non-randomness of the presence of an empty segment.
The absence of regularity in the arrangement of points in the region of possible values of the variables, the hyperparallelepiped
$$\prod_{i=1}^{n}[m_i,M_i], \qquad m_i\le x_i\le M_i, \tag{7}$$
of volume $V_0$, takes place when the points are distributed uniformly, randomly, and independently. But if, for example, the projection of the points $\tilde{a}_j$, $j=\overline{1,l}$, of the training sample onto some coordinate $x_i$ is represented by the histogram shown in Figure 5, then a regularity in the data is evident, and it has a description in the form of the predicate $[x_i\ge b]$.
If we consider a one-dimensional random variable uniformly distributed on the segment $[m_i,M_i]$, with independent realizations being hits in this segment, the probability of falling into $[m_i,b]$ is estimated geometrically as
$$p=\frac{b-m_i}{M_i-m_i}, \tag{8}$$
and the probability of not falling into $[m_i,b]$ in $l$ independent trials is $(1-p)^l$. This value is the probability that the specified interval turns out to be empty by chance after $l$ trials. Accordingly, with probability $1-(1-p)^l$ this interval is empty not by chance; in other words, the probability that $[x_i\ge b]$ is a regularity, given $l$ sample trials, is $1-(1-p)^l$.
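As a small numeric illustration of formula (8) and the estimate $1-(1-p)^l$ (the bounds, threshold, and sample size below are made-up values):

```python
# Illustration of formula (8) and the non-randomness estimate 1 - (1 - p)^l.
m_i, M_i, b, l = 0.0, 1.0, 0.3, 17
p = (b - m_i) / (M_i - m_i)      # probability of a single uniform hit in [m_i, b]
print(1 - (1 - p) ** l)          # ~0.998: the empty interval is almost surely a regularity
```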
To approximate the domain of admissible solutions $\Omega$, we use a classifying tree with threshold predicates of the form $[x_i\ge b]$ at the vertices Loh (2014), constructed from precedents representing only points of this domain $\Omega$. The classifying tree that allocates empty segments is shown in Figure 4 to illustrate the idea.
The main element of the synthesis of such a tree is the choice of the threshold value $b$ for each branching (splitting of the current region $G_\nu$) and the construction of the next internal or terminal vertex of the tree. Algorithm 3 below is the main procedure supporting the synthesis of the tree approximating the domain $\Omega$.
Algorithm 3. Checking the possibility of allocating an empty segment and selecting a predicate of the form $[x_i\ge b]$ (or $[x_i\le b]$) for splitting the current region $G_\nu$ and building the next vertex $\nu$ of the classification tree.
Input: the study area – a hyperparallelepiped $G_\nu$; $\Delta_1$ – the minimum permitted width of the projection of an empty region.
Output: the value of the pointer Flag; if Flag $=1$, then an empty segment can be allocated and a conditional predicate is constructed for the vertex; if Flag $=0$, the region $G_\nu$ cannot be split, and the instruction is issued to build a terminal vertex marked as an admissible segment.
1: Select all sample points that lie in the region $G_\nu$.
2: Flag := 0;
3: Cycle through all coordinates of the points in the region $G_\nu$;
4: if Flag $=1$ then goto 14;
5: Find the average distance $\Delta_2$ between the projections of the points;
6: Find the minimum $\mu$ and maximum $M$ values of the current coordinate;
7: if $r_1=\mu-m_i>\Delta_1\ \vee\ r_2=M_i-M>\Delta_1$ then
8: { Flag := 1;
9: if $r_1\ge r_2$ then
10: { $b:=\mu-\Delta_2$; add a vertex with the predicate $[x_i\le b]$ }
11: else
12: { $b:=M+\Delta_2$; add a vertex with the predicate $[x_i\ge b]$ };
}
13: End of the cycle;
14: if Flag $=0$ then add a leaf marked as a region of admissible solutions.
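A minimal Python sketch of the split step of Algorithm 3 is given below; the region is assumed to be given by per-coordinate bounds lo and hi (the bounds of $G_\nu$ along each coordinate), and all names are illustrative.

```python
# For a region [lo, hi] and the sample points inside it, look for a coordinate
# whose projection leaves an empty margin wider than delta1 near one of the
# bounds, and return the corresponding threshold predicate.
import numpy as np

def try_split(points, lo, hi, delta1):
    """points: (k, n) array of sample points inside the region [lo, hi]."""
    for i in range(points.shape[1]):
        proj = np.sort(points[:, i])
        delta2 = np.mean(np.diff(proj)) if len(proj) > 1 else 0.0  # average gap, Δ2
        mu, M = proj[0], proj[-1]                                  # min / max of the coordinate
        r1, r2 = mu - lo[i], hi[i] - M                             # empty margins at both ends
        if r1 > delta1 or r2 > delta1:                             # wide enough empty segment
            if r1 >= r2:
                return i, mu - delta2, "<="    # predicate [x_i <= b]: the empty part lies below b
            return i, M + delta2, ">="         # predicate [x_i >= b]: the empty part lies above b
    return None                                # no split possible: leaf = admissible region
```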
Each terminal vertex $\tau$ marked $E_\tau$ as an "empty" region $G_\tau$ is associated with the probability $P(E_\tau)$, which estimates the non-randomness of its detection or, in other words, the probability that the detected empty region $G_\tau$ is a regularity:
$$P(E_\tau)=1-(1-p_\tau)^l, \qquad p_\tau=V(G_\tau)/V_0, \tag{9}$$
where $V(G_\tau)$ is the volume of the empty region $G_\tau$. This volume is easy to compute by a reverse pass along the branches of the tree from the terminal vertex $\tau$ to the root, "reading" all the threshold predicates at the vertices of the traversed branches and thereby forming a description of the hyperparallelepiped $G_\tau$.
To illustrate the approach to the synthesis of a nonlinear neural optimization model, a sample of 17 manufacturing enterprises was used (Table 1; cost indicators are given per month). The observations in the table correspond only to admissible points ($\Omega(\tilde{x})=1$).
Number | x1 | x2 | x3 | x4 | x5 | y |
1 | 1404.5 | 345.9 | 170 | 0.313 | 0.061 | 112.5 |
2 | 1709.8 | 431.9 | 225 | 0.285 | 0.067 | 113.7 |
3 | 1808.7 | 886.2 | 238 | 0.398 | 0.089 | 193.0 |
4 | 1437.1 | 484.2 | 181 | 0.322 | 0.076 | 125.0 |
5 | 1496.1 | 724.6 | 177 | 0.367 | 0.085 | 173.4 |
6 | 1034.3 | 200.7 | 179 | 0.206 | 0.039 | 81.4 |
7 | 1335.0 | 317.6 | 162 | 0.314 | 0.049 | 106.4 |
8 | 1256.1 | 156.1 | 159 | 0.187 | 0.038 | 72.6 |
9 | 1581.4 | 364.3 | 255 | 0.319 | 0.054 | 110.7 |
10 | 1826.5 | 554.2 | 275 | 0.338 | 0.083 | 146.3 |
11 | 1697.7 | 387.7 | 251 | 0.219 | 0.064 | 112.9 |
12 | 1294.6 | 302.5 | 154 | 0.214 | 0.044 | 105.9 |
13 | 1174.7 | 483.9 | 215 | 0.324 | 0.075 | 134.5 |
14 | 1180.9 | 220.1 | 165 | 0.217 | 0.039 | 91.4 |
15 | 1319.0 | 243.6 | 156 | 0.207 | 0.040 | 98.4 |
16 | 1460.0 | 347.3 | 202 | 0.316 | 0.053 | 107.6 |
17 | 1478.3 | 313.5 | 186 | 0.211 | 0.044 | 102.3 |
The following 5 features (factors) characterizing the enterprises are used:
x1 - gross output in value terms (thousand dollars);
x2 - cost of fixed assets (thousand dollars);
x3 - number of employees;
x4 - specialization ratio§;
§ the ratio of the cost of finished products of the profile direction to the cost of all finished products (per month), multiplied by 100.
x5 - marketing mix ratio¶.
¶ a linear convolution of the coefficients of the main marketing indicators.
The target attribute y is the profit (thousand dollars).
A neural network with only one hidden layer was used (Figure 6).
It is known that such networks have sufficiently strong function-approximation capabilities when trained for regression Hornik et al. (1990).
The sigmoid activation function
$$\varphi(z)=\frac{1}{1.0+\exp(-1.0351\cdot z)} \tag{10}$$
was used. The error minimized during neural network training is
$$E(\tilde{w},\tilde{V})=\frac{1}{2}\sum_{j=1}^{l}\bigl(f_{NN}(X^{sample}_j)-y^{sample}_j\bigr)^2; \tag{11}$$
$$f_{NN}(x_1,\dots,x_n)=-V_0+\sum_{q=1}^{m}V_q\cdot\varphi\Bigl(-w_{0,q}+\sum_{i=1}^{n}w_{i,q}x_i\Bigr) \tag{12}$$
is the output of the neural network on the $j$-th sample row; $\tilde{w}=\{w_{i,q},\ i=\overline{0,n};\ q=\overline{1,m}\}$ are the input weights of the network; $X^{sample}_j$ is the $j$-th row of the training table; $V_q$, $q=\overline{0,m}$, are the output weights of the network; $y^{sample}_j$ is the profit value in row $j$ of the sample.
The following formulas were used:
$$\eta_k=\varphi\Bigl(-w_{0,k}+\sum_{i=1}^{n}w_{i,k}x_i\Bigr);\qquad \frac{\partial f_{NN}}{\partial w_{i,q}}=V_q\,x_i\,\eta_q(1-\eta_q),\quad i=\overline{1,n};\ q=\overline{1,m}; \tag{13}$$
$$\frac{\partial f_{NN}}{\partial w_{0,q}}=-V_q\,\eta_q(1-\eta_q),\quad q=\overline{1,m};\qquad \frac{\partial f_{NN}}{\partial V_k}=\eta_k,\quad k=\overline{1,m};\qquad \frac{\partial f_{NN}}{\partial V_0}=-1; \tag{14}$$
$$\frac{\partial E(\tilde{w},\tilde{V})}{\partial w_{i,q}}=\bigl(f_{NN}(X^{sample}_j)-y^{sample}_j\bigr)\frac{\partial f_{NN}}{\partial w_{i,q}};\qquad \frac{\partial E(\tilde{w},\tilde{V})}{\partial V_k}=\bigl(f_{NN}(X^{sample}_j)-y^{sample}_j\bigr)\eta_k. \tag{15}$$
Well-known gradient descent formulas were used in the training:
$$w^{t}_{i,q}:=w^{t-1}_{i,q}-\gamma_t\frac{\partial E(\tilde{w},\tilde{V})}{\partial w_{i,q}};\qquad V^{t}_{q}:=V^{t-1}_{q}-\gamma_t\frac{\partial E(\tilde{w},\tilde{V})}{\partial V_q}, \tag{16}$$
where $\gamma_t$, $t=1,2,\dots$, is a learning rate that decreases as $t$ grows.
Over 94 epochs of training, the mean error $\frac{1}{l}\sum_{j=1}^{17}\bigl(y^{NN}_j-y^{sample}_j\bigr)^2$ decreased monotonically from 1.0878 to 0.0277.
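The training loop can be sketched in NumPy as follows, with the per-sample gradient steps (13)-(16); the number of hidden neurons, the learning-rate schedule, the initialization, and the epoch count are illustrative assumptions, and in practice the features and target of Table 1 would typically be rescaled before training to avoid saturating the sigmoid.

```python
# A minimal sketch of the single-hidden-layer model (12) trained with the
# per-sample gradient steps (13)-(16). All hyperparameters are made up.
import numpy as np

def phi(z):
    return 1.0 / (1.0 + np.exp(-1.0351 * z))                # activation (10)

def train(X, y, m=5, epochs=94, gamma0=0.01, seed=0):
    l, n = X.shape
    rng = np.random.default_rng(seed)
    w0, W = rng.normal(size=m), rng.normal(size=(n, m))     # hidden biases and weights
    V0, V = rng.normal(), rng.normal(size=m)                # output bias and weights
    for t in range(1, epochs + 1):
        gamma = gamma0 / t                                  # decreasing learning rate
        for j in range(l):
            eta = phi(-w0 + X[j] @ W)                       # hidden outputs, formula (13)
            err = (-V0 + V @ eta) - y[j]                    # f_NN(X_j) - y_j, formula (12)
            # Gradient steps, formulas (13)-(16):
            W -= gamma * err * np.outer(X[j], V * eta * (1 - eta))
            w0 -= gamma * err * (-V * eta * (1 - eta))
            V -= gamma * err * eta
            V0 -= gamma * err * (-1.0)
    return w0, W, V0, V
```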
After the neural network was trained, the problem of finding the maximum of the function realized by this network was solved under the following restriction: the feature values were only allowed within the bounds
$$0.9\min_{j=\overline{1,17}}x^{sample}_{j,i}\le x_i\le 1.1\max_{j=\overline{1,17}}x^{sample}_{j,i}. \tag{17}$$
The following formulas were used for the gradient ascent:
$$\frac{\partial f_{NN}}{\partial x_i}=\sum_{q=1}^{m}V_q\,\eta_q(1-\eta_q)\,w_{i,q};\qquad x^{t}_i:=x^{t-1}_i+\gamma_t\,\Delta y_t\,\frac{\partial f_{NN}}{\partial x_i}, \tag{18}$$
where $\Delta y_t$ is the increment, at the current step, of the function defined by the constructed neural network.
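The gradient ascent (18) under the box constraints (17) can be sketched as follows, reusing phi and the trained weights from the training sketch above; the starting point, the clipping to the box, the step schedule, and the treatment of $\Delta y_t$ (taken as the increment observed at the previous step) are assumptions of this sketch.

```python
import numpy as np

def maximize(X, w0, W, V0, V, steps=250, gamma0=0.05):
    lo, hi = 0.9 * X.min(axis=0), 1.1 * X.max(axis=0)    # box constraints (17)

    def f(x):                                            # network output, formula (12)
        return -V0 + V @ phi(-w0 + x @ W)

    x = X[np.argmax([f(row) for row in X])].copy()       # start from the best sample point
    delta_y = 1.0                                        # stand-in for the first increment
    for t in range(1, steps + 1):
        eta = phi(-w0 + x @ W)
        grad = W @ (V * eta * (1 - eta))                 # gradient of f_NN w.r.t. x, formula (18)
        x_new = np.clip(x + (gamma0 / t) * delta_y * grad, lo, hi)
        delta_y = abs(f(x_new) - f(x))                   # increment of f at this step
        x = x_new
    return x, f(x)
```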
Exactly 250 optimization steps were performed. As a result, the maximum $Y=195.54$ was found at the variable values $x_1=1096.56$, $x_2=934.67$, $x_3=199.94$, $x_4=0.392$, $x_5=0.0371$. Note that the maximum profit value ($\max_{j=\overline{1,17}}y^{sample}_j$) in the sample equals 193.0.
If the decision maker sets the parameters to the optimal values found, profit will increase; this is the managerial implication.
The scope of the model presented in the numerical example is quite wide. Such a model can be synthesized to find the parameter values of any manufacturing process that maximize profit, or the values of technical parameters of complex engineering systems. The use of a neural network makes it possible to learn, generally speaking, any nonlinear dependence reflected in the empirical samples.
The use of neural networks for the synthesis of objective functions makes it possible to implement complex dependencies on a large number of variables, but it is difficult to find out how different values of a set of independent variables affect a particular dependent variable under specific conditions. Therefore, sensitivity analysis should be considered an additional task. To solve it, one can compute partial derivatives with respect to the independent variables. Also promising for this purpose are methods of transforming a trained neural network into a system of logical rules, for example, represented by a decision tree Boz (2002). However, sensitivity analysis issues are beyond the scope of this article.
In conclusion, the following should be noted.
BOMD technology is an adequate tool for solving many problems of intelligent management in economics and engineering. Although the synthesis of optimization models from data and decision making based on these models can involve significant errors, these errors can be much smaller than the errors of subjective models, since the decisions are consistent with real data. The objectivity of decision making in this sense is crucial.
In this paper, we considered the synthesis of nonlinear models from data and used neural networks for this purpose. The synthesis of linear models was studied in detail by the author in previous papers listed in the references. We emphasize that the use of nonlinear models is advisable only when a test of the admissibility of applying linear models gives a negative result Donskoy (2012).
The author declares no conflicts of interest in this paper.
[1] | Abiodun OI, Jantan A, Omolara AE, et al. (2018) State-of-the-art in artificial neural network applications: A survey. Heliyon 4: 1-7. |
[2] | Antonova GM (2011) Application of Pattern Recognition Methods to Solve Optimization Problems Using Imitation Models. Pattern Recognit Image Anal 21: 113-116. doi: 10.1134/S1054661811020076 |
[3] | Barton D, Court D (2013) Three Keys to Building a Data-Driven Strategy. Available from: https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/three-keys-to-building-a-data-driven-strategy. |
[4] | Boz O (2002) Converting A Trained Neural Network To a Decision Tree DecText - Decision Tree Extractor. ICMLA 2002: Las Vegas, Nevada, USA 4: 110-116. |
[5] | Breiman L, Friedman JH, Olshen R, et al. (1984) Classification and Regression Trees, Chapman and Hall, New York, NY. |
[6] | Corne D, Lones MA (2018) Evolutionary Algorithms, In: Marti R, Panos P, and Resende M. (eds) Handbook of Heuristics, Springer, Cham. |
[7] | Donskoy VI (2018) A Synthesis of Pseudo-Boolean Empirical Models by Precedential Information. Bulletin of the South Ural State University, Series: Mathematical Modelling, Programming and Computer Software 11: 96-107. |
[8] | Donskoy VI (2017) Extraction of Optimization Models from Data: a Decision Tree and Forest based Approach. Taurida J Comput Sci Theory Math 4: 59-86. |
[9] | Donskoy VI (1993) Partially Defined Optimization Problems: An Approach to a Solution that is based on Pattern Recognition Theory. J Sov Math 65: 1664-1668. doi: 10.1007/BF01097516 |
[10] | Donskoy VI (2012) Synthesis of coordinated linear optimization models according to precedential information: an approach based on Kolmogorov complexity. Taurida J Comput Sci Theory Math 1:13-25. |
[11] | El-Sawi AA, Hussein MA, Zaki EM, et al. (2014) An Introduction to Genetic Algorithms: A survey. A practical Issues. Int J Sci Eng Res 5: 252-262. |
[12] | Eremin II, Mazurov VlD (1979) Unsteady processes of mathematical programming, Nauka, Moscow. |
[13] | Haykin S (2008) Neural Networks and Learning Machines, Prentice Hall. |
[14] | Hornik K, Stinchcombe M, White H (1990) Universal Approximation of an Unknown Mapping and Its Derivatives Using Multilayer Feedforward Networks. Neural Networks 3: 551-560. doi: 10.1016/0893-6080(90)90005-6 |
[15] | Kolmogorov AN (1991) Algorithm, information, complexity, Znanie, Moscow. |
[16] | Loh WY (2014) Fifty Years of Classification and Regression Trees. Int Stat Rev 82: 329-348. doi: 10.1111/insr.12016 |
[17] | MathWorks (2017) Building Models from Data and Scientific Principles. Available from: https://www.mathworks.com/solutions/mathematical-modeling/building-models-data-scientificprinciples.html. |
[18] | Mazurov VlD (1971) Application of Methods of Theory of Pattern Recognition in the Optimal Planning and Management. Proceeding of I-st all-Union Conference on Optimal Planning and National Economy Management. Moscow. |
[19] | Malhotra R, Singh N, Singh Y (2011) Genetic Algorithms: Concepts, Design for Optimization of Process Controllers. Comput Inf Sci 4: 39-54. |
[20] | Roh Y, Heo G, Whang SE (2019) A Survey on Data Collection for Machine Learning. arXiv: 1811.03402v2 [cs.LG]. |
[21] | Zhang S, Zhang C, Yang Q (2003) Data Preparation for Data Mining. Appl Artif Intell 17: 375-381. doi: 10.1080/713827180 |