1. Introduction
Nonsmooth optimization problems arise in many fields of application, for example, in economics [1] and mechanical engineering [2], and they often have several objectives. During the last three decades, the areas of nonsmooth and multiobjective optimization have developed rapidly [3,4,5,6]. Multiobjective optimization (also known as multicriteria optimization) is an area of multiple criteria decision making concerned with mathematical optimization problems involving more than one objective function to be optimized simultaneously. It has been applied in many fields of science, including engineering, economics and logistics, where optimal decisions must be taken in the presence of trade-offs between two or more conflicting objectives. However, methods for nonsmooth multiobjective optimization are much scarcer, and there is an increasing demand for solving optimization problems with several nonsmooth objective functions efficiently.
Quantum computing and quantum computers play an important role in optimization problems [7,8]. Han and Kim [7] proposed a computing method called the genetic quantum algorithm (GQA), which is based on concepts and principles of quantum computing such as qubits and the superposition of states. The effectiveness and applicability of the GQA are demonstrated by experimental results on the knapsack problem, a well-known combinatorial optimization problem.
Bundle methods are among the most successful methods for nonsmooth convex optimization problems [9,10,11], but little systematic work has been done on extending convex bundle methods to the nonconvex case. Most nonconvex bundle methods [12,13,14,15,16,17,18,19] are of the proximal type, and they follow the idea of earlier dual methods by employing "subgradient locality measures" in order to make the linearization errors nonnegative. The introduction of quadratic subgradient locality measures has the drawback that penalty parameters must be fixed a priori. The redistributed proximal bundle algorithms [20,21] are designed to deal with nonconvex functions. Their cutting-plane model forms a local convexification centered at the stability center instead of an approximation of the entire function, and the linearization errors are kept nonnegative by updating the local convexification parameter.
In some cases, exact information about the objective and constraint functions is expensive or unnecessary to obtain. Therefore, inexact bundle methods have emerged out of necessity. In general, bundle methods based on inexact evaluations of the objective and constraint functions are called inexact or approximate bundle methods. An early work can be found in Kiwiel [22]. Hintermüller [23] deals with inexact subgradients, but the evaluation of the objective function must be exact. Furthermore, uncontrollable inexactness is considered in [24], where noise attenuation techniques are used to cope with an inexact oracle derived from a stochastic objective. For the latest unified theory of convex inexact bundle methods, see [25,26,27].
In this paper, I generalize the redistributed proximal bundle method for single-objective nonsmooth optimization [28] to nonsmooth nonconvex multiobjective optimization with the help of an improvement function. I employ approximate objective and constraint function values and approximate subgradients to construct a new cutting-plane model for approximating the nonconvex functions. The model is a local convexification model that overcomes the difficulty caused by nonconvexity, and it is worth pointing out that the errors need not vanish in the limit, which makes the proposed algorithm widely applicable. I prove that the proposed method is implementable and obtains a satisfactory convergence result.
The rest of this paper is organized as follows: Section 2 presents some basic concepts and results of nonsmooth and multiobjective optimization theory. Section 3 gives the cutting-plane model of a local convexification of the improvement function and describes the concrete redistributed proximal bundle method for nonsmooth multiobjective optimization. Some satisfactory convergence results are obtained in Section 4. Finally, in Section 5 some conclusions are given.
2. Preliminaries
Consider a nonsmooth nonconvex multiobjective optimization problem of the form
$$\min_{x \in S}\ \big(f_1(x), f_2(x), \dots, f_h(x)\big), \tag{2.1}$$
where S={x∈Rn|c(x)≤0}, fi:Rn→R, i=1,2,⋯,h, and c:Rn→R are locally Lipschitz continuous functions. As in most related literature [29], we consider only a scalar constraint function c since, if multiple inequality constraints appear, c can be defined as the pointwise maximum of finitely many constraint functions, thus covering the case of multiple inequality constraints. Therefore, there is no loss of generality in formulating problem (2.1) with only one constraint.
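For instance (a standard aggregation, stated here for concreteness), if the feasible set is described by finitely many locally Lipschitz constraints c1,…,cm, one may take
$$c(x) := \max_{j=1,\dots,m} c_j(x),$$
so that c(x)≤0 if and only if cj(x)≤0 for all j, and c remains locally Lipschitz continuous.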
For a locally Lipschitz continuous function f:Rn→R, the Clarke generalized directional derivative [13] at x∈Rn in the direction d∈Rn is defined by
$$f^{\circ}(x; d) = \limsup_{y \to x,\ t \downarrow 0} \frac{f(y + td) - f(y)}{t},$$
and the Clarke subdifferential [13] of f at x∈Rn is defined by
$$\partial f(x) = \{\xi \in \mathbb{R}^n \mid f^{\circ}(x; d) \ge \langle \xi, d \rangle \ \text{for all } d \in \mathbb{R}^n\},$$
which is a nonempty convex and compact subset of Rn.
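As a simple illustration (added here for concreteness), take f(x)=|x| on R. Then
$$f^{\circ}(0; d) = |d| \quad \text{and} \quad \partial f(0) = [-1, 1],$$
while the subdifferential reduces to the usual singleton gradient wherever f is continuously differentiable.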
A function f:Rn→R is f°-pseudoconvex [30] if it is locally Lipschitz continuous and, for all x,y∈Rn,
$$f(y) < f(x) \implies f^{\circ}(x; y - x) < 0,$$
and it is f°-quasiconvex [30] if
$$f(y) \le f(x) \implies f^{\circ}(x; y - x) \le 0.$$
A vector x∗∈Rn is said to be a global Pareto optimum [3] of (2.1) if there does not exist x∈S such that
$$f_i(x) \le f_i(x^*) \ \text{for all } i = 1, \dots, h \quad \text{and} \quad f_j(x) < f_j(x^*) \ \text{for at least one } j.$$
A vector x∗∈Rn is said to be a global weak Pareto optimum [3] of (2.1) if there does not exist x∈S such that
$$f_i(x) < f_i(x^*) \quad \text{for all } i = 1, \dots, h.$$
A vector x∗∈Rn is said to be a local (weak) Pareto optimum [3] of (2.1) if there exists δ>0 such that x∗∈Rn is a global (weak) Pareto optimum on Bδ(x∗)∩S. Trivially, every Pareto optimal point is weakly Pareto optimal.
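To illustrate the distinction (an illustrative example, not from the original text), take n=1, h=2, S=[−1,1], f1(x)=x and f2(x)=max{x,0}. Then x∗=−1 is Pareto optimal, while every x∈(−1,0] is weakly Pareto optimal but not Pareto optimal: no feasible point decreases both objectives strictly (f2 cannot go below 0), yet moving left from x decreases f1 without increasing f2.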
The contingent cone and polar cone [13] of a set S⊂Rn at x∈Rn are defined respectively as
$$K_S(x) = \{d \in \mathbb{R}^n \mid \exists\, t_i \downarrow 0 \ \text{and} \ d_i \to d \ \text{with } x + t_i d_i \in S\}$$
and
$$S^{\le} = \{d \in \mathbb{R}^n \mid \langle s, d \rangle \le 0 \ \text{for all } s \in S\}.$$
Furthermore, let
For the optimality condition we pose the following constraint qualification:
For a locally Lipschitz continuous function f:Rn→R, the following statements are equivalent [31,32]:
(i) f is lower-C1 on S.
(ii) ∀ ˉx∈S, ∀ε>0, ∃ρ>0: ∀x∈Bρ(ˉx) and g∈∂f(x), we have
$$f(x + u) \ge f(x) + \langle g, u \rangle - \varepsilon |u|$$
whenever |u|≤ρ and x+u∈Bρ(ˉx).
(iii) ∀ ˉx∈S, ∀ε>0, ∃ρ>0: ∀ y1,y2∈Bρ(ˉx) and g1∈∂f(y1), g2∈∂f(y2), we have
$$\langle g_1 - g_2, y_1 - y_2 \rangle \ge -\varepsilon |y_1 - y_2|.$$
(iv) f is semismooth and regular on S.
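A standard source of lower-C1 functions (noted here for illustration) is the finite maximum of smooth functions: f(x)=max{f1(x),…,fm(x)} with each fj∈C1 is lower-C1; in particular, f(x)=|x|=max{x,−x} is lower-C1 on R.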
3. Model construction and redistributed bundle-type algorithm
In this section I generalize the redistributed proximal bundle method for single-objective optimization to nonsmooth nonconvex multiobjective optimization. The improvement function is employed to handle the multiple nonsmooth objectives, and allowing inexactness in the available information gives the proposed algorithm a wide range of applications.
3.1. Improvement function and available information
The improvement function [11] H:Rn×Rn→R is defined by
$$H(x, y) = \max\{f_i(x) - f_i(y),\ i = 1, \dots, h;\ c(x)\}.$$
The next theorem reveals the relationship between the improvement function and problem (2.1), and it provides the theoretical foundation for constructing the concrete algorithm.
Theorem 3.1. [31] A necessary condition for x∗∈Rn to be a global weak Pareto optimum of (2.1) is that
$$x^* = \operatorname*{argmin}_{x \in \mathbb{R}^n} H(x, x^*). \tag{3.1}$$
Moreover, if fi is f°-pseudoconvex for all i=1,2,⋯,h, the constraint function c is f°-quasiconvex, and the constraint qualification (2.2) is valid, then condition (3.1) is sufficient for x∗∈Rn to be a global weak Pareto optimum of (2.1).
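As a minimal computational sketch of the improvement function (assuming the reconstructed definition of H above; the toy objectives, constraint, and test points are illustrative, not from the paper):

```python
import numpy as np

def improvement_function(f_list, c, x, y):
    """H(x, y) = max{ f_i(x) - f_i(y) for i = 1..h ; c(x) }.

    Note H(y, y) = max{0, c(y)}, so driving H(., y) below H(y, y)
    improves every objective relative to y or reduces infeasibility.
    """
    diffs = [fi(x) - fi(y) for fi in f_list]
    return max(diffs + [c(x)])

# Toy instance: two objectives, one constraint on R^2.
f_list = [lambda x: abs(x[0]) + x[1] ** 2,
          lambda x: (x[0] - 1.0) ** 2 + abs(x[1])]
c = lambda x: np.linalg.norm(x, 1) - 2.0          # feasible set: |x|_1 <= 2
y = np.array([1.0, 1.0])
print(improvement_function(f_list, c, np.array([0.5, 0.0]), y))  # < H(y, y) = 0
```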
Let ˆxk∈Rn be the current stability center, Jki be the index set of the information used for function fi, i=1,2,⋯,h, and Jk the corresponding index set for the constraint function c at the kth iteration. We seek the search direction dk∈Rn as a solution to the proximal subproblem (3.15) defined below.
Suppose that at the kth iteration, besides the current stability center ˆxk∈Rn, we have some auxiliary points xj∈Rn from previous iterations. We have inexact function and subgradient values as follows:
where σj, ˆσk and θj are unknown error terms. They are assumed to be bounded,
$$|\sigma_j| \le \bar{\sigma}, \qquad |\hat{\sigma}_k| \le \bar{\sigma}, \qquad |\theta_j| \le \bar{\theta},$$
but the error terms themselves and their bounds ˉσ and ˉθ are generally unknown.
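For intuition, an inexact oracle of this kind might be simulated as follows (an illustrative sketch only; the uniform noise model and the finite-difference subgradient surrogate are assumptions, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(0)

def inexact_oracle(f, x, sigma_bar=1e-3, theta_bar=1e-3, h=1e-6):
    """Return approximate value and subgradient of f at x with bounded,
    otherwise uncontrolled, errors: |sigma| <= sigma_bar, |theta| <= theta_bar."""
    sigma = sigma_bar * (2.0 * rng.random() - 1.0)
    # forward-difference surrogate for some subgradient of f at x
    g = np.array([(f(x + h * e) - f(x)) / h for e in np.eye(x.size)])
    u = rng.standard_normal(x.size)
    theta = (theta_bar * rng.random()) * u / np.linalg.norm(u)
    return f(x) + sigma, g + theta

val, sub = inexact_oracle(lambda z: abs(z[0]) + z[1] ** 2, np.array([0.3, -0.2]))
```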
3.2. Model construction and direction finding
In order to deal with the possible nonconvexity of fi, i=1,2,⋯,h, and c, we follow the redistributed proximal approach [20] and construct a new cutting-plane model for the improvement function. First, the convex piecewise-linear model functions for fi, i=1,2,⋯,h, and c are given respectively by
where
The parameters ηki, i=1,2,⋯,h, and ηk are adjusted dynamically along the iterations and taken sufficiently large to make akij, i=1,2,⋯,h, j∈Jki, and akj, j∈Jk, nonnegative. In our redistributed proximal bundle method, we take
where γ∈R is a small positive constant. Working with inexact information about the objective and constraint functions, the cutting-plane model for H(ˆxk+d,ˆxk) is given by
We also define the inexact function value of H(ˆxk+d,ˆxk) by
$$\tilde{H}^k(\hat{x}^k + d) = \max\{\hat{f}_i(\hat{x}^k + d) - \hat{f}_i^k,\ i = 1, \dots, h;\ \hat{c}(\hat{x}^k + d)\},$$
where ˆfi(ˆxk+d), i=1,2,⋯,h, and ˆc(ˆxk+d) are evaluated approximately according to (3.3); these values will be used to discuss the convergence of the proposed algorithm in Section 3.3. Note that for some J∈(∩hi=1Jki)∩Jk we have ˆxk=xJ. Therefore, bkiJ=bkJ=0 and ekiJ=ekJ=0, so akiJ=akJ=0. Hence,
Obviously, we have
The new search direction dk∈Rn is given by solving the proximal point subproblem
where 0<tk∈R is an inverse proximal parameter. From the optimality condition of the subproblem above, we obtain
where xk+1=ˆxk+dk. Since the model (3.11) is piecewise linear, there exist simplicial multipliers
such that
Once the new iterate is known, we define the aggregate linearization
that is,
Thus, we have
By the subgradient inequality, it holds that
The aggregate error is defined by
and it has the following equivalent expression:
Similarly, for the aggregate linearization, it holds that
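Although the subproblem (3.15) is not reproduced above, its generic form, minimizing a piecewise-linear maximum of bundle cuts plus a proximal term, can be sketched via an epigraph reformulation (an illustrative sketch; the arrays A, G and the parameter t are hypothetical stand-ins for the bundle data and tk):

```python
import numpy as np
from scipy.optimize import minimize

def proximal_step(A, G, t):
    """Solve  min_d  max_j {A[j] + G[j].d} + |d|^2/(2t)
    as  min_{d,v}  v + |d|^2/(2t)  s.t.  v >= A[j] + G[j].d  for all j.
    A: (m,) cut intercepts; G: (m,n) cut slopes (approximate subgradients)."""
    m, n = G.shape
    z0 = np.zeros(n + 1)
    z0[-1] = A.max()                     # feasible epigraph value at d = 0
    obj = lambda z: z[-1] + z[:-1] @ z[:-1] / (2.0 * t)
    cons = [{'type': 'ineq',
             'fun': lambda z, j=j: z[-1] - A[j] - G[j] @ z[:-1]}
            for j in range(m)]
    return minimize(obj, z0, constraints=cons).x[:-1]

# Tiny usage example: two cuts in R^2.
print(proximal_step(np.array([0.0, -0.5]),
                    np.array([[1.0, 0.0], [-1.0, 0.5]]), t=1.0))
```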
3.3. A redistributed bundle-type algorithm
In order to check whether the new iterate provides sufficient decrease or not, the predicted decrease is defined by
It follows from (3.23) that
The reason we use δk to stop the algorithm will be clarified by the relations in Theorem 4.1; it has the following equivalent expression:
Our assumptions on defining the next model function ˆHk+1 are standard:
A Redistributed Bundle-Type Algorithm (RBTA):
Step 0 (Initialization) Choose an initial point x1∈S, compute (f1i,g1i) and (c1,g1), and select m∈(0,1), γ>0, and a stopping tolerance tol≥0. Choose a parameter t1>0. Set the iteration counter k=1 and the index sets J1i=J1={1}; set ˆf1i:=f1i, ˆc1:=c1, ˆx1:=x1, i=1,2,⋯,h.
Step 1 (Model Construction and Trial Point Finding) Given the model ˆHk defined by (3.11), compute dk by solving (3.15), and define the associated Gk,Ek and δk by (3.18), (3.23) and (3.25), respectively. Set xk+1=ˆxk+dk. If δk≤tol, stop.
Step 2 (Descent Test) Compute (fk+1i,gk+1i), i=1,2,⋯,h, and (ck+1,gk+1). If
then declare a null step and go to Step 3. Otherwise, declare a serious step and set ˆxk+1=xk+1, ˆfk+1=fk+1, ˆck+1=ck+1. Select tk+1>0 and go to Step 4.
Step 3 (Null Step) Set ˆxk+1=ˆxk, ˆfk+1=ˆfk, ˆck+1=ˆck, and select tk+1 with 0<tk+1<tk.
Step 4 (Bundle Update and Loop) Select the bundle index sets Jk+1i, Jk+1, i=1,2,⋯,h, keeping the active elements. Select ηk+1i, ηk+1 as in (3.9) and (3.10), and update the model ˆHk+1 as needed. Increase k by 1 and go to Step 1.
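Putting the pieces together, the overall loop can be sketched as follows (a simplified, self-contained illustration that reuses improvement_function, f_list, c and proximal_step from the sketches above; it uses exact oracles and cuts on H(·,ˆx) directly, so it omits the inexactness and the convexification parameters ηki, ηk that the actual RBTA manages):

```python
def rbta_sketch(x1, m=0.1, tol=1e-8, t=1.0, max_iter=50):
    """Simplified bundle loop on the improvement function (toy instance above)."""
    sub_F = lambda x: [np.array([np.sign(x[0]), 2.0 * x[1]]),
                       np.array([2.0 * (x[0] - 1.0), np.sign(x[1])])]
    sub_c = lambda x: np.sign(x)          # subgradient of |x|_1 - 2
    x_hat, pts = x1.copy(), [x1.copy()]
    for k in range(max_iter):
        # Step 1: build cuts of H(., x_hat) at all bundle points, take a proximal step
        A, G = [], []
        for p in pts:
            for fi, gi in zip(f_list, sub_F(p)):
                A.append(fi(p) - fi(x_hat) + gi @ (x_hat - p)); G.append(gi)
            A.append(c(p) + sub_c(p) @ (x_hat - p)); G.append(sub_c(p))
        A, G = np.array(A), np.array(G)
        d = proximal_step(A, G, t)
        delta = max(0.0, c(x_hat)) - np.max(A + G @ d)   # predicted decrease
        if delta <= tol:
            return x_hat
        x_new = x_hat + d
        # Step 2: descent test; a serious step moves the stability center
        if improvement_function(f_list, c, x_new, x_hat) <= max(0.0, c(x_hat)) - m * delta:
            x_hat = x_new
        pts.append(x_new)                 # Steps 3-4: enrich the bundle
    return x_hat

print(rbta_sketch(np.array([1.5, 1.0])))
```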
4. Convergence analysis
In this section I adapt the usual rationale of convergence proofs of bundle methods by considering two cases: infinitely many serious steps, and finitely many serious steps followed by infinitely many null steps. It is shown that in both cases an approximate stationarity condition holds under reasonable conditions.
Lemma 4.1. Suppose the set {∪hi=1{j∈Jki|αkij>0}∪{j∈Jk|αkj>0}} is uniformly bounded in k. If Ek→0 and {ˆck}→0 as k→∞, then
(i) ∑hi=1∑j∈Jkiαkij|xj−ˆxk|+∑j∈Jkαkj|xj−ˆxk|→0 as k→∞.
If, in addition, for some subset K⊂{1,2,⋯}, ˆxk→ˉx and Gk→ˉG as K∋k→∞ with the set {∪hi=1{ηki|k∈K}}∪{ηk|k∈K} bounded, then we have
(ii) ˉG∈∂H(ˉx,ˉx)+2Bˉθ(0).
If, in addition, Gk→0 as K∋k→∞, then
(iii) ˉx∈Rn satisfies the following approximate stationarity condition:
$$0 \in \partial H(\bar{x}, \bar{x}) + 2B_{\bar{\theta}}(0).$$
Finally, if, in addition, fi, i=1,2,⋯,h, and c are lower-C1, then for each ε>0 there exists ρ>0 such that
(iv)
Proof. (i) According to the rules (3.9) and (3.10) for choosing ηki and ηk, we have
Furthermore, it follows from Ek→0 that
Thus, we obtain αkij|xj−ˆxk|→0 as k→∞ for all i=1,2,⋯,h, j∈Jki. Similarly, αkj|xj−ˆxk|→0 as k→∞ for all j∈Jk. By the assumption, the sum in item (i) is over a finite set of indices, and each element in the sum tends to zero, so the assertion (i) holds.
(ii) For each j and i=1,2,⋯,h, choose pji to be the orthogonal projection of gji onto ∂fi(xj) such that |gji−pji|≤θj≤ˉθ and pj to be the orthogonal projection of gj onto ∂c(xj) such that |gj−pj|≤θj≤ˉθ. By (3.7), (3.8) and (3.18), we have
Passing to a further subsequence in K if necessary, the assumptions ˆxk→ˉx, Gk→ˉG as K∋k→∞ with the set {∪hi=1{ηki|k∈K}}∪{ηk|k∈K} bounded, together with the outer semicontinuity of the Clarke subdifferential, imply that
Since the second and the fifth terms in (4.1) are both in Bˉθ(0), and the third and the sixth terms tend to zero by item (i), the assertion of item (ii) follows.
(iii) The conclusion of item (iii) can be obtained easily from item (ii) if we take ˉG=0.
(iv) Because fi, i=1,2,⋯,h, and c are lower-C1, by the equivalent characterization of lower-C1 functions in Section 2, (3.3), (3.7), (3.8), and the nonnegativity of bkij and bkj, for any fixed ε>0 there exists ρ>0 such that for any y∈Bρ(xj) we have
Taking convex combinations in (4.2) and (4.3) using the simplicial multipliers in (3.17), and using (3.24), we have
Passing to the limit if necessary, and using item (i) and ˉG=0 again, we obtain that
where we employ the relation max{A,B}≥λA+(1−λ)B for any λ∈[0,1] and the definition of the improvement function H(x,y).
Theorem 4.1. Suppose the RBTA generates an infinite number of bounded serious steps. Then δk→0 as k→∞. Suppose, in addition, that the sequences {ηki}, i=1,2,⋯,h, and {ηk} are bounded in k.
(i) If ∑∞k=1tk=+∞, then Ek→0 as k→∞, and there exist K⊂{1,2,⋯} and ˉx such that ˆxk→ˉx,Gk→0 as K∋k→∞. In particular, if the set {∪hi=1{j∈Jki|αkij>0}}∪{j∈Jk|αkj>0} is uniformly bounded in k, then the conclusions of Lemma 4.1 hold.
(ii) If lim infk→∞tk>0, then these assertions hold for all accumulation points ˉx of {ˆxk}.
Proof. If we take a serious step at the kth iteration, we have
and since δk≥0, the sequence {ˆck} is nonincreasing. By the boundedness of ˆxk and ˆσk and the Lipschitz continuity of c, {c(ˆxk)−ˆσk} is bounded below, i.e., {ˆck} is bounded below, and we conclude that it converges. It follows from (4.5) that
As a result, δk→0 as k→∞. Since δk=Ek+tk|Gk|2, it also holds that
If ∑∞k=1tk=+∞, but for some β>0 we have |Gk|≥β for all k, then (4.6) results in a contradiction. Hence, there exists an index set K⊂{1,2,⋯} such that
Furthermore, we can take a subsequence if necessary and assume ˆxk→ˉx. Item (i) is proved. If the set {∪hi=1{j∈Jki|αkij>0}}∪{j∈Jk|αkj>0} is uniformly bounded in k, then, from Ek→0 as k→∞, the conclusions of Lemma 4.1 hold.
If lim infk→∞tk>0, then the second relation in (4.7) implies that (4.8) holds for K={1,2,⋯}, and thus the same assertions hold for all accumulation points of {ˆxk}.
The next simple relation is crucial for proving convergence of the proposed algorithm in the case where only a finite number of serious steps occurs. Suppose the stability center satisfies ˆxk=ˆx for all k>ˉk, where ˉk is some positive integer. It follows from (3.8) that
Theorem 4.2. Suppose that a finite number of serious steps is followed by infinitely many null steps, and let ˉk be large enough that ˆxk=ˆx for all k≥ˉk. Let the sequences {ηki}, i=1,2,⋯,h, and {ηk} be bounded in k, and let lim infk→∞tk>0. Then xk→ˆx, δk→0 and Ek→0 as k→∞, and there exists K⊂{1,2,⋯} such that Gk→0 as K∋k→∞. In particular, if the set {∪hi=1{j∈Jki|αkij>0}}∪{j∈Jk|αkj>0} is uniformly bounded in k, the conclusions of Lemma 4.1 hold for ˉx=ˆx.
Proof. Since ˆxk=ˆx for k>ˉk, we have ˆfk=ˆf,ˆck=ˆc. Define the optimal value of subproblem (3.15)
By (3.19) we obtain that
so the sequence {ψk} is bounded. Since
the sequence {ψk} is increasing; therefore, it converges. Taking into account that tk≤tˉk, it follows that
By the definition of δk and the equivalent expression of Ek, we have that
therefore,
According to the assumption (3.27),
Since ˆc=ˆck+1, we add (4.9) to (4.16), and then we have
Note that ˜H(ˆx)=max{0,ˆc}, and combining this relation with (4.15) yields
The rest of the proof is very similar to that of Theorem 7 in [26], so we omit it.
Remark: According to Theorem 3.1, under the conditions that the objective functions are f°-pseudoconvex, the constraint function is f°-quasiconvex, and the constraint qualification (2.2) is valid, if x∗ satisfies x∗=argminx∈RnH(x,x∗), then x∗ is a global weak Pareto optimum of problem (2.1). Lemma 4.1 ensures that, under mild conditions, the cluster point ˉx of the stability centers satisfies the approximate stationarity condition 0∈∂H(ˉx,ˉx)+2Bˉθ(0), i.e., ˉx is an approximate global weak Pareto optimum of problem (2.1). Regardless of whether the proposed RBTA generates an infinite number of bounded serious steps or a finite number of serious steps followed by infinitely many null steps, the conclusions of Lemma 4.1 hold; in other words, an approximate global weak Pareto optimum of problem (2.1) can be obtained.
5. Conclusions
I construct a new cutting-plane model for approximating the nonconvex functions in multiobjective optimization and develop a new redistributed proximal bundle algorithm. First, the algorithm based on the new model generates approximate proximal points, computed by a variation of the algorithm presented in [20], in which proximal points of a special cutting-plane model are used to compute increasingly accurate approximations of the exact proximal points; moreover, I generalize the unconstrained setting to the constrained case with a Lipschitz constraint function. At the same time, the local convexification model gives new insight into the first-order models of [33]. Second, for multiobjective optimization with nonsmooth nonconvex functions, the multiple objective functions are treated individually by employing the improvement function, without any scalarization, which is the conventional technique and can be found in [34]. Similar multiobjective optimization problems were studied in [35,36], which introduce a strategy for cutting-plane methods to cope with multiobjective problems without any scalarization procedure, but the methods presented there require exact information about the objective and constraint functions. Compared with the results obtained in [34], even though that approach can be proved to find a globally weak Pareto optimal solution under some generalized convexity assumptions, it requires evaluating the exact function values without any errors, which limits its range of application. Note that in some cases computing the exact function value is not easy; for instance, in Lagrangian relaxation, if f is a max-type function of the form f(y)=sup{Fz(y)|z∈Z}, where each Fz(y) is convex and Z is an infinite set, then it may be impossible to calculate f(y) exactly, since f itself is defined by an optimization problem involving another function F. The assumptions of approximate subgradients and approximate function values are therefore realistic. Also note that the convergence result is indeed weaker than what can be obtained by other methods [11,15,24]. This is natural, as the convex case takes advantage of tools (like the subgradient inequality) that are not available in our nonconvex setting.
Conflict of interest
There is no conflict of interest.