Abstract
McCormick’s classical relaxation technique constructs closed-form convex and concave relaxations of compositions of simple intrinsic functions. These relaxations have several properties which make them useful for lower bounding problems in global optimization: they can be evaluated automatically and accurately at low computational cost, and they converge rapidly to the relaxed function as the underlying domain is reduced in size. They may also be adapted to yield relaxations of certain implicit functions and differential equation solutions. However, McCormick’s relaxations may be nonsmooth, and this nonsmoothness can create theoretical and computational obstacles when relaxations are deployed. This article presents a continuously differentiable variant of McCormick’s original relaxations in the multivariate McCormick framework of Tsoukalas and Mitsos. Gradients of the new differentiable relaxations may be computed efficiently using the standard forward or reverse modes of automatic differentiation. Extensions to differentiable relaxations of implicit functions and solutions of parametric ordinary differential equations are discussed. A C++ implementation based on the library MC++ is described and applied to a case study in nonsmooth nonconvex optimization.
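For context, the classical McCormick rule for the bilinear term \(xy\) on a box illustrates both the strengths and the nonsmoothness of the original relaxations: the convex and concave bounds are pointwise maxima and minima of affine functions, and these max/min operations are exactly where differentiability fails. A minimal sketch of this well-known rule from McCormick (1976) follows (illustrative only; this is the classical nonsmooth rule, not the differentiable variant developed in the article):

```python
# Classical (nonsmooth) McCormick convex/concave relaxation of the
# bilinear product x*y on the box [xL, xU] x [yL, yU].  The pointwise
# max/min below are the source of the nonsmoothness the article removes.

def mccormick_product(x, y, xL, xU, yL, yU):
    """Return (convex underestimator, concave overestimator) of x*y."""
    cv = max(yL * x + xL * y - xL * yL,
             yU * x + xU * y - xU * yU)
    cc = min(yU * x + xL * y - xL * yU,
             yL * x + xU * y - xU * yL)
    return cv, cc
```

At every point of the box, `cv <= x*y <= cc`, with equality at the corners of the box.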
References
Achterberg, T.: SCIP: solving constraint integer programs. Math. Program. Comput. 1, 1–41 (2009)
Achterberg, T., Berthold, T., Koch, T., Wolter, K.: Constraint integer programming: a new approach to integrate CP and MIP. In: Proceedings of the Fifth International Conference on Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, pp. 6–20. Paris (2008)
Adjiman, C.S., Dallwig, S., Floudas, C.A., Neumaier, A.: A global optimization method, \(\alpha \)BB, for general twice-differentiable constrained NLPs—I. Theoretical advances. Comput. Chem. Eng. 22, 1137–1158 (1998)
Alefeld, G., Mayer, G.: Interval analysis: theory and applications. J. Comput. Appl. Math. 121, 421–464 (2000)
Bazaraa, M.S., Sherali, H.D., Shetty, C.M.: Nonlinear Programming: Theory and Algorithms, 3rd edn. Wiley, Hoboken (2006)
Beckers, M., Mosenkis, V., Naumann, U.: Adjoint mode computation of subgradients for McCormick relaxations. In: Forth, S., Hovland, P., Phipps, E., Utke, J., Walther, A. (eds.) Recent Advances in Algorithmic Differentiation, pp. 103–113. Springer, Berlin (2012)
Belotti, P.: COUENNE: A user’s manual. https://projects.coin-or.org/Couenne (2006)
Bertsekas, D.P.: Nondifferentiable optimization via approximation. In: Balinski, M., Wolfe, P. (eds.) Mathematical Programming Study 3, pp. 1–25. North-Holland Publishing Company, Amsterdam (1975)
Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Belmont (1999)
Bompadre, A., Mitsos, A.: Convergence rate of McCormick relaxations. J. Glob. Optim. 52, 1–28 (2012)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Broyden, C.G., Dennis Jr., J.E., Moré, J.J.: On the local and superlinear convergence of quasi-Newton methods. J. Inst. Math. Appl. 12, 223–245 (1973)
Chachuat, B.: MC++: a toolkit for bounding factorable functions, v1.0. Retrieved 2 July 2014 https://projects.coin-or.org/MCpp (2014)
Clarke, F.H.: Optimization and Nonsmooth Analysis. SIAM, Philadelphia (1990)
Coddington, E.A., Levinson, N.: Theory of Ordinary Differential Equations. McGraw Hill Co., Inc., New York (1955)
Du, K., Kearfott, R.B.: The cluster problem in multivariate global optimization. J. Glob. Optim. 5, 253–265 (1994)
Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Complementarity Problems, vol. 2. Springer, New York (2003)
Feehery, W.F., Tolsma, J.E., Barton, P.I.: Efficient sensitivity analysis of large-scale differential-algebraic systems. Appl. Numer. Math. 25, 41–54 (1997)
Gabriel, S.A., Moré, J.J.: Smoothing of Mixed Complementarity Problems. Preprint MCS-P541-0995, Argonne National Laboratory (1995)
Gill, P.E., Murray, W., Saunders, M.A.: SNOPT: an SQP algorithm for large-scale constrained optimization. SIAM Rev. 47(1), 99–131 (2005)
Griewank, A., Rabier, P.J.: On the smoothness of convex envelopes. Trans. Am. Math. Soc. 322, 691–709 (1990)
Griewank, A., Walther, A.: Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, Other Titles in Applied Mathematics, 2nd edn. SIAM, Philadelphia (2008)
Grossmann, I.E., Yeomans, H., Kravanja, Z.: A rigorous disjunctive optimization model for simultaneous flowsheet optimization and heat integration. Comput. Chem. Eng. 22(Suppl.), S157–S164 (1998)
Hartman, P.: Ordinary Differential Equations, 2nd edn. SIAM, Philadelphia (2002)
Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms I: Fundamentals. A Series of Comprehensive Studies in Mathematics. Springer, Berlin (1993)
Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms II: Advanced Theory and Bundle Methods. A Series of Comprehensive Studies in Mathematics. Springer, Berlin (1993)
Horst, R., Tuy, H.: Global Optimization: Deterministic Approaches, 2nd edn. Springer, Berlin (1993)
Kesavan, P., Allgor, R.J., Gatzke, E.P., Barton, P.I.: Outer approximation algorithms for separable nonconvex mixed-integer nonlinear programs. Math. Program. Ser. A 100, 517–535 (2004)
Khan, K.A.: Sensitivity analysis for nonsmooth dynamic systems. Ph.D. thesis, Massachusetts Institute of Technology (2015)
Kiwiel, K.C.: Methods of Descent for Nondifferentiable Optimization. Lecture Notes in Mathematics. Springer, Berlin (1985)
Lemaréchal, C., Strodiot, J.J., Bihain, A.: On a bundle algorithm for nonsmooth optimization. In: Mangasarian, O.L., Meyer, R.R., Robinson, S.M. (eds.) Nonlinear Programming 4. Academic Press, New York (1981)
Li, X., Tomasgard, A., Barton, P.I.: Nonconvex generalized Benders decomposition for stochastic separable mixed-integer nonlinear programs. J. Optim. Theory Appl. 151, 425–454 (2011)
Liberti, L., Pantelides, C.C.: Convex envelopes of monomials of odd degree. J. Glob. Optim. 25, 157–168 (2003)
Lin, Y., Schrage, L.: The global solver in the LINDO API. Optim. Methods Softw. 24, 657–668 (2009)
Mäkelä, M.M.: Multiobjective proximal bundle method for nonconvex nonsmooth optimization: Fortran subroutine MPBNGC 2.0. Reports of the Department of Mathematical Information Technology, Series B, Scientific computing B 13/2003, University of Jyväskylä (2003)
Maly, T., Petzold, L.R.: Numerical methods and software for sensitivity analysis of differential-algebraic systems. Appl. Numer. Math. 20, 57–79 (1996)
Mangasarian, O.L.: A simple characterization of solution sets of convex programs. Oper. Res. Lett. 7(1), 21–26 (1988)
McCormick, G.P.: Computability of global solutions to factorable nonconvex programs: Part I—convex underestimating problems. Math. Program. 10, 147–175 (1976)
Misener, R., Floudas, C.A.: ANTIGONE: algorithms for continuous/integer global optimization of nonlinear equations. J. Glob. Optim. 59, 503–526 (2014)
Mitsos, A., Chachuat, B., Barton, P.I.: McCormick-based relaxations of algorithms. SIAM J. Optim. 20, 573–601 (2009)
Moore, R.E.: Methods and Applications of Interval Analysis. SIAM, Philadelphia (1979)
Najman, J., Mitsos, A.: Convergence analysis of multivariate McCormick relaxations. J. Glob. Optim. 66, 597–628 (2016)
Naumann, U.: The Art of Differentiating Computer Programs. SIAM, Philadelphia (2012)
Neumaier, A.: Interval Methods for Systems of Equations. Cambridge University Press, Cambridge (1990)
Nocedal, J., Wright, S.J.: Numerical Optimization. Springer series in operations research and financial engineering, 2nd edn. Springer, New York (2006)
Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables. Classics in Applied Mathematics. SIAM, Philadelphia (2000)
Qi, L., Sun, D.: Smoothing functions and smoothing Newton method for complementarity and variational inequality problems. J. Optim. Theory Appl. 113, 121–147 (2002)
Rockafellar, R.T.: Convex Analysis. Princeton Landmarks in Mathematics and Physics. Princeton University Press, Princeton (1970)
Ryoo, H.S., Sahinidis, N.V.: Global optimization of nonconvex NLPs and MINLPs with applications in process design. Comput. Chem. Eng. 19(5), 551–566 (1995)
Sahinidis, N.V.: BARON: a general purpose global optimization software package. J. Glob. Optim. 8, 201–205 (1996)
Sahinidis, N.V.: BARON 15.9: Global Optimization of Mixed-Integer Nonlinear Programs, User’s Manual. https://www.gams.com/help/topic/gams.doc/solvers/baron/index.html (2015)
Schaber, S.D.: Tools for dynamic model development. Ph.D. thesis, Massachusetts Institute of Technology (2014)
Scholz, D.: Theoretical rate of convergence for interval inclusion functions. J. Glob. Optim. 53, 749–767 (2012)
Scott, J.K.: Reachability analysis and deterministic global optimization of differential-algebraic systems. Ph.D. thesis, Massachusetts Institute of Technology (2012)
Scott, J.K., Barton, P.I.: Convex and concave relaxations for the parametric solutions of semi-explicit index-one differential-algebraic equations. J. Optim. Theory Appl. 156, 617–649 (2013)
Scott, J.K., Barton, P.I.: Improved relaxations for the parametric solutions of ODEs using differential inequalities. J. Glob. Optim. 57, 143–176 (2013)
Scott, J.K., Chachuat, B., Barton, P.I.: Nonlinear convex and concave relaxations for the solutions of parametric ODEs. Optim. Control Appl. Methods 34, 145–163 (2013)
Scott, J.K., Stuber, M.D., Barton, P.I.: Generalized McCormick relaxations. J. Glob. Optim. 51, 569–606 (2011)
Shor, N.Z.: Minimization Methods for Non-differentiable Functions. Springer series in computational mathematics. Springer, Berlin (1985)
Stuber, M.D., Scott, J.K., Barton, P.I.: Convex and concave relaxations of implicit functions. Optim. Methods Softw. 30(3), 424–460 (2015)
Tawarmalani, M., Sahinidis, N.V.: Convexification and Global Optimization in Continuous and Mixed-Integer Nonlinear Programming: Theory, Algorithms, Software, and Applications. Nonconvex Optimization and Its Applications. Springer, Dordrecht (2002)
Tawarmalani, M., Sahinidis, N.V.: Global optimization of mixed-integer nonlinear programs: a theoretical and computational study. Math. Program. A 99, 563–591 (2004)
Tsoukalas, A., Mitsos, A.: Multivariate McCormick relaxations. J. Glob. Optim. 59, 633–662 (2014)
Watson, H.A.J., Khan, K.A., Barton, P.I.: Multistream heat exchanger modeling and design. AIChE J. 61(10), 3390–3403 (2015)
Wechsung, A.: Global optimization in reduced space. Ph.D. thesis, Massachusetts Institute of Technology (2014)
Wechsung, A., Aspelund, A., Gundersen, T., Barton, P.I.: Synthesis of heat exchanger networks at subambient conditions with compression and expansion of process streams. AIChE J. 57(8), 2090–2108 (2011)
Wechsung, A., Schaber, S.D., Barton, P.I.: The cluster problem revisited. J. Glob. Optim. 58, 429–438 (2014)
Wechsung, A., Scott, J.K., Watson, H.A.J., Barton, P.I.: Reverse propagation of McCormick relaxations. J. Glob. Optim. 63(1), 1–36 (2015)
Whitney, H.: Analytic extensions of differentiable functions defined in closed sets. Trans. Amer. Math. Soc. 36, 63–89 (1934)
Acknowledgments
The authors would like to thank Achim Wechsung and Spencer Schaber for several helpful discussions.
Additional information
This material was supported by Novartis Pharmaceuticals as part of the Novartis-MIT Center for Continuous Manufacturing, was also supported by Statoil, and was based in part on work supported by the US Department of Energy, Office of Science, under contract DE-AC02-06CH11357.
A correction to this article is available online at https://doi.org/10.1007/s10898-017-0601-2.
Appendices
Proofs of results
1.1 Proof of Proposition 3
This proof proceeds by showing that the requirements of [69, Theorem I] are met by f on C. Since the components of f may be considered separately, it suffices to consider the case in which \(m=1\). For each \(x\in C\), assume that \(N_x\) is convex without loss of generality; if this is not true, then redefine \(N_x\) to be an open convex subset containing x. Since C is compact, there exists a finite subset \(I\subset C\) for which \(C\subset \bigcup _{x\in I}N_x\).
Suppose, to obtain a contradiction, that there exist \(x,\xi \in I\) and \(y\in C\) for which \(y\in N_x\cap N_\xi \) but \({\nabla } \phi _x(y)\ne {\nabla } \phi _\xi (y)\). Since C is convex and has nonempty interior, either \(y\in \mathrm {int}(C)\) or \(y\in \mathrm {bd}\left( \mathrm {int}(C)\right) \). In either case, there exists a sequence \(\{z_{(i)}\}_{i\in {\mathbb {N}}}\) in the nonempty open set \(\tilde{N}:=N_x\cap N_\xi \cap \mathrm {int}(C)\) that converges to y. Since \(\phi _x\equiv \phi _\xi \equiv f\) on \(\tilde{N}\), \({\nabla } \phi _x(z_{(i)})={\nabla } \phi _\xi (z_{(i)})\) for each i. The continuity of \({\nabla } \phi _x\) and \({\nabla } \phi _\xi \) on \(\tilde{N}\) then yields \({\nabla } \phi _{x}(y)={\nabla } \phi _\xi (y)\), which contradicts the choices of x, \(\xi \), and y. Thus, there exists a single continuous function \(g:C\rightarrow {\mathbb {R}}\) for which, for each \(x\in I\), \(g\equiv {\nabla } \phi _x\) on \(N_x\cap C\).
To show that f is Whitney-\({\mathscr {C}}^1\) on C, it suffices in light of [69, Theorem I] to show that, for each \(\epsilon >0\), there exists \(\delta _\epsilon >0\) for which
whenever \(x,y\in C\) and \(\Vert y-x\Vert <\delta _\epsilon \). Thus, choose any \(\epsilon >0\). Since g is continuous on the compact set C, g is uniformly continuous on C; there exists \(\tilde{\delta }_\epsilon >0\) for which \(\Vert g(y)-g(x)\Vert <\epsilon \) whenever \(x,y\in C\) satisfy \(\Vert y-x\Vert <\tilde{\delta }_\epsilon \). Now, consider any \(x,y\in C\) with \(\Vert y-x\Vert <\tilde{\delta }_{\epsilon }\); the bound (16) will be shown to hold for x and y.
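The displayed inequality referenced as (16) is not reproduced above; in Whitney's \({\mathscr {C}}^1\) extension theorem it takes the standard uniform first-order form, sketched here for orientation using the function g constructed above (a sketch of the standard condition, not necessarily the authors' exact display):

```latex
\left| f(y) - f(x) - \langle g(x),\, y - x \rangle \right|
\;\le\; \epsilon\, \Vert y - x \Vert
\qquad \text{whenever } x, y \in C \text{ and } \Vert y - x \Vert < \delta_\epsilon .
```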
Define the line segment \(L:=\mathrm {conv}\{x,y\}\). Since L is compact and \(L\subset C\), choose \(J\subset I\) as a set for which \(L\subset \bigcup _{\xi \in J}N_\xi \) but \(L\not \subset (\bigcup _{\xi \in J}N_\xi )\backslash N_\eta \) for each \(\eta \in J\). Using these constructions, choose \(k\in {\mathbb {N}}\), \(0=\lambda _0<\lambda _1<\cdots <\lambda _{k}=1\), and \(\xi _{(1)},\ldots ,\xi _{(k)}\in J\) for which:
- \(x_{(0)}:=x\in N_{\xi _{(1)}}\),
- \(x_{(k)}:=y\in N_{\xi _{(k)}}\), and
- \(x_{(q)}:=(1-\lambda _q)x + \lambda _qy\in N_{\xi _{(q+1)}}\cap N_{\xi _{(q)}}\cap L\), for each \(q\in \{1,2,\ldots ,k-1\}\).
Observe that, for each \(q\in \{1,\ldots ,k\}\), \(x_{(q-1)}\in N_{\xi _{(q)}}\) and \(x_{(q)}\in N_{\xi _{(q)}}\). So, the mean-value theorem and the established properties of g yield the following, for some \(y_{(q)}\in \mathrm {conv}\{x_{(q-1)},x_{(q)}\}\subset L\):
Hence,
as required; the final equation above follows from the definitions of each \(x_{(q)}\) and the inequality chain \(0=\lambda _0<\lambda _1<\cdots <\lambda _k=1\). \(\square \)
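For completeness, the omitted displays in the argument above can be sketched as follows, consistent with the surrounding definitions (the mean-value step followed by the telescoping bound; not necessarily the authors' exact displays):

```latex
f(y) - f(x)
  \;=\; \sum_{q=1}^{k} \bigl( \phi_{\xi_{(q)}}(x_{(q)}) - \phi_{\xi_{(q)}}(x_{(q-1)}) \bigr)
  \;=\; \sum_{q=1}^{k} \langle g(y_{(q)}),\, x_{(q)} - x_{(q-1)} \rangle,
```

so that

```latex
\bigl| f(y) - f(x) - \langle g(x),\, y - x \rangle \bigr|
  \;\le\; \sum_{q=1}^{k} \Vert g(y_{(q)}) - g(x) \Vert \,
            (\lambda_q - \lambda_{q-1}) \, \Vert y - x \Vert
  \;<\; \epsilon \, \Vert y - x \Vert .
```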
1.2 Proof of Proposition 4
This proof employs the following intermediate result.
Lemma 1
Consider an interval \(X:=[\underline{x},\overline{x}]\subset {\mathbb {R}}\), a Lipschitz continuous function \(f:X\rightarrow {\mathbb {R}}\), and the convex envelope \(\underline{f}^{\mathrm {C}}\) of f on X. Then \(\underline{f}^{\mathrm {C}}(\underline{x})=f(\underline{x})\) and \(\underline{f}^{\mathrm {C}}(\overline{x})=f(\overline{x})\). Moreover, \(\underline{f}^{\mathrm {C}}\) is Lipschitz continuous on X, with the same Lipschitz constant as f. Analogous results hold for the concave envelope of f on X.
Proof
Only the convex envelope of f will be considered here; a similar argument addresses the concave envelope. The required results are trivial if \(\underline{x}=\overline{x}\), so assume that \(\underline{x}<\overline{x}\). Let \(k_f\) denote a Lipschitz constant for f on X. Applying the definition of the convex envelope,
the first inequality above is due to f dominating \(\underline{f}^{\mathrm {C}}\), and the second inequality is due to \(\underline{f}^{\mathrm {C}}\) dominating each convex underestimator of f on X. Setting y to \(\underline{x}\) in the above inequality chain yields \(\underline{f}^{\mathrm {C}}(\underline{x})=f(\underline{x})\).
A similar argument yields:
setting y to \(\overline{x}\) yields \(\underline{f}^{\mathrm {C}}(\overline{x})=f(\overline{x})\).
Defining \(D_+\underline{f}^{\mathrm {C}}\) and \(D_-\underline{f}^{\mathrm {C}}\) as the right-derivative and left-derivative of \(\underline{f}^{\mathrm {C}}\) described in [25, Theorem I.4.1.1], it follows from [25, Proposition I.4.1.3] that \(D_+\underline{f}^{\mathrm {C}}(\underline{x})\) and \(D_-\underline{f}^{\mathrm {C}}(\overline{x})\) both exist, are finite, and satisfy \(D_+\underline{f}^{\mathrm {C}}(\underline{x}) \ge -k_f\) and \(D_-\underline{f}^{\mathrm {C}}(\overline{x}) \le k_f\). Thus, \(\underline{f}^{\mathrm {C}}\) is continuous at \(\underline{x}\) and \(\overline{x}\). Moreover, [25, Theorem I.4.2.1] implies that for each \(y\in (\underline{x},\overline{x})\), each subgradient of \(\underline{f}^{\mathrm {C}}\) at y is an element of \([-k_f,k_f]\). This result, combined with the mean-value theorem [25, Theorem I.4.2.4], shows that \(\underline{f}^{\mathrm {C}}\) is Lipschitz continuous on X, with a Lipschitz constant of \(k_f\). \(\square \)
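Lemma 1's endpoint property can also be checked numerically. The sketch below (an illustrative helper, not part of the paper) approximates the convex envelope of a nonconvex univariate function by the lower convex hull of sampled points, then verifies that the envelope matches f at the interval endpoints and that the hull slopes respect f's Lipschitz constant:

```python
# Numerical illustration of Lemma 1: approximate the convex envelope of
# f(x) = x^3 - x on [-1, 1] by the lower convex hull of dense samples.

def lower_hull(points):
    """Lower convex hull of points already sorted by x (monotone-chain sweep)."""
    hull = []
    for p in points:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # pop hull[-1] if it lies on or above the segment hull[-2] -> p
            if (x2 - x1) * (p[1] - y1) <= (p[0] - x1) * (y2 - y1):
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

f = lambda x: x**3 - x                      # nonconvex on [-1, 1]
xs = [-1 + i / 500 for i in range(1001)]    # dense grid on [-1, 1]
hull = lower_hull([(x, f(x)) for x in xs])

# Envelope agrees with f at both interval endpoints (Lemma 1):
assert abs(hull[0][1] - f(-1.0)) < 1e-12
assert abs(hull[-1][1] - f(1.0)) < 1e-12
```

The hull-edge slopes are secant slopes of f, so by the mean-value theorem they are bounded by f's Lipschitz constant (here \(k_f = 2\)), illustrating the Lipschitz claim of the lemma.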
Using the above lemma, Proposition 4 may be proved as follows. Only the convex envelope of f will be considered here; a similar argument addresses the concave envelope. The required result is trivial if \(\underline{x}=\overline{x}\), so assume that \(\underline{x}<\overline{x}\). Theorem 3.2 in [21] implies that \(\underline{f}^{\mathrm {C}}\) is \({\mathscr {C}}^1\) on \((\underline{x},\overline{x})\); it remains to be shown that \(\underline{f}^{\mathrm {C}}\) is also \({\mathscr {C}}^1\) at \(\underline{x}\) and \(\overline{x}\). Noting that f is Lipschitz continuous on X, construct the right-derivative \(D_+\underline{f}^{\mathrm {C}}\) and the left-derivative \(D_-\underline{f}^{\mathrm {C}}\) as in the proof of Lemma 1. As in the proof of Lemma 1, \(D_+\underline{f}^{\mathrm {C}}(\underline{x})\) and \(D_-\underline{f}^{\mathrm {C}}(\overline{x})\) each exist and are finite. Define the following function, which extends the domain of \(\underline{f}^{\mathrm {C}}\) to \({\mathbb {R}}\):
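The display defining \(\psi \) is not reproduced above; the natural construction, extending \(\underline{f}^{\mathrm {C}}\) affinely by the one-sided derivatives just constructed, presumably reads:

```latex
\psi(y) :=
\begin{cases}
\underline{f}^{\mathrm{C}}(\underline{x})
  + D_+\underline{f}^{\mathrm{C}}(\underline{x})\,(y-\underline{x}),
  & y < \underline{x},\\[2pt]
\underline{f}^{\mathrm{C}}(y),
  & \underline{x} \le y \le \overline{x},\\[2pt]
\underline{f}^{\mathrm{C}}(\overline{x})
  + D_-\underline{f}^{\mathrm{C}}(\overline{x})\,(y-\overline{x}),
  & y > \overline{x}.
\end{cases}
```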
The function \(\psi \) is evidently continuous, and is \({\mathscr {C}}^1\) at each \(y\in {\mathbb {R}}\backslash \{\underline{x},\overline{x}\}\). Applying the definitions of \(D_+\underline{f}^{\mathrm {C}}\) and \(D_-\underline{f}^{\mathrm {C}}\), it follows that \(\psi \) is differentiable at \(\underline{x}\) and \(\overline{x}\) as well; thus,
This equation, together with [25, Theorem I.4.2.1(iii)], shows that \(\psi \) is \({\mathscr {C}}^1\) even at \(\underline{x}\) and \(\overline{x}\), and is therefore \({\mathscr {C}}^1\) on \({\mathbb {R}}\). Hence, \(\underline{f}^{\mathrm {C}}\) is Whitney-\({\mathscr {C}}^1\) on X. \(\square \)
1.3 Proof of Theorem 4
The proof of Theorem 4 uses several intermediate results concerning a generic optimal-value function described by the following assumption.
Assumption 3
For some \(m\in {\mathbb {N}}\), consider a convex open set \(X\subset {\mathbb {R}}^m\) and a convex set \(C\subset X\) with nonempty interior. Define sets:
Consider a convex \({\mathscr {C}}^1\) function \(\psi :X\rightarrow {\mathbb {R}}\), and an optimal-value function:
Lemma 2
Suppose that Assumption 3 holds with \(m=2\). Define a function
The function \(\omega \) is convex and \({\mathscr {C}}^1\) on \(\mathrm {int}(H)\times X\).
Proof
For each \(((a,b),c)\in H\times X\), observe that \(\omega ((a,b),c)=\min \{\psi (\xi ):\xi _1=c,\,\,a\le \xi _2\le b\}\). Hence, according to [48, Section 29], \(\omega \) is convex. It then suffices by [48, Corollary 25.5.1] to show that \(\omega \) is differentiable at some arbitrary \(((\hat{a},\hat{b}),\hat{c})\in \mathrm {int}(H)\times X\). Let (CP\({}_\omega \)) denote the convex program \(\min \{\psi (\xi ):\xi _1=\hat{c},\,\,\hat{a}\le \xi _2\le \hat{b}\}\). Weierstrass’s Theorem guarantees the existence of an optimal solution of (CP\({}_\omega \)); thus, choose a particular solution \(\xi ^*\in C\). By [48, Theorem 28.3], there exists a Karush-Kuhn-Tucker (KKT) vector \((\lambda ,\mu )\in {\mathbb {R}}\times {\mathbb {R}}^2\) satisfying the following KKT conditions (among others) for all solutions \(\eta ^*\) of (CP\({}_\omega \)) simultaneously:
Moreover, since \(\hat{a}<\hat{b}\), any such vector \((\lambda ,\mu )\) is unique; when \(\eta ^*:=\xi ^*\), the above KKT conditions imply:
According to [48, Corollary 29.1.3], this uniqueness shows that \(\omega \) is differentiable at \(((\hat{a},\hat{b}),\hat{c})\), as required. \(\square \)
Lemma 3
Suppose that Assumption 3 holds, \(m=2\), C is compact, and there exists a vector \(d\in {\mathbb {R}}^2\) such that, for all \(\xi \in X\), \(\langle {\nabla } \psi (\xi ), d \rangle > 0\). The function \(\gamma \) is Whitney-\({\mathscr {C}}^1\) on the compact set K.
Proof
Without loss of generality, suppose that \(d\ge 0\); the other cases are handled similarly. Observe that K has nonempty interior under Assumption 3; to see this, choose any \(x\in \mathrm {int}(C)\) and let \(e\in {\mathbb {R}}^m\) be a vector whose components are all equal to unity. For sufficiently small \(\tau >0\), \((x,x+\tau e)\in \mathrm {int}(K)\), as required.
By Proposition 3, it then suffices to show that, for some arbitrary \((\ell ,u)\in K\), there exists a neighborhood \(N\subset {\mathbb {R}}^2\times {\mathbb {R}}^2\) of \((\ell ,u)\) and a \({\mathscr {C}}^1\) function \(\phi :N\rightarrow {\mathbb {R}}\) for which \(\phi \equiv \gamma \) on \(N\cap K\). Now,
Since \(d\ge 0\), the above inequality implies that \(\frac{\partial {\psi }}{\partial {x_i}}(\ell )> 0\) for some \(i\in \{1,2\}\). Suppose that \(\frac{\partial {\psi }}{\partial {x_1}}(\ell )> 0\); the case in which \(\frac{\partial {\psi }}{\partial {x_2}}(\ell )> 0\) is handled similarly. Since \({\nabla } \psi \) is continuous on X, there exists a neighborhood \(N_\ell \subset X\) of \(\ell \) for which \(\frac{\partial {\psi }}{\partial {x_1}}(a)>0\) for each \(a\in N_\ell \). Since \(N_\ell \) is open and \(C\subset X\) is compact, choose \(\delta >0\) for which:
- \(y\in X\) whenever \(x\in C\) and \(\Vert y-x\Vert <3\delta \), and
- \(a\in N_\ell \) whenever \(\Vert a-\ell \Vert <3\delta \).
Define a neighborhood
and a function:
Since \(\ell ,u\in C\) and \(\ell \le u\), if \((a,b)\in N_{(\ell ,u)}\), then \(a_2<u_2+2\delta \) and \(\ell _2-2\delta <b_2\). Thus, \(\phi \) is indeed well-defined. Lemma 2 shows that each “\(\gamma \)” term in the definition of \(\phi \) is \({\mathscr {C}}^1\) with respect to (a, b), which in turn shows that \(\phi \) is \({\mathscr {C}}^1\) on \(N_{(\ell ,u)}\).
To complete this proof, it will be shown that \(\phi \equiv \gamma \) on \(N_{(\ell ,u)}\cap K\). Consider some arbitrary point \((a,b)\in N_{(\ell ,u)}\cap K\), and let (\(\text {CP}_\gamma \)) denote the convex program:
First, it will be shown that the set \(\{\xi \in [a,b]:\xi _1=a_1\text { or }\xi _2=a_2\}\) contains all solutions of (\(\text {CP}_\gamma \)). To obtain a contradiction, suppose that this is not so, and choose some solution \(\eta ^*\) of (\(\text {CP}_\gamma \)) accordingly for which \(a_1<\eta ^*_1\le b_1\) and \(a_2<\eta ^*_2\le b_2\). Since \(d\ge 0\), there exists \(\tau >0\) solving the linear program:
Thus, with \(\zeta :=\eta ^*-\tau d\), \(\zeta \in [a,b]\) and either \(\zeta _1=a_1\) or \(\zeta _2=a_2\). By the mean-value theorem, there exists \(s\in [0,\tau ]\) for which
which contradicts the definition of \(\eta ^*\).
Thus, all solutions of (\(\text {CP}_\gamma \)) lie in the set \(\{\xi \in [a,b]:\xi _1=a_1\text { or }\xi _2=a_2\}\). Now, since the mapping \(t\mapsto \psi (a+te_{(1)})\) is convex on \([0,b_1-a_1]\), the mapping \(t\mapsto \frac{\partial {\psi }}{\partial {x_1}}(a+te_{(1)})\) is increasing on \([0,b_1-a_1]\). This implies:
Hence, there exists a solution of (\(\text {CP}_\gamma \)) in the set \(\{\xi \in [a,b]:\xi _1=a_1\}\), which implies that
The convexity of \(\psi \) then yields:
to see this, assume, to obtain a contradiction, that (19) does not hold. In this case, since \((a,b)\in N_{(\ell ,u)}\) implies that \(b_2<u_2+2\delta \) and \(\ell _2-2\delta <a_2\), the definition of \(\gamma \) implies that the left-hand side of (19) is strictly greater than the right-hand side. Thus, there exist \(\underline{\beta }\in [\ell _2-2\delta ,b_2]\) and \(\overline{\beta }\in [a_2,u_2+2\delta ]\) for which
The above inequalities and the definition of \(\gamma \) imply that neither \(\underline{\beta }\) nor \(\overline{\beta }\) is contained in the interval \([a_2,b_2]\); thus, \(\underline{\beta }\in [\ell _2-2\delta ,a_2]\) and \(\overline{\beta }\in [b_2,u_2+2\delta ]\). Since \([a_2,b_2]\subset [\underline{\beta },\overline{\beta }]\), the convexity of \(\psi \) would then imply that
which is contradicted by the definition of \(\gamma \).
Lastly, the following equation follows from the definition of \(\gamma \):
Along with the definition of \(\phi \), (19) and (20) show that \(\phi (a,b)=\gamma (a,b)\), as required. \(\square \)
Lemma 4
Suppose that Assumption 3 holds, \(m=2\), \(X={\mathbb {R}}^2\), C is compact, and there exists a nonzero vector \(d\in {\mathbb {R}}^2\) such that, for all \(\xi \in {\mathbb {R}}^2\), \(\langle {\nabla } \psi (\xi ), d \rangle = 0\). The function \(\gamma \) is Whitney-\({\mathscr {C}}^1\) on the compact set K.
Proof
Since \(d\ne 0\), either \(d_1\ne 0\), \(d_2\ne 0\), or both. Thus, without loss of generality, suppose that \(d_1\ge 0\) and \(d_2>0\); the other cases are handled similarly. Let \(\pi \) denote the linear transformation \(\xi \in {\mathbb {R}}^2\mapsto (\xi _1-(\frac{d_1}{d_2})\xi _2,0)\in {\mathbb {R}}^2\), and observe that \(\pi (\xi ) = \xi - (\tfrac{\xi _2}{d_2})d\) for each \(\xi \in {\mathbb {R}}^2\). Thus,
which implies that, for each \((\ell ,u)\in K\),
Since the transformation \(\pi \) is linear and \([\ell ,u]\) is convex, the set \(\{\pi (\xi ):\ell \le \xi \le u\}\) is the convex hull of \(\{\pi (\ell _1,\ell _2),\pi (\ell _1,u_2),\pi (u_1,\ell _2),\pi (u_1,u_2)\}\). Since \(d\ge 0\), this set is readily evaluated to be:
The following reformulation of \(\gamma \) is thus obtained:
Since the univariate mapping \(\eta _1\in {\mathbb {R}}\mapsto \psi (\eta _1,0)\) is convex and \({\mathscr {C}}^1\), Theorem 2 implies that \(\gamma \) is Whitney-\({\mathscr {C}}^1\) on K. \(\square \)
Theorem 4 is then proved as follows. The claim of the theorem regarding \(\underline{g}^{\mathrm {C}}\) will be demonstrated; a similar argument yields the claim regarding \(\overline{g}^{\mathrm {C}}\). Define a compact convex set
Lemmas 3 and 4 show that the function \(\gamma :B\rightarrow {\mathbb {R}}:(\ell ,u)\mapsto \min \left\{ \underline{\phi }^{\mathrm {C}}(\xi ):\ell \le \xi \le u\right\} \) is Whitney-\({\mathscr {C}}^1\). Observing that \(\underline{g}^{\mathrm {C}}\) is equivalent to the mapping
on Z, the chain rule for Whitney-\({\mathscr {C}}^1\) functions applies to this representation of \(\underline{g}^{\mathrm {C}}\). This yields the required result. \(\square \)
Gradients for suggested multivariate relaxations
The following two propositions present gradients for the Whitney-\({\mathscr {C}}^1\) relaxations provided in Theorems 6 and 7. In each case, the provided gradients may be computed directly using the chain rule for Whitney-\({\mathscr {C}}^1\) functions.
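Since the relaxations in Theorems 6 and 7 are \({\mathscr {C}}^1\), their gradients propagate through compositions by the ordinary chain rule, exactly as for smooth intrinsics in forward-mode automatic differentiation. The sketch below illustrates this mechanism with a minimal dual-number class (a hypothetical helper for illustration only; it is not MC++'s API and does not compute relaxations):

```python
# Minimal forward-mode AD via dual numbers: each value carries a
# directional derivative (tangent), updated by the chain rule.

import math

class Dual:
    """A value paired with a directional derivative."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        return Dual(self.val + o.val, self.dot + o.dot)
    def __mul__(self, o):
        # product rule
        return Dual(self.val * o.val,
                    self.val * o.dot + self.dot * o.val)

def exp(d):
    # chain rule for the smooth intrinsic exp
    return Dual(math.exp(d.val), math.exp(d.val) * d.dot)

# d/dx [x * exp(x)] at x = 1 is 2e:
x = Dual(1.0, 1.0)          # seed the tangent direction
y = x * exp(x)
assert abs(y.dot - 2 * math.e) < 1e-12
```

With \({\mathscr {C}}^1\) relaxations, each intrinsic's relaxation rule can be augmented with derivative propagation of this kind, which is what makes the standard forward and reverse AD modes applicable.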
Proposition 15
Suppose that the conditions of Theorem 6 hold. Gradients for the relaxations \(\underline{g}^{\mathrm {C}}_\times \) and \(\overline{g}^{\mathrm {C}}_\times \) are as follows, at any \(z\in Z\). Arguments of partial derivatives are suppressed here, and are the same as the analogous function arguments in Theorem 6. The partial derivatives of \(\underline{\psi }_{\times ,\mathrm {A}}\) may be computed at any argument corresponding to the “min” in the definition of \(\underline{g}^{\mathrm {C}}_{\times ,\mathrm {A}}\); this follows from a gradient invariance property of Mangasarian [37].
where the required partial derivatives of \(\underline{\psi }_{\times ,\mathrm {A}}\) and \(\underline{\psi }_{\times ,\mathrm {B}}\) are as follows.
Proposition 16
Suppose that the conditions of Theorem 7 hold. Gradients for the relaxations \(\underline{g}^{\mathrm {C}}_{\max }\) and \(\overline{g}^{\mathrm {C}}_{\max }\) are as follows, at any \(z\in Z\). The required partial derivatives of \(\underline{\psi }_{\max }\) may be evaluated at any argument that yields \(\underline{g}^{\mathrm {C}}_{\max }\) in Theorem 7; this follows from a gradient invariance property of Mangasarian [37].
and the mapping \(\overline{g}^{\mathrm {C}}_{\max }\) has the following gradient, evaluated at any \(z\in Z\):
where \({\Delta }:=\max \{\underline{x}_1,\overline{x}_2\} +\max \{\overline{x}_1,\underline{x}_2\} -\max \{\underline{x}_1,\underline{x}_2\} -\max \{\overline{x}_1,\overline{x}_2\}\), and where the required partial derivatives of \(\underline{\psi }_{\max }\) are as follows.
Cite this article
Khan, K.A., Watson, H.A.J. & Barton, P.I. Differentiable McCormick relaxations. J Glob Optim 67, 687–729 (2017). https://doi.org/10.1007/s10898-016-0440-6