I completed this almost within one day…..so this can only be used as a reference. I cannot guarantee all of below are right. Email me if any question.

## Notations:

- $f(x)$: objective function
- $X$: feasible region
- $h_i(x)$: equality constraints
- $g_i(x)$: inequality constraints

then for a optimization problem, the general formulation is:

$$z=\min_{x\in X}f(x)$$

subject to:

$$

\left\{

\begin{aligned}

h_i(x) = 0 \\

g_i(x)\le 0

\end{aligned}

\right.

$$

## Formulate the Optimization Problem

Three main steps to formulate the optimization problem:

- Translate the problem: A optimization are composed of objective, decision variables, parameters and constraints. Think what is the decision variables, objective and constraints before writting down the mathematical expression;
- Mathematical expression of the problem;
- Fine-tuning.

General principles when fine-tuning,

- Minimize the number of decision variables and constraints.
- Try to maintain linearity or convexity.
- Avoid introducing binary or integer variables.
- Pay attention to the units in which quantities are measured

### Fine-tuning Examples 1: Get Rid of “if…else…”

There are two decision variables, $x_1$ and $x_2$, if we add a constraint like this:

If the $x_1$ is greater than $a_1$, then the $x_2$ must be greater than $a_2$.

One feasible method is to split the optimization into two sub-problems. However, this is too complicated. The alternative to this method is introducing an extra binary variable $\delta \in \{0,1\}$ where $1$ and $0$ indicating the condition is true or false correspondingly.

The the “if…else…” condition can be converted to:

$$

\left\{

\begin{array}{c}

a_1\delta \le x_1 \le a_1 + M\delta\\

a_2 \delta \le x_2 \\

\delta \in \{0, 1\}

\end{array}

\right.

$$

### Fine-tuning Examples 2:

When the optimization is like this:

$$z=\min_x (\max (f(x),g(x),h(x)))$$

this optimization can be understood as optimizaing the upper coutour of the three functions and it is not a linear problem. This can be replaced by the linear problem by introducing an extra function $p(x)$:

$$p(x) = \min(f(x),g(x),h(x)) $$ $$z = \min p(x)$$ subject to: $$p(x)\ge f(x),g(x),h(x)$$

## Finding the Solutions

I think this is the most theoretical part of this course and it is very interesting.

### Convexity

Definition of convex set:

$\forall x,y \in X$ and $\forall \lambda \in[0,1]$, the $X$ is convex if

$$\lambda x+ (1-\lambda)y\in X$$

Using the concept, we can easily prove that the intersection of two convex set is also convex.

Definition of convex function:

$\forall x,y \in X$ which is a convex set, and $\forall \lambda \in [0,1]$:

$$f(\lambda x_1+(1-\lambda)x_2)\le \lambda f(x_1)+(1-\lambda)f(x_2)$$

A function is convex if the Hessian matrix of the function is semi-positive definite, the Hessian matrix is defined as:

$$

\nabla^2 f(x_1,…x_n) =

\left(

\begin{matrix}

\frac{\partial^2f}{\partial x_1^2} & \frac{\partial^2f}{\partial x_1 x_2} & … & \frac{\partial^2f}{\partial x_1 x_4} \\

\frac{\partial^2f}{\partial x_2 x_1} & \frac{\partial^2f}{\partial x_2^2} & … & \frac{\partial^2f}{\partial x_1 x_4} \\

… & … & … & … \\

\frac{\partial^2f}{\partial x_n x_1} & \frac{\partial^2f}{\partial x_nx_2} & … & \frac{\partial^2f}{\partial x_n^2}

\end{matrix}

\right)

$$

The Hessian matrix is symmetirc (when the $f(x)\in C^2$). For a symmetric matrix, it is postive definite if and only if all the eigenvalues are postive. (semi-definite <-> non-negative)

Relationship between convex function and convex set:

For the inequality constraint $g(x)\le 0$, if $g(x)$ is convex, then the feasible set given by the IEC is also convex, here is the proof:

Since $f(x)$ is a convex function, then $\forall x_1, x_2 \in g(x), \forall \lambda \in [0,1]$

$$g(\lambda x_1+(1-\lambda x_2))\le \lambda g(x_1) + (1-\lambda )g(x_2)\le 0$$

Another property of convex function is that the increased value from a specific point will be always no less than the increased value given by the first order derivative:

$$f(y)-f(x)\ge \nabla f(x)^T (y-x),\forall x,y \in X$$

### Extreme Values

Definition of global minimum:

$$f(x^*)\le f(x), \forall x\in X$$

Definition of local minimum:

$$f(x^{*})\le f(x),\forall x\in X, ||x-x^*||<\epsilon$$

It is hard to find the local/global minimum usint the definitions, so here we explore the properties of the local/global minimum to find the alternative conditions.

For a local minimum ($X$ is a convex set):

$$\nabla f(x^{*})^T (x-x^*)\ge 0, \forall x \in X$$

The most straight-forward understanding is that all the feasible directions are gradient non-descent directions. This is the necessary condition for a local minimum and it is for both constrained and unconstrained constraints condition. If the problem is unconstrained, then all the directions are feasible and it will turn to:

$$\nabla f(x^*)=0$$

If $f(x)$ is a convex function and $X$ is a convex set, then $x^*$ will become the global minimum point. Here is the proof:

$f(x)$ is the convex function, so we have:

$$f(x)-f(x^*)\ge \nabla f(x^*)(x-x^*)\ge 0$$

### Unconstrained Optimization

$$z=\min_{\forall x \in X}f(x)$$

For the local minimum point,

- Necessary condition: $\nabla f(x^*)=0$ and $\nabla^2f(x^*)$ is positive semi-definite at $x^*$.
- Sufficient condition: $\nabla f(x^*)=0$ and $\nabla^2f(x^*)$ is positive definite at $x^*$.

Generally procedure:

- Find the staionary point, $\nabla f(x^*)=0$
- Examine the sufficient condition, $\nabla^2f(x^*) \succ 0$

#### Newton’s Method

Newton’s method is to numerically calculate the stationary point of the function. For the one dimension optimization,

$$x^{n+1}=x^n-\frac{f’(x^n)}{f’’(x^n)}$$

The prcedure for the Newton’s method:

- Select $x^1$ and set $k=1$
- If $||\nabla f(x^k)||=0$, stop otherwise go the next step
- Update the $x^{k+1}$

$$x^{k+1}=x^{k}-[\nabla f(x^k)]^{-1}\nabla f(x^k)$$

This method can only used in the condition when the inverse of Hessian exists.

#### Iterative Descent Method

Framework:

- Start at an initial point $x_0\in X,k=0$
- Choose a descent direction $d_k$, which follows $\nabla f(x_k)^Td_k<0$
- Choose a step size $\alpha_k$
- Update the solution $x_{k+1}=x_k+\alpha\cdot d_k$
- Stop if the criterion is met otherwise repeat step 2-4

#### Steepest Descent Method

It is one of the iterative descent method:

$$

\left\{

\begin{array}{c}

d_k = -\nabla f(x_k) \\

\alpha_k = \arg \min_{\alpha_k}f(x_k+\alpha_k d_k)

\end{array}

\right.

$$

### Constrained Optimization

$$z=\min_{x\in X}f(x)$$

subject to:

$$

\left\{

\begin{aligned}

h_i(x) = 0 \\

g_i(x)\le 0

\end{aligned}

\right.

$$

KKT condition:

$$

\left\{

\begin{array}{c}

\nabla f(x^*)+\sum_{i=1}^m\lambda_i^*\nabla h_i(x^*)+\sum_{j=1}^r\mu_j^*\nabla g_j(x^*)=0\quad (Gradient) \\

h_i(x^*)=0,\forall i\in {1,2…m} \quad (Feasibility)\\

g_j(x^*)\le 0, \forall j\in {1,2,…r} \quad (Feasibility)\\

\mu_j^* \ge 0, \forall j\in {1,2,…r} \quad (Non-negativity) \\

\mu_j^* g_j(x^*)=0,\forall j \in {1,2,…r} \quad (Complementary Slackness Condition)

\end{array}

\right.

$$

Here is a brief explanation of the KKT condition:

Firstly, when we define $d$ as feasible direction, it can be understood as:

$$

\left\{

\begin{array}{c}

d=x-x^* =[dx_1, dx_2,…dx_n]^T,||x-x^*||<\epsilon\rightarrow 0 \\

x^*+d\in X\rightarrow h(x^*+d)=0

\end{array}

\right.

$$

for all the feasible directions:

$$h(x^*+d)-h(x^*)=\nabla h(x^*)^T d=0$$

if there is no inequality constraints, the local minimum $x^*$ has:

$$\nabla f(x^*)^T d=0$$

Combine the two equations together, we have:

$$(\nabla (\lambda_1 h(x^*)+\lambda_2f(x^*))^Td=0\rightarrow \nabla f(x^*) + \lambda \nabla h(x)=0$$

things become a little different after adding the inequality constraints. For a inequality constraints $g(x)\le 0$, if it is binding, then $g(x)=0\rightarrow \mu g(x)=0$. However, if it is not binding, which means that $g(x)<0$. Under this condition, all the directions are feasible direction so that the gradients of $f(x)$ should be zero. Therefore, if the inequality constraints are not binding, then $\mu=0\rightarrow \mu g(x)=0$.

For the non-negativity condition:

$$

\begin{array}{c}

(\nabla f(x^*) +\sum \lambda \nabla h(x^*) +\sum \nabla \mu g(x^*))^Td=0 \\

\Rightarrow \nabla f(x^*)^Td + \sum \mu \nabla g(x^*)^T d =0 \\

\Rightarrow … \Rightarrow \mu \nabla g(x^*)^Td\le 0\Rightarrow \mu \ge 0

\end{array}

$$

Therefore, $\mu_i\ge 0$ is to constrain the point to a minimum point. In other word, if there is any binding inequality constraint, the KKT point can only be the minimum point.

From the above we have roughly proved that:

$x$ is a local minimum $\Rightarrow$ $x$ is a KKT point

which means that KKT point is only the necessary condition for a local minimum point. However, if the function $f(x)$ is a convex function at the convex feasible region $X$, then the KKT condition will be the sufficient condition, the local minimum will also be the global minimum point (we have proven this before ^_^).

In the lecture, the sufficient conditions given by Prof. Yin is “linear $h$, convex $g$, convex $f$ $\Rightarrow$ global minimum”. I think the reason here is that:

linear $h\Rightarrow$ $X_h$ convex, convex $g\Rightarrow$ $X_g$ convex, Then $X=X_h\cap X_g$ is also convex because the intersection between two convex set is also a convex set.

Obviously, this condition is far too sufficient.

For more about the course review of CEE553, see CEE 553 Final Term Review, which is the conituned part of this post.