class: center, middle, inverse, title-slide

# Lec 13 - Numerical optimization (cont.)
## Statistical Computing and Computation
### Sta 663 | Spring 2022
###
Dr. Colin Rundel --- exclude: true ```python import numpy as np import matplotlib as mpl import matplotlib.pyplot as plt import pandas as pd import seaborn as sns import timeit plt.rcParams['figure.dpi'] = 200 from scipy import optimize ``` ```r knitr::opts_chunk$set( fig.align="center", cache=FALSE ) ``` ```r local({ hook_err_old <- knitr::knit_hooks$get("error") # save the old hook knitr::knit_hooks$set(error = function(x, options) { # now do whatever you want to do with x, and pass # the new x to the old hook x = sub("## \n## Detailed traceback:\n.*$", "", x) x = sub("Error in py_call_impl\\(.*?\\)\\: ", "", x) hook_err_old(x, options) }) hook_warn_old <- knitr::knit_hooks$get("warning") # save the old hook knitr::knit_hooks$set(warning = function(x, options) { x = sub("<string>:1: ", "", x) hook_warn_old(x, options) }) }) ``` --- ## Method Summary <br/> | SciPy Method | Description | Gradient | Hessian | |:-------------|:---------------------------------------------------------------|:--------:|:--------:| | --- | Newton's method (naive) | ✓ | ✓ | | --- | Conjugate Gradient (naive) | ✓ | ✓ | | CG | Nonlinear Conjugate Gradient (Polak and Ribiere variation) | ✓ | ✗ | | Newton-CG | Truncated Newton method (Newton w/ CG step direction) | ✓ | Optional | | BFGS | Broyden, Fletcher, Goldfarb, and Shanno (Quasi-newton method) | Optional | ✗ | | L-BFGS-B | Limited-memory BFGS (Quasi-newton method) | Optional | ✗ | | Nelder-Mead | Nelder-Mead simplex reflection method | ✗ | ✗ | | --- ## Methods collection ```python def define_methods(x0, f, grad, hess, tol=1e-8): return { "naive_newton": lambda: newtons_method(x0, f, grad, hess, tol=tol), "naive_cg": lambda: conjugate_gradient(x0, f, grad, hess, tol=tol), "cg": lambda: optimize.minimize(f, x0, jac=grad, method="CG", tol=tol), "newton-cg": lambda: optimize.minimize(f, x0, jac=grad, hess=None, method="Newton-CG", tol=tol), "newton-cg w/ H": lambda: optimize.minimize(f, x0, jac=grad, hess=hess, method="Newton-CG", tol=tol), "bfgs": lambda: optimize.minimize(f, x0, jac=grad, method="BFGS", tol=tol), "bfgs w/o G": lambda: optimize.minimize(f, x0, method="BFGS", tol=tol), "l-bfgs": lambda: optimize.minimize(f, x0, method="L-BFGS-B", tol=tol), "nelder-mead": lambda: optimize.minimize(f, x0, method="Nelder-Mead", tol=tol) } ``` --- ## Method Timings ```python x0 = (1.6, 1.1) f, grad, hess = mk_quad(0.7) methods = define_methods(x0, f, grad, hess) df = pd.DataFrame({ key: timeit.Timer(methods[key]).repeat(10, 100) for key in methods }) df ``` ``` ## naive_newton naive_cg cg ... bfgs w/o G l-bfgs nelder-mead ## 0 0.023537 0.039970 0.011881 ... 0.066303 0.036481 0.147036 ## 1 0.022836 0.040031 0.011484 ... 0.066409 0.036509 0.145659 ## 2 0.023006 0.040840 0.011460 ... 0.065983 0.036171 0.146303 ## 3 0.023108 0.040619 0.011740 ... 0.065224 0.036673 0.146443 ## 4 0.022910 0.040613 0.011933 ... 0.065597 0.036137 0.146067 ## 5 0.022782 0.040496 0.011701 ... 0.066092 0.036383 0.147324 ## 6 0.022979 0.040472 0.011504 ... 0.065924 0.036287 0.146281 ## 7 0.023019 0.040539 0.011490 ... 0.066140 0.036171 0.146400 ## 8 0.022744 0.039657 0.011497 ... 0.065693 0.036117 0.145820 ## 9 0.022946 0.039879 0.011523 ... 
0.065842 0.036078 0.146332 ## ## [10 rows x 9 columns] ``` --- ```python g = sns.catplot(data=df.melt(), y="variable", x="value", aspect=2) g.ax.set_xlabel("Time (100 iter)") g.ax.set_ylabel("") plt.show() ``` <img src="Lec13_files/figure-html/unnamed-chunk-3-1.png" width="75%" style="display: block; margin: auto;" /> --- ## Timings across cost functions .pull-left[ .small[ ```python def time_cost_func(x0, name, cost_func, *args): x0 = (1.6, 1.1) f, grad, hess = cost_func(*args) methods = define_methods(x0, f, grad, hess) return ( pd.DataFrame({ key: timeit.Timer(methods[key]).repeat(10, 20) for key in methods }) .melt() .assign(cost_func = name) ) df = pd.concat([ time_cost_func(x0, "Well-cond quad", mk_quad, 0.7), time_cost_func(x0, "Ill-cond quad", mk_quad, 0.02), time_cost_func(x0, "Rosenbrock", mk_rosenbrock) ]) ``` ] ] .pull-right[.small[ ```python df ``` ``` ## variable value cost_func ## 0 naive_newton 0.004699 Well-cond quad ## 1 naive_newton 0.004590 Well-cond quad ## 2 naive_newton 0.004567 Well-cond quad ## 3 naive_newton 0.004557 Well-cond quad ## 4 naive_newton 0.004553 Well-cond quad ## .. ... ... ... ## 85 nelder-mead 0.047754 Rosenbrock ## 86 nelder-mead 0.047654 Rosenbrock ## 87 nelder-mead 0.047935 Rosenbrock ## 88 nelder-mead 0.047746 Rosenbrock ## 89 nelder-mead 0.047725 Rosenbrock ## ## [270 rows x 3 columns] ``` ] ] --- ```python g = sns.catplot(data=df, y="variable", x="value", hue="cost_func", alpha=0.5, aspect=2) g.ax.set_xlabel("Time (20 iter)") g.ax.set_ylabel("") plt.show() ``` <img src="Lec13_files/figure-html/unnamed-chunk-6-3.png" width="1111" style="display: block; margin: auto;" /> --- ## Profiling - BFGS .small[ ```python import cProfile f, grad, hess = mk_quad(0.7) def run(): for i in range(100): optimize.minimize(fun = f, x0 = (1.6, 1.1), jac=grad, method="BFGS", tol=1e-11) cProfile.run('run()', sort="tottime") ``` ``` ## 112904 function calls (112804 primitive calls) in 0.047 seconds ## ## Ordered by: internal time ## ## ncalls tottime percall cumtime percall filename:lineno(function) ## 100 0.010 0.000 0.046 0.000 _optimize.py:1253(_minimize_bfgs) ## 13700/13600 0.004 0.000 0.015 0.000 {built-in method numpy.core._multiarray_umath.implement_array_function} ## 4200 0.003 0.000 0.003 0.000 {method 'reduce' of 'numpy.ufunc' objects} ## 1000 0.002 0.000 0.003 0.000 <string>:8(gradient) ## 1000 0.002 0.000 0.006 0.000 <string>:2(f) ## 900 0.002 0.000 0.025 0.000 _linesearch.py:91(scalar_search_wolfe1) ## 2000 0.002 0.000 0.004 0.000 numeric.py:2388(array_equal) ## 2100 0.001 0.000 0.004 0.000 fromnumeric.py:69(_wrapreduction) ## 900 0.001 0.000 0.010 0.000 _linesearch.py:77(derphi) ## 5200 0.001 0.000 0.004 0.000 <__array_function__ internals>:177(dot) ## 900 0.001 0.000 0.013 0.000 _linesearch.py:73(phi) ## 2100 0.001 0.000 0.001 0.000 shape_base.py:23(atleast_1d) ## 1000 0.001 0.000 0.008 0.000 _differentiable_functions.py:132(fun_wrapped) ## 1000 0.001 0.000 0.005 0.000 _differentiable_functions.py:162(grad_wrapped) ## 900 0.001 0.000 0.026 0.000 _linesearch.py:31(line_search_wolfe1) ## 900 0.001 0.000 0.027 0.000 _optimize.py:1084(_line_search_wolfe12) ## 1000 0.001 0.000 0.012 0.000 _differentiable_functions.py:264(fun) ## 2100 0.001 0.000 0.001 0.000 {built-in method numpy.array} ## 1000 0.001 0.000 0.003 0.000 _optimize.py:166(vecnorm) ## 2100 0.001 0.000 0.002 0.000 <__array_function__ internals>:177(atleast_1d) ## 2000 0.001 0.000 0.001 0.000 {built-in method numpy.arange} ## 2100 0.001 0.000 0.002 0.000 <__array_function__ internals>:177(copy) ## 
2000 0.001 0.000 0.005 0.000 <__array_function__ internals>:177(array_equal) ## 1000 0.000 0.000 0.002 0.000 fromnumeric.py:2160(sum) ## 2100 0.000 0.000 0.000 0.000 fromnumeric.py:70(<dictcomp>) ## 8400 0.000 0.000 0.000 0.000 {built-in method numpy.asarray} ## 1000 0.000 0.000 0.008 0.000 _differentiable_functions.py:270(grad) ## 900 0.000 0.000 0.002 0.000 _differentiable_functions.py:240(update_x) ## 100 0.000 0.000 0.002 0.000 _differentiable_functions.py:86(__init__) ## 1000 0.000 0.000 0.002 0.000 fromnumeric.py:2675(amax) ## 2000 0.000 0.000 0.000 0.000 {method 'copy' of 'numpy.ndarray' objects} ## 2000 0.000 0.000 0.002 0.000 {method 'all' of 'numpy.ndarray' objects} ## 100 0.000 0.000 0.047 0.000 _minimize.py:45(minimize) ## 2000 0.000 0.000 0.002 0.000 _methods.py:60(_all) ## 1900 0.000 0.000 0.000 0.000 {built-in method numpy.zeros} ## 5200 0.000 0.000 0.000 0.000 multiarray.py:736(dot) ## 2100 0.000 0.000 0.001 0.000 function_base.py:846(copy) ## 1000 0.000 0.000 0.003 0.000 <__array_function__ internals>:177(sum) ## 1100 0.000 0.000 0.008 0.000 _differentiable_functions.py:249(_update_fun) ## 1000 0.000 0.000 0.003 0.000 <__array_function__ internals>:177(amax) ## 1100 0.000 0.000 0.006 0.000 _differentiable_functions.py:254(_update_grad) ## 2100 0.000 0.000 0.000 0.000 {method 'items' of 'dict' objects} ## 1000 0.000 0.000 0.008 0.000 _differentiable_functions.py:154(update_fun) ## 1000 0.000 0.000 0.000 0.000 {method 'astype' of 'numpy.ndarray' objects} ## 1000 0.000 0.000 0.006 0.000 _differentiable_functions.py:166(update_grad) ## 100 0.000 0.000 0.000 0.000 linalg.py:2349(norm) ## 2500 0.000 0.000 0.000 0.000 {built-in method builtins.isinstance} ## 2100 0.000 0.000 0.000 0.000 function_base.py:842(_copy_dispatcher) ## 1000 0.000 0.000 0.000 0.000 numeric.py:1859(isscalar) ## 2000 0.000 0.000 0.000 0.000 numeric.py:2384(_array_equal_dispatcher) ## 900 0.000 0.000 0.000 0.000 {built-in method builtins.min} ## 2400 0.000 0.000 0.000 0.000 {built-in method builtins.len} ## 2100 0.000 0.000 0.000 0.000 shape_base.py:19(_atleast_1d_dispatcher) ## 2200 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects} ## 100 0.000 0.000 0.000 0.000 twodim_base.py:161(eye) ## 2200 0.000 0.000 0.000 0.000 {built-in method numpy.asanyarray} ## 100 0.000 0.000 0.002 0.000 _optimize.py:175(_prepare_scalar_function) ## 1000 0.000 0.000 0.000 0.000 fromnumeric.py:2155(_sum_dispatcher) ## 1000 0.000 0.000 0.000 0.000 fromnumeric.py:2670(_amax_dispatcher) ## 100 0.000 0.000 0.000 0.000 shape_base.py:81(atleast_2d) ## 900 0.000 0.000 0.000 0.000 {method 'pop' of 'dict' objects} ## 1 0.000 0.000 0.047 0.047 <string>:1(run) ## 100 0.000 0.000 0.000 0.000 _minimize.py:933(standardize_constraints) ## 100 0.000 0.000 0.000 0.000 fromnumeric.py:2305(any) ## 700 0.000 0.000 0.000 0.000 {built-in method builtins.callable} ## 100 0.000 0.000 0.000 0.000 {method 'flatten' of 'numpy.ndarray' objects} ## 100 0.000 0.000 0.000 0.000 <__array_function__ internals>:177(any) ## 100 0.000 0.000 0.000 0.000 <__array_function__ internals>:177(norm) ## 100 0.000 0.000 0.000 0.000 <__array_function__ internals>:177(atleast_2d) ## 100 0.000 0.000 0.000 0.000 {method 'reshape' of 'numpy.ndarray' objects} ## 100 0.000 0.000 0.000 0.000 {method 'any' of 'numpy.ndarray' objects} ## 100 0.000 0.000 0.000 0.000 {built-in method builtins.getattr} ## 100 0.000 0.000 0.000 0.000 _methods.py:54(_any) ## 100 0.000 0.000 0.000 0.000 _base.py:1291(isspmatrix) ## 100 0.000 0.000 0.000 0.000 linalg.py:116(isComplexType) ## 200 
0.000 0.000 0.000 0.000 {built-in method builtins.issubclass} ## 100 0.000 0.000 0.000 0.000 {method 'ravel' of 'numpy.ndarray' objects} ## 100 0.000 0.000 0.000 0.000 _optimize.py:147(_check_unknown_options) ## 100 0.000 0.000 0.000 0.000 {method 'lower' of 'str' objects} ## 100 0.000 0.000 0.000 0.000 linalg.py:2345(_norm_dispatcher) ## 100 0.000 0.000 0.000 0.000 fromnumeric.py:2300(_any_dispatcher) ## 100 0.000 0.000 0.000 0.000 {method 'setdefault' of 'dict' objects} ## 1 0.000 0.000 0.047 0.047 {built-in method builtins.exec} ## 100 0.000 0.000 0.000 0.000 _optimize.py:255(hess) ## 100 0.000 0.000 0.000 0.000 shape_base.py:77(_atleast_2d_dispatcher) ## 1 0.000 0.000 0.047 0.047 <string>:1(<module>) ## 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} ``` ] --- ## Profiling - Nelder-Mead .small[ ```python def run(): for i in range(100): optimize.minimize(fun = f, x0 = (1.6, 1.1), method="Nelder-Mead", tol=1e-11) cProfile.run('run()', sort="tottime") ``` ``` ## 756504 function calls in 0.270 seconds ## ## Ordered by: internal time ## ## ncalls tottime percall cumtime percall filename:lineno(function) ## 100 0.071 0.001 0.269 0.003 _optimize.py:635(_minimize_neldermead) ## 18600 0.034 0.000 0.087 0.000 <string>:2(f) ## 38000 0.028 0.000 0.028 0.000 {method 'reduce' of 'numpy.ufunc' objects} ## 86400 0.017 0.000 0.108 0.000 {built-in method numpy.core._multiarray_umath.implement_array_function} ## 28500 0.013 0.000 0.040 0.000 fromnumeric.py:69(_wrapreduction) ## 18600 0.011 0.000 0.117 0.000 _optimize.py:491(function_wrapper) ## 18600 0.008 0.000 0.035 0.000 fromnumeric.py:2160(sum) ## 29400 0.007 0.000 0.019 0.000 fromnumeric.py:51(_wrapfunc) ## 19600 0.006 0.000 0.006 0.000 {method 'take' of 'numpy.ndarray' objects} ## 28500 0.005 0.000 0.005 0.000 fromnumeric.py:70(<dictcomp>) ## 18800 0.005 0.000 0.005 0.000 {built-in method numpy.array} ## 19600 0.005 0.000 0.025 0.000 <__array_function__ internals>:177(take) ## 19600 0.005 0.000 0.016 0.000 fromnumeric.py:93(take) ## 18600 0.004 0.000 0.044 0.000 <__array_function__ internals>:177(sum) ## 18600 0.004 0.000 0.004 0.000 {built-in method numpy.arange} ## 18600 0.004 0.000 0.016 0.000 <__array_function__ internals>:177(copy) ## 9800 0.004 0.000 0.004 0.000 {method 'argsort' of 'numpy.ndarray' objects} ## 9700 0.003 0.000 0.017 0.000 fromnumeric.py:2675(amax) ## 9600 0.003 0.000 0.006 0.000 fromnumeric.py:1755(ravel) ## 18600 0.003 0.000 0.003 0.000 {method 'copy' of 'numpy.ndarray' objects} ## 47000 0.003 0.000 0.003 0.000 {built-in method builtins.isinstance} ## 18600 0.002 0.000 0.007 0.000 function_base.py:846(copy) ## 9800 0.002 0.000 0.010 0.000 fromnumeric.py:1012(argsort) ## 9800 0.002 0.000 0.015 0.000 <__array_function__ internals>:177(argsort) ## 9600 0.002 0.000 0.011 0.000 <__array_function__ internals>:177(ravel) ## 9700 0.002 0.000 0.022 0.000 <__array_function__ internals>:177(amax) ## 18600 0.002 0.000 0.003 0.000 numeric.py:1859(isscalar) ## 29500 0.002 0.000 0.002 0.000 {built-in method builtins.getattr} ## 9600 0.001 0.000 0.001 0.000 {method 'ravel' of 'numpy.ndarray' objects} ## 18600 0.001 0.000 0.001 0.000 function_base.py:842(_copy_dispatcher) ## 18600 0.001 0.000 0.001 0.000 fromnumeric.py:2155(_sum_dispatcher) ## 19600 0.001 0.000 0.001 0.000 fromnumeric.py:89(_take_dispatcher) ## 28500 0.001 0.000 0.001 0.000 {method 'items' of 'dict' objects} ## 18800 0.001 0.000 0.001 0.000 {built-in method numpy.asarray} ## 9800 0.001 0.000 0.001 0.000 fromnumeric.py:1008(_argsort_dispatcher) ## 
9600 0.001 0.000 0.001 0.000 fromnumeric.py:1751(_ravel_dispatcher) ## 9700 0.001 0.000 0.001 0.000 fromnumeric.py:2670(_amax_dispatcher) ## 9700 0.001 0.000 0.001 0.000 {built-in method numpy.asanyarray} ## 100 0.000 0.000 0.270 0.003 _minimize.py:45(minimize) ## 1 0.000 0.000 0.270 0.270 <string>:1(run) ## 100 0.000 0.000 0.000 0.000 shape_base.py:23(atleast_1d) ## 100 0.000 0.000 0.000 0.000 _minimize.py:933(standardize_constraints) ## 100 0.000 0.000 0.000 0.000 numeric.py:289(full) ## 100 0.000 0.000 0.000 0.000 type_check.py:84(asfarray) ## 100 0.000 0.000 0.000 0.000 numerictypes.py:356(issubdtype) ## 200 0.000 0.000 0.000 0.000 {built-in method numpy.empty} ## 100 0.000 0.000 0.000 0.000 <__array_function__ internals>:177(copyto) ## 100 0.000 0.000 0.000 0.000 fromnumeric.py:2800(amin) ## 100 0.000 0.000 0.000 0.000 fromnumeric.py:2305(any) ## 100 0.000 0.000 0.000 0.000 <__array_function__ internals>:177(atleast_1d) ## 200 0.000 0.000 0.000 0.000 numerictypes.py:282(issubclass_) ## 100 0.000 0.000 0.000 0.000 <__array_function__ internals>:177(asfarray) ## 100 0.000 0.000 0.000 0.000 <__array_function__ internals>:177(any) ## 100 0.000 0.000 0.000 0.000 {method 'flatten' of 'numpy.ndarray' objects} ## 100 0.000 0.000 0.000 0.000 <__array_function__ internals>:177(amin) ## 100 0.000 0.000 0.000 0.000 _optimize.py:484(_wrap_scalar_function_maxfun_validation) ## 300 0.000 0.000 0.000 0.000 {built-in method builtins.issubclass} ## 200 0.000 0.000 0.000 0.000 {built-in method builtins.len} ## 200 0.000 0.000 0.000 0.000 {method 'setdefault' of 'dict' objects} ## 100 0.000 0.000 0.000 0.000 _optimize.py:147(_check_unknown_options) ## 200 0.000 0.000 0.000 0.000 {built-in method builtins.callable} ## 100 0.000 0.000 0.000 0.000 fromnumeric.py:2300(_any_dispatcher) ## 1 0.000 0.000 0.270 0.270 {built-in method builtins.exec} ## 100 0.000 0.000 0.000 0.000 multiarray.py:1071(copyto) ## 100 0.000 0.000 0.000 0.000 {method 'lower' of 'str' objects} ## 100 0.000 0.000 0.000 0.000 type_check.py:80(_asfarray_dispatcher) ## 100 0.000 0.000 0.000 0.000 fromnumeric.py:2795(_amin_dispatcher) ## 100 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects} ## 100 0.000 0.000 0.000 0.000 shape_base.py:19(_atleast_1d_dispatcher) ## 1 0.000 0.000 0.270 0.270 <string>:1(<module>) ## 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} ``` ] --- ## `optimize.minimize()` output ```python f, grad, hess = mk_quad(0.7) ``` .pull-left[ .small[ ```python optimize.minimize(fun = f, x0 = (1.6, 1.1), jac=grad, method="BFGS") ``` ``` ## fun: 1.2739256453436805e-11 ## hess_inv: array([[ 1.51494475, -0.00343804], ## [-0.00343804, 3.03497828]]) ## jac: array([-3.51014018e-07, -2.85996115e-06]) ## message: 'Optimization terminated successfully.' ## nfev: 7 ## nit: 6 ## njev: 7 ## status: 0 ## success: True ## x: array([-5.31839421e-07, -8.84341728e-06]) ``` ] ] .pull-right[ .small[ ```python optimize.minimize(fun = f, x0 = (1.6, 1.1), jac=grad, hess=hess, method="Newton-CG") ``` ``` ## fun: 2.3418652989289317e-12 ## jac: array([0.00000000e+00, 4.10246332e-06]) ## message: 'Optimization terminated successfully.' 
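As an aside on reading these profiles: `cProfile` pairs with the standard-library `pstats` module, which can sort and truncate the listing rather than printing every row. A minimal sketch, reusing the `run()` driver defined above:

```python
import cProfile
import pstats

pr = cProfile.Profile()
pr.enable()
run()          # the driver defined above
pr.disable()

# keep only the ten most expensive entries by internal time
pstats.Stats(pr).sort_stats("tottime").print_stats(10)
```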
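The objects printed on this slide are `scipy.optimize.OptimizeResult` instances, so fields can be pulled out programmatically rather than read off the printout. A small sketch, using the same `f` and `grad` as above:

```python
res = optimize.minimize(fun=f, x0=(1.6, 1.1), jac=grad, method="BFGS")

# OptimizeResult supports both attribute and dict-style access
print(res.success, res.status)      # convergence flag and solver status code
print(res.nit, res.nfev, res.njev)  # iterations, function and gradient evaluations
print(res["x"], res["fun"])         # minimizer and objective value there
```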
## nfev: 12 ## nhev: 11 ## nit: 11 ## njev: 12 ## status: 0 ## success: True ## x: array([0.0000000e+00, 3.8056246e-06]) ``` ] ] --- ## Collect .pull-left[ .small[ ```python def run_collect(name, x0, cost_func, *args, tol=1e-8, skip=[]): f, grad, hess = cost_func(*args) methods = define_methods(x0, f, grad, hess, tol) res = [] for method in methods: if method in skip: continue x = methods[method]() d = { "name": name, "method": method, "nit": x["nit"], "nfev": x["nfev"], "njev": x.get("njev"), "nhev": x.get("nhev"), "success": x["success"], "message": x["message"] } res.append( pd.DataFrame(d, index=[1]) ) return pd.concat(res) df = pd.concat([ run_collect(name, (1.6, 1.1), cost_func, arg, skip=['naive_newton', 'naive_cg']) for name, cost_func, arg in zip( ("Well-cond quad", "Ill-cond quad", "Rosenbrock"), (mk_quad, mk_quad, mk_rosenbrock), (0.7,0.02, None) ) ]) ``` ] ] .pull-right[ .small[ ```python df.drop(["message"], axis=1) ``` ``` ## name method nit nfev njev nhev success ## 1 Well-cond quad cg 2 5 5 None True ## 1 Well-cond quad newton-cg 5 6 13 0 True ## 1 Well-cond quad newton-cg w/ H 15 15 15 15 True ## 1 Well-cond quad bfgs 8 9 9 None True ## 1 Well-cond quad bfgs w/o G 8 27 9 None True ## 1 Well-cond quad l-bfgs 6 21 7 None True ## 1 Well-cond quad nelder-mead 76 147 None None True ## 1 Ill-cond quad cg 9 17 17 None True ## 1 Ill-cond quad newton-cg 3 4 9 0 True ## 1 Ill-cond quad newton-cg w/ H 54 106 106 54 True ## 1 Ill-cond quad bfgs 5 11 11 None True ## 1 Ill-cond quad bfgs w/o G 5 33 11 None True ## 1 Ill-cond quad l-bfgs 5 30 10 None True ## 1 Ill-cond quad nelder-mead 102 198 None None True ## 1 Rosenbrock cg 17 52 48 None True ## 1 Rosenbrock newton-cg 18 22 60 0 True ## 1 Rosenbrock newton-cg w/ H 17 21 21 17 True ## 1 Rosenbrock bfgs 23 26 26 None True ## 1 Rosenbrock bfgs w/o G 23 78 26 None True ## 1 Rosenbrock l-bfgs 19 75 25 None True ## 1 Rosenbrock nelder-mead 96 183 None None True ``` ] ] --- ```python sns.catplot( y = "method", x = "value", hue = "variable", col="name", kind="bar", data = df.melt(id_vars=["name","method"], value_vars=["nit", "nfev", "njev", "nhev"]).astype({"value": "float64"}) ) ``` <img src="Lec13_files/figure-html/unnamed-chunk-14-5.png" width="4197" style="display: block; margin: auto;" /> --- ## Exercise 1 Try minimizing the following function using different optimization methods starting from `\(x_0 = [0,0]\)`, which appears to work best? $$ `\begin{align} f(x) = \exp(x_1-1) + \exp(-x_2+1) + (x_1-x_2)^2 \end{align}` $$ <img src="Lec13_files/figure-html/unnamed-chunk-15-7.png" width="40%" style="display: block; margin: auto;" /> --- ## Random starting locations .pull-left[ ```python rng = np.random.default_rng(seed=1234) x0s = rng.uniform(-2,2, (100,2)) df = pd.concat([ run_collect(name, x0, cost_func, arg, skip=['naive_newton', 'naive_cg']) for name, cost_func, arg in zip( ("Well-cond quad", "Ill-cond quad", "Rosenbrock"), (mk_quad, mk_quad, mk_rosenbrock), (0.7,0.02, None) ) for x0 in x0s ]) ``` ] .pull-right[.small[ ```python df.drop(["message"], axis=1) ``` ``` ## name method nit nfev njev nhev success ## 1 Well-cond quad cg 2 5 5 None True ## 1 Well-cond quad newton-cg 5 6 13 0 True ## 1 Well-cond quad newton-cg w/ H 15 15 15 15 True ## 1 Well-cond quad bfgs 6 7 7 None True ## 1 Well-cond quad bfgs w/o G 6 21 7 None True ## .. ... ... ... ... ... ... ... 
## 1 Rosenbrock newton-cg w/ H 17 17 17 17 True ## 1 Rosenbrock bfgs 26 31 31 None True ## 1 Rosenbrock bfgs w/o G 26 93 31 None True ## 1 Rosenbrock l-bfgs 18 57 19 None True ## 1 Rosenbrock nelder-mead 75 145 None None True ## ## [2100 rows x 7 columns] ``` ] ] --- ## Performance (random start) ```python sns.catplot( y = "method", x = "value", hue = "variable", col="name", kind="bar", data = df.melt(id_vars=["name","method"], value_vars=["nit", "nfev", "njev", "nhev"]).astype({"value": "float64"}) ).set( xlabel="", ylabel="" ) ``` <img src="Lec13_files/figure-html/unnamed-chunk-18-9.png" width="4145" style="display: block; margin: auto;" /> --- ## MVN Cost Function .pull-left[ For an `\(n\)`-dimensional multivariate normal we define <br/> the `\(n \times 1\)` vectors `\(x\)` and `\(\mu\)` and the `\(n \times n\)` <br/> covariance matrix `\(\Sigma\)`, <br/> .small[ $$ `\begin{align} f(x) &= \frac{1}{\sqrt{\det(2\pi\Sigma)}} \exp \left[-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) \right] \\ \\ \nabla f(x) &= -f(x) \Sigma^{-1}(x-\mu) \\ \\ \nabla^2 f(x) &= f(x) \left( \Sigma^{-1}(x-\mu)(x-\mu)^T\Sigma^{-1} - \Sigma^{-1}\right) \\ \end{align}` $$ ] ] .pull-right[ .small[ ```python def mk_mvn(mu, Sigma): Sigma_inv = np.linalg.inv(Sigma) #norm_const = 1 / (np.sqrt(np.linalg.det(2*np.pi*Sigma))) norm_const = 1 def f(x): x_m = x - mu return -(norm_const * np.exp( -0.5 * (x_m.T @ Sigma_inv @ x_m).item() )) def grad(x): return (-f(x) * Sigma_inv @ (x - mu)) def hess(x): n = len(x) x_m = x - mu return f(x) * ((Sigma_inv @ x_m).reshape((n,1)) @ (x_m.T @ Sigma_inv).reshape((1,n)) - Sigma_inv) return f, grad, hess ``` ] ] .footnote[From Section 8.1.1 of the [Matrix Cookbook](https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf)] --- ## Gradient checking One of the most common issues when implementing an optimizer is to get the gradient calculation wrong which can produce problematic results. It is possible to numerically check the gradient function by comparing results between the gradient function and finite differences from the objective function via `optimize.check_grad()`. .pull-left[ ```python # 2d f, grad, hess = mk_mvn(np.zeros(2), np.eye(2,2)) optimize.check_grad(f, grad, [0,0]) ``` ``` ## 1.0536712127723509e-08 ``` ```python optimize.check_grad(f, grad, [1,1]) ``` ``` ## 2.8653603296584263e-09 ``` ```python # 4d f, grad, hess = mk_mvn(np.zeros(4), np.eye(4,4)) optimize.check_grad(f, grad, [0,0,0,0]) ``` ``` ## 1.4901161193847656e-08 ``` ```python optimize.check_grad(f, grad, [1,1,1,1]) ``` ``` ## 2.346240979278491e-10 ``` ] .pull-right[ ```python # 20d f, grad, hess = mk_mvn(np.zeros(20), np.eye(20)) optimize.check_grad(f, grad, np.zeros(20)) ``` ``` ## 3.332000937312528e-08 ``` ```python optimize.check_grad(f, grad, np.ones(20)) ``` ``` ## 7.079924060068647e-13 ``` ```python # 50d f, grad, hess = mk_mvn(np.zeros(50), np.eye(50)) optimize.check_grad(f, grad, np.zeros(50)) ``` ``` ## 5.268356063861754e-08 ``` ```python optimize.check_grad(f, grad, np.ones(50)) ``` ``` ## 8.383419667181015e-19 ``` ] --- ## Testing optimizers .pull-left[ .small[ ```python f, grad, hess = mk_mvn(np.zeros(4), np.eye(4,4)) optimize.minimize(fun=f, x0=[1,1,1,1], jac=grad, method="CG", tol=1e-11) ``` ``` ## fun: -1.0 ## jac: array([6.46744384e-16, 6.46744384e-16, 6.46744384e-16, 6.46744384e-16]) ## message: 'Optimization terminated successfully.' 
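`optimize.check_grad()` only validates first derivatives, but since row `i` of the Hessian is the gradient of the `i`-th component of the gradient, the same tool can sanity-check `hess()` too. A sketch (the `check_hess()` helper is illustrative, not part of SciPy):

```python
import numpy as np
from scipy import optimize

def check_hess(grad, hess, x):
    # compare each Hessian row against finite differences of the
    # corresponding gradient component
    x = np.asarray(x, dtype=float)
    return max(
        optimize.check_grad(lambda x, i=i: grad(x)[i],
                            lambda x, i=i: hess(x)[i, :], x)
        for i in range(len(x))
    )

f, grad, hess = mk_mvn(np.zeros(4), np.eye(4))
check_hess(grad, hess, np.ones(4))
```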
## nfev: 8 ## nit: 3 ## njev: 8 ## status: 0 ## success: True ## x: array([6.46744384e-16, 6.46744384e-16, 6.46744384e-16, 6.46744384e-16]) ``` ```python optimize.minimize(fun=f, x0=[1,1,1,1], jac=grad, method="BFGS", tol=1e-11) ``` ``` ## fun: -1.0 ## hess_inv: array([[1.00000001e+00, 1.32716307e-08, 1.32716307e-08, 1.32716306e-08], ## [1.32716307e-08, 1.00000001e+00, 1.32716306e-08, 1.32716306e-08], ## [1.32716307e-08, 1.32716306e-08, 1.00000001e+00, 1.32716307e-08], ## [1.32716306e-08, 1.32716306e-08, 1.32716306e-08, 1.00000001e+00]]) ## jac: array([9.80523384e-16, 9.80523381e-16, 9.80523384e-16, 9.80523384e-16]) ## message: 'Optimization terminated successfully.' ## nfev: 10 ## nit: 5 ## njev: 10 ## status: 0 ## success: True ## x: array([9.80523384e-16, 9.80523381e-16, 9.80523384e-16, 9.80523384e-16]) ``` ] ] .pull-right[ .small[ ```python n = 20 f, grad, hess = mk_mvn(np.zeros(n), np.eye(n,n)) optimize.minimize(fun=f, x0=np.ones(n), jac=grad, method="CG", tol=1e-11) ``` ``` ## fun: -1.0 ## jac: array([3.67929936e-19, 3.67929936e-19, 3.67929936e-19, 3.67929936e-19, ## 3.67929936e-19, 3.67929936e-19, 3.67929936e-19, 3.67929936e-19, ## 3.67929936e-19, 3.67929936e-19, 3.67929936e-19, 3.67929936e-19, ## 3.67929936e-19, 3.67929936e-19, 3.67929936e-19, 3.67929936e-19, ## 3.67929936e-19, 3.67929936e-19, 3.67929936e-19, 3.67929936e-19]) ## message: 'Optimization terminated successfully.' ## nfev: 14 ## nit: 2 ## njev: 14 ## status: 0 ## success: True ## x: array([3.67929936e-19, 3.67929936e-19, 3.67929936e-19, 3.67929936e-19, ## 3.67929936e-19, 3.67929936e-19, 3.67929936e-19, 3.67929936e-19, ## 3.67929936e-19, 3.67929936e-19, 3.67929936e-19, 3.67929936e-19, ## 3.67929936e-19, 3.67929936e-19, 3.67929936e-19, 3.67929936e-19, ## 3.67929936e-19, 3.67929936e-19, 3.67929936e-19, 3.67929936e-19]) ``` ] ] --- ## Unit MVNs .pull-left[ ```python df = pd.concat([ run_collect( name, np.ones(n), mk_mvn, np.zeros(n), np.eye(n), tol=1e-10, skip=['naive_newton', 'naive_cg'] ) for name, n in zip( ("2d", "5d", "10d", "20d", "50d"), (2, 5, 10, 20, 50) ) ]) ``` ] .pull-right[ .small[ ```python df.drop(["message"], axis=1) ``` ``` ## name method nit nfev njev nhev success ## 1 2d cg 3 6 6 None True ## 1 2d newton-cg 2 3 5 0 True ## 1 2d newton-cg w/ H 2 2 2 2 True ## 1 2d bfgs 4 8 8 None True ## 1 2d bfgs w/o G 4 24 8 None True ## 1 2d l-bfgs 5 21 7 None True ## 1 2d nelder-mead 75 157 None None True ## 1 5d cg 3 8 8 None True ## 1 5d newton-cg 4 8 12 0 True ## 1 5d newton-cg w/ H 4 7 7 4 True ## 1 5d bfgs 5 11 11 None True ## 1 5d bfgs w/o G 5 72 12 None True ## 1 5d l-bfgs 4 54 9 None True ## 1 5d nelder-mead 297 529 None None True ## 1 10d cg 2 24 22 None True ## 1 10d newton-cg 3 9 12 0 True ## 1 10d newton-cg w/ H 3 8 8 3 True ## 1 10d bfgs 3 12 12 None True ## 1 10d bfgs w/o G 2 121 11 None True ## 1 10d l-bfgs 3 110 10 None True ## 1 10d nelder-mead 1408 2000 None None False ## 1 20d cg 2 14 14 None True ## 1 20d newton-cg 3 10 13 0 True ## 1 20d newton-cg w/ H 3 9 9 3 True ## 1 20d bfgs 2 15 15 None True ## 1 20d bfgs w/o G 2 315 15 None True ## 1 20d l-bfgs 3 210 10 None True ## 1 20d nelder-mead 3217 4000 None None False ## 1 50d cg 0 1 1 None True ## 1 50d newton-cg 1 1 1 0 True ## 1 50d newton-cg w/ H 2 10 10 2 True ## 1 50d bfgs 0 1 1 None True ## 1 50d bfgs w/o G 0 51 1 None True ## 1 50d l-bfgs 0 51 1 None True ## 1 50d nelder-mead 8872 10000 None None False ``` ] ] --- ## Adding correlation .pull-left[ ```python def build_Sigma(n): S = np.full((n,n), 0.5) np.fill_diagonal(S, 1) return S df = 
pd.concat([ run_collect( name, np.ones(n), mk_mvn, np.zeros(n), build_Sigma(n), tol=1e-9/n, skip=['naive_newton', 'naive_cg'] ) for name, n in zip( ("2d", "5d", "10d", "20d", "50d"), (2, 5, 10, 20, 50) ) ]) ``` ] .pull-right[ .small[ ```python df.drop(["message"], axis=1) ``` ``` ## name method nit nfev njev nhev success ## 1 2d cg 15 18 18 None False ## 1 2d newton-cg 5 7 12 0 True ## 1 2d newton-cg w/ H 5 6 6 5 True ## 1 2d bfgs 3 7 7 None True ## 1 2d bfgs w/o G 3 24 8 None False ## 1 2d l-bfgs 5 21 7 None True ## 1 2d nelder-mead 73 145 None None True ## 1 5d cg 5 19 19 None True ## 1 5d newton-cg 5 7 12 0 True ## 1 5d newton-cg w/ H 5 6 6 5 True ## 1 5d bfgs 5 8 8 None True ## 1 5d bfgs w/o G 7 528 86 None False ## 1 5d l-bfgs 4 54 9 None True ## 1 5d nelder-mead 224 421 None None True ## 1 10d cg 10 23 23 None False ## 1 10d newton-cg 5 6 11 0 True ## 1 10d newton-cg w/ H 5 5 5 5 True ## 1 10d bfgs 4 9 9 None True ## 1 10d bfgs w/o G 6 132 12 None True ## 1 10d l-bfgs 4 99 9 None True ## 1 10d nelder-mead 1151 1754 None None True ## 1 20d cg 5 25 25 None True ## 1 20d newton-cg 4 5 9 0 True ## 1 20d newton-cg w/ H 4 4 4 4 True ## 1 20d bfgs 5 9 9 None True ## 1 20d bfgs w/o G 6 210 10 None False ## 1 20d l-bfgs 5 189 9 None True ## 1 20d nelder-mead 2745 4000 None None False ## 1 50d cg 12 35 35 None False ## 1 50d newton-cg 4 5 9 0 True ## 1 50d newton-cg w/ H 4 4 4 4 True ## 1 50d bfgs 4 9 9 None True ## 1 50d bfgs w/o G 6 4143 81 None False ## 1 50d l-bfgs 6 510 10 None True ## 1 50d nelder-mead 6447 10000 None None False ``` ] ] --- ```python df ``` ``` ## name method nit nfev njev nhev success message ## 1 2d cg 15 18 18 None False Desired error not necessarily achieved due to ... ## 1 2d newton-cg 5 7 12 0 True Optimization terminated successfully. ## 1 2d newton-cg w/ H 5 6 6 5 True Optimization terminated successfully. ## 1 2d bfgs 3 7 7 None True Optimization terminated successfully. ## 1 2d bfgs w/o G 3 24 8 None False Desired error not necessarily achieved due to ... ## 1 2d l-bfgs 5 21 7 None True CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL ## 1 2d nelder-mead 73 145 None None True Optimization terminated successfully. ## 1 5d cg 5 19 19 None True Optimization terminated successfully. ## 1 5d newton-cg 5 7 12 0 True Optimization terminated successfully. ## 1 5d newton-cg w/ H 5 6 6 5 True Optimization terminated successfully. ## 1 5d bfgs 5 8 8 None True Optimization terminated successfully. ## 1 5d bfgs w/o G 7 528 86 None False Desired error not necessarily achieved due to ... ## 1 5d l-bfgs 4 54 9 None True CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH ## 1 5d nelder-mead 224 421 None None True Optimization terminated successfully. ## 1 10d cg 10 23 23 None False Desired error not necessarily achieved due to ... ## 1 10d newton-cg 5 6 11 0 True Optimization terminated successfully. ## 1 10d newton-cg w/ H 5 5 5 5 True Optimization terminated successfully. ## 1 10d bfgs 4 9 9 None True Optimization terminated successfully. ## 1 10d bfgs w/o G 6 132 12 None True Optimization terminated successfully. ## 1 10d l-bfgs 4 99 9 None True CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH ## 1 10d nelder-mead 1151 1754 None None True Optimization terminated successfully. ## 1 20d cg 5 25 25 None True Optimization terminated successfully. ## 1 20d newton-cg 4 5 9 0 True Optimization terminated successfully. ## 1 20d newton-cg w/ H 4 4 4 4 True Optimization terminated successfully. ## 1 20d bfgs 5 9 9 None True Optimization terminated successfully. 
## 1 20d bfgs w/o G 6 210 10 None False Desired error not necessarily achieved due to ... ## 1 20d l-bfgs 5 189 9 None True CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH ## 1 20d nelder-mead 2745 4000 None None False Maximum number of function evaluations has bee... ## 1 50d cg 12 35 35 None False Desired error not necessarily achieved due to ... ## 1 50d newton-cg 4 5 9 0 True Optimization terminated successfully. ## 1 50d newton-cg w/ H 4 4 4 4 True Optimization terminated successfully. ## 1 50d bfgs 4 9 9 None True Optimization terminated successfully. ## 1 50d bfgs w/o G 6 4143 81 None False Desired error not necessarily achieved due to ... ## 1 50d l-bfgs 6 510 10 None True CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH ## 1 50d nelder-mead 6447 10000 None None False Maximum number of function evaluations has bee... ``` --- ## What's going on? ```python n = 50 f, grad, hess = mk_mvn(np.zeros(n), build_Sigma(n)) ``` .pull-left[ .small[ ```python optimize.minimize(f, np.ones(n), jac=grad, method="CG", tol=1e-10) ``` ``` ## fun: -1.0 ## jac: array([ 1.12332277e-10, 8.72935916e-11, -3.32972236e-10, 1.06144334e-11, ## 8.69788321e-11, -4.18555930e-11, 2.10910902e-11, -1.29827554e-11, ## -1.56070646e-10, -1.73949531e-11, -2.46648960e-10, -9.01747667e-11, ## -2.69689780e-10, -2.53304579e-10, -2.14723226e-10, -2.17383145e-10, ## -6.02970488e-11, -2.62054475e-10, -1.88641458e-10, 1.12811546e-11, ## -1.37095135e-10, -1.99919295e-10, -1.62557241e-10, -1.36324358e-10, ## -1.82270787e-10, -1.70384755e-10, -1.70436047e-10, -1.45897261e-10, ## -2.19757053e-10, -1.52952018e-10, -2.13786756e-10, -1.88950951e-10, ## -2.04463656e-10, -2.43201436e-10, -2.04465933e-10, -2.17788778e-10, ## -2.78825917e-10, -1.90142568e-10, -2.29375511e-10, -2.29324606e-10, ## -2.43124923e-10, -2.29273546e-10, -2.27412666e-10, -2.14334485e-10, ## -2.63387506e-10, -2.38475808e-10, -2.78480808e-10, -2.38501260e-10, ## -2.58275285e-10, -2.50752825e-10]) ## message: 'Desired error not necessarily achieved due to precision loss.' 
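One way to see what is happening, assuming the generic `tol` maps onto CG's gradient-norm tolerance `gtol`: near the optimum the computed gradient is dominated by floating point error, so the line search stalls before the norm can drop below `1e-10` and the solver aborts with the precision-loss warning above:

```python
res = optimize.minimize(f, np.ones(n), jac=grad, method="CG", tol=1e-10)

# the achieved gradient norm stalls around 1e-9, above the requested
# tolerance; hence success=False with a nonzero status code
print(np.linalg.norm(res.jac), res.success, res.status)
```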
## nfev: 35 ## nit: 12 ## njev: 35 ## status: 2 ## success: False ## x: array([-4.12110457e-09, -4.13362391e-09, -4.34375683e-09, -4.17196349e-09, ## -4.13378129e-09, -4.19819851e-09, -4.16672516e-09, -4.18376209e-09, ## -4.25530603e-09, -4.18596819e-09, -4.30059519e-09, -4.22235809e-09, ## -4.31211560e-09, -4.30392300e-09, -4.28463232e-09, -4.28596228e-09, ## -4.20741923e-09, -4.30829795e-09, -4.27159144e-09, -4.17163013e-09, ## -4.24581828e-09, -4.27723036e-09, -4.25854933e-09, -4.24543289e-09, ## -4.26840610e-09, -4.26246309e-09, -4.26248873e-09, -4.25021934e-09, ## -4.28714924e-09, -4.25374672e-09, -4.28416409e-09, -4.27174618e-09, ## -4.27950254e-09, -4.29887143e-09, -4.27950368e-09, -4.28616510e-09, ## -4.31668367e-09, -4.27234199e-09, -4.29195846e-09, -4.29193301e-09, ## -4.29883317e-09, -4.29190748e-09, -4.29097704e-09, -4.28443795e-09, ## -4.30896446e-09, -4.29650861e-09, -4.31651111e-09, -4.29652134e-09, ## -4.30640835e-09, -4.30264712e-09]) ``` ```python optimize.minimize(f, np.ones(n), jac=grad, method="CG", tol=1e-8) ``` ``` ## fun: -0.9999999999999776 ## jac: array([-6.31370029e-09, -6.27931176e-09, -5.70211192e-09, -6.17399937e-09, ## -6.27887946e-09, -6.10193619e-09, -6.18838818e-09, -6.14159061e-09, ## -5.94507140e-09, -6.13553083e-09, -5.82066969e-09, -6.03557386e-09, ## -5.78902506e-09, -5.81152876e-09, -5.86451701e-09, -5.86086383e-09, ## -6.07660840e-09, -5.79951151e-09, -5.90033813e-09, -6.17491506e-09, ## -5.97113268e-09, -5.88484897e-09, -5.93616261e-09, -5.97219127e-09, ## -5.90908771e-09, -5.92541218e-09, -5.92534173e-09, -5.95904370e-09, ## -5.85760347e-09, -5.94935458e-09, -5.86580317e-09, -5.89991307e-09, ## -5.87860767e-09, -5.82540458e-09, -5.87860454e-09, -5.86030673e-09, ## -5.77647734e-09, -5.89827648e-09, -5.84439333e-09, -5.84446324e-09, ## -5.82550966e-09, -5.84453337e-09, -5.84708913e-09, -5.86505091e-09, ## -5.79768071e-09, -5.83189483e-09, -5.77695132e-09, -5.83185987e-09, ## -5.80470191e-09, -5.81503338e-09]) ## message: 'Optimization terminated successfully.' 
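The solver-specific tolerance can also be set explicitly through `options` instead of the generic `tol`; for CG the relevant knob is `gtol`. A sketch with the same `f`, `grad`, and `n` as above:

```python
# explicit about which tolerance is being set; disp=True additionally
# prints a short convergence summary
optimize.minimize(f, np.ones(n), jac=grad, method="CG",
                  options={"gtol": 1e-8, "disp": True})
```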
## nfev: 27 ## nit: 9 ## njev: 27 ## status: 0 ## success: True ## x: array([-1.51405253e-07, -1.51388059e-07, -1.51099459e-07, -1.51335402e-07, ## -1.51387842e-07, -1.51299371e-07, -1.51342597e-07, -1.51319198e-07, ## -1.51220938e-07, -1.51316168e-07, -1.51158738e-07, -1.51266190e-07, ## -1.51142915e-07, -1.51154167e-07, -1.51180661e-07, -1.51178835e-07, ## -1.51286707e-07, -1.51148158e-07, -1.51198572e-07, -1.51335860e-07, ## -1.51233969e-07, -1.51190827e-07, -1.51216484e-07, -1.51234498e-07, ## -1.51202947e-07, -1.51211109e-07, -1.51211074e-07, -1.51227925e-07, ## -1.51177204e-07, -1.51223080e-07, -1.51181304e-07, -1.51198359e-07, ## -1.51187707e-07, -1.51161105e-07, -1.51187705e-07, -1.51178556e-07, ## -1.51136641e-07, -1.51197541e-07, -1.51170599e-07, -1.51170634e-07, ## -1.51161158e-07, -1.51170669e-07, -1.51171947e-07, -1.51180928e-07, ## -1.51147243e-07, -1.51164350e-07, -1.51136878e-07, -1.51164333e-07, ## -1.51150754e-07, -1.51155919e-07]) ``` ] ] .pull-right[ .small[ ```python optimize.minimize(f, np.ones(n), jac=grad, method="BFGS", tol=1e-10) ``` ``` ## fun: -1.0 ## hess_inv: array([[1.49000053, 0.49000053, 0.49000053, ..., 0.49000053, 0.49000053, ## 0.49000053], ## [0.49000053, 1.49000053, 0.49000053, ..., 0.49000053, 0.49000053, ## 0.49000053], ## [0.49000053, 0.49000053, 1.49000053, ..., 0.49000053, 0.49000053, ## 0.49000053], ## ..., ## [0.49000053, 0.49000053, 0.49000053, ..., 1.49000053, 0.49000053, ## 0.49000053], ## [0.49000053, 0.49000053, 0.49000053, ..., 0.49000053, 1.49000053, ## 0.49000053], ## [0.49000053, 0.49000053, 0.49000053, ..., 0.49000053, 0.49000053, ## 1.49000053]]) ## jac: array([-2.73970642e-12, -2.74307089e-12, -2.79949727e-12, -2.75330689e-12, ## -2.74292110e-12, -2.76027528e-12, -2.75166780e-12, -2.75652657e-12, ## -2.77555684e-12, -2.75705655e-12, -2.78801410e-12, -2.76681212e-12, ## -2.79098818e-12, -2.78892573e-12, -2.78363550e-12, -2.78392371e-12, ## -2.76262526e-12, -2.78983521e-12, -2.77993127e-12, -2.75311320e-12, ## -2.77310882e-12, -2.78152071e-12, -2.77661830e-12, -2.77301265e-12, ## -2.77931259e-12, -2.77777327e-12, -2.77782252e-12, -2.77431223e-12, ## -2.78416334e-12, -2.77532276e-12, -2.78339434e-12, -2.78012604e-12, ## -2.78229195e-12, -2.78753158e-12, -2.78224230e-12, -2.78397214e-12, ## -2.79233744e-12, -2.78026976e-12, -2.78560685e-12, -2.78560831e-12, ## -2.78748208e-12, -2.78545980e-12, -2.78532019e-12, -2.78358907e-12, ## -2.79007617e-12, -2.78680814e-12, -2.79224211e-12, -2.78680779e-12, ## -2.78959904e-12, -2.78844087e-12]) ## message: 'Optimization terminated successfully.' 
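BFGS copes here because its inverse-Hessian approximation absorbs the strong correlation in `Sigma`. Newton-CG with the analytic Hessian behaves similarly (per the table above, 4 iterations at `n = 50`); a quick check:

```python
# Newton's step direction uses the true curvature, so the strongly
# correlated problem is handled in a handful of iterations
optimize.minimize(f, np.ones(n), jac=grad, hess=hess,
                  method="Newton-CG", tol=1e-10)
```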
##          nfev: 9
##           nit: 4
##          njev: 9
##        status: 0
##       success: True
##             x: array([-7.07996347e-11, -7.08013170e-11, -7.08295302e-11, -7.08064350e-11,
##        -7.08012421e-11, -7.08099192e-11, -7.08056154e-11, -7.08080448e-11,
##        -7.08175599e-11, -7.08083098e-11, -7.08237886e-11, -7.08131876e-11,
##        -7.08252756e-11, -7.08242444e-11, -7.08215993e-11, -7.08217434e-11,
##        -7.08110942e-11, -7.08246991e-11, -7.08197472e-11, -7.08063381e-11,
##        -7.08163359e-11, -7.08205419e-11, -7.08180907e-11, -7.08162879e-11,
##        -7.08194378e-11, -7.08186682e-11, -7.08186928e-11, -7.08169376e-11,
##        -7.08218632e-11, -7.08174429e-11, -7.08214787e-11, -7.08198445e-11,
##        -7.08209275e-11, -7.08235473e-11, -7.08209027e-11, -7.08217676e-11,
##        -7.08259502e-11, -7.08199164e-11, -7.08225850e-11, -7.08225857e-11,
##        -7.08235226e-11, -7.08225114e-11, -7.08224416e-11, -7.08215761e-11,
##        -7.08248196e-11, -7.08231856e-11, -7.08259026e-11, -7.08231854e-11,
##        -7.08245810e-11, -7.08240020e-11])
```
]
]

---

```python
sns.catplot(
  y = "method", x = "value", hue = "variable", col="name", kind="bar",
  data = df.melt(
    id_vars=["name","method"], value_vars=["nit", "nfev", "njev", "nhev"]
  ).astype(
    {"value": "float64"}
  ).query(
    "name != '2d'"
  )
).set(
  xscale="log", xlabel="", ylabel=""
)
```

<img src="Lec13_files/figure-html/unnamed-chunk-35-11.png" width="5484" style="display: block; margin: auto;" />

---

## Some general advice

* Having access to the gradient is almost always helpful / necessary

* Having access to the Hessian can be helpful, but usually does not significantly improve things

* In general, **BFGS** or **L-BFGS** should be a first choice for most problems (either well- or ill-conditioned); a minimal template follows below

* **CG** can perform better for well-conditioned problems with cheap function evaluations

<img src="imgs/scipy_opt_summary.png" width="80%" style="display: block; margin: auto;" />
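A minimal template following this advice, using SciPy's built-in Rosenbrock helpers (`optimize.rosen` and its analytic gradient `optimize.rosen_der`):

```python
import numpy as np
from scipy import optimize

# default first attempt: L-BFGS-B with an analytic gradient when available
res = optimize.minimize(
    optimize.rosen,          # objective
    x0=np.zeros(5),
    jac=optimize.rosen_der,  # analytic gradient; omit to fall back on finite
                             # differences at the cost of many extra nfev
    method="L-BFGS-B",
    tol=1e-8,
)
print(res.success, res.nit, res.x)
```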