class: center, middle, inverse, title-slide

# Lec 16 - scikit-learn classification

## Statistical Computing and Computation

### Sta 663 | Spring 2022

### Dr. Colin Rundel

---
exclude: true

```python
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import sklearn

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.model_selection import GridSearchCV, KFold, StratifiedKFold, train_test_split
from sklearn.metrics import classification_report

plt.rcParams['figure.dpi'] = 200

np.set_printoptions(
  edgeitems=30, linewidth=200,
  precision = 5, suppress=True
  #formatter=dict(float=lambda x: "%.5g" % x)
)

pd.set_option("display.width", 1000)
pd.set_option("display.max_columns", 10)
pd.set_option("display.precision", 6)
```

```r
knitr::opts_chunk$set(
  fig.align="center",
  cache=FALSE
)
```

```r
local({
  hook_err_old <- knitr::knit_hooks$get("error")  # save the old hook
  knitr::knit_hooks$set(error = function(x, options) {
    # now do whatever you want to do with x, and pass
    # the new x to the old hook
    x = sub("## \n## Detailed traceback:\n.*$", "", x)
    x = sub("Error in py_call_impl\\(.*?\\)\\: ", "", x)
    hook_err_old(x, options)
  })

  hook_warn_old <- knitr::knit_hooks$get("warning")  # save the old hook
  knitr::knit_hooks$set(warning = function(x, options) {
    x = sub("<string>:1: ", "", x)
    hook_warn_old(x, options)
  })
})
```

---

## OpenIntro - Spam

We will start by looking at a data set on spam emails from the [OpenIntro project](https://www.openintro.org/). A full data dictionary can be found [here](https://www.openintro.org/data/index.php?data=email). To keep things simple this week we will restrict our exploration to only the following columns: `spam`, `exclaim_mess`, `format`, `num_char`, `line_breaks`, and `number`.

* `spam` - Indicator for whether the email was spam.
* `exclaim_mess` - The number of exclamation points in the email message.
* `format` - Indicates whether the email was written using HTML (e.g. may have included bolding or active links).
* `num_char` - The number of characters in the email, in thousands.
* `line_breaks` - The number of line breaks in the email (does not count text wrapping).
* `number` - Factor variable saying whether there was no number, a small number (under 1 million), or a big number.

---

```python
email = pd.read_csv('data/email.csv')[
  ['spam', 'exclaim_mess', 'format', 'num_char', 'line_breaks', 'number']
]
email
```

```
## spam exclaim_mess format num_char line_breaks number
## 0 0 0 1 11.370 202 big
## 1 0 1 1 10.504 202 small
## 2 0 6 1 7.773 192 small
## 3 0 48 1 13.256 255 small
## 4 0 1 0 1.231 29 none
## ... ... ... ... ... ... ...
## 3916 1 0 0 0.332 12 small
## 3917 1 0 0 0.323 15 small
## 3918 0 5 1 8.656 208 small
## 3919 0 0 0 10.185 132 small
## 3920 1 1 0 2.225 65 small
##
## [3921 rows x 6 columns]
```

--

Given that `number` is categorical, we will take care of the necessary dummy coding via `pd.get_dummies()`,

```python
email_dc = pd.get_dummies(email)
email_dc
```

```
## spam exclaim_mess format num_char line_breaks number_big number_none number_small
## 0 0 0 1 11.370 202 1 0 0
## 1 0 1 1 10.504 202 0 0 1
## 2 0 6 1 7.773 192 0 0 1
## 3 0 48 1 13.256 255 0 0 1
## 4 0 1 0 1.231 29 0 1 0
## ... ... ... ... ... ... ... ... ...
## 3916 1 0 0 0.332 12 0 0 1
## 3917 1 0 0 0.323 15 0 0 1
## 3918 0 5 1 8.656 208 0 0 1
## 3919 0 0 0 10.185 132 0 0 1
## 3920 1 1 0 2.225 65 0 0 1
##
## [3921 rows x 8 columns]
```
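
As an aside, the same dummy coding can also be handled inside a modeling pipeline using `OneHotEncoder` (imported in the setup chunk) and a column transformer. A minimal sketch, not used in what follows (the name `enc` is just illustrative):

```python
from sklearn.compose import make_column_transformer

# One-hot encode the `number` column and pass the remaining columns through
# unchanged - the same encoding as pd.get_dummies(email), up to column order and names
enc = make_column_transformer(
  (OneHotEncoder(), ['number']),
  remainder='passthrough'
)
enc.fit_transform(email)
```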
---

```python
sns.pairplot(email, hue='spam')
```

<img src="Lec16_files/figure-html/unnamed-chunk-3-1.png" width="55%" style="display: block; margin: auto;" />

---

## Model fitting

```python
from sklearn.linear_model import LogisticRegression

y = email_dc.spam
X = email_dc.drop('spam', axis=1)

m = LogisticRegression(fit_intercept = False).fit(X, y)
```

--

```python
m.feature_names_in_
```

```
## array(['exclaim_mess', 'format', 'num_char', 'line_breaks', 'number_big', 'number_none', 'number_small'], dtype=object)
```

```python
m.coef_
```

```
## array([[ 0.00982, -0.61893, 0.0545 , -0.00556, -1.21224, -0.69336, -1.92076]])
```

---

## A quick comparison

.pull-left[.small[
```r
glm(spam ~ . - 1, data = d, family=binomial)
```

```
##
## Call: glm(formula = spam ~ . - 1, family = binomial, data = d)
##
## Coefficients:
## exclaim_mess format num_char line_breaks numberbig
## 0.009587 -0.604782 0.054765 -0.005480 -1.264827
## numbernone numbersmall
## -0.706843 -1.950440
##
## Degrees of Freedom: 3921 Total (i.e. Null); 3914 Residual
## Null Deviance: 5436
## Residual Deviance: 2144 AIC: 2158
```
] ]

.pull-right[ .small[
```python
m.feature_names_in_
```

```
## array(['exclaim_mess', 'format', 'num_char', 'line_breaks', 'number_big', 'number_none', 'number_small'], dtype=object)
```

```python
m.coef_
```

```
## array([[ 0.00982, -0.61893, 0.0545 , -0.00556, -1.21224, -0.69336, -1.92076]])
```
] ]

<br/>

.center[
Why are these different?
]

--

> `sklearn.linear_model.LogisticRegression`
>
> ...
>
> This class implements regularized logistic regression using the ‘liblinear’ library, ‘newton-cg’, ‘sag’, ‘saga’ and ‘lbfgs’ solvers. **Note that regularization is applied by default.** It can handle both dense and sparse input. Use C-ordered arrays or CSR matrices containing 64-bit floats for optimal performance; any other input format will be converted (and copied).

---

## Penalty parameter

🚩🚩🚩

`LogisticRegression()` has a parameter called `penalty` that applies an `l1` (lasso), `l2` (ridge), `elasticnet`, or `none` penalty, with `l2` being the default. To make matters worse, the amount of regularization is controlled by the parameter `C`, which defaults to 1 (not 0) - also, `C` is the inverse of the regularization strength (i.e. different from `alpha` for the ridge and lasso models).

🚩🚩🚩

$$
\min\_{w, c} \frac{1 - \rho}{2}w^T w + \rho \|w\|\_1 + C \sum\_{i=1}^n \log(\exp(- y\_i (X\_i^T w + c)) + 1),
$$

<br/>

--

```python
m = LogisticRegression(fit_intercept = False, penalty="none").fit(X, y)
m.feature_names_in_
```

```
## array(['exclaim_mess', 'format', 'num_char', 'line_breaks', 'number_big', 'number_none', 'number_small'], dtype=object)
```

```python
m.coef_
```

```
## array([[ 0.00958, -0.60606, 0.05505, -0.00549, -1.26347, -0.70637, -1.95091]])
```

---

## Solver parameter

It is also possible to specify the solver to use when fitting a logistic regression model; to complicate matters somewhat, the choice of algorithm depends on the penalty chosen:

* `newton-cg` - [`l2`, `none`]
* `lbfgs` - [`l2`, `none`]
* `liblinear` - [`l1`, `l2`]
* `sag` - [`l2`, `none`]
* `saga` - [`elasticnet`, `l1`, `l2`, `none`]

There can also be issues with feature scaling for some of these solvers:

> **Note:** ‘sag’ and ‘saga’ fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.
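
If we did want to use one of the scale-sensitive solvers here, the features could be standardized as part of a pipeline. A minimal sketch (not run in these slides) using `make_pipeline()` and `StandardScaler()` from the setup chunk; the name `scaled_m` is just illustrative:

```python
# standardize the features, then fit the unpenalized model with the saga solver
# (saga's convergence is only guaranteed for similarly scaled features)
scaled_m = make_pipeline(
  StandardScaler(),
  LogisticRegression(fit_intercept=False, penalty="none", solver="saga", max_iter=5000)
).fit(X, y)

scaled_m[-1].coef_  # note: coefficients are now on the standardized feature scale
```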
--- ## Prediction Classification models have multiple prediction methods depending on what type of output you would like, ```python m.predict(X) ``` ``` ## array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..., 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) ``` .pull-left[ ```python m.predict_proba(X) ``` ``` ## array([[0.91318, 0.08682], ## [0.956 , 0.044 ], ## [0.95796, 0.04204], ## [0.94091, 0.05909], ## [0.68747, 0.31253], ## [0.68439, 0.31561], ## [0.93424, 0.06576], ## [0.96366, 0.03634], ## [0.89589, 0.10411], ## [0.94186, 0.05814], ## [0.9326 , 0.0674 ], ## [0.89604, 0.10396], ## [0.91236, 0.08764], ## [0.97276, 0.02724], ## [0.92822, 0.07178], ## [0.98356, 0.01644], ## [0.96329, 0.03671], ## [0.95383, 0.04617], ## [0.88896, 0.11104], ## [0.80423, 0.19577], ## [0.89907, 0.10093], ## [0.95648, 0.04352], ## [0.99088, 0.00912], ## [0.88025, 0.11975], ## [0.80527, 0.19473], ## [0.88754, 0.11246], ## [0.89736, 0.10264], ## [0.88684, 0.11316], ## [0.68512, 0.31488], ## [0.93706, 0.06294], ## ..., ## [0.8822 , 0.1178 ], ## [0.99382, 0.00618], ## [0.93511, 0.06489], ## [0.68921, 0.31079], ## [0.87715, 0.12285], ## [0.79328, 0.20672], ## [0.79 , 0.21 ], ## [0.67245, 0.32755], ## [0.89349, 0.10651], ## [0.93284, 0.06716], ## [0.68921, 0.31079], ## [0.88452, 0.11548], ## [0.98189, 0.01811], ## [0.88955, 0.11045], ## [0.88363, 0.11637], ## [0.67266, 0.32734], ## [0.79052, 0.20948], ## [0.67985, 0.32015], ## [0.68707, 0.31293], ## [0.70604, 0.29396], ## [0.93324, 0.06676], ## [0.93067, 0.06933], ## [0.88962, 0.11038], ## [0.78861, 0.21139], ## [0.91821, 0.08179], ## [0.88064, 0.11936], ## [0.88242, 0.11758], ## [0.95988, 0.04012], ## [0.89238, 0.10762], ## [0.89806, 0.10194]]) ``` ] .pull-right[ ```python m.predict_log_proba(X) ``` ``` ## array([[-0.09082, -2.44394], ## [-0.04499, -3.12364], ## [-0.04295, -3.16909], ## [-0.06091, -2.82873], ## [-0.37474, -1.16305], ## [-0.37922, -1.15326], ## [-0.06802, -2.72176], ## [-0.03702, -3.3147 ], ## [-0.10994, -2.26229], ## [-0.0599 , -2.84487], ## [-0.06978, -2.69706], ## [-0.10977, -2.26371], ## [-0.09172, -2.43457], ## [-0.02761, -3.60324], ## [-0.07449, -2.63414], ## [-0.01658, -4.10782], ## [-0.0374 , -3.30471], ## [-0.04727, -3.07543], ## [-0.1177 , -2.19788], ## [-0.21787, -1.63081], ## [-0.10639, -2.29337], ## [-0.04449, -3.13464], ## [-0.00917, -4.6968 ], ## [-0.12755, -2.12237], ## [-0.21657, -1.63616], ## [-0.1193 , -2.18514], ## [-0.10829, -2.27657], ## [-0.12009, -2.17899], ## [-0.37817, -1.15555], ## [-0.065 , -2.76564], ## ..., ## [-0.12534, -2.13877], ## [-0.0062 , -5.08704], ## [-0.06709, -2.73506], ## [-0.3722 , -1.16865], ## [-0.13108, -2.09676], ## [-0.23158, -1.5764 ], ## [-0.23573, -1.56063], ## [-0.39683, -1.11611], ## [-0.11262, -2.23954], ## [-0.06952, -2.70063], ## [-0.3722 , -1.16865], ## [-0.12271, -2.15867], ## [-0.01827, -4.01147], ## [-0.11704, -2.20315], ## [-0.12371, -2.151 ], ## [-0.39652, -1.11675], ## [-0.23506, -1.56315], ## [-0.38588, -1.13897], ## [-0.37532, -1.16178], ## [-0.34808, -1.22432], ## [-0.06909, -2.7067 ], ## [-0.07185, -2.66892], ## [-0.11696, -2.20386], ## [-0.23749, -1.55403], ## [-0.08533, -2.50358], ## [-0.1271 , -2.12564], ## [-0.12509, -2.1406 ], ## [-0.04094, -3.21594], ## [-0.11387, -2.22912], ## [-0.10752, -2.28337]]) ``` ] --- ## Scoring Classification models also include a `score()` method which returns the model's accuracy, ```python m.score(X, y) ``` ``` ## 0.90640142820709 ``` Other scoring options 
are available via the [metrics](https://scikit-learn.org/stable/modules/classes.html#classification-metrics) submodule ```python from sklearn.metrics import accuracy_score, roc_auc_score, f1_score, confusion_matrix ``` .pull-left[ ```python accuracy_score(y, m.predict(X)) ``` ``` ## 0.90640142820709 ``` ```python roc_auc_score(y, m.predict_proba(X)[:,1]) ``` ``` ## 0.7606622771440706 ``` ```python f1_score(y, m.predict(X)) ``` ``` ## 0.0 ``` ] .pull-right[ ```python confusion_matrix(y, m.predict(X), labels=m.classes_) ``` ``` ## array([[3554, 0], ## [ 367, 0]]) ``` ] --- ## Scoring visualizations - confusion matrix .small[ ```python from sklearn.metrics import ConfusionMatrixDisplay cm = confusion_matrix(y, m.predict(X), labels=m.classes_) disp = ConfusionMatrixDisplay(cm).plot() plt.show() ``` <img src="Lec16_files/figure-html/unnamed-chunk-17-1.png" width="40%" style="display: block; margin: auto;" /> ] --- ## Scoring visualizations - ROC curve .small[ ```python from sklearn.metrics import auc, roc_curve, RocCurveDisplay fpr, tpr, thresholds = roc_curve(y, m.predict_proba(X)[:,1]) roc_auc = auc(fpr, tpr) disp = RocCurveDisplay(fpr=fpr, tpr=tpr, roc_auc=roc_auc, estimator_name='Logistic Regression').plot() plt.show() ``` <img src="Lec16_files/figure-html/unnamed-chunk-18-3.png" width="40%" style="display: block; margin: auto;" /> ] --- ## Scoring visualizations - Precision Recall .small[ ```python from sklearn.metrics import precision_recall_curve, PrecisionRecallDisplay precision, recall, _ = precision_recall_curve(y, m.predict_proba(X)[:,1]) disp = PrecisionRecallDisplay(precision=precision, recall=recall).plot() plt.show() ``` <img src="Lec16_files/figure-html/unnamed-chunk-19-5.png" width="40%" style="display: block; margin: auto;" /> ] --- ## Another visualization ```python def confusion_plot(truth, probs, threshold=0.5): d = pd.DataFrame( data = {'spam': y, 'truth': truth, 'probs': probs} ) # Create a column called outcome that contains the labeling outcome for the given threshold d['outcome'] = 'other' d.loc[(d.spam == 1) & (d.probs >= threshold), 'outcome'] = 'true positive' d.loc[(d.spam == 0) & (d.probs >= threshold), 'outcome'] = 'false positive' d.loc[(d.spam == 1) & (d.probs < threshold), 'outcome'] = 'false negative' d.loc[(d.spam == 0) & (d.probs < threshold), 'outcome'] = 'true negative' # Create plot and color according to outcome plt.figure(figsize=(12,4)) plt.xlim((-0.05,1.05)) sns.stripplot(y='truth', x='probs', hue='outcome', data=d, size=3, alpha=0.5) plt.axvline(x=threshold, linestyle='dashed', color='black', alpha=0.5) plt.title("threshold = %.2f" % threshold) plt.show() ``` --- .small[ ```python truth = pd.Categorical.from_codes(y, categories = ('not spam','spam')) probs = m.predict_proba(X)[:,1] confusion_plot(truth, probs, 0.5) ``` <img src="Lec16_files/figure-html/unnamed-chunk-21-7.png" width="66%" style="display: block; margin: auto;" /> ```python confusion_plot(truth, probs, 0.25) ``` <img src="Lec16_files/figure-html/unnamed-chunk-21-8.png" width="66%" style="display: block; margin: auto;" /> ] --- class: center, middle ## Demo 1 - DecisionTreeClassifier --- class: center, middle ## Demo 2 - SVC --- ## MNIST handwritten digits ```python from sklearn.datasets import load_digits digits = load_digits(as_frame=True) ``` .pull-left[ .small[ ```python X = digits.data X ``` ``` ## pixel_0_0 pixel_0_1 pixel_0_2 pixel_0_3 pixel_0_4 ... pixel_7_3 pixel_7_4 pixel_7_5 pixel_7_6 pixel_7_7 ## 0 0.0 0.0 5.0 13.0 9.0 ... 13.0 10.0 0.0 0.0 0.0 ## 1 0.0 0.0 0.0 12.0 13.0 ... 
11.0 16.0 10.0 0.0 0.0
## 2 0.0 0.0 0.0 4.0 15.0 ... 3.0 11.0 16.0 9.0 0.0
## 3 0.0 0.0 7.0 15.0 13.0 ... 13.0 13.0 9.0 0.0 0.0
## 4 0.0 0.0 0.0 1.0 11.0 ... 2.0 16.0 4.0 0.0 0.0
## ... ... ... ... ... ... ... ... ... ... ... ...
## 1792 0.0 0.0 4.0 10.0 13.0 ... 14.0 15.0 9.0 0.0 0.0
## 1793 0.0 0.0 6.0 16.0 13.0 ... 16.0 14.0 6.0 0.0 0.0
## 1794 0.0 0.0 1.0 11.0 15.0 ... 9.0 13.0 6.0 0.0 0.0
## 1795 0.0 0.0 2.0 10.0 7.0 ... 12.0 16.0 12.0 0.0 0.0
## 1796 0.0 0.0 10.0 14.0 8.0 ... 12.0 14.0 12.0 1.0 0.0
##
## [1797 rows x 64 columns]
```
] ]

.pull-right[ .small[
```python
y = digits.target
y
```

```
## 0 0
## 1 1
## 2 2
## 3 3
## 4 4
## ..
## 1792 9
## 1793 0
## 1794 8
## 1795 9
## 1796 8
## Name: target, Length: 1797, dtype: int64
```
] ]

---

## digit description

.small[
```
## .. _digits_dataset:
##
## Optical recognition of handwritten digits dataset
## --------------------------------------------------
##
## **Data Set Characteristics:**
##
## :Number of Instances: 1797
## :Number of Attributes: 64
## :Attribute Information: 8x8 image of integer pixels in the range 0..16.
## :Missing Attribute Values: None
## :Creator: E. Alpaydin (alpaydin '@' boun.edu.tr)
## :Date: July; 1998
##
## This is a copy of the test set of the UCI ML hand-written digits datasets
## https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits
##
## The data set contains images of hand-written digits: 10 classes where
## each class refers to a digit.
##
## Preprocessing programs made available by NIST were used to extract
## normalized bitmaps of handwritten digits from a preprinted form. From a
## total of 43 people, 30 contributed to the training set and different 13
## to the test set. 32x32 bitmaps are divided into nonoverlapping blocks of
## 4x4 and the number of on pixels are counted in each block. This generates
## an input matrix of 8x8 where each element is an integer in the range
## 0..16. This reduces dimensionality and gives invariance to small
## distortions.
##
## For info on NIST preprocessing routines, see M. D. Garris, J. L. Blue, G.
## T. Candela, D. L. Dimmick, J. Geist, P. J. Grother, S. A. Janet, and C.
## L. Wilson, NIST Form-Based Handprint Recognition System, NISTIR 5469,
## 1994.
##
## .. topic:: References
##
## - C. Kaynak (1995) Methods of Combining Multiple Classifiers and Their
## Applications to Handwritten Digit Recognition, MSc Thesis, Institute of
## Graduate Studies in Science and Engineering, Bogazici University.
## - E. Alpaydin, C. Kaynak (1998) Cascading Classifiers, Kybernetika.
## - Ken Tang and Ponnuthurai N. Suganthan and Xi Yao and A. Kai Qin.
## Linear dimensionalityreduction using relevance weighted LDA. School of
## Electrical and Electronic Engineering Nanyang Technological University.
## 2005.
## - Claudio Gentile. A New Approximate Maximal Margin Classification
## Algorithm. NIPS. 2000.
```
]

---

## Example digits

<img src="Lec16_files/figure-html/unnamed-chunk-26-11.png" width="85%" style="display: block; margin: auto;" />

---

## Doing things properly - train/test split

To properly assess our modeling we will create a training and testing set of these data: only the training data will be used to learn model coefficients or hyperparameters, while the test data will only be used for final model scoring.
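
As an aside, `train_test_split()` also accepts a `stratify` argument which preserves the class proportions (here the 10 digit classes) in both subsets. Shown below for reference only (`strat_split` is an illustrative name); the unstratified split in the next chunk is the one we actually use:

```python
# stratified version of the split (for reference; not used below) -
# the class proportions of y are kept the same in the training and test sets
strat_split = train_test_split(
  X, y, test_size=0.33, shuffle=True, random_state=1234, stratify=y
)
```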
```python X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.33, shuffle=True, random_state=1234 ) ``` --- ## Multiclass logistic regression Fitting a multiclass logistic regression model will involve selecting a value for the `multi_class` parameter, which can be either `multinomial` for multinomial regression or `ovr` for one-vs-rest where `k` binary models are fit. ```python mc_log_cv = GridSearchCV( LogisticRegression(penalty='none', max_iter = 5000), param_grid = {"multi_class": ["multinomial", "ovr"]}, cv = KFold(10, shuffle=True, random_state=12345), n_jobs = 4 ).fit(X_train, y_train) ``` -- ```python mc_log_cv.best_estimator_ ``` ``` ## LogisticRegression(max_iter=5000, multi_class='multinomial', penalty='none') ``` ```python mc_log_cv.best_score_ ``` ``` ## 0.943477961432507 ``` -- ```python for p, s in zip(mc_log_cv.cv_results_["params"], mc_log_cv.cv_results_["mean_test_score"]): print(p,"Score:",s) ``` ``` ## {'multi_class': 'multinomial'} Score: 0.943477961432507 ## {'multi_class': 'ovr'} Score: 0.8927617079889807 ``` --- ## Model coefficients ```python pd.DataFrame( mc_log_cv.best_estimator_.coef_ ) ``` ``` ## 0 1 2 3 4 ... 59 60 61 62 63 ## 0 0.0 -0.133584 -0.823611 0.904385 0.163397 ... 1.211092 -0.444343 -1.660396 -0.750159 -0.184264 ## 1 0.0 -0.184931 -1.259550 1.453983 -5.091361 ... -0.792356 0.384498 2.617778 1.265903 2.338324 ## 2 0.0 0.118104 0.569190 0.798171 0.943558 ... 0.281622 0.829968 2.602947 2.481998 0.788003 ## 3 0.0 0.239612 -0.381815 0.393986 3.886781 ... 1.231867 0.439466 1.070662 0.583209 -1.027194 ## 4 0.0 -0.109904 -1.160712 -2.175923 -2.580281 ... -0.937843 -1.710608 -0.651175 -0.656791 -0.097263 ## 5 0.0 0.701265 4.241974 -0.738130 0.057049 ... 2.045636 -0.001139 -1.412535 -2.097753 -0.210256 ## 6 0.0 -0.103487 -1.454058 -1.310946 -0.400937 ... -1.407609 0.249136 2.466801 1.005207 -0.624921 ## 7 0.0 0.088562 1.386086 1.198007 0.467463 ... -2.710461 -3.176521 -2.635078 -0.710317 -0.099948 ## 8 0.0 -0.347408 -0.306168 -1.933009 1.074249 ... 0.872821 1.722070 -2.302814 -1.602654 -0.679128 ## 9 0.0 -0.268228 -0.811336 1.409475 1.480082 ... 
0.205230 1.707472 -0.096190 0.481356 -0.203353
##
## [10 rows x 64 columns]
```

```python
mc_log_cv.best_estimator_.coef_.shape
```

```
## (10, 64)
```

```python
mc_log_cv.best_estimator_.intercept_
```

```
## array([ 0.01606, -0.11466, -0.00535, 0.08555, 0.10436, -0.01811, -0.00945, 0.05044, -0.01357, -0.09528])
```

---

## Confusion Matrix

.pull-left[
**Within sample**
```python
accuracy_score(
  y_train,
  mc_log_cv.best_estimator_.predict(X_train)
)
```

```
## 1.0
```

```python
confusion_matrix(
  y_train,
  mc_log_cv.best_estimator_.predict(X_train)
)
```

```
## array([[125, 0, 0, 0, 0, 0, 0, 0, 0, 0],
## [ 0, 118, 0, 0, 0, 0, 0, 0, 0, 0],
## [ 0, 0, 119, 0, 0, 0, 0, 0, 0, 0],
## [ 0, 0, 0, 123, 0, 0, 0, 0, 0, 0],
## [ 0, 0, 0, 0, 110, 0, 0, 0, 0, 0],
## [ 0, 0, 0, 0, 0, 114, 0, 0, 0, 0],
## [ 0, 0, 0, 0, 0, 0, 124, 0, 0, 0],
## [ 0, 0, 0, 0, 0, 0, 0, 124, 0, 0],
## [ 0, 0, 0, 0, 0, 0, 0, 0, 119, 0],
## [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 127]])
```
]

.pull-right[
**Out of sample**
```python
accuracy_score(
  y_test,
  mc_log_cv.best_estimator_.predict(X_test)
)
```

```
## 0.9579124579124579
```

```python
confusion_matrix(
  y_test,
  mc_log_cv.best_estimator_.predict(X_test),
  labels = digits.target_names
)
```

```
## array([[53, 0, 0, 0, 0, 0, 0, 0, 0, 0],
## [ 0, 64, 0, 0, 0, 0, 0, 0, 0, 0],
## [ 0, 2, 56, 0, 0, 0, 0, 0, 0, 0],
## [ 0, 0, 1, 58, 0, 1, 0, 0, 0, 0],
## [ 1, 0, 0, 0, 69, 0, 0, 0, 1, 0],
## [ 0, 0, 0, 1, 1, 64, 2, 0, 0, 0],
## [ 1, 1, 0, 0, 0, 0, 55, 0, 0, 0],
## [ 0, 0, 0, 0, 2, 0, 0, 53, 0, 0],
## [ 0, 5, 2, 0, 0, 0, 0, 0, 46, 2],
## [ 0, 0, 0, 0, 0, 1, 0, 0, 1, 51]])
```
]

---

## Report

```python
print(
  classification_report(
    y_test,
    mc_log_cv.best_estimator_.predict(X_test)
  )
)
```

```
## precision recall f1-score support
##
## 0 0.96 1.00 0.98 53
## 1 0.89 1.00 0.94 64
## 2 0.95 0.97 0.96 58
## 3 0.98 0.97 0.97 60
## 4 0.96 0.97 0.97 71
## 5 0.97 0.94 0.96 68
## 6 0.96 0.96 0.96 57
## 7 1.00 0.96 0.98 55
## 8 0.96 0.84 0.89 55
## 9 0.96 0.96 0.96 53
##
## accuracy 0.96 594
## macro avg 0.96 0.96 0.96 594
## weighted avg 0.96 0.96 0.96 594
```

---

## ROC & AUC?

These metrics are slightly awkward to use in the case of multiclass problems since they are calculated from the predicted class probabilities.
```python roc_auc_score( y_test, mc_log_cv.best_estimator_.predict_proba(X_test) ) ``` ``` ## ValueError: multi_class must be in ('ovo', 'ovr') ``` -- .pull-left[ ```python roc_auc_score( y_test, mc_log_cv.best_estimator_.predict_proba(X_test), multi_class = "ovr" ) ``` ``` ## 0.9979624274858663 ``` ```python roc_auc_score( y_test, mc_log_cv.best_estimator_.predict_proba(X_test), multi_class = "ovo" ) ``` ``` ## 0.9979645359400721 ``` ] .pull-right[ ```python roc_auc_score( y_test, mc_log_cv.best_estimator_.predict_proba(X_test), multi_class = "ovr", average = "weighted" ) ``` ``` ## 0.9979869175119241 ``` ```python roc_auc_score( y_test, mc_log_cv.best_estimator_.predict_proba(X_test), multi_class = "ovo", average = "weighted" ) ``` ``` ## 0.9979743498851119 ``` ] --- ## Prediction .pull-left[ .small[ ```python mc_log_cv.best_estimator_.predict(X_test) ``` ``` ## array([7, 1, 7, 6, 0, 2, 4, 3, 6, 3, 7, 8, 7, 9, 4, 3, 1, 7, 8, 4, 0, 3, 9, 1, 3, 6, 6, 0, 5, 4, 1, 2, 1, 2, 3, 2, 7, 6, 4, 8, 6, 4, 4, 0, 9, 1, 9, 5, 4, 4, 4, 1, 7, 6, 9, 2, 9, 9, 9, 0, 8, 3, 1, 8, ## 8, 1, 3, 9, 1, 3, 9, 6, 9, 5, 2, 1, 9, 2, 1, 3, 8, 7, 3, 3, 2, 7, 7, 5, 8, 2, 6, 1, 9, 1, 6, 4, 5, 2, 2, 4, 5, 4, 4, 6, 5, 9, 2, 4, 1, 0, 7, 6, 1, 2, 9, 5, 2, 5, 0, 3, 2, 7, 6, 4, 8, 2, 1, 1, ## 6, 4, 6, 2, 3, 4, 7, 5, 0, 9, 1, 0, 5, 6, 7, 6, 3, 8, 3, 2, 0, 4, 0, 1, 5, 4, 6, 1, 1, 1, 6, 1, 7, 9, 0, 7, 9, 5, 4, 1, 3, 8, 6, 4, 7, 1, 5, 7, 4, 7, 4, 5, 2, 2, 1, 1, 4, 4, 3, 5, 6, 9, 4, 5, ## 5, 9, 3, 9, 3, 1, 2, 0, 8, 2, 8, 5, 2, 4, 6, 8, 3, 9, 1, 0, 8, 1, 8, 5, 6, 8, 7, 1, 8, 2, 4, 9, 7, 0, 5, 5, 6, 1, 3, 0, 5, 8, 2, 0, 9, 8, 6, 7, 8, 4, 1, 0, 5, 2, 5, 1, 6, 4, 7, 1, 2, 6, 4, 4, ## 6, 3, 2, 3, 2, 6, 5, 2, 9, 4, 7, 0, 1, 0, 4, 3, 1, 2, 7, 9, 8, 5, 9, 5, 7, 0, 4, 8, 4, 9, 4, 0, 7, 7, 2, 5, 3, 5, 3, 9, 7, 5, 5, 2, 7, 0, 8, 9, 1, 7, 9, 8, 5, 0, 2, 0, 8, 7, 0, 9, 5, 5, 9, 6, ## 1, 2, 3, 9, 1, 3, 2, 9, 3, 4, 3, 4, 1, 0, 1, 8, 5, 0, 9, 2, 7, 2, 3, 5, 2, 6, 3, 4, 1, 5, 0, 5, 4, 6, 3, 2, 5, 0, 4, 3, 6, 0, 8, 6, 0, 0, 2, 2, 0, 1, 4, 6, 5, 0, 9, 5, 6, 8, 4, 4, 2, 8, 2, 9, ## 4, 7, 3, 8, 6, 3, 8, 6, 4, 7, 0, 6, 6, 8, 3, 8, 3, 8, 0, 1, 1, 5, 6, 8, 2, 2, 7, 6, 4, 0, 0, 2, 2, 9, 5, 8, 6, 7, 6, 4, 9, 6, 7, 2, 9, 2, 4, 9, 1, 3, 7, 8, 5, 3, 4, 3, 9, 1, 9, 1, 9, 2, 3, 5, ## 8, 1, 1, 7, 1, 7, 1, 6, 4, 5, 5, 5, 3, 1, 0, 4, 4, 6, 9, 0, 4, 2, 3, 5, 7, 9, 6, 4, 7, 5, 3, 8, 0, 6, 6, 4, 4, 3, 7, 4, 0, 4, 7, 4, 0, 9, 4, 5, 8, 6, 3, 4, 0, 5, 4, 2, 3, 3, 2, 1, 7, 9, 7, 3, ## 1, 1, 4, 3, 0, 5, 9, 5, 5, 7, 5, 0, 6, 1, 5, 7, 9, 0, 8, 3, 1, 3, 1, 5, 2, 3, 0, 1, 8, 7, 8, 0, 5, 5, 1, 8, 8, 3, 6, 0, 2, 7, 1, 6, 2, 4, 5, 1, 3, 0, 5, 5, 3, 8, 4, 0, 0, 1, 1, 4, 8, 7, 6, 1, ## 1, 5, 2, 1, 6, 4, 2, 1, 1, 9, 4, 3, 9, 6, 5, 0, 4, 7]) ``` ] ] .pull-right[ .small[ ```python mc_log_cv.best_estimator_.predict_proba(X_test), ``` ``` ## (array([[0. , 0. , 0. , 0. , 0. , 0. , 0. , 1. , 0. , 0. ], ## [0. , 1. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ], ## [0. , 0. , 0. , 0. , 0. , 0. , 0. , 1. , 0. , 0. ], ## [0. , 0. , 0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. ], ## [1. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ], ## [0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ], ## [0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. ], ## [0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. , 0. ], ## [0. , 0. , 0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. ], ## [0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. , 0. ], ## [0. , 0. , 0. , 0. , 0. , 0. , 0. , 1. , 0. , 0. ], ## [0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 1. , 0. ], ## [0. , 0. , 0. , 0. , 0. , 0. , 0. , 1. , 0. , 0. ], ## [0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. 
, 0. , 1. ],
## [0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. ],
## [0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. , 0. ],
## [0. , 0.71887, 0. , 0.28113, 0. , 0. , 0. , 0. , 0. , 0. ],
## [0. , 0. , 0. , 0. , 0. , 0. , 0. , 1. , 0. , 0. ],
## [0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 1. , 0. ],
## [0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. ],
## [1. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
## [0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. , 0. ],
## [0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 1. ],
## [0. , 1. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
## [0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. , 0. ],
## [0. , 0. , 0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. ],
## [0. , 0. , 0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. ],
## [1. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
## [0. , 0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. ],
## [0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. ],
## ...,
## [0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. , 0. ],
## [0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 1. , 0. ],
## [0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. ],
## [1. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
## [1. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
## [0. , 1. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
## [0. , 1. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
## [0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. ],
## [0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 1. , 0. ],
## [0. , 0. , 0. , 0. , 0. , 0. , 0. , 1. , 0. , 0. ],
## [0. , 0.0002 , 0. , 0. , 0. , 0. , 0.9998 , 0. , 0. , 0. ],
## [0. , 0.99893, 0. , 0. , 0. , 0. , 0. , 0. , 0.00107, 0. ],
## [0. , 1. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
## [0. , 0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. ],
## [0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
## [0. , 1. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
## [0. , 0. , 0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. ],
## [0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. ],
## [0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
## [0. , 1. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
## [0. , 1. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
## [0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 1. ],
## [0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. ],
## [0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. , 0. ],
## [0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 1. ],
## [0. , 0. , 0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. ],
## [0. , 0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. ],
## [1. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
## [0. , 0. , 0. , 0. , 1. , 0. , 0. , 0. , 0. , 0. ],
## [0. , 0. , 0. , 0. , 0. , 0. , 0. , 1. , 0. , 0. ]]),)
```
] ]

---

## Exercise 1

Using these data, fit a `DecisionTreeClassifier`. You should employ `GridSearchCV` to tune some of the parameters (`max_depth` at a minimum) - see the full list [here](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html).

Does this model perform better or worse than the multinomial regression model we just used?
```python from sklearn.datasets import load_digits digits = load_digits(as_frame=True) X, y = digits.data, digits.target X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.33, shuffle=True, random_state=1234 ) ``` --- ## Examining the coefs .small[ ```python coef_img = mc_log_cv.best_estimator_.coef_.reshape(10,8,8) fig, axes = plt.subplots(nrows=2, ncols=5, figsize=(10, 5), layout="constrained") axes2 = [ax for row in axes for ax in row] for ax, image, label in zip(axes2, coef_img, range(10)): ax.set_axis_off() img = ax.imshow(image, cmap=plt.cm.gray_r, interpolation="nearest") txt = ax.set_title(f"{label}") plt.show() ``` <img src="Lec16_files/figure-html/unnamed-chunk-41-13.png" width="66%" style="display: block; margin: auto;" /> ]
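
Since these coefficients are signed, a diverging colormap centered at zero can make the positive and negative weights easier to distinguish than `gray_r`. A minimal variant of the chunk above (a sketch, not run in these slides):

```python
# same reshape as above, but color by sign: blue = negative, red = positive
coef_img = mc_log_cv.best_estimator_.coef_.reshape(10, 8, 8)
vmax = np.abs(coef_img).max()

fig, axes = plt.subplots(nrows=2, ncols=5, figsize=(10, 5), layout="constrained")
for ax, image, label in zip(axes.flatten(), coef_img, range(10)):
    ax.set_axis_off()
    ax.imshow(image, cmap="RdBu_r", vmin=-vmax, vmax=vmax)
    ax.set_title(f"{label}")
plt.show()
```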