Lec 05 - NumPy Basics

# Lec 05 - NumPy Basics
## <br/> Statistical Computing and Computation
### Sta 663 | Spring 2022
### <br/> Dr. Colin Rundel

---

---

# NumPy Basics

---

## What is NumPy?

> NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.

```python
import numpy as np
```

---

## Arrays

In general NumPy arrays are constructed from sequences (e.g. lists), nesting as necessary for the number of desired dimensions.

```python
np.array([1,2,3])
```

```
## array([1, 2, 3])
```

```python
np.array([[1,2],[3,4]])
```

```
## array([[1, 2],
##        [3, 4]])
```

```python
np.array([[[1,2],[3,4]], [[5,6],[7,8]]])
```

```
## array([[[1, 2],
##         [3, 4]],
## 
##        [[5, 6],
##         [7, 8]]])
```
]

.pull-right[
Some properties of arrays:
* Arrays have a fixed size at creation
* All data must be homogeneous (consistent type)
* Built to support vectorized operations
* Avoids copying whenever possible

```python
np.array([True, False])
```

```
## array([ True, False])
```

```python
np.array(["abc", "def"])
```

```
## array(['abc', 'def'], dtype='<U3')
```
]

---

## dtype

NumPy arrays will have a specific type used for storing their data, called their `dtype`. This is accessible via the `.dtype` attribute and can be set at creation using the `dtype` argument.

```python
np.array([1,1]).dtype
```

```
## dtype('int64')
```

```python
np.array([1.1, 2.2]).dtype
```

```
## dtype('float64')
```

```python
np.array([True, False]).dtype
```

```
## dtype('bool')
```
]

```python
np.array([1,2,3], dtype = np.uint8)
```

```
## array([1, 2, 3], dtype=uint8)
```

```python
np.array([1,2,1000], dtype = np.uint8)
```

```
## array([  1,   2, 232], dtype=uint8)
```

```python
np.array([3.14159, 2.33333], dtype = np.double)
```

```
## array([3.14159, 2.33333])
```

```python
np.array([3.14159, 2.33333], dtype = np.float16)
```

```
## array([3.14 , 2.334], dtype=float16)
```
]

.footnote[
See here for [here](https://numpy.org/doc/stable/user/basics.types.html#array-types-and-conversions-between-types) for a list of dtypes and [here](https://numpy.org/doc/stable/reference/arrays.dtypes.html) for a more detailed description of how they are implemented.
]

---

## Creating 1d arrays

Some common tools for creating useful 1d arrays:

```python
np.arange(10)
```

```
## array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
```

```python
np.arange(3, 5, 0.25)
```

```
## array([3.  , 3.25, 3.5 , 3.75, 4.  , 4.25, 4.5 , 4.75])
```

```python
np.linspace(0, 1, 11)
```

```
## array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])
```

```python
np.logspace(0, 2, 11)
```

```
## array([  1.        ,   1.58489319,   2.51188643,   3.98107171,
##          6.30957344,  10.        ,  15.84893192,  25.11886432,
##         39.81071706,  63.09573445, 100.        ])
```
]

```python
np.ones(4)
```

```
## array([1., 1., 1., 1.])
```

```python
np.zeros(6)
```

```
## array([0., 0., 0., 0., 0., 0.])
```

```python
np.full(3, False)
```

```
## array([False, False, False])
```

```python
np.empty(4)
```

```
## array([1., 1., 1., 1.])
```
]

.footnote[
For the full list of creation functions see [here](https://numpy.org/doc/stable/reference/routines.array-creation.html)
]

---

## Creating 2d arrays (matrices)

Many of the same functions exist with some additional useful tools for common matrices,

```python
np.eye(3)
```

```
## array([[1., 0., 0.],
##        [0., 1., 0.],
##        [0., 0., 1.]])
```

```python
np.identity(2)
```

```
## array([[1., 0.],
##        [0., 1.]])
```

```python
np.zeros((2,2))
```

```
## array([[0., 0.],
##        [0., 0.]])
```
]

```python
np.diag([3,2,1])
```

```
## array([[3, 0, 0],
##        [0, 2, 0],
##        [0, 0, 1]])
```

```python
np.tri(3)
```

```
## array([[1., 0., 0.],
##        [1., 1., 0.],
##        [1., 1., 1.]])
```

```python
np.triu(np.full((3,3),3))
```

```
## array([[3, 3, 3],
##        [0, 3, 3],
##        [0, 0, 3]])
```
]

.footnote[
The NumPy documentation references a `matrix` class and related functions - this is no longer recommended, use the `ndarray` class instead.
]

---

## Creating nd arrays

For higher dimensional arrays just add dimensions when constructing,

```python
np.zeros((2,3,2))
```

```
## array([[[0., 0.],
##         [0., 0.],
##         [0., 0.]],
## 
##        [[0., 0.],
##         [0., 0.],
##         [0., 0.]]])
```
]

```python
np.ones((2,3,2,2))
```

```
## array([[[[1., 1.],
##          [1., 1.]],
## 
##         [[1., 1.],
##          [1., 1.]],
## 
##         [[1., 1.],
##          [1., 1.]]],
## 
## 
##        [[[1., 1.],
##          [1., 1.]],
## 
##         [[1., 1.],
##          [1., 1.]],
## 
##         [[1., 1.],
##          [1., 1.]]]])
```
]

---

## Subsetting

Arrays are subsetted using the standard python syntax with either indexes or slices, different dimensions are separated by commas.

```python
x = np.array([[1,2,3],[4,5,6],[7,8,9]])
x
```

```
## array([[1, 2, 3],
##        [4, 5, 6],
##        [7, 8, 9]])
```

```python
x[0]
```

```
## array([1, 2, 3])
```

```python
x[0,0]
```

```
## 1
```

```python
x[0][0]
```

```
## 1
```

```python
x[0:3:2, :]
```

```
## array([[1, 2, 3],
##        [7, 8, 9]])
```

```python
x[0:3:2, :]
```

```
## array([[1, 2, 3],
##        [7, 8, 9]])
```

```python
x[0:3:2, ]
```

```
## array([[1, 2, 3],
##        [7, 8, 9]])
```

```python
x[1:, ::-1]
```

```
## array([[6, 5, 4],
##        [9, 8, 7]])
```

---

## Views and copies

Basic subsetting of ndarray objects does not result in a new object, but instead a "view" of the original object. There are a couple of ways that we can investigate this behavior,

```python
x = np.arange(10)
y = x[2:5]
z = x[2:5].copy()
```

```python
print("x =", x, ", x.base =", x.base)
```

```
## x = [0 1 2 3 4 5 6 7 8 9] , x.base = None
```

```python
print("y =", y, ", y.base =", y.base)
```

```
## y = [2 3 4] , y.base = [0 1 2 3 4 5 6 7 8 9]
```

```python
print("z =", z, ", z.base =", z.base)
```

```
## z = [2 3 4] , z.base = None
```

```python
type(x); type(y); type(z)
```

```
## <class 'numpy.ndarray'>
## <class 'numpy.ndarray'>
## <class 'numpy.ndarray'>
```
]

```python
np.shares_memory(x,y)
```

```
## True
```

```python
np.shares_memory(x,z)
```

```
## False
```

```python
np.shares_memory(y,z)
```

```
## False
```

```python
y.flags
```

```
##   C_CONTIGUOUS : True
##   F_CONTIGUOUS : True
##   OWNDATA : False
##   WRITEABLE : True
##   ALIGNED : True
##   WRITEBACKIFCOPY : False
##   UPDATEIFCOPY : False
```
]

---

## Subsetting with ...

There is some special syntax available using `...` which expands to the number of `:` needed to account for all dimensions,

```python
x = np.arange(16).reshape(2,2,2,2)
x
```
]

```
## array([[[[ 0,  1],
##          [ 2,  3]],
## 
##         [[ 4,  5],
##          [ 6,  7]]],
## 
## 
##        [[[ 8,  9],
##          [10, 11]],
## 
##         [[12, 13],
##          [14, 15]]]])
```
]

<div>
.pull-left[

```python
x[0, 1, ...]
```

```
## array([[4, 5],
##        [6, 7]])
```

```python
x[..., 1]
```

```
## array([[[ 1,  3],
##         [ 5,  7]],
## 
##        [[ 9, 11],
##         [13, 15]]])
```
]

```python
x[0, 1, :, :]
```

```
## array([[4, 5],
##        [6, 7]])
```

```python
x[:, :, :, 1]
```

```
## array([[[ 1,  3],
##         [ 5,  7]],
## 
##        [[ 9, 11],
##         [13, 15]]])
```
]
</div>

---

## Subsetting with tuples

Unlike lists, an ndarray can be subset by a tuple containing integers,

```python
x = np.arange(6)
x
```

```
## array([0, 1, 2, 3, 4, 5])
```

```python
x[(0,1,3),]
```

```
## array([0, 1, 3])
```

```python
x[(0,1,3)]
```

```
## Error in py_call_impl(callable, dots$args, dots$keywords): IndexError: too many indices for array: array is 1-dimensional, but 3 were indexed
## 
## Detailed traceback:
##   File "<string>", line 1, in <module>
```
]

```python
x = np.arange(16).reshape((4,4))
x
```

```
## array([[ 0,  1,  2,  3],
##        [ 4,  5,  6,  7],
##        [ 8,  9, 10, 11],
##        [12, 13, 14, 15]])
```

```python
x[(0,1,3), :]
```

```
## array([[ 0,  1,  2,  3],
##        [ 4,  5,  6,  7],
##        [12, 13, 14, 15]])
```

```python
x[:, (0,1,3)]
```

```
## array([[ 0,  1,  3],
##        [ 4,  5,  7],
##        [ 8,  9, 11],
##        [12, 13, 15]])
```

```python
x[(0,1,3), (0,1,3)]
```

```
## array([ 0,  5, 15])
```
]

---

## Subsetting assignment

Most of the subsetting approaches we've just seen can also be used for assignment, just keep in mind that we cannot change the size or type of the ndarray,

```python
x = np.arange(9).reshape((3,3)); x
```

```
## array([[0, 1, 2],
##        [3, 4, 5],
##        [6, 7, 8]])
```

```python
x[0,0] = -1
x
```

```
## array([[-1,  1,  2],
##        [ 3,  4,  5],
##        [ 6,  7,  8]])
```

```python
x[0, :] = -2
x
```

```
## array([[-2, -2, -2],
##        [ 3,  4,  5],
##        [ 6,  7,  8]])
```
]

```python
x[0:2,1:3] = -3
x
```

```
## array([[-2, -3, -3],
##        [ 3, -3, -3],
##        [ 6,  7,  8]])
```

```python
x[(0,1,2), (0,1,2)] = -4
x
```

```
## array([[-4, -3, -3],
##        [ 3, -4, -3],
##        [ 6,  7, -4]])
```
]

---

## Reshaping arrays

The dimensions of an array can be retrieved via the `.shape` attribute, these values can changed by

```python
x = np.arange(6)
x
```

```
## array([0, 1, 2, 3, 4, 5])
```

```python
y = x.reshape((2,3))
y
```

```
## array([[0, 1, 2],
##        [3, 4, 5]])
```

```python
np.shares_memory(x,y)
```

```
## True
```
]

```python
z = x
z.shape = (2,3)
z
```

```
## array([[0, 1, 2],
##        [3, 4, 5]])
```

```python
x
```

```
## array([[0, 1, 2],
##        [3, 4, 5]])
```

```python
np.shares_memory(x,z)
```

```
## True
```
]

---

## Implicit dimensions

When reshaping an array, the value `-1` can be used to automatically calculate a dimension,

```python
x = np.arange(6)
x
```

```
## array([0, 1, 2, 3, 4, 5])
```

```python
x.reshape((2,-1))
```

```
## array([[0, 1, 2],
##        [3, 4, 5]])
```

```python
x.reshape((-1,3,2))
```

```
## array([[[0, 1],
##         [2, 3],
##         [4, 5]]])
```

```python
x.reshape(-1)
```

```
## array([0, 1, 2, 3, 4, 5])
```

```python
x.reshape((-1,4))
```

```
## Error in py_call_impl(callable, dots$args, dots$keywords): ValueError: cannot reshape array of size 6 into shape (4)
## 
## Detailed traceback:
##   File "<string>", line 1, in <module>
```

---

## Flattening arrays

We've just seen the most common approach to flattening an array (`.reshape(-1)`), there are two additional methods / functions:
* `ravel` which creates flattened *view* of the array and
* `flatten` which creates a flattened *copy* of the array.

```python
x = np.arange(6).reshape((2,3))
x
```

```
## array([[0, 1, 2],
##        [3, 4, 5]])
```

```python
y = x.ravel()
y
```

```
## array([0, 1, 2, 3, 4, 5])
```

```python
np.shares_memory(x,y)
```

```
## True
```
]

```python
z = x.flatten()
z
```

```
## array([0, 1, 2, 3, 4, 5])
```

```python
np.shares_memory(x,z)
```

```
## False
```
]

---

## Resizing

The size of an array cannot be changed but a new array with a different size can be created from an existing array via the resize function and method. Note these have different behaviors around what values the new entries will have.

```python
x = np.resize(np.ones((2,2)), (3,3))
x
```

```
## array([[1., 1., 1.],
##        [1., 1., 1.],
##        [1., 1., 1.]])
```
]

```python
y = np.ones((2,2))
y.resize((3,3), refcheck=False)
y
```

```
## array([[1., 1., 1.],
##        [1., 0., 0.],
##        [0., 0., 0.]])
```
]

.footnote[
The `refcheck` argument should not generally be needed, here it is a side effect of using reticulate (I think).
]

---

## Joining arrays

`concatenate()` is a general purpose function for this, with specialized versions `hstack()`, `vstack()`, and `dstack()` for rows, columns, and slices respectively.

```python
x = np.arange(4).reshape((2,2)); x
```

```
## array([[0, 1],
##        [2, 3]])
```
]

```python
y = np.arange(4,8).reshape((2,2)); y
```

```
## array([[4, 5],
##        [6, 7]])
```
]
]

<div>
.small[
.pull-left[

```python
np.concatenate((x,y), axis=0)
```

```
## array([[0, 1],
##        [2, 3],
##        [4, 5],
##        [6, 7]])
```

```python
np.concatenate((x,y), axis=1)
```

```
## array([[0, 1, 4, 5],
##        [2, 3, 6, 7]])
```

```python
np.concatenate((x,y), axis=2)
```

```
## Error in py_call_impl(callable, dots$args, dots$keywords): AxisError: axis 2 is out of bounds for array of dimension 2
## 
## Detailed traceback:
##   File "<string>", line 1, in <module>
##   File "<__array_function__ internals>", line 180, in concatenate
```

```python
np.concatenate((x,y), axis=None)
```

```
## array([0, 1, 2, 3, 4, 5, 6, 7])
```
]

```python
np.vstack((x,y))
```

```
## array([[0, 1],
##        [2, 3],
##        [4, 5],
##        [6, 7]])
```

```python
np.hstack((x,y))
```

```
## array([[0, 1, 4, 5],
##        [2, 3, 6, 7]])
```

```python
np.dstack((x,y))
```

```
## array([[[0, 4],
##         [1, 5]],
## 
##        [[2, 6],
##         [3, 7]]])
```
]
]
</div>

---
class: middle, center

# NumPy numerics

---

## Basic operators

All of the basic mathematical operators in Python are implemented for arrays, they are applied element-wise to the array values.

```python
np.arange(3) + np.arange(3)
```

```
## array([0, 2, 4])
```

```python
np.arange(3) - np.arange(3)
```

```
## array([0, 0, 0])
```

```python
np.arange(3) + 2
```

```
## array([2, 3, 4])
```
]

```python
np.arange(3) * np.arange(3)
```

```
## array([0, 1, 4])
```

```python
np.arange(1,4) / np.arange(1,4)
```

```
## array([1., 1., 1.])
```

```python
np.arange(3) * 3
```

```
## array([0, 3, 6])
```
]

```python
np.full((2,2), 2) ** np.arange(4).reshape((2,2))
```

```
## array([[1, 2],
##        [4, 8]])
```

```python
np.full((2,2), 2) ** np.arange(4)
```

```
## Error in py_call_impl(callable, dots$args, dots$keywords): ValueError: operands could not be broadcast together with shapes (2,2) (4,) 
## 
## Detailed traceback:
##   File "<string>", line 1, in <module>
```

---

## Mathematical functions

The package provides a [wide variety](https://numpy.org/doc/stable/reference/routines.math.html) of basic mathematical functions that are vectorized, in general they will be faster than their base equivalents (e.g. `np.sum()` vs `sum()`),

```python
np.sum(np.arange(1000))
```

```
## 499500
```

```python
np.cumsum(np.arange(10))
```

```
## array([ 0,  1,  3,  6, 10, 15, 21, 28, 36, 45])
```

```python
np.log10(np.arange(1,11))
```

```
## array([0.        , 0.30103   , 0.47712125, 0.60205999, 0.69897   ,
##        0.77815125, 0.84509804, 0.90308999, 0.95424251, 1.        ])
```

```python
np.median(np.arange(10))
```

```
## 4.5
```

---

## Matrix multiplication

Is supported using the `matmul()` function or the `@` operator,

```python
x = np.arange(6).reshape(3,2)
y = np.tri(2,2)
```

```python
x @ y
```

```
## array([[1., 1.],
##        [5., 3.],
##        [9., 5.]])
```

```python
y.T @ y
```

```
## array([[2., 1.],
##        [1., 1.]])
```

```python
np.matmul(x.T, x)
```

```
## array([[20, 26],
##        [26, 35]])
```

```python
y @ x
```

```
## Error in py_call_impl(callable, dots$args, dots$keywords): ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 3 is different from 2)
## 
## Detailed traceback:
##   File "<string>", line 1, in <module>
```

---

## Other linear algebra functions

All of the other common linear algebra functions are (mostly) implemented in the `linalg` submodule. See [here](https://numpy.org/doc/stable/reference/routines.linalg.html) for more details.

```python
np.linalg.det(y)
```

```
## 1.0
```

```python
np.linalg.eig(x.T @ x)
```

```
## (array([ 0.43988174, 54.56011826]), array([[-0.79911221, -0.6011819 ],
##        [ 0.6011819 , -0.79911221]]))
```

```python
np.linalg.inv(x.T @ x)
```

```
## array([[ 1.45833333, -1.08333333],
##        [-1.08333333,  0.83333333]])
```

```python
np.linalg.cholesky(x.T @ x)
```

```
## array([[4.47213595, 0.        ],
##        [5.81377674, 1.09544512]])
```

---

## Random values

NumPy has another submodule called `random` for functions used to generate random values,

In order to use this, you should construct a generator via `default_rng()`, with or without a seed, and then use the generator's methods to obtain your desired random values.

```python
rng = np.random.default_rng(seed = 1234)

rng.random(3) # ~ Uniform [0,1)
```

```
## array([0.97669977, 0.38019574, 0.92324623])
```

```python
rng.normal(0, 2, size = (2,2))
```

```
## array([[ 0.30523839,  1.72748778],
##        [ 5.82619845, -2.95764672]])
```

```python
rng.binomial(n=5, p=0.5, size = 10)
```

```
## array([2, 4, 2, 2, 3, 4, 4, 3, 3, 3])
```

---
class: middle, center

## Example - Linear regression with NumPy