class: center, middle, inverse, title-slide # Lec 06 - Advanced indexing & Broadcasting ##
Statistical Computing and Computation ### Sta 663 | Spring 2022 ###
Dr. Colin Rundel --- exclude: true ```python import numpy as np np.set_printoptions(edgeitems=3, linewidth=180) ``` --- class: middle, center # NumPy - Advanced Indexing --- ## From last time: subsetting with tuples Unlike lists, an ndarray can be subset by a tuple containing integers, .pull-left[ ```python x = np.arange(6) x ``` ``` ## array([0, 1, 2, 3, 4, 5]) ``` ```python x[(0,1,3),] ``` ``` ## array([0, 1, 3]) ``` ```python x[(0,1,3)] ``` ``` ## Error in py_call_impl(callable, dots$args, dots$keywords): IndexError: too many indices for array: array is 1-dimensional, but 3 were indexed ## ## Detailed traceback: ## File "<string>", line 1, in <module> ``` ```python x[0,1,3] ``` ``` ## Error in py_call_impl(callable, dots$args, dots$keywords): IndexError: too many indices for array: array is 1-dimensional, but 3 were indexed ## ## Detailed traceback: ## File "<string>", line 1, in <module> ``` ] .pull-right[ ```python x = np.arange(16).reshape((4,4)) x ``` ``` ## array([[ 0, 1, 2, 3], ## [ 4, 5, 6, 7], ## [ 8, 9, 10, 11], ## [12, 13, 14, 15]]) ``` ```python x[(0,1,3), :] ``` ``` ## array([[ 0, 1, 2, 3], ## [ 4, 5, 6, 7], ## [12, 13, 14, 15]]) ``` ```python x[:, (0,1,3)] ``` ``` ## array([[ 0, 1, 3], ## [ 4, 5, 7], ## [ 8, 9, 11], ## [12, 13, 15]]) ``` ```python x[(0,1,3), (0,1,3)] ``` ``` ## array([ 0, 5, 15]) ``` ] .footnote[ More next time on why `x[(0,1,3)]` does not work. ] --- ## Integer array subsetting (lists) Lists of integers can be used to subset in the same way: .small[ .pull-left[ ```python x = np.arange(6) x ``` ``` ## array([0, 1, 2, 3, 4, 5]) ``` ```python x[[0,1,3],] ``` ``` ## array([0, 1, 3]) ``` ```python x[[0,1,3]] ``` ``` ## array([0, 1, 3]) ``` ```python x[[1.,3.]] ``` ``` ## Error in py_call_impl(callable, dots$args, dots$keywords): IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices ## ## Detailed traceback: ## File "<string>", line 1, in <module> ``` ] .pull-right[ ```python x = np.arange(16).reshape((4,4)) x ``` ``` ## array([[ 0, 1, 2, 3], ## [ 4, 5, 6, 7], ## [ 8, 9, 10, 11], ## [12, 13, 14, 15]]) ``` ```python x[[1,3]] ``` ``` ## array([[ 4, 5, 6, 7], ## [12, 13, 14, 15]]) ``` ```python x[[1,3], ] ``` ``` ## array([[ 4, 5, 6, 7], ## [12, 13, 14, 15]]) ``` ```python x[:, [1,3]] ``` ``` ## array([[ 1, 3], ## [ 5, 7], ## [ 9, 11], ## [13, 15]]) ``` ```python x[[1,3], [1,3]] ``` ``` ## array([ 5, 15]) ``` ] ] .footnote[ Note that the `,` is now optional ] --- ## Integer array subsetting (ndarrays) Similarly we can also us integer ndarrays: .small[ .pull-left[ ```python x = np.arange(6) y = np.array([0,1,3]) z = np.array([1., 3.]) x[y,] ``` ``` ## array([0, 1, 3]) ``` ```python x[y] ``` ``` ## array([0, 1, 3]) ``` ```python x[z] ``` ``` ## Error in py_call_impl(callable, dots$args, dots$keywords): IndexError: arrays used as indices must be of integer (or boolean) type ## ## Detailed traceback: ## File "<string>", line 1, in <module> ``` ] .pull-right[ ```python x = np.arange(16).reshape((4,4)) x ``` ``` ## array([[ 0, 1, 2, 3], ## [ 4, 5, 6, 7], ## [ 8, 9, 10, 11], ## [12, 13, 14, 15]]) ``` ```python y = np.array([1,3]) x[y] ``` ``` ## array([[ 4, 5, 6, 7], ## [12, 13, 14, 15]]) ``` ```python x[y, ] ``` ``` ## array([[ 4, 5, 6, 7], ## [12, 13, 14, 15]]) ``` ```python x[:, y] ``` ``` ## array([[ 1, 3], ## [ 5, 7], ## [ 9, 11], ## [13, 15]]) ``` ```python x[y, y] ``` ``` ## array([ 5, 15]) ``` ] ] .footnote[ Again the `,` is now optional ] --- ## Exercise 1 Given the following matrix, ```python x = np.arange(16).reshape((4,4)) x ``` ``` ## array([[ 0, 1, 2, 3], ## [ 4, 5, 6, 7], ## [ 8, 9, 10, 11], ## [12, 13, 14, 15]]) ``` write an expression to obtaint the center 2x2 values (i.e. 5, 6, 9, 10 as a matrix). --- ## Boolean indexing Lists or ndarrays of boolean values can also be used to subset, positions with `True` are kept and `False` are discarded. .small[ ```python x = np.arange(6) x ``` ``` ## array([0, 1, 2, 3, 4, 5]) ``` ```python x[[True, False, True, False, True, False]] ``` ``` ## array([0, 2, 4]) ``` ```python x[np.array([True, True, False, False, True, False])] ``` ``` ## array([0, 1, 4]) ``` ] -- the utility comes from vectorized comparison operations, .small[.pull-left[ ```python x > 3 ``` ``` ## array([False, False, False, False, True, True]) ``` ```python x[x>3] ``` ``` ## array([4, 5]) ``` ```python x % 2 == 1 ``` ``` ## array([False, True, False, True, False, True]) ``` ```python x[x % 2 == 1] ``` ``` ## array([1, 3, 5]) ``` ] .pull-right[ ```python y = np.arange(9).reshape((3,3)) y % 2 == 0 ``` ``` ## array([[ True, False, True], ## [False, True, False], ## [ True, False, True]]) ``` ```python y[y % 2 == 0] ``` ``` ## array([0, 2, 4, 6, 8]) ``` ] ] --- ## NumPy and Boolean operators If we want to use a boolean operator on an array we need to use `&`, `|`, and `~` instead of `and`, `or`, and `not` respectively. ```python x = np.arange(6) x ``` ``` ## array([0, 1, 2, 3, 4, 5]) ``` ```python y = x % 2 == 0 y ``` ``` ## array([ True, False, True, False, True, False]) ``` ```python ~y ``` ``` ## array([False, True, False, True, False, True]) ``` ```python y & (x > 3) ``` ``` ## array([False, False, False, False, True, False]) ``` ```python y | (x > 3) ``` ``` ## array([ True, False, True, False, True, True]) ``` --- ## meshgrid One other useful function in NumPy is `meshgrid()` which generates all possible combinations between the input vectors, ```python pts = np.arange(3) x, y = np.meshgrid(pts, pts) x ``` ``` ## array([[0, 1, 2], ## [0, 1, 2], ## [0, 1, 2]]) ``` ```python y ``` ``` ## array([[0, 0, 0], ## [1, 1, 1], ## [2, 2, 2]]) ``` ```python np.sqrt(x**2 + y**2) ``` ``` ## array([[0. , 1. , 2. ], ## [1. , 1.41421356, 2.23606798], ## [2. , 2.23606798, 2.82842712]]) ``` --- ## Exercise 2 We will now use this to attempt a simple brute force approach to numerical optimization, define a grid of points using `meshgrid()` to approximate the minima the following function: $$ f(x,y) = (1-x)^2 + 100(y-x^2)^2 $$ Considering values of `\(x,y \in (-1,3)\)`, which values of `\(x,y\)` minimize this function? --- <img src="Lec06_files/figure-html/unnamed-chunk-14-1.png" width="60%" style="display: block; margin: auto;" /> --- class: middle, center # NumPy - Broadcasting --- ## Broadcasting This is an approach for deciding how to generalize arithmetic operations between arrays with differing shapes. ```python x = np.array([1, 2, 3]) x * 2 ``` ``` ## array([2, 4, 6]) ``` ```python x * np.array([2]) ``` ``` ## array([2, 4, 6]) ``` ```python x * np.array([2,2,2]) ``` ``` ## array([2, 4, 6]) ``` In the first example `2` is equivalent to the array `np.array([2])` which is being broadcast across the longer array `x`. --- ## Efficiancy Using broadcasts can be much more efficient as it does not copy the underlying data, ```python x = np.arange(1e5) y = np.array([2]).repeat(1e5) ``` ```python %timeit x * 2 ``` ``` 31.3 µs ± 1.3 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each) ``` ```python %timeit x * y ``` ``` 70.5 µs ± 2.93 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each) ``` --- ## General Broadcasting > When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (i.e. rightmost) dimensions and works its way left. Two dimensions are compatible when > 1. they are equal, or > > 2. one of them is 1 > > If these conditions are not met, a `ValueError: operands could not be broadcast together` exception is thrown, indicating that the arrays have incompatible shapes. The size of the resulting array is the size that is not 1 along each axis of the inputs. .small[ .pull-left[ ```python x = np.arange(12).reshape((4,3)) x ``` ``` ## array([[ 0, 1, 2], ## [ 3, 4, 5], ## [ 6, 7, 8], ## [ 9, 10, 11]]) ``` ```python x + np.array([1,2,3]) ``` ``` ## array([[ 1, 3, 5], ## [ 4, 6, 8], ## [ 7, 9, 11], ## [10, 12, 14]]) ``` ] .pull-right[ ```python x = np.arange(12).reshape((3,4)) x ``` ``` ## array([[ 0, 1, 2, 3], ## [ 4, 5, 6, 7], ## [ 8, 9, 10, 11]]) ``` ```python x + np.array([1,2,3]) ``` ``` ## Error in py_call_impl(callable, dots$args, dots$keywords): ValueError: operands could not be broadcast together with shapes (3,4) (3,) ## ## Detailed traceback: ## File "<string>", line 1, in <module> ``` ] ] --- ## A quick fix ```python x = np.arange(12).reshape((3,4)) x ``` ``` ## array([[ 0, 1, 2, 3], ## [ 4, 5, 6, 7], ## [ 8, 9, 10, 11]]) ``` ```python x + np.array([1,2,3]) ``` ``` ## Error in py_call_impl(callable, dots$args, dots$keywords): ValueError: operands could not be broadcast together with shapes (3,4) (3,) ## ## Detailed traceback: ## File "<string>", line 1, in <module> ``` ```python x + np.array([1,2,3]).reshape(3,1) ``` ``` ## array([[ 1, 2, 3, 4], ## [ 6, 7, 8, 9], ## [11, 12, 13, 14]]) ``` --- ## Mechanics .pull-left[ ```python x = np.arange(12).reshape((4,3)) y = 1 x+y ``` ``` ## array([[ 1, 2, 3], ## [ 4, 5, 6], ## [ 7, 8, 9], ## [10, 11, 12]]) ``` ``` x (2d array): 4 x 3 y (1d array): 1 ---------------------- x+y (2d array): 4 x 3 ``` ```python x = np.arange(12).reshape((4,3)) y = np.array([1,2,3]) x+y ``` ``` ## array([[ 1, 3, 5], ## [ 4, 6, 8], ## [ 7, 9, 11], ## [10, 12, 14]]) ``` ``` x (2d array): 4 x 3 y (1d array): 3 ---------------------- x+y (2d array): 4 x 3 ``` ] -- .pull-right[ ```python x = np.arange(12).reshape((3,4)) y = np.array([1,2,3]) x+y ``` ``` ## Error in py_call_impl(callable, dots$args, dots$keywords): ValueError: operands could not be broadcast together with shapes (3,4) (3,) ## ## Detailed traceback: ## File "<string>", line 1, in <module> ``` ``` x (2d array): 3 x 4 y (1d array): 3 ---------------------- x+y (2d array): Error ``` ```python x = np.arange(12).reshape((3,4)) y = np.array([1,2,3]).reshape((3,1)) x+y ``` ``` ## array([[ 1, 2, 3, 4], ## [ 6, 7, 8, 9], ## [11, 12, 13, 14]]) ``` ``` x (2d array): 3 x 4 y (1d array): 3 x 1 ---------------------- x+y (2d array): 3 x 4 ``` ] --- ## Another example .small[ .pull-left[ ```python a = np.array([0,10,20,30]).reshape((4,1)) b = np.array([1,2,3]) a ``` ``` ## array([[ 0], ## [10], ## [20], ## [30]]) ``` ```python b ``` ``` ## array([1, 2, 3]) ``` ] .pull-right[ ```python a+b ``` ``` ## array([[ 1, 2, 3], ## [11, 12, 13], ## [21, 22, 23], ## [31, 32, 33]]) ``` ``` a (2d array): 4 x 1 b (1d array): 3 ---------------------- x+y (2d array): 4 x 3 ``` ] ] -- <img src="imgs/numpy_broadcasting.png" width="60%" style="display: block; margin: auto;" /> .footnote[ From NumPy user guide - [Broadcasting](https://numpy.org/doc/stable/user/basics.broadcasting.html) ] --- ## Example - Standardizing Below we generate a data set with 3 columns of random normal values. Each column has a different mean and standard deviation which we can check with `mean()` and `std()`. ```python rng = np.random.default_rng(1234) d = rng.normal(loc=[-1,0,1], scale=[1,2,3], size=(1000,3)) d.mean(axis=0) ``` ``` ## array([-1.0294382 , -0.01396257, 1.01241784]) ``` ```python d.std(axis=0) ``` ``` ## array([0.99674719, 2.03222595, 3.10625219]) ``` Use broadcasting to standardize all three columns to have mean 0 and standard deviation 1. Check the new data set using `mean()` and `std()`. --- ## Exercise 3 For each of the following combinations determine what the resulting dimension will be: * A (128 x 128 x 3) + B (3) * A (8 x 1 x 6 x 1) + B (7 x 1 x 5) * A (2 x 1) + B (8 x 4 x 3) * A (3 x 1) + B (15 x 3 x 5) * A (3) + B (4) --- ## Broadcasting and assignment In addition to arithmetic operators, broadcasting can be used with assignment via array indexing, ```python x = np.arange(12).reshape((3,4)) y = -np.arange(4) z = -np.arange(3) ``` .pull-left[ ```python x[:] = y x ``` ``` ## array([[ 0, -1, -2, -3], ## [ 0, -1, -2, -3], ## [ 0, -1, -2, -3]]) ``` ```python x[...] = y x ``` ``` ## array([[ 0, -1, -2, -3], ## [ 0, -1, -2, -3], ## [ 0, -1, -2, -3]]) ``` ] .pull-right[ ```python x[:] = z ``` ``` ## Error in py_call_impl(callable, dots$args, dots$keywords): ValueError: could not broadcast input array from shape (3,) into shape (3,4) ## ## Detailed traceback: ## File "<string>", line 1, in <module> ``` ```python x[:] = z.reshape((3,1)) x ``` ``` ## array([[ 0, 0, 0, 0], ## [-1, -1, -1, -1], ## [-2, -2, -2, -2]]) ``` ] --- class: middle, center # NumPy - Basic file IO --- ## Reading and writing arrays We will not spend much time on this as most data you will encounter is more likely to be a tabular format (e.g. data frame) and tools like Pandas are more appropriate. For basic saving and loading of NumPy arrays there are the `save()` and `load()` functions which use a built in binary format. ```python x = np.arange(1e5) np.save("data/x.npy", x) new_x = np.load("data/x.npy") np.all(x == new_x) ``` ``` ## True ``` Additional functions for saving (`savez()`, `savez_compressed()`, `savetxt()`) exist for saving multiple arrays or saving a text representation of an array. --- ## Reading delimited data While not particularly recommended, if you need to read delimited (csv, tsv, etc.) data into a NumPy array you can use `genfromtxt()`, ```r options(width=300) ``` ```python with open("data/mtcars.csv") as file: mtcars = np.genfromtxt(file, delimiter=",", skip_header=True) mtcars ``` ``` ## array([[ 6. , 160. , 110. , 3.9 , 2.62 , 16.46 , 0. , 1. , 4. , 4. ], ## [ 6. , 160. , 110. , 3.9 , 2.875, 17.02 , 0. , 1. , 4. , 4. ], ## [ 4. , 108. , 93. , 3.85 , 2.32 , 18.61 , 1. , 1. , 4. , 1. ], ## [ 6. , 258. , 110. , 3.08 , 3.215, 19.44 , 1. , 0. , 3. , 1. ], ## [ 8. , 360. , 175. , 3.15 , 3.44 , 17.02 , 0. , 0. , 3. , 2. ], ## [ 6. , 225. , 105. , 2.76 , 3.46 , 20.22 , 1. , 0. , 3. , 1. ], ## [ 8. , 360. , 245. , 3.21 , 3.57 , 15.84 , 0. , 0. , 3. , 4. ], ## [ 4. , 146.7 , 62. , 3.69 , 3.19 , 20. , 1. , 0. , 4. , 2. ], ## [ 4. , 140.8 , 95. , 3.92 , 3.15 , 22.9 , 1. , 0. , 4. , 2. ], ## [ 6. , 167.6 , 123. , 3.92 , 3.44 , 18.3 , 1. , 0. , 4. , 4. ], ## [ 6. , 167.6 , 123. , 3.92 , 3.44 , 18.9 , 1. , 0. , 4. , 4. ], ## [ 8. , 275.8 , 180. , 3.07 , 4.07 , 17.4 , 0. , 0. , 3. , 3. ], ## [ 8. , 275.8 , 180. , 3.07 , 3.73 , 17.6 , 0. , 0. , 3. , 3. ], ## [ 8. , 275.8 , 180. , 3.07 , 3.78 , 18. , 0. , 0. , 3. , 3. ], ## [ 8. , 472. , 205. , 2.93 , 5.25 , 17.98 , 0. , 0. , 3. , 4. ], ## [ 8. , 460. , 215. , 3. , 5.424, 17.82 , 0. , 0. , 3. , 4. ], ## [ 8. , 440. , 230. , 3.23 , 5.345, 17.42 , 0. , 0. , 3. , 4. ], ## [ 4. , 78.7 , 66. , 4.08 , 2.2 , 19.47 , 1. , 1. , 4. , 1. ], ## [ 4. , 75.7 , 52. , 4.93 , 1.615, 18.52 , 1. , 1. , 4. , 2. ], ## [ 4. , 71.1 , 65. , 4.22 , 1.835, 19.9 , 1. , 1. , 4. , 1. ], ## [ 4. , 120.1 , 97. , 3.7 , 2.465, 20.01 , 1. , 0. , 3. , 1. ], ## [ 8. , 318. , 150. , 2.76 , 3.52 , 16.87 , 0. , 0. , 3. , 2. ], ## [ 8. , 304. , 150. , 3.15 , 3.435, 17.3 , 0. , 0. , 3. , 2. ], ## [ 8. , 350. , 245. , 3.73 , 3.84 , 15.41 , 0. , 0. , 3. , 4. ], ## [ 8. , 400. , 175. , 3.08 , 3.845, 17.05 , 0. , 0. , 3. , 2. ], ## [ 4. , 79. , 66. , 4.08 , 1.935, 18.9 , 1. , 1. , 4. , 1. ], ## [ 4. , 120.3 , 91. , 4.43 , 2.14 , 16.7 , 0. , 1. , 5. , 2. ], ## [ 4. , 95.1 , 113. , 3.77 , 1.513, 16.9 , 1. , 1. , 5. , 2. ], ## [ 8. , 351. , 264. , 4.22 , 3.17 , 14.5 , 0. , 1. , 5. , 4. ], ## [ 6. , 145. , 175. , 3.62 , 2.77 , 15.5 , 0. , 1. , 5. , 6. ], ## [ 8. , 301. , 335. , 3.54 , 3.57 , 14.6 , 0. , 1. , 5. , 8. ], ## [ 4. , 121. , 109. , 4.11 , 2.78 , 18.6 , 1. , 1. , 4. , 2. ]]) ```