# Part I: NumPy

## Review of Module 1

Take a few minutes to review the content of Module 1. Module 2 builds on what was covered in Module 1.

* To open a Python interpreter, type `python` in a terminal.
* You can also run Python code using Jupyter notebooks.
* Arithmetic operations:
    * `+` Plus
    * `-` Minus
    * `*` Multiply
    * `/` Divide
    * `**` Power
    * `//` Divide and floor
    * `%` Mod
* Conditionals
    * `if(expression)`
    * `elif`
    * `else`
* Loops
    * `while(expression)`
    * `for`
* Some types are not compatible with each other.
    * `1 + "123"` is invalid (it raises a `TypeError`).
* To declare a list
    * `a = [1,2,3]`
* To declare a dictionary
    * `d = {"name": "Student"}`

## 1. Why `NumPy`?

Let's take a look at the example in the video. 

Imagine you want to multiply two lists, `[1, 2, 3]` and `[4, 5, 6]`, element by element. We can do this by looping over the elements of each list, multiplying each pair and storing the products in a new list.

In [2]:
# multiplying two lists element-wise
a = [1, 2, 3]
b = [4, 5, 6]
result = []
for i in range(3):
    result.append(a[i] * b[i])
print(result)

[4, 10, 18]


However, this is pretty cumbersome. `NumPy` allows us to do these types of element-wise operations with ease.

In [3]:
# import NumPy
import numpy as np # we import it "as np" just so that we don't have to type "numpy" all the time.

# note `np.array()` converts a list to an array

result = np.array(a) * np.array(b)
print(result)

[ 4 10 18]
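
The loop above hard-codes the length 3. As a minimal sketch (the names `loop_result` and `numpy_result` are illustrative and not from the video), the following compares a pure-Python version that works for any length with the NumPy version and shows that they agree:

```python
import numpy as np

a = [1, 2, 3]
b = [4, 5, 6]

# pure Python: zip pairs up the elements, so no hard-coded length is needed
loop_result = [x * y for x, y in zip(a, b)]

# NumPy: convert each list to an array, then multiply the arrays directly
numpy_result = np.array(a) * np.array(b)

print(loop_result)             # [4, 10, 18]
print(numpy_result.tolist())   # [4, 10, 18]
```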


`NumPy` also allows us to store multi-dimensional data in a new data structure called an *array* (see below). For example, in climate science (my field), we are often working with data that has three spatial dimensions (latitude, longitude and height) as well as the time dimension.  

### 1.1. Importing NumPy

To use `NumPy`, we first need to import it. 

In [7]:
# import NumPy
import numpy as np

> *Side note:* You can check the numpy version using `np.__version__`:

In [8]:
# what version are we using? 
np.__version__

'1.18.5'

### 1.2 NDArray
The fundamental data structure within `NumPy` is called an NDArray (ND = *N*-dimensional). In code, the class is named `np.ndarray`.

One way to create an NDArray is by converting a regular Python list.

In [9]:
a = np.array([9,7,0,1])
print(a)

[9 7 0 1]


#### 1.2.1 Data Types

`dtype` defines what type of data is contained within an array.
> *All elements within an NDArray must be the **same type**, unlike Python lists.*

> **Common data types within NumPy**
  * `int32` - 32-bit integer, -2147483648 to 2147483647
  * `int64` - 64-bit integer, -9223372036854775808 to 9223372036854775807
  * `float32` - Single precision float
  * `float64` - Double precision float
  * `complex64` - Complex number represented by 2 32-bit floats
  * `complex128` - Complex number represented by 2 64-bit floats
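
To illustrate the "same type" rule, here is a small sketch (the variable names `mixed` and `as_float32` are made up for this example). When a list mixes integers and floats, NumPy picks one type that can hold every element, and you can also request a type explicitly with the `dtype` argument:

```python
import numpy as np

# one float in the list is enough to make every element a float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)        # float64

# request a specific data type instead of letting NumPy choose
as_float32 = np.array([1, 2, 3], dtype=np.float32)
print(as_float32.dtype)   # float32

# np.iinfo reports the range that an integer type can hold
print(np.iinfo(np.int32).min, np.iinfo(np.int32).max)   # -2147483648 2147483647
```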

To see what type of data is contained within an array, check its `dtype` attribute.

In [10]:
# data type in an array
a.dtype

dtype('int64')

#### 1.2.2 Shape

Shape defines how many elements an array contains along each dimension.

To see the shape of an array, use the following command.

In [11]:
# 1 dimensional array with 4 elements
a.shape

(4,)

So, we see that `a` has one dimension with 4 elements. Let's try building another array with multiple dimensions. What is the shape of `b`?

In [12]:
# 2-dimensional array
b = np.array([[9,7,0,1], [6,3,2,0]])
print(b.shape)

(2, 4)


In general, an $N$-dimensional array has the shape of

> `shape = (x_N, x_{N-1}, x_{N-2}, …, x_1)`

$N$ is the number of dimensions, and $x_i$ is the number of elements along the $i$th dimension.

So, in the case of the above array ($N = 2$), we start from the outer `[]` and work our way in. Within the first set of `[]`, we have 2 lists, so the number of elements in the $N$th dimension is 2. Within the next set of `[]`, we see that each list consists of 4 elements, so the number of elements in the $(N-1)$th dimension is 4. Thus, the shape is `(2, 4)`.
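
The same outside-in reading works for more dimensions. The sketch below uses a made-up 3-dimensional array (the name `cube` and its values are only for illustration): the outermost `[]` holds 3 blocks, each block holds 2 rows, and each row holds 4 values.

```python
import numpy as np

# 3 blocks, each with 2 rows, each row with 4 values
cube = np.array([
    [[0, 1, 2, 3], [4, 5, 6, 7]],
    [[8, 9, 10, 11], [12, 13, 14, 15]],
    [[16, 17, 18, 19], [20, 21, 22, 23]],
])

print(cube.ndim)    # 3  (number of dimensions)
print(cube.shape)   # (3, 2, 4)
print(cube.size)    # 24 (total number of elements: 3 * 2 * 4)
```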

### 1.3 Array Creation

Above, we created an NDArray by converting lists to arrays using the `np.array()` function. `NumPy` also provides many helpful functions that let you quickly create arrays from scratch.

In [13]:
# 3x3 array filled with zeros
zeros = np.zeros((3,3)) 
print(zeros)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


Creating an array of zeros is useful for initializing an array that you will then populate.

You can also specify the data type. In the example below, we are creating an array of zeros, but we want the data type to be complex.

In [14]:
c_zeros = np.zeros((2,2), dtype=np.complex64) # 2x2 array with complex zeros
print(c_zeros)

[[0.+0.j 0.+0.j]
 [0.+0.j 0.+0.j]]


We can also create an array of ones. This is useful if we need an array of a particular constant. You can simply multiply the entire array by the constant of interest. 

In [15]:
# 4x5 array filled with ones
ones = np.ones((4,5)) 
print(ones)

[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]


In [16]:
# 4x5 array filled with twos
twos = 2 * np.ones((4,5)) 
print(twos)

[[2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2.]]


An alternative way to fill an array with a constant is to use `np.full()`.

In [18]:
# 2x7x3 array filled with e (np.e is NumPy's constant for Euler's number)
e = np.full((2,7,3), np.e)
print(e)

[[[2.71828183 2.71828183 2.71828183]
  [2.71828183 2.71828183 2.71828183]
  [2.71828183 2.71828183 2.71828183]
  [2.71828183 2.71828183 2.71828183]
  [2.71828183 2.71828183 2.71828183]
  [2.71828183 2.71828183 2.71828183]
  [2.71828183 2.71828183 2.71828183]]

 [[2.71828183 2.71828183 2.71828183]
  [2.71828183 2.71828183 2.71828183]
  [2.71828183 2.71828183 2.71828183]
  [2.71828183 2.71828183 2.71828183]
  [2.71828183 2.71828183 2.71828183]
  [2.71828183 2.71828183 2.71828183]
  [2.71828183 2.71828183 2.71828183]]]
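
As a quick check that the two approaches to a constant array agree, here is a minimal sketch (the names `twos_from_ones` and `twos_from_full` are illustrative):

```python
import numpy as np

twos_from_ones = 2 * np.ones((4, 5))    # multiply an array of ones
twos_from_full = np.full((4, 5), 2.0)   # fill directly with the constant

# both arrays have the same shape and the same values
print(np.array_equal(twos_from_ones, twos_from_full))   # True
```

One practical difference: `np.full()` takes its data type from the fill value unless you pass `dtype`, so `np.full((4, 5), 2)` gives an integer array, while `2 * np.ones((4, 5))` is always floating point.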


You can also use the [`np.arange()`](https://numpy.org/doc/stable/reference/generated/numpy.arange.html) function to create a 1-D array of consecutive numbers. `np.arange()` is similar to the `range()` function we discussed in Module 1.

In the example below, what will the array look like?

In [28]:
# array using np.arange
r = np.arange(5, 9)
print(r)

[5 6 7 8]


You can also specify the spacing between values within the range using the additional `step` argument:

In [20]:
help(np.arange)

Help on built-in function arange in module numpy:

arange(...)
    arange([start,] stop[, step,], dtype=None)
    
    Return evenly spaced values within a given interval.
    
    Values are generated within the half-open interval ``[start, stop)``
    (in other words, the interval including `start` but excluding `stop`).
    For integer arguments the function is equivalent to the Python built-in
    `range` function, but returns an ndarray rather than a list.
    
    When using a non-integer step, such as 0.1, the results will often not
    be consistent.  It is better to use `numpy.linspace` for these cases.
    
    Parameters
    ----------
    start : number, optional
        Start of interval.  The interval includes this value.  The default
        start value is 0.
    stop : number
        End of interval.  The interval does not include this value, except
        in some cases where `step` is not an integer and floating point
        round-off affects the length of `out`.
   

In [21]:
# array using np.arange with a step
r = np.arange(5, 9, 2)
print(r)

[5 7]


`np.linspace()` is similar to `np.arange()`, except that the `stop` value is included by default and, rather than specifying the `step`, you specify the number of elements you want. These elements are linearly spaced.
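
To make the relationship between the number of elements and the spacing concrete, here is a small sketch on a different interval (0 to 1, chosen only for illustration). With `num` elements and the endpoint included, the spacing is `(stop - start) / (num - 1)`; the optional `retstep=True` argument returns that spacing alongside the array.

```python
import numpy as np

# 5 elements from 0 to 1 inclusive: spacing is (1 - 0) / (5 - 1) = 0.25
values, step = np.linspace(0, 1, 5, retstep=True)
print(values.tolist())   # [0.0, 0.25, 0.5, 0.75, 1.0]
print(step)              # 0.25

# endpoint=False leaves out the stop value, like np.arange does
no_end = np.linspace(0, 1, 4, endpoint=False)
print(no_end.tolist())                     # [0.0, 0.25, 0.5, 0.75]
print(np.arange(0, 1, 0.25).tolist())      # [0.0, 0.25, 0.5, 0.75]
```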

So, in the example below, what value will the array start with? Stop with? How many values will there be?

In [25]:
# Linearly spaced array; dtype=float gives float64 (np.float is not available in recent NumPy versions)
lin = np.linspace(0, 18, 10, dtype=float)
print(lin)

[ 0.  2.  4.  6.  8. 10. 12. 14. 16. 18.]


Now, change the above so that the `stop` value is 19. What happens? Keep `stop` at 19 and change the number of elements to 20. What happens?

`np.logspace()` is similar to `np.linspace()` except that the numbers will be spaced logarithmically: `start` and `stop` are exponents of `base`, not the values themselves.

In [57]:
# Log spaced array
log = np.logspace(0, 18, 10, base = 10) 
print(log)

[1.e+00 1.e+02 1.e+04 1.e+06 1.e+08 1.e+10 1.e+12 1.e+14 1.e+16 1.e+18]


In [58]:
# Log spaced array with base e
log = np.logspace(0, 18, 10, base = np.e) 
print(log)

[1.00000000e+00 7.38905610e+00 5.45981500e+01 4.03428793e+02
 2.98095799e+03 2.20264658e+04 1.62754791e+05 1.20260428e+06
 8.88611052e+06 6.56599691e+07]
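
One way to read the outputs above: `np.logspace()` builds linearly spaced exponents and then raises `base` to those exponents. The sketch below checks this (the names `exponents`, `by_hand_10` and `by_hand_e` are illustrative):

```python
import numpy as np

exponents = np.linspace(0, 18, 10)   # 0, 2, 4, ..., 18

# base 10: raise 10 to each exponent
by_hand_10 = 10.0 ** exponents
print(np.allclose(by_hand_10, np.logspace(0, 18, 10, base=10)))    # True

# base e: raise e to each exponent
by_hand_e = np.e ** exponents
print(np.allclose(by_hand_e, np.logspace(0, 18, 10, base=np.e)))   # True
```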


Here is a summary of the most commonly used array creation functions:

#### **The most commonly used array creation functions**
>  * `np.empty(shape[, dtype, order])` - Create an uninitialized array (its values are arbitrary until you assign them).
>  * `np.ones(shape[, dtype, order])` - Create an array with 1s.
>  * `np.zeros(shape[, dtype, order])` - Create an array with 0s.
>  * `np.full(shape, fill_value[, dtype, order])` - Create an array with `fill_value`s.
  
The following are very similar to the above functions, but instead of passing `shape`, you pass another array, and it will create a new array with the same shape.
>  * `np.empty_like(a[, dtype, order, subok])`
>  * `np.ones_like(a[, dtype, order, subok])`
>  * `np.zeros_like(a[, dtype, order, subok])`
>  * `np.full_like(a, fill_value[, dtype, order, subok])`
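
As a brief sketch of the `_like` functions (the name `template` and the fill value 5 are chosen only for this example), note that the new array takes both its shape and, unless you pass `dtype`, its data type from the array you pass in:

```python
import numpy as np

template = np.array([[9, 7, 0, 1], [6, 3, 2, 0]])

zeros_like_t = np.zeros_like(template)          # same shape and dtype, all 0
full_like_t = np.full_like(template, 5)         # same shape and dtype, all 5
float_ones = np.ones_like(template, dtype=np.float64)   # override the dtype

print(zeros_like_t.shape)                       # (2, 4)
print(zeros_like_t.dtype == template.dtype)     # True
print(full_like_t.tolist())                     # [[5, 5, 5, 5], [5, 5, 5, 5]]
print(float_ones.dtype)                         # float64
```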

The following functions are for creating ranges:
>  * `np.arange([start,] stop[, step,][, dtype])` - Evenly spaced values with a given `step`; `stop` is excluded
>  * `np.linspace(start, stop[, num, endpoint, ...])` - Evenly spaced values, where you specify the number of elements (`num`) instead of the `step`
>  * `np.logspace(start, stop[, num, endpoint, base, ...])` - Spaced evenly on log scale
>  * `np.geomspace(start, stop[, num, endpoint, dtype])` - Spaced evenly on log scale (geometric progression)
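
The difference between the last two functions is in how the endpoints are given: `np.logspace()` takes exponents, while `np.geomspace()` takes the first and last values themselves. A minimal sketch with illustrative numbers:

```python
import numpy as np

from_exponents = np.logspace(0, 3, 4)       # exponents 0..3 -> 1, 10, 100, 1000
from_endpoints = np.geomspace(1, 1000, 4)   # endpoints 1 and 1000 given directly

print(np.allclose(from_exponents, from_endpoints))   # True
```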

### 1.4 Indexing
Indexing `NumPy` arrays is very similar to indexing Python lists, except that we supply one index per dimension.

Let's start with the array `b` that we defined above.

In [26]:
b = np.array([[9,7,0,1], [6,3,2,0]])
print(b)
print(b.shape)

[[9 7 0 1]
 [6 3 2 0]]
(2, 4)


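Suppose we want to extract the element with the value "2". How do we do this? First, let's consider the shape of `b`, which is `(2, 4)`. At which position does "2" sit within the 1st dimension (the dimension with 4 elements)? Remember that we count from zero in Python. And which element of the 2nd dimension (the dimension with 2 elements, i.e. the two inner lists) contains "2"?

If you guessed index 2 within the 1st dimension and index 1 within the 2nd dimension, you are correct! Because the shape lists the 2nd dimension first, the index does too: we write the 1 before the 2.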

There are two ways to index arrays. The first is to use consecutive `[]` as shown below.

In [62]:
# indexing - method 1
b[1][2]

2

Or you can use the syntax below (this is the syntax that I normally use).

In [63]:
# indexing - method 2
b[1,2]

2
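
To see why both methods return the same element, it can help to look at the intermediate step. In this sketch (the name `row` is illustrative), method 1 first extracts the whole second row and then indexes into it, while method 2 does both in one step:

```python
import numpy as np

b = np.array([[9, 7, 0, 1], [6, 3, 2, 0]])

row = b[1]             # method 1, first step: the whole second row
print(row.tolist())    # [6, 3, 2, 0]
print(row[2])          # 2

print(b[1, 2])         # 2 (method 2: one index per dimension)
print(b[-1, -2])       # 2 (negative indices count from the end, as with lists)
```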

## 2. Array Operations

Being able to perform element-wise operations is what makes `NumPy` so powerful. Let's take a look by first defining two arrays.

In [27]:
# First define two arrays
k = np.array([1.0, 2.0, 3.0])
j = np.array([2.0, 2.0, 2.0])

`NumPy` allows for basic arithmetic operations such as addition and multiplication. What do you get if you add `k` and `j` element-wise?

In [28]:
c = k + j
print(c)

[3. 4. 5.]


What do you get if you multiply `k` and `j` element-wise?

In [67]:
c = k * j
print(c)

[2. 4. 6.]


Notice that the result of `k * j` may look surprising if you are familiar with matrix multiplication.

**Array != Matrix**

Array operations are element-wise, which means `k * j` simply multiplies the elements with the same index. In most cases, you would operate on two arrays with the same shape (`k` and `j` have the same shape). You can, however, operate on two differently shaped arrays under certain conditions. This brings us to **broadcasting**.
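
To make the contrast concrete, the sketch below compares the element-wise product with the dot product from linear algebra, using the `@` operator (equivalent here to `np.dot(k, j)`). The names `elementwise` and `dot` are illustrative.

```python
import numpy as np

k = np.array([1.0, 2.0, 3.0])
j = np.array([2.0, 2.0, 2.0])

elementwise = k * j    # multiplies elements with the same index
dot = k @ j            # dot product: 1*2 + 2*2 + 3*2

print(elementwise.tolist())   # [2.0, 4.0, 6.0]
print(dot)                    # 12.0
```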

### 2.1 Broadcasting
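Broadcasting allows us to operate on differently shaped arrays, with some constraints: the smaller array is "broadcast" across the larger one so that their shapes match. Because the looping this requires happens in C instead of Python, it is much faster than writing the loop yourself.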


![](https://www.evernote.com/l/Aq2wONNsbSFJ_Z7_JyPS3U5T3QQ3ULrYrVgB/image.png)

![](https://www.evernote.com/l/Aq3NWqDzrodAxJe2An6QGBOmGDDWtJFA-5MB/image.png)

![](https://www.evernote.com/l/Aq18pMOG1uxFEbV_6c6dJ5Ie3RIu5-JkEuoB/image.png)

![](https://www.evernote.com/l/Aq1F-XUu2GtKVLD1aBhxoDniSfD43QHbyooB/image.png)
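
Here is a short sketch of broadcasting in practice, reusing the array `b` from earlier (the names `row` and `col` and their values are made up for illustration). NumPy compares shapes starting from the last dimension; two dimensions are compatible if they are equal or if one of them is 1.

```python
import numpy as np

b = np.array([[9, 7, 0, 1], [6, 3, 2, 0]])   # shape (2, 4)

# a scalar is stretched to every element
print((b + 1).tolist())                # [[10, 8, 1, 2], [7, 4, 3, 1]]

# shape (4,) against (2, 4): the same row is reused for each row of b
row = np.array([1, 10, 100, 1000])
print((b * row).tolist())              # [[9, 70, 0, 1000], [6, 30, 200, 0]]

# shape (2, 1) against (4,): both are stretched to (2, 4)
col = np.array([[0], [10]])
print((col + np.arange(4)).tolist())   # [[0, 1, 2, 3], [10, 11, 12, 13]]

# shapes (2, 4) and (3,) are not compatible, so NumPy raises a ValueError
try:
    b + np.array([1, 2, 3])
except ValueError as err:
    print("cannot broadcast:", err)
```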

> If you wish to learn more, follow this [link](http://scipy.github.io/old-wiki/pages/EricsBroadcastingDoc).