Part I: NumPy#

Review of Module 1#

Take a few minutes to review the content of Module 1. Module 2 builds on what was covered in Module 1.

  • To open a Python interpreter, type python in a terminal.

  • You can also run Python code using Jupyter notebooks.

  • Arithmetic operations:

    • + Plus

    • - Minus

    • * Multiply

    • / Divide

    • ** Power

    • // Divide and floor

    • % Mod

  • Conditionals

    • if(expression)

    • elif

    • else

  • Loops

    • while(expression)

    • for

  • Some types are not compatible with each other.

    • 1 + "123" is invalid (it raises a TypeError).

  • To declare a list

    • a = [1,2,3]

  • To declare a dictionary

    • d = {"name": "Student"}
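As a quick refresher, the short sketch below exercises several items from this list in one place. The variable names total and labels are chosen here only for illustration.

```python
# Arithmetic operators from the list above
print(7 / 2)    # divide -> 3.5
print(7 // 2)   # divide and floor -> 3
print(7 % 2)    # mod -> 1
print(2 ** 3)   # power -> 8

# Incompatible types must be converted explicitly before combining them
total = 1 + int("123")
print(total)    # 124

# A loop and a conditional working over a list, plus a dictionary lookup
a = [1, 2, 3]
d = {"name": "Student"}
labels = []
for value in a:
    if value % 2 == 0:
        labels.append("even")
    else:
        labels.append("odd")
print(d["name"], labels)
```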

1. Why NumPy?#

Let’s take a look at the example in the video.

Imagine you want to multiply two lists, [1, 2, 3] and [4, 5, 6], element by element. We can do this by looping over the elements of each list, multiplying them, and storing the products in a new list.

# multiplying two lists element-wise
a = [1, 2, 3]
b = [4, 5, 6]
result = []
for i in range(3):
    result.append(a[i] * b[i])
print(result)
[4, 10, 18]

However, this is pretty cumbersome. NumPy allows us to do these types of element-wise operations with ease.

# import NumPy
import numpy as np # we import it "as np" just so that we don't have to type "numpy" all the time.

# note `np.array()` converts a list to an array

result = np.array(a) * np.array(b)
print(result)
[ 4 10 18]

NumPy also allows us to store multi-dimensional data in a new data structure called an array (see below). For example, in climate science (my field), we are often working with data that has three spatial dimensions (latitude, longitude and height) as well as the time dimension.

1.1. Importing NumPy#

To use NumPy, we first need to import it.

# import NumPy
import numpy as np

Side note: You can check the numpy version using np.__version__:

# what version are we using? 
np.__version__
'1.23.2'

1.2 NDArray#

The fundamental data structure within NumPy is called an NDArray (ND = N-dimensional).

One way to create an NDArray is by converting a regular Python list.

a = np.array([9,7,0,1])
print(a)
[9 7 0 1]

1.2.1 Data Types#

dtype defines what type of data is contained within an array.

All elements within an NDArray must be the same type, unlike Python lists.

Common data types within NumPy

  • int32 - 32-bit integer, -2147483648 to 2147483647

  • int64 - 64-bit integer, -9223372036854775808 to 9223372036854775807

  • float32 - Single precision float

  • float64 - Double precision float

  • complex64 - Complex number represented by 2 32-bit floats

  • complex128 - Complex number represented by 2 64-bit floats

To see what type of data is contained within an array, use the following command.

# data type in an array
a.dtype
dtype('int64')
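To make the "one type per array" rule concrete, here is a minimal sketch of how NumPy settles on a dtype and how you can choose or change it yourself. The variable names mixed, small and as_float are illustrative only.

```python
import numpy as np

# Mixing integers and floats: NumPy picks one common type for all elements
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)        # float64

# Request a specific dtype when the array is created
small = np.array([9, 7, 0, 1], dtype=np.int32)
print(small.dtype)        # int32

# Convert an existing array to another dtype (this returns a new array)
as_float = small.astype(np.float64)
print(as_float)           # [9. 7. 0. 1.]

# Look up the representable range of an integer type
print(np.iinfo(np.int32).min, np.iinfo(np.int32).max)
```

The last line prints the same int32 limits listed in the table of common data types.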

1.2.2 Shape#

The shape of an array is the number of elements it contains along each dimension.

To see the shape of an array, use the following command.

# 1 dimensional array with 4 elements
a.shape
(4,)

So, we see that a has one dimension with 4 elements. Let’s try building another array with multiple dimensions. What is the shape of b?

# 2-dimensional array
b = np.array([[9,7,0,1], [6,3,2,0]])
print(b.shape)
(2, 4)

In general, an \(N\)-dimensional array has the shape

\[\text{shape} = (x_N, x_{N-1}, x_{N-2}, \ldots, x_1)\]

where \(N\) is the number of dimensions and \(x_i\) is the number of elements along the \(i^{\text{th}}\) dimension.

So, in the case of the above array, we start from the outer [] and work our way in. Within the first set of [], we have 2 lists, so the number of elements in the \(N\)th dimension is 2. Within the next set of [], each list consists of 4 elements, so the number of elements in the \((N-1)\)th dimension is 4. Thus, the shape is (2, 4).
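The same outside-in reading works for more dimensions. The sketch below builds a hypothetical 3-dimensional array (the name cube and its values are made up for illustration) and checks its shape.

```python
import numpy as np

# Outer []: 2 blocks; each block: 3 rows; each row: 4 elements
cube = np.array([
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    [[13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24]],
])

print(cube.ndim)   # number of dimensions -> 3
print(cube.shape)  # elements along each dimension -> (2, 3, 4)
print(cube.size)   # total number of elements -> 24
```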

1.3 Array Creation#

Above, we created an NDArray by converting a list using the np.array() function. NumPy also provides many helpful functions that let you quickly create arrays from scratch.

# 3x3 array filled with zeros
zeros = np.zeros((3,3)) 
print(zeros)
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]

Creating an array of zeros is useful for initializing an array that you will then populate.
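A minimal sketch of that initialize-then-populate pattern is shown here. It uses the [i, j] indexing syntax covered in Section 1.4, and the name table and the fill rule are assumptions made for this example.

```python
import numpy as np

# Start with a 3x3 array of zeros...
table = np.zeros((3, 3))

# ...then fill it in: entry (i, j) holds the product (i + 1) * (j + 1)
for i in range(3):
    for j in range(3):
        table[i, j] = (i + 1) * (j + 1)

print(table)
```

Because np.zeros() creates floats by default, the stored products are floats as well.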

You can also specify the data type. In the example below, we are creating an array of zeros, but we want the data type to be complex.

c_zeros = np.zeros((2,2), dtype=np.complex64) # 2x2 array with complex zeros
print(c_zeros)
[[0.+0.j 0.+0.j]
 [0.+0.j 0.+0.j]]

We can also create an array of ones. This is useful if we need an array of a particular constant. You can simply multiply the entire array by the constant of interest.

# 4x5 array filled with ones
ones = np.ones((4,5)) 
print(ones)
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
# 4x5 array filled with twos
twos = 2 * np.ones((4,5)) 
print(twos)
[[2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2.]]

An alternative way to fill an array with a constant is to use np.full().

# 2x7x3 array filled with e (np.e is NumPy's constant for Euler's number)
e = np.full((2,7,3), np.e)
print(e)
[[[2.71828183 2.71828183 2.71828183]
  [2.71828183 2.71828183 2.71828183]
  [2.71828183 2.71828183 2.71828183]
  [2.71828183 2.71828183 2.71828183]
  [2.71828183 2.71828183 2.71828183]
  [2.71828183 2.71828183 2.71828183]
  [2.71828183 2.71828183 2.71828183]]

 [[2.71828183 2.71828183 2.71828183]
  [2.71828183 2.71828183 2.71828183]
  [2.71828183 2.71828183 2.71828183]
  [2.71828183 2.71828183 2.71828183]
  [2.71828183 2.71828183 2.71828183]
  [2.71828183 2.71828183 2.71828183]
  [2.71828183 2.71828183 2.71828183]]]

You can also use the np.arange() function to create a 1-D array of consecutive numbers. np.arange() is similar to the range() function we discussed in Module 1.

In the example below, what will the array look like?

# array using np.arange
r = np.arange(5, 9)
print(r)
[5 6 7 8]

You can also specify the spacing between values within the range using the additional step argument:

help(np.arange)
Help on built-in function arange in module numpy:

arange(...)
    arange([start,] stop[, step,], dtype=None, *, like=None)
    
    Return evenly spaced values within a given interval.
    
    ``arange`` can be called with a varying number of positional arguments:
    
    * ``arange(stop)``: Values are generated within the half-open interval
      ``[0, stop)`` (in other words, the interval including `start` but
      excluding `stop`).
    * ``arange(start, stop)``: Values are generated within the half-open
      interval ``[start, stop)``.
    * ``arange(start, stop, step)`` Values are generated within the half-open
      interval ``[start, stop)``, with spacing between values given by
      ``step``.
    
    For integer arguments the function is roughly equivalent to the Python
    built-in :py:class:`range`, but returns an ndarray rather than a ``range``
    instance.
    
    When using a non-integer step, such as 0.1, it is often better to use
    `numpy.linspace`.
    
    See the Warning sections below for more information.
    
    Parameters
    ----------
    start : integer or real, optional
        Start of interval.  The interval includes this value.  The default
        start value is 0.
    stop : integer or real
        End of interval.  The interval does not include this value, except
        in some cases where `step` is not an integer and floating point
        round-off affects the length of `out`.
    step : integer or real, optional
        Spacing between values.  For any output `out`, this is the distance
        between two adjacent values, ``out[i+1] - out[i]``.  The default
        step size is 1.  If `step` is specified as a position argument,
        `start` must also be given.
    dtype : dtype, optional
        The type of the output array.  If `dtype` is not given, infer the data
        type from the other input arguments.
    like : array_like, optional
        Reference object to allow the creation of arrays which are not
        NumPy arrays. If an array-like passed in as ``like`` supports
        the ``__array_function__`` protocol, the result will be defined
        by it. In this case, it ensures the creation of an array object
        compatible with that passed in via this argument.
    
        .. versionadded:: 1.20.0
    
    Returns
    -------
    arange : ndarray
        Array of evenly spaced values.
    
        For floating point arguments, the length of the result is
        ``ceil((stop - start)/step)``.  Because of floating point overflow,
        this rule may result in the last element of `out` being greater
        than `stop`.
    
    Warnings
    --------
    The length of the output might not be numerically stable.
    
    Another stability issue is due to the internal implementation of
    `numpy.arange`.
    The actual step value used to populate the array is
    ``dtype(start + step) - dtype(start)`` and not `step`. Precision loss
    can occur here, due to casting or due to using floating points when
    `start` is much larger than `step`. This can lead to unexpected
    behaviour. For example::
    
      >>> np.arange(0, 5, 0.5, dtype=int)
      array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
      >>> np.arange(-3, 3, 0.5, dtype=int)
      array([-3, -2, -1,  0,  1,  2,  3,  4,  5,  6,  7,  8])
    
    In such cases, the use of `numpy.linspace` should be preferred.
    
    The built-in :py:class:`range` generates :std:doc:`Python built-in integers
    that have arbitrary size <c-api/long>`, while `numpy.arange` produces
    `numpy.int32` or `numpy.int64` numbers. This may result in incorrect
    results for large integer values::
    
      >>> power = 40
      >>> modulo = 10000
      >>> x1 = [(n ** power) % modulo for n in range(8)]
      >>> x2 = [(n ** power) % modulo for n in np.arange(8)]
      >>> print(x1)
      [0, 1, 7776, 8801, 6176, 625, 6576, 4001]  # correct
      >>> print(x2)
      [0, 1, 7776, 7185, 0, 5969, 4816, 3361]  # incorrect
    
    See Also
    --------
    numpy.linspace : Evenly spaced numbers with careful handling of endpoints.
    numpy.ogrid: Arrays of evenly spaced numbers in N-dimensions.
    numpy.mgrid: Grid-shaped arrays of evenly spaced numbers in N-dimensions.
    
    Examples
    --------
    >>> np.arange(3)
    array([0, 1, 2])
    >>> np.arange(3.0)
    array([ 0.,  1.,  2.])
    >>> np.arange(3,7)
    array([3, 4, 5, 6])
    >>> np.arange(3,7,2)
    array([3, 5])
# array using np.arange with a step
r = np.arange(5, 9, 2)
print(r)
[5 7]

np.linspace() is similar to np.arange(), except that the stop value is included and, rather than specifying the step, you specify the number of elements you want. These elements will be linearly spaced.

So, in the example below, what value will the array start with? Stop with? How many values will there be?

# Linearly spaced array
lin = np.linspace(0, 18, 10, dtype=float)
print(lin)
[ 0.  2.  4.  6.  8. 10. 12. 14. 16. 18.]

Now, change the above so that the stop value is 19. What happens? Keep stop at 19 and change the number of elements to 20. What happens?
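If you want to check the spacing rather than read it off the printed values, np.linspace() has two optional arguments that help: retstep and endpoint. The sketch below sticks to the original stop value of 18, and the variable names are illustrative.

```python
import numpy as np

# retstep=True also returns the spacing between neighbouring values
values, step = np.linspace(0, 18, 10, retstep=True)
print(step)        # 2.0

# endpoint=False leaves the stop value out, as np.arange() does
half_open = np.linspace(0, 18, 9, endpoint=False)
print(half_open)   # [ 0.  2.  4.  6.  8. 10. 12. 14. 16.]
```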

np.logspace() is similar to np.linspace() except that the numbers are spaced logarithmically: start and stop are exponents, so the array runs from base**start to base**stop.

# Log spaced array
log = np.logspace(0, 18, 10, base = 10) 
print(log)
[1.e+00 1.e+02 1.e+04 1.e+06 1.e+08 1.e+10 1.e+12 1.e+14 1.e+16 1.e+18]
# Log spaced array with base e
log = np.logspace(0, 18, 10, base = np.e) 
print(log)
[1.00000000e+00 7.38905610e+00 5.45981500e+01 4.03428793e+02
 2.98095799e+03 2.20264658e+04 1.62754791e+05 1.20260428e+06
 8.88611052e+06 6.56599691e+07]
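One way to see what np.logspace() is doing is to rebuild its output by hand from np.linspace(). This is a small check, not a description of NumPy's internals, and the names exponents and by_hand are chosen here for illustration.

```python
import numpy as np

# Linearly spaced exponents from 0 to 18...
exponents = np.linspace(0, 18, 10)

# ...raised as powers of the base give the log-spaced values
by_hand = 10.0 ** exponents
log = np.logspace(0, 18, 10, base=10)

print(np.allclose(log, by_hand))  # True
```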

Here is a summary of the most commonly used array creation functions:

The most commonly used array creation functions#

  • np.empty(shape[, dtype, order]) - Create an array without initializing its values (the contents are arbitrary until you assign them).

  • np.ones(shape[, dtype, order]) - Create an array with 1s.

  • np.zeros(shape[, dtype, order]) - Create an array with 0s.

  • np.full(shape, fill_value[, dtype, order]) - Create an array with fill_values.

The following are very similar to the above functions, but instead of passing shape, you pass another array, and it will create a new array with the same shape.

  • np.empty_like(a[, dtype, order, subok])

  • np.ones_like(a[, dtype, order, subok])

  • np.zeros_like(a[, dtype, order, subok])

  • np.full_like(a, fill_value[, dtype, order, subok])
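The sketch below shows the _like functions in use. The array template and the other variable names are assumptions made for the example.

```python
import numpy as np

template = np.array([[9, 7, 0, 1], [6, 3, 2, 0]])

# Same shape and dtype as template, filled with zeros
z = np.zeros_like(template)
print(z.shape, z.dtype)

# Same shape and dtype as template, filled with 7
sevens = np.full_like(template, 7)
print(sevens)

# The dtype can still be overridden
float_ones = np.ones_like(template, dtype=float)
print(float_ones.dtype)   # float64
```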

The following functions are for creating ranges

  • np.arange([start,] stop[, step,][, dtype]) - Evenly spaced values

  • np.linspace(start, stop[, num, endpoint, ...]) - Evenly spaced values; you specify the number of values instead of the step

  • np.logspace(start, stop[, num, endpoint, base, ...]) - Spaced evenly on a log scale; start and stop are exponents of base

  • np.geomspace(start, stop[, num, endpoint, dtype]) - Spaced evenly on a log scale (geometric progression); start and stop are the end values themselves
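The difference between the last two functions is easiest to see side by side. In this sketch (variable names are illustrative), both calls produce the same four values.

```python
import numpy as np

# logspace takes exponents: 10**0 up to 10**3
from_exponents = np.logspace(0, 3, 4)

# geomspace takes the end values directly: 1 up to 1000
from_endpoints = np.geomspace(1, 1000, 4)

print(from_exponents)
print(np.allclose(from_exponents, from_endpoints))  # True
```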

1.4 Indexing#

Indexing for NumPy arrays is very similar to Python lists, except that we have to index for more than one dimension.

Let’s start with the array b that we defined above.

b = np.array([[9,7,0,1], [6,3,2,0]])
print(b)
print(b.shape)
[[9 7 0 1]
 [6 3 2 0]]
(2, 4)

Suppose we want to extract the element with the value “2”. How do we do this? First, let’s consider the shape of b, which is (2, 4). Which of the 2 rows (the first index) contains “2”? Within that row, at which of the 4 positions (the second index) does “2” sit? Remember that we count from zero in Python.

If you guessed row index 1 and position index 2, you are correct!

There are two ways to index arrays. The first is to use consecutive [] as shown below.

# indexing - method 1
b[1][2]
2

Or you can use the syntax below (this is the syntax that I normally use).

# indexing - method 2
b[1,2]
2
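The comma syntax also combines with the slicing you know from Python lists, which lets you pull out whole rows or columns. A short sketch follows. The copy named modified is introduced here only so that b itself stays unchanged.

```python
import numpy as np

b = np.array([[9, 7, 0, 1], [6, 3, 2, 0]])

print(b[0])       # first row: [9 7 0 1]
print(b[:, 1])    # second column: [7 3]
print(b[1, 1:3])  # second row, positions 1 and 2: [3 2]
print(b[-1, -1])  # last row, last position: 0

# Indexing also works on the left-hand side of an assignment
modified = b.copy()
modified[0, 2] = 5
print(modified[0])  # [9 7 5 1]
```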

2. Array Operations#

Being able to perform element-wise operations is what makes NumPy so powerful. Let’s take a look by first defining two arrays.

# First define two arrays
k = np.array([1.0, 2.0, 3.0])
j = np.array([2.0, 2.0, 2.0])

NumPy allows for basic arithmetic operations such as addition and multiplication. What do you get if you add k and j element-wise?

c = k + j
print(c)
[3. 4. 5.]

What do you get if you multiply k and j element-wise?

c = k * j
print(c)
[2. 4. 6.]

Notice that the result of k * j may look surprising if you are used to matrix multiplication.

Array != Matrix

Array operations are element-wise, which means k * j simply multiplies the elements with the same index. In most cases, you would operate on two arrays with the same shape (k and j have the same shape). You can, however, operate on two different shaped arrays under certain conditions. This brings us to broadcasting.
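Before moving on, here is a small sketch contrasting the element-wise product with the dot product that a linear algebra background might lead you to expect. The @ operator and np.dot() are NumPy's tools for the latter, and the names elementwise and dot are illustrative.

```python
import numpy as np

k = np.array([1.0, 2.0, 3.0])
j = np.array([2.0, 2.0, 2.0])

# Element-wise: multiplies elements with the same index
elementwise = k * j
print(elementwise)   # [2. 4. 6.]

# Dot product: sums those products (same as np.dot(k, j))
dot = k @ j
print(dot)           # 12.0
```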

2.1 Broadcasting#

Broadcasting allows us to operate on differently shaped arrays, with some constraints. Because broadcasting vectorizes the operation, the looping happens in C instead of Python, which is much faster.
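The main constraint is on the shapes: NumPy compares them dimension by dimension, starting from the last one, and two dimensions are compatible when they are equal or when one of them is 1. The sketch below illustrates this with made-up arrays. The names row, doubled and shifted are chosen for the example.

```python
import numpy as np

b = np.array([[9, 7, 0, 1], [6, 3, 2, 0]])   # shape (2, 4)
row = np.array([10, 20, 30, 40])             # shape (4,)

# A scalar is applied to every element
doubled = b * 2
print(doubled)

# The (4,) array is reused for each of the 2 rows of b
shifted = b + row
print(shifted)

# Shapes (2, 4) and (3,) do not line up, so NumPy raises a ValueError
try:
    b + np.array([1, 2, 3])
    failed = False
except ValueError as err:
    failed = True
    print("cannot broadcast:", err)
```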

If you wish to learn more, follow this link.