Part I: NumPy
Review of Module 1
Take a few minutes to review the content of Module 1. Module 2 builds on what was covered in Module 1.
To open a Python interpreter, type python in a terminal. You can also run Python code in Jupyter notebooks.
Arithmetic operations:
+    Plus
-    Minus
*    Multiply
/    Divide
**   Power
//   Divide and floor (floor division)
%    Mod (remainder)
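A quick check of each operator, using two arbitrary integers chosen only for illustration:

```python
# Apply each arithmetic operator to the same two integers
x, y = 7, 2
print(x + y)    # 9
print(x - y)    # 5
print(x * y)    # 14
print(x / y)    # 3.5  (true division always returns a float)
print(x ** y)   # 49
print(x // y)   # 3    (divide, then round down)
print(x % y)    # 1    (remainder of 7 / 2)
print(-x // y)  # -4   (floor rounds down, not toward zero)
```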
Conditionals:
if expression:
elif expression:
else:
Loops:
while expression:
for item in iterable:
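A minimal sketch of these keywords in use; the variable names and values are arbitrary examples:

```python
# if / elif / else: only the first branch whose condition is True runs
temperature = 25  # arbitrary example value
if temperature > 30:
    label = "hot"
elif temperature > 15:
    label = "mild"
else:
    label = "cold"
print(label)  # mild

# for loop: iterate over the items of a sequence
total = 0
for n in [1, 2, 3]:
    total += n
print(total)  # 6

# while loop: repeat as long as the condition is True
count = 0
while count < 3:
    count += 1
print(count)  # 3
```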
Some types are not compatible with each other. For example, 1 + "123" is invalid because Python cannot add an int to a str (it raises a TypeError).
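The snippet below shows the error and two ways to make the types compatible by converting one value explicitly:

```python
# Adding an int and a str raises a TypeError
try:
    1 + "123"
except TypeError as err:
    print("TypeError:", err)

# Convert the string to an int to add the numbers...
summed = 1 + int("123")
print(summed)   # 124

# ...or convert the int to a str to join the text instead
joined = str(1) + "123"
print(joined)   # 1123
```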
To declare a list
a = [1,2,3]
To declare a dictionary
d = {"name": "Student"}
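A short refresher on reading and updating these two structures; the extra "module" key is just an illustrative example:

```python
a = [1, 2, 3]
d = {"name": "Student"}

# Lists are indexed by position, starting from 0
print(a[0])        # 1
a.append(4)        # lists can grow
print(a)           # [1, 2, 3, 4]

# Dictionaries are indexed by key
print(d["name"])   # Student
d["module"] = 2    # add a new key-value pair (illustrative key)
print(d)           # {'name': 'Student', 'module': 2}
```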
1. Why NumPy?
Let’s take a look at the example in the video.
Imagine you want to multiply two lists, [1, 2, 3] and [4, 5, 6], element by element. We can do this by looping over the elements of each list, multiplying corresponding elements and then storing the products in a new list.
# multiplying two lists element-wise
a = [1, 2, 3]
b = [4, 5, 6]
result = []
for i in range(len(a)):
    result.append(a[i] * b[i])
print(result)
[4, 10, 18]
However, this is pretty cumbersome. NumPy allows us to do these types of element-wise operations with ease.
# import NumPy
import numpy as np # we import it "as np" just so that we don't have to type "numpy" all the time.
# note `np.array()` converts a list to an array
result = np.array(a) * np.array(b)
print(result)
[ 4 10 18]
NumPy also allows us to store multi-dimensional data in a data structure called an array (see below). For example, in climate science (my field), we often work with data that has three spatial dimensions (latitude, longitude and height) as well as a time dimension.
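As a sketch of what such data might look like, here is a 4-dimensional array built with np.zeros (covered in Section 1.3). The dimension sizes (12 time steps, 10 height levels, 90 latitudes, 180 longitudes) are hypothetical, chosen only for illustration, and the values are placeholders rather than real climate data:

```python
import numpy as np

# Hypothetical dimension sizes, chosen only for illustration
n_time, n_height, n_lat, n_lon = 12, 10, 90, 180

# One array holds a value for every (time, height, lat, lon) combination
temperature = np.zeros((n_time, n_height, n_lat, n_lon))

print(temperature.ndim)   # 4
print(temperature.shape)  # (12, 10, 90, 180)
print(temperature.size)   # 1944000

# A single value is picked out with one index per dimension
print(temperature[0, 0, 45, 90])  # 0.0
```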
1.1 Importing NumPy
To use NumPy, we first need to import it.
# import NumPy
import numpy as np
Side note: You can check the NumPy version using np.__version__:
# what version are we using?
np.__version__
'1.23.2'
1.2 NDArray
The fundamental data structure within NumPy is called an NDArray (ND = N-dimensional).
One way to create an NDArray is by converting a regular Python list.
a = np.array([9,7,0,1])
print(a)
[9 7 0 1]
1.2.1 Data Types
The dtype attribute defines what type of data is contained within an array.
Unlike the elements of a Python list, all elements within an NDArray must be the same type.
Common data types within NumPy:
int32 - 32-bit integer, -2147483648 to 2147483647
int64 - 64-bit integer, -9223372036854775808 to 9223372036854775807
float32 - Single-precision float
float64 - Double-precision float
complex64 - Complex number represented by two 32-bit floats
complex128 - Complex number represented by two 64-bit floats
To see what type of data is contained within an array, use the following command:
# data type in an array
a.dtype
dtype('int64')
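Because every element must share one type, NumPy converts mixed input to a common type. The sketch below shows that upcasting, setting the dtype explicitly, and converting with .astype(). Note that the int64 shown above is the usual default integer type on 64-bit Linux and macOS; it may differ on other platforms (for example, NumPy versions before 2.0 default to int32 on Windows).

```python
import numpy as np

# Mixing ints and floats: NumPy upcasts everything to one common type
mixed = np.array([1, 2, 3.5])
print(mixed)        # [1.  2.  3.5]
print(mixed.dtype)  # float64

# The dtype can also be set explicitly when the array is created
small = np.array([1, 2, 3], dtype=np.float32)
print(small.dtype)  # float32

# .astype() returns a converted copy; float-to-int conversion truncates toward zero
as_int = mixed.astype(np.int64)
print(as_int)       # [1 2 3]
```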
1.2.2 Shape
The shape of an array tells you how many elements it contains along each dimension.
To see the shape of an array, use the following command.
# 1 dimensional array with 4 elements
a.shape
(4,)
So, we see that a has one dimension with 4 elements. Let’s try building another array with multiple dimensions. What is the shape of b?
# 2-dimensional array
b = np.array([[9,7,0,1], [6,3,2,0]])
print(b.shape)
(2, 4)
In general, an \(N\)-dimensional array has the shape
\[ \text{shape} = (x_N, x_{N-1}, x_{N-2}, \ldots, x_1) \]
where \(N\) is the number of dimensions and \(x_i\) is the number of elements along the \(i^{th}\) dimension.
So, in the case of the above array (where \(N = 2\)), we start from the outer [] and work our way in. Within the first set of [], we have 2 lists, so the number of elements in the \(N^{th}\) dimension is 2. Within the next set of [], we see that each list consists of 4 elements, so the number of elements in the \((N-1)^{th}\) dimension is 4. Thus, the shape is (2, 4).
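The same "outer brackets first" reasoning can be checked in code. The 3-dimensional array c below is an arbitrary example added to show one more level of nesting:

```python
import numpy as np

b = np.array([[9, 7, 0, 1], [6, 3, 2, 0]])

# Outer brackets: number of lists they contain -> first entry of the shape
print(len(b))      # 2
# Next level in: number of elements in each inner list -> second entry
print(len(b[0]))   # 4
print(b.shape)     # (2, 4)
print(b.ndim)      # 2

# Three levels of nesting give a 3-D array: 2 blocks, each with 3 lists of 2 elements
c = np.array([[[1, 2], [3, 4], [5, 6]],
              [[7, 8], [9, 10], [11, 12]]])
print(c.shape)     # (2, 3, 2)
```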
1.3 Array Creation
Above, we created an NDArray by converting a list to an array using the np.array() function. But NumPy also provides many helpful functions that let you quickly create arrays from scratch.
# 3x3 array filled with zeros
zeros = np.zeros((3,3))
print(zeros)
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
Creating an array of zeros is useful for initializing an array that you will then populate.
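A minimal sketch of that pattern: pre-allocate zeros, then fill each element in a loop (the squares are just an example computation):

```python
import numpy as np

# Pre-allocate an array of zeros, then fill in each element
n = 5
squares = np.zeros(n)
for i in range(n):
    squares[i] = i ** 2
print(squares)  # [ 0.  1.  4.  9. 16.]
```

For this particular computation, np.arange(n) ** 2 gives the same values without a loop; pre-allocation is most useful when each element needs more involved work.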
You can also specify the data type. In the example below, we are creating an array of zeros, but we want the data type to be complex.
c_zeros = np.zeros((2,2), dtype=np.complex64) # 2x2 array with complex zeros
print(c_zeros)
[[0.+0.j 0.+0.j]
[0.+0.j 0.+0.j]]
We can also create an array of ones. This is useful if we need an array filled with a particular constant: simply multiply the entire array of ones by the constant of interest.
# 4x5 array filled with ones
ones = np.ones((4,5))
print(ones)
[[1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1.]]
# 4x5 array filled with twos
twos = 2 * np.ones((4,5))
print(twos)
[[2. 2. 2. 2. 2.]
[2. 2. 2. 2. 2.]
[2. 2. 2. 2. 2.]
[2. 2. 2. 2. 2.]]
An alternative way to fill an array with a constant is to use np.full().
# 2x7x3 array filled with e (np.e is NumPy's constant for the value of e)
e = np.full((2,7,3), np.e)
print(e)
[[[2.71828183 2.71828183 2.71828183]
[2.71828183 2.71828183 2.71828183]
[2.71828183 2.71828183 2.71828183]
[2.71828183 2.71828183 2.71828183]
[2.71828183 2.71828183 2.71828183]
[2.71828183 2.71828183 2.71828183]
[2.71828183 2.71828183 2.71828183]]
[[2.71828183 2.71828183 2.71828183]
[2.71828183 2.71828183 2.71828183]
[2.71828183 2.71828183 2.71828183]
[2.71828183 2.71828183 2.71828183]
[2.71828183 2.71828183 2.71828183]
[2.71828183 2.71828183 2.71828183]
[2.71828183 2.71828183 2.71828183]]]
You can also use the np.arange() function to create a 1-D array of consecutive numbers. np.arange() is similar to the range() function we discussed in Module 1.
In the example below, what will the array look like?
# array using np.arange
r = np.arange(5, 9)
print(r)
[5 6 7 8]
You can also specify the spacing between values within the range using the additional step argument:
help(np.arange)
Help on built-in function arange in module numpy:
arange(...)
arange([start,] stop[, step,], dtype=None, *, like=None)
Return evenly spaced values within a given interval.
``arange`` can be called with a varying number of positional arguments:
* ``arange(stop)``: Values are generated within the half-open interval
``[0, stop)`` (in other words, the interval including `start` but
excluding `stop`).
* ``arange(start, stop)``: Values are generated within the half-open
interval ``[start, stop)``.
* ``arange(start, stop, step)`` Values are generated within the half-open
interval ``[start, stop)``, with spacing between values given by
``step``.
For integer arguments the function is roughly equivalent to the Python
built-in :py:class:`range`, but returns an ndarray rather than a ``range``
instance.
When using a non-integer step, such as 0.1, it is often better to use
`numpy.linspace`.
See the Warning sections below for more information.
Parameters
----------
start : integer or real, optional
Start of interval. The interval includes this value. The default
start value is 0.
stop : integer or real
End of interval. The interval does not include this value, except
in some cases where `step` is not an integer and floating point
round-off affects the length of `out`.
step : integer or real, optional
Spacing between values. For any output `out`, this is the distance
between two adjacent values, ``out[i+1] - out[i]``. The default
step size is 1. If `step` is specified as a position argument,
`start` must also be given.
dtype : dtype, optional
The type of the output array. If `dtype` is not given, infer the data
type from the other input arguments.
like : array_like, optional
Reference object to allow the creation of arrays which are not
NumPy arrays. If an array-like passed in as ``like`` supports
the ``__array_function__`` protocol, the result will be defined
by it. In this case, it ensures the creation of an array object
compatible with that passed in via this argument.
.. versionadded:: 1.20.0
Returns
-------
arange : ndarray
Array of evenly spaced values.
For floating point arguments, the length of the result is
``ceil((stop - start)/step)``. Because of floating point overflow,
this rule may result in the last element of `out` being greater
than `stop`.
Warnings
--------
The length of the output might not be numerically stable.
Another stability issue is due to the internal implementation of
`numpy.arange`.
The actual step value used to populate the array is
``dtype(start + step) - dtype(start)`` and not `step`. Precision loss
can occur here, due to casting or due to using floating points when
`start` is much larger than `step`. This can lead to unexpected
behaviour. For example::
>>> np.arange(0, 5, 0.5, dtype=int)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
>>> np.arange(-3, 3, 0.5, dtype=int)
array([-3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8])
In such cases, the use of `numpy.linspace` should be preferred.
The built-in :py:class:`range` generates :std:doc:`Python built-in integers
that have arbitrary size <c-api/long>`, while `numpy.arange` produces
`numpy.int32` or `numpy.int64` numbers. This may result in incorrect
results for large integer values::
>>> power = 40
>>> modulo = 10000
>>> x1 = [(n ** power) % modulo for n in range(8)]
>>> x2 = [(n ** power) % modulo for n in np.arange(8)]
>>> print(x1)
[0, 1, 7776, 8801, 6176, 625, 6576, 4001] # correct
>>> print(x2)
[0, 1, 7776, 7185, 0, 5969, 4816, 3361] # incorrect
See Also
--------
numpy.linspace : Evenly spaced numbers with careful handling of endpoints.
numpy.ogrid: Arrays of evenly spaced numbers in N-dimensions.
numpy.mgrid: Grid-shaped arrays of evenly spaced numbers in N-dimensions.
Examples
--------
>>> np.arange(3)
array([0, 1, 2])
>>> np.arange(3.0)
array([ 0., 1., 2.])
>>> np.arange(3,7)
array([3, 4, 5, 6])
>>> np.arange(3,7,2)
array([3, 5])
# array using np.arange with a step
r = np.arange(5, 9, 2)
print(r)
[5 7]
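Two more cases of the step argument, as a small sketch: a negative step counts down, and a fractional step produces floats. The value 0.25 is used here because it is exactly representable in binary floating point; for other non-integer steps, the help text above recommends np.linspace instead.

```python
import numpy as np

# A negative step counts down; the stop value (0) is still excluded
down = np.arange(10, 0, -3)
print(down)      # [10  7  4  1]

# A fractional step gives a float array
quarters = np.arange(0, 1, 0.25)
print(quarters)  # [0.   0.25 0.5  0.75]
```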
np.linspace() is similar to np.arange() except that the stop value is included (by default), and rather than specifying the step, you specify the number of elements you want. These elements will be linearly spaced.
So, in the example below, what value will the array start with? Stop with? How many values will there be?
# Linearly spaced array
lin = np.linspace(0, 18, 10, dtype=np.float64)
print(lin)
[ 0. 2. 4. 6. 8. 10. 12. 14. 16. 18.]
Now, change the above so that the stop value is 19. What happens? Keep stop at 19 and change the number of elements to 20. What happens?
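To see the spacing np.linspace chose, you can pass retstep=True, which also returns the step. With the stop value included, the step is (stop - start) / (num - 1); with endpoint=False, the stop is excluded and the step becomes (stop - start) / num. A small sketch using the numbers from the example above:

```python
import numpy as np

# retstep=True returns (array, step) instead of just the array
values, step = np.linspace(0, 18, 10, retstep=True)
print(step)       # 2.0, i.e. (18 - 0) / (10 - 1)

# endpoint=False leaves out the stop value, so the spacing changes
values_open, step_open = np.linspace(0, 18, 10, endpoint=False, retstep=True)
print(step_open)  # 1.8, i.e. (18 - 0) / 10
```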
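np.logspace() is similar to np.linspace() except that the numbers are spaced logarithmically: start and stop are exponents, so the values run from base**start to base**stop. In the first example below, they run from 10**0 = 1 to 10**18.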
# Log spaced array
log = np.logspace(0, 18, 10, base = 10)
print(log)
[1.e+00 1.e+02 1.e+04 1.e+06 1.e+08 1.e+10 1.e+12 1.e+14 1.e+16 1.e+18]
# Log spaced array
log = np.logspace(0, 18, 10, base = np.e)
print(log)
[1.00000000e+00 7.38905610e+00 5.45981500e+01 4.03428793e+02
2.98095799e+03 2.20264658e+04 1.62754791e+05 1.20260428e+06
8.88611052e+06 6.56599691e+07]
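One way to see what np.logspace does is to compare it with base raised to a linearly spaced array of exponents. np.geomspace produces the same sequence when you give it the actual endpoint values (1 and 1e18) instead of exponents:

```python
import numpy as np

# np.logspace(start, stop, num, base=b) matches b ** np.linspace(start, stop, num)
log10 = np.logspace(0, 18, 10, base=10)
print(np.allclose(log10, 10 ** np.linspace(0, 18, 10)))  # True

# np.geomspace takes the endpoint values themselves rather than exponents
geo = np.geomspace(1, 1e18, 10)
print(np.allclose(geo, log10))  # True
```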
Here is a summary of the most commonly used array creation functions:
np.empty(shape[, dtype, order])
- Create an array without initializing its values (its contents are arbitrary until you fill them in).
np.ones(shape[, dtype, order])
- Create an array filled with 1s.
np.zeros(shape[, dtype, order])
- Create an array filled with 0s.
np.full(shape, fill_value[, dtype, order])
- Create an array filled with fill_value.
The following are very similar to the above functions, but instead of passing a shape, you pass another array, and they create a new array with the same shape (and, by default, the same dtype) as that array.
np.empty_like(a[, dtype, order, subok])
np.ones_like(a[, dtype, order, subok])
np.zeros_like(a[, dtype, order, subok])
np.full_like(a, fill_value[, dtype, order, subok])
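A short sketch of the *_like functions using the array b from the shape section. Because the dtype is copied from the input by default, a float fill value is truncated when the input is an integer array; passing dtype overrides this:

```python
import numpy as np

b = np.array([[9, 7, 0, 1], [6, 3, 2, 0]])

z = np.zeros_like(b)      # same shape and dtype as b, filled with 0
f = np.full_like(b, 7)    # same shape and dtype as b, filled with 7
print(z.shape, z.dtype == b.dtype)  # (2, 4) True
print(f)
# [[7 7 7 7]
#  [7 7 7 7]]

# b holds integers, so the float fill value 2.7 is truncated to 2
print(np.full_like(b, 2.7)[0, 0])  # 2

# Passing dtype overrides the type copied from b
print(np.ones_like(b, dtype=float).dtype)  # float64
```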
The following functions are for creating ranges:
np.arange([start,] stop[, step,][, dtype])
- Evenly spaced values, with the spacing set by step (stop is excluded)
np.linspace(start, stop[, num, endpoint, ...])
- Evenly spaced values, but you specify the number of values (num) instead of the step
np.logspace(start, stop[, num, endpoint, base, ...])
- Spaced evenly on a log scale (start and stop are exponents of base)
np.geomspace(start, stop[, num, endpoint, dtype])
- Spaced evenly on a log scale (geometric progression), with start and stop given as the actual endpoint values
1.4 Indexing
Indexing NumPy arrays is very similar to indexing Python lists, except that we have to index along more than one dimension.
Let’s start with the array b that we defined above.
b = np.array([[9,7,0,1], [6,3,2,0]])
print(b)
print(b.shape)
[[9 7 0 1]
[6 3 2 0]]
(2, 4)
Suppose we want to extract the element with the value “2”. How do we do this? First, let’s consider the shape of b. At which index is “2” within the 1st dimension (the dimension with 4 elements)? Remember that we count from zero in Python. At which index is “2” within the 2nd dimension (the dimension with 2 elements)?
If you answered index 2 and index 1, respectively, you are correct! Note that the indices go inside the brackets in the same order as the shape, so the index for the 2nd dimension (1) comes first and the index for the 1st dimension (2) comes second.
There are two ways to index arrays. The first is to use consecutive [] as shown below.
# indexing - method 1
b[1][2]
2
Or you can use the syntax below (this is the syntax that I normally use).
# indexing - method 2
b[1,2]
2
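A few related indexing patterns on the same array b, as a small sketch: selecting a whole row, selecting a column with ":", negative indices, and assigning a value in place:

```python
import numpy as np

b = np.array([[9, 7, 0, 1], [6, 3, 2, 0]])

# A single index selects a whole row (along the dimension with 2 elements)
print(b[1])       # [6 3 2 0]

# ":" selects everything along a dimension, so this is the column at index 2
print(b[:, 2])    # [0 2]

# Negative indices count from the end, just like Python lists
print(b[-1, -2])  # 2

# Indexing can also assign a new value in place
b[1, 2] = 5
print(b)
# [[9 7 0 1]
#  [6 3 5 0]]
```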
2. Array Operations
Being able to perform element-wise operations is what makes NumPy so powerful. Let’s take a look by first defining two arrays.
# First define two arrays
k = np.array([1.0, 2.0, 3.0])
j = np.array([2.0, 2.0, 2.0])
NumPy allows for basic arithmetic operations such as addition and multiplication. What do you get if you add k and j element-wise?
c = k + j
print(c)
[3. 4. 5.]
What do you get if you multiply k and j element-wise?
c = k * j
print(c)
[2. 4. 6.]
Notice that the result of k * j might seem odd if you are familiar with matrices.
Array != Matrix
Array operations are element-wise, which means k * j simply multiplies the elements with the same index. In most cases, you would operate on two arrays with the same shape (k and j have the same shape). You can, however, operate on two arrays with different shapes under certain conditions. This brings us to broadcasting.
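If you do want the matrix-style result, NumPy provides it separately: the @ operator (equivalently np.dot for these cases) computes a dot product or matrix product. The 2x2 arrays A and B below are arbitrary examples added for contrast:

```python
import numpy as np

k = np.array([1.0, 2.0, 3.0])
j = np.array([2.0, 2.0, 2.0])

# Element-wise product: multiplies entries with the same index
print(k * j)   # [2. 4. 6.]

# The @ operator computes the dot product instead: 1*2 + 2*2 + 3*2
print(k @ j)   # 12.0

# With 2-D arrays the difference is even clearer
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(A * B)   # element-wise: rows [5 12] and [21 32]
print(A @ B)   # matrix product: rows [19 22] and [43 50]
```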
2.1 Broadcasting
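Broadcasting allows us to operate on differently shaped arrays, subject to some constraints. Conceptually, the smaller array is "broadcast" (stretched) across the larger one so that their shapes match, without actually copying its data. As with other NumPy array operations, the looping happens in C instead of Python, which is much faster.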
If you wish to learn more, follow this link.
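NumPy compares the two shapes starting from the last (trailing) dimension; two sizes are compatible when they are equal or when one of them is 1. A small sketch using the array b from earlier; the row and column arrays are arbitrary examples:

```python
import numpy as np

b = np.array([[9, 7, 0, 1], [6, 3, 2, 0]])   # shape (2, 4)

# A scalar is broadcast to every element
print(b * 10)

# A shape (4,) array is broadcast across each of the 2 rows
row = np.array([1, 2, 3, 4])
print(b + row)
# [[10  9  3  5]
#  [ 7  5  5  4]]

# A shape (2, 1) array is broadcast across each of the 4 columns
col = np.array([[100], [200]])
print(b + col)
# [[109 107 100 101]
#  [206 203 202 200]]

# Shapes (2, 4) and (3,) are incompatible: trailing sizes 4 and 3 differ and neither is 1
try:
    b + np.array([1, 2, 3])
except ValueError as err:
    print("ValueError:", err)
```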