Exercises: Part II
Here we will use what we’ve learned to remove missing data from a list. As boring as this sounds, managing missing data is a very common issue in scientific computing. So, let’s give it a try.
In the example below, we have a bunch of data that we want to add together, but some of it is missing. Missing data points in our dataset are flagged with the number -99, so if we include these values in our sum, the result will be incorrect. We want to remove them from our dataset before computing the sum.
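To see why the flag matters, here is a small worked example with a made-up list (not the values from missing_data.csv). A single -99 drags the total far below the true sum of the remaining values.

```python
# Hypothetical stand-in data: two real values and one missing-data flag (-99)
values = [1, -99, 2]

naive_total = sum(values)                         # the flag is counted as data
clean_total = sum(v for v in values if v != -99)  # the flag is skipped

print(naive_total)  # -96
print(clean_total)  # 3
```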
We will be reading in a file called missing_data.csv. You can download the file here. To read it in, we are going to cheat and use Python's built-in csv module. You can read the file without this module, but it is a bit cumbersome. In Module 2, we will learn more about importing packages.
# import the built-in csv module
import csv
# read in data using csv
with open('missing_data.csv', newline='') as f:
    reader = csv.reader(f, delimiter=',')
    # list(reader) is a list of rows; [0] keeps the first row
    sample = list(reader)[0]
print(sample)
['-5', '-1', '4', '-3', '-4', '5', '-3', '-1', '-4', '-99', '2', '1', '4', '-5', '-1', '-1', '-1', '-2', '4', '-1', '-1', '-99', '-99', '4', '0', '-1', '0', '-4', '4', '0', '0', '-3', '4', '-2', '0', '-5', '1', '3', '-2', '5', '-2', '-4', '-3', '-99', '4', '-5', '5', '-1', '-2', '-1', '0', '5', '0', '1', '3', '2', '2', '-1', '4', '1', '1', '-5', '-99', '-4', '-99', '-2', '2', '-3', '0', '-2', '0', '-2', '3', '4', '-1', '3', '4', '1', '-4', '1', '2', '-5', '-99', '0', '-1', '5', '-5', '0', '3', '3', '-1', '3', '-1', '-1', '-1', '5', '3', '-99', '-3']
You should be able to see the -99 values in the data above.
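If the `[0]` in the reading code looks mysterious, it may help to see what `csv.reader` produces. The sketch below feeds it a short, made-up line of text through `io.StringIO` (standing in for the real file) and compares the result with splitting the line by hand using `str.split`, which is roughly what reading the file without the csv module would involve.

```python
import csv
import io

# Hypothetical one-line file contents standing in for missing_data.csv
text = "-5,-1,4,-99\n"

# csv.reader yields one list of strings per row; list() collects every row
rows = list(csv.reader(io.StringIO(text), delimiter=','))
print(rows)  # [['-5', '-1', '4', '-99']]

# [0] picks out the first row, which is the list of values we want
first_row = rows[0]

# Without the csv module: strip the trailing newline, then split on commas
by_hand = text.strip().split(',')

print(first_row == by_hand)  # True
```

The manual version works for a simple file like this one, but unlike `csv.reader` it does not handle quoted fields that contain commas.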
Let’s check what data structure sample is. Is it a list, a dictionary, or a tuple?
# check type of data structure using type()
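As a reminder of how `type()` behaves, here it is applied to a few small, made-up objects rather than to `sample` itself, so you can still answer the question for `sample` on your own.

```python
# Made-up examples of three common data structures
a_list = [1, 2, 3]
a_dict = {'a': 1, 'b': 2}
a_tuple = (1, 2, 3)

# type() returns the class of an object
print(type(a_list))   # <class 'list'>
print(type(a_dict))   # <class 'dict'>
print(type(a_tuple))  # <class 'tuple'>
```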
Now, let’s check the type of data in sample. Use indexing and type() to do this.
# pick an element of sample and find the data type
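Indexing picks out a single element, and `type()` then tells you what kind of value it is. The sketch below uses a short hypothetical list in place of `sample`.

```python
# Hypothetical stand-in for sample: a list of numbers stored as text
stand_in = ['-5', '-1', '4']

first = stand_in[0]   # index 0 is the first element
last = stand_in[-1]   # negative indices count from the end

print(first, type(first))  # -5 <class 'str'>
print(last, type(last))    # 4 <class 'str'>
```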
Do you need to do a type conversion in order to compute the sum?
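One way to see why the type matters: `+` joins strings end to end rather than adding them, and the built-in `sum()` raises an error on a list of strings. Converting with `int()` first gives the numeric result. The values below are made up for illustration.

```python
a, b = '4', '-1'  # hypothetical numbers stored as strings

print(a + b)            # 4-1  (string concatenation, not addition)
print(int(a) + int(b))  # 3    (numeric addition after conversion)

# sum() starts from 0, so it tries 0 + '4', which mixes int and str
try:
    sum([a, b])
    sum_failed = False
except TypeError as err:
    sum_failed = True
    print('TypeError:', err)
```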
Now, let’s try to remove the missing data and compute the sum using a for loop. By the end of the loop we should have two new things: a new list with no missing data and a sum that excludes the missing data.
Fill in the code block below:
# remove missing data and sum
sample_sum = 0 # initialize sum
new_sample = [] # initialize new list
# loop over elements of sample (remove the number sign below to get started)
#for s in sample:
Try it on your own before looking at the answer below. Note that this is one possible answer: you may have written the code differently and still gotten the correct output, which is fine.
# remove missing data and sum
sample_sum = 0
new_sample = []
# loop over elements of sample
for s in sample:
    s = int(s)  # note we need to do a type conversion!
    if s == -99:
        # missing value: leave the sum and the new list unchanged
        sample_sum = sample_sum
        new_sample = new_sample
    else:
        sample_sum += s
        new_sample.append(s)
print(sample_sum, new_sample)
1 [-5, -1, 4, -3, -4, 5, -3, -1, -4, 2, 1, 4, -5, -1, -1, -1, -2, 4, -1, -1, 4, 0, -1, 0, -4, 4, 0, 0, -3, 4, -2, 0, -5, 1, 3, -2, 5, -2, -4, -3, 4, -5, 5, -1, -2, -1, 0, 5, 0, 1, 3, 2, 2, -1, 4, 1, 1, -5, -4, -2, 2, -3, 0, -2, 0, -2, 3, 4, -1, 3, 4, 1, -4, 1, 2, -5, 0, -1, 5, -5, 0, 3, 3, -1, 3, -1, -1, -1, 5, 3, -3]
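The loop above spells out every step, which is good practice. Once you are comfortable with it, the same result can be written more compactly with list comprehensions and the built-in `sum()`. This is an alternative style, not the answer the exercise expects, and the short list below is made-up data so the snippet runs on its own.

```python
# Hypothetical stand-in for sample (values read from a CSV file are strings)
raw = ['-5', '-1', '4', '-99', '2', '-99']

numbers = [int(s) for s in raw]              # convert every string to int
clean = [v for v in numbers if v != -99]     # drop the -99 missing-data flags
total = sum(clean)

print(total, clean)  # 0 [-5, -1, 4, 2]
```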
Now, let’s try the same thing, but write it as a function. Again, try it on your own before looking at the answer below.
# function to remove missing data and compute the sum (remove the comment sign (#) below to get started)
#def manage_missing_data(l):
# function to remove missing data and compute the sum
def manage_missing_data(l):
    sample_sum = 0
    new_sample = []
    # loop over elements of the input list l (not the global sample)
    for s in l:
        s = int(s)
        if s == -99:
            # missing value: leave the sum and the new list unchanged
            sample_sum = sample_sum
            new_sample = new_sample
        else:
            sample_sum += s
            new_sample.append(s)
    return sample_sum, new_sample
# call function
print(manage_missing_data(sample))
(1, [-5, -1, 4, -3, -4, 5, -3, -1, -4, 2, 1, 4, -5, -1, -1, -1, -2, 4, -1, -1, 4, 0, -1, 0, -4, 4, 0, 0, -3, 4, -2, 0, -5, 1, 3, -2, 5, -2, -4, -3, 4, -5, 5, -1, -2, -1, 0, 5, 0, 1, 3, 2, 2, -1, 4, 1, 1, -5, -4, -2, 2, -3, 0, -2, 0, -2, 3, 4, -1, 3, 4, 1, -4, 1, 2, -5, 0, -1, 5, -5, 0, 3, 3, -1, 3, -1, -1, -1, 5, 3, -3])
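A possible extension, not part of the original exercise: make the missing-data flag a parameter so the same function works for datasets that use a different marker. The function name `clean_and_sum`, the parameter name `missing`, and its default of -99 are hypothetical choices for this sketch.

```python
def clean_and_sum(values, missing=-99):
    """Return (sum, cleaned list) after dropping entries equal to `missing`.

    `values` may hold strings or ints; each entry is converted with int().
    """
    total = 0
    cleaned = []
    for v in values:
        v = int(v)
        if v != missing:  # keep only real data points
            total += v
            cleaned.append(v)
    return total, cleaned

# Made-up inputs: one flagged with -99, one with a hypothetical -999 flag
print(clean_and_sum(['3', '-99', '4']))            # (7, [3, 4])
print(clean_and_sum([10, -999, 5], missing=-999))  # (15, [10, 5])
```

Giving `missing` a default keeps the simple call `clean_and_sum(data)` working exactly like the exercise version, while still allowing other flags.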
Finally, let’s suppose that we want to keep both data sets, the one with missing values and the one without, and store them in a dictionary. How would you construct this dictionary?
# Dictionary of data
all_data = {'original': sample, 'no missing': new_sample}
print(all_data)
{'original': ['-5', '-1', '4', '-3', '-4', '5', '-3', '-1', '-4', '-99', '2', '1', '4', '-5', '-1', '-1', '-1', '-2', '4', '-1', '-1', '-99', '-99', '4', '0', '-1', '0', '-4', '4', '0', '0', '-3', '4', '-2', '0', '-5', '1', '3', '-2', '5', '-2', '-4', '-3', '-99', '4', '-5', '5', '-1', '-2', '-1', '0', '5', '0', '1', '3', '2', '2', '-1', '4', '1', '1', '-5', '-99', '-4', '-99', '-2', '2', '-3', '0', '-2', '0', '-2', '3', '4', '-1', '3', '4', '1', '-4', '1', '2', '-5', '-99', '0', '-1', '5', '-5', '0', '3', '3', '-1', '3', '-1', '-1', '-1', '5', '3', '-99', '-3'], 'no missing': [-5, -1, 4, -3, -4, 5, -3, -1, -4, 2, 1, 4, -5, -1, -1, -1, -2, 4, -1, -1, 4, 0, -1, 0, -4, 4, 0, 0, -3, 4, -2, 0, -5, 1, 3, -2, 5, -2, -4, -3, 4, -5, 5, -1, -2, -1, 0, 5, 0, 1, 3, 2, 2, -1, 4, 1, 1, -5, -4, -2, 2, -3, 0, -2, 0, -2, 3, 4, -1, 3, 4, 1, -4, 1, 2, -5, 0, -1, 5, -5, 0, 3, 3, -1, 3, -1, -1, -1, 5, 3, -3]}
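Once both versions are stored under descriptive keys, the dictionary makes it easy to compare them, for example to count how many values were flagged as missing. The two short lists below are made-up stand-ins for `sample` and `new_sample`.

```python
# Made-up stand-ins for the original (strings) and cleaned (ints) data
all_data = {
    'original': ['-5', '-99', '4', '-99', '2'],
    'no missing': [-5, 4, 2],
}

# Look up each data set by its key and compare lengths
n_original = len(all_data['original'])
n_clean = len(all_data['no missing'])
n_missing = n_original - n_clean

print(n_original, n_clean, n_missing)     # 5 3 2
print(all_data['original'].count('-99'))  # 2 (counting the flag directly)
```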
If you want more practice, check out the following link. You can also google “python exercises beginner” to find many more links (e.g. here’s another one).