Exercises: Part II

Here we will use what we’ve learned to remove missing data from a list. As boring as this sounds, managing missing data is a very common issue in scientific computing. So, let’s give it a try.

In the example below, we have a bunch of data that we want to add together, but some of it is missing. Missing data points in our dataset are flagged with the number -99, so if we include them in our sum, the result will be incorrect. We want to remove these values from our dataset before we compute the sum.
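To see why the flag matters, here is a small worked example with made-up numbers. A single -99 is enough to pull the total far away from the true value.

```python
# a tiny made-up dataset with one missing value flagged as -99
data = [4, -99, 2]

print(sum(data))                         # -93: the flag is counted as a real value
print(sum(x for x in data if x != -99))  # 6: the flag is left out
```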

We will be reading in a file called missing_data.csv. You can download the file here. To read it in, we are going to cheat a little and use csv, a module from Python's standard library. You can read the file without it, but doing so is a bit cumbersome. In Module 2, we will learn more about importing packages.

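# import the csv module (part of Python's standard library)
import csv

# read in the data using csv; all of the values are on the first row of the file
with open('missing_data.csv', newline='') as f:
    reader = csv.reader(f, delimiter=',')
    sample = next(reader)  # first row, as a list of strings

print(sample)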
['-5', '-1', '4', '-3', '-4', '5', '-3', '-1', '-4', '-99', '2', '1', '4', '-5', '-1', '-1', '-1', '-2', '4', '-1', '-1', '-99', '-99', '4', '0', '-1', '0', '-4', '4', '0', '0', '-3', '4', '-2', '0', '-5', '1', '3', '-2', '5', '-2', '-4', '-3', '-99', '4', '-5', '5', '-1', '-2', '-1', '0', '5', '0', '1', '3', '2', '2', '-1', '4', '1', '1', '-5', '-99', '-4', '-99', '-2', '2', '-3', '0', '-2', '0', '-2', '3', '4', '-1', '3', '4', '1', '-4', '1', '2', '-5', '-99', '0', '-1', '5', '-5', '0', '3', '3', '-1', '3', '-1', '-1', '-1', '5', '3', '-99', '-3']

You should be able to see the -99 values in the data above.
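For comparison, here is a minimal sketch of reading the file without the csv module. It assumes the values sit on the first line, are separated by plain commas, and contain no quoted fields: read the first line, strip the trailing newline, and split on commas. The helper `read_first_row` is hypothetical, and so that the example runs without the file, `io.StringIO` stands in for the open file object and holds a short made-up row.

```python
import io

def read_first_row(f):
    """Return the first line of a comma-separated file as a list of strings.

    Hypothetical helper: assumes plain commas and no quoted fields.
    """
    first_line = f.readline()
    return first_line.strip().split(',')

# io.StringIO stands in for open('missing_data.csv'); the row is made up
fake_file = io.StringIO('-5,-1,4,-99,2\n')
row = read_first_row(fake_file)
print(row)  # ['-5', '-1', '4', '-99', '2']
```

With the real file, the call would be `with open('missing_data.csv') as f: sample = read_first_row(f)`. The csv module remains the safer choice once fields can contain quotes or embedded commas.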

Let’s check what data structure sample is. Is it a list, a dictionary, or a tuple?

# check type of data structure using type()
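If you get stuck, here is one way to check. Because the real sample comes from the file, this sketch uses a short stand-in list copied from the start of the printed output.

```python
# stand-in for sample: the first few values from the printed output above
sample = ['-5', '-1', '4', '-3']

print(type(sample))              # <class 'list'>
print(isinstance(sample, list))  # True
```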

Now, let’s check the type of the elements stored in sample. Use indexing and type() to do this.

# pick an element of sample and find the data type
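One possible check, again on a short stand-in list, indexes the first element and asks for its type. Any other index would work the same way, since every element was read from the file as text.

```python
# stand-in for sample: the first few values from the printed output above
sample = ['-5', '-1', '4', '-3']

first = sample[0]          # indexing picks out one element
print(first, type(first))  # -5 <class 'str'>
```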

Do you need to do a type conversion in order to compute the sum?
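A quick experiment on a short stand-in list shows what happens when the strings are added without a conversion, and how `int()` fixes it.

```python
values = ['-5', '-1', '4']

# + on two strings concatenates them instead of adding the numbers
print(values[0] + values[1])  # -5-1

# sum() starts from the integer 0, so adding a string raises a TypeError
try:
    sum(values)
except TypeError as err:
    print('TypeError:', err)

# convert each string to an integer first, then sum
numbers = [int(v) for v in values]
print(numbers, sum(numbers))  # [-5, -1, 4] -2
```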

Now, let’s remove the missing data and compute the sum using a for loop. By the end of the loop we should have two new things: a new list with no missing data, and a sum that excludes the missing data.

Fill in the code block below:

# remove missing data and sum
sample_sum = 0  # initialize sum
new_sample = [] # initialize new list

# loop over elements of sample (remove the number sign below to get started)
#for s in sample:
    

Try it on your own before you click to reveal the answer below. Note that this is only one possible answer: you may have written the code differently and still gotten the correct output. That is fine.

# remove missing data and sum
sample_sum = 0
new_sample = []

# loop over elements of sample
for s in sample:
    s = int(s)  # note we need to do a type conversion!
    if s != -99:  # skip missing data; only real values are kept
        sample_sum += s
        new_sample.append(s)

print(sample_sum, new_sample)
1 [-5, -1, 4, -3, -4, 5, -3, -1, -4, 2, 1, 4, -5, -1, -1, -1, -2, 4, -1, -1, 4, 0, -1, 0, -4, 4, 0, 0, -3, 4, -2, 0, -5, 1, 3, -2, 5, -2, -4, -3, 4, -5, 5, -1, -2, -1, 0, 5, 0, 1, 3, 2, 2, -1, 4, 1, 1, -5, -4, -2, 2, -3, 0, -2, 0, -2, 3, 4, -1, 3, 4, 1, -4, 1, 2, -5, 0, -1, 5, -5, 0, 3, 3, -1, 3, -1, -1, -1, 5, 3, -3]
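As an example of writing it differently, the same two results can be built with list comprehensions and the built-in `sum()`. This sketch uses a short stand-in list rather than the full file, and the name `MISSING` is introduced here for readability.

```python
# stand-in for sample; the real one comes from missing_data.csv
sample = ['-5', '-1', '4', '-99', '2', '-99']

MISSING = -99  # flag used for missing data points

numbers = [int(s) for s in sample]                 # type conversion
new_sample = [x for x in numbers if x != MISSING]  # drop the flags
sample_sum = sum(new_sample)

print(sample_sum, new_sample)  # 0 [-5, -1, 4, 2]
```

The loop version spells out each step, which is easier to follow while learning; the comprehension version is shorter once the pattern is familiar.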

Now, let’s try the same thing but write it as a function. Again, try it on your own before you click to reveal.

# function to remove missing data and compute the sum (remove the comment sign (#) below to get started)

#def manage_missing_data(l):
    
# function to remove missing data and compute the sum
def manage_missing_data(l):
    sample_sum = 0
    new_sample = []

    # loop over elements of the input list l (not the global variable sample)
    for s in l:
        s = int(s)  # convert each string to an integer
        if s != -99:  # skip missing data; only real values are kept
            sample_sum += s
            new_sample.append(s)
    return sample_sum, new_sample

# call function
print(manage_missing_data(sample))
(1, [-5, -1, 4, -3, -4, 5, -3, -1, -4, 2, 1, 4, -5, -1, -1, -1, -2, 4, -1, -1, 4, 0, -1, 0, -4, 4, 0, 0, -3, 4, -2, 0, -5, 1, 3, -2, 5, -2, -4, -3, 4, -5, 5, -1, -2, -1, 0, 5, 0, 1, 3, 2, 2, -1, 4, 1, 1, -5, -4, -2, 2, -3, 0, -2, 0, -2, 3, 4, -1, 3, 4, 1, -4, 1, 2, -5, 0, -1, 5, -5, 0, 3, 3, -1, 3, -1, -1, -1, 5, 3, -3])
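Two small extensions may be useful. Because the function returns a tuple, its two results can be unpacked into separate names. The function can also take the missing-data flag as a parameter; the `missing` parameter below is a hypothetical addition, and its default of -99 keeps the original behavior. The -999 flag in the last call is likewise only an illustration.

```python
def manage_missing_data(l, missing=-99):
    """Return (sum, cleaned list) for a list of numeric strings.

    missing is a hypothetical extra parameter; its default (-99)
    matches the flag used in missing_data.csv.
    """
    new_sample = [int(s) for s in l if int(s) != missing]
    return sum(new_sample), new_sample

# unpack the returned tuple into two names
total, cleaned = manage_missing_data(['3', '-99', '-4'])
print(total, cleaned)  # -1 [3, -4]

# a different flag, e.g. if another file marked missing data with -999
print(manage_missing_data(['3', '-999', '-4'], missing=-999))  # (-1, [3, -4])
```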

Finally, let’s suppose that we want to keep both data sets, the one with missing values and the one without, and store them in a dictionary. How would you construct this dictionary?

# Dictionary of data
all_data = {'original': sample, 'no missing': new_sample}
print(all_data)
{'original': ['-5', '-1', '4', '-3', '-4', '5', '-3', '-1', '-4', '-99', '2', '1', '4', '-5', '-1', '-1', '-1', '-2', '4', '-1', '-1', '-99', '-99', '4', '0', '-1', '0', '-4', '4', '0', '0', '-3', '4', '-2', '0', '-5', '1', '3', '-2', '5', '-2', '-4', '-3', '-99', '4', '-5', '5', '-1', '-2', '-1', '0', '5', '0', '1', '3', '2', '2', '-1', '4', '1', '1', '-5', '-99', '-4', '-99', '-2', '2', '-3', '0', '-2', '0', '-2', '3', '4', '-1', '3', '4', '1', '-4', '1', '2', '-5', '-99', '0', '-1', '5', '-5', '0', '3', '3', '-1', '3', '-1', '-1', '-1', '5', '3', '-99', '-3'], 'no missing': [-5, -1, 4, -3, -4, 5, -3, -1, -4, 2, 1, 4, -5, -1, -1, -1, -2, 4, -1, -1, 4, 0, -1, 0, -4, 4, 0, 0, -3, 4, -2, 0, -5, 1, 3, -2, 5, -2, -4, -3, 4, -5, 5, -1, -2, -1, 0, 5, 0, 1, 3, 2, 2, -1, 4, 1, 1, -5, -4, -2, 2, -3, 0, -2, 0, -2, 3, 4, -1, 3, 4, 1, -4, 1, 2, -5, 0, -1, 5, -5, 0, 3, 3, -1, 3, -1, -1, -1, 5, 3, -3]}
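Once both data sets live in one dictionary, each is reached by its key, and new entries can be added later. This sketch uses a short stand-in for `all_data`; storing the sum under a `'sum'` key is just one illustrative choice.

```python
# short stand-in for all_data; the real one holds the full lists
all_data = {'original': ['-5', '-99', '4', '-99'], 'no missing': [-5, 4]}

# look up a data set by its key and count the missing flags
n_missing = all_data['original'].count('-99')
print(n_missing)  # 2

# dictionaries can grow: store the sum alongside the data
all_data['sum'] = sum(all_data['no missing'])
print(list(all_data.keys()))  # ['original', 'no missing', 'sum']
```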

If you want more practice, check out the following link. You can also google “python exercises beginner” to find many more links (e.g. here’s another one).