Solutions to Exercises

The solutions below show one possible way of solving each exercise. You might have used a different approach, and that's great! There is almost always more than one way to solve a problem in Python.

Lesson 1: Intro to Pandas

Exercise 1.1

a) Initial setup (you can skip to part b if you've already done this):

b) Based on the output of world.info(), what data type is the pop_density column?

The pop_density column has a floating-point data type (pandas normally reports this as float64 in the Dtype column of the info() output).
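As a quick cross-check on the info() output, the data type of a single column can also be read directly. This is only a sketch: it uses a tiny stand-in for world with invented values, since the real DataFrame comes from the initial setup step. Only the column name pop_density is taken from the exercise.

```python
import pandas as pd

# Hypothetical stand-in for `world`; the values are invented for illustration.
world = pd.DataFrame(
    {
        "country": ["Country A", "Country B", "Country C"],
        "year": [1950, 1980, 2015],
        "pop_density": [10.5, 20.0, 35.25],
    }
)

# info() lists every column with its non-null count and data type.
world.info()

# The data type of one column can also be read directly.
pop_density_dtype = world["pop_density"].dtype
print(pop_density_dtype)
```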

c) Based on the output of world.describe(), what are the minimum and maximum years in this data?

The minimum year is 1950 and the maximum year is 2015.
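If you'd rather not read the numbers off the table by eye, the same values can be pulled out of the describe() result by row and column label. A minimal sketch, again using a small stand-in for world with invented rows (only the years 1950 and 2015 come from the answer above):

```python
import pandas as pd

# Hypothetical stand-in for `world`; only the year range mirrors the answer above.
world = pd.DataFrame(
    {
        "year": [1950, 1980, 2015],
        "pop_density": [10.5, 20.0, 35.25],
    }
)

# describe() returns a DataFrame whose rows are summary statistics
# ("count", "mean", "std", "min", ..., "max") and whose columns are the
# numeric columns of the original DataFrame.
summary = world.describe()

min_year = int(summary.loc["min", "year"])
max_year = int(summary.loc["max", "year"])
print(min_year, max_year)
```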

Exercise 1.2

a) Create a new DataFrame called americas which contains the rows of world where the region is "Americas" and has the following columns: country, year, sub_region, income_group, pop_density.
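One way to do this is to select rows and columns in a single step with .loc. The sketch below is not necessarily the approach used in the lesson, and it runs on a small stand-in for world with invented values. The column names come from the exercise; the extra population column is hypothetical and is only there to show that unlisted columns get dropped.

```python
import pandas as pd

# Hypothetical stand-in for `world`; the values are invented for illustration.
world = pd.DataFrame(
    {
        "country": ["Country A", "Country B", "Country C"],
        "year": [2015, 2015, 2015],
        "region": ["Americas", "Asia", "Americas"],
        "sub_region": ["Sub-region 1", "Sub-region 2", "Sub-region 3"],
        "income_group": ["High", "Low", "Low"],
        "pop_density": [10.5, 20.0, 35.25],
        "population": [1_000_000, 2_500_000, 3_000_000],  # hypothetical extra column
    }
)

columns = ["country", "year", "sub_region", "income_group", "pop_density"]

# Boolean mask picks the rows; the list of names picks the columns.
americas = world.loc[world["region"] == "Americas", columns]
print(americas)
```

Filtering first with world[world["region"] == "Americas"] and then selecting the columns in a second step gives the same result.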

b) Use the head() and tail() methods to display the first 20 and last 20 rows.
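Both methods take the number of rows as their first argument (the default is 5). A minimal sketch on a stand-in americas DataFrame with 50 invented rows:

```python
import pandas as pd

# Hypothetical stand-in for `americas`: one made-up country, 50 consecutive years.
americas = pd.DataFrame(
    {
        "country": ["Country A"] * 50,
        "year": list(range(1966, 2016)),
    }
)

first_20 = americas.head(20)  # first 20 rows
last_20 = americas.tail(20)   # last 20 rows

print(first_20)
print(last_20)
```

In a notebook, only the last expression in a cell is displayed automatically, so either run the two calls in separate cells or wrap each one in print() as above.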

c) Use the unique() method on the country column to display the list of unique countries in the americas DataFrame.
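unique() is called on a single column (a Series) and returns each distinct value once, in order of first appearance. A small sketch with invented country names standing in for the real americas data:

```python
import pandas as pd

# Hypothetical stand-in for `americas`; the country names are placeholders.
americas = pd.DataFrame(
    {
        "country": ["Country A", "Country A", "Country B", "Country C", "Country B"],
        "year": [2014, 2015, 2014, 2014, 2015],
    }
)

# Distinct values of the country column, in order of first appearance.
countries = americas["country"].unique()
print(countries)

# nunique() gives just the number of distinct values.
n_countries = americas["country"].nunique()
print(n_countries)
```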

Exercise 1.3

For this exercise we're working with the original DataFrame world (containing all years and all countries).

a) Initial setup (you can skip to part b if you've already done this):

b) Group the DataFrame world by year and compute the world total population (in millions) in each year.
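A sketch of the group-and-sum step. It assumes the setup in part a adds a pop_millions column (that name appears in Exercise 2.2) computed from a population column; the population column name and all values below are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for `world`; the values are invented for illustration.
world = pd.DataFrame(
    {
        "country": ["Country A", "Country B", "Country A", "Country B"],
        "year": [1950, 1950, 2015, 2015],
        "population": [1_000_000, 2_500_000, 3_000_000, 4_000_000],
    }
)

# Assumed setup step: express population in millions.
world["pop_millions"] = world["population"] / 1e6

# Group by year, then add up pop_millions across all countries in each year.
world_pop = world.groupby("year")["pop_millions"].sum()
print(world_pop)
```

The result is a Series indexed by year, so a single year can be looked up with world_pop.loc[2015].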

Lesson 2: Intro to Data Visualization

Exercise 2.1

a) Initial setup (you can skip to part b if you've already done this):

b) Use relplot() to create a scatter plot of life_expectancy vs. gdp_per_capita from world_2015, in which the points are coloured by income_group.
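Here "life_expectancy vs. gdp_per_capita" means life_expectancy on the y-axis and gdp_per_capita on the x-axis, and hue controls the colour. A minimal sketch, assuming seaborn is imported as sns and using a tiny stand-in for world_2015 with invented values and made-up income group labels:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this runs without a display; omit in a notebook

import pandas as pd
import seaborn as sns

# Hypothetical stand-in for `world_2015`; values and group labels are invented.
world_2015 = pd.DataFrame(
    {
        "gdp_per_capita": [1000.0, 5000.0, 20000.0, 40000.0],
        "life_expectancy": [55.0, 65.0, 75.0, 80.0],
        "income_group": ["Low", "Lower middle", "Upper middle", "High"],
    }
)

# relplot() draws a scatter plot by default (kind="scatter").
grid = sns.relplot(
    data=world_2015,
    x="gdp_per_capita",
    y="life_expectancy",
    hue="income_group",
)
```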

c) Add the keyword argument aspect=1.5 to the relplot() function call. How does the plot change?

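The plot becomes wider. The aspect argument sets the ratio of each facet's width to its height, so with aspect=1.5 the plot is 1.5 times as wide as it is tall, while the height stays the same.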
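To see the effect in numbers rather than by eye, you can compare the figure sizes of the two plots. This is a sketch on invented data, not the lesson's world_2015; relplot() returns a FacetGrid, and its figure attribute is the underlying matplotlib figure.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this runs without a display; omit in a notebook

import pandas as pd
import seaborn as sns

# Hypothetical stand-in for `world_2015`; values and group labels are invented.
world_2015 = pd.DataFrame(
    {
        "gdp_per_capita": [1000.0, 5000.0, 20000.0, 40000.0],
        "life_expectancy": [55.0, 65.0, 75.0, 80.0],
        "income_group": ["Low", "Lower middle", "Upper middle", "High"],
    }
)

plot_args = dict(
    data=world_2015, x="gdp_per_capita", y="life_expectancy", hue="income_group"
)

default_grid = sns.relplot(**plot_args)            # aspect defaults to 1
wide_grid = sns.relplot(**plot_args, aspect=1.5)   # each facet 1.5x as wide as tall

# Figure sizes are (width, height) in inches.
default_width, default_height = default_grid.figure.get_size_inches()
wide_width, wide_height = wide_grid.figure.get_size_inches()
print(default_width, default_height)
print(wide_width, wide_height)
```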

Exercise 2.2

a) Initial setup (you can skip to part b if you've already done this):

b) Use relplot() to create a plot similar to the previous example, but plotting life_expectancy on the y-axis instead of pop_millions and aggregating with the mean instead of the sum.

Bonus: Do you spot anything strange in the subplot for the "Americas" region? How could you investigate this using the techniques we learned in the Intro to Pandas lesson?

There appears to be an outlier in the Americas low income group in 2010. To investigate, we can look at the rows of world for this group, to see if anything jumps out:
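A minimal sketch of that filter, combining two conditions with &. The income group label "Low" is an assumption about how the group is spelled in the data, and the stand-in DataFrame is invented except for Haiti's drop from 58 to 32 years described below.

```python
import pandas as pd

# Hypothetical stand-in for `world`. Only Haiti's 58 -> 32 drop comes from
# the text; every other row and value is invented for illustration.
world = pd.DataFrame(
    {
        "country": ["Haiti", "Haiti", "Haiti", "Country B", "Country C"],
        "year": [2009, 2010, 2011, 2010, 2010],
        "region": ["Americas", "Americas", "Americas", "Americas", "Asia"],
        "income_group": ["Low", "Low", "Low", "High", "Low"],
        "life_expectancy": [58.0, 32.0, 59.0, 75.0, 60.0],
    }
)

# Each condition needs its own parentheses when combined with &.
is_americas_low = (world["region"] == "Americas") & (world["income_group"] == "Low")
americas_low = world[is_americas_low]

# Which countries are in this group?
countries = americas_low["country"].unique()
print(countries)

# Which row has the lowest life expectancy?
lowest = americas_low.loc[americas_low["life_expectancy"].idxmin()]
print(lowest)
```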

We can see that the Americas low income group contains only one country, Haiti, and that life expectancy dropped sharply in 2010. Since a devastating earthquake struck Haiti in 2010, this drop likely reflects the disaster. However, the size of the decrease (from 58 down to 32 years) seems potentially unrealistic and could indicate an issue with how life expectancy was calculated in the Gapminder data.

