Lesson 4. Explain - Using Python to Display and Analyze Data

Jacoya Thompson, Jacob Mills, Shruti Researcher
Mathematics
1 class period, 42 minutes
AP Statistics
v5

Overview

In this lesson, students explore Jupyter notebook and various Python commands and functions. Students will choose a data set, plot histograms, change the bin width, and explore the shape of the distribution and the effect that outliers have on it.

Standards

Computational Thinking in STEM
  • Data Practices
    • Analyzing Data
    • Manipulating Data
    • Visualizing Data
  • Modeling and Simulation Practices
    • Using Computational Models to Understand a Concept
  • Computational Problem Solving Practices
    • Computer Programming

Activities

  • 1. Choose Data Set & Plot Descriptive Table
  • 2. Plot Histogram
  • 3. Manipulating the Histogram
  • 4. Plot Boxplot
  • 5. Summarize

Student Directions and Resources


In this lesson, you will:

  • Explore Jupyter notebook and various Python commands and functions
  • Choose a data set, plot a histogram, change bin width, and explore the effect that outliers have on distributions
  • Add data points, remove data points, and observe the effect this has on measures of center and spread

Please note that you will need to have 2 tabs open for this lesson (one for Jupyter notebook and one for CT-STEM).

1. Choose Data Set & Plot Descriptive Table


Open a separate tab and paste this URL into the address bar:

https://mybinder.org/v2/gh/CT-STEM/Descriptive-Statistics/1.0

Then, click on "STD_Part2"

Please note that you will need to have TWO tabs open on your Chromebook - one for CT-STEM and one for Jupyter notebook.


Question 1.1

Which data set did you choose to display and analyze? Enter the name of its .csv file into the Jupyter notebook in place of FILE NAME:

ds = pd.read_csv('FILE NAME')

  daily_exercise.csv
  daily_phoneuse.csv
  daily_sleep.csv
  daily_studying.csv
  igfollowers.csv
  studentheight.csv
  studentweight.csv
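
As a minimal sketch of what the pd.read_csv line does: it reads a .csv file into a pandas table (a DataFrame) named ds. The example below uses a small made-up stand-in for daily_sleep.csv so that it runs without the notebook's data files. The column name "hours" and the values are assumptions, not the real file contents.

```python
import io

import pandas as pd

# Hypothetical stand-in for daily_sleep.csv; the real file's
# column names and values may differ.
fake_csv = io.StringIO("hours\n7.5\n6.0\n8.0\n5.5\n7.0\n")

# In the notebook you would pass the file name instead,
# for example pd.read_csv('daily_sleep.csv').
ds = pd.read_csv(fake_csv)

print(ds.head())  # first rows, to confirm the data loaded
print(ds.shape)   # (number of rows, number of columns)
```

Printing the first few rows is a quick way to check that you typed the file name correctly before moving on.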


Question 1.2

Write the code necessary to display descriptive statistics (count, mean, standard deviation, etc.). If you forgot how to display this information, you may need to open the link from yesterday's pre-assessment (https://mybinder.org/v2/gh/CT-STEM/Descriptive-Statistics/1.0).

Based on the relationship between the mean and median, make a prediction of the shape of this distribution. Explain your reasoning. 
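
For reference, one common way to get this table in pandas is the describe() method, sketched below on made-up data (the column name "minutes" and the values are assumptions, and the notebook may do this differently). The row labeled 50% is the median. A mean well above the median often goes with a right-skewed shape, and a mean well below it with a left-skewed shape, but the histogram is what confirms the prediction.

```python
import pandas as pd

# Made-up values in minutes; one large value pulls the mean upward.
ds = pd.DataFrame({"minutes": [20, 25, 30, 30, 35, 40, 45, 180]})

# count, mean, std, min, quartiles (25%, 50%, 75%), and max in one table
summary = ds["minutes"].describe()
print(summary)

# Compare the two measures of center directly.
mean = ds["minutes"].mean()
median = ds["minutes"].median()
print("mean:", mean)
print("median:", median)
```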



2. Plot Histogram



Question 2.1

Plot the histogram for the data set you chose, then take a screenshot on your Chromebook and upload the image below.
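
A minimal sketch of plotting a histogram with matplotlib is shown here on made-up data. The column name, the axis labels, and the choice of 8 bins are assumptions for illustration; the notebook's own plotting cell may look different.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values in minutes, standing in for one column of your data set.
ds = pd.DataFrame({"minutes": [20, 25, 30, 30, 35, 40, 45, 180]})

fig, ax = plt.subplots()

# bins=8 splits the range from the smallest to the largest value
# into 8 equal-width intervals.
counts, edges, _ = ax.hist(ds["minutes"], bins=8, edgecolor="black")

ax.set_xlabel("Minutes per day")
ax.set_ylabel("Number of students")
ax.set_title("Daily exercise (made-up data)")

# In Jupyter the figure appears below the cell when it finishes running.
# In a plain Python script, call plt.show() to open it.
```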

Steps for taking a partial screenshot:



Question 2.2

Describe the shape, center, spread, and any possible outliers in the distribution (in context).
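
One way to check for possible outliers in code is the 1.5 × IQR rule, sketched below on made-up data. The rule is an assumption about how you might decide what counts as an outlier; it is not taken from the notebook. Note that pandas interpolates quartiles, so Q1 and Q3 can differ slightly from values you calculate by hand. The sketch also removes the flagged points to show how much the mean moves compared with the median.

```python
import pandas as pd

# Made-up values in minutes.
values = pd.Series([20, 25, 30, 30, 35, 40, 45, 180], name="minutes")

q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Points beyond these fences are flagged as possible outliers.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

outliers = values[(values < low_fence) | (values > high_fence)]
trimmed = values[(values >= low_fence) & (values <= high_fence)]

print("possible outliers:", outliers.tolist())
print("mean with and without them:", values.mean(), trimmed.mean())
print("median with and without them:", values.median(), trimmed.median())
```

In this made-up example, removing the flagged point changes the mean far more than the median.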



3. Manipulating the Histogram



Question 3.1

When you changed the number of bins to fewer than 8, what did you notice about the distribution? Did the shape change? Did you learn more or less about the data set? Write your observations below.



Question 3.2

When you changed the number of bins to more than 8, what did you notice about the distribution? Did the shape change? Did you learn more or less about the data set? Write your observations below.



Question 3.3

What is an appropriate number of bins for this data set? Why?
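
To see what changing the number of bins does without drawing a plot, the sketch below counts how many made-up values fall into each bin for three different bin counts. It uses NumPy's histogram function; the data values and the bin counts 3, 8, and 20 are assumptions chosen for illustration.

```python
import numpy as np

# Made-up values in minutes.
values = [20, 25, 30, 30, 35, 40, 45, 180]

results = {}
for bins in (3, 8, 20):
    # counts[i] is the number of values in bin i;
    # edges holds the bin boundaries (one more edge than bins).
    counts, edges = np.histogram(values, bins=bins)
    results[bins] = counts.tolist()
    width = edges[1] - edges[0]
    print(f"{bins:>2} bins, width {width:.1f}: {results[bins]}")
```

With few bins, most values share one wide bin and detail is hidden. With many bins, most bins are empty or hold a single value, so the overall shape is harder to see.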



4. Plot Boxplot



Question 4.1

Plot the boxplot for your data set, then take a screenshot and upload the image below.



Question 4.2

What information does the boxplot show that the histogram does not?
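
A boxplot is built from the five-number summary (minimum, Q1, median, Q3, maximum). The sketch below prints that summary and draws a boxplot with matplotlib on made-up data. The values and labels are assumptions, and the notebook's own boxplot cell may differ. By default, matplotlib draws points beyond 1.5 × IQR from the box as separate markers.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values in minutes.
values = pd.Series([20, 25, 30, 30, 35, 40, 45, 180], name="minutes")

# Minimum, Q1, median, Q3, maximum.
five_number = values.quantile([0, 0.25, 0.5, 0.75, 1.0])
print(five_number)

fig, ax = plt.subplots()
box = ax.boxplot(values)
ax.set_ylabel("Minutes per day")
ax.set_title("Daily exercise (made-up data)")

# Points drawn as separate markers beyond the whiskers.
flagged = box["fliers"][0].get_ydata()
print("points beyond the whiskers:", list(flagged))

# In Jupyter the figure appears below the cell when it finishes running.
# In a plain Python script, call plt.show() to open it.
```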



Question 4.3

What are the pros and cons of boxplots?



5. Summarize



Question 5.1

Summarize what you did in today's lesson.

What did you like? What did you dislike?