Lesson 5. Evaluate & Elaborate - Comparing Distributions

Jacoya Thompson, Jacob Mills, Shruti Researcher
Mathematics
1 class period, 42 minutes
AP Statistics
v4

Overview

Students will compare the distributions of a variable of their choosing (height, weight, Instagram followers, etc.) for two class periods.

Students will observe the effect of removing points on measures of center and spread. 

Standards

Computational Thinking in STEM
  • Data Practices
    • Analyzing Data
    • Manipulating Data
    • Visualizing Data
  • Modeling and Simulation Practices
    • Using Computational Models to Understand a Concept
  • Computational Problem Solving Practices
    • Computer Programming

Activities

  • 1. Choose & Display Data
  • 2. Plot Histogram and Boxplot for each class
  • 3. Effect of removing data points near the mean
  • 4. Generalize
  • 5. Challenge - Dotplots
  • 6. Summarize

Student Directions and Resources


In this lesson, you will analyze class data from Jupyter repositories.

Click the link below:
https://mybinder.org/v2/gh/CT-STEM/Descriptive-Statistics/1.0

Then click "CT-STEM's repositories", and then click "Descriptive-Statistics".

Please note that you will need 2 tabs open for this lesson (one for the Jupyter notebook and one for CT-STEM).

 

1. Choose & Display Data


In our Jupyter binder, we have a number of different variables we collected from two classes of high school students:

1. Instagram Followers

2. Daily Phone Use

3. Number of photos on phone

4. Number of hours spent sleeping per day

5. Number of hours spent studying per day

6. Student heights

7. Student weights

Each of these variables is stored in a CSV (comma-separated values) file in our notebook. Pick one of them to analyze.
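To show what reading one of these files involves, here is a minimal sketch that uses only Python's built-in `csv` module. It assumes, hypothetically, that the file has a header row and that each variable sits in a named column. The helper `load_column` and the column name `height` are placeholders, not the notebook's actual names, and the notebook itself may load the data differently (for example, with pandas).

```python
import csv
import io


def load_column(file_obj, column):
    """Hypothetical helper: return the numeric values in one CSV column.

    Assumes the file has a header row and that `column` is one of its
    names. Blank cells are skipped.
    """
    values = []
    for row in csv.DictReader(file_obj):
        cell = (row.get(column) or "").strip()
        if cell:
            values.append(float(cell))
    return values


# Hypothetical file contents standing in for a real CSV file.
sample_csv = io.StringIO(
    "student,height\n"
    "A,64\n"
    "B,\n"
    "C,70.5\n"
)
heights = load_column(sample_csv, "height")
print(heights)
```

With a real file, you would pass an open file object instead, for example `with open("heights.csv", newline="") as f: values = load_column(f, "height")`, where the file name is again only a placeholder.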


Question 1.1

Choose a variable you would like to analyze for each class.

  Height
  Hours spent studying
  Sleep
  Exercise
  Phone Use
  Instagram Followers
  # of pictures on phone


Question 1.2

Run the first few lines of code until you have displayed descriptive tables for both class periods. (You may need to look at previous Jupyter notebook files if you have forgotten the commands for these functions.)

Compare the standard deviations of these distributions. What does this tell us about 1st period versus 2nd period, in the context of the variable you chose?
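As a cross-check on the notebook's descriptive tables, the sketch below uses Python's built-in `statistics` module to compute the count, mean, sample standard deviation, minimum, and maximum for two lists. It then reports which list is more spread out. `statistics.stdev` divides by n - 1, as the `std` column of a pandas `describe()` table does. The notebook's own table command may report additional columns. The two lists are made-up placeholder values, not the class data.

```python
import statistics


def summarize(values):
    """Return a small descriptive table (as a dict) for one list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        # statistics.stdev is the SAMPLE standard deviation (divides by n - 1)
        "std": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


# Made-up placeholder values, NOT the class data.
period_1 = [62, 64, 65, 66, 68]
period_2 = [58, 63, 66, 70, 73]

table_1 = summarize(period_1)
table_2 = summarize(period_2)
print("1st period:", table_1)
print("2nd period:", table_2)

# A larger standard deviation means a typical value sits farther from the mean.
if table_1["std"] > table_2["std"]:
    wider = "1st period"
elif table_2["std"] > table_1["std"]:
    wider = "2nd period"
else:
    wider = "neither (equal spread)"
print("More spread out:", wider)
```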



2. Plot Histogram and Boxplot for each class



Question 2.1

Plot a histogram and a boxplot for each class in the Jupyter notebook.

Fill in the table below. 
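A boxplot is drawn from the five-number summary, and points more than 1.5 × IQR beyond the quartiles are usually marked as outliers. The stdlib-only sketch below computes those numbers for a placeholder list, which may help when you fill in the table. It assumes quartiles are computed by linear interpolation (`method="inclusive"`). A TI-84, a textbook that takes medians of the lower and upper halves, or a plotting library may give slightly different Q1 and Q3 values. The helper name `boxplot_numbers` is hypothetical.

```python
import statistics


def boxplot_numbers(values):
    """Hypothetical helper: five-number summary plus 1.5 * IQR outlier check.

    Quartiles use linear interpolation (method="inclusive"); other tools
    may compute quartiles slightly differently.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [x for x in values if x < low_fence or x > high_fence]
    return {
        "min": min(values),
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "max": max(values),
        "IQR": iqr,
        "outliers": outliers,
    }


# Made-up placeholder values, NOT the class data.
sample = [2, 4, 5, 6, 7, 8, 9, 30]
print(boxplot_numbers(sample))
```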



Question 2.2

Write a few sentences comparing the distributions, in context (shape, center, spread, outliers). 



3. Effect of removing data points near the mean



Question 3.1

In each class, remove the 1 data point that is CLOSEST to the mean.

Display descriptive tables for each class's data. What effect did removing this point have on the mean and standard deviation in each set?
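The sketch below shows one way to find and drop the value closest to the mean and then recompute the mean and sample standard deviation with Python's `statistics` module. The helper name `remove_closest_to_mean` and the data are hypothetical placeholders. If two values are equally close to the mean, this version simply drops the first one in the list.

```python
import statistics


def remove_closest_to_mean(values):
    """Hypothetical helper: drop the one value nearest the mean.

    Returns (removed_value, remaining_values). If two values are equally
    close, the first one in the list is dropped.
    """
    center = statistics.mean(values)
    index = min(range(len(values)), key=lambda i: abs(values[i] - center))
    return values[index], values[:index] + values[index + 1:]


# Made-up placeholder values, NOT the class data.
data = [3, 7, 8, 10, 12, 15]
removed, rest = remove_closest_to_mean(data)

print("removed value:", removed)
print("mean before -> after:", statistics.mean(data), "->", statistics.mean(rest))
print("SD before -> after:", statistics.stdev(data), "->", statistics.stdev(rest))
```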



Question 3.2

Explain WHY removing the point closest to the mean had the observed effect.
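For reference while you explain: for n values x_1 through x_n, the mean and the sample standard deviation are defined below. Each value adds its squared distance from the mean to the sum, and that sum is divided by n - 1. Removing a value therefore changes both the sum and the divisor.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}
```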



Question 3.3

In each class, remove the 1 data point that is FURTHEST from the mean.

Display descriptive tables for each class's data. What effect did removing this point have on the mean and standard deviation in each set?
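Here is a sketch for finding and dropping the value farthest from the mean. It compares the mean and sample standard deviation before and after the removal and also prints how far the removed value was from the original mean. The helper name `remove_furthest_from_mean` and the data are hypothetical placeholders. If two values are equally far from the mean, the first one in the list is dropped.

```python
import statistics


def remove_furthest_from_mean(values):
    """Hypothetical helper: drop the one value farthest from the mean.

    Returns (removed_value, remaining_values). If two values are equally
    far away, the first one in the list is dropped.
    """
    center = statistics.mean(values)
    index = max(range(len(values)), key=lambda i: abs(values[i] - center))
    return values[index], values[:index] + values[index + 1:]


# Made-up placeholder values, NOT the class data.
data = [3, 7, 8, 10, 12, 15]
removed, rest = remove_furthest_from_mean(data)

print("removed value:", removed)
print("its distance from the original mean:", abs(removed - statistics.mean(data)))
print("mean before -> after:", statistics.mean(data), "->", statistics.mean(rest))
print("SD before -> after:", statistics.stdev(data), "->", statistics.stdev(rest))
```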



Question 3.4

Explain WHY removing the point furthest from the mean had the observed effect.



4. Generalize



Question 4.1

Complete the following statement:

Removing a point closest to the mean will _______________ the value of the sample standard deviation.

  have no effect
  increase
  decrease


Question 4.2

Complete the following statement:

Removing a point furthest from the mean will ________________ the value of the sample standard deviation. 

  have no effect
  increase
  decrease


Question 4.3

Complete the following statement:

If the mean is greater than the median, the shape of the distribution is ________________.

  skewed left
  skewed right
  approximately symmetric


Question 4.4

Complete the following statement:

If the mean is less than the median, the shape of the distribution is ________________.

  skewed left
  skewed right
  approximately symmetric
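To check these statements against your own data, compute both measures of center and compare their difference with the shape of your histogram. The sketch below does this for a placeholder list using the `statistics` module. The helper name `center_comparison` is hypothetical. Comparing the mean and median is only a rough guide to shape, so confirm it against the plot itself.

```python
import statistics


def center_comparison(values):
    """Hypothetical helper: return (mean, median, mean - median)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    return mean, median, mean - median


# Made-up placeholder values, NOT the class data.
followers = [120, 150, 180, 200, 210, 240, 1500]
mean, median, gap = center_comparison(followers)
print(f"mean={mean:.1f}  median={median}  mean - median={gap:.1f}")
```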


5. Challenge - Dotplots


Python's usual plotting tools don't include a ready-made dot plot command.

However, we can work around this by first making a histogram of the values (counting how often each value occurs) and then mapping those counts onto a scatterplot.

The last few lines of code in this notebook will make a dot plot for any data set in our Jupyter notebook, but some modifications have to be made for the plot to display correctly.

Choose a variable to analyze (height, weight, Instagram followers, etc.) and debug the code so that dot plots are displayed for 1st period and 2nd period.
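The counting-then-plotting idea described above can be sketched without any plotting library. Count how often each value occurs, then give the k-th copy of a value the height k so that repeated values stack into columns. `dotplot_points` is a hypothetical helper, not the notebook's code, and the data are placeholders. With matplotlib installed, you could pass the resulting lists to `matplotlib.pyplot.scatter(xs, ys)` to draw the dots.

```python
from collections import Counter


def dotplot_points(values):
    """Hypothetical helper: turn raw values into (x, y) points for a dot plot.

    Each distinct value is counted (a histogram with one bin per value),
    and its k-th occurrence is placed at height y = k so repeats stack.
    """
    counts = Counter(values)
    xs, ys = [], []
    for value in sorted(counts):
        for level in range(1, counts[value] + 1):
            xs.append(value)
            ys.append(level)
    return xs, ys


# Made-up placeholder values, NOT the class data.
hours_of_sleep = [6, 7, 7, 8, 6, 7, 9]
xs, ys = dotplot_points(hours_of_sleep)
print(list(zip(xs, ys)))
```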


Question 5.1

What changes did you have to make in the Python code?



Question 5.2

Copy and paste the code used to create the plot.



6. Summarize



Question 6.1

Summarize what you did and learned in today's lesson.

What did you like? What did you dislike?