Lesson 2. Engage - Datasets in Python

Jacoya Thompson, Jacob Mills, Shruti Researcher
Mathematics
1 class period, 42 minutes
AP Statistics
v6

Overview

In this lesson:

Students will engage in a pre-assessment using Python and Jupyter notebooks.

Given an output table for a data set, students will interpret the standard deviation in context.

Example tasks based on the table output:

  1. Display the table in Python (screenshot/code)

  2. Interpret the standard deviation (std) for the given data set (text box)

  3. Interpret the mean for the given data set (text box)

  4. Change the labels (25%, 50%, 75%) to (Q1, median, Q3) within the Python code (screenshot/code)

Standards

Computational Thinking in STEM
  • Data Practices
    • Analyzing Data
    • Manipulating Data
    • Visualizing Data
  • Modeling and Simulation Practices
    • Using Computational Models to Understand a Concept
  • Computational Problem Solving Practices
    • Computer Programming

Activities

  • 1. Introduction to Python & Jupyter notebooks
  • 2. Engage with data sets in Python
  • 3. Display a descriptive table with only the count, mean, and standard deviation for the data set
  • 4. Add the correct quartile labels (Q1, Median, Q3) to the 25%, 50%, and 75% rows listed in the table
  • 5. Plot a boxplot of the data set
  • 6. Plot a histogram of the data set

Student Directions and Resources


Students will familiarize themselves with Jupyter notebooks and various Python commands and functions. They will be asked to answer a set of questions based on a table they create and modify.

1. Introduction to Python & Jupyter notebooks


Python is a popular programming language created by Guido van Rossum. We will use it to make plots and perform calculations on various data sets.

We will write our code in Jupyter notebooks, a web-based platform that acts as a diary for Python code.
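
To give a sense of what a notebook entry looks like, here is a minimal sketch of a single cell. The values are made up for illustration and are not part of the lesson's data set. In Jupyter, pressing Shift+Enter runs the cell and shows the output directly below it.

```python
# One notebook cell: a few lines of Python that run together.
from statistics import mean

# Made-up values for illustration only (not the lesson's data set).
values = [2, 4, 6, 8]

# Compute the average and print it below the cell.
average = mean(values)
print("The mean is", average)
```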

For the exercises, once you open the Jupyter notebook and begin exploring your data, refer back to the GIF and video below to see the results for each line of code.

  


2. Engage with data sets in Python


Click on the link below to start the pre-assessment:

https://mybinder.org/v2/gh/CT-STEM/Descriptive-Statistics/1.0

To begin, click on "STD_Part1"

Please note that you will need to have two tabs open on your Chromebook - one for CT-STEM and one for Jupyter notebook.


Question 2.1

Running the code will display a descriptive statistics table for the data set. Take a screenshot of this output and upload the image below. 
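
For reference, the labels used in this lesson (count, mean, std, 25%, 50%, 75%) match the output of the pandas describe() method, so the sketch below assumes the notebook uses pandas. It is not the notebook's own code: the name gas_prices and the price values are made up for illustration, and the notebook loads its own data set.

```python
import pandas as pd

# Hypothetical prices in dollars per gallon, made up for illustration.
# The variable name "gas_prices" is also an assumption.
gas_prices = pd.DataFrame(
    {"Average US Gas Price": [2.10, 2.25, 2.40, 2.55, 2.70, 2.85, 3.00, 3.15]}
)

# describe() builds a table with count, mean, std, min, 25%, 50%, 75%, and max
# for each numeric column.
summary = gas_prices.describe()

# In a notebook, leaving the table name on the last line displays it.
summary
```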

Steps for taking a partial screenshot:

Upload files that are less than 5MB in size.
Upload files to the space allocated by your teacher.


3. Display a descriptive table with only the count, mean, and standard deviation for the data set



Question 3.1

Change the table so that only "count", "mean", and "std" are displayed. Take a screenshot of the new table and upload the image below.
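
One possible approach, sketched under the assumption that the table comes from the pandas describe() method, is to pick rows by their labels with .loc. The data and the names gas_prices and short_summary are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical prices in dollars per gallon, made up for illustration.
gas_prices = pd.DataFrame(
    {"Average US Gas Price": [2.10, 2.25, 2.40, 2.55, 2.70, 2.85, 3.00, 3.15]}
)

# Full descriptive statistics table.
summary = gas_prices.describe()

# Keep only the rows labeled "count", "mean", and "std".
short_summary = summary.loc[["count", "mean", "std"]]
short_summary
```

Passing a list of labels to .loc keeps the rows in the order they are listed, so the same idea works for any subset of the table.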



Question 3.2

Interpret the mean for this data set (Average US Gas Price).



Question 3.3

Interpret the standard deviation for this data set (Average US Gas Price).



4. Add the correct quartile labels (Q1, Median, Q3) to the 25%, 50%, and 75% rows listed in the table



Question 4.1

Change the labels for 25%, 50%, and 75% to the names of those quartiles. Take a screenshot of the updated table and upload the image below.
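
A minimal sketch of one way to relabel the rows, again assuming the table is a pandas DataFrame produced by describe(). The rename() method takes a mapping from old row labels to new ones. The data and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical prices in dollars per gallon, made up for illustration.
gas_prices = pd.DataFrame(
    {"Average US Gas Price": [2.10, 2.25, 2.40, 2.55, 2.70, 2.85, 3.00, 3.15]}
)

summary = gas_prices.describe()

# Map each percentile label to its quartile name; other rows keep their labels.
renamed = summary.rename(index={"25%": "Q1", "50%": "median", "75%": "Q3"})
renamed
```

rename() returns a new table and leaves the original summary unchanged, so the result has to be stored in a variable or displayed directly.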



Question 4.2

What does Q1 represent, in the context of this situation?



Question 4.3

What does the median represent, in the context of this situation?



Question 4.4

What does Q3 represent, in the context of this situation?



5. Plot a boxplot of the data set



Question 5.1

Plot a boxplot of this data set using Python commands and functions. Take a screenshot of the boxplot and upload the image below.
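
As a hedged sketch, a boxplot can be drawn with the pandas boxplot() method, which relies on the matplotlib library. The data, the variable names, and the axis label (including the unit) are assumptions made for illustration, and the notebook may use a different plotting command.

```python
import pandas as pd

# Hypothetical prices in dollars per gallon, made up for illustration.
gas_prices = pd.DataFrame(
    {"Average US Gas Price": [2.10, 2.25, 2.40, 2.55, 2.70, 2.85, 3.00, 3.15]}
)

# Draw a boxplot of the chosen column; pandas returns the matplotlib axes.
ax = gas_prices.boxplot(column="Average US Gas Price")

# Label the vertical axis (the unit is an assumption).
ax.set_ylabel("Price (dollars per gallon)")
```

In a Jupyter notebook the plot appears below the cell once it has run.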



Question 5.2

What does the length of the "box" represent in the boxplot?



Question 5.3

What descriptive statistics are NOT shown on the boxplot?



6. Plot a histogram of the data set



Question 6.1

Plot a histogram of this data set using Python commands and functions. Take a screenshot of the histogram and upload the image below.
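
A comparable sketch for a histogram uses the pandas plot.hist() method on a single column. The data, the variable names, the choice of 5 bins, and the axis label are all assumptions for illustration. Changing the number of bins changes how the shape of the distribution appears.

```python
import pandas as pd

# Hypothetical prices in dollars per gallon, made up for illustration.
gas_prices = pd.DataFrame(
    {"Average US Gas Price": [2.10, 2.25, 2.40, 2.55, 2.70, 2.85, 3.00, 3.15]}
)

# Draw a histogram with 5 equal-width bins; pandas returns the matplotlib axes.
ax = gas_prices["Average US Gas Price"].plot.hist(bins=5)

# Label the horizontal axis (the unit is an assumption).
ax.set_xlabel("Price (dollars per gallon)")
```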



Question 6.2

Describe the distribution (shape, center, spread, outliers). 



Question 6.3

What descriptive statistics are NOT shown on the histogram?