By the end of this unit, students will be able to display and describe quantitative data sets using Python. Students will compare distributions of data, explain how outliers affect measures of center and spread, and develop a deeper understanding of standard deviation.
In this activity, students are given a scenario and a list of 30 data points, then asked to make decisions based on their prior knowledge of descriptive statistics. Students should use a measure of center (mean, median, mode) and a measure of spread (interquartile range, standard deviation) in their explanation. Students may also choose to provide a plot of the sample data (histogram, stem plot, dot plot, etc.).
Suppose that you own a trucking company. Each month, you budget for fuel expenses but the price of gasoline fluctuates from day to day. You gather a sample of prices per gallon from 30 different gas stations in your area. Your goal is to decide how much to budget for fuel expenses (in terms of price per gallon).
Gas Station (#) | Price per gallon (USD) |
---|---|
1 | 3.21 |
2 | 3.42 |
3 | 3.33 |
4 | 3.41 |
5 | 3.09 |
6 | 3.16 |
7 | 3.17 |
8 | 3.00 |
9 | 3.11 |
10 | 2.98 |
11 | 3.78 |
12 | 3.56 |
13 | 3.67 |
14 | 3.44 |
15 | 3.64 |
16 | 3.50 |
17 | 3.44 |
18 | 3.19 |
19 | 3.25 |
20 | 3.37 |
21 | 3.39 |
22 | 3.41 |
23 | 3.40 |
24 | 3.46 |
25 | 3.39 |
26 | 3.49 |
27 | 3.71 |
28 | 3.26 |
29 | 3.31 |
30 | 3.38 |
How much should you budget for gasoline (per gallon)? Explain your reasoning based on the data set above. Keep in mind that you want to budget enough money for fuel expenses, but not too much, since there are other expenses for your business.
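As a starting point for the explanation, the measures of center and spread for the 30 prices can be computed with Python's standard `statistics` module. This is only a sketch: the variable names are illustrative, and the quartile method used by `statistics.quantiles` may differ slightly from the one on a graphing calculator.

```python
import statistics

# Prices per gallon (USD) copied from the table above, stations 1 through 30.
prices = [
    3.21, 3.42, 3.33, 3.41, 3.09, 3.16, 3.17, 3.00, 3.11, 2.98,
    3.78, 3.56, 3.67, 3.44, 3.64, 3.50, 3.44, 3.19, 3.25, 3.37,
    3.39, 3.41, 3.40, 3.46, 3.39, 3.49, 3.71, 3.26, 3.31, 3.38,
]

mean_price = statistics.mean(prices)
median_price = statistics.median(prices)
stdev_price = statistics.stdev(prices)  # sample standard deviation (divides by n - 1)

# quantiles(n=4) returns the three quartile cut points (Q1, median, Q3).
q1, _, q3 = statistics.quantiles(prices, n=4)
iqr = q3 - q1

print(f"mean   = {mean_price:.3f}")
print(f"median = {median_price:.3f}")
print(f"stdev  = {stdev_price:.3f}")
print(f"IQR    = {iqr:.3f}")
```

The budgeting decision itself still needs a justification in context; the numbers only support it.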
In this lesson:
Students will engage in a pre-assessment using Python and Jupyter notebooks.
Given an output table for a data set, interpret the standard deviation in context.
Example table output:
Plot the table in Python (screenshot/code)
Interpret std for given data set (text box)
Interpret mean for given data set (text box)
Change labels (25%, 50%, 75%) to (Q1, median, Q3) within Python code. (screenshot/code)
Students will familiarize themselves with Jupyter notebook and various Python commands and functions. They will be asked to complete a set of questions based on a table they create and modify.
Python is a popular programming language, created by Guido van Rossum, that we will use to make plots and perform calculations on various data sets.
We will use Jupyter notebooks, a web-based platform that acts as a diary for Python code.
For the exercises, once you open the Jupyter notebook and begin exploring your data, refer back to the GIF and video below to see the result of each line of code.
Click on the link below to start the pre-assessment:
https://mybinder.org/v2/gh/CT-STEM/Descriptive-Statistics/1.0
To begin, click on "STD_Part1"
Please note that you will need to have two tabs open on your Chromebook - one for CT-STEM and one for Jupyter notebook.
Running the code will display a descriptive statistics table for the data set. Take a screenshot of this output and upload the image below.
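For reference, a descriptive statistics table of this kind is typically produced with the pandas `describe()` method. The sketch below is not the notebook's own code: the `price` column name is hypothetical, and the five values are borrowed from the gas station table earlier in this unit as stand-in data.

```python
import pandas as pd

# Hypothetical stand-in data; the notebook loads its own data set instead.
df = pd.DataFrame({"price": [3.21, 3.42, 3.33, 3.41, 3.09]})

# describe() returns count, mean, std, min, 25%, 50%, 75%, and max.
summary = df.describe()
print(summary)
```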
Steps for taking a partial screenshot:
Change the table so that only "count", "mean" and "std" are displayed. Take a screenshot of the new table and upload image below.
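One possible way to keep only some rows of the table is to select them by label with `.loc`. This is a sketch with a hypothetical `price` column and stand-in values, not the notebook's data.

```python
import pandas as pd

# Hypothetical stand-in data for illustration only.
df = pd.DataFrame({"price": [3.21, 3.42, 3.33, 3.41, 3.09]})

# Select rows of the describe() table by their index labels.
subset = df.describe().loc[["count", "mean", "std"]]
print(subset)
```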
Interpret the mean for this data set (Average US Gas Price).
Interpret the standard deviation for this data set (Average US Gas Price).
Change the labels for 25%, 50%, and 75% to the names of those quartiles. Take a screenshot of the updated table and upload image below.
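Row labels in a pandas table can be changed with `rename(index=...)`. The sketch below again assumes a hypothetical `price` column with stand-in values.

```python
import pandas as pd

# Hypothetical stand-in data for illustration only.
df = pd.DataFrame({"price": [3.21, 3.42, 3.33, 3.41, 3.09]})

# Map the percentile labels to quartile names; other rows are unchanged.
renamed = df.describe().rename(index={"25%": "Q1", "50%": "median", "75%": "Q3"})
print(renamed)
```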
What does Q1 represent, in the context of this situation?
What does the median represent, in the context of this situation?
What does Q3 represent, in the context of this situation?
Plot a boxplot of this data set using Python commands and functions. Take a screenshot of the boxplot and upload image below.
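A boxplot can be drawn with matplotlib's `boxplot` function. The sketch below assumes matplotlib is available in the notebook and uses the first ten prices from the gas station table as stand-in data; the axis label and title are illustrative.

```python
import matplotlib.pyplot as plt

# Stand-in data: the first ten prices from the gas station table.
prices = [3.21, 3.42, 3.33, 3.41, 3.09, 3.16, 3.17, 3.00, 3.11, 2.98]

fig, ax = plt.subplots()
box = ax.boxplot(prices)  # returns a dict of the drawn line objects
ax.set_ylabel("Price per gallon (USD)")
ax.set_title("Gas prices (sample)")
plt.show()
```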
What does the length of the "box" represent in the boxplot?
What descriptive statistics are NOT shown on the boxplot?
Plot a histogram of this data set using Python commands and functions. Take a screenshot of the histogram and upload image below.
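A histogram can be drawn in a similar way with matplotlib's `hist` function. As before, this is a sketch with stand-in data (the first ten prices from the gas station table), and the choice of 5 bins is arbitrary.

```python
import matplotlib.pyplot as plt

# Stand-in data: the first ten prices from the gas station table.
prices = [3.21, 3.42, 3.33, 3.41, 3.09, 3.16, 3.17, 3.00, 3.11, 2.98]

fig, ax = plt.subplots()
# hist() returns the bar heights, the bin edges, and the drawn bars.
counts, edges, _ = ax.hist(prices, bins=5, edgecolor="black")
ax.set_xlabel("Price per gallon (USD)")
ax.set_ylabel("Number of stations")
plt.show()
```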
Describe the distribution (shape, center, spread, outliers).
What descriptive statistics are NOT shown on the histogram?
In groups of 3-4, students will be presented with the formula for sample standard deviation. They will write their own pseudo code or procedure for calculating sample standard deviation from a given data set. Students will test out their code and compare their output to their graphing calculator output. If the outputs don't match, they must debug their code until it matches the calculator output. After writing the correct code, students will answer some follow-up questions that dive deeper into standard deviation.
Here is the formula for sample standard deviation, as it appears in our textbook and AP formula sheet:
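In standard notation (assumed here to match the textbook and formula sheet), the sample standard deviation of $n$ values $x_1, \dots, x_n$ with mean $\bar{x}$ is:

$$
s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
$$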
Using what you know about order of operations and previous math courses, look at the formula above and write out a procedure or "pseudo code" that will take a given data set (e.g., 5 numbers) and calculate sample standard deviation.
Use your procedure or "pseudo code" to find the sample standard deviation for this 5 number data set. Write your solution in the textbox below. Be sure to show all steps.
3 7 9 11 15
Grab your graphing calculator and enter the same data set into List 1 (L1). Next, go to STAT --> CALC --> 1-Var Stats to display descriptive statistics. The sample standard deviation will be listed under "Sx". Does this value match your answer to Question 2.1? If it does, move on to question #4. If not, rewrite your procedure or "pseudo code" in the text box below. As a reminder, your procedure must take a list of values and, using the formula, calculate the sample standard deviation. Make sure the output of your procedure matches the calculator output.
3 7 9 11 15
If your procedure was incorrect the first time, please write what went wrong in the text box below.
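For comparison when debugging, one possible translation of the formula into Python is sketched below. It is not the only valid procedure; the function name `sample_std` is made up for this example, and `statistics.stdev` from the standard library is used as an independent check in place of the calculator.

```python
import math
import statistics

def sample_std(values):
    """Sample standard deviation, following the formula step by step."""
    n = len(values)
    mean = sum(values) / n                                   # 1. find the mean
    squared_deviations = [(x - mean) ** 2 for x in values]   # 2. square each deviation
    variance = sum(squared_deviations) / (n - 1)             # 3. sum, then divide by n - 1
    return math.sqrt(variance)                               # 4. take the square root

data = [3, 7, 9, 11, 15]
result = sample_std(data)

print(result)
print(statistics.stdev(data))  # should agree with the line above
```

A common bug is dividing by n instead of n - 1 in step 3, which gives the population standard deviation rather than the sample standard deviation.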
If we added the value "1" to our set, what effect would it have on the sample standard deviation? Why?
(If you're not sure, use your procedure or graphing calculator and find the new standard deviation with this point added.)
If we added the value "20" to our set, what effect would it have on the sample standard deviation? Why?
(If you're not sure, use your procedure or graphing calculator and find the new standard deviation with this point added.)
If we added the value "9" to our set, what effect would it have on the sample standard deviation? Why?
(If you're not sure, use your procedure or graphing calculator and find the new standard deviation with this point added.)
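The three cases can also be checked quickly in Python. The sketch below uses `statistics.stdev` from the standard library instead of a hand-written procedure; the `results` dictionary is just an illustrative way to hold the outputs.

```python
import statistics

base = [3, 7, 9, 11, 15]
base_sd = statistics.stdev(base)
print(f"original set: s = {base_sd:.3f}")

# Add one extra value at a time and recompute the sample standard deviation.
results = {}
for extra in (1, 20, 9):
    results[extra] = statistics.stdev(base + [extra])
    print(f"with {extra:>2} added: s = {results[extra]:.3f}")
```

Comparing each new value's distance from the mean with the change in s is a good way to start the "Why?" part of each question.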
Recall the gas price problem from earlier in this unit. You owned a trucking company and were trying to figure out how much to budget for gasoline (per gallon). For the questions that follow, let's assume that the current mean US gas price is $3.43/gallon with a standard deviation of $0.13/gallon.
What would be an amount ($ per gallon) that would be way too high to budget for gasoline? Justify your answer.
What would be an amount ($ per gallon) that would be way too low to budget for gasoline? Justify your answer.
Make a prediction for the shape of the distribution of US gas prices. Justify your answer.
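One possible way to frame "way too high" and "way too low" is by distance from the mean, measured in standard deviations. The sketch below computes the prices 1, 2, and 3 standard deviations from the stated mean; using 2 or 3 standard deviations as a cutoff is a rule-of-thumb assumption, not something the lesson specifies.

```python
mean_price = 3.43  # USD per gallon, from the scenario
sd_price = 0.13    # USD per gallon, from the scenario

# Price ranges within k standard deviations of the mean (k is an assumed cutoff).
bounds = {}
for k in (1, 2, 3):
    low = round(mean_price - k * sd_price, 2)
    high = round(mean_price + k * sd_price, 2)
    bounds[k] = (low, high)
    print(f"within {k} SD of the mean: ${low:.2f} to ${high:.2f}")
```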
In this lesson, students explore Jupyter notebook and various Python commands and functions. Students will choose a data set, plot histograms, change the bin width, and explore shape and the effect that outliers have on distributions.
In this lesson, you will:
Please note that you will need to have 2 tabs open for this lesson (one for Jupyter notebook and one for CT-STEM)
Open a separate tab and paste this URL into the search bar:
https://mybinder.org/v2/gh/CT-STEM/Descriptive-Statistics/1.0
Then, click on "STD_Part2"
Please note that you will need to have TWO tabs open on your Chromebook - one for CT-STEM and one for Jupyter notebook.
Which data set did you choose to display and analyze? You will enter the name of this .csv file into the Jupyter notebook:
import pandas as pd
ds = pd.read_csv('FILE NAME')  # replace FILE NAME with the name of your .csv file
Write the code necessary to display descriptive statistics (count, mean, standard deviation, etc.). If you have forgotten how to display this information, you may need to open the link from yesterday's pre-assessment (https://mybinder.org/v2/gh/CT-STEM/Descriptive-Statistics/1.0).
Based on the relationship between the mean and median, make a prediction of the shape of this distribution. Explain your reasoning.
Plot the histogram for the data set you chose, take a screenshot on your Chromebook, and upload the image below.
Steps for taking a partial screenshot:
Describe the shape, center, spread and any possible outliers in the distribution (in context).
When you changed the number of bins to fewer than 8, what did you notice about the distribution? Did the shape change? Did you learn more or less about the data set? Write your observations below.
When you changed the number of bins to more than 8, what did you notice about the distribution? Did the shape change? Did you learn more or less about the data set? Write your observations below.
What is an appropriate number of bins for this data set? Why?
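To compare bin counts side by side, the same data can be plotted several times with different `bins` values. The sketch below uses a made-up list of values in place of a column from the chosen .csv file, and the bin counts 4, 8, and 16 are arbitrary choices on either side of 8.

```python
import matplotlib.pyplot as plt

# Hypothetical values standing in for a column of the chosen CSV file.
values = [2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9, 12, 15, 21]

bin_choices = [4, 8, 16]
fig, axes = plt.subplots(1, len(bin_choices), figsize=(12, 3), sharey=True)

all_counts = {}
for ax, bins in zip(axes, bin_choices):
    # Same data each time; only the number of bins changes.
    counts, edges, _ = ax.hist(values, bins=bins, edgecolor="black")
    all_counts[bins] = counts
    ax.set_title(f"{bins} bins")

plt.tight_layout()
plt.show()
```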
Plot the boxplot for your data set, take a screenshot and upload the image below.
What information does the boxplot show that the histogram does not?
What are the pros and cons of boxplots?
Summarize what you did in today's lesson.
What did you like? What did you dislike?
Students will compare distributions for two different data sets of their choosing (height, weight, Instagram followers, etc.).
Students will observe the effect of removing points on measures of center and spread.
In this lesson, you will analyze class data from Jupyter repositories.
Click on this link below: https://mybinder.org/v2/gh/CT-STEM/Descriptive-Statistics/1.0
Then, click on "CT-STEM's repositories", and then click on "Descriptive-Statistics".
Please note that you will need to have 2 tabs open for this lesson (one for Jupyter notebook and one for CT-STEM)
In our Jupyter binder, we have a number of different variables we collected from two classes of high school students:
1. Instagram Followers
2. Daily Phone Use
3. Number of photos on phone
4. Number of hours spent sleeping per day
5. Number of hours spent studying per day
6. Student heights
7. Student weights
You will find each of these stored in a CSV (comma-separated values) file in our notebook. Pick one of the variables to analyze.
Choose a variable you would like to analyze for each class.
Run through the first few lines of code until you display descriptive tables for both class periods. (You may need to look at previous Jupyter notebook files if you forgot the commands for these functions.)
Compare the standard deviations of these distributions. What does this tell us about 1st period versus 2nd period, in context of the variable you chose?
Plot a histogram and boxplot for each class in the Jupyter notebook.
Fill in the table below.
Write a few sentences comparing the distributions, in context (shape, center, spread, outliers).
You will remove 1 data point in each class that is CLOSEST to the mean.
Display descriptive tables for each class' data. What effect did removal of this point have on the mean and standard deviation in each set?
Explain WHY removing a point closest to the mean had the observed effect.
You will remove 1 data point in each class that is FURTHEST from the mean.
Display descriptive tables for each class' data. What effect did removal of this point have on the mean and standard deviation in each set?
Explain WHY removing a point furthest from the mean had the observed effect.
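The same experiment can be tried on a small data set before working with the class data. The sketch below reuses the five-number set from the earlier standard deviation lesson; the helper names are illustrative, and when two points are tied for furthest from the mean, `max` simply picks the first one.

```python
import statistics

data = [3, 7, 9, 11, 15]
mean = statistics.mean(data)
sd = statistics.stdev(data)

# Find the points closest to and furthest from the mean.
closest = min(data, key=lambda x: abs(x - mean))
furthest = max(data, key=lambda x: abs(x - mean))

# Remove one point at a time from a copy of the data.
without_closest = [x for x in data if x != closest]
without_furthest = [x for x in data if x != furthest]

sd_without_closest = statistics.stdev(without_closest)
sd_without_furthest = statistics.stdev(without_furthest)

print(f"all points:               mean = {mean}, s = {sd:.3f}")
print(f"without closest ({closest}):    s = {sd_without_closest:.3f}")
print(f"without furthest ({furthest}):   s = {sd_without_furthest:.3f}")
```

The list comprehensions work here because every value in this set is unique; with repeated values, `list.remove` on a copy would drop just one occurrence.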
Complete the following statement:
Removing a point closest to the mean will _______________ the value of the sample standard deviation.
Complete the following statement:
Removing a point furthest from the mean will ________________ the value of the sample standard deviation.
Complete the following statement:
If the mean is greater than the median, the shape of the distribution is ________________.
Complete the following statement:
If the mean is less than the median, the shape of the distribution is ________________.
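To explore the last two statements, it can help to compare the mean and median of two small data sets, one with a single unusually large value and one with a single unusually small value. Both lists below are made up for illustration.

```python
import statistics

# Hypothetical data sets: one with a high extreme value, one with a low one.
with_high_value = [1, 2, 2, 3, 3, 3, 4, 20]
with_low_value = [1, 17, 18, 18, 19, 19, 19, 20]

summary = {}
for name, values in [("high extreme", with_high_value), ("low extreme", with_low_value)]:
    summary[name] = (statistics.mean(values), statistics.median(values))
    mean, median = summary[name]
    print(f"{name}: mean = {mean}, median = {median}")
```

Plotting a histogram of each list shows which direction the tail stretches in each case.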
Python's common plotting libraries (such as matplotlib) have no built-in function for creating dot plots.
However, counting how often each value occurs, as a histogram does, and then mapping those counts onto a scatterplot solves this problem.
The last few lines of code in this notebook will plot a dot plot for any set of data in our Jupyter notebook, but modifications have to be made to display the plot accurately.
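The general idea can be sketched as follows. This is not the notebook's code: it is a minimal illustration that counts repeated values with `collections.Counter` and stacks one scatter point per observation, using a made-up list of heights.

```python
from collections import Counter

import matplotlib.pyplot as plt

# Hypothetical student heights in inches, for illustration only.
values = [60, 62, 62, 63, 63, 63, 64, 64, 65, 67]

# Count how many times each value occurs.
counts = Counter(values)

# Build one (x, y) point per observation, stacking repeats vertically.
xs, ys = [], []
for value, count in sorted(counts.items()):
    for level in range(1, count + 1):
        xs.append(value)
        ys.append(level)

fig, ax = plt.subplots()
ax.scatter(xs, ys)
ax.set_yticks(range(1, max(ys) + 1))
ax.set_xlabel("Height (inches)")
ax.set_ylabel("Count")
plt.show()
```

This approach works best for whole-number or rounded data; values with many decimal places rarely repeat, so the dots would not stack.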
Choose a variable to analyze (height, weight, Instagram followers, etc) and debug the code so that dotplots are displayed for 1st period and 2nd period.
What changes did you have to make in the Python code?
Copy and paste the code used to create the plot.
Summarize what you did and learned from today's lesson.
What did you like? What did you dislike?