Choose a variable you would like to analyze for each class.
Students will compare distributions for two different data sets of their choosing (height, weight, Instagram followers, etc.).
Students will observe the effect of removing points on measures of center and spread.
In this lesson, you will analyze class data from a Jupyter notebook repository.
Click on the link below: https://mybinder.org/v2/gh/CT-STEM/Descriptive-Statistics/1.0
Then click on "CT-STEM's repositories", and then click on "Descriptive-Statistics".
Please note that you will need to have 2 tabs open for this lesson (one for the Jupyter notebook and one for CT-STEM).
In our Jupyter binder, we have a number of different variables we collected from two classes of high school students:
1. Instagram Followers
2. Daily Phone Use
3. Number of photos on phone
4. Number of hours spent sleeping per day
5. Number of hours spent studying per day
6. Student heights
7. Student weights
Each of these variables is stored in a CSV (comma-separated values) file in our notebook. Pick one of the variables to analyze.
Run through the first few lines of code until you display descriptive tables for both class periods. (You may need to look at previous Jupyter notebook files if you forgot the commands for these functions.)
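If you need a reminder of what a descriptive table looks like, here is a minimal sketch using pandas. It is not the notebook's own code: the column name "height" and the numbers are made up for illustration, and the two in-memory strings stand in for the real class CSV files, whose names are not shown here.

```python
import io

import pandas as pd

# Made-up heights (in inches) standing in for the real class CSV files.
period1_csv = io.StringIO("height\n62\n64\n65\n66\n67\n68\n70\n74\n")
period2_csv = io.StringIO("height\n60\n63\n66\n66\n67\n69\n71\n72\n")

# read_csv accepts a file path or a file-like object such as StringIO.
period1 = pd.read_csv(period1_csv)
period2 = pd.read_csv(period2_csv)

# describe() reports count, mean, sample standard deviation (std),
# minimum, quartiles, and maximum for a numeric column.
table1 = period1["height"].describe()
table2 = period2["height"].describe()

print("1st period")
print(table1)
print("2nd period")
print(table2)
```

With the real data, you would pass the CSV file name to read_csv in place of the StringIO objects.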
Compare the standard deviations of these distributions. What does this tell us about 1st period versus 2nd period, in context of the variable you chose?
Plot a histogram and a boxplot for each class in the Jupyter notebook.
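One possible way to draw these plots is sketched below with Matplotlib. The data values, the number of bins, and the 2-by-2 layout are assumptions chosen for illustration; the notebook may use different commands.

```python
import matplotlib.pyplot as plt

# Made-up heights (in inches) for illustration only.
period1 = [62, 64, 65, 66, 67, 68, 70, 74]
period2 = [60, 63, 66, 66, 67, 69, 71, 72]

# One row per class: histogram on the left, boxplot on the right.
fig, axes = plt.subplots(2, 2, figsize=(8, 6))
classes = [("1st period", period1), ("2nd period", period2)]

for row, (label, data) in enumerate(classes):
    axes[row, 0].hist(data, bins=5, edgecolor="black")
    axes[row, 0].set_title(f"{label}: histogram")
    axes[row, 1].boxplot(data)
    axes[row, 1].set_title(f"{label}: boxplot")

fig.tight_layout()
# In a Jupyter notebook the figure is displayed when the cell finishes.
```

Placing both classes in one figure makes it easier to compare shape, center, and spread side by side.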
Fill in the table below.
Write a few sentences comparing the distributions, in context (shape, center, spread, outliers).
You will remove 1 data point in each class that is CLOSEST to the mean.
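A minimal sketch of one way to find and remove that point with pandas is shown below. The heights are made up, and the variable names are hypothetical rather than taken from the notebook.

```python
import pandas as pd

# Made-up heights (in inches) for one class.
heights = pd.Series([62, 64, 65, 66, 67, 68, 70, 74], name="height")

# Distance of every point from the mean.
distance = (heights - heights.mean()).abs()

# idxmin() returns the index label of the smallest distance
# (the first one, if several points are tied).
closest_label = distance.idxmin()
print("Removing value:", heights[closest_label])

# drop() returns a new Series without that point; the original is unchanged.
trimmed = heights.drop(closest_label)

print(heights.describe())
print(trimmed.describe())
```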
Display descriptive tables for each class's data. What effect did removing this point have on the mean and standard deviation in each set?
Explain WHY removing a point closest to the mean had the observed effect.
You will remove 1 data point in each class that is FURTHEST from the mean.
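The same idea can be wrapped in a small helper and applied to both classes, as sketched below. The function name drop_furthest_from_mean and the data are hypothetical, chosen only for illustration.

```python
import pandas as pd


def drop_furthest_from_mean(values):
    """Return a copy of the Series without the point furthest from its mean."""
    distance = (values - values.mean()).abs()
    # idxmax() gives the index label of the largest distance
    # (the first one, if several points are tied).
    return values.drop(distance.idxmax())


# Made-up heights (in inches) for illustration only.
period1 = pd.Series([62, 64, 65, 66, 67, 68, 70, 74], name="height")
period2 = pd.Series([60, 63, 66, 66, 67, 69, 71, 72], name="height")

for label, values in [("1st period", period1), ("2nd period", period2)]:
    trimmed = drop_furthest_from_mean(values)
    # Put the before and after tables side by side for comparison.
    comparison = pd.DataFrame(
        {"before": values.describe(), "after": trimmed.describe()}
    )
    print(label)
    print(comparison)
```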
Display descriptive tables for each class's data. What effect did removing this point have on the mean and standard deviation in each set?
Explain WHY removing a point furthest from the mean had the observed effect.
Complete the following statement:
Removing a point closest to the mean will ________________ the value of the sample standard deviation.
Complete the following statement:
Removing a point furthest from the mean will ________________ the value of the sample standard deviation.
Complete the following statement:
If the mean is greater than the median, the shape of the distribution is ________________.
Complete the following statement:
If the mean is less than the median, the shape of the distribution is ________________.
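Python's common plotting libraries, such as Matplotlib, do not include a built-in dot plot function.
However, you can build one by counting how often each value occurs (as a histogram does) and then plotting those counts as stacked points on a scatterplot.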
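The counting-then-scatter idea can be sketched as follows. This is an illustration under assumed data, not the notebook's code, which you will still need to debug yourself; the variable names are hypothetical.

```python
from collections import Counter

import matplotlib.pyplot as plt

# Made-up heights (in inches) with some repeated values.
data = [62, 64, 65, 66, 66, 67, 67, 67, 68, 70]

# Step 1: count how many times each value occurs.
counts = Counter(data)

# Step 2: give each occurrence its own point, stacked at heights 1, 2, 3, ...
xs, ys = [], []
for value, count in sorted(counts.items()):
    for level in range(1, count + 1):
        xs.append(value)
        ys.append(level)

# Step 3: draw the stacked points as a scatterplot.
fig, ax = plt.subplots()
ax.scatter(xs, ys)
ax.set_yticks(range(1, max(ys) + 1))
ax.set_xlabel("Height (inches)")
ax.set_ylabel("Count")
ax.set_title("Dot plot")
# In a Jupyter notebook the figure is displayed when the cell finishes.
```

Each value gets one dot per occurrence, so the height of a stack matches the bar a histogram with one bin per value would show.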
The last few lines of code in this notebook will plot a dot plot for any set of data in our Jupyter notebook, but modifications have to be made for the plot to display accurately.
Choose a variable to analyze (height, weight, Instagram followers, etc.) and debug the code so that dot plots are displayed for 1st period and 2nd period.
What changes did you have to make in the Python code?
Copy and paste the code used to create the plot.
Summarize what you did and learned from today's lesson.
What did you like? What did you dislike?