Lecture 16
Duke University
STA 199 - Fall 2023
2023-10-24
– Clone ae-15
– HW-5 due on Tuesday
– Project proposal due Wednesday (Nov 1st)
– HW-6 is out now (due the last day of class)
Your team should have a project repo
– This is where you will complete all components of your project
– The project will be hosted on a website
– Turned in via GitHub
– Feedback will be given within GitHub
– Each person in your group should contribute to the project
– Group feedback survey for lab leaders / TAs
– Reach out to instructor / lab leader if an individual is hard to communicate with, and we will remind the entire group of project expectations
– Project work day attendance and GitHub commits will be used to adjust individual grades if there is an ongoing issue
Own the whole project instead of saying “that part wasn’t mine”
Research Question
– Can you answer it with your data?
Introduction
– Do not copy and paste the dataset description given on the website
– Do some digging into the data and where it came from
– Write a brief description of the observations.
Address ethical concerns about the data, if any.
Put in enough effort that there is something substantive to give feedback on
A Statistics Experience: The goal of the statistics experience assignment is to help you engage with the statistics and data science communities outside of the classroom.
– Can be found in the last row of our schedule
– No GitHub repo for HW-6
– Attend a talk or conference
– Talk with a statistician or data scientist (the instructor and TAs do not count)
– Listen to a podcast or watch a video
– Participate in a data science competition or challenge
– Read a book on statistics/data science
– TidyTuesday challenges
– Coding out loud project
– Are the intercepts different?
– Does the relationship between body mass and flipper length change based on which island the penguin is on?
– An interaction model allows the slope of one covariate to differ based on the value of another covariate
– The interaction model is the more complicated model
– Describe evidence for an interaction model
– Fit an interaction model in R
– Model Selection
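The lecture fits these models in R. As a language-neutral sketch of the same idea, the Python below fits an additive and an interaction model by ordinary least squares with numpy, on simulated data that only mimics the penguin setup: the slopes, intercepts, and noise level are made up, not estimates from the real penguins data. The additive model lets island shift only the intercept; the interaction model adds flipper × island columns so each island gets its own slope.

```python
import numpy as np

# Simulated stand-in for the penguins data (NOT the real measurements):
# flipper length (mm), an island label, and body mass (g).
rng = np.random.default_rng(199)
n_per_group = 50
islands = np.repeat(["Biscoe", "Dream", "Torgersen"], n_per_group)
flipper = rng.uniform(175, 230, size=islands.size)

# Hypothetical true slopes/intercepts that differ by island, so an interaction exists.
true_slope = {"Biscoe": 55.0, "Dream": 35.0, "Torgersen": 35.0}
true_intercept = {"Biscoe": -6500.0, "Dream": -2900.0, "Torgersen": -2800.0}
mass = np.array([true_intercept[g] + true_slope[g] * f for g, f in zip(islands, flipper)])
mass = mass + rng.normal(0, 300, size=mass.size)  # random noise around each line

# 0/1 indicator (dummy) variables, with Biscoe as the baseline level.
dream = (islands == "Dream").astype(float)
torg = (islands == "Torgersen").astype(float)
ones = np.ones_like(flipper)

# Additive model: one common slope; island only shifts the intercept.
X_add = np.column_stack([ones, flipper, dream, torg])
# Interaction model: extra flipper x island columns let the slopes differ too.
X_int = np.column_stack([ones, flipper, dream, torg, flipper * dream, flipper * torg])

def fit(X, y):
    """Ordinary least squares coefficients via numpy's lstsq."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

b_add = fit(X_add, mass)
b_int = fit(X_int, mass)

# Flipper-length slope for each island implied by the interaction model.
slopes = {
    "Biscoe": b_int[1],
    "Dream": b_int[1] + b_int[4],
    "Torgersen": b_int[1] + b_int[5],
}
print("additive common slope:", round(b_add[1], 1))
for island, s in slopes.items():
    print(f"interaction slope ({island}):", round(s, 1))
```

If the island-specific slopes from the interaction fit are close to one another, that suggests the simpler additive model may be enough; clearly different slopes are evidence for the interaction model.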
We have fit many models to analyze the body mass of penguins. Let’s go over strategies for deciding which model is “the best.”
In philosophy, Occam’s razor is the problem-solving principle that recommends searching for explanations constructed with the smallest possible set of elements. It is also known as the principle of parsimony.
The best model is not always the most complicated. Ways to compare models:
– R-squared (and why we shouldn’t use it to compare models)
– Adjusted R-squared
– AIC (next time)
– Stepwise selection (next time)
\(R^2\) tells us the proportion of variability in the response that our model explains.
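– What is adjusted R-squared?
– How is it different from R-squared?
– It does not have the same interpretation as R-squared
– It is generally interpreted as a measure of the strength of model fit that accounts for the number of predictors
– When comparing models, look for the higher adjusted R-squared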
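A minimal sketch of the standard formulas, written in Python even though the course uses R: \(R^2 = 1 - SS_{res}/SS_{tot}\) and adjusted \(R^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}\), where \(n\) is the number of observations and \(p\) is the number of predictors, not counting the intercept. The helper names `r_squared` and `adjusted_r_squared` are hypothetical, not from any library, and the toy numbers are illustrative only.

```python
import numpy as np

def r_squared(y, y_hat):
    """Proportion of the variability in y explained by the fitted values y_hat."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)      # residual (unexplained) sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(y, y_hat, p):
    """R^2 penalized for the number of predictors p (intercept not counted)."""
    n = len(y)
    return 1.0 - (1.0 - r_squared(y, y_hat)) * (n - 1) / (n - p - 1)

# Toy example: five observations and fitted values from a one-predictor model.
y = [1, 2, 3, 4, 5]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]
print(r_squared(y, y_hat))               # about 0.99
print(adjusted_r_squared(y, y_hat, p=1)) # about 0.987
```

The factor \(\frac{n-1}{n-p-1}\) grows with \(p\), so adjusted \(R^2\) only rises when a new predictor reduces unexplained variability by more than the penalty for adding it.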
We want our covariates to do a good job of modeling our response \(y\). Is the goal for \(R^2\) to equal 1? Is the goal to have a perfectly fitting model?
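One way to see why \(R^2 = 1\) is not the goal: in nested least-squares models, \(R^2\) never decreases when predictors are added, even predictors that are pure noise. The simulation below is a sketch on made-up data (one real predictor `x`, plus up to 20 random columns unrelated to `y`; the seed and sizes are arbitrary choices). \(R^2\) creeps upward as noise is added, while adjusted \(R^2\) carries a penalty and does not automatically rise.

```python
import numpy as np

rng = np.random.default_rng(2023)            # arbitrary seed for reproducibility
n = 40
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)       # simulated: only x is truly related to y
noise_all = rng.normal(size=(n, 20))         # 20 predictors unrelated to y

def fit_r2(X, y):
    """Fit OLS with an intercept; return (R^2, adjusted R^2)."""
    n_obs = len(y)
    X1 = np.column_stack([np.ones(n_obs), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    y_hat = X1 @ beta
    r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    p = X1.shape[1] - 1                      # number of predictors, excluding intercept
    adj = 1 - (1 - r2) * (n_obs - 1) / (n_obs - p - 1)
    return r2, adj

results = []
for k in range(0, 21, 5):
    # Nested models: x plus the first k noise columns.
    X = np.column_stack([x, noise_all[:, :k]])
    r2, adj = fit_r2(X, y)
    results.append((k, r2, adj))
    print(f"{k:2d} noise predictors: R^2 = {r2:.3f}, adjusted R^2 = {adj:.3f}")
```

The growing gap between the two columns is the penalty for spending degrees of freedom on predictors that explain nothing real.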
– SLR (simple linear regression)
– MLR (multiple linear regression), additive case
– MLR, interaction case
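Written out for the penguin example, the three model forms look like the following. This is a sketch that assumes body mass as the response, flipper length as the numeric predictor, and island coded as 0/1 indicators (Dream, Torgersen) with Biscoe as the baseline.

\[
\begin{aligned}
\text{SLR:} \quad & \widehat{\text{mass}} = \beta_0 + \beta_1\,\text{flipper} \\
\text{MLR, additive:} \quad & \widehat{\text{mass}} = \beta_0 + \beta_1\,\text{flipper} + \beta_2\,\text{Dream} + \beta_3\,\text{Torgersen} \\
\text{MLR, interaction:} \quad & \widehat{\text{mass}} = \beta_0 + \beta_1\,\text{flipper} + \beta_2\,\text{Dream} + \beta_3\,\text{Torgersen} \\
& \qquad + \beta_4\,(\text{flipper} \times \text{Dream}) + \beta_5\,(\text{flipper} \times \text{Torgersen})
\end{aligned}
\]

In the additive model every island shares the slope \(\beta_1\) and only the intercepts differ. In the interaction model the slope for Dream is \(\beta_1 + \beta_4\) and the slope for Torgersen is \(\beta_1 + \beta_5\).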
What can we use these models for?