Before we fit the model, let’s visualize it. Recreate the plot from the slides that shows the interaction model between body mass (y), flipper length (x1) and island (x2).
penguins |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g, color = island)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
An interaction occurs when the relationship between x and y depends on the values or levels of a third variable z.
Is there evidence of an interaction? Why or why not?
Yes! The relationship between body mass and flipper length differs across levels of island: the fitted lines have different slopes.
Fit the interaction model that models body mass by island and flipper length. Name this model peng_int. Display the summary output. Next, write out the estimated model using proper notation.
peng_int <- linear_reg() |>
  set_engine("lm") |>
  fit(body_mass_g ~ flipper_length_mm * island, data = penguins)

tidy(peng_int)
Now, let’s simplify the equation to look only at penguins on Biscoe island. What about Dream island?
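It can help to see which terms the `*` operator in the formula creates. The sketch below uses a small made-up data frame (`toy`, with hypothetical values that are not taken from the penguins data) and base R's `model.matrix()` to list the design-matrix columns. It suggests that `flipper * island` is shorthand for both main effects plus their interaction.

```r
# Hypothetical toy data, used only to inspect the design-matrix columns.
toy <- data.frame(
  flipper = c(180, 190, 200, 210, 220, 230),
  island = factor(c("Biscoe", "Dream", "Torgersen", "Biscoe", "Dream", "Torgersen"))
)

# Column names produced by the shorthand and by the spelled-out formula.
cols_star <- colnames(model.matrix(~ flipper * island, data = toy))
cols_long <- colnames(model.matrix(~ flipper + island + flipper:island, data = toy))

print(cols_star)

# TRUE when both formulas create the same terms.
print(identical(cols_star, cols_long))
```

Each non-baseline island gets one indicator column (an intercept shift) and one `flipper:island` column (a slope shift), which is why the tidy output has six rows for three islands.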
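Biscoe (the baseline island):
$$
\widehat{\text{body mass}} = -5464 + 48.5 \times \text{flipper}
$$
Dream:
$$
\begin{aligned}
\widehat{\text{body mass}} &= -5464 + 48.5 \times \text{flipper} + 3551 \times 1 - 19.4 \times \text{flipper} \times 1 \\
&= (-5464 + 3551) + (48.5 - 19.4) \times \text{flipper} \\
&= -1913 + 29.1 \times \text{flipper}
\end{aligned}
$$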
You can then compare intercepts and slope coefficients across levels. What do these two lines tell you about the relationship between flipper length and body mass across these two levels?
When flipper length is 0, we estimate the mean body mass of penguins on Biscoe island to be smaller than that of penguins on Dream island (intercepts of -5464 versus -1913). However, we estimate that mean body mass increases more quickly with flipper length for Biscoe penguins than for Dream penguins (slopes of 48.5 versus 29.1 grams per millimeter).
Note: We don’t interpret the interaction terms from the full model directly. Instead, we usually simplify the equation for each level and compare across levels, as above!
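The simplification above can also be done in code. This is a minimal sketch that hard-codes the rounded estimates quoted in the worked equations (Biscoe as the baseline). The vector `coefs` and its element names are illustrative assumptions, not output from `tidy(peng_int)`.

```r
# Rounded estimates copied from the worked equations; names are illustrative.
coefs <- c(intercept = -5464, flipper = 48.5, dream = 3551, flipper_dream = -19.4)

# Baseline level (Biscoe): intercept and slope are used as-is.
biscoe <- c(intercept = coefs[["intercept"]], slope = coefs[["flipper"]])

# Dream: add the island shift to the intercept and the interaction to the slope.
dream <- c(
  intercept = coefs[["intercept"]] + coefs[["dream"]],
  slope = coefs[["flipper"]] + coefs[["flipper_dream"]]
)

print(biscoe)
print(dream)

# Estimated mean body mass (g) at a flipper length of 200 mm on each island.
pred_biscoe <- biscoe[["intercept"]] + biscoe[["slope"]] * 200
pred_dream <- dream[["intercept"]] + dream[["slope"]] * 200
print(c(Biscoe = pred_biscoe, Dream = pred_dream))
```

Because these are rounded coefficients, the results will differ slightly from what `predict()` would return on the fitted model.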
Now, suppose you want to change the baseline of this model. Let’s practice this by changing the baseline to Dream. You can do this a few ways, including with fct_relevel or factor.
penguins2 <- penguins |>
  mutate(island = factor(island, levels = c("Dream", "Biscoe", "Torgersen")))

linear_reg() |>
  set_engine("lm") |>
  fit(body_mass_g ~ flipper_length_mm * island, data = penguins2) |>
  tidy()
model_1 <- linear_reg() |>
  set_engine("lm") |>
  fit(body_mass_g ~ island, data = penguins)

set.seed(33)
penguins <- penguins |>
  mutate(random_numbers = rnorm(nrow(penguins), 1, 10))

model_2 <- linear_reg() |>
  set_engine("lm") |>
  fit(body_mass_g ~ island * random_numbers, data = penguins)

glance(model_1)$r.squared
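For comparison, here is a sketch of the `fct_relevel` route mentioned above, alongside base R's `relevel()`. It uses a small stand-alone factor (`island`, a hypothetical stand-in for the column in the data) so that the level order is easy to inspect.

```r
library(forcats)

# Stand-in for the island column; default levels are alphabetical.
island <- factor(c("Biscoe", "Dream", "Torgersen", "Dream"))
print(levels(island))

# fct_relevel() moves the named level to the front, making it the baseline.
dream_first <- fct_relevel(island, "Dream")
print(levels(dream_first))

# Base R alternative: relevel() sets the reference level directly.
dream_first_base <- relevel(island, ref = "Dream")
print(levels(dream_first_base))
```

Inside a pipeline, the same idea would be `mutate(island = fct_relevel(island, "Dream"))`. Only the first level matters for the baseline, so listing every level, as `factor(..., levels = ...)` requires, is optional with this approach.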
[1] 0.3935772
glance(model_2)$r.squared
[1] 0.3952714
Did R-squared increase or decrease from model 1 to model 2? Why?
It went up! R-squared never decreases (and almost always increases) when we add variables to our model, even if the variables have nothing to do with the response.
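The same behavior can be reproduced on simulated data. The sketch below uses base R's `lm()` rather than the tidymodels workflow, and every variable in it (`x`, `y`, `noise`) is made up for illustration. The `noise` predictor is generated independently of `y`, so any gain in R-squared comes purely from adding a term.

```r
set.seed(33)
n <- 100

# Simulated data: y depends on x; noise is unrelated to y by construction.
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)
noise <- rnorm(n)

fit_small <- lm(y ~ x)
fit_big <- lm(y ~ x + noise)

r2_small <- summary(fit_small)$r.squared
r2_big <- summary(fit_big)$r.squared

print(c(small = r2_small, big = r2_big))

# The larger model's R-squared is at least as large as the smaller model's.
print(r2_big >= r2_small)
```

Least squares can always set the new coefficient to zero and recover the smaller model's fit, so the larger nested model can never fit the training data worse.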