AE 07: Pivoting - Suggested Answers
Go to the course GitHub organization and locate the repo titled ae-07-f23-YOUR_GITHUB_USERNAME
to get started.
This AE is due Friday, September 22nd at 11:59pm.
Packages
Pivot Practice
Run the following code below. Are these data in long or wide format? Why?
x <- tibble(
state = rep(c("MT", "NC" , "SC"),2),
group = c(rep("C", 3), rep("D", 3)),
obs = c(1:6)
)
x
# A tibble: 6 × 3
state group obs
<chr> <chr> <int>
1 MT C 1
2 NC C 2
3 SC C 3
4 MT D 4
5 NC D 5
6 SC D 6
These data are in long format. We see multiple rows from each observational unit state.
Pivot these data so that the data are wide. i.e. Each state should be it’s own unique observation (row). Save this new data set as y
.
y <- x |>
pivot_wider(
names_from = group,
values_from = obs
)
y
# A tibble: 3 × 3
state C D
<chr> <int> <int>
1 MT 1 4
2 NC 2 5
3 SC 3 6
Now, let’s change it back. Introducing pivot_longer
. There are three things we need to consider with pivot_longer
:
- What the columns will be
names_to
values_to
y |>
pivot_longer(cols = !state, names_to = "group", values_to = "obs")
# A tibble: 6 × 3
state group obs
<chr> <chr> <int>
1 MT C 1
2 MT D 4
3 NC C 2
4 NC D 5
5 SC C 3
6 SC D 6
Pivot Practice 2
Let’s try this on a real data set.
The Portland Trailblazers are a National Basketball Association (NBA) sports team. These data reflect the points scored by 9 Portland Trailblazers players across the first 10 games of the 2021-2022 NBA season.
trailblazer <- read_csv("data/trailblazer21.csv")
Rows: 9 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Player
dbl (10): Game1_Home, Game2_Home, Game3_Away, Game4_Home, Game5_Home, Game6_...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
– Take a slice at the data. Are these data in wide or long format?
trailblazer |>
slice(1:5)
# A tibble: 5 × 11
Player Game1_Home Game2_Home Game3_Away Game4_Home Game5_Home Game6_Away
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Damian Lill… 20 19 12 20 25 14
2 CJ McCollum 24 28 20 25 14 25
3 Norman Powe… 14 16 NA NA 12 14
4 Robert Covi… 8 6 0 3 9 6
5 Jusuf Nurkic 20 9 4 17 14 13
# ℹ 4 more variables: Game7_Away <dbl>, Game8_Away <dbl>, Game9_Home <dbl>,
# Game10_Home <dbl>
These data are in wide format. We see unique rows per observational unit player.
– Pivot the data so that you have columns for Player
, Game
, Points
. Save this as a new data set called new.blazer
.
new.blazer <- trailblazer |>
pivot_longer(
cols = !Player,
names_to = "Game",
values_to = "Points"
)
new.blazer
# A tibble: 90 × 3
Player Game Points
<chr> <chr> <dbl>
1 Damian Lillard Game1_Home 20
2 Damian Lillard Game2_Home 19
3 Damian Lillard Game3_Away 12
4 Damian Lillard Game4_Home 20
5 Damian Lillard Game5_Home 25
6 Damian Lillard Game6_Away 14
7 Damian Lillard Game7_Away 20
8 Damian Lillard Game8_Away 26
9 Damian Lillard Game9_Home 4
10 Damian Lillard Game10_Home 25
# ℹ 80 more rows
—————————– Answer Below ————————————-
– Suppose now that you are asked to have two separate columns within these data. One column to represent Game
, and one to represent Location
. Make this happen below. Save your new data set as new.blazer
new.blazer <- trailblazer |>
pivot_longer(
cols = -Player,
names_to = "Game",
values_to = "Points",
) |>
separate(Game , sep = "_", into = c("Game" , "Location"))
– Now, use pivot_wider
to reshape the new.blazer
data frame such that you have a 90 x 4 tibble with columns Player
, Game
, Home
, Away
.
new.blazer |>
pivot_wider(names_from = Location,
values_from = Points)
# A tibble: 90 × 4
Player Game Home Away
<chr> <chr> <dbl> <dbl>
1 Damian Lillard Game1 20 NA
2 Damian Lillard Game2 19 NA
3 Damian Lillard Game3 NA 12
4 Damian Lillard Game4 20 NA
5 Damian Lillard Game5 25 NA
6 Damian Lillard Game6 NA 14
7 Damian Lillard Game7 NA 20
8 Damian Lillard Game8 NA 26
9 Damian Lillard Game9 4 NA
10 Damian Lillard Game10 25 NA
# ℹ 80 more rows