AE 07: Pivoting - Suggested Answers

Application exercise
Important

Go to the course GitHub organization and locate the repo titled ae-07-f23-YOUR_GITHUB_USERNAME to get started.

This AE is due Friday, September 22nd at 11:59pm.

Packages

Pivot Practice

Run the following code below. Are these data in long or wide format? Why?

x <- tibble(
  state = rep(c("MT", "NC" , "SC"),2),
  group = c(rep("C", 3), rep("D", 3)),
  obs = c(1:6)
  )

x
# A tibble: 6 × 3
  state group   obs
  <chr> <chr> <int>
1 MT    C         1
2 NC    C         2
3 SC    C         3
4 MT    D         4
5 NC    D         5
6 SC    D         6

These data are in long format. We see multiple rows from each observational unit state.

Pivot these data so that the data are wide. i.e. Each state should be it’s own unique observation (row). Save this new data set as y.

y <- x |>
  pivot_wider(
    names_from = group,
    values_from = obs
  )

y
# A tibble: 3 × 3
  state     C     D
  <chr> <int> <int>
1 MT        1     4
2 NC        2     5
3 SC        3     6

Now, let’s change it back. Introducing pivot_longer. There are three things we need to consider with pivot_longer:

  • What the columns will be
  • names_to
  • values_to
y |>
  pivot_longer(cols = !state, names_to = "group", values_to = "obs")
# A tibble: 6 × 3
  state group   obs
  <chr> <chr> <int>
1 MT    C         1
2 MT    D         4
3 NC    C         2
4 NC    D         5
5 SC    C         3
6 SC    D         6

Pivot Practice 2

Let’s try this on a real data set.

The Portland Trailblazers are a National Basketball Association (NBA) sports team. These data reflect the points scored by 9 Portland Trailblazers players across the first 10 games of the 2021-2022 NBA season.

trailblazer <- read_csv("data/trailblazer21.csv")
Rows: 9 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): Player
dbl (10): Game1_Home, Game2_Home, Game3_Away, Game4_Home, Game5_Home, Game6_...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

– Take a slice at the data. Are these data in wide or long format?

trailblazer |>
  slice(1:5)
# A tibble: 5 × 11
  Player       Game1_Home Game2_Home Game3_Away Game4_Home Game5_Home Game6_Away
  <chr>             <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
1 Damian Lill…         20         19         12         20         25         14
2 CJ McCollum          24         28         20         25         14         25
3 Norman Powe…         14         16         NA         NA         12         14
4 Robert Covi…          8          6          0          3          9          6
5 Jusuf Nurkic         20          9          4         17         14         13
# ℹ 4 more variables: Game7_Away <dbl>, Game8_Away <dbl>, Game9_Home <dbl>,
#   Game10_Home <dbl>

These data are in wide format. We see unique rows per observational unit player.

– Pivot the data so that you have columns for Player, Game, Points. Save this as a new data set called new.blazer.

new.blazer <- trailblazer |>
  pivot_longer(
    cols = !Player,
    names_to = "Game", 
    values_to = "Points"
  )

new.blazer
# A tibble: 90 × 3
   Player         Game        Points
   <chr>          <chr>        <dbl>
 1 Damian Lillard Game1_Home      20
 2 Damian Lillard Game2_Home      19
 3 Damian Lillard Game3_Away      12
 4 Damian Lillard Game4_Home      20
 5 Damian Lillard Game5_Home      25
 6 Damian Lillard Game6_Away      14
 7 Damian Lillard Game7_Away      20
 8 Damian Lillard Game8_Away      26
 9 Damian Lillard Game9_Home       4
10 Damian Lillard Game10_Home     25
# ℹ 80 more rows

—————————– Answer Below ————————————-

– Suppose now that you are asked to have two separate columns within these data. One column to represent Game, and one to represent Location. Make this happen below. Save your new data set as new.blazer

new.blazer <- trailblazer |>  
  pivot_longer(
    cols = -Player,
    names_to = "Game",
    values_to = "Points",
  ) |> 
  separate(Game , sep = "_", into = c("Game" , "Location"))

– Now, use pivot_wider to reshape the new.blazer data frame such that you have a 90 x 4 tibble with columns Player, Game, Home, Away.

new.blazer |>
  pivot_wider(names_from = Location, 
              values_from = Points)
# A tibble: 90 × 4
   Player         Game    Home  Away
   <chr>          <chr>  <dbl> <dbl>
 1 Damian Lillard Game1     20    NA
 2 Damian Lillard Game2     19    NA
 3 Damian Lillard Game3     NA    12
 4 Damian Lillard Game4     20    NA
 5 Damian Lillard Game5     25    NA
 6 Damian Lillard Game6     NA    14
 7 Damian Lillard Game7     NA    20
 8 Damian Lillard Game8     NA    26
 9 Damian Lillard Game9      4    NA
10 Damian Lillard Game10    25    NA
# ℹ 80 more rows