Preparation

First we load the necessary libraries: {cfbfastR} pulls the match up data, {cfbplotR} makes it easy to add the team logos, {ggiraph} adds the interactivity to the ggplot, and the {tidyverse} packages for wrangling the data.

#remotes::install_github(repo = "Kazink36/cfbplotR")
library(cfbplotR)
library(cfbfastR)
library(ggiraph)
library(tidyverse)
options(dplyr.summarise.inform = FALSE)

Now we can pull in the matchup data we want from collegefootballdata.com using the cfbd_game_info() function (this requires an API key, check {cfbfastR}’s documentation for more info). The API only returns one season at a time so we’ll use purrr::map_df to pull in the 2014 to 2021 seasons. We’re only looking at who schedules who so we’ll ignore bowl games. To include bowl games you can add the parameter season_type = "both" to the function.

year_vector = 2021:2021

game_info <- purrr::map_df(
  year_vector,
  function(x){
    Sys.sleep(2)
    cfbfastR::cfbd_game_info(x)
  }
)
## Rows: 6,408
## Columns: 28
## $ game_id            <int> 400548403, 400548402, 400547829, 400547960, 4005480…
## $ season             <int> 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 201…
## $ week               <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ season_type        <chr> "regular", "regular", "regular", "regular", "regula…
## $ start_date         <chr> "2014-08-27T23:00:00.000Z", "2014-08-28T22:00:00.00…
## $ start_time_tbd     <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ neutral_site       <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
## $ conference_game    <lgl> FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
## $ attendance         <int> 10140, 82847, 21003, 0, 9104, 15793, 12398, 45925, …
## $ venue_id           <int> 3495, 3994, 3817, 3953, 3768, 3786, 3764, 587, 3601…
## $ venue              <chr> "Georgia Dome", "Williams-Brice Stadium", "Malone S…
## $ home_id            <int> 2247, 2579, 2433, 135, 2006, 2117, 2459, 254, 166, …
## $ home_team          <chr> "Georgia State", "South Carolina", "Louisiana Monro…
## $ home_conference    <chr> "Sun Belt", "SEC", "Sun Belt", "Big Ten", "Mid-Amer…
## $ home_points        <int> 38, 28, 17, 42, 41, 20, 55, 56, 28, 35, 38, 42, 38,…
## $ home_post_win_prob <chr> "0.7040558473410293", "0.005651845193381177", "0.98…
## $ home_pregame_elo   <int> NA, 1863, 1303, NA, NA, NA, NA, NA, NA, 1625, 1345,…
## $ home_postgame_elo  <int> NA, 1787, 1317, NA, NA, NA, NA, NA, NA, 1690, 1356,…
## $ away_id            <int> 2000, 245, 154, 2197, 47, 236, 2506, 304, 13, 68, 2…
## $ away_team          <chr> "Abilene Christian", "Texas A&M", "Wake Forest", "E…
## $ away_conference    <chr> NA, "SEC", "ACC", NA, NA, NA, NA, NA, NA, "Mountain…
## $ away_points        <int> 37, 52, 10, 20, 0, 16, 3, 14, 10, 13, 31, 10, 41, 1…
## $ away_post_win_prob <chr> "0.29594415265897067", "0.9943481548066189", "0.010…
## $ away_pregame_elo   <int> NA, 1765, 1377, NA, NA, NA, NA, NA, NA, 1653, 1374,…
## $ away_postgame_elo  <int> NA, 1841, 1363, NA, NA, NA, NA, NA, NA, 1588, 1363,…
## $ excitement_index   <chr> "9.07877961509999", "3.5531582091", "6.450991846699…
## $ highlights         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ notes              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…

Data Wrangling

To make the chord diagram, we need to position all of the teams in a circle. So we’ll create a new nodes data frame we can use for reference. First we take all of the distinct teams and conferences in game_info and filter it down to just power 5 conference teams and arrange the teams by conference so that teams in the same conference are next to each other on the circle. We give each team a node id and an angle for positioning on the circle.

nodes <- game_info %>%
  distinct(home_team,home_conference) %>%
  filter(home_team %in% cfbplotR::valid_team_names("P5")) %>%
  arrange(home_conference,home_team) %>%
  transmute(
    label = home_team,
    home_conference,
    id = row_number(),
    node = row_number()
  ) %>%
  mutate(
    theta = 2*pi*row_number()/nrow(.),
    x = cos(theta),
    y = sin(theta)
  )
glimpse(nodes)
## Rows: 66
## Columns: 7
## $ label           <chr> "Boston College", "Clemson", "Duke", "Florida State", …
## $ home_conference <chr> "ACC", "ACC", "ACC", "ACC", "ACC", "ACC", "ACC", "ACC"…
## $ id              <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,…
## $ node            <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,…
## $ theta           <dbl> 0.09519978, 0.19039955, 0.28559933, 0.38079911, 0.4759…
## $ x               <dbl> 0.99547192, 0.98192870, 0.95949297, 0.92836793, 0.8888…
## $ y               <dbl> 0.09505604, 0.18925124, 0.28173256, 0.37166246, 0.4582…

We also need a conf data frame to plot the conference logos. We do this by grouping the nodes data frame by conference and getting the minimum and maximum angles of the conference to plot the conference curve and the average angle to plot the conference logo.

conf <- nodes %>%
  group_by(home_conference) %>%
  summarize(
    theta_min = min(theta),
    theta_max = max(theta),
    theta = mean(theta)
  ) %>%
  ungroup()
glimpse(conf)
## Rows: 6
## Columns: 4
## $ home_conference <chr> "ACC", "Big 12", "Big Ten", "FBS Independents", "Pac-1…
## $ theta_min       <dbl> 0.09519978, 1.42799666, 2.37999443, 3.71279132, 3.9031…
## $ theta_max       <dbl> 1.332797, 2.284795, 3.617592, 3.807991, 4.950388, 6.28…
## $ theta           <dbl> 0.7139983, 1.8563957, 2.9987930, 3.7603912, 4.4267896,…

Now we create an edges data frame that we can use for the curves connecting teams that have played each other. First, we use a slightly modified double_games() function from the {nflseedR} package that will create two rows for each game. We then summarize the win-loss record for each match up and create the text for each tooltip. We then join edges with our nodes data frame to get the source and target nodes.

# Slightly modified from nflseedR
double_games <- function(g){
  g1 <- g %>%
    select(game_id, season,week, away_team, home_team, result, home_points, away_points) %>%
    rename(team = away_team, opp = home_team, points = away_points, opp_points = home_points) %>%
    mutate(result = -1 * result)
  g2 <- g %>%
    select(game_id, season,week, away_team, home_team, result, home_points, away_points) %>%
    rename(team = home_team, opp = away_team, points = home_points, opp_points = away_points)
  g <- bind_rows(g1, g2) %>% 
    mutate(
      outcome = case_when(result > 0 ~ 1, result < 0 ~ 0, result == 0 ~ 0.5, TRUE ~ NA_real_)
    )
  return(g)
}

edges <- game_info %>% 
  mutate(result = home_points - away_points) %>%
  filter(!conference_game) %>%
  double_games() %>%
  filter(team %in% cfbplotR::valid_team_names("P5"),opp %in% cfbplotR::valid_team_names("P5")) %>%
  group_by(team,opp) %>%
  arrange(season) %>%
  summarize(
    n = n(),
    w = sum(outcome),
    l = n-w,
    text = paste0(season,": ",points," - ",opp_points, collapse = "\n")
  ) %>%
  ungroup() %>%
  mutate(tooltip = paste0(team," - ",opp,"\n",w," - ",l,"\n",text)) %>%
  left_join(nodes ,by = c("team" = "label")) %>%
  left_join(nodes, by = c("opp" = "label"),suffix = c("_source","_target")) %>%
  ungroup()

glimpse(edges)
## Rows: 326
## Columns: 19
## $ team                   <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Al…
## $ opp                    <chr> "Duke", "Florida State", "Louisville", "Miami",…
## $ n                      <int> 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 2, 1, 1, 2, 2, 1,…
## $ w                      <dbl> 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 2, 1, 0, 1, 1, 1,…
## $ l                      <dbl> 0, 0, 0, 0, 0, 0, 0, 3, 0, 1, 0, 0, 1, 1, 1, 0,…
## $ text                   <chr> "2019: 42 - 3", "2017: 24 - 7", "2018: 51 - 14"…
## $ tooltip                <chr> "Alabama - Duke\n1 - 0\n2019: 42 - 3", "Alabama…
## $ home_conference_source <chr> "SEC", "SEC", "SEC", "SEC", "SEC", "SEC", "SEC"…
## $ id_source              <int> 53, 53, 53, 53, 53, 53, 53, 41, 41, 42, 42, 42,…
## $ node_source            <int> 53, 53, 53, 53, 53, 53, 53, 41, 41, 42, 42, 42,…
## $ theta_source           <dbl> 5.04558820, 5.04558820, 5.04558820, 5.04558820,…
## $ x_source               <dbl> 0.3270680, 0.3270680, 0.3270680, 0.3270680, 0.3…
## $ y_source               <dbl> -0.94500082, -0.94500082, -0.94500082, -0.94500…
## $ home_conference_target <chr> "ACC", "ACC", "ACC", "ACC", "Pac-12", "Big 12",…
## $ id_target              <int> 3, 4, 6, 7, 49, 24, 38, 39, 23, 39, 30, 40, 65,…
## $ node_target            <int> 3, 4, 6, 7, 49, 24, 38, 39, 23, 39, 30, 40, 65,…
## $ theta_target           <dbl> 0.2855993, 0.3807991, 0.5711987, 0.6663984, 4.6…
## $ x_target               <dbl> 0.95949297, 0.92836793, 0.84125353, 0.78605309,…
## $ y_target               <dbl> 0.28173256, 0.37166246, 0.54064082, 0.61815899,…

Eventually we’d like to use the curve geom in our plot to make the network, but curvature is not an aesthetic so we have to get a little fancy. We’ll end up using an annotate so we need to do all of our size and alpha scaling manually (this may not be necessary if you use geom_curve_interactive() in the next block but I haven’t tested it out). We’ll also calculate the curvature based on the positioning of the source and target nodes: the closer they are to opposite ends of the circle, the smaller the curvature of the connection.

edges <- edges %>%
  mutate(
    curvature = case_when(
      id_source < id_target ~ .95*(theta_target-theta_source-pi)/(2*pi),
      id_source > id_target ~ -.95*(theta_source-theta_target-pi)/(2*pi),
      TRUE ~ 0
    ),
    color = "black",
    alpha = (n-min(n))/(max(n)-min(n))*(.45-.25)+.25,
    size = (n-min(n))/(max(n)-min(n))*(2.5-1)+1,
    curvature = ifelse(abs(curvature) < 0.0001, 0, curvature)  #causing problems when plotting
  )

glimpse(edges)
## Rows: 326
## Columns: 23
## $ team                   <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Al…
## $ opp                    <chr> "Duke", "Florida State", "Louisville", "Miami",…
## $ n                      <int> 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 2, 1, 1, 2, 2, 1,…
## $ w                      <dbl> 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 2, 1, 0, 1, 1, 1,…
## $ l                      <dbl> 0, 0, 0, 0, 0, 0, 0, 3, 0, 1, 0, 0, 1, 1, 1, 0,…
## $ text                   <chr> "2019: 42 - 3", "2017: 24 - 7", "2018: 51 - 14"…
## $ tooltip                <chr> "Alabama - Duke\n1 - 0\n2019: 42 - 3", "Alabama…
## $ home_conference_source <chr> "SEC", "SEC", "SEC", "SEC", "SEC", "SEC", "SEC"…
## $ id_source              <int> 53, 53, 53, 53, 53, 53, 53, 41, 41, 42, 42, 42,…
## $ node_source            <int> 53, 53, 53, 53, 53, 53, 53, 41, 41, 42, 42, 42,…
## $ theta_source           <dbl> 5.04558820, 5.04558820, 5.04558820, 5.04558820,…
## $ x_source               <dbl> 0.3270680, 0.3270680, 0.3270680, 0.3270680, 0.3…
## $ y_source               <dbl> -0.94500082, -0.94500082, -0.94500082, -0.94500…
## $ home_conference_target <chr> "ACC", "ACC", "ACC", "ACC", "Pac-12", "Big 12",…
## $ id_target              <int> 3, 4, 6, 7, 49, 24, 38, 39, 23, 39, 30, 40, 65,…
## $ node_target            <int> 3, 4, 6, 7, 49, 24, 38, 39, 23, 39, 30, 40, 65,…
## $ theta_target           <dbl> 0.2855993, 0.3807991, 0.5711987, 0.6663984, 4.6…
## $ x_target               <dbl> 0.95949297, 0.92836793, 0.84125353, 0.78605309,…
## $ y_target               <dbl> 0.28173256, 0.37166246, 0.54064082, 0.61815899,…
## $ curvature              <dbl> -0.24469697, -0.23030303, -0.20151515, -0.18712…
## $ color                  <chr> "black", "black", "black", "black", "black", "b…
## $ alpha                  <dbl> 0.2500000, 0.2500000, 0.2500000, 0.2500000, 0.2…
## $ size                   <dbl> 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.50,…

Plotting

Curvature isn’t an aesthetic we can use in our geom, but ggplot does allow us to add a list of layers to our plot. So we’ll use the Map function to generate a list of the 562 curves we need to make our connections (this is actually double what is necessary for a standard plot, but in order for the interactivity to be nice when highlighting a team, we need the doubled data).

curve_list <- Map(function(x, y, 
                           xend, yend,
                           size, alpha, 
                           tooltip, data_id,
                           color, curvature) {
  annotate_interactive("curve",
                       x = x, y = y, 
                       xend = xend, yend = yend,
                       size = size, alpha = alpha, 
                       tooltip = tooltip, data_id = data_id, 
                       color = color, curvature = curvature) 
  },
  x = edges$x_source, y = edges$y_source, 
  xend = edges$x_target, yend = edges$y_target,
  size = edges$size, alpha = edges$alpha, 
  tooltip = edges$tooltip, data_id = edges$id_source,
  color = edges$color, curvature = edges$curvature
)

Now we create our plot. Implementing ggiraph interactivity is incredibly easy: replace geom_*() with geom_*_interactive() and you’re good to go. The interactive geoms take two additional important aesthetics: tooltip for the text when you hover over the layer and data_id that groups interactive elements together. We created the tooltips earlier to show the overall record and scores of each game in a given match up and we use the source team as the ID.

g <- ggplot() +
  curve_list +
  geom_point_interactive(data = nodes, aes(x,y,tooltip = label,data_id = id,color = home_conference),size = 2.5) +
  geom_cfb_logos(data = nodes, aes(x = 1.1*cos(theta),y=1.1*sin(theta),team = label), height = 0.04) +
  geom_cfb_logos(data = conf, aes(x = 1.4*cos(theta),y=1.4*sin(theta),team = home_conference), height = 0.08) +
  geom_path(data = nodes,aes(x = 1.2*cos(theta),y = 1.2*sin(theta),color = home_conference),
            size = 1.8) +
  labs(title = "P5 vs P5",
       subtitle = "2014-2021") +
  scale_size_continuous(range = c(1,2.5)) + #unnecessary since we scaled in the data frame
  scale_alpha_continuous(range = c(0.2,.45)) +
  coord_fixed(clip = "off") +
  scale_color_brewer_interactive(palette = "Set2") +
  theme_void() +
  theme(legend.position = "none",
        plot.title = element_text(size = 22, hjust = 0.5, face = "bold",family = "Oswald"),
        plot.subtitle = element_text(size = 18, hjust = 0.5,family = "Oswald"))

Finally, we take the ggplot we created and put it inside the ggiraph::girafe() function to add the interactivity. The function also takes in several options that let you customize different effects using inputs or even your own custom css. Inside opts_tooltip(), we can drop the opacity of the tooltip and offset where it appears next to our mouse. In the custom css, we defined the text and background colors, made it rounded, and center aligned the text. In opts_hover() we can add css that affects the layers with the same data_id as what we hover over, in this case making the stroke red. opts_hover_inv() controls what happens to the layers with a different data_id to what we are hovering over. The girafe_css function lets us write custom css for different geoms. Here we want the opacity of everything interactive except the team nodes to drop to 0.2

chord_graph <- ggiraph::girafe(
  ggobj = g,
  width_svg = 10,
  height_svg = 10,
  bg = "white",
  options = list(
    opts_sizing(rescale = FALSE),
    opts_tooltip(
      opacity = .6,
      css = "background-color:black;color:white;padding:2px;border-radius:2px;text-align:center;",
      offx = 10,
      offy = 10
    ),
    opts_hover(css = "stroke:red;cursor:pointer;opacity:1"),
    opts_hover_inv(css = girafe_css(css = "opacity:0.2;",point = "opacity:1;"))
  )
)

Now we end up with a slick interactive graph showing the out of conference scheduling of power 5 teams in the college football playoff.

chord_graph