First we load the necessary libraries: {cfbfastR} pulls the match up data, {cfbplotR} makes it easy to add the team logos, {ggiraph} adds the interactivity to the ggplot, and the {tidyverse} packages for wrangling the data.
#remotes::install_github(repo = "Kazink36/cfbplotR")
library(cfbplotR)
library(cfbfastR)
library(ggiraph)
library(tidyverse)
options(dplyr.summarise.inform = FALSE)
Now we can pull in the matchup data we want from collegefootballdata.com using the
cfbd_game_info()
function (this requires an API key, check
{cfbfastR}’s documentation for more info). The API only returns one
season at a time so we’ll use purrr::map_df
to pull in the
2014 to 2021 seasons. We’re only looking at who schedules who so we’ll
ignore bowl games. To include bowl games you can add the parameter
season_type = "both"
to the function.
year_vector = 2021:2021
game_info <- purrr::map_df(
year_vector,
function(x){
Sys.sleep(2)
cfbfastR::cfbd_game_info(x)
}
)
## Rows: 6,408
## Columns: 28
## $ game_id <int> 400548403, 400548402, 400547829, 400547960, 4005480…
## $ season <int> 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 201…
## $ week <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ season_type <chr> "regular", "regular", "regular", "regular", "regula…
## $ start_date <chr> "2014-08-27T23:00:00.000Z", "2014-08-28T22:00:00.00…
## $ start_time_tbd <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ neutral_site <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
## $ conference_game <lgl> FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
## $ attendance <int> 10140, 82847, 21003, 0, 9104, 15793, 12398, 45925, …
## $ venue_id <int> 3495, 3994, 3817, 3953, 3768, 3786, 3764, 587, 3601…
## $ venue <chr> "Georgia Dome", "Williams-Brice Stadium", "Malone S…
## $ home_id <int> 2247, 2579, 2433, 135, 2006, 2117, 2459, 254, 166, …
## $ home_team <chr> "Georgia State", "South Carolina", "Louisiana Monro…
## $ home_conference <chr> "Sun Belt", "SEC", "Sun Belt", "Big Ten", "Mid-Amer…
## $ home_points <int> 38, 28, 17, 42, 41, 20, 55, 56, 28, 35, 38, 42, 38,…
## $ home_post_win_prob <chr> "0.7040558473410293", "0.005651845193381177", "0.98…
## $ home_pregame_elo <int> NA, 1863, 1303, NA, NA, NA, NA, NA, NA, 1625, 1345,…
## $ home_postgame_elo <int> NA, 1787, 1317, NA, NA, NA, NA, NA, NA, 1690, 1356,…
## $ away_id <int> 2000, 245, 154, 2197, 47, 236, 2506, 304, 13, 68, 2…
## $ away_team <chr> "Abilene Christian", "Texas A&M", "Wake Forest", "E…
## $ away_conference <chr> NA, "SEC", "ACC", NA, NA, NA, NA, NA, NA, "Mountain…
## $ away_points <int> 37, 52, 10, 20, 0, 16, 3, 14, 10, 13, 31, 10, 41, 1…
## $ away_post_win_prob <chr> "0.29594415265897067", "0.9943481548066189", "0.010…
## $ away_pregame_elo <int> NA, 1765, 1377, NA, NA, NA, NA, NA, NA, 1653, 1374,…
## $ away_postgame_elo <int> NA, 1841, 1363, NA, NA, NA, NA, NA, NA, 1588, 1363,…
## $ excitement_index <chr> "9.07877961509999", "3.5531582091", "6.450991846699…
## $ highlights <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ notes <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
To make the chord diagram, we need to position all of the teams in a
circle. So we’ll create a new nodes
data frame we can use
for reference. First we take all of the distinct teams and conferences
in game_info
and filter it down to just power 5 conference
teams and arrange the teams by conference so that teams in the same
conference are next to each other on the circle. We give each team a
node id and an angle for positioning on the circle.
nodes <- game_info %>%
distinct(home_team,home_conference) %>%
filter(home_team %in% cfbplotR::valid_team_names("P5")) %>%
arrange(home_conference,home_team) %>%
transmute(
label = home_team,
home_conference,
id = row_number(),
node = row_number()
) %>%
mutate(
theta = 2*pi*row_number()/nrow(.),
x = cos(theta),
y = sin(theta)
)
glimpse(nodes)
## Rows: 66
## Columns: 7
## $ label <chr> "Boston College", "Clemson", "Duke", "Florida State", …
## $ home_conference <chr> "ACC", "ACC", "ACC", "ACC", "ACC", "ACC", "ACC", "ACC"…
## $ id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,…
## $ node <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,…
## $ theta <dbl> 0.09519978, 0.19039955, 0.28559933, 0.38079911, 0.4759…
## $ x <dbl> 0.99547192, 0.98192870, 0.95949297, 0.92836793, 0.8888…
## $ y <dbl> 0.09505604, 0.18925124, 0.28173256, 0.37166246, 0.4582…
We also need a conf
data frame to plot the conference
logos. We do this by grouping the nodes
data frame by
conference and getting the minimum and maximum angles of the conference
to plot the conference curve and the average angle to plot the
conference logo.
conf <- nodes %>%
group_by(home_conference) %>%
summarize(
theta_min = min(theta),
theta_max = max(theta),
theta = mean(theta)
) %>%
ungroup()
glimpse(conf)
## Rows: 6
## Columns: 4
## $ home_conference <chr> "ACC", "Big 12", "Big Ten", "FBS Independents", "Pac-1…
## $ theta_min <dbl> 0.09519978, 1.42799666, 2.37999443, 3.71279132, 3.9031…
## $ theta_max <dbl> 1.332797, 2.284795, 3.617592, 3.807991, 4.950388, 6.28…
## $ theta <dbl> 0.7139983, 1.8563957, 2.9987930, 3.7603912, 4.4267896,…
Now we create an edges
data frame that we can use for
the curves connecting teams that have played each other. First, we use a
slightly modified double_games()
function from the
{nflseedR} package that will create two rows for each game. We then
summarize the win-loss record for each match up and create the text for
each tooltip. We then join edges
with our
nodes
data frame to get the source and target nodes.
# Slightly modified from nflseedR
double_games <- function(g){
g1 <- g %>%
select(game_id, season,week, away_team, home_team, result, home_points, away_points) %>%
rename(team = away_team, opp = home_team, points = away_points, opp_points = home_points) %>%
mutate(result = -1 * result)
g2 <- g %>%
select(game_id, season,week, away_team, home_team, result, home_points, away_points) %>%
rename(team = home_team, opp = away_team, points = home_points, opp_points = away_points)
g <- bind_rows(g1, g2) %>%
mutate(
outcome = case_when(result > 0 ~ 1, result < 0 ~ 0, result == 0 ~ 0.5, TRUE ~ NA_real_)
)
return(g)
}
edges <- game_info %>%
mutate(result = home_points - away_points) %>%
filter(!conference_game) %>%
double_games() %>%
filter(team %in% cfbplotR::valid_team_names("P5"),opp %in% cfbplotR::valid_team_names("P5")) %>%
group_by(team,opp) %>%
arrange(season) %>%
summarize(
n = n(),
w = sum(outcome),
l = n-w,
text = paste0(season,": ",points," - ",opp_points, collapse = "\n")
) %>%
ungroup() %>%
mutate(tooltip = paste0(team," - ",opp,"\n",w," - ",l,"\n",text)) %>%
left_join(nodes ,by = c("team" = "label")) %>%
left_join(nodes, by = c("opp" = "label"),suffix = c("_source","_target")) %>%
ungroup()
glimpse(edges)
## Rows: 326
## Columns: 19
## $ team <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Al…
## $ opp <chr> "Duke", "Florida State", "Louisville", "Miami",…
## $ n <int> 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 2, 1, 1, 2, 2, 1,…
## $ w <dbl> 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 2, 1, 0, 1, 1, 1,…
## $ l <dbl> 0, 0, 0, 0, 0, 0, 0, 3, 0, 1, 0, 0, 1, 1, 1, 0,…
## $ text <chr> "2019: 42 - 3", "2017: 24 - 7", "2018: 51 - 14"…
## $ tooltip <chr> "Alabama - Duke\n1 - 0\n2019: 42 - 3", "Alabama…
## $ home_conference_source <chr> "SEC", "SEC", "SEC", "SEC", "SEC", "SEC", "SEC"…
## $ id_source <int> 53, 53, 53, 53, 53, 53, 53, 41, 41, 42, 42, 42,…
## $ node_source <int> 53, 53, 53, 53, 53, 53, 53, 41, 41, 42, 42, 42,…
## $ theta_source <dbl> 5.04558820, 5.04558820, 5.04558820, 5.04558820,…
## $ x_source <dbl> 0.3270680, 0.3270680, 0.3270680, 0.3270680, 0.3…
## $ y_source <dbl> -0.94500082, -0.94500082, -0.94500082, -0.94500…
## $ home_conference_target <chr> "ACC", "ACC", "ACC", "ACC", "Pac-12", "Big 12",…
## $ id_target <int> 3, 4, 6, 7, 49, 24, 38, 39, 23, 39, 30, 40, 65,…
## $ node_target <int> 3, 4, 6, 7, 49, 24, 38, 39, 23, 39, 30, 40, 65,…
## $ theta_target <dbl> 0.2855993, 0.3807991, 0.5711987, 0.6663984, 4.6…
## $ x_target <dbl> 0.95949297, 0.92836793, 0.84125353, 0.78605309,…
## $ y_target <dbl> 0.28173256, 0.37166246, 0.54064082, 0.61815899,…
Eventually we’d like to use the curve geom in our plot to make the
network, but curvature is not an aesthetic so we have to get a little
fancy. We’ll end up using an annotate so we need to do all of our size
and alpha scaling manually (this may not be necessary if you use
geom_curve_interactive()
in the next block but I haven’t
tested it out). We’ll also calculate the curvature based on the
positioning of the source and target nodes: the closer they are to
opposite ends of the circle, the smaller the curvature of the
connection.
edges <- edges %>%
mutate(
curvature = case_when(
id_source < id_target ~ .95*(theta_target-theta_source-pi)/(2*pi),
id_source > id_target ~ -.95*(theta_source-theta_target-pi)/(2*pi),
TRUE ~ 0
),
color = "black",
alpha = (n-min(n))/(max(n)-min(n))*(.45-.25)+.25,
size = (n-min(n))/(max(n)-min(n))*(2.5-1)+1,
curvature = ifelse(abs(curvature) < 0.0001, 0, curvature) #causing problems when plotting
)
glimpse(edges)
## Rows: 326
## Columns: 23
## $ team <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Al…
## $ opp <chr> "Duke", "Florida State", "Louisville", "Miami",…
## $ n <int> 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 2, 1, 1, 2, 2, 1,…
## $ w <dbl> 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 2, 1, 0, 1, 1, 1,…
## $ l <dbl> 0, 0, 0, 0, 0, 0, 0, 3, 0, 1, 0, 0, 1, 1, 1, 0,…
## $ text <chr> "2019: 42 - 3", "2017: 24 - 7", "2018: 51 - 14"…
## $ tooltip <chr> "Alabama - Duke\n1 - 0\n2019: 42 - 3", "Alabama…
## $ home_conference_source <chr> "SEC", "SEC", "SEC", "SEC", "SEC", "SEC", "SEC"…
## $ id_source <int> 53, 53, 53, 53, 53, 53, 53, 41, 41, 42, 42, 42,…
## $ node_source <int> 53, 53, 53, 53, 53, 53, 53, 41, 41, 42, 42, 42,…
## $ theta_source <dbl> 5.04558820, 5.04558820, 5.04558820, 5.04558820,…
## $ x_source <dbl> 0.3270680, 0.3270680, 0.3270680, 0.3270680, 0.3…
## $ y_source <dbl> -0.94500082, -0.94500082, -0.94500082, -0.94500…
## $ home_conference_target <chr> "ACC", "ACC", "ACC", "ACC", "Pac-12", "Big 12",…
## $ id_target <int> 3, 4, 6, 7, 49, 24, 38, 39, 23, 39, 30, 40, 65,…
## $ node_target <int> 3, 4, 6, 7, 49, 24, 38, 39, 23, 39, 30, 40, 65,…
## $ theta_target <dbl> 0.2855993, 0.3807991, 0.5711987, 0.6663984, 4.6…
## $ x_target <dbl> 0.95949297, 0.92836793, 0.84125353, 0.78605309,…
## $ y_target <dbl> 0.28173256, 0.37166246, 0.54064082, 0.61815899,…
## $ curvature <dbl> -0.24469697, -0.23030303, -0.20151515, -0.18712…
## $ color <chr> "black", "black", "black", "black", "black", "b…
## $ alpha <dbl> 0.2500000, 0.2500000, 0.2500000, 0.2500000, 0.2…
## $ size <dbl> 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.50,…
Curvature isn’t an aesthetic we can use in our geom, but ggplot does
allow us to add a list of layers to our plot. So we’ll use the
Map
function to generate a list of the 562 curves we need
to make our connections (this is actually double what is necessary for a
standard plot, but in order for the interactivity to be nice when
highlighting a team, we need the doubled data).
curve_list <- Map(function(x, y,
xend, yend,
size, alpha,
tooltip, data_id,
color, curvature) {
annotate_interactive("curve",
x = x, y = y,
xend = xend, yend = yend,
size = size, alpha = alpha,
tooltip = tooltip, data_id = data_id,
color = color, curvature = curvature)
},
x = edges$x_source, y = edges$y_source,
xend = edges$x_target, yend = edges$y_target,
size = edges$size, alpha = edges$alpha,
tooltip = edges$tooltip, data_id = edges$id_source,
color = edges$color, curvature = edges$curvature
)
Now we create our plot. Implementing ggiraph interactivity is
incredibly easy: replace geom_*()
with
geom_*_interactive()
and you’re good to go. The interactive
geoms take two additional important aesthetics: tooltip
for
the text when you hover over the layer and data_id
that
groups interactive elements together. We created the tooltips earlier to
show the overall record and scores of each game in a given match up and
we use the source team as the ID.
g <- ggplot() +
curve_list +
geom_point_interactive(data = nodes, aes(x,y,tooltip = label,data_id = id,color = home_conference),size = 2.5) +
geom_cfb_logos(data = nodes, aes(x = 1.1*cos(theta),y=1.1*sin(theta),team = label), height = 0.04) +
geom_cfb_logos(data = conf, aes(x = 1.4*cos(theta),y=1.4*sin(theta),team = home_conference), height = 0.08) +
geom_path(data = nodes,aes(x = 1.2*cos(theta),y = 1.2*sin(theta),color = home_conference),
size = 1.8) +
labs(title = "P5 vs P5",
subtitle = "2014-2021") +
scale_size_continuous(range = c(1,2.5)) + #unnecessary since we scaled in the data frame
scale_alpha_continuous(range = c(0.2,.45)) +
coord_fixed(clip = "off") +
scale_color_brewer_interactive(palette = "Set2") +
theme_void() +
theme(legend.position = "none",
plot.title = element_text(size = 22, hjust = 0.5, face = "bold",family = "Oswald"),
plot.subtitle = element_text(size = 18, hjust = 0.5,family = "Oswald"))
Finally, we take the ggplot we created and put it inside the
ggiraph::girafe()
function to add the interactivity. The
function also takes in several options that let you customize different
effects using inputs or even your own custom css. Inside
opts_tooltip()
, we can drop the opacity of the tooltip and
offset where it appears next to our mouse. In the custom css, we defined
the text and background colors, made it rounded, and center aligned the
text. In opts_hover()
we can add css that affects the
layers with the same data_id as what we hover over, in this case making
the stroke red. opts_hover_inv()
controls what happens to
the layers with a different data_id to what we are hovering
over. The girafe_css
function lets us write custom css for
different geoms. Here we want the opacity of everything interactive
except the team nodes to drop to 0.2
chord_graph <- ggiraph::girafe(
ggobj = g,
width_svg = 10,
height_svg = 10,
bg = "white",
options = list(
opts_sizing(rescale = FALSE),
opts_tooltip(
opacity = .6,
css = "background-color:black;color:white;padding:2px;border-radius:2px;text-align:center;",
offx = 10,
offy = 10
),
opts_hover(css = "stroke:red;cursor:pointer;opacity:1"),
opts_hover_inv(css = girafe_css(css = "opacity:0.2;",point = "opacity:1;"))
)
)
Now we end up with a slick interactive graph showing the out of conference scheduling of power 5 teams in the college football playoff.
chord_graph