How To: Rotation plots with R!
This tutorial will guide you through the process of scraping data on player rotations within a single NBA game and throughout an entire season for a specific team. Additionally, I'll demonstrate how to create clear and informative plots to visualize these rotations.
To begin, let's kick off our tutorial by loading the necessary packages in R.
library(janitor)
library(hablar)
library(tidyverse)
library(jsonlite)
library(httr)
library(ggplot2)
library(dplyr)
library(magick)
library(nbastatR)
library(ggtext)
To ensure smooth data scraping and prevent potential errors related to the size of our connection buffer, it's crucial to set up appropriate headers. This step is essential for seamless data retrieval.
headers = c(
`Connection` = 'keep-alive',
`Accept` = 'application/json, text/plain, */*',
`x-nba-stats-token` = 'true',
`X-NewRelic-ID` = 'VQECWF5UChAHUlNTBwgBVw==',
`User-Agent` = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36',
`x-nba-stats-origin` = 'stats',
`Sec-Fetch-Site` = 'same-origin',
`Sec-Fetch-Mode` = 'cors',
`Referer` = 'https://stats.nba.com/players/leaguedashplayerbiostats/',
`Accept-Encoding` = 'gzip, deflate, br',
`Accept-Language` = 'en-US,en;q=0.9'
)
Sys.setenv("VROOM_CONNECTION_SIZE" = 131072 * 2)
To obtain the required data, we'll directly retrieve it from the NBA API. It's worth noting that the specific link we'll be working with may not be directly accessible on the NBA website without scraping.
In this initial section of the tutorial, our emphasis will be on creating a plot for a single game. This plot will display rotations, point differentials, and distinguish between the home and away teams.
To get started, we must identify the game ID of the game we're interested in. You can find this ID by visiting the NBA website, opening the game page, and examining the link structure, as illustrated in the example below.
After copying the game ID, proceed to paste it into the following code chunk.
url <- "https://stats.nba.com/stats/gamerotation?GameID=0022300224"
res <- GET(url = url, add_headers(.headers = headers))
json_resp <- fromJSON(content(res, "text"))
By executing this step, we obtain a list containing the elements that we aim to transform into a readable table.
Now, let's create two distinct tables: one for the home team and another for the away team.
subs_team_away <- data.frame(json_resp[["resultSets"]][["rowSet"]][[2]])
subs_team_home <- data.frame(json_resp[["resultSets"]][["rowSet"]][[1]])
colnames(subs_team_home) <- json_resp[["resultSets"]][["headers"]][[1]]
colnames(subs_team_away) <- json_resp[["resultSets"]][["headers"]][[1]]
subs_team_home <- subs_team_home %>%
clean_names() %>%
retype()
subs_team_away <- subs_team_away %>%
clean_names() %>%
retype()
The provided code accomplishes the following tasks:
Converts the necessary list elements into dataframes
Renames the columns using the column names from the scraped data
Ensures that the data is in the correct format and follows coding conventions (no spaces, capitalized letters, etc…)
As a result, we now have two tables that will be structured as follows.
As you can see, the only weird thing it’s the time format, because we want it to be more understandable for us at first glimpse.
The information I'm about to share is derived from here, a tutorial in PHP code about the very same plot I’m trying to recreate in R.
It appears that the time format requires some adjustment, particularly in handling an extra 0 digit. To make the time more user-friendly, we need to divide the values by 10. After this adjustment, we can convert the resulting values into an actual time string using the following provided function.
convert_to_time_string <- function(seconds) {
time_strings <- sprintf("%02d:%02d", as.integer(seconds) %/% 60, as.integer(seconds) %% 60)
return(time_strings)
}
# Apply the conversion to in_time_real and out_time_real columns
subs_team_away$in_time_real <- lapply(subs_team_away$in_time_real / 10, convert_to_time_string)
subs_team_away$out_time_real <- lapply(subs_team_away$out_time_real / 10, convert_to_time_string)
# Apply the conversion to in_time_real and out_time_real columns
subs_team_home$in_time_real <- lapply(subs_team_home$in_time_real / 10, convert_to_time_string)
subs_team_home$out_time_real <- lapply(subs_team_home$out_time_real / 10, convert_to_time_string)
Instead of that weird format, we will now have something like “00:00” or “32:45”, so a MM:SS format.
However, since the graph's x-axis is a true timeline and will not work with the MM:SS strings above, they must be converted into Unix time.
The date that we choose will not matter: only the time (minutes and seconds) matter because the graph will be formatting the x-axis ticks to only show minutes anyway.
date <- as.Date("2000-01-01")
# Convert "mm:ss" to Unix time (seconds since Unix Epoch)
subs_team_home$in_unix_time <- as.integer(as.POSIXct(paste(date, subs_team_home$in_time_real), format = "%Y-%m-%d %M:%S"))
subs_team_home$out_unix_time <- as.integer(as.POSIXct(paste(date, subs_team_home$out_time_real), format = "%Y-%m-%d %M:%S"))
subs_team_away$in_unix_time <- as.integer(as.POSIXct(paste(date, subs_team_away$in_time_real), format = "%Y-%m-%d %M:%S"))
subs_team_away$out_unix_time <- as.integer(as.POSIXct(paste(date, subs_team_away$out_time_real), format = "%Y-%m-%d %M:%S"))
To plot the points differential per minute, it's essential to determine the amount of time played for each player's section on the floor. This involves dividing the point differential in that period by the time played and then multiplying it by 60 to obtain the per-minute value. This calculation ensures a standardized representation of points differential over time.
subs_team_home$time_played <- subs_team_home$out_unix_time - subs_team_home$in_unix_time
subs_team_home$point_diff_minute = subs_team_home$pt_diff/subs_team_home$time_played*60
subs_team_away$time_played <- subs_team_away$out_unix_time - subs_team_away$in_unix_time
subs_team_away$point_diff_minute = subs_team_away$pt_diff/subs_team_away$time_played*60
Our tables are nearly prepared, and the final touch involves adding player faces to facilitate plotting on the timeline. To achieve this, we'll leverage the link_to_img
function found on The Mock Up, another great R coding resource, where you can find more info about it.
In order to use it though, we need the correct URL for each player's image. We'll obtain these URLs using the {nbastatR} package, which provides the necessary information, including player headshots.
Following that, we'll merge this player image data with our two tables using the player ID as the identifier. This step ensures that we have a comprehensive dataset ready for plotting, complete with player faces.
link_to_img <- function(x, width = 28) {
glue::glue("<img src='{x}' width='{width}'/>")
}
image_df=nba_players()
image_df=image_df%>%
select(idPlayer, urlPlayerHeadshot)
subs_team_away=merge(subs_team_away, image_df, by.x="person_id", by.y="idPlayer")
subs_team_home=merge(subs_team_home, image_df, by.x="person_id", by.y="idPlayer")
In order to plot the starters on top of the plot, I’ve used the following chunk of code, but I’m pretty sure there are other, more efficient, ways to do it.
summary_home <- subs_team_home %>%
retype()%>%
group_by(urlPlayerHeadshot) %>%
summarize(min_in_unix_time = min(in_unix_time))%>%
arrange(desc(min_in_unix_time))%>%
mutate(urlPlayerHeadshot = link_to_img(urlPlayerHeadshot))
summary_away <- subs_team_away %>%
retype()%>%
group_by(urlPlayerHeadshot) %>%
summarize(min_in_unix_time = min(in_unix_time))%>%
arrange(desc(min_in_unix_time))%>%
mutate(urlPlayerHeadshot = link_to_img(urlPlayerHeadshot))
By constructing the two tables, I've organized the player unique URLs based on their first appearance on the floor.
Now, it's time to generate the plot, and while the ggplot code might be extensive, it shouldn't pose significant challenges.
p_away <- subs_team_away %>%
mutate(urlPlayerHeadshot = link_to_img(urlPlayerHeadshot)) %>%
ggplot(aes(y = factor(urlPlayerHeadshot, levels=summary_away$urlPlayerHeadshot), xmin = in_unix_time, xmax = out_unix_time, color = point_diff_minute)) +
geom_linerange(size = 5) +
scale_x_continuous(breaks = c(946681200, 946681920, 946682640, 946683360, 946684080),
labels = c("0", "12", "24", "36", "48")) +
labs(x = "", y = "", caption="@ed_vergani | nba.com/stats") +
theme_minimal() +
coord_cartesian(xlim = c(946681200, 946684080)) +
theme(panel.grid.major.y = element_blank(), panel.grid.minor.y = element_blank()) +
scale_color_gradient2(low = "#D91A1A", mid="#ffcb2d", high = "#549935") +
theme(
axis.title.x = element_text(vjust = 0, size = 16, face = "italic", family = "Archivo"),
axis.title.y = element_text(vjust = 2, size = 16, face = "italic", family = "Archivo"),
plot.caption = element_text(color = 'gray40', size = 10, family = "Archivo"),
plot.title = element_text(hjust = 0.5, face = "bold", size = 18, family = "Archivo"),
plot.subtitle = element_text(hjust = 0.5, size = 10, family = "Archivo"),
axis.text = element_text(size = 9, family = "Archivo"),
legend.position = "none",
legend.box = "horizontal",
legend.title = element_blank(),
legend.text = element_text(family = "Archivo", size = 10),
legend.key.height = unit(0.2, "in"),
legend.key.width = unit(0.6, "in"),
legend.background = element_rect(fill = '#efe8e8', color = '#efe8e8'),
panel.grid = element_line(color = "#afa9a9"),
plot.background = element_rect(fill = '#efe8e8', color = '#efe8e8'),
panel.background = element_rect(fill = '#efe8e8', color = '#efe8e8'),
panel.grid.minor = element_blank(),
plot.margin = margin(b=0.1, r=0.2, unit = "cm"),
axis.text.y = element_markdown(margin = margin(r = -15, unit = "pt"))
) +
theme(
legend.position = "top", # Position the legend at the top of the plot
legend.title = element_text(size = 8, hjust = 0.5) # Set the legend title
) +
guides(
color = guide_colorbar(barwidth = 8, barheight=0.5, title = "Points Differential per Minute", title.position = "top", title.hjust = 0.5, label.theme = element_text(size = 6)) # Adjust the color legend appearance
)
Let's break down the process step by step. Initially, we use our function to transform the given link into a usable image for the plot. Subsequently, we place these images on the y-axis, while the x-axis represents entrance time, exit time, and color (representing the point differential per minute).
The geom_linerange
function is employed, representing a vertical interval defined by x, xmin, and xmax. The scale numbers correspond to the Unix representation of the minutes 0, 12, 24, 36, and 48, denoting the beginning and end times of each quarter to provide context.
In the theme section, adjustments are made to fonts, dimensions, and the positioning of legends, axes, title, and subtitle. Feel free to modify this section according to your preferences.
Finally, the plot is saved both as an image and as a Magick object for potential later combination with the other team's plot. The subsequent part replicates the same process for the home team, with necessary edits to team names and date based on the game being plotted.
p_home <- subs_team_home %>%
mutate(urlPlayerHeadshot = link_to_img(urlPlayerHeadshot)) %>%
ggplot(aes(y = factor(urlPlayerHeadshot, levels=summary_home$urlPlayerHeadshot), xmin = in_unix_time, xmax = out_unix_time, color = point_diff_minute)) +
geom_linerange(size = 5) +
scale_x_continuous(breaks = c(946681200, 946681920, 946682640, 946683360, 946684080),
labels = c("0", "12", "24", "36", "48")) +
labs(x = "", y = "", title="Spurs @ Suns - Rotation Plot", subtitle="31/10/2023, 115-114") +
theme_minimal() +
coord_cartesian(xlim = c(946681200, 946684080)) +
theme(panel.grid.major.y = element_blank(), panel.grid.minor.y = element_blank()) +
scale_color_gradient2(low = "#D91A1A", mid="#ffcb2d", high = "#549935") +
theme(
axis.title.x = element_text(vjust = 0, size = 16, face = "italic", family = "Archivo"),
axis.title.y = element_text(vjust = 2, size = 16, face = "italic", family = "Archivo"),
plot.caption = element_text(color = 'gray40', size = 12, family = "Archivo"),
plot.title = element_text(hjust = 0.5, face = "bold", size = 18, family = "Archivo"),
plot.subtitle = element_text(hjust = 0.5, size = 10, family = "Archivo"),
axis.text = element_text(size = 9, family = "Archivo"),
legend.position = "none",
legend.box = "horizontal",
legend.title = element_blank(),
legend.text = element_text(family = "Archivo", size = 10),
legend.key.height = unit(0.2, "in"),
legend.key.width = unit(0.6, "in"),
legend.background = element_rect(fill = '#efe8e8', color = '#efe8e8'),
panel.grid = element_line(color = "#afa9a9"),
plot.background = element_rect(fill = '#efe8e8', color = '#efe8e8'),
panel.background = element_rect(fill = '#efe8e8', color = '#efe8e8'),
panel.grid.minor = element_blank(),
plot.margin = margin(t=1, unit = "cm", r=0.2),
axis.text.y = element_markdown(margin = margin(r = -15, unit = "pt"))
) +
theme(
legend.position = "top", # Position the legend at the top of the plot
legend.title = element_text(size = 8, hjust = 0.5, family = "Archivo") # Set the legend title
) +
guides(
color = guide_colorbar(barwidth = 8, barheight=0.5, title = "Points Differential per Minute", title.position = "top", title.hjust = 0.5, label.theme = element_text(size = 6, family = "Archivo")) # Adjust the color legend appearance
)
ggsave("SubsHome.png", p_home, w = 9, h = 5.5, dpi = 600, type = 'cairo')
home <- image_read("SubsHome.png")
Now, we proceed to merge the two Magick objects using the image_append
function, creating a single image. Our concluding function, image_write
, will combine the two image objects into a unified one, subsequently converting it into a PNG format for convenient viewing.
img <- c(home, away)
final_subs_img=image_append(img, stack = TRUE)
image_write(final_subs_img, "RotationGame.png", format = "png", density=3000)
If the goal is to create a season version, the approach will be slightly different. Manually obtaining every game ID for a team, especially as the season progresses, can be tedious. To streamline this process, we can create a function that automatically downloads all the game IDs for every game of the season. Afterward, we filter the IDs for the specific team of interest, appending the necessary "00" required for scraping rotation information for each game.
ids=seasons_schedule(
seasons = 2024
)
ids=ids%>%
clean_names()%>%
retype()%>%
mutate(id_game = paste0("00", id_game))%>%
filter(str_detect(slug_matchup, "GSW")) #here you select all games played by the team, it also includes opponents rotations for now
# List of GameIDs
game_ids <- ids$id_game
Now, utilizing each obtained ID, we'll automate the process of scraping the rotation dataframe. This will result in a comprehensive table containing the rotations for every game, effectively consolidating this information into a single dataset.
# Initialize an empty dataframe to store the results
all_subs <- data.frame()
# Loop through each GameID
counter <- 1 # Initialize a counter variable
for (game_id in game_ids) {
cat("Scraping data for GameID #", counter, ":", game_id, "\n") # Print the current GameID with the iteration number
url <- glue::glue("https://stats.nba.com/stats/gamerotation?GameID={game_id}")
res <- GET(url = url, add_headers(.headers = headers))
json_resp <- fromJSON(content(res, "text"))
Sys.sleep(5)
subs_away <- data.frame(json_resp[["resultSets"]][["rowSet"]][[2]])
subs_home <- data.frame(json_resp[["resultSets"]][["rowSet"]][[1]])
all_subs <- bind_rows(all_subs, subs_away)
all_subs <- bind_rows(all_subs, subs_home)
counter <- counter + 1 # Increment the counter
}
The sys.sleep
function is incorporated to introduce a 5-second timeout, mitigating potential issues when scraping extensive data from the NBA website. Additionally, a counter on the console will provide real-time updates on the current game being scraped, ensuring clarity regarding the progress of the code.
Once the scraping is complete, and the comprehensive table is generated, we can proceed with similar steps as before for further analysis or visualization.
colnames(all_subs) <- json_resp[["resultSets"]][["headers"]][[1]]
team_rotations <- all_subs %>%
clean_names() %>%
retype()%>%
filter(team_name == "Warriors")
In case you are specifically working on this section, let's recreate the function responsible for converting time values to the correct string format. Following that, we can apply this function to the table to ensure consistency and readability in the representation of time.
convert_to_time_string <- function(seconds) {
time_strings <- sprintf("%02d:%02d", as.integer(seconds) %/% 60, as.integer(seconds) %% 60)
return(time_strings)
}
# Apply the conversion to in_time_real and out_time_real columns
team_rotations$in_time_real <- lapply(team_rotations$in_time_real / 10, convert_to_time_string)
team_rotations$out_time_real <- lapply(team_rotations$out_time_real / 10, convert_to_time_string)
Let’s convert again MM:SS to Unix time as follows.
date <- as.Date("2000-01-01")
# Convert "mm:ss" to Unix time (seconds since Unix Epoch)
team_rotations$in_unix_time <- as.integer(as.POSIXct(paste(date, team_rotations$in_time_real), format = "%Y-%m-%d %M:%S"))
team_rotations$out_unix_time <- as.integer(as.POSIXct(paste(date, team_rotations$out_time_real), format = "%Y-%m-%d %M:%S"))
Once more, we will provide the function for creating images and proceed to merge the headshot URLs dataframe with our existing dataframe. This step ensures that our dataset is enriched with the necessary player images for visualization.
link_to_img <- function(x, width = 28) {
glue::glue("<img src='{x}' width='{width}'/>")
}
team_rotations=merge(team_rotations, image_df, by.x="person_id", by.y="idPlayer")
We filter the dataframe to isolate the rotations for the specific team we initially aimed to analyze in terms of seasonal rotations. Following this, we arrange the order of the dataframe based on the number of substitutions that occurred during the course of the season. This arrangement provides a clear view of player involvement and substitution patterns.
summary_team <- team_rotations %>%
retype()%>%
group_by(urlPlayerHeadshot) %>%
summarize(n_rows = n()) %>%
arrange(n_rows)%>%
mutate(urlPlayerHeadshot = link_to_img(urlPlayerHeadshot))
Now, we move on to plotting the seasonal rotations. If there are any issues with the images, it's likely due to complications with the URL of the images for two-way players and other less prominent players.
In such cases, a simple solution is to filter out the corresponding rows from the dataset, addressing the problem and ensuring a smoother visualization process.
p_season <- team_rotations %>%
mutate(urlPlayerHeadshot = link_to_img(urlPlayerHeadshot)) %>%
ggplot(aes(y = factor(urlPlayerHeadshot, levels=summary_team$urlPlayerHeadshot), xmin = in_unix_time, xmax = out_unix_time)) +
geom_linerange(size = 4, alpha=0.05, color = "#330000") +
scale_x_continuous(breaks = c(946681200, 946681920, 946682640, 946683360, 946684080),
labels = c("0", "12", "24", "36", "48")) +
labs(x = "", y = "", title="Warriors - Season Rotation Plot", caption="@ed_vergani | nba.com/stats") +
theme_minimal() +
theme(panel.grid.major.y = element_blank(), panel.grid.minor.y = element_blank()) +
theme(
axis.title.x = element_text(vjust = 0, size = 16, face = "italic", family = "Archivo"),
axis.title.y = element_text(vjust = 2, size = 16, face = "italic", family = "Archivo"),
plot.caption = element_text(color = 'gray40', size = 12, family = "Archivo"),
plot.title = element_text(hjust = 0.5, face = "bold", size = 18, family = "Archivo"),
plot.subtitle = element_text(hjust = 0.5, size = 10, family = "Archivo"),
axis.text = element_text(size = 9, family = "Archivo"),
legend.position = "none",
legend.box = "horizontal",
legend.title = element_blank(),
legend.text = element_text(family = "Archivo", size = 10),
legend.key.height = unit(0.2, "in"),
legend.key.width = unit(0.6, "in"),
legend.background = element_rect(fill = '#efe8e8', color = '#efe8e8'),
panel.grid = element_line(color = "#afa9a9"),
plot.background = element_rect(fill = '#efe8e8', color = '#efe8e8'),
panel.background = element_rect(fill = '#efe8e8', color = '#efe8e8'),
panel.grid.minor = element_blank(),
plot.margin = margin(t=1, unit = "cm", r=0.2),
axis.text.y = element_markdown(margin = margin(r = -15, unit = "pt"))
)
ggsave("SeasonRotation.png", p_season, w = 9, h = 6, dpi = 600, type = 'cairo')
If you have any uncertainties or suggestions regarding the plot or the code for improvement, feel free to share them in the comments. I'm here to assist and provide further guidance as needed.