
Plotting Trump's Tweets Over Time


This R Markdown document generates a density plot of Donald Trump's tweets over time, using data from the Trump Twitter Archive GitHub repository. The code adapts the plot automatically to the years listed in tt.years, going back as far as 2011, which is when Trump started tweeting.

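This chunk loads the necessary libraries. We're going to use the tidyverse (ggplot2 for plotting, purrr for iteration, dplyr and stringr for data wrangling), lubridate for dates, jsonlite to read the JSON files, scales for axis labels, viridis for color, KernSmooth and MASS for density calculation, and glue to build strings. Because MASS is loaded after the tidyverse, its select() masks dplyr's; the code below never calls select(), so this is harmless.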

library(tidyverse)
library(lubridate)
library(jsonlite)
library(scales)
library(viridis)
library(KernSmooth)
library(MASS)
library(glue)

This chunk sets the session time zone. This may not be necessary on older systems, but OS X High Sierra doesn't seem to work well with lubridate, so the time zone is set manually.

# R reads the TZ environment variable (case-sensitive on macOS and Linux)
Sys.setenv(TZ = "America/Chicago")
options(tz = "America/Chicago")
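Because R takes the session time zone from the TZ environment variable, one quick check that the setting took effect is to format a fixed UTC instant in the session's zone. A minimal sketch; the timestamp is arbitrary:

```r
# Set the session time zone; R reads the upper-case TZ variable
Sys.setenv(TZ = "America/Chicago")

# An arbitrary fixed instant, defined in UTC
noon_utc <- as.POSIXct("2017-06-01 12:00:00", tz = "UTC")

# tz = "" formats in the current session time zone (America/Chicago here)
local_clock <- format(noon_utc, format = "%H:%M", tz = "")
print(local_clock)  # "07:00", since Chicago is on CDT (UTC-5) in June
```

If the variable were not picked up, the printed clock time would reflect whatever zone the system defaults to instead.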

This chunk downloads the yearly archive files from GitHub, unzips each one, and combines the JSONs into a single data frame, using purrr to iterate over the files. It's possible, though unwieldy, to download, unzip, and extract in one line.

# Importing data from the Trump Twitter Archive GitHub
tt.years <- 2017
tt.git <- "https://github.com/bpb27/trump_tweet_data_archive/raw/master/condensed_{y}.json.zip"

# Downloading one file per year; glue() is vectorised over y
glue(tt.git, y = tt.years) %>%
  as.character() %>%
  unique() %>%
  walk(~ download.file(.x, basename(.x), method = "libcurl", mode = "wb"))

# Unzipping files and combining them in a data frame
dir(pattern = "\\.json\\.zip$", full.names = TRUE) %>%
  keep(~ any(grepl("\\.json$", unzip(.x, list = TRUE)$Name))) %>%
  map_df(function(x) {
    temp <- tempdir()
    # Name of the JSON file inside the archive
    json <- grep("\\.json$", unzip(x, list = TRUE)$Name, value = TRUE)[1]
    # unzip() returns the path of the extracted file; the year comes from the file name
    fromJSON(unzip(x, files = json, exdir = temp)) %>%
      mutate(year = str_extract(basename(x), "\\d{4}"))
  }) -> tt.df

# Cleaning up the downloaded .zip files
file.remove(dir(pattern = "\\.json\\.zip$"))
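The download step only needs one URL per year, and the year can later be recovered from each file name. A base-R sketch of both steps, with sprintf() standing in for glue() and an illustrative year range (nothing is downloaded, so it runs offline):

```r
# Base-R version of the URL template above (sprintf() instead of glue())
tt.base <- "https://github.com/bpb27/trump_tweet_data_archive/raw/master/condensed_%d.json.zip"
years <- 2015:2017  # illustrative range, not the tt.years used above

urls <- sprintf(tt.base, years)   # one URL per year
files <- basename(urls)           # local names, e.g. "condensed_2015.json.zip"

# Recover the four-digit year from each file name
file_years <- sub("^condensed_([0-9]{4})\\.json\\.zip$", "\\1", files)
print(file_years)  # "2015" "2016" "2017"
```

To fetch the files, a plain loop such as for (u in urls) download.file(u, basename(u), mode = "wb") does the same job as the purrr pipeline.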

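Working with time zones and dates in R is a huge pain, though lubridate makes things a bit easier. This chunk parses the created_at column of the JSONs with base R, extracts the month, converts each tweet's time of day to Eastern time, and stores the results as POSIXct objects for use with ggplot2. Every time of day is attached to one shared date, so only the clock time varies along the y-axis.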

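# Converting the created_at time to POSIX and changing the timezone.
# The format omits the year (it comes from the file name instead),
# and %a/%b assume English day and month names
tt.df$time <- as.POSIXct(
  tt.df$created_at,
  format = "%a %b %d %H:%M:%S",
  tz = "UTC")
# Two-digit month, e.g. "03"
tt.df$month <- format(
  tt.df$time,
  format = "%m")
# First day of each tweet's month, used as the x-axis value
tt.df$date <- as.POSIXct(
  paste(tt.df$year, tt.df$month, "01", sep = "/"),
  format = "%Y/%m/%d",
  tz = "UTC")
# Clock time of each tweet in Eastern time, as a string
tt.df$time <- format(
  tt.df$time,
  format = "%H:%M:%S",
  tz = "America/New_York")
# Re-parsing the clock time onto a single shared date
tt.df$time <- as.POSIXct(
  tt.df$time,
  format = "%H:%M:%S",
  tz = "UTC")
# Dropping rows whose timestamps failed to parse
tt.df <- tt.df[!is.na(tt.df$time),]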
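To see what the conversion does to a single tweet, here is a sketch on one example timestamp. It assumes the archive uses Twitter's created_at format, and unlike the chunk above it parses the offset and year directly (%z and %Y) rather than taking the year from the file name. Setting LC_TIME to "C" makes %a and %b match English day and month names:

```r
# Make %a/%b parse English day and month names regardless of locale
invisible(Sys.setlocale("LC_TIME", "C"))

# An example created_at value (assumed Twitter-style format)
created_at <- "Wed Oct 11 10:45:12 +0000 2017"

# Parse the full string, including offset and year, as a UTC instant
utc <- as.POSIXct(created_at, format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")

# Wall-clock time in New York (EDT, UTC-4, in October 2017)
et_clock <- format(utc, format = "%H:%M:%S", tz = "America/New_York")

# Re-attach that clock time to today's date in UTC, so every tweet
# shares one date and only the time of day varies on the y-axis
time_of_day <- as.POSIXct(et_clock, format = "%H:%M:%S", tz = "UTC")

print(et_clock)  # "06:45:12"
```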

This chunk defines density functions built on MASS and KernSmooth. They calculate the relative density of tweets, which is then used as the color aesthetic in the plot. Only the 2D version is used; two 1D alternatives are left commented out.

# 2D density function which groups across months
tt.density <- function(x, y, n = 100) {
  den <- kde2d(x = x, y = y, n = n)
  dx <- findInterval(x, den$x)
  dy <- findInterval(y, den$y)
  dd <- cbind(dx, dy)
  return(den$z[dd])
}
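Note that findInterval() gives each point the density of the grid point at or below it on each axis, not the nearest one; with a fine grid the difference is small. A sketch on simulated data comparing that lookup with a nearest-grid-point lookup (the nearest-point variant is an illustrative alternative, not part of the original):

```r
library(MASS)  # kde2d(); ships with R as a recommended package

set.seed(42)
# Simulated points standing in for (time of day, month) pairs
x <- rnorm(500)
y <- rnorm(500)

# Density evaluated on a 50 x 50 grid spanning the range of the data
den <- kde2d(x = x, y = y, n = 50)

# Lookup used by tt.density(): grid point at or below each value
ix_floor <- findInterval(x, den$x)
iy_floor <- findInterval(y, den$y)
d_floor <- den$z[cbind(ix_floor, iy_floor)]

# Illustrative alternative: nearest grid point on each axis
ix_near <- vapply(x, function(v) which.min(abs(den$x - v)), integer(1))
iy_near <- vapply(y, function(v) which.min(abs(den$y - v)), integer(1))
d_near <- den$z[cbind(ix_near, iy_near)]

# Largest gap between the two lookups, relative to the peak density
max(abs(d_floor - d_near)) / max(den$z)
```

Raising n shrinks the grid spacing, and with it the gap between the two lookups, at the cost of more computation.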

# 1D density function for entire time period
# tt.density2 <- function(x) {
#   den <- bkde(x = x)
#   i <- findInterval(x, den$x)
#   return(den$y[i])
# }
# tt.df$density <- tt.density2(as.numeric(tt.df$time))

# 1D density function for each month
# tt.df$density3 <- tt.df %>%
#   group_by(date) %>%
#   nest() %>%
#   { map(.$data, ~ tt.density2(as.numeric(.$time))) } %>%
#   unlist()
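In the commented-out per-month variant, unlist() returns densities in the order of the nested groups, which lines up with the rows of tt.df only if the rows are already grouped by month. A base-R sketch that avoids the ordering question: ave() runs a function within each group and returns the results in the original row order. density_1d() is a hypothetical helper mirroring tt.density2(), and the data are simulated:

```r
library(KernSmooth)  # bkde(); ships with R as a recommended package

# Hypothetical helper: binned 1D KDE, looked up at each point's grid cell
density_1d <- function(x) {
  den <- bkde(x = x)
  den$y[findInterval(x, den$x)]
}

set.seed(7)
# Simulated data: seconds since midnight for tweets in two months
toy <- data.frame(
  month = rep(c("2017-01", "2017-02"), each = 100),
  secs  = c(rnorm(100, 8 * 3600, 3600), rnorm(100, 20 * 3600, 3600))
)

# ave() applies density_1d within each month and keeps the original
# row order, so the result can be assigned straight back as a column
toy$density <- ave(toy$secs, toy$month, FUN = density_1d)
```

With the real data, the grouping vector would be tt.df$date and the values as.numeric(tt.df$time).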

tt.df$density <- tt.density(
  as.numeric(tt.df$time),
  as.numeric(tt.df$date))

This chunk creates the final plot, which shows Trump's tweets by month and time of day (Eastern), with dashed lines marking the start and end of Fox & Friends at 6:00 and 9:00. Note that a correlation between Trump's tweet times and Fox & Friends airtime does not necessarily imply causation. It's possible that Trump simply has a morning tweeting routine or that he only has time to tweet heavily in the morning. In the future, I plan to do computational text analysis of Trump's tweets vs. Fox & Friends transcripts to establish a stronger causal link.

ggplot() +
  geom_tile(
    data = tt.df,
    aes(date, time, color = density),
    size = .3) +
  geom_hline(
    aes(yintercept = c(
      as.POSIXct(paste(Sys.Date(), "6:00:00"), tz = "UTC"),
      as.POSIXct(paste(Sys.Date(), "9:00:00"), tz = "UTC")),
    linetype = "Start/Stop"),
    color = "indianred",
    size = 1.2,
    show.legend = TRUE) +
  scale_y_datetime(
    labels = date_format("%H:%M"),
    breaks = date_breaks("2 hour"),
    expand = c(0, 0),
    limits = c(
      as.POSIXct(paste(Sys.Date(), "00:00:00"), tz = "UTC"),
      as.POSIXct(paste(Sys.Date(), "24:00:00"), tz = "UTC"))) +
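  scale_x_datetime(
    # Roughly nine breaks across the months in the data, rounded to whole months
    breaks = date_breaks(
      paste(max(1, round(interval(min(tt.df$date), max(tt.df$date)) /
                           months(1) / 9)), "months")),
    # ifelse() cannot return functions, so a plain if/else picks the formatter
    labels = if (length(tt.years) > 1) date_format("%b %y") else date_format("%b"),
    expand = c(0, 0)) +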
  labs(
    x = "Month",
    y = "Time",
    title = glue(
      "Trump Tweet Density vs. Fox & Friends Airtime, {y}",
      y = ifelse(
        length(tt.years) > 1,
        paste(min(tt.years), max(tt.years), sep = " - "),
        tt.years)),
    subtitle = "Tweets plotted by month and minute. Collected from trumptwitterarchive.com.",
    color = "Tweet Density") +
  scale_color_viridis(
    breaks = c(
      max(tt.df$density),
      (max(tt.df$density) + min(tt.df$density)) / 2,
      min(tt.df$density)),
    labels = c("High", "Mid", "Low")) +
  scale_linetype_manual(
    name = "Fox & Friends",
    values = c("Start/Stop" = "longdash")) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(
      size = 12,
      margin = margin(t = 8, unit = "pt")),
    axis.text.y = element_text(size = 12),
    axis.title.y = element_blank(),
    axis.title.x = element_blank(),
    plot.title = element_text(
      size = 16, face = "bold"),
    plot.subtitle = element_text(
      size = 12, margin = margin(b = 8, unit = "pt")),
    plot.margin = unit(c(10,10,20,10), "pt"))
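The x-axis break width aims for roughly nine labels across the plotted months (the / 9 in scale_x_datetime()). A base-R sketch of that arithmetic, measured from the first to the last month in the data and rounded to a whole number of months so the width never becomes a fractional value such as 1.2 months; month_break_width() is a hypothetical helper:

```r
# Hypothetical helper: whole-month break width giving about `target`
# breaks between the first and last month in the data
month_break_width <- function(first, last, target = 9) {
  n_months <- length(seq(as.Date(first), as.Date(last), by = "month")) - 1
  paste(max(1, round(n_months / target)), "months")
}

month_break_width("2017-01-01", "2017-12-01")  # "1 months"
month_break_width("2015-01-01", "2017-12-01")  # "4 months"
```

The returned string has the same form as the one passed to date_breaks() in the plot.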

[Figure: trump-tweets]
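The plot shows the morning pattern visually. A rough numeric complement, still correlational rather than causal, is to compare the share of tweets sent between 6:00 and 9:00 Eastern with the 3/24 (12.5%) expected if tweets were spread evenly across the day. A sketch; share_in_window() is a hypothetical helper and the times are made up:

```r
# Hypothetical helper: share of clock times falling in [start_hour, end_hour)
share_in_window <- function(time, start_hour = 6, end_hour = 9) {
  lt <- as.POSIXlt(time, tz = "UTC")  # clock times are stored labelled as UTC
  hour <- lt$hour + lt$min / 60 + lt$sec / 3600
  mean(hour >= start_hour & hour < end_hour)
}

# Made-up clock times on a shared date, shaped like tt.df$time
times <- as.POSIXct(
  c("2020-01-01 06:30:00", "2020-01-01 08:59:59",
    "2020-01-01 09:00:00", "2020-01-01 23:00:00"),
  tz = "UTC")

share_in_window(times)  # 0.5: two of the four times fall in the window
3 / 24                  # 0.125: the share a uniform spread would give
```

Applied to the real data, it would be called as share_in_window(tt.df$time), since tt.df$time holds Eastern clock times labelled as UTC.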