Publications by Ronan Harrington
Films with MPA ratings on Netflix
Setup
Loading the R libraries and data set.
# Loading libraries
library(tidyverse)
library(tidytext)
library(tidytuesdayR)
library(forcats)
# Loading data set
tt <- tt_load("2021-04-20")
Downloading file 1 of 1: `netflix_titles.csv`
Wrangling data for visualisation.
# Selecting the films in the data set, creating a numeric variable for fil...
Predicting voluntary CEO departures using machine learning
Summary
In this post, a machine learning model is created using this week’s #TidyTuesday data set: CEO Departures. This data set contains descriptions of thousands of CEO departures and reasons for these departures. After filtering missing values from the data set and simplifying the departure reasons to “Voluntary” and “Involuntary”, a...
Adjusting variable distribution and exploring data using mass linear regression
Introduction
In this post, the BEA Infrastructure Investment data set from the #TidyTuesday project is used to illustrate variable transformation and the exploreR::masslm() function. The variable for gross infrastructure investment adjusted for inflation is transformed to make it less skewed. Using these transformed investment values, multiple li...
Text mining Star Trek dialogue and classifying characters using machine learning
Introduction
In this article, the Star Trek voice commands data set from the #TidyTuesday project is used to investigate character diction using text mining, and train a machine learning model to distinguish between people and computers in the data set. The techniques used in this article are taken from the following textbooks, both of which are ...
Plotting Bee Colony Observations and Distributions using {ggbeeswarm} and {geomtextpath}
Setup
Loading the R libraries and data set.
# Loading libraries
library(geomtextpath) # For adding text to ggplot2 curves
library(tidytuesdayR) # For loading data set
library(ggbeeswarm) # For creating a beeswarm plot
library(tidyverse) # For the ggplot2, dplyr libraries
library(gganimate) # For plot animation
library(ggthemes) # For more ggplot2...
Text Mining Chocolate Bar Characteristics with {tidytext}
Introduction
In this post, memorable characteristics of chocolate bars are plotted. These characteristics relate to anything about the bars, e.g. texture, flavour, overall opinion. This data set includes the country of cocoa bean origin for each chocolate bar, including “blend” for bars with multiple beans. To create these plots, the data se...