Publications by Ronan Harrington

Films with MPA ratings on Netflix

20.04.2021

Setup
Loading the R libraries and data set.
# Loading libraries
library(tidyverse)
library(tidytext)
library(tidytuesdayR)
library(forcats)
# Loading data set
tt <- tt_load("2021-04-20")
Downloading file 1 of 1: `netflix_titles.csv`
Wrangling data for visualisation.
# Selecting the films in the data set, creating a numeric variable for fil...


Predicting voluntary CEO departures using machine learning

26.04.2021

Summary
In this post, a machine learning model is created using this week’s #TidyTuesday data set: CEO Departures. This data set describes thousands of CEO departures and the reasons for them. After filtering missing values from the data set and simplifying the departure reasons to “Voluntary” and “Involuntary”, a...


Adjusting variable distribution and exploring data using mass linear regression

14.08.2021

Introduction
In this post, the BEA Infrastructure Investment data set from the #TidyTuesday project is used to illustrate variable transformation and the exploreR::masslm() function. The inflation-adjusted gross infrastructure investment variable is transformed to reduce its skew. Using these transformed investment values, multiple li...


Text mining Star Trek dialogue and classifying characters using machine learning

17.08.2021

Introduction
In this article, the Star Trek voice commands data set from the #TidyTuesday project is used to investigate character diction using text mining, and to train a machine learning model that distinguishes between people and computers in the data set. The techniques used in this article are taken from the following textbooks, both of which are ...


Plotting Bee Colony Observations and Distributions using {ggbeeswarm} and {geomtextpath}

22.01.2022

Setup
Loading the R libraries and data set.
# Loading libraries
library(geomtextpath) # For adding text to ggplot2 curves
library(tidytuesdayR) # For loading data set
library(ggbeeswarm) # For creating a beeswarm plot
library(tidyverse) # For the ggplot2, dplyr libraries
library(gganimate) # For plot animation
library(ggthemes) # For more ggplot2...


Text Mining Chocolate Bar Characteristics with {tidytext}

25.01.2022

Introduction
In this post, memorable characteristics of chocolate bars are plotted. These characteristics can describe any aspect of a bar, e.g. texture, flavour or overall opinion. The data set includes the country of cocoa bean origin for each chocolate bar, with “blend” recorded for bars made from multiple beans. To create these plots, the data se...
