Publications by Jacob Martin

DS 2870 - Module 5 - Maps with GGplot

16.01.2024

knitr::opts_chunk$set(echo = F, fig.align = "center")
## Load the libraries we will be using
pacman::p_load(gapminder, socviz, tidyverse, grid, ggthemes, usmap, maps, statebins, viridis, leaflet)
# Creating a vector for dem/rep colors
party_colors <- c("Democratic" = "#2E74C0", "Re...

3771 sym 10 img

DS 1870 - Module 2: Creating a Frequency Table to a Single Variable

03.01.2024

Setting up the R Markdown File
knitr::opts_chunk$set(echo = F)
# Start by loading the tidyverse, gt, and skimr packages
pacman::p_load(tidyverse, skimr, gt)
# Next, read in the Titanic data set from GitHub
titanic <- read.csv("https://raw.githubusercontent.com/Shammalamala/DS-1870-Data/main/titanic.csv")
Let’s check the data by using head() ...

2642 sym Python (1264 sym/8 pcs) 4 tbl

DS 2870 - Homework 8 Fall 2023 - Solutions

29.11.2023

knitr::opts_chunk$set(echo = TRUE, message = F, warning = F, fig.align = "center")
# load packages: typical - tidyverse and skimr
# Classification - class, caret, rpart, rpart.plot
pacman::p_load(tidyverse, skimr, class, caret, rpart, rpart.plot)
theme_set(them...

4472 sym Python (12408 sym/25 pcs) 3 img

DS 2870 - Homework 4 Key - Fall 2023

15.11.2023

Data Description The movies data set has 44010 rows about the amount of explicit content (drugs, language, sex, nudity, and violence) found in 1467 movies released since 1958. Each movie is represented by 30 rows (1 row = movie & tag_name type combo). The relevant variables in the data set are: imdb_id: The identifier used by IMDB to uniquely ...

4335 sym Python (7582 sym/13 pcs) 1 img

DS 2870 - Homework 7 Solutions - Fall 2023

14.11.2023

Data Description We’ll be using the AirBnB data set from New York City in 2019. It lists every apartment available in New York City in 2019. The relevant columns are: neighbourhood_group: Which of the 5 boroughs the apartment is located in room_type: The type of room being offered (“Entire home/apt”, “Private room”, “Shared room”)...

4732 sym Python (9126 sym/21 pcs) 1 img 1 tbl

DS 2870 - Homework 5 Solutions - Fall 2023

24.10.2023

Data Description The temps data contains the high (TMAX) and low (TMIN) temperatures for each day (DATE), recorded at the South Burlington Airport between December 1940 and September 2023. Snow (SNOW) and rain (PRCP) are also included in the data, but won’t be used (neither will TAVG, since it is missing for almost 80% of the days). You will be creatin...

3151 sym Python (5534 sym/8 pcs) 4 img

DS 2870: Module 6 - Creating County Level Maps

16.10.2023

Set Up Your Project and Load Libraries
knitr::opts_chunk$set(echo = F, fig.align = "center")
## Set the default size of figures (I only use it for knitting)
# knitr::opts_chunk$set(fig.width=8, fig.height=5)
## Load the libraries we will be using
pacman::p_load(gapminder, socviz, tidyverse, usmap, map...

2251 sym 3 img

DS 2870: Module 5: Adding text to a scatter plot

09.10.2023

Set Up Your Project and Load Libraries
knitr::opts_chunk$set(echo = F, fig.align = "center", warning = F, message = F, fig.height = 6, fig.width = 8)
## Set the default size of figures
# knitr::opts_chunk$set(fig.width=8, fig.heigh...

3157 sym 12 img 4 tbl

DS 2870: Module 5 - Dumbbell Plots

02.10.2023

What is a Dumbbell Plot? A dumbbell plot is used to demonstrate the difference or change of a case (person, place, thing, etc…) of a quantitative variable across a binary variable (categorical variable with 2 outcomes). Streaming service subscribers in 2021 vs 2022 Case: Streaming service Quantitative: Number of subscribers Binary: Year (202...

5246 sym 3 img 3 tbl

DS 2870 HW 3 Solutions - Fall 2023

02.10.2023

Data Description The movies data set has 5684 rows about the amount of explicit content (drugs, language, sex & nudity, and violence) found in 1421 movies released since 1985. Each movie is represented by 4 rows (1 row = movie & content type combo). The relevant variables in the data set are: imdb_id: The identifier used by IMDB to uniquely sp...

2682 sym 6 img