Publications by Jacob Martin

Spaghetti Plot

26.02.2024

knitr::opts_chunk$set(echo = TRUE)
# Loading packages
pacman::p_load(tidyverse, ggrepel)
# Loading the data in:
temps <- read.csv("bton weather2.csv")

Data cleaning

Before we create the graph, we need to do a little data cleaning:
Change the column names to lower case
We’ll use the clean_names() function in the janitor package
Remo...


DS 1870 - Homework 2.2 Solutions - Sp 2024

23.02.2024

Data Description: The lbj data set contains information about the 1703 games LeBron James has played in the NBA through the 2023/2024 season, including regular season and playoff games.

While there are 29 columns in the data set, we’ll be primarily interested in only a few of them:
game_type: The type of game being played:
“Reg Season”: ...


DS 1870 - Homework 3.1 Solutions - Spring 2024

23.02.2024

See the blackboard post for a description of the data.
If a question asks for any calculations (means, medians, tables, proportions, etc.) or graphs, make sure they appear in the knitted document.
The final document should not show any warnings.

Question 1: Skimming the data set

Skim the data set.
skim(bones)
Data summary Name bones Number of...


STAT 5230 - Chapter 15 - Agglomerative Hierarchical Clustering

22.02.2024

Step 1: Exploratory Analysis

Let’s start by looking at a correlation plot:
ggcorr(
  data = crime,
  low = "red3",
  high = "blue3",
  mid = "white",
  label = T,
  label_round = 2
)
Other than burglary and larceny, there aren’t a lot of strong correlations between pairs of crimes.
Next, let’s create a biplot of the first 2 PCs to se...


STAT 5230: Chapter 15 - DBSCAN

20.02.2024

Introduction

DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise, where noise refers to outliers that don’t belong to a true cluster. We’ll be using multishapes, a simulated data set from the factoextra package, since it has important features to demonstrate the advantage DBSCAN has over k-means clusterin...
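The comparison described in this excerpt can be sketched roughly as follows. This is a minimal illustration, not the original document's code: it assumes the factoextra and dbscan packages are installed, and the choices of 5 centers for k-means and eps = 0.15, minPts = 5 for DBSCAN are illustrative assumptions rather than values taken from the source.

```r
# Minimal sketch; assumes the factoextra and dbscan packages are installed.
library(factoextra)  # provides the multishapes data set

data("multishapes", package = "factoextra")
shapes <- multishapes[, c("x", "y")]  # keep only the two coordinates

# k-means assigns every point to a cluster, including outliers.
# centers = 5 is an assumption made for this sketch.
set.seed(123)  # arbitrary seed, only for reproducibility
km <- kmeans(shapes, centers = 5, nstart = 25)

# DBSCAN groups points by density; in the dbscan package, points
# labelled 0 are treated as noise. eps and minPts are illustrative.
db <- dbscan::dbscan(shapes, eps = 0.15, minPts = 5)

# Cross-tabulate the two sets of cluster labels.
table(kmeans = km$cluster, dbscan = db$cluster)
```

Plotting the points colored by db$cluster and by km$cluster side by side is one way to see how the two methods treat the non-convex shapes and the scattered outliers.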


Borel Simulations

16.02.2024

Simulations for the cards in Borel, using 100,000 simulations per card

Card 2: Roll all the dice. Will the sum of the results be at least 45?
Card 18: Keep rolling the d30 until you roll an even number. Will the sum of all rolls be greater than 25?
Card 24: Roll four d6. Will exactly half the rolls produce an even number?
Card 30: Roll a d6 and t...
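As a rough illustration of how such card simulations might be set up, here is a base R sketch for Cards 18 and 24, the two cards whose rules are fully stated above. It is not the original document's code. The variable names are hypothetical, and for Card 18 it assumes the final even roll counts toward the sum.

```r
# Minimal sketch: Monte Carlo estimates for two Borel cards (base R only).
set.seed(2024)    # arbitrary seed, only for reproducibility
n_sims <- 100000  # 100,000 simulations per card, as stated above

# Card 24: roll four d6; success if exactly two of the four rolls are even.
card24 <- replicate(n_sims, {
  rolls <- sample(1:6, size = 4, replace = TRUE)
  sum(rolls %% 2 == 0) == 2
})
p_card24 <- mean(card24)

# Card 18: roll a d30 until the first even result; success if the total
# (assumed here to include the final even roll) is greater than 25.
card18 <- replicate(n_sims, {
  total <- 0
  repeat {
    roll <- sample(1:30, size = 1)
    total <- total + roll
    if (roll %% 2 == 0) break
  }
  total > 25
})
p_card18 <- mean(card18)

# Exact probability for Card 24, for comparison with the estimate.
exact_card24 <- choose(4, 2) / 2^4

c(card24 = p_card24, card24_exact = exact_card24, card18 = p_card18)
```

Card 24 has a closed-form answer, choose(4, 2) / 2^4 = 0.375, so it doubles as a sanity check on the simulation setup before trusting the estimates for cards that are harder to work out by hand.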


DS 1870 - Module 2: Diamond Practice Solutions

13.02.2024

Setup
knitr::opts_chunk$set(echo = T, fig.align = "center")
# Load the tidyverse packages:
library(tidyverse)

The diamonds data
# We'll use the diamonds data frame, stored in ggplot2. Take a look at it:
data(diamonds)
tibble(diamonds)
## # A tibble: 53,940 × 10
## carat cut color clarity depth table price ...


STAT 5230 - k-means clustering - iris data

13.02.2024

Exploratory analysis

Any cluster analysis should start with visualizing the data.
If there are two variables, we can make a scatter plot.
If there are three or more variables, we make one or more biplots using the relevant PCs.
Since the iris data set has 4 numeric columns we’ll use to cluster the data, we’ll use PCA to see if t...
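The workflow outlined above (PCA for a first look, then k-means) could be sketched in base R along these lines. This is an illustration only, not the original document's code: standardizing the columns, using 3 clusters, and the nstart value are all assumptions made for the sketch.

```r
# Minimal sketch using only base R (stats and graphics) and the built-in iris data.
iris_num <- iris[, 1:4]  # the 4 numeric columns

# PCA on standardized columns (standardizing is an assumption of this sketch).
pca <- prcomp(iris_num, center = TRUE, scale. = TRUE)
summary(pca)$importance[, 1:2]  # variance explained by the first 2 PCs

# Biplot of the first 2 PCs to look for visible groupings.
biplot(pca, cex = 0.6)

# k-means on the standardized data; centers = 3 is an assumption.
set.seed(5230)  # arbitrary seed, only for reproducibility
km <- kmeans(scale(iris_num), centers = 3, nstart = 20)

# Compare the cluster labels with the known species.
table(cluster = km$cluster, species = iris$Species)
```

The species column is used only to check the result afterwards; it plays no part in forming the clusters.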


DS 2870 - Module 4 - by argument to form groups

12.02.2024

Using dplyr

Set Up Your Project and Load Libraries
knitr::opts_chunk$set(echo = F, fig.align = "center")
## Load the tidyverse package
pacman::p_load(tidyverse)
## Change the default theme to theme_bw()
theme_set(theme_bw())
## Read in the "us counties.csv" data set and save it as counties
counties <- read.csv(...


DS 2870 - Homework 2 Solutions - Spring 2024

12.02.2024

knitr::opts_chunk$set(echo = T, warning = F, message = F, fig.align = "center")
## Load the required package: tidyverse
library(tidyverse)
## Reading in the Dr Who data from github
drwho <- read.csv("https://raw.githubusercontent.com/Shammalamala/DS-2870-Data-Sets/main/d...
