Publications by Jacob Martin
DS 1870 - Module 3 Practice: Diamonds and Penguins
Setup knitr::opts_chunk$set(echo = F, fig.align = "center") # Load the tidyverse library(tidyverse) The diamonds data We’ll start by using the diamonds data frame, stored in ggplot2. Take a look at it: ## # A tibble: 53,940 × 10 ## carat cut color clarity depth table price x y z ## <dbl> <or...
STAT 5230: Homework 2 - Question 3 - Spring 2024
Question 3a) Covariance Matrix and Correlation Plot Calculate the covariance matrix and create a correlation plot. Comment on any characteristics that are important for PCA. # Covariance matrix: print("Stars Covariance Matrix:") ## [1] "Stars Covariance Matrix:" round(cov(stars), digits = 1) ## Ascension Declination Mag10 Mag_Earth Log_Dist ## Ascensio...
DS 2870 - Homework 4 Key - Spring 2024
Data Description The data set has information about 1538 skeletons kept in different locations across the world. There are 3 categorical variables: sex: The sex of the skeleton (“Male” = known male, “Female” = known female, “uMale” = probably male, “uFemale” = probably female) age: the age group of the skeleton (18-24, 25-29, 3...
DS 2870 - Module 5 - Adding Text to a Bar Chart
Setup knitr::opts_chunk$set(echo = TRUE) # Load your package when you want to use it: pacman::p_load(tidyverse, ggfittext) # Changing default theme to theme_test() theme_set(theme_test()) theme_update( plot.title = element_text(hjust = 0.5, size = 14) ) # Reading in the drives2 data set drives <- read.csv(...
DS 2870: Homework 3 Solutions - Spring 2024
Data Description: The lbj data set contains information about the 1703 games LeBron James has played in the NBA through the 2023/2024 season, including regular season and playoff games. While there are 29 columns in the data set, we’ll be primarily interested in only a few of them: game_type: The type of game being played: “Reg Season”: ...
Spaghetti Plot
knitr::opts_chunk$set(echo = TRUE) # Loading packages pacman::p_load(tidyverse, ggrepel) # Loading the data in: temps <- read.csv("bton weather2.csv") Data cleaning Before we create the graph, we need to do a little data cleaning: Change the column names to lower case. We’ll use the clean_names() function in the janitor package. Remo...
DS 1870 - Homework 2.2 Solutions - Sp 2024
Data Description: The lbj data set contains information about the 1703 games LeBron James has played in the NBA through the 2023/2024 season, including regular season and playoff games. While there are 29 columns in the data set, we’ll be primarily interested in only a few of them: game_type: The type of game being played: “Reg Season”: ...
DS 1870 - Homework 3.1 Solutions - Spring 2024
See the Blackboard post for a description of the data. If a question asks for any calculations (means, medians, tables, proportions, etc.) or graphs, make sure they appear in the knitted document. The final document should not show any warnings. Question 1: Skimming the data set Skim the data set. skim(bones) Data summary Name bones Number of...
STAT 5230 - Chapter 15 - Agglomerative Hierarchical Clustering
Step 1: Exploratory Analysis Let’s start by looking at a correlation plot: ggcorr( data = crime, low = "red3", high = "blue3", mid = "white", label = T, label_round = 2 ) Other than burglary and larceny, there aren’t a lot of strong correlations between pairs of crimes. Next, let’s create a biplot of the first 2 PCs to se...
STAT 5230: Chapter 15 - DBSCAN
Introduction DBSCAN is an acronym that stands for Density-Based Spatial Clustering of Applications with Noise, where noise refers to outliers that don’t belong to any true cluster. We’ll be using the simulated multishapes data from the factoextra package, since it has important features that demonstrate the advantage DBSCAN has over k-means clusterin...