Publications by Jacob Martin
STAT 5230: PCA - Part 2
Let’s look at the variances of each variable:
data.frame(variance = bones |> cov() |> diag()) |>
  # Changing the row name to a column in the data
  rownames_to_column(var = "bone") |>
  # Arranging from largest to smallest variance
  arrange(-variance)
##    bone variance
## 1 fem_l      993
## 2 fem_r 9...
4480 sym Python (8451 sym/25 pcs) 10 img
STAT 5230 - PCA part 1
What is Principal Component Analysis?
Principal Component Analysis (PCA) is a method that uses the correlation between the original variables (\(\textbf{y}\)) to create a smaller set (\(k\)) of uncorrelated variables (\(\textbf{z}\)). We use the first eigenvector \(\textbf{e}_1\) of the covariance matrix, \(\textbf{S}\), or correlation matrix, ...
3860 sym Python (9013 sym/15 pcs) 3 img
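The PCA description in the entry above can be sketched in a few lines of base R. This is a minimal illustration, not the course's own code: it assumes the built-in `iris` measurements as a stand-in for the original variables \(\textbf{y}\), and the names `S`, `eig`, `e1`, and `z` are chosen here for illustration.

```r
# Assumption: the four numeric iris columns stand in for the original variables y
y <- as.matrix(iris[, 1:4])

# Sample covariance matrix S
S <- cov(y)

# Eigen decomposition of S; for a symmetric matrix, eigen() returns
# the eigenvalues in decreasing order
eig <- eigen(S)
e1 <- eig$vectors[, 1]   # first eigenvector

# Center the data, then project onto the first k = 2 eigenvectors
y_centered <- scale(y, center = TRUE, scale = FALSE)
z <- y_centered %*% eig$vectors[, 1:2]

# The variance of the first component equals the largest eigenvalue,
# and the two components are uncorrelated
var(z[, 1])
eig$values[1]
cov(z)[1, 2]

# Cross-check against the built-in prcomp(), which also centers the data
pc <- prcomp(y)
pc$sdev[1]^2
```

Using the correlation matrix instead amounts to replacing `cov(y)` with `cor(y)` and standardizing the columns before projecting.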
DS 2870: Chapter 4 - MVN
Let’s clean the data by: dropping any non-numeric columns; removing any rows with a missing value; making the column names a little easier to use:
bones <- bone_measures |>
  # Picking only the numeric columns
  select(where(is.numeric)) |>
  # dropping any rows with a missing value
  drop_na() |>
  # using clean_names() in the janit...
3728 sym Python (11887 sym/38 pcs) 12 img
STAT 5230: Chapter 4 - Multivariate Normal
Let’s clean the data by: dropping any non-numeric columns; removing any rows with a missing value; making the column names a little easier to use:
bones <- bone_measures |>
  # Picking only the numeric columns
  select(where(is.numeric)) |>
  # dropping any rows with a missing value
  drop_na() |>
  # using clean_names() in the janit...
3720 sym Python (11893 sym/39 pcs) 12 img
STAT 5230: Chapter 4 - Multivariate Normal
Let’s clean the data by: dropping any non-numeric columns; removing any rows with a missing value; making the column names a little easier to use:
bones <- bone_measures |>
  # Picking only the numeric columns
  select(where(is.numeric)) |>
  # dropping any rows with a missing value
  drop_na() |>
  # using clean_names() in the janit...
3729 sym Python (13206 sym/47 pcs) 13 img
STAT 5230: Chapter 3 - Summarizing the Bones Data
Initial Examination of the data
There are a few ways to initially examine the data in R:
is(data) will tell you what type of object R is storing the data as
str(data) will report the type of variable for each column
head(data) will show the first 6 rows
skim (in the skimr package) shows the descriptive statistics for the data
# Determining what t...
2664 sym 2 img 5 tbl
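The inspection functions listed in the entry above can be tried on any data frame. The sketch below assumes the built-in `mtcars` data as a stand-in for the course data, and it calls `skimr::skim()` only if the skimr package happens to be installed.

```r
library(methods)  # provides is(); attached by default in most R sessions

# Assumption: mtcars stands in for the data set used in the course
data <- mtcars

# What type of object is R storing the data as?
obj_classes <- is(data)
print(obj_classes)

# Type of variable for each column
str(data)

# First 6 rows
first_rows <- head(data)
print(first_rows)

# Descriptive statistics, only if skimr is available
if (requireNamespace("skimr", quietly = TRUE)) {
  print(skimr::skim(data))
}
```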
DS 2870 - Module 5 - Maps with GGplot
knitr::opts_chunk$set(echo = F, fig.align = "center")
## Load the libraries we will be using
pacman::p_load(gapminder, socviz, tidyverse, grid, ggthemes, usmap, maps, statebins, viridis, leaflet)
# Creating a vector for dem/rep colors
party_colors <- c("Democratic" = "#2E74C0", "Re...
3771 sym 10 img
DS 1870 - Module 2: Creating a Frequency Table to a Single Variable
Setting up the R Markdown File
knitr::opts_chunk$set(echo = F)
# Start by loading the tidyverse, gt, and skimr package
pacman::p_load(tidyverse, skimr, gt)
# Next, read in the Titanic Data set from github
titanic <- read.csv("https://raw.githubusercontent.com/Shammalamala/DS-1870-Data/main/titanic.csv")
Let’s check the data by using head() ...
2642 sym Python (1264 sym/8 pcs) 4 tbl
DS 2870 - Homework 8 Fall 2023 - Solutions
knitr::opts_chunk$set(echo = TRUE, message = F, warning = F, fig.align = "center")
# load packages: typical - tidyverse and skimr
# Classification - class, caret, rpart, rpart.plot
pacman::p_load(tidyverse, skimr, class, caret, rpart, rpart.plot)
theme_set(them...
4472 sym Python (12408 sym/25 pcs) 3 img
DS 2870 - Homework 4 Key - Fall 2023
Data Description The movies data set has 44010 rows about the amount of explicit content (drugs, language, sex, nudity, and violence) found in 1467 movies released since 1958. Each movie is represented by 30 rows (1 row = movie & tag_name type combo). The relevant variables in the data set are: imdb_id: The identifier used by IMDB to uniquely ...
4335 sym Python (7582 sym/13 pcs) 1 img