Publications by Jacob Martin
Clustering US Counties with Education and Economic Features
Loading the data and initial cleaning Cleaning the education data education <- read_xlsx("Education.xlsx", skip = 3) |> # Making the names R friendly janitor::clean_names() |> # Connecticut has missing values for 2022, we'll use the next newest year mutate( # # rucc # x2023_rurual_urban_continuum_code = if_else( # ...
DS 2870: Homework 8 - Fall 2024 - key
Data Description: The used cars.csv file has information about 1000 randomly sampled used sedans (4 door cars) in 2021. The variables are: manufactor: The company that makes the car; model: The model of the car; price: The sale price of the used car (our response variable); year: The year the car was manufactured; age: The age of the car when i...
Does Receiving the Second Half Kickoff Have an Advantage in the NFL?
Does the second half kickoff have an impact on who wins an NFL game? We’ll be looking at the probability that the team receiving the kickoff after halftime wins the game, accounting for who is winning at the half. The data are the results of the last 10 NFL seasons, collected from the nflfastR package with the load_pbp() function. pbp <- ...
Is it easier to kick field goals in indoor stadiums in the NFL?
Introduction: In the NFL, teams can score points in three ways: Safety: two points, occurs on less than 1% of possessions; Field Goal: three points, occurs on about 40% of teams’ offensive possessions; Touchdown: six points, occurs on about 20% of teams’ offensive possessions. A field goal attempt occurs when a team attempts to kick the ball through go...
Is there an advantage kicking field goals in domes vs stadiums in the NFL?
Introduction: In the NFL, teams can score points in three ways: Safety: two points, occurs on less than 1% of possessions; Field Goal: three points, occurs on about 40% of teams’ offensive possessions; Touchdown: six points, occurs on about 20% of teams’ offensive possessions. A field goal attempt occurs when a team attempts to kick the ball through go...
DS 2870 - Homework 7 - Solutions
Data Description: The used cars.csv file has information about 1000 randomly sampled used sedans (4 door cars) in 2021. The variables are: manufactor: The company that makes the car; model: The model of the car; price: The sale price of the used car (our response variable); year: The year the car was manufactured; age: The age of the car when i...
DS 2870: Homework 6 Solutions - Fall 2024
Set up: Logistic regression. For our example of objective functions, we looked at simple linear regression since it is a very common machine learning method. Another common machine learning method is logistic regression, which attempts to estimate the probability of success of a binary (categorical with two outcomes) variable. While we won’t be ...
DS 2870: Homework 5 - Fall 2024 - Solutions
Reading in the data: The code chunk below will create two data sets. strava_full: A data set on 149 recorded bike activities with two columns: date, the date of the activity (a date can appear multiple times if there were multiple activities on the same day), and distance, the total distance of the trip in kilometers (km). by_day: A data set with one...
DS 2870: Gradient Descent
Minimizing a second-degree polynomial: We’ll start by minimizing a simple function: \[f(x) = (x - 2)^2\] Since the whole term is squared, the minimum value of \(f(x)\) is 0, which occurs when \(x = 2\). We can also graph it. But what can we do if we can’t graph the function, or if the answer can’t be calculated directly? We can use gradient de...
DS 2870: Module 4 Homework Key - Fall 2024
Data Description: The movies data set has 44010 rows describing the amount of explicit content (drugs, language, sex, nudity, and violence) found in 1467 movies released since 1958. Each movie is represented by 30 rows (1 row = movie & tag_name type combo). The relevant variables in the data set are: imdb_id: The identifier used by IMDB to uniquely ...