Publications by Jacob Martin

Clustering US Counties with Education and Economic Features

02.01.2025

Loading the data and initial cleaning Cleaning the education data education <- read_xlsx("Education.xlsx", skip = 3) |> # Making the names R friendly janitor::clean_names() |> # Connecticut has missing values for 2022, we'll use the next newest year mutate( # # rucc # x2023_rurual_urban_continuum_code = if_else( # ...

17913 sym Python (26420 sym/44 pcs) 22 img 10 tbl

DS 2870: Homework 8 - Fall 2024 - key

05.12.2024

Data Description: The used cars.csv file has information about 1000 randomly sampled used sedans (4-door cars) in 2021. The variables are: manufactor: The company that makes the car; model: The model of the car; price: The sale price of the used car (our response variable); year: The year the car was manufactured; age: The age of the car when i...

3995 sym Python (5489 sym/13 pcs) 3 img

Does Receiving the Second Half Kickoff Have an Advantage in the NFL?

22.11.2024

Does the second half kickoff have an impact on who wins an NFL game? We’ll look at the probability that the team receiving the kickoff after halftime wins the game, accounting for who is winning at the half. The data covers the last 10 NFL seasons, collected with the load_pbp() function from the nflfastR package. pbp <- ...

7586 sym Python (12378 sym/15 pcs) 6 img 4 tbl

Is it easier to kick field goals in indoor stadiums in the NFL?

21.11.2024

Introduction. In the NFL, teams can score points in three ways: Safety: two points, occurs on less than 1% of possessions; Field Goal: three points, occurs on about 40% of teams’ offensive possessions; Touchdown: six points, occurs on about 20% of teams’ offensive possessions. A field goal attempt occurs when a team attempts to kick the ball through go...

13180 sym R (26504 sym/27 pcs) 8 img 7 tbl

Is there an advantage kicking field goals in domes vs stadiums in the NFL?

19.11.2024

Introduction. In the NFL, teams can score points in three ways: Safety: two points, occurs on less than 1% of possessions; Field Goal: three points, occurs on about 40% of teams’ offensive possessions; Touchdown: six points, occurs on about 20% of teams’ offensive possessions. A field goal attempt occurs when a team attempts to kick the ball through go...

11843 sym 8 img 7 tbl

DS 2870 - Homework 7 - Solutions

18.11.2024

Data Description: The used cars.csv file has information about 1000 randomly sampled used sedans (4-door cars) in 2021. The variables are: manufactor: The company that makes the car; model: The model of the car; price: The sale price of the used car (our response variable); year: The year the car was manufactured; age: The age of the car when i...

6069 sym 4 img 1 tbl

DS 2870: Homework 6 Solutions - Fall 2024

04.11.2024

Set up. Logistic regression: For our example of objective functions, we looked at simple linear regression since it is a very common machine learning method. Another common machine learning method is logistic regression, which attempts to estimate the probability of success of a binary (categorical with two outcomes) variable. While we won’t be ...
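As a rough illustration of the idea in this preview (not the homework's own code, which is not shown here), a logistic regression objective can be sketched as the negative log-likelihood of a binary outcome. The function names, the single-predictor form, and the tiny data set below are all assumptions made up for the example.

```python
import math

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neg_log_likelihood(b0, b1, xs, ys):
    """Objective function for logistic regression with one predictor.

    b0, b1: intercept and slope (hypothetical parameter names).
    xs, ys: predictor values and 0/1 outcomes.
    Smaller values mean the predicted probabilities fit the outcomes better.
    """
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)  # estimated probability of success
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

# Made-up toy data: negative x values fail, positive x values succeed.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

loss_flat = neg_log_likelihood(0.0, 0.0, xs, ys)  # every probability is 0.5
loss_fit = neg_log_likelihood(0.0, 2.0, xs, ys)   # slope follows the data
print(loss_flat, loss_fit)
```

With the slope set to zero every observation gets probability 0.5, so the objective equals 4 times log(2); a positive slope that follows the data gives a smaller value, which is what an optimizer would move toward.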

5671 sym Python (3799 sym/18 pcs) 1 img

DS 2870: Homework 5 - Fall 2024 - Solutions

17.10.2024

Reading in the data. The code chunk below will create two data sets: strava_full: A data set of 149 recorded bike activities with two columns: date, the date of the activity (a date can appear multiple times if there were multiple activities on the same day), and distance, the total distance of the trip in kilometers (km). by_day: A data set with one...

1782 sym Python (3650 sym/5 pcs) 2 img

DS 2870: Gradient Descent

17.10.2024

Minimizing a second-degree polynomial. We’ll start by minimizing a simple function: \[f(x) = (x - 2)^2\] Since the whole term is squared, the minimum value of \(f(x)\) is 0, which occurs when \(x = 2\). We can also graph it. But what can we do if we can’t graph the function, or if the answer can’t be calculated directly? We can use gradient de...
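A minimal sketch of gradient descent on the function above, \(f(x) = (x - 2)^2\), whose derivative is \(2(x - 2)\). The starting point, learning rate, tolerance, and function names are assumptions chosen for illustration; the original post's code is not shown in this preview.

```python
def f(x):
    """The function to minimize."""
    return (x - 2) ** 2

def grad_f(x):
    """Derivative of f with respect to x."""
    return 2 * (x - 2)

def gradient_descent(grad, x0, learning_rate=0.1, tol=1e-8, max_iter=10_000):
    """Step against the gradient until the update is smaller than tol."""
    x = x0
    for _ in range(max_iter):
        step = learning_rate * grad(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Start far from the minimum (assumed starting value).
x_min = gradient_descent(grad_f, x0=10.0)
print(x_min, f(x_min))
```

With a learning rate of 0.1, each update shrinks the distance to \(x = 2\) by a factor of 0.8, so the iterates settle very close to the known minimum.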

7328 sym 10 img 1 tbl

DS 2870: Module 4 Homework Key - Fall 2024

07.10.2024

Data Description The movies data set has 44010 rows about the amount of explicit content (drugs, language, sex, nudity, and violence) found in 1467 movies released since 1958. Each movie is represented by 30 rows (1 row = movie & tag_name type combo). The relevant variables in the data set are: imdb_id: The identifier used by IMDB to uniquely ...

4345 sym Python (7454 sym/11 pcs) 1 img