Publications by Andrew Bowen
DATA 622: Homework 4
The data I’ll be using comes from the Radiation Exposure Monitoring System (REMS) query tool provided by the Department of Energy (DOE). It provides annual aggregations, across all DOE sites, of radiation exposure levels for different types of workers. Reducing radiological exposure is a goal for the Department of Energy, and being able to ident...
DATA 624: Homework 10
library(tidyverse) library(caret) library(arules) # Read in file from BB posting data <- read.transactions("https://learn-us-east-1-prod-fleet02-xythos.content.blackboardcdn.com/61aab133e7df2/15994286?X-Blackboard-S3-Bucket=learn-us-east-1-prod-fleet01-xythos&X-Blackboard-Expiration=1714154400000&X-Blackboard-Signature=EXNbsp64188aDjV5ghvpIntldSZyo...
DATA 622: Homework 3
Articles The linked articles use different means to the same end: predicting Covid-19 with machine learning algorithms. The Ahmad paper relies on decision trees to predict the presence of Covid-19, while the Guhathakurata paper relies on support vector machines (SVMs) to predict the presence of the disease. At their core, both papers undertake...
DATA 624: Homework 9
library(tidyverse) library(caret) library(party) library(randomForest) library(gbm) Exercise 8.1 First, let’s recreate the data as they do in K&J library(mlbench) set.seed(200) simulated <- mlbench.friedman1(200, sd = 1) simulated <- cbind(simulated$x, simulated$y) simulated <- as.data.frame(simulated) colnames(simulated)[ncol(simulated)] <- "y...
DATA 624: Homework 8
library(tidyverse) library(MASS) library(caret) library(mlbench) library(xgboost) library(GGally) library(e1071) library(corrplot) Exercise 7.2 (K&J) Let’s generate Friedman’s simulated data using the same code as the text. We’ll also create the feature distribution plot in the same manner as the text. set.seed(2...
DATA 622: Homework 2
For this assignment, our classification task is to predict whether the home team or the away team will win a given NFL game, based on each team’s Elo rating prior to the game along with other features included in the Kaggle dataset. We’ll be using the NFL ELO rating dataset we used in Homework 1. This dataset comes from Kaggle, and c...
DATA 624: Homework 7
library(tidyverse) library(AppliedPredictiveModeling) library(caret) library(pls) library(GGally) library(questionr) library(corrplot) Exercise 6.2 (K&J) First, let’s load the data data(permeability) Next, we’ll filter out any sparse predictors with the caret::nearZeroVar function filteredPredictors <- nearZeroVar(fingerprints) length(filter...
DATA 624: Project 1
Part A In this part, we’ll attempt to forecast how much cash is withdrawn from four different ATMs in May 2010. The dataset was provided to us and is available on my GitHub. Data Wrangling and Visualization Our DATE column appears to be the number of days since 1900, so we # Read in our ATM data atm <- read_excel("../data/ATM624Data.xlsx")...
DATA 624: Homework 6
Exercise 9.1 This set of correlograms shows white noise for different sets of random numbers. The dashed blue lines mark the level at which the autocorrelation (y-axis) becomes significantly different from 0. In these plots, no lag (x-axis) has a spike extending past the dashed blue lines, which means that For plot a, we do see a stronger auto-cor...
DATA 622: Homework 1
Introduction Sports gambling has become a lucrative industry in the United States since the Supreme Court’s 2018 ruling that allowed states to legalize it. While not a gambler myself, I am an avid sports fan. I was able to pull two datasets (one small and one large) related to National Football League game results in the Super Bowl Era (since 1967) and NBA in...