Publications by Mael Illien

data621hw3

26.10.2020

Data 621 Homework 3
Introduction
For this assignment, we were tasked with building a binary logistic regression model from a dataset containing information on crime in various neighborhoods of a major city. Given a vector of predictors, we seek to predict whether a neighborhood's crime rate is above the median.
data_train <- read.csv("https://raw...

10873 sym R (20489 sym/79 pcs) 20 img

data621finalproject

21.12.2020

Abstract
There are several ways to predict stock prices. From blind guessing to machine learning, many techniques have been tried in pursuit of decent results. Capturing the complexity of the market in a model is a daunting task, but it can be broken down into a number of approaches. Since predicting the exact percent increase of a stock the n...

23393 sym R (22134 sym/84 pcs) 13 img

data621blog3

16.12.2020

Poisson Regression
Sports generate vast amounts of data. Sports data analysis is useful not only for coaches trying to understand the relative strengths and weaknesses of teams, but also for avid bettors around the world looking for an edge. Given the popularity of fantasy sports and betting sites, regression might provide a valuable tool in predicting...

12354 sym R (5145 sym/11 pcs) 1 img

data621blog2

16.12.2020

Multiple Regression
A paramount concern in agriculture is maximizing crop production, and regression analysis can help address that problem in a number of ways. It can answer questions such as: What environmental and meteorological factors influence crop yield? Given meteorological and/or environmental/spati...

4117 sym R (778 sym/2 pcs) 4 img

data621blog1

16.12.2020

Simple Linear Regression
One of the most important aspects of the design of a construction project is the budget. When assembling a budget, cost estimates for the major trades are often provided by contractors based on the information available. Estimating these costs involves extracting quantities of materials from plans and determining how many ...

3459 sym R (3359 sym/9 pcs) 2 img 3 tbl

data621blog4

16.12.2020

Binary Logistic Regression
Marketing campaigns can be expensive, so knowing which customers to target is essential to running an efficient campaign. Given information about a client, can we predict whether that client will respond positively to the product being marketed? In the case presented, the target variable (the client’s response) ...

4343 sym R (8859 sym/6 pcs) 1 img 4 tbl

data621blog5

16.12.2020

Time Series Regression
Time series introduce a challenge: they violate a key assumption of most regression techniques, namely that the errors are independent. With time series, we deal with serially correlated errors, meaning that the error at one time point is correlated with the errors at previous time points. In this exercise, we work with a time series of the wat...

2861 sym R (2907 sym/16 pcs) 4 img

Poisson Regression Soccer Predictions

13.02.2021

Poisson Regression
Sports generate vast amounts of data. Sports data analysis is useful not only for coaches trying to understand the relative strengths and weaknesses of teams, but also for avid bettors around the world looking for an edge. Given the popularity of fantasy sports and betting sites, regression might provide a valuable tool in predicting...

4351 sym R (5145 sym/11 pcs) 1 img 6 tbl

data622hw1

17.02.2021

Logistic and Multinomial Regression
Setup
library(skimr)
library(tidyverse)
library(caret) # For classification report
library(pROC)  # For AUC calculation
library(nnet)  # For multinomial regression
Data Exploration
The penguin dataset is composed of 344 observations with 8 variables, 5 of which are numeric and 3 of which are qualitative. The data...

7419 sym R (12460 sym/37 pcs) 5 img 3 tbl

data622hw2

11.03.2021

LDA, QDA & Naive Bayes
Setup
library(skimr)
library(tidyverse)
library(caret)    # For featurePlot, classification report
library(MASS)     # For LDA, QDA
library(e1071)    # For Naive Bayes
library(corrplot) # For correlation matrix
library(klaR)     # For QDA partition plot
Data Exploration
The penguin dataset is composed of 344 observations with 8 variabl...

6641 sym R (7966 sym/20 pcs) 10 img 3 tbl