Publications by Gregg Maloy

Document

23.03.2025

Assignment Introduction In Machine Learning, Experimentation refers to the systematic process of designing, executing, and analyzing different configurations to identify the settings that perform best on a given task. Experimentation is learning by doing. It involves systematically changing parameters, evaluating results with metrics,...

24294 sym R (39783 sym/72 pcs) 16 img 4 tbl

Data 622 Assignment 1

22.02.2025

Assignment Introduction This assignment focuses on one of the most important aspects of data science, Exploratory Data Analysis (EDA). Many surveys show that data scientists spend 60-80% of their time on data preparation. EDA allows you to identify data gaps & data imbalances, improve data quality, create better features and gain a deep understand...

20902 sym R (17318 sym/37 pcs) 8 img 1 tbl

Data 621 Blog 5

30.11.2024

Introduction In this blog, I will demonstrate how to use the random forest model to predict the species of iris flowers in the popular ‘Iris’ dataset. Random forest is a machine learning algorithm that constructs multiple decision trees and combines their outputs to improve accuracy and reduce overfitting. Load Packages We will use the ra...

5760 sym R (3778 sym/12 pcs) 1 img

Blog_4

23.11.2024

Introduction In this blog, I will demonstrate how to use the hypergeometric distribution for hypothesis testing. The hypergeometric distribution is a probability distribution which describes the probability of k successes in n draws, without replacement, from a finite population of size N that contains K successes. Below, the probability mass function will provide the probability of ...

3803 sym Python (918 sym/11 pcs)
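
The probability mass function mentioned in the Blog_4 excerpt can be sketched with the Python standard library alone. This is a minimal illustration, not the blog's own code: the function name `hypergeom_pmf` and the card-deck numbers are assumptions chosen for the example.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k): k successes in n draws without replacement
    from a population of N items, K of which are successes."""
    # Outside the feasible range the probability is zero.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Hypothetical example: chance of exactly one ace in a 5-card hand
# drawn from a standard 52-card deck (4 aces).
p_one_ace = hypergeom_pmf(k=1, N=52, K=4, n=5)
print(round(p_one_ace, 4))

# The probabilities over all feasible k sum to 1.
total = sum(hypergeom_pmf(k, 52, 4, 5) for k in range(0, 6))
```

For a hypothesis test, the tail probability is the sum of `hypergeom_pmf` over all outcomes at least as extreme as the observed count.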

Data 621 Blog 3

02.11.2024

Introduction In this blog, I aim to explore when to use Zero-Inflated Poisson (ZIP) and Zero-Inflated Negative Binomial (ZINB) models. More specifically, I will outline the steps involved in determining which of these models provides the best fit for a given dataset. The data used in this blog comes from the AER package which includes a datase...

4694 sym R (5229 sym/10 pcs) 2 img 1 tbl

Comparison of Ridge and Lasso Regression in Predicting Diabetes Disease Progression

06.10.2024

Instructions This blog explores the differences between Lasso and Ridge regression in predicting diabetes disease progression. The dataset used for this analysis is commonly employed for practicing these regression techniques and is available in various packages, such as LARS. PART I: DOWNLOAD DATA & INSPECTION There are 442 rows/observations ...

4646 sym R (5109 sym/33 pcs) 2 img

Data 621 HW 2

04.10.2024

Instructions In this homework assignment, you will work through various classification metrics. You will be asked to create functions in R to carry out the various calculations. You will also investigate some functions in packages that will let you obtain the equivalent results. Finally, you will create graphical output that also can be used t...

5297 sym R (7503 sym/31 pcs) 2 img 2 tbl

Data 605 Final

19.05.2024

House Prices: Advanced Regression Techniques DATA 605 Final Gregg Maloy PART I: SETUP Pick one of the quantitative independent variables from the training data set (train.csv), and define that variable as X. Make sure this variable is skewed to the right! Pick the dependent variable and define it as Y. library(readr) library(ggplot2) library(...

10616 sym R (21752 sym/68 pcs) 6 img

Data 605 Assignment Week 15

10.05.2024

Question 1 Find the equation of the regression line for the given points. Round any final values to the nearest hundredth, if necessary. (5.6, 8.8), (6.3, 12.4), (7, 14.8), (7.7, 18.2), (8.4, 20.8). #place in df x <- c(5.6, 6.3, 7, 7.7, 8.4) y <- c(8.8, 12.4, 14.8, 18.2, 20.8) df <- data.frame(x, y) df #lm/coefficient/slope model ...

2987 sym 1 img
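
The regression line asked for in Question 1 of the Week 15 assignment can be checked with the closed-form least-squares formulas. The listing shows the author working in R; the sketch below is a Python stand-in rather than that code, and the helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# The five points given in the question.
x = [5.6, 6.3, 7, 7.7, 8.4]
y = [8.8, 12.4, 14.8, 18.2, 20.8]

slope, intercept = fit_line(x, y)
print(f"y = {intercept:.2f} + {slope:.2f}x")
```

Rounded to the nearest hundredth, this gives a slope of 4.26 and an intercept of -14.80.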

Data 605 Discussion Week 14

25.04.2024

In Exercise 24, write out the first 5 terms of the Binomial series with the given k-value. 24. k = 4 library(Ryacas0) x <- Sym("x") expression <- (1 + x)^4 expression_final <- Expand(expression) print(expression_final) ## yacas_expression(x^4 + 4 * x^3 + 6 * x^2 + 4 * x + 1) term 1: x^4 term 2: 4x^3 term 3: 6x^2 term 4: 4x term 5: 1...

171 sym R (177 sym/2 pcs)
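
The coefficients of the binomial series for (1 + x)^k can also be generated directly from the recurrence c(n+1) = c(n) * (k - n) / (n + 1), starting at c(0) = 1. The sketch below is an illustrative Python version, not the Ryacas0 code from the post; `binomial_series_coeffs` is a hypothetical name. It lists coefficients in ascending powers of x, so for k = 4 it yields the same five terms as the yacas output, in reverse order.

```python
def binomial_series_coeffs(k, terms=5):
    """Coefficients of x^0, x^1, ... in the binomial series of (1 + x)^k."""
    coeffs = []
    c = 1.0
    for n in range(terms):
        coeffs.append(c)
        # Next coefficient: multiply by (k - n) and divide by (n + 1).
        c = c * (k - n) / (n + 1)
    return coeffs


coeffs_k4 = binomial_series_coeffs(4)
print(coeffs_k4)  # [1.0, 4.0, 6.0, 4.0, 1.0]
```

The same function works for non-integer k, where the series does not terminate and the first five coefficients are only a truncation.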