Publications by Irene Jacob

Data606_Lab 9

29.11.2020

Grading the professor: Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching related charac...

11742 sym R (11102 sym/32 pcs) 11 img

DATA605 Final Project

23.05.2021

Problem 1. Using R, generate a random variable X that has 10,000 random uniform numbers from 1 to N, where N can be any number of your choosing greater than or equal to 6. Then generate a random variable Y that has 10,000 random normal numbers with a mean of $\mu=\sigma=(N+1)/2$.
set.seed(10)
N <- 25  # I am choosing N as 25
n <- 10000
X <- ...

5019 sym R (37105 sym/90 pcs) 10 img 4 tbl

DATA 621 Final Project

12.12.2021

set.seed(123)
mat_df <- data.frame(matrix(ncol = 3, nrow = 0), stringsAsFactors = FALSE)
por_df <- data.frame(matrix(ncol = 3, nrow = 0), stringsAsFactors = FALSE)
1. Data Exploration
mat <- read.csv("https://raw.githubusercontent.com/irene908/DATA621/main/student-mat.csv")
por <- read.csv("https://raw.githubusercontent.com/irene908/DATA621...

261 sym R (29067 sym/51 pcs) 10 img

DATA 621 Blog 5

10.12.2021

Blog 5 - One-Hot Encoding. Introduction: Categorical data refers to variables made up of label values, such as categories, that sometimes have a natural ordering. Some machine learning algorithms, such as decision trees, can work directly with categorical data depending on the implementation, but most require the variables to be ...

1492 sym R (662 sym/3 pcs)

DATA 621 Blog 4

13.11.2021

Blog 4: Forward Selection. In my previous blog post I discussed backward elimination. In this post I will discuss forward selection using the same train dataset from assignment 3. Forward selection typically begins with only an intercept. One tests the various variables that may be relevant, and the ‘best’ variable, where ...

927 sym R (946 sym/4 pcs)

DATA 621 Blog 3

13.11.2021

Blog 3: Backward Elimination. Backward stepwise regression, also known as backward elimination regression, is a stepwise regression approach that begins with a full (saturated) model and at each step eliminates variables from the regression model to find a reduced model that best explains the data. The stepwise approach is useful because ...

841 sym R (4943 sym/4 pcs)

DATA621 Assignment 1

04.11.2021

Assignment 1
1. Data Exploration
train <- read.csv("https://raw.githubusercontent.com/irene908/DATA621/main/moneyball-training-data.csv") %>% select(-INDEX)
test <- read.csv("https://raw.githubusercontent.com/irene908/DATA621/main/moneyball-evaluation-data.csv") %>% select(-INDEX)
dim(train)
## [1] 2276 16
summary(train)
## TARGET_WINS T...

617 sym R (21755 sym/46 pcs) 12 img

DATA 621 Blog 2

31.10.2021

Blog 2: Poisson Regression. Poisson regression is used to model count variables. Poisson regression is similar to regular multiple regression except that the dependent (Y) variable is an observed count that follows the Poisson distribution. Thus, the possible values of Y are the nonnegative integers: 0, 1, 2, 3, and so on. It is assumed that large...

1474 sym R (1242 sym/6 pcs) 1 img

DATA 621 Blog 1

31.10.2021

Blog 1: Model Comparisons. This blog describes a few model comparison packages using an example.
Data
blog1 <- read.csv("https://raw.githubusercontent.com/irene908/DATA621/main/Blog1.csv")
Creating a copy of the data to store the log of PrizeMoney
Logblog1 <- blog1
Logblog1$logPrizeMoney <- log(blog1$PrizeMoney)
Logblog1$PrizeMoney <- NULL
St...

983 sym R (3642 sym/11 pcs) 2 tbl

DATA608_Homework1

12.09.2021

Principles of Data Visualization and Introduction to ggplot2
I have provided you with data about the 5,000 fastest-growing companies in the US, as compiled by Inc. magazine. Let's read this in:
inc <- read.csv("https://raw.githubusercontent.com/charleyferrari/CUNY_DATA_608/master/module1/Data/inc5000_data.csv", header = TRUE)
And let's preview this...

1662 sym R (7068 sym/19 pcs) 4 img 4 tbl