Publications by Habib Khan
Data 621 - HW 1
Executive Summary: The team has developed a model using historical baseball data to predict a team's performance from its recorded statistics. While correlation does not equal causation, it is suggested that focusing on some of the variables, such as either single hits or hits of three or more bases to the exclusion of doubles, migh...
Data 624 - HW4
Exercise 3.1: The UC Irvine Machine Learning Repository contains a data set related to glass identification. The data consist of 214 glass samples, each labeled as one of seven class categories. There are nine predictors, including the refractive index and percentages of eight elements: Na, Mg, Al, Si, K, Ca, Ba, and Fe. The data can be accessed via: libr...
Data 621 - HW2
Description: In this assignment, we wrote R functions using only base R commands to calculate several classification metrics. We also verified the functions by checking our output against R package implementations. Lastly, we graphed the output of the classification model. Dataset: The data set was provided by the professor. Fir...
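As a rough illustration of what "classification metrics from base R commands" can look like, here is a minimal sketch. It assumes 0/1 vectors for the actual and predicted classes with 1 as the positive class; the function names and the toy vectors are hypothetical, not the ones used in the assignment.

```r
# Hypothetical base-R classification metrics; assumes 0/1 vectors, 1 = positive class.
confusion_counts <- function(actual, predicted) {
  c(tp = sum(actual == 1 & predicted == 1),
    tn = sum(actual == 0 & predicted == 0),
    fp = sum(actual == 0 & predicted == 1),
    fn = sum(actual == 1 & predicted == 0))
}

accuracy <- function(actual, predicted) {
  k <- confusion_counts(actual, predicted)
  unname((k["tp"] + k["tn"]) / sum(k))
}

precision <- function(actual, predicted) {
  k <- confusion_counts(actual, predicted)
  unname(k["tp"] / (k["tp"] + k["fp"]))
}

sensitivity <- function(actual, predicted) {   # also called recall
  k <- confusion_counts(actual, predicted)
  unname(k["tp"] / (k["tp"] + k["fn"]))
}

specificity <- function(actual, predicted) {
  k <- confusion_counts(actual, predicted)
  unname(k["tn"] / (k["tn"] + k["fp"]))
}

f1_score <- function(actual, predicted) {
  p <- precision(actual, predicted)
  r <- sensitivity(actual, predicted)
  2 * p * r / (p + r)                           # harmonic mean of precision and recall
}

# Toy data (made up for illustration): tp = 3, tn = 2, fp = 2, fn = 1
actual    <- c(1, 0, 1, 1, 0, 0, 1, 0)
predicted <- c(1, 0, 0, 1, 1, 1, 1, 0)
round(c(accuracy    = accuracy(actual, predicted),
        precision   = precision(actual, predicted),
        sensitivity = sensitivity(actual, predicted),
        specificity = specificity(actual, predicted),
        f1          = f1_score(actual, predicted)), 3)
# accuracy 0.625, precision 0.6, sensitivity 0.75, specificity 0.5, f1 0.667
```

The same toy vectors can then be passed to a package implementation to cross-check each value, which is the kind of verification the description mentions.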
Data 624 - HW5
Exercise 7.1: Consider the pigs series, the number of pigs slaughtered in Victoria each month.
summary(pigs)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 33873 79080 91662 90640 101493 120184
str(pigs)
## Time-Series [1:188] from 1980 to 1996: 76378 71947 33873 96428 105084 ...
pigs
## Jan Feb Mar Apr Ma...
Data 621 - HW3
Overview: In this homework assignment, you will explore, analyze and model a data set containing information on crime for various neighborhoods of a major city. Each record has a response variable indicating whether the crime rate is above the median crime rate (1) or not (0). Your objective is to build a binary logistic regression model o...
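A binary logistic regression of this kind is usually fit in R with `glm()` and the binomial family. The sketch below is only an outline under stated assumptions: the data frame `crime`, the response name `target`, and the predictors `x1` and `x2` are hypothetical placeholders filled with simulated values, not the assignment's data set.

```r
# Hypothetical stand-in for the crime data: target = 1 if above-median crime rate, else 0.
set.seed(7)
n <- 300
crime <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
crime$target <- rbinom(n, 1, plogis(-0.5 + 1.5 * crime$x1 - crime$x2))  # simulated response

# Logistic regression: models the log-odds of target = 1 as a linear function of predictors
fit <- glm(target ~ x1 + x2, family = binomial, data = crime)
summary(fit)$coefficients

# Predicted probabilities, classified with a 0.5 cutoff (an arbitrary choice here)
p_hat <- predict(fit, type = "response")
pred  <- as.integer(p_hat > 0.5)
mean(pred == crime$target)   # training accuracy
```

The cutoff of 0.5 is just a default for the sketch; the predicted classes could then be fed into classification metrics to compare candidate models.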
Data 624 - HW6
Exercise 8.1 (a): The major difference between these three figures is the sample size: 36, 360 and 1000 random numbers. The autocorrelations are largest with a sample size of 36 and become very small with 360 and 1000. Each autocorrelation is expected to be close to 0 for a white noise series. Also, there is no pattern, and the spikes are all within the r...
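The effect of sample size can be reproduced with simulated white noise. This sketch assumes standard normal noise and uses the usual approximate 95% bounds of ±1.96/√T for the sample autocorrelations; the seed and the 20 lags are arbitrary choices, not those behind the original figures.

```r
# Simulated white noise at the three sample sizes; compares sample ACFs with +/- 1.96 / sqrt(T).
set.seed(123)
sizes <- c(36, 360, 1000)

acf_summary <- sapply(sizes, function(n) {
  x <- rnorm(n)                                        # white noise, N(0, 1)
  r <- acf(x, lag.max = 20, plot = FALSE)$acf[-1]      # drop lag 0, which is always 1
  bound <- 1.96 / sqrt(n)                              # approximate 95% critical value
  c(n = n, bound = bound,
    max_abs_acf  = max(abs(r)),
    share_inside = mean(abs(r) < bound))               # fraction of lags within the bounds
})
round(t(acf_summary), 3)
```

Because the bounds shrink with √T, the same "close to 0" criterion becomes stricter for the longer series, which is why the spikes look smaller at 360 and 1000 while still falling inside the bounds.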
Data 624 - Project 1
Objective: In part A, I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars; other than that, it is straightforward. I am being somewhat ambiguous on purpose to make this have a little more business feeling. Explain ...
Data 621 - Blog 5
Ridge Regression Method: Ridge regression is a way of fitting a model when the number of predictors exceeds the number of observations or when the predictors are highly correlated with each other. Ridge regression penalizes large coefficients, shrinking the estimates of less useful predictors toward zero, and thus reduces overfitting. It uses the ridge estimator, a shrinkage estimator that shrinks the param...
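The shrinkage can be seen directly from the closed-form ridge estimator, β̂ = (XᵀX + λI)⁻¹Xᵀy. The sketch below is a minimal base-R version, assuming standardized predictors and a centered response so that no intercept is penalized; the simulated data, the nearly collinear pair `x1`/`x2`, and the λ values are illustrative assumptions.

```r
# Closed-form ridge on simulated data with two nearly collinear predictors.
set.seed(42)
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)      # nearly collinear with x1
x3 <- rnorm(n)
y  <- 3 * x1 + 2 * x3 + rnorm(n)

X  <- scale(cbind(x1, x2, x3))      # standardize predictors
yc <- y - mean(y)                   # center response, so no intercept term

# beta_hat = (X'X + lambda * I)^(-1) X'y
ridge_coef <- function(X, y, lambda) {
  p <- ncol(X)
  drop(solve(crossprod(X) + lambda * diag(p), crossprod(X, y)))
}

lambdas   <- c(0, 1, 10, 100)
coef_path <- sapply(lambdas, function(l) ridge_coef(X, yc, l))
colnames(coef_path) <- paste0("lambda=", lambdas)
round(coef_path, 3)                 # lambda = 0 reproduces least squares
```

With λ = 0 the result matches ordinary least squares; as λ grows, the coefficient vector shrinks toward zero, and the unstable split between the collinear `x1` and `x2` is evened out.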
Data 621 - Blog 3
A Use of ANOVA: In this blog post, I use ANOVA to test whether the means of multiple groups differ. ANOVA is useful when you need to compare averages across several groups, because it tests the differences among the group means statistically. It is a useful tool in many areas. I have been using it to check the difference on supply...
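A one-way ANOVA compares the variation between group means with the variation within groups. The sketch below assumes three simulated groups (the group labels and values are made up, not the data from the post) and computes the F statistic both with `aov()` and by hand.

```r
# One-way ANOVA on three simulated groups; H0: all group means are equal.
set.seed(1)
df <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 10)),
  value = c(rnorm(10, mean = 50, sd = 5),
            rnorm(10, mean = 55, sd = 5),
            rnorm(10, mean = 50, sd = 5))
)

fit <- aov(value ~ group, data = df)
summary(fit)                        # F statistic and p-value

# The same F statistic by hand: between-group vs within-group mean squares
k <- nlevels(df$group)
N <- nrow(df)
grand <- mean(df$value)
means <- tapply(df$value, df$group, mean)
sizes <- tapply(df$value, df$group, length)
ss_between <- sum(sizes * (means - grand)^2)
ss_within  <- sum((df$value - means[df$group])^2)   # factor codes index the group means
f_manual   <- (ss_between / (k - 1)) / (ss_within / (N - k))
f_manual
```

A small p-value suggests that at least one group mean differs; a post hoc test such as Tukey's HSD would be needed to say which one.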
Data 624 - HW 8
library(mlbench)
library(AppliedPredictiveModeling)
library(caret)
library(kableExtra)
library(tidyverse)
Exercise 7.2: Friedman (1991) introduced several benchmark datasets created by simulation. One of these simulations used the following non-linear equation to create data:
\[ y = 10\sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10x_4 + 5x_5 + N(0, \sigma^2) \]
where the x...
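The Friedman equation above can be simulated directly in base R. This is a sketch under assumptions: it draws ten predictors from U(0, 1), of which only x1 to x5 enter the equation (the rest act as pure noise), and uses σ = 1. The `mlbench.friedman1()` function from the mlbench package loaded above generates comparable data; the helper name `simulate_friedman1` is hypothetical.

```r
# Hypothetical helper simulating the Friedman (1991) equation; x6..x10 are noise predictors.
simulate_friedman1 <- function(n, sigma = 1) {
  x <- matrix(runif(n * 10), ncol = 10,
              dimnames = list(NULL, paste0("X", 1:10)))     # predictors ~ U(0, 1)
  y <- 10 * sin(pi * x[, 1] * x[, 2]) +
       20 * (x[, 3] - 0.5)^2 +
       10 * x[, 4] + 5 * x[, 5] +
       rnorm(n, sd = sigma)                                  # N(0, sigma^2) error
  data.frame(x, y = y)
}

set.seed(200)
train <- simulate_friedman1(200)
str(train)
```

Because only the first five predictors drive y, a model that handles the non-linear terms well should rank X1 to X5 as the important variables.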