Publications by Abdelmalek Hajjam / Monu Chacko
DATA 605 - Discussion 9
Discussion 9: The price of one share of stock in the Pilsdorff Beer Company (see Exercise 8.2.12) is given by \(Y_n\) on the \(n\)th day of the year. Finn observes that the differences \(X_n = Y_{n+1} - Y_n\) appear to be independent random variables with a common distribution having mean \(\mu = 0\) and variance \(\sigma^2 = 1/4\). If \(Y_1 = 100\), estimate the probability that Y...
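A minimal sketch of the setup, assuming only the stated independence, mean and variance (the specific day and threshold asked about are cut off in this preview): the price on day \(n\) is the starting price plus a telescoping sum of \(n-1\) independent differences, so the Central Limit Theorem gives an approximately normal distribution. Here \(y\) is a hypothetical threshold and \(\Phi\) the standard normal CDF.

\[ Y_n = Y_1 + \sum_{k=1}^{n-1} X_k,\qquad E[Y_n] = 100,\qquad \operatorname{Var}(Y_n) = (n-1)\sigma^2 = \frac{n-1}{4} \\ P(Y_n \ge y) \approx 1 - \Phi\!\left(\frac{y - 100}{\sqrt{(n-1)/4}}\right) \]

As an illustration only, for \(n = 101\) the variance is \(100/4 = 25\), so the standard deviation is 5.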
HW 4 DATA 621 – Business Analytics and Data Mining
In this homework assignment, you will explore, analyze and model a dataset containing approximately 8000 records, each representing a customer at an auto insurance company. Each record has two response variables. The first response variable, TARGET_FLAG, is a 1 or a 0. A “1” means that the person was in a car crash. A zero means that the person was ...
DATA 605 HW 12
The attached who.csv dataset contains real-world data from 2008. The variables included are as follows. Country: name of the country; LifeExp: average life expectancy for the country in years; InfantSurvival: proportion of those surviving to one year or more; Under5Survival: proportion of those surviving to five years or more; TBFree: proportion of the popu...
DATA 605 - Home Work 13
Question 1: Use integration by substitution to solve the integral below. \[ \int 4e^{-7x}\,dx \] \[ u = -7x,\quad du = -7\,dx \;\Rightarrow\; dx = -\frac{1}{7}\,du \\ \int 4e^{-7x}\,dx = 4\int e^{u}\left(-\frac{1}{7}\right)du \\ = -\frac{4}{7}\int e^{u}\,du \\ = -\frac{4}{7}e^{u} + C \\ = -\frac{4}{7}e^{-7x} + C \] Question 2: Biologists are treating a pond contaminated with bacteria. The level of conta...
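As a quick check on the substitution result for Question 1, differentiating the antiderivative (with \(C\) an arbitrary constant of integration) should return the original integrand:

\[ \frac{d}{dx}\left(-\frac{4}{7}e^{-7x} + C\right) = -\frac{4}{7}\cdot(-7)\,e^{-7x} = 4e^{-7x} \]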
DATA 605 Discussion 16
Exercises 12.5 - Question 9 (Page 728). Question: \[ z = 5x + 2y,\quad x = 2\cos t + 1,\quad y = \sin t - 3;\quad t = \pi/4 \] Solution: t <- pi/4; -10*sin(t) + 2*cos(t) ## [1] -5.656854 ...
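The R expression in the solution evaluates the chain-rule derivative \(dz/dt\). Assuming, as that solution suggests, that the exercise asks for \(dz/dt\) at \(t = \pi/4\), the steps written out are:

\[ \frac{dz}{dt} = \frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt} = 5(-2\sin t) + 2\cos t = -10\sin t + 2\cos t \\ \left.\frac{dz}{dt}\right|_{t=\pi/4} = -10\cdot\frac{\sqrt{2}}{2} + 2\cdot\frac{\sqrt{2}}{2} = -4\sqrt{2} \approx -5.656854 \]

This matches the numeric output of the R line.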
Data 622 HW 1
Question 1. Load data: data <- read.csv("data622hw1.csv", header = TRUE). Examine the data: data[] <- lapply(data, as.factor); head(data) ## X Y label ## 1 5 a BLUE ## 2 5 b BLACK ## 3 5 c BLUE ## 4 5 d BLACK ## 5 5 e BLACK ## 6 5 f BLACK; summary(data) ## X Y label ## 5 :6 a:6 BLACK:22 ## 19:6 b:6 BLUE :14 ...
Data 622 Finding the best Model
Part A. library(caret); library(pROC); library(tidyverse); library(kableExtra); library(ggplot2). STEP# 0: Pick any two classifiers from (SVM, Logistic, DecisionTree, NaiveBayes). Pick the heart or ecoli dataset. Heart is simpler, and ecoli compounds the problem because it is NOT a balanced dataset. We pick the heart data: data <- read.csv('https://raw.githubu...
Bagging and LOOCV
(A) Run Bagging (ipred). About ipred (Improved Predictors): improved predictive models by indirect classification and bagging for classification, regression and survival problems, as well as resampling-based estimators of prediction error. You can see details here: https://cran.r-project.org/web/packages/ipred/ipred.pdf Data Preparation: Here we are...
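For reference, a generic sketch of the two techniques in this title, using textbook definitions rather than anything confirmed about ipred's internals. Bagging fits the same learner on \(B\) bootstrap resamples and averages the fitted predictors \(\hat{f}^{*b}\) (for classification, a majority vote over the \(B\) predicted classes is typical). LOOCV holds out each of the \(n\) observations once, where \(\hat{f}^{(-i)}\) is fit without observation \(i\) and \(L\) is an assumed loss such as squared error or 0-1 loss:

\[ \hat{f}_{\text{bag}}(x) = \frac{1}{B}\sum_{b=1}^{B} \hat{f}^{*b}(x) \\ \text{CV}_{(n)} = \frac{1}{n}\sum_{i=1}^{n} L\big(y_i,\ \hat{f}^{(-i)}(x_i)\big) \]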
DATA 624 Homework 2 - Forecaster Toolbox
library(fpp); library(fpp2); library(ggplot2); library(knitr); library(kableExtra); library(readxl). Question 3.1: For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance: usnetelec, usgdp, mcopper, enplanements. funcCmpr <- function(data, ylabtext, title, bcttitle){ print(head(data)); print(summa...
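The Box-Cox family being tuned in Question 3.1, as defined in the Hyndman and Athanasopoulos forecasting textbook that fpp2 accompanies, maps a positive series \(y_t\) to \(w_t\) for a chosen parameter \(\lambda\):

\[ w_t = \begin{cases} \log(y_t) & \text{if } \lambda = 0 \\ (y_t^{\lambda} - 1)/\lambda & \text{otherwise} \end{cases} \]

The forecast package (loaded by fpp2) provides BoxCox.lambda() to choose \(\lambda\) automatically; whether funcCmpr relies on it is not visible in this preview.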
DATA 624 Homework 1 - Time Series
library(fpp); library(fpp2); library(ggplot2); library(kableExtra). Question 2.1: Use the help function to explore what the series gold, woolyrnq and gas represent. Use autoplot() to plot each of these in separate plots. What is the frequency of each series? Hint: apply the frequency() function. Use which.max() to spot the outlier in the gold ser...