Publications by Aranzazu Chaparro Molina
Dow Jones Case Study
DATA dowjones = read.table("dow_jones_index.data", sep = ",", header = TRUE) str(dowjones) ## 'data.frame': 750 obs. of 16 variables: ## $ quarter : int 1 1 1 1 1 1 1 1 1 1 ... ## $ stock : chr "AA" "AA" "AA" "AA" ... ## $ date : chr "1/7/2011" "1/14/2...
Data Visualizations in R - Homework 3
Before you begin, note that, in the header, the output format of this document is html_notebook. When you save this file, it automatically creates another file with the same file name but with .nb.html extension in the same directory. This is the file you will submit as your homework solution along with the .Rmd file. Warnings: Don’t delete t...
Bookbinders Marketing Case Study
Abha Jha, Aranzazu Chaparro, Mihreteab Teklehaimanot, Sashank Nalluri, Ujwala Sirigineedi Load Data book_train <- read_excel("BBBC-Train.xlsx") book_test <- read_excel("BBBC-Test.xlsx") book_train = book_train[,-1] book_test = book_test[,-1] str(book_train) ## tibble [1,600 x 11] (S3: tbl_df/tbl/data.frame) ## $ Choice : num [1:16...
Bank Marketing Case
Load the file bank <- read_delim("bank-additional.csv", delim = ";", escape_double = FALSE, trim_ws = TRUE) ## Rows: 4119 Columns: 21 ## -- Column specification -------------------------------------------------------- ## Delimiter: ";" ## chr (11): job, marital, education, default, housing, loan, contact, month, ...
Assignment 4
Exercise 3 We now review k-fold cross-validation. (a) Explain how k-fold cross-validation is implemented. It randomly divides the data set into k groups (folds) of approximately equal size. As in LOOCV, each fold is treated in turn as a validation (test) set, and the model is fit on the remaining k-1 folds. LOOCV and K-Fold Cross Val...
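The k-fold procedure described in the Assignment 4 preview can be sketched in base R as shown below. This is a minimal illustration, not the assignment's own code. It assumes the built-in mtcars data and a simple linear model mpg ~ wt as a stand-in, with k = 5. The cross-validation estimate is the average of the k fold-level test MSEs.

```r
# Minimal k-fold cross-validation sketch (base R only).
# Assumption: mtcars and the model mpg ~ wt are illustrative stand-ins.
set.seed(1)
k <- 5
n <- nrow(mtcars)

# Randomly assign each row to one of k folds of approximately equal size
folds <- sample(rep(1:k, length.out = n))

fold_mse <- numeric(k)
for (i in 1:k) {
  train <- mtcars[folds != i, ]   # fit on the remaining k - 1 folds
  test  <- mtcars[folds == i, ]   # hold out fold i as the validation set
  fit   <- lm(mpg ~ wt, data = train)
  pred  <- predict(fit, newdata = test)
  fold_mse[i] <- mean((test$mpg - pred)^2)
}

# k-fold CV estimate: the average of the k validation MSEs
cv_error <- mean(fold_mse)
cv_error
```

Setting k = n in the same loop turns it into LOOCV, which is the usual way to see that LOOCV is a special case of k-fold cross-validation.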
Assignment 3
Exercise 10 This question should be answered using the Weekly data set, which is part of the ISLR package. This data is similar in nature to the Smarket data from this chapter’s lab, except that it contains 1,089 weekly returns for 21 years, from the beginning of 1990 to the end of 2010. library(ISLR) ## Warning: package 'ISLR' was built unde...
Assignment 2
Exercise 2 Carefully explain the differences between the KNN classifier and KNN regression methods. The main difference between them is that the KNN classifier is used for discrete (categorical) response variables, whereas KNN regression is used for continuous response variables. Also the main goal of the KNN regres...
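The contrast drawn in the Assignment 2 preview can be shown with one neighbour search that is used in two ways. The classifier takes a majority vote over the neighbours' labels, and the regression averages their numeric responses. The helper knn_predict is hypothetical and written for this sketch only. It assumes the built-in iris data, Euclidean distance on two petal measurements, and k = 5.

```r
# Hypothetical helper (not from the original assignment): predict at one
# point x0 from its k nearest training points, using Euclidean distance.
knn_predict <- function(train_x, train_y, x0, k, type = c("class", "reg")) {
  type <- match.arg(type)
  # Distance from x0 to every training row
  d <- sqrt(rowSums(sweep(as.matrix(train_x), 2, x0)^2))
  nn <- order(d)[seq_len(k)]            # indices of the k nearest neighbours
  if (type == "class") {
    # KNN classifier: majority vote among the neighbours' (categorical) labels
    votes <- table(train_y[nn])
    names(votes)[which.max(votes)]
  } else {
    # KNN regression: average of the neighbours' (continuous) responses
    mean(train_y[nn])
  }
}

# Illustrative use on iris (assumption): same predictors, two response types
x  <- iris[, c("Petal.Length", "Petal.Width")]
x0 <- c(1.4, 0.2)
pred_class <- knn_predict(x, iris$Species, x0, k = 5, type = "class")
pred_reg   <- knn_predict(x, iris$Sepal.Length, x0, k = 5, type = "reg")
pred_class
pred_reg
```

Only the final step differs between the two methods. The classifier returns a label, and the regression returns a number on the scale of the response.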
Data Visualizations in R - Homework 2
Homework 2 is all about using ggplot2. You will use the tech_co_cstat.dta (or .zip) data set that you used previously in Homework 1. You are aware of its structure and the meanings of the variables. Recall that you explored it in Homework 1. Knowing your data set well before you start exploring it is absolutely essential for data science. # Both .zip...
Assignment 5
library(ISLR) ## Warning: package 'ISLR' was built under R version 4.1.2 library(MASS) library(caTools) ## Warning: package 'caTools' was built under R version 4.1.2 library(glmnet) ## Warning: package 'glmnet' was built under R version 4.1.3 ## Loading required package: Matrix ## Loaded glmnet 4.1-3 library(pcr) ## Warning: package 'pcr' was bu...
Assignment 08
library(ISLR) library(tidyverse) library(plotly) library(e1071) library(caret) Exercise 5 We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression ...
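The Assignment 08 preview says that logistic regression can produce a non-linear decision boundary. The sketch below illustrates this with base R glm. It assumes simulated data with a quadratic class boundary, x1^2 - x2^2 > 0 (the usual setup for this exercise, assumed here), and compares a model with only linear terms against one that adds squared and interaction terms. The data are perfectly separable in the expanded feature space, so glm emits convergence warnings for the non-linear fit. Those warnings are suppressed here.

```r
# Minimal sketch (assumed simulation, not the original assignment code):
# non-linear predictor transforms let logistic regression fit a curved boundary.
set.seed(1)
n  <- 500
x1 <- runif(n) - 0.5
x2 <- runif(n) - 0.5
y  <- as.integer(x1^2 - x2^2 > 0)   # true boundary is quadratic in x1, x2
dat <- data.frame(x1, x2, y)

# Linear terms only: the decision boundary is a straight line
fit_lin <- glm(y ~ x1 + x2, family = binomial, data = dat)

# Squared and interaction terms: the boundary can curve.
# suppressWarnings() hides the separation/non-convergence warnings.
fit_nl <- suppressWarnings(
  glm(y ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1 * x2),
      family = binomial, data = dat)
)

# Training accuracy with a 0.5 probability cut-off
acc <- function(fit) mean((predict(fit, type = "response") > 0.5) == dat$y)
acc_lin <- acc(fit_lin)
acc_nl  <- acc(fit_nl)
c(linear = acc_lin, nonlinear = acc_nl)
```

The model is still linear in its coefficients. The curvature comes entirely from the transformed predictors, which is the same idea a non-linear SVM kernel exploits implicitly.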