Publications by Mary Anna Kivenson + Elina v2 + Charls
knn
Loading and splitting the data. Let's use the iris dataset and split it into training and testing sets. library(class) library(caret) ## Loading required package: lattice ## Loading required package: ggplot2 library(knitr) set.seed(1234) ind <- sample(2, nrow(iris), replace=TRUE, prob=c(0.6, 0.3)) trainData <- iris[ind==1,] testData <- iris[ind=...
849 sym R (2698 sym/15 pcs) 1 img 1 tbl
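The split-then-classify idea in the knn entry above can be sketched as follows. This is a minimal illustration, not the authors' full code: the 0.7/0.3 sampling weights, k = 3, and the snake_case variable names are assumptions chosen for the example.

```r
library(class)  # provides knn(); ships with standard R installations

set.seed(1234)

# Assign each row to group 1 (train) or 2 (test).
# The 0.7/0.3 weights are an assumption for this sketch.
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
train_data <- iris[ind == 1, ]
test_data  <- iris[ind == 2, ]

# Classify test rows from the four numeric columns; k = 3 is an arbitrary choice.
pred <- knn(train = train_data[, 1:4],
            test  = test_data[, 1:4],
            cl    = train_data$Species,
            k     = 3)

# Share of test rows whose predicted species matches the true species.
accuracy <- mean(pred == test_data$Species)
print(table(predicted = pred, actual = test_data$Species))
```

Because sample() draws group labels row by row, the partition sizes vary with the seed; the weights only set the expected proportions.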
data621-assign2
Step 1. Download the classification output data set. df <- read.csv("https://raw.githubusercontent.com/che10vek/Data621/master/classification-output-data.csv") kable(head(df,10), booktabs = T) pregnant glucose diastolic skinfold insulin bmi pedigree age class scored.class scored.probability 7 124 70 33 215 25.5 0.161 37 0 0 0.3284523 2 122 76...
5998 sym R (6251 sym/42 pcs) 3 img 3 tbl
Handling Missing Values
Introduction One of the major challenges in preparing data for a machine learning model is handling missing values. Missing data often results from a poor data collection process or from data entry errors. It is critical to address missing values because many models cannot be fit when the dataset contains NULLs or NAs. There are ...
5419 sym R (10703 sym/25 pcs) 9 img 4 tbl
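As a rough illustration of the problem described in the Handling Missing Values entry above, the sketch below counts NAs per column and fills them with the column median. Median imputation and the built-in airquality data set are assumptions made for this example; the excerpt is truncated, so this is not necessarily the approach the post takes.

```r
# Built-in data set with missing values in Ozone and Solar.R
# (used here only as a stand-in example).
df <- airquality

# Number of missing values in each column.
na_counts <- colSums(is.na(df))
print(na_counts)

# Hypothetical helper: replace NAs in a numeric vector with its median.
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Apply the helper to numeric columns only and keep the original untouched.
num_cols <- vapply(df, is.numeric, logical(1))
df_imputed <- df
df_imputed[num_cols] <- lapply(df[num_cols], impute_median)
```

Median imputation is simple but shrinks variance; model-based approaches such as the mice package impute from the relationships between columns instead.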
Data621-assign1
Multiple Linear Regression library(MASS) library(caret) library(car) library(corrplot) library(knitr) library(mice) Load the dataset Load the data set that was curated after the preliminary exploratory analysis. Plotted a correlation matrix on the original data set df = read.csv('https://raw.githubusercontent.com/mkivenson/Business-Analytic...
4626 sym R (22966 sym/99 pcs) 12 img 3 tbl
Data621_assign5
Data Exploration Taking a look at a summary of the data, there seem to be many missing values in the ResidualSugar, Chlorides, FreeSulfurDioxide, TotalSulfurDioxide, pH, Sulphates, Alcohol, and STARS fields. The STARS and LabelAppeal columns are both ordinal variables and may need to be transformed into dummy variables. ## TARGET FixedAcidi...
3671 sym R (30314 sym/10 pcs) 4 img 1 tbl
Data624_wk1
#install.packages('fpp2') library(fpp2) ## Loading required package: ggplot2 ## Loading required package: forecast ## Registered S3 method overwritten by 'quantmod': ## method from ## as.zoo.data.frame zoo ## Loading required package: fma ## Loading required package: expsmooth Use the help function to explore what the series go...
3123 sym R (3969 sym/58 pcs) 28 img
Data624_wk6
7.1 ) Consider the pigs series — the number of pigs slaughtered in Victoria each month. a. Use the ses() function in R to find the optimal values of α and ℓ0 , and generate forecasts for the next four months. library(fpp2) ## Loading required package: ggplot2 ## Loading required package: forecast ## Registered S3 method overwritten by 'quan...
4137 sym R (8114 sym/73 pcs) 11 img
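For the simple exponential smoothing exercise in the Data624_wk6 entry above, the level recursion can be sketched in base R. The function name ses_levels, the toy series, and the fixed values of alpha and the initial level are all hypothetical; ses() in the forecast package estimates α and ℓ0 from the data rather than taking them as inputs.

```r
# Hypothetical helper: smoothed levels for a given alpha and initial level l0.
# Recursion: level[t] = alpha * y[t] + (1 - alpha) * level[t - 1]
ses_levels <- function(y, alpha, l0) {
  stopifnot(alpha >= 0, alpha <= 1, length(y) > 0)
  level <- numeric(length(y))
  prev <- l0
  for (t in seq_along(y)) {
    level[t] <- alpha * y[t] + (1 - alpha) * prev
    prev <- level[t]
  }
  level
}

# Toy series and parameters, chosen only to make the arithmetic easy to follow.
y  <- c(10, 12, 11, 13)
lv <- ses_levels(y, alpha = 0.5, l0 = 10)

# SES forecasts are flat: every horizon repeats the last smoothed level.
point_forecast <- rep(tail(lv, 1), 4)
```

A larger alpha makes the level track recent observations more closely, while a smaller alpha gives smoother levels that react more slowly.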
Data624_wk4
Data 624 - Week 3 Assignment 6.2 The plastics data set consists of the monthly sales (in thousands) of product A for a plastics manufacturer for five years. a. Plot the time series of sales of product A. Can you identify seasonal fluctuations and/or a trend-cycle? library(fpp2) ## Loading required package: ggplot2 ## Loading required package: f...
2011 sym R (2508 sym/16 pcs) 7 img
Data624_wk2
Exercise 3.1 For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. usnetelec usgdp mcopper enplanements Let's look at the usnetelec dataset, which records annual US net electricity generation, so the frequency is yearly. library(fpp2) ## Loading required package: ggplot2 ## Loading requir...
5442 sym R (4029 sym/44 pcs) 16 img
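The Box-Cox transformation used in the Data624_wk2 entry above can be written out by hand as a short sketch. The function name box_cox, the spread_ratio helper, and the use of the built-in AirPassengers series are assumptions for this example; the forecast package offers BoxCox() and BoxCox.lambda() for the transformation and for choosing lambda.

```r
# Hypothetical helper: Box-Cox transform for strictly positive data.
# lambda = 0 gives log(y); otherwise (y^lambda - 1) / lambda.
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Built-in monthly series whose seasonal swings grow with its level
# (a stand-in for the fpp2 series named in the exercise).
y <- as.numeric(AirPassengers)
half <- length(y) %/% 2

# Ratio of the second-half to first-half standard deviation:
# values near 1 suggest the spread is roughly constant over time.
spread_ratio <- function(x) sd(x[(half + 1):length(x)]) / sd(x[1:half])

raw_ratio <- spread_ratio(y)
log_ratio <- spread_ratio(box_cox(y, 0))
print(c(raw = raw_ratio, log = log_ratio))
```

Comparing the two ratios gives a quick, informal check of whether a given lambda evens out the variance before plotting the transformed series.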
Data624_wk5
3.1. The UC Irvine Machine Learning Repository contains a data set related to glass identification. The data consist of 214 glass samples labeled as one of seven class categories. There are nine predictors, including the refractive index and percentages of eight elements: Na, Mg, Al, Si, K, Ca, Ba, and Fe. #install.packages('mlbench') library(m...
3254 sym R (3460 sym/23 pcs) 10 img