Publications by Mary Anna Kivenson + Elina v2 + Charls
Loading and splitting the data. Let's use the iris dataset and split the data into training and testing sets. library(class) library(caret) ## Loading required package: lattice ## Loading required package: ggplot2 library(knitr) set.seed(1234) ind <- sample(2, nrow(iris), replace=TRUE, prob=c(0.6, 0.3)) trainData <- iris[ind==1,] testData <- iris[ind=...
Step 1. Download the classification output data set. df <- read.csv("") kable(head(df,10), booktabs = T) pregnant glucose diastolic skinfold insulin bmi pedigree age class scored.class scored.probability 7 124 70 33 215 25.5 0.161 37 0 0 0.3284523 2 122 76...
Handling Missing Values
Introduction One of the major challenges in preparing data for a machine learning model is handling missing values. Missing data typically results from a poor data collection process or from data entry errors. It is critical to address such missing values because many modeling functions cannot be fit with NULLs or NAs in the dataset. There are ...
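A minimal sketch of the idea in base R, assuming a small hypothetical data frame (`toy`, with illustrative column names that are not taken from the post) and median imputation as one simple strategy among many:

```r
# Hypothetical toy data; names and values are illustrative only.
toy <- data.frame(
  glucose = c(124, NA, 110, 98),
  bmi     = c(25.5, 31.2, NA, 22.0)
)

# Count the missing values in each column.
na_counts <- colSums(is.na(toy))

# Replace each NA with the median of the observed values in that column.
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}
toy_imputed <- as.data.frame(lapply(toy, impute_median))
```

Median imputation is used here only because it is short and easy to verify; model-based approaches such as those in the mice package may be more appropriate for real data.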
Multiple Linear Regression library(MASS) library(caret) library(car) library(corrplot) library(knitr) library(mice) Load the dataset Load the data set that was curated after the preliminary exploratory analysis. A correlation matrix was plotted on the original data set. df = read.csv('
Data Exploration A summary of the data shows many missing values in the ResidualSugar, Chlorides, FreeSulfurDioxide, TotalSulfurDioxide, pH, Sulphates, Alcohol, and STARS fields. The STARS and LabelAppeal columns are both ordinal variables and may need to be transformed into dummy variables. ## TARGET FixedAcidi...
#install.packages('fpp2') library(fpp2) ## Loading required package: ggplot2 ## Loading required package: forecast ## Registered S3 method overwritten by 'quantmod': ## method from ## zoo ## Loading required package: fma ## Loading required package: expsmooth Use the help function to explore what the series go...
7.1) Consider the pigs series, the number of pigs slaughtered in Victoria each month. a. Use the ses() function in R to find the optimal values of α and ℓ0, and generate forecasts for the next four months. library(fpp2) ## Loading required package: ggplot2 ## Loading required package: forecast ## Registered S3 method overwritten by 'quan...
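To make the roles of α and ℓ0 concrete, here is a rough base R sketch of the simple exponential smoothing recursion. The helper `ses_by_hand` and the toy series are hypothetical, and α and ℓ0 are fixed by hand here, whereas ses() estimates them from the data:

```r
# Simple exponential smoothing level update:
#   l_t = alpha * y_t + (1 - alpha) * l_{t-1}
# The h-step-ahead forecast is flat at the last level.
ses_by_hand <- function(y, alpha, l0, h = 4) {
  level <- l0
  for (obs in y) {
    level <- alpha * obs + (1 - alpha) * level
  }
  rep(level, h)
}

# Toy series (not the pigs data); alpha and l0 are chosen arbitrarily.
fc <- ses_by_hand(c(10, 12, 11, 13), alpha = 0.5, l0 = 10)
```

With these toy inputs the level moves 10, 11, 11, 12, so all four forecasts equal 12.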
Data 624 - Week 3 Assignment 6.2 The plastics data set consists of the monthly sales (in thousands) of product A for a plastics manufacturer for five years. a. Plot the time series of sales of product A. Can you identify seasonal fluctuations and/or a trend-cycle? library(fpp2) ## Loading required package: ggplot2 ## Loading required package: f...
Exercise 3.1 For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. usnetelec usgdp mcopper enplanements Let's look at the usnetelec dataset. This dataset contains the annual US net electricity generation, so the frequency is yearly. library(fpp2) ## Loading required package: ggplot2 ## Loading requir...
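As a sketch of what the transformation does, the Box-Cox formula can be written in a few lines of base R. The helper `box_cox` and the toy vector are hypothetical; in practice the forecast package's BoxCox.lambda() and BoxCox() would normally be used to choose λ and apply the transformation:

```r
# Box-Cox transformation:
#   log(y)                      if lambda == 0
#   (y^lambda - 1) / lambda     otherwise
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Toy positive-valued data, not one of the exercise series.
y <- c(1, 4, 9, 16)
bc_half <- box_cox(y, 0.5)  # square-root-like transformation
bc_log  <- box_cox(y, 0)    # natural log
bc_one  <- box_cox(y, 1)    # shifts the data by 1, shape unchanged
```

A λ near 1 leaves the shape of the series unchanged, while smaller values compress large observations more strongly, which is what stabilises a variance that grows with the level.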
3.1. The UC Irvine Machine Learning Repository contains a data set related to glass identification. The data consist of 214 glass samples labeled as one of seven class categories. There are nine predictors, including the refractive index and percentages of eight elements: Na, Mg, Al, Si, K, Ca, Ba, and Fe. #install.packages('mlbench') library(m...
