Publications by Priyank Goyal
PCA_Commuter_Case
Case: Commuters. Reading data: this is an SPSS file. library(foreign) raqData1 <- read.spss("COMMUTER data for Factor Analysis.sav", to.data.frame = TRUE) ## re-encoding from UTF-8 raqData2 <- raqData1[, -c(1, 2, 3, 4)] We need to convert the variables to numeric. First we need to determine the order. # Specify that they are ordinal variables with the given levels ra...
1986 sym R (25373 sym/39 pcs) 2 img
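The conversion step mentioned in the snippet above (fix the level order first, then convert to numeric) can be sketched in base R as follows. This is a minimal illustration, not the post's own code: the level labels and the `responses` vector are hypothetical stand-ins for the survey columns.

```r
# Hypothetical Likert labels, listed from lowest to highest.
level_order <- c("Strongly disagree", "Disagree", "Neutral",
                 "Agree", "Strongly agree")

# Hypothetical responses for one survey item.
responses <- c("Agree", "Neutral", "Strongly agree")

# Declare the variable as ordinal with the given level order...
responses_ord <- factor(responses, levels = level_order, ordered = TRUE)

# ...then take the integer codes, which follow that order.
responses_num <- as.numeric(responses_ord)
print(responses_num)
```

Setting `levels` explicitly matters here: without it, R orders the levels alphabetically, and the numeric codes would no longer reflect the agreement scale.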
Factor Analysis - Two Wheeler
Case: Twenty two-wheeler users were surveyed about their perceptions and the image attributes of the vehicles they owned. Ten questions were asked of each of them, all answered on a scale of 1 to 7 (1 = completely agree, 7 = completely disagree). I use a two-wheeler because it is affordable. It gives me a sense of freedom to own a two-wheeler. Low mainte...
3361 sym R (16049 sym/36 pcs) 3 img
PCA_Detailed_Working
Sample Size With a little help from a few lecturer friends I collected 2571 completed questionnaires (at this point it should become apparent that this example is fictitious). …In short, their study indicated that as communalities become lower the importance of sample size increases. With all communalities above .6, relatively small samples (le...
6148 sym R (34521 sym/32 pcs) 1 img
Principal Component Analysis-1
The PCA method is particularly useful when the variables within the data set are highly correlated. The main purposes of PCA are to: 1. identify hidden patterns in a data set; 2. reduce the dimensionality by removing noise and redundancy in the data; 3. identify correlated variables. library(FactoMineR) library(factoextra) ## Loading required package: ggplot...
7482 sym R (10638 sym/41 pcs) 12 img
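As a rough illustration of the dimensionality-reduction idea in the snippet above, here is a minimal sketch using base R's `prcomp()` rather than the FactoMineR and factoextra packages the post loads. The built-in `USArrests` data set is an assumption chosen only so the example is self-contained.

```r
# Built-in data set with four correlated numeric variables.
data(USArrests)

# Center and scale so every variable contributes on the same footing.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Proportion of total variance captured by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Scores of the observations on the first two components.
head(pca$x[, 1:2])
```

If the first one or two components carry most of the variance, the remaining components can often be dropped with little loss of information.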
LR Categorical Variables
Categorical variables are recoded into a set of separate binary variables. This recoding is called “dummy coding” and leads to the creation of a table called a contrast matrix. Load packages: library(tidyverse) ## ── Attaching packages ──────────────────────────────────...
4727 sym R (3258 sym/24 pcs)
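The dummy coding and contrast matrix described in the snippet above can be inspected directly in base R. The sketch below assumes a hypothetical three-level factor named `position`; it is not taken from the post's data.

```r
# Hypothetical three-level categorical variable.
position <- factor(c("Prof", "AsstProf", "AssocProf", "Prof"),
                   levels = c("AsstProf", "AssocProf", "Prof"))

# Contrast matrix: one row per level, one column per dummy variable.
# The first level ("AsstProf") is the reference and gets all zeros.
cm <- contrasts(position)
print(cm)

# Design matrix that a regression would use: intercept plus two dummies.
X <- model.matrix(~ position)
print(X)
```

A factor with k levels yields k - 1 dummy variables under R's default treatment contrasts, which is why three levels produce two columns here.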
Interaction Effects in MLR
The equation sales = b0 + b1*youtube + b2*facebook, for example, is known as an additive model. It assumes that the effect of one predictor on the outcome does not depend on the values of the other predictors. This assumption may not be true: e.g., spending money on Facebook ads might increase the effectiveness of YouTube advertising. In statistics, this is called an interaction effect. In that case the equ...
2082 sym R (2863 sym/26 pcs)
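To make the contrast between an additive model and one with an interaction term concrete, here is a minimal sketch on simulated data. The data, coefficients, and sample size are assumptions for illustration only; the standard interaction form adds a product term, sales = b0 + b1*youtube + b2*facebook + b3*(youtube*facebook).

```r
set.seed(1)
n <- 200

# Simulated advertising budgets (hypothetical values).
youtube  <- runif(n, 0, 300)
facebook <- runif(n, 0, 60)

# Simulated sales with a true positive interaction between the channels.
sales <- 3 + 0.02 * youtube + 0.03 * facebook +
  0.001 * youtube * facebook + rnorm(n, sd = 0.5)

# Additive model: each predictor's effect is assumed independent of the other.
fit_add <- lm(sales ~ youtube + facebook)

# Interaction model: `youtube * facebook` expands to both main effects
# plus the youtube:facebook product term.
fit_int <- lm(sales ~ youtube * facebook)

print(coef(fit_int))
```

Comparing the two fits, for example with `anova(fit_add, fit_int)`, is one common way to check whether the interaction term is worth keeping.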
Linear Regression- Lesson 1
A simple workflow to build a predictive regression model is as follows: randomly split your data into a training set (80%) and a test set (20%); build the regression model using the training set; make predictions. Loading required packages: library("tidyverse") ## ── Attaching packages ──────────────────...
4061 sym R (4753 sym/34 pcs) 1 img
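The three-step workflow in the snippet above (split, fit, predict) can be sketched in base R as follows. The built-in `mtcars` data set and the predictors `wt` and `hp` are assumptions made so the example runs on its own; the post itself loads tidyverse and may use different data and helpers.

```r
set.seed(123)
data(mtcars)

# 1. Randomly split the rows into a training set (80%) and a test set (20%).
n <- nrow(mtcars)
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# 2. Build the regression model using the training set only.
model <- lm(mpg ~ wt + hp, data = train)

# 3. Make predictions on the held-out test set and measure the error.
pred <- predict(model, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))
print(rmse)
```

Evaluating on rows the model never saw gives a more honest estimate of predictive error than reusing the training data.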
Random Forest: Earnings Manipulation
About the Earnings Manipulation Case: The objective of the study is to analyze 8 features of different companies in order to evaluate whether the companies have actually manipulated their earnings. This is a typical case of data imbalance. Read CSV: raw_data <- read.csv("fraud_data.csv", header = TRUE, na.strings = c("", " ", "NA"), sep = ",") ...
1397 sym R (8366 sym/48 pcs) 2 img
Probit Model
Loading libraries: library(aod) ## Warning: package 'aod' was built under R version 3.6.3 library(ggplot2) ## Registered S3 methods overwritten by 'ggplot2': ## method from ## [.quosures rlang ## c.quosures rlang ## print.quosures rlang Reading data: mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")...
5543 sym R (3842 sym/29 pcs) 1 img
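For readers who want to see the shape of a probit fit without downloading the data, here is a minimal sketch on simulated data using base R's `glm()`. The variable names and true coefficients are assumptions; the post works with the UCLA `binary.csv` data instead.

```r
set.seed(2024)
n <- 500

# Simulated predictor and binary outcome; the success probability follows
# a probit link: P(y = 1) = pnorm(-0.5 + 1.0 * x).
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = pnorm(-0.5 + 1.0 * x))

# Probit regression: binomial family with the probit link function.
fit <- glm(y ~ x, family = binomial(link = "probit"))
print(coef(fit))

# Predicted probability at x = 0, on the response scale.
p0 <- predict(fit, newdata = data.frame(x = 0), type = "response")
print(p0)
```

Coefficients in a probit model are on the z-score scale of the latent normal variable, so they are usually interpreted through predicted probabilities rather than directly.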
Two Sample Variance Test
Note that the F-test requires the two samples to be normally distributed. Compute the F-test in R: the R function var.test() can be used to compare two variances as follows: Method 1: var.test(values ~ groups, data, alternative = "two.sided") or Method 2: var.test(x, y, alternative = "two.sided") x, y: numeric vectors; alternative: the alternative hypo...
2066 sym R (2008 sym/22 pcs)
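The two calling conventions for `var.test()` listed in the snippet above can be tried on simulated data as in the sketch below. The samples, group labels, and the data frame name `dat` are assumptions for illustration.

```r
set.seed(42)

# Two simulated, normally distributed samples (hypothetical data).
x <- rnorm(30, mean = 10, sd = 2)
y <- rnorm(30, mean = 10, sd = 2)

# Method 2: pass the two numeric vectors directly.
res_xy <- var.test(x, y, alternative = "two.sided")

# Method 1: formula interface on a long-format data frame.
dat <- data.frame(values = c(x, y),
                  groups = factor(rep(c("A", "B"), each = 30)))
res_formula <- var.test(values ~ groups, data = dat,
                        alternative = "two.sided")

# The F statistic is the ratio of the two sample variances.
print(res_xy$statistic)
print(res_xy$p.value)
```

Both calls test the same null hypothesis (a variance ratio of 1) and return the same statistic; the formula interface is simply more convenient when the data are already stored in long format.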