Publications by V Patel

Blog5: Dummy variable

22.05.2020

## load library
if (!require("fastDummies")) install.packages("fastDummies")
if (!require("tidyverse")) install.packages("tidyverse") # Metapackage

Introduction
Dummy variables (or binary variables) are commonly used in statistical analyses and in simpler descriptive statistics. A dummy column is one which has a value of one when a categoric...

1088 sym R (1360 sym/5 pcs)
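As a rough illustration of the idea described in the dummy-variable excerpt above, the sketch below builds one 0/1 column per level of a categorical variable. It uses base R only rather than the fastDummies package the post loads, and the data frame and its column names are hypothetical.

```r
# Hypothetical toy data; the column names are illustrative only.
df <- data.frame(
  id = 1:4,
  colour = factor(c("red", "blue", "red", "green"))
)

# Create one dummy column per level: 1 when the row has that level, else 0.
for (lvl in levels(df$colour)) {
  df[[paste0("colour_", lvl)]] <- as.integer(df$colour == lvl)
}

print(df)
```

Each row has exactly one dummy column set to 1, because every observation belongs to exactly one level.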

Blog3

22.05.2020

## load library
if (!require("ggplot2")) install.packages("ggplot2")
if (!require("gridExtra")) install.packages("gridExtra")

Introduction
Consider the situation where the response variable is based on “yes”/“no” responses, such as whether a particular restaurant is recommended by being included in a prestigious guide. Ideally such responses follow a...

1247 sym R (1352 sym/6 pcs) 2 img

Blog4

21.05.2020

## load library
if (!require("ggplot2")) install.packages("ggplot2")
if (!require("gridExtra")) install.packages("gridExtra")
if (!require("plotROC")) install.packages("plotROC")

Introduction
The Receiver Operating Characteristic (ROC) curve is used to assess the accuracy of a continuous measurement for predicting a binary outcome. In medicin...

2660 sym R (746 sym/8 pcs) 4 img
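To make the ROC idea in the excerpt above concrete, here is a minimal base R sketch that sweeps a threshold over a continuous score and records the true and false positive rates. It does not use the plotROC package the post loads, and the scores and labels are made-up values, not data from the post.

```r
# Hypothetical scores and true labels (1 = positive, 0 = negative).
score <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
label <- c(1, 1, 0, 1, 0, 1, 0, 0)

# Sweep thresholds from high to low; predict positive when score >= threshold.
thresholds <- c(Inf, sort(unique(score), decreasing = TRUE))
tpr <- sapply(thresholds, function(t) sum(score >= t & label == 1) / sum(label == 1))
fpr <- sapply(thresholds, function(t) sum(score >= t & label == 0) / sum(label == 0))

# Area under the curve by the trapezoidal rule.
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

roc <- data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
print(roc)
print(auc)
```

Plotting `tpr` against `fpr` gives the ROC curve; an area close to 1 suggests the score separates the two classes well, while 0.5 corresponds to guessing.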

Blog2

21.05.2020

Introduction
Missing data can be a nontrivial problem when analysing a dataset, and accounting for it is usually not straightforward either. If the amount of missing data is very small relative to the size of the dataset, then leaving out the few samples with missing features may be the best strategy in order not to bias the analysis, how...

2875 sym R (1998 sym/12 pcs) 3 img
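The strategy mentioned in the missing-data excerpt above (checking how much is missing, then dropping the few incomplete samples) could be sketched in base R as follows. The data frame is a hypothetical toy example, and this shows only listwise deletion, not whatever further method the post goes on to use.

```r
# Hypothetical toy data with a few missing values.
df <- data.frame(
  age    = c(23, NA, 31, 45, 52),
  income = c(40, 52, NA, NA, 75),
  score  = c(1.2, 3.4, 2.2, 0.7, 1.9)
)

# Share of missing values in each column.
missing_share <- colMeans(is.na(df))
print(missing_share)

# Listwise deletion: keep only the rows with no missing values.
complete_rows <- df[complete.cases(df), ]
print(complete_rows)
```

Comparing `nrow(complete_rows)` with `nrow(df)` shows how much data the deletion costs, which is the quantity to weigh before choosing this approach.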

blog1

21.05.2020

Introduction
In statistics, linear regression is a linear approach to modeling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables). The case of one explanatory variable is called simple linear regression.

Extract the data and create the training and testing sample
Fo...

2990 sym R (3487 sym/12 pcs) 2 img
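The workflow named in the linear-regression excerpt above (split the data into training and testing samples, then fit a simple linear regression) might look like the sketch below. The simulated data, the 80/20 split, and the use of RMSE as the test metric are all assumptions for illustration, not details taken from the post.

```r
set.seed(1)

# Simulated data (an assumption): y depends linearly on x plus noise.
n <- 100
x <- runif(n, 0, 10)
y <- 2 + 3 * x + rnorm(n)
dat <- data.frame(x, y)

# Random 80/20 split into training and testing samples.
train_idx <- sample(seq_len(n), size = 0.8 * n)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Fit simple linear regression on the training sample only.
fit <- lm(y ~ x, data = train)

# Evaluate on the held-out testing sample.
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$y - pred)^2))

print(coef(fit))
print(rmse)
```

Evaluating on rows the model never saw gives a more honest estimate of predictive error than the residuals of the training fit.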

Covid-19 Data Dashboard

16.05.2020

Covid-19 Data Dashboard (plotly, highcharter, crosstalk, flexdashboard)
Worldwide Column
Confirmed Cases (last updated: 2020-05-14): 4,442,163
Active Cases (last updated: 2020-05-14): 2,551,852
Recovered Cases (last updated: 2020-05-14): 1,587,893
Deceased Cases (last updated: 2020-05-14): 302,418
US Colum...

2843 sym

Document

26.04.2020

Read Data
Here, we read the training dataset into a dataframe.
insurance_tf_train <- read.csv("https://raw.githubusercontent.com/charlsjoseph/Data621/master/Data621-Assignment4/insurance_tf_train.csv")[-1]
insurance_tf_test <- read.csv("https://raw.githubusercontent.com/charlsjoseph/Data621/master/Data621-Assignment4/insurance_tf_test.csv")[-1]...

404 sym R (16639 sym/19 pcs) 3 img

Final Proposal

24.03.2020

Objective
Create a visualization (dashboard) that shows updates on Coronavirus disease 2019 (COVID-19). The dashboard will include:
Overall
1. Number of cases by country
Number of cases over time
Cases by country and age
US vs. other countries
Recovery rate compared to other countries
Date case compared to other countries

Data Source(s)
DATA hub Dat...

2668 sym

Baseball Data - Data Exploration and Preparation

29.02.2020

HOMEWORK #1 Overview: In this homework assignment, you will explore, analyze and model a data set containing approximately 2200 records. Each record represents a professional baseball team from the years 1871 to 2006 inclusive. Each record has the performance of the team for the given year, with all of the statistics adjusted to match the perfor...

5694 sym R (10583 sym/25 pcs) 12 img 1 tbl

Data 608: HW1

10.02.2020

Principles of Data Visualization and Introduction to ggplot2
I have provided you with data about the 5,000 fastest growing companies in the US, as compiled by Inc. magazine. Let's read this in:
inc <- read.csv("https://raw.githubusercontent.com/charleyferrari/CUNY_DATA_608/master/module1/Data/inc5000_data.csv", header= TRUE)
And let's preview this...

1456 sym R (4059 sym/10 pcs) 3 img