Publications by V Patel
Blog5: Dummy variable
## load library if (!require("fastDummies")) install.packages("fastDummies") if (!require("tidyverse")) install.packages("tidyverse") # Metapackage Introduction Dummy variables (or binary variables) are commonly used in statistical analyses and in simpler descriptive statistics. A dummy column is one which has a value of one when a categoric...
1088 sym R (1360 sym/5 pcs)
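As a rough illustration of the dummy-variable idea in the excerpt above, the sketch below builds one 0/1 indicator column per category level. It uses base R only, not the fastDummies package the post loads, and the data frame `pets` and its column names are hypothetical.

```r
# Hypothetical toy data; the column and level names are illustrative only.
pets <- data.frame(animal = c("cat", "dog", "cat", "bird"),
                   stringsAsFactors = FALSE)

# Add one 0/1 indicator column per category level.
for (lvl in sort(unique(pets$animal))) {
  pets[[paste0("animal_", lvl)]] <- as.integer(pets$animal == lvl)
}

print(pets)
```

Each row has exactly one indicator set to 1, which is what makes the columns a dummy encoding of the original categorical variable.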
Blog3
## load library if (!require("ggplot2")) install.packages("ggplot2") if (!require("gridExtra")) install.packages("gridExtra") Introduction Consider the situation where the response variable is based on “yes”/“no” responses, such as whether a particular restaurant is recommended by being included in a prestigious guide. Ideally such responses follow a...
1247 sym R (1352 sym/6 pcs) 2 img
Blog4
## load library if (!require("ggplot2")) install.packages("ggplot2") if (!require("gridExtra")) install.packages("gridExtra") if (!require("plotROC")) install.packages("plotROC") Introduction The Receiver Operating Characteristic (ROC) curve is used to assess the accuracy of a continuous measurement for predicting a binary outcome. In medicin...
2660 sym R (746 sym/8 pcs) 4 img
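To make the ROC idea in the excerpt above concrete, here is a minimal base-R sketch that computes the true positive rate and false positive rate at each threshold of a continuous measurement. It does not use plotROC, and the vectors `marker` and `outcome` are made-up toy values, not data from the post.

```r
# Hypothetical toy data: a continuous marker and a 0/1 outcome.
marker  <- c(0.1, 0.4, 0.35, 0.8, 0.7, 0.2)
outcome <- c(0,   0,   1,    1,   1,   0)

# For each threshold, classify marker >= threshold as positive.
thresholds <- sort(unique(marker), decreasing = TRUE)
roc <- data.frame(
  threshold = thresholds,
  tpr = sapply(thresholds, function(t) mean(marker[outcome == 1] >= t)),
  fpr = sapply(thresholds, function(t) mean(marker[outcome == 0] >= t))
)

print(roc)
```

Plotting `fpr` against `tpr` from this table would give the points of the empirical ROC curve.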
Blog2
Introduction Missing data can be a non-trivial problem when analysing a dataset, and accounting for it is usually not straightforward either. If the amount of missing data is very small relative to the size of the dataset, then leaving out the few samples with missing features may be the best strategy in order not to bias the analysis, how...
2875 sym R (1998 sym/12 pcs) 3 img
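The "leave out the few samples with missing features" strategy mentioned in the excerpt above could be sketched in base R as follows. The data frame `df` and its columns are hypothetical; the post's own data and any imputation method it goes on to use are not shown in the excerpt.

```r
# Hypothetical toy data with a few missing values.
df <- data.frame(age    = c(23, NA, 31, 45),
                 income = c(50, 62, NA, 80))

# Share of missing values per column, to judge how much would be lost.
missing_share <- colMeans(is.na(df))

# Keep only rows with no missing values (listwise deletion).
complete <- df[complete.cases(df), ]

print(missing_share)
print(complete)
```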
blog1
Introduction In statistics, linear regression is a linear approach to modeling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables). The case of one explanatory variable is called simple linear regression. Extract the data and create the training and testing sample Fo...
2990 sym R (3487 sym/12 pcs) 2 img
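A minimal sketch of simple linear regression with a training and testing sample, in the spirit of the excerpt above. The data are synthetic and the 70/30 split, the seed, and all variable names are assumptions for illustration, not the post's actual setup.

```r
set.seed(1)  # arbitrary seed so the split is reproducible

# Hypothetical synthetic data: y depends linearly on x plus noise.
n <- 100
dat <- data.frame(x = runif(n, 0, 10))
dat$y <- 2 + 3 * dat$x + rnorm(n)

# Hold out 30 of the 100 rows for testing.
train_idx <- sample(seq_len(n), size = 70)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Fit on the training sample, then evaluate on the held-out rows.
fit  <- lm(y ~ x, data = train)
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$y - pred)^2))

print(coef(fit))
print(rmse)
```

Evaluating the error on rows the model never saw gives a less optimistic picture of fit quality than the training residuals alone.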
Covid-19 Data Dashboard
Covid-19 Data Dashboard (plotly, highcharter, crosstalk, flexdashboard) Worldwide Column Confirmed Cases (last updated: 2020-05-14): 4,442,163 Active Cases (last updated: 2020-05-14): 2,551,852 Recovered Cases (last updated: 2020-05-14): 1,587,893 Deceased Cases (last updated: 2020-05-14): 302,418 US Colum...
2843 sym
Document
Read Data Here, we read the training dataset into a dataframe. insurance_tf_train <- read.csv( "https://raw.githubusercontent.com/charlsjoseph/Data621/master/Data621-Assignment4/insurance_tf_train.csv")[-1] insurance_tf_test <- read.csv("https://raw.githubusercontent.com/charlsjoseph/Data621/master/Data621-Assignment4/insurance_tf_test.csv")[-1]...
404 sym R (16639 sym/19 pcs) 3 img
Final Proposal
Objective Create a visualization (dashboard) that shows updates on Coronavirus disease 2019 (COVID-19). The dashboard will include: Overall 1. Number of cases by country; number of cases over time; cases by country and age; US vs. other countries; recovery rate compared to other countries; date case compared to other countries. Data Source(s) DATA hub Dat...
2668 sym
Baseball Data - Data Exploration and Preparation
HOMEWORK #1 Overview: In this homework assignment, you will explore, analyze and model a data set containing approximately 2200 records. Each record represents a professional baseball team from the years 1871 to 2006 inclusive. Each record has the performance of the team for the given year, with all of the statistics adjusted to match the perfor...
5694 sym R (10583 sym/25 pcs) 12 img 1 tbl
Data 608: HW1
Principles of Data Visualization and Introduction to ggplot2 I have provided you with data about the 5,000 fastest growing companies in the US, as compiled by Inc. magazine. Let's read this in: inc <- read.csv("https://raw.githubusercontent.com/charleyferrari/CUNY_DATA_608/master/module1/Data/inc5000_data.csv", header = TRUE) And let's preview this...
1456 sym R (4059 sym/10 pcs) 3 img