Publications by Janish Parikh
Data Manipulation in R
Clear the global environment. Import required libraries: library(lubridate). Lubridate provides simple functions to get and set components of a date-time, such as year(), month(), mday(), hour(), minute() and s...
1599 sym R (94250 sym/60 pcs) 6 img
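The accessor functions named in the preview work in both directions: they can read a component of a date-time and, used on the left of an assignment, set it. Below is a minimal sketch. The date-time value is hypothetical and chosen only for illustration. It assumes the lubridate package is installed.

```r
library(lubridate)

# Hypothetical date-time, used only to demonstrate the accessors
t <- ymd_hms("2020-01-15 14:35:20", tz = "UTC")

# Getters: pull individual components out of the date-time
year(t)    # 2020
month(t)   # 1
mday(t)    # 15 (day of the month)
hour(t)    # 14
minute(t)  # 35
second(t)  # 20

# Setters: the same functions can modify a component in place
minute(t) <- 0
t          # now 2020-01-15 14:00:20 UTC
```

Using the same function name for reading and writing a component keeps date-time manipulation in a single, consistent vocabulary.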
Statistical inference with the GSS data
Setup. Load packages: library(ggplot2), library(dplyr), library(statsr). Load data: Since 1972, the General Social Survey (GSS) has been monitoring societal change and studying the growing complexity of American society. load("gss.Rdata"); gss <- tibble(gss); dim(gss) shows 57061 rows and 114 columns. Part 1: Data. Background: The General Social Survey (GSS) is a s...
10078 sym R (6600 sym/32 pcs) 6 img
Exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database - Health and Economic Impacts
Synopsis: Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventin...
4597 sym R (20849 sym/39 pcs) 2 img
What makes a movie popular?
Abstract: Our purpose in this exercise is to develop a multiple linear regression model that explains what makes movies popular, given the variables in a dataset that contains information from Rotten Tomatoes, a website that keeps track of all reviews for each film and aggregates the results, and the Internet Movie Database (IMDb), an online databas...
16366 sym R (9577 sym/30 pcs) 10 img
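For readers new to the technique, here is a minimal sketch of fitting a multiple linear regression with base R's lm(). It uses simulated data. The variable names critics_score, runtime and audience_score, as well as the true coefficients, are hypothetical stand-ins and not the dataset or the final model from the analysis.

```r
set.seed(1)
n <- 200

# Hypothetical predictors standing in for movie-level variables
critics_score <- runif(n, 0, 100)
runtime       <- rnorm(n, mean = 110, sd = 15)

# Simulated response with known coefficients plus Gaussian noise
audience_score <- 20 + 0.6 * critics_score + 0.05 * runtime + rnorm(n, sd = 5)

movies_sim <- data.frame(audience_score, critics_score, runtime)

# Multiple linear regression: one response, several predictors
fit <- lm(audience_score ~ critics_score + runtime, data = movies_sim)

summary(fit)$coefficients   # estimates, standard errors, t values, p-values
summary(fit)$adj.r.squared  # adjusted R-squared of the fit
```

Because the data are simulated with known coefficients, the estimate for critics_score should land close to 0.6. This gives a quick sanity check that the model is specified correctly before it is applied to real data.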
CS-605 Data Analytics Project
Abstract: In this project we developed a multiple linear regression model that explains the factors that make a movie popular and helps predict movie popularity. The data in the dataset were collected from the following sources: Rotten Tomatoes, a website that keeps track of all reviews for each film and aggregates the results, and In...
20073 sym R (32257 sym/78 pcs) 22 img
Prediction of trip-duration of NYC Citibike trips
Loading required libraries: library(dplyr), library(ggplot2), library(GGally), library(lubridate), library(caret), library(gbm), library(tidyverse). Abstract: The NYC “CitiBike” bicycle sharing scheme went live (in midtown and downtown Manhattan) in 2013, and has been expanding ever since, both as measured by daily ridership as well as...
7347 sym R (34402 sym/98 pcs) 17 img
CitiBike Data Analysis
Loading required libraries: library(dplyr), library(ggplot2), library(GGally), library(lubridate), library(caret), library(gbm), library(tidyverse). 1. Abstract: The NYC “CitiBike” bicycle sharing scheme went live (in midtown and downtown Manhattan) in 2013, and has been expanding ever since, both as measured by daily ridership as well ...
5972 sym R (13739 sym/46 pcs) 3 img
Demystifying Decision Trees
Let’s clean the global environment before moving further: rm(list=ls()); cat("\014"). Let’s load the dataset from this Kaggle competition: https://www.kaggle.com/c/titanic train_data <- read.csv("~/Downloads/TitanicTrain.csv"); test_data <- read.csv("~/Downloads/TitanicTest.csv"). Let’s take a look at the dataset: str(train_data) ## 'data.frame'...
3273 sym R (9195 sym/44 pcs) 3 img
R Review and Data Formatting
Load packages: library(ggplot2), library(gridExtra), library(lubridate). Let’s clean the global environment before moving further: rm(list=ls()); dev.off() ## null device ## 1; cat("\014"). Data Types in R. Logical Data Type: x <- c(12,1,5,18,2,6,NA); is.na(x) ## [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE; x[!is.na(x)] ## [1] 12 1 5 18 2...
2520 sym R (5455 sym/84 pcs) 1 img
Introduction to Hypothesis Testing
P-Values and Z-Scores. Load packages: library(ggplot2), library(statsr), library(gridExtra), library(lubridate). Let’s clean the global environment before moving further: rm(list=ls()); dev.off() ...
2791 sym R (7117 sym/49 pcs) 7 img
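To connect the two ideas in the title, here is a minimal sketch of a one-sample z-test in base R. The z-score is z = (x̄ − μ0) / (σ / √n), and the two-sided p-value is the probability, under H0, of a z at least as extreme in either tail: 2·P(Z ≤ −|z|). All sample numbers below are hypothetical, and σ is assumed known, which is what justifies the normal (z) rather than the t distribution.

```r
# Hypothetical sample summary, not taken from the original notebook
x_bar <- 52   # observed sample mean
mu0   <- 50   # population mean under the null hypothesis H0
sigma <- 8    # population standard deviation (assumed known)
n     <- 64   # sample size

se <- sigma / sqrt(n)       # standard error of the mean: 8 / 8 = 1
z  <- (x_bar - mu0) / se    # z-score: (52 - 50) / 1 = 2

# Two-sided p-value: area in both tails beyond |z| under N(0, 1)
p_two_sided <- 2 * pnorm(-abs(z))
p_two_sided   # about 0.0455, below 0.05, so H0 is rejected at the 5% level
```

Writing the tail area as 2 * pnorm(-abs(z)) works for either sign of z and avoids the loss of precision that 1 - pnorm(z) can suffer for large z.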