Publications by Amit Kapoor

Data 607 - TidyVerse Extend

19.04.2020

stringr: a package used to manipulate strings Ordering Strings Combining Strings Replacing Strings Get the Length of a String Extended By Amit Kapoor str_which() str_match() str_pad() str_dup() str_detect() str_locate() Conclusion Devin Teran - Extended By :: Amit Kapoor 4/16/2020 Getting started First we need to load these packages: tidyve...

6806 sym R (7384 sym/41 pcs)
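
The stringr helpers named in the entry above can be illustrated with a minimal sketch. The sample vector `fruits` and the patterns are hypothetical and are not taken from the original post; only the function names come from the excerpt.

```r
library(stringr)

# Hypothetical sample data, not from the original post
fruits <- c("apple", "banana", "cherry")

has_an   <- str_detect(fruits, "an")      # logical: does each string contain "an"?
which_an <- str_which(fruits, "an")       # integer indices of the matching strings
first_an <- str_locate("banana", "an")    # start/end positions of the first match
groups   <- str_match("banana", "(b)(an)")  # matrix: full match plus capture groups
padded   <- str_pad("7", width = 3, side = "left", pad = "0")  # pad to a fixed width
repeated <- str_dup("ab", times = 3)      # repeat a string a given number of times
```

Each call is vectorised over its first argument, so the same functions apply unchanged to a whole column of strings.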

Data 605 - Assignment 11

20.04.2020

Using the “cars” dataset in R, build a linear model for stopping distance as a function of speed and replicate the analysis of your textbook chapter 3 (visualization, quality evaluation of the model, and residual analysis). library(dplyr) ## ## Attaching package: 'dplyr' ## The following objects are masked from 'package:stats': ## ## fi...

3148 sym R (2406 sym/25 pcs) 6 img
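
The task described in the entry above (a linear model of stopping distance on speed, followed by residual checks) might look roughly like this in base R. This is a sketch under the assumption that the built-in `cars` data frame is used as-is; it is not the author's actual solution, and the plot titles are made up for illustration.

```r
# Fit stopping distance (dist) as a linear function of speed
# using the cars data frame that ships with R.
fit <- lm(dist ~ speed, data = cars)

# Quality evaluation: coefficients, standard errors, R-squared
fit_summary <- summary(fit)
print(fit_summary)

# Residual analysis: residuals vs. fitted values, then a normal Q-Q plot
res <- residuals(fit)
plot(fitted(fit), res,
     xlab = "Fitted stopping distance", ylab = "Residual",
     main = "Residuals vs. fitted (illustrative)")
abline(h = 0, lty = 2)  # residuals should scatter evenly around this line
qqnorm(res)
qqline(res)
```

A funnel shape in the first plot or heavy tails in the Q-Q plot would suggest that the constant-variance or normality assumptions are questionable.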

Data607-Project4

26.04.2020

Data607 - Project 4 Amit Kapoor 4/22/2020 Document Classification Introduction For Project 4 we will try to classify new “test” documents using already classified “training” documents. A common example is using a corpus of labeled spam and ham (non-spam) e-mails to predict whether or not a new document is spam. For this project, we used...

4613 sym R (7516 sym/53 pcs) 3 img

Data608HW1

06.09.2020

Principles of Data Visualization and Introduction to ggplot2 library(dplyr) library(ggplot2) I have provided you with data about the 5,000 fastest growing companies in the US, as compiled by Inc. magazine. Let's read this in: inc <- read.csv("https://raw.githubusercontent.com/charleyferrari/CUNY_DATA_608/master/module1/Data/inc5000_data.csv", hea...

1456 sym R (7708 sym/23 pcs) 3 img

Data 608 - Final Project

09.12.2020

Data 608 - Final Project Amit Kapoor 12/01/2020 OpenFlights Introduction The airline industry is now a major mode of transportation within countries and between countries around the globe. Though it operates under strict guidelines for airport operations, flights, and routes, every other country is now substantially looking to in...

8611 sym R (26582 sym/32 pcs) 8 img

Data624 - Homework1

14.02.2021

library(fpp2) Exercise 2.1 Use the help function to explore what the series gold, woolyrnq and gas represent. ?gold ?woolyrnq ?gas gold: Daily morning gold prices in US dollars. 1 January 1985 – 31 March 1989. woolyrnq: Quarterly production of woollen yarn in Australia: tonnes. Mar 1965 – Sep 1994. gas: Australian monthly gas production: 1...

5088 sym R (4467 sym/60 pcs) 32 img

Data624 - Homework3

27.02.2021

library(fpp2) library(dplyr) library(gridExtra) library(seasonal) 6.2 The plastics data set consists of the monthly sales (in thousands) of product A for a plastics manufacturer for five years. ?plastics plastics - Monthly sales of product A for a plastics manufacturer. glimpse(plastics) ## Time-Series [1:60] from 1 to 5.92: 742 697 776 898 103...

2257 sym R (3080 sym/15 pcs) 9 img

Data624 - Homework2

21.02.2021

library(fpp2) library(gridExtra) 3.1 For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. # function to draw 2 plots: original and with BoxCox transformation plot_timeseries <- function(timeseries) { lambda <- BoxCox.lambda(timeseries) ts_original <- autoplot(timeseries) + ggtitle(sub...

2714 sym R (6110 sym/30 pcs) 14 img
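
The Box-Cox step mentioned in the entry above can be sketched with the forecast package, which provides the `BoxCox.lambda()` function used in the excerpt. `AirPassengers` is a stand-in series chosen here for illustration (an assumption); the homework's own series are not reproduced.

```r
library(forecast)  # provides BoxCox.lambda(), BoxCox() and InvBoxCox()

# Stand-in series (assumption): monthly airline passengers from base R
y <- AirPassengers

# Pick a transformation parameter automatically, then apply it
lambda <- BoxCox.lambda(y)
y_bc   <- BoxCox(y, lambda)

# The transformation is invertible, so forecasts can be mapped back
y_back <- InvBoxCox(y_bc, lambda)
```

A lambda near 1 means little transformation is needed, while a value near 0 behaves like a log transform.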

Data624 - Project1

11.04.2021

Overview This project includes three time series datasets and requires selecting the best forecasting model for each of them. Part A - ATM Forecast Part B - Forecasting Power Part C - Waterflow Pipe Part A - ATM Forecast The dataset contains cash withdrawals from 4 different ATMs from May 2009 to Apr 2010. The variable ‘Cash’ is provided i...

17785 sym R (43401 sym/186 pcs) 40 img 6 tbl

Data624 - Homework6

26.03.2021

Exercise 8.1 Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers, and 1,000 random numbers. a Explain the differences among these figures. Do they all indicate that the data are white noise? As the sample size increases (from 36 random numbers to 360 and then 1,000), the autocorrelations tend to 0. The ACF b...

12617 sym R (12550 sym/96 pcs) 31 img
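
The white-noise comparison in the entry above can be reproduced approximately in base R. This sketch assumes normally distributed random numbers and an arbitrary seed; it uses the usual approximate 95% bounds of ±1.96/√T for a white-noise ACF, which narrow as the sample size T grows.

```r
set.seed(624)  # arbitrary seed (assumption), only for reproducibility

sizes  <- c(36, 360, 1000)
bounds <- 1.96 / sqrt(sizes)  # approximate 95% limits for white noise

# Largest absolute sample autocorrelation at non-zero lags for each size
max_abs_acf <- sapply(sizes, function(n) {
  r <- acf(rnorm(n), plot = FALSE)$acf
  max(abs(r[-1]))  # drop lag 0, which is always 1
})

print(data.frame(n = sizes,
                 bound = round(bounds, 3),
                 max_abs_acf = round(max_abs_acf, 3)))
```

For white noise, roughly 95% of the spikes are expected to fall inside the bounds, so an occasional spike just outside them is not by itself evidence against white noise.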