Publications by Diego Correa
DATA624 - HW2
3 Time Series Decomposition
1. Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time? Based on the graph and calculations, the countries with the highest GDP per capita are Monaco and Liechtenstein, with Liechtenstein having the most...
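The "calculations" in the HW2 excerpt presumably divide each country's GDP by its population. A minimal sketch of that ratio, assuming the `GDP` and `Population` columns of `global_economy` are the inputs (c indexes countries, t indexes years):

```latex
\text{GDP per capita}_{c,t} = \frac{\text{GDP}_{c,t}}{\text{Population}_{c,t}}
```

Ranking countries on this ratio year by year is one way to see how the lead between Monaco and Liechtenstein changes over time.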
DATA624- HW1
Chapter 2 - Time Series Graphics
2.1 Use the help function to explore what the series gafa_stock, PBS, vic_elec and pelt represent. Use autoplot() to plot some of the series in these data sets. What is the time interval of each series?
    ?gafa_stock
    ## starting httpd help server ... done
    ?PBS
    ?vic_elec
    ?pelt
A
    # a
    gafa_stock %>% autoplo...
DATA624 - HW3
Chapter 5 - Forecasting: Principles and Practice
1. Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:
- Australian Population (global_economy)
- Bricks (aus_production)
- NSW Lambs (aus_livestock)
- Household wealth (hh_budget)
- Australian takeaway food turnover (aus_ret...
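For reference, the three benchmark methods named in the HW3 excerpt produce the point forecasts below (notation as in Forecasting: Principles and Practice, with T observations, horizon h and seasonal period m). This is a reference sketch, not the author's own derivation:

```latex
\begin{aligned}
\text{NAIVE:}\quad & \hat{y}_{T+h\mid T} = y_T \\
\text{SNAIVE:}\quad & \hat{y}_{T+h\mid T} = y_{T+h-m(k+1)}, \qquad k = \left\lfloor \frac{h-1}{m} \right\rfloor \\
\text{RW with drift:}\quad & \hat{y}_{T+h\mid T} = y_T + h\,\frac{y_T - y_1}{T-1}
\end{aligned}
```

Roughly, drift suits a steadily trending series, SNAIVE a strongly seasonal one, and NAIVE a series with neither.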
DATA624 - HW5
Exponential Smoothing
1. Consider the number of pigs slaughtered in Victoria, available in the aus_livestock dataset. Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of \(\alpha\) and \(l_0\), and generate forecasts for the next four months.
\(\alpha = 0.322\) \(l_0 = 10064...
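Simple exponential smoothing, which the HW5 excerpt fits as its ETS equivalent (ETS(A,N,N): additive error, no trend, no seasonality), can be written in component form as below, with \(\ell_0\) (written \(l_0\) in the excerpt) as the initial level. With the reported \(\alpha = 0.322\), each new level puts about 32% of its weight on the latest observation:

```latex
\begin{aligned}
\text{Forecast:}\quad & \hat{y}_{t+h\mid t} = \ell_t \\
\text{Level:}\quad & \ell_t = \alpha\, y_t + (1-\alpha)\,\ell_{t-1}
\end{aligned}
```

Because the forecast equation does not depend on h, all four monthly forecasts equal the final level \(\ell_T\).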
DATA624 - HW6
1. Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers. Explain the differences among these figures. Do they all indicate that the data are white noise? The difference among the graphs is that the time series gets longer from one figure to the next, causing the ACF bounds to become narrower and narrower. Each...
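For white noise, Forecasting: Principles and Practice draws the ACF bounds at \(\pm 1.96/\sqrt{T}\), where T is the series length. Assuming these are the bounds shown in Figure 9.32, a quick worked check shows how they tighten as T grows from 36 to 1,000:

```latex
\pm\frac{1.96}{\sqrt{36}} \approx \pm 0.327, \qquad
\pm\frac{1.96}{\sqrt{360}} \approx \pm 0.103, \qquad
\pm\frac{1.96}{\sqrt{1000}} \approx \pm 0.062
```

A series is consistent with white noise when about 95% of its autocorrelation spikes lie within these limits, whichever T applies.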
Publish Document
Part 1 - ATM
In part A, I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars; other than that, it is straightforward. I am being somewhat ambiguous on purpose to make this have a little more business feeling. Expla...
DATA624 - HW9
8.1 Recreate the simulated data from Exercise 7.2:
    library(mlbench)
    set.seed(200)
    simulated <- mlbench.friedman1(200, sd = 1)
    simulated <- cbind(simulated$x, simulated$y)
    simulated <- as.data.frame(simulated)
    colnames(simulated)[ncol(simulated)] <- "y"
Fit a random forest model to all of the predictors, then estimate the variable importance...
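The variable importance asked for in the HW9 excerpt is commonly the permutation measure from the randomForest package (%IncMSE for regression, available when the forest is fit with importance = TRUE). Assuming that measure, with B trees and out-of-bag (OOB) mean squared error computed per tree:

```latex
\begin{aligned}
d_{j,b} &= \mathrm{MSE}^{\text{OOB}}_{b}\big(x_j \text{ permuted}\big) - \mathrm{MSE}^{\text{OOB}}_{b} \\
\mathrm{VI}_j &= \bar{d}_j = \frac{1}{B}\sum_{b=1}^{B} d_{j,b}
\end{aligned}
```

Larger values mean that scrambling predictor \(x_j\) hurts the forest more. The package's importance() function can optionally divide this average by its standard error (its scale argument).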
DATA624 - HW10
Overview
Imagine 10000 receipts sitting on your table. Each receipt represents a transaction with items that were purchased. The receipt is a representation of stuff that went into a customer’s basket, and therefore ‘Market Basket Analysis’. That is exactly what the Groceries Data Set contains: a collection of receipts with each line repr...
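Market basket analysis of a receipt collection like Groceries is usually summarized by the support, confidence and lift of association rules \(X \Rightarrow Y\), the measures reported by apriori() in the arules package (which also ships the Groceries data). Assuming those standard definitions, with N receipts t:

```latex
\begin{aligned}
\operatorname{supp}(X) &= \frac{\lvert\{t : X \subseteq t\}\rvert}{N} \\
\operatorname{conf}(X \Rightarrow Y) &= \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)} \\
\operatorname{lift}(X \Rightarrow Y) &= \frac{\operatorname{conf}(X \Rightarrow Y)}{\operatorname{supp}(Y)}
  = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)\,\operatorname{supp}(Y)}
\end{aligned}
```

A lift above 1 means X and Y land in the same basket more often than they would if purchases were independent.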
DATA622 - Final Project
Libraries
    library(kableExtra)
    library(tidyverse)
    library(ggplot2)
    library(dplyr)
    library(psych)
    library(caret)
    library(mice)
    library(randomForest)
    library(caTools)
    library(corrplot)
    library(class)
    library(rpart)
    library(rpart.plot)
    library(naniar)
    library(xgboost)
    library(usmap)
    library(DiagrammeR)
    library(earth)
    library(plotly)...
DATA608 - HW1
Principles of Data Visualization and Introduction to ggplot2
I have provided you with data about the 5,000 fastest-growing companies in the US, as compiled by Inc. magazine. Let's read this in:
    inc <- read.csv("https://raw.githubusercontent.com/charleyferrari/CUNY_DATA_608/master/module1/Data/inc5000_data.csv", header = TRUE)
And let's preview this...