Publications by Orli Khaimova

DATA 608 | R Notebook | Module 1

13.09.2021

Principles of Data Visualization and Introduction to ggplot2. I have provided you with data about the 5,000 fastest growing companies in the US, as compiled by Inc. magazine. Let's read this in: inc <- read.csv("https://raw.githubusercontent.com/charleyferrari/CUNY_DATA_608/master/module1/Data/inc5000_data.csv", header = TRUE) And let's preview this...

6486 sym R (3172 sym/12 pcs) 3 img

DATA 624 Homework 5

14.03.2022

8.1 Consider the number of pigs slaughtered in Victoria, available in the aus_livestock dataset. Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of \(\alpha\) and \(\ell_0\), and generate forecasts for the next four months. fit <- aus_livestock %>% filter(State == "Victori...

7409 sym R (11940 sym/45 pcs) 12 img

DATA 624 Homework 3

28.02.2022

5.1 Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case: Australian Population (global_economy). Since the population of Australia has an increasing trend, it is best to use the RW(y ~ drift()) method so that the forecast shows growth. global_economy %>% filter(C...

5825 sym R (6169 sym/30 pcs) 18 img

DATA 624 Homework 1

14.02.2022

2.1 Use the help function to explore what the series gafa_stock, PBS, vic_elec and pelt represent. Use autoplot() to plot some of the series in these data sets. autoplot(gafa_stock, Open) + ggtitle("Daily Opening Price for Stocks Traded, 2014 - 2018") It can be seen that the opening price increased over time, especially for Amazon and Googl...

3867 sym R (3268 sym/23 pcs) 12 img

DATA 624 Homework 2

21.02.2022

3.1 Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time? global_economy %>% autoplot(GDP / Population, show.legend = FALSE) + labs(title= "GDP per capita", y = "$US") global_economy %>% mutate(GDP_per_capita = GD...

6122 sym R (6434 sym/21 pcs) 30 img

DATA 624 Homework 4

07.03.2022

3.1 The UC Irvine Machine Learning Repository contains a data set related to glass identification. The data consist of 214 glass samples labeled as one of seven class categories. There are nine predictors, including the refractive index and percentages of eight elements: Na, Mg, Al, Si, K, Ca, Ba, and Fe. Using visualizations, explore the predic...

3389 sym R (3137 sym/11 pcs) 42 img

DATA 624 Homework 6

29.03.2022

9.1 Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers. Figure 9.32: Left: ACF for a white noise series of 36 numbers. Middle: ACF for a white noise series of 360 numbers. Right: ACF for a white noise series of 1,000 numbers. Explain the differences among these figures. Do they all indicate that the ...

8345 sym R (10310 sym/47 pcs) 33 img

DATA 624 Project 1

04.04.2022

Part A | ATM Forecast, ATM624Data.xlsx In part A, forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable Cash is provided in hundreds of dollars; other than that, it is straightforward. Explain and demonstrate your process, techniques used and not used, and your actual forec...

9635 sym R (27464 sym/86 pcs) 34 img

DATA 624 Project 2

16.05.2022

Project #2 (Team) Assignment This is role playing. I am your new boss. I am in charge of production at ABC Beverage and you are a team of data scientists reporting to me. My leadership has told me that new regulations are requiring us to understand our manufacturing process, the predictive factors and be able to report to them our predictive mode...

4231 sym R (11671 sym/42 pcs) 9 img 3 tbl

DATA 624 Homework 9

02.05.2022

8.1 Recreate the simulated data from Exercise 7.2: library(mlbench) set.seed(200) simulated <- mlbench.friedman1(200, sd = 1) simulated <- cbind(simulated$x, simulated$y) simulated <- as.data.frame(simulated) colnames(simulated)[ncol(simulated)] <- "y" Fit a random forest model to...

6491 sym R (7479 sym/39 pcs) 7 img