Publications by Jaya Veluri
Document
Overview Imagine 10,000 receipts sitting on your table. Each receipt represents a transaction and lists the items that were purchased. A receipt is a record of what went into a customer’s basket, hence the name ‘Market Basket Analysis’. That is exactly what the Groceries Data Set contains: a collection of receipts with each line rep...
3107 sym R (6204 sym/19 pcs) 6 img
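As a rough illustration of the receipt idea above, the sketch below computes the usual association-rule measures (support, confidence, lift) for one rule in base R. The baskets and item names are hypothetical toy data, not the Groceries Data Set, and the helper names `support` and `rule_metrics` are invented for this example.

```r
# Hypothetical toy receipts; item names are illustrative only.
baskets <- list(
  c("milk", "bread"),
  c("milk", "butter"),
  c("bread", "butter", "milk"),
  c("bread"),
  c("milk", "bread", "eggs")
)

# Fraction of baskets that contain every item in `items`.
support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Support, confidence and lift for the rule lhs -> rhs.
rule_metrics <- function(lhs, rhs, baskets) {
  s_both <- support(c(lhs, rhs), baskets)
  s_lhs <- support(lhs, baskets)
  s_rhs <- support(rhs, baskets)
  c(support = s_both,
    confidence = s_both / s_lhs,
    lift = s_both / (s_lhs * s_rhs))
}

m <- rule_metrics("milk", "bread", baskets)
print(m)
```

A lift below 1, as in this toy data, would suggest the two items appear together slightly less often than independence predicts.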
Document
Overview ABC Beverage has new regulations in place, and the leadership team requires the data science team to understand the manufacturing process, identify the predictive factors, and report a predictive model of pH. The choice of model depends on factors such as model accuracy, data relevance and cross-validation. R packa...
9939 sym Python (49826 sym/111 pcs) 21 img
Document
8.1 Recreate the simulated data from Exercise 7.2:
set.seed(200)
simulated <- mlbench.friedman1(200, sd = 1)
simulated <- cbind(simulated$x, simulated$y)
simulated <- as.data.frame(simulated)
colnames(simulated)[ncol(simulated)] <- "y"
A. Fit a random forest model to all of the predictors, then estimate the variable importance scores: ## ...
6329 sym 4 img
Document
7.2 Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations used the following nonlinear equation to create data: $y = 10\sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10x_4 + 5x_5 + N(0, \sigma^2)$, where the x values are random variables uniformly distributed on [0, 1] (there are also 5 other non-informative variables created...
3166 sym Python (12113 sym/46 pcs) 3 img
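The Friedman equation quoted above can be simulated directly in base R. This is a minimal sketch rather than the `mlbench` implementation: the function name `simulate_friedman1`, the column names and the seed are assumptions made for illustration.

```r
# Sketch of the Friedman benchmark: 5 informative predictors plus
# `n_noise` non-informative ones, all uniform on [0, 1].
simulate_friedman1 <- function(n, sd = 1, n_noise = 5) {
  p <- 5 + n_noise
  x <- matrix(runif(n * p), nrow = n, ncol = p)
  colnames(x) <- paste0("x", seq_len(p))
  # Only x1..x5 enter the signal; the remaining columns are pure noise.
  signal <- 10 * sin(pi * x[, 1] * x[, 2]) +
    20 * (x[, 3] - 0.5)^2 + 10 * x[, 4] + 5 * x[, 5]
  data.frame(x, y = signal + rnorm(n, mean = 0, sd = sd))
}

set.seed(200)  # arbitrary seed, used here only for reproducibility
sim <- simulate_friedman1(200, sd = 1)
str(sim)
```

Setting `sd = 0` returns the noiseless signal, which is a convenient way to check that a fitted model recovers the underlying function.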
Document
9.1 Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers. a. Explain the differences among these figures. Do they all indicate that the data are white noise? One of the main differences is that each plot is based on a different number of random values (36, 360 and 1,000). The auto correlat...
8731 sym Python (12798 sym/89 pcs) 26 img
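One way to see why the three ACF plots differ is that the approximate 95% white-noise bounds, ±1.96/√T, shrink as the series length T grows. The sketch below uses simulated Gaussian noise rather than the book's data; the helper name `white_noise_check`, the seed and the choice of 20 lags are assumptions.

```r
# For a series of length n, compute the sample ACF and the approximate
# 95% white-noise bounds +/- 1.96 / sqrt(n).
white_noise_check <- function(n, lag_max = 20) {
  x <- rnorm(n)
  # Drop lag 0, which is always 1.
  r <- as.numeric(acf(x, lag.max = lag_max, plot = FALSE)$acf)[-1]
  bound <- 1.96 / sqrt(n)
  data.frame(n = n, bound = bound, share_outside = mean(abs(r) > bound))
}

set.seed(123)  # arbitrary seed
res <- do.call(rbind, lapply(c(36, 360, 1000), white_noise_check))
print(res)
```

For white noise, roughly 5% of the sample autocorrelations are expected to fall outside the bounds, whatever the series length.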
Document
6.2 Developing a model to predict permeability (See Sect. 1.4) could save significant resources for a pharmaceutical company, while at the same time more rapidly identifying molecules that have a sufficient permeability to become a drug: A. Start R and use these commands to load the data: data(permeability) B. The fingerprint predictors indicat...
3978 sym Python (17503 sym/39 pcs) 7 img 7 tbl
Document
Project 1 In part A, I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars; other than that, it is straightforward. I am being somewhat ambiguous on purpose to give this more of a business feel. Explain...
6166 sym Python (19157 sym/91 pcs) 30 img 1 tbl
Document
8.1 Consider the number of pigs slaughtered in Victoria, available in the aus_livestock dataset. a. Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of α and ℓ0, and generate forecasts for the next four months. dafit <- aus_livestock %>% filter(State == "Victoria", Anima...
5186 sym 18 img
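To make the roles of α and the initial level concrete, here is a base R sketch of simple exponential smoothing. It is not what `ETS()` does internally: the parameters are chosen here by minimizing the sum of squared one-step errors with `optim()`, the series `y` is hypothetical, and the function name `ses` is invented for this example.

```r
# Simple exponential smoothing: the one-step forecast at time t is the
# current level, which is then updated with the new observation.
ses <- function(y, alpha, l0) {
  level <- l0
  fitted <- numeric(length(y))
  for (t in seq_along(y)) {
    fitted[t] <- level
    level <- alpha * y[t] + (1 - alpha) * level
  }
  list(fitted = fitted, level = level, sse = sum((y - fitted)^2))
}

y <- c(10, 12, 11, 13, 12, 14)  # hypothetical series

# Choose alpha in [0, 1] and l0 by minimizing the squared one-step errors.
opt <- optim(c(0.5, y[1]), function(p) ses(y, p[1], p[2])$sse,
             method = "L-BFGS-B", lower = c(0, -Inf), upper = c(1, Inf))
fit <- ses(y, opt$par[1], opt$par[2])

# SES forecasts are flat: every horizon equals the final level.
forecasts <- rep(fit$level, 4)
print(forecasts)
```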
Document
3.1 The UC Irvine Machine Learning Repository contains a data set related to glass identification. The data consist of 214 glass samples labeled as one of seven class categories. There are nine predictors, including the refractive index and percentages of eight elements: Na, Mg, Al, Si, K, Ca, Ba, and Fe. The data can be accessed via: library(mlben...
3075 sym 5 img 3 tbl
Document
5.1 Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case: a. Australian Population (global_economy) b. Bricks (aus_production) Naïve method ## Warning: Removed 20 rows containing missing values (`geom_line()`). Seasonal naïve method ## Warning: Removed 20 ...
2231 sym 20 img
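The three benchmark methods named in the exercise above can be written out in a few lines of base R, which may help when deciding which one suits a series. These are sketches of the point forecasts only, not the `fable` functions; the names `naive_fc`, `snaive_fc` and `drift_fc` and the example series are hypothetical.

```r
# Naive: every forecast equals the last observation.
naive_fc <- function(y, h) rep(y[length(y)], h)

# Seasonal naive: repeat the last full season of length m.
snaive_fc <- function(y, h, m) {
  n <- length(y)
  last_season <- y[(n - m + 1):n]
  last_season[((seq_len(h) - 1) %% m) + 1]
}

# Drift: extend the straight line joining the first and last observations.
drift_fc <- function(y, h) {
  n <- length(y)
  y[n] + seq_len(h) * (y[n] - y[1]) / (n - 1)
}

y <- c(1, 2, 3, 4, 5, 6, 7, 8)  # hypothetical series with m = 4
print(naive_fc(y, 3))
print(snaive_fc(y, 6, m = 4))
print(drift_fc(y, 2))
```

A steadily trending series such as a population count is typically a natural fit for the drift method, while a strongly seasonal series usually calls for the seasonal naive method.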