Publications by Diana Plunkett
Data624 Project 2
Project Introduction: New regulations require ABC Beverage to provide a report outlining our manufacturing process and a predictive model of PH, including an explanation of the predictive factors. Our data science team is tasked with developing the predictive model from provided historical data and using that model to predict PH on te...
19721 sym R (63443 sym/141 pcs) 18 img 1 tbl
Assignment 4
Introduction: Choose a dataset, select a methodology from weeks 1-10 (Supervised Learning) and another from weeks 11-15 (Unsupervised Learning). Describe the problem you are trying to solve. Describe your data and what you did to prepare the data for analysis. Describe the methodologies you used to analyze the data. What’s the purpose of the analysis ...
6827 sym Python (39522 sym/60 pcs) 18 img
Assignment 3
Introduction: Perform an analysis of the dataset used in Homework #2 using the SVM algorithm. Reference: https://www.geeksforgeeks.org/support-vector-machine-classifier-implementation-in-r-with-caret-package/ Select a file: As in Assignments 1 and 2, this analysis uses the public library data. PLDS2022 is the 2022 (most recent) version of the Public Library Surv...
4827 sym Python (8615 sym/24 pcs) 2 img
hw9
8.1 Recreate the simulated data from Exercise 7.2: library(mlbench); set.seed(200); simulated <- mlbench.friedman1(200, sd = 1); simulated <- cbind(simulated$x, simulated$y); simulated <- as.data.frame(simulated); colnames(simulated)[ncol(simulated)] <- "y" a. Fit a random forest model to all of the predictors, then estimate the variable importan...
9035 sym R (7046 sym/31 pcs) 2 img
HW8
7.2 Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations used the following nonlinear equation to create data: \(y = 10\sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10x_4 + 5x_5 + N(0, \sigma^2)\) where the x values are random variables uniformly distributed between [0, 1] (there are also 5 other non-informative var...
6518 sym R (18612 sym/54 pcs) 7 img
Assignment 2
Introduction: Select a file. Choose a dataset from a source in Assignment #1, or another dataset of your choice. Again, as in Assignment 1, this analysis uses the public library data. PLDS2022 is the 2022 (most recent) version of the Public Library Survey Data. Found here, the data is a census survey of over 190 variables collected annually from over 9,000 l...
7574 sym Python (15946 sym/34 pcs) 6 img
HW7
Resources: https://topepo.github.io/caret/model-training-and-tuning.html https://daviddalpiaz.github.io/r4sl/the-caret-package.html https://towardsdatascience.com/create-predictive-models-in-r-with-caret-12baf9941236 6.2 Developing a model to predict permeability (see Sect. 1.4) could save significant resources for a pharmaceutical company, while...
4609 sym R (12550 sym/39 pcs) 2 img
Proj1
Part A - ATM Forecast: In Part A, I want you to forecast how much cash is taken out of 4 different ATM machines for May 2010. The data is given in a single file. The variable ‘Cash’ is provided in hundreds of dollars; other than that, it is straightforward. I am being somewhat ambiguous on purpose to make this have a little more business feel...
13041 sym Python (16579 sym/83 pcs) 30 img
HW6
Exercises 9.1, 9.2, 9.3, 9.5, 9.6, 9.7, and 9.8 in Hyndman. 9.1 Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers. a. Explain the differences among these figures. Do they all indicate that the data are white noise? Yes, all the data look like white noise, with no autocorrelations outside of the limits (the dashed l...
7367 sym Python (16169 sym/122 pcs) 24 img
HW5
8.1 Consider the number of pigs slaughtered in Victoria, available in the aus_livestock dataset. pigs <- aus_livestock |> filter(Animal == 'Pigs' & State == 'Victoria') pigs |> autoplot(Count) a. Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of α and ℓ0, and gen...
5079 sym Python (8329 sym/45 pcs) 16 img