Publications by William Aiken
DATA624 HW3
Exercise 1
Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:
Australian Population (global_economy)
global_economy_hw <- global_economy %>% mutate(Country = as.character(Country))
global_economy_hw <- global_economy_hw |> filter(Country == 'Australia') |> select(Ye...
3758 sym 19 img
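The three benchmark methods named in this exercise can be sketched in base R. This is a minimal illustration, not the fable implementation and not the author's code: the helper names `naive_fc`, `snaive_fc` and `drift_fc` and the `pop` values are hypothetical. The drift forecast extends the straight line through the first and last observations, which is why it tends to suit a steadily growing series such as a population.

```r
# Hand-rolled benchmark forecasts (hypothetical helpers, base R only).

# Naive: every future value equals the last observation.
naive_fc <- function(y, h) rep(y[length(y)], h)

# Seasonal naive: repeat the last full season of length m.
snaive_fc <- function(y, h, m) {
  last_season <- tail(y, m)
  last_season[((seq_len(h) - 1) %% m) + 1]
}

# Drift: last observation plus h times the average historical change.
drift_fc <- function(y, h) {
  n <- length(y)
  slope <- (y[n] - y[1]) / (n - 1)
  y[n] + seq_len(h) * slope
}

# Made-up annual series with a steady upward trend (not real data).
pop <- c(10.0, 10.2, 10.5, 10.7, 11.0)

naive_fc(pop, h = 3)   # flat at the last value
drift_fc(pop, h = 3)   # continues the average yearly increase
```

For annual data there is no seasonal period, so the seasonal naive method reduces to the naive one and the choice is between a flat forecast and a drift.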
DATA624 HW2
3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9
Exercise 1
Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
This data set requires some pre-processing before we can do anything meaningful with it. First we are going to cast th...
6474 sym Python (10574 sym/65 pcs) 30 img
DATA624 HW1
Exercise 2.1 Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec. Use ? (or help()) to find out about the data in each series. What is the time interval of each series? Use autoplot() to produce a time plot of each series. For the last plot, modify the axis labels and titl...
5339 sym R (10607 sym/78 pcs) 26 img
DATA622 HW4
Introduction The dataset I’m using is a Kaggle dataset of surgical operative data. It contains patient demographics, procedure information, the time of day and time of year of each operation, and two outcomes: 30-day mortality and complication events. Link to dataset This dataset originally came fr...
4914 sym R (15580 sym/45 pcs) 2 img
DATA622 HW3
Links to Given Articles
A novel approach to predict COVID-19 using support vector machine
Decision Tree Ensembles to Predict Coronavirus Disease 2019 Infection; A Comparative Study
Links to Articles about Predicting Product Origin with SVM and Random Forest
Discrimination of geographical origin of extra virgin olive oils using terahertz spec...
8025 sym R (3833 sym/35 pcs) 5 img 74 tbl
DATA622 HW2
Introduction In this assignment we explore “the good, the bad and the ugly” of using Decision Trees. We look into the bias and variance issues associated with decision trees and whether Random Forest provides a way around some of them. Link to Good, Bad and Ugly Decision Tree Article Loa...
7725 sym R (10840 sym/48 pcs) 7 img 74 tbl
DATA622 HW1
Load in the datasets wineDf <- read.csv("G:/Documents/DATA622_HW1/winemag-data_first150k.csv") ramenDf <- read.csv("G:/Documents/DATA622_HW1/ramen-ratings.csv") Remove the review number (unnecessary key) and cast star rating as numeric library(dplyr) ## ## Attaching package: 'dplyr' ## The following objects are masked from 'package:stats': ##...
8152 sym R (40645 sym/60 pcs) 21 img 188 tbl
DATA 609 HW8
Ex.1 Use the nnet package to analyze the iris data set. Use 80% of the 150 samples as the training data and the rest for validation. Discuss the results. library(nnet) library(dplyr) ## Warning: package 'dplyr' was built under R version 4.1.2 ## ## Attaching package: 'dplyr' ## The following objects are masked from 'package:stats': ## ## filt...
2171 sym R (4533 sym/30 pcs)
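The 80/20 split described in Ex.1 of DATA 609 HW8 might look like the sketch below. It is one possible setup, not the author's code: the seed, the hidden-layer size (`size = 4`), the weight decay and the iteration cap are all assumed values chosen for illustration.

```r
library(nnet)

set.seed(609)  # assumed seed, only for reproducibility of the split

# Randomly pick 80% of the 150 rows for training; the rest is for validation.
n <- nrow(iris)
train_idx <- sample(n, size = round(0.8 * n))
train <- iris[train_idx, ]
valid <- iris[-train_idx, ]

# Single-hidden-layer network; size, decay and maxit are illustrative choices.
fit <- nnet(Species ~ ., data = train, size = 4, decay = 0.01,
            maxit = 200, trace = FALSE)

# Predict class labels on the held-out rows and summarise the errors.
pred <- predict(fit, newdata = valid, type = "class")
conf_mat <- table(predicted = pred, actual = valid$Species)
accuracy <- mean(pred == valid$Species)

print(conf_mat)
print(accuracy)
```

The confusion matrix shows which species are mistaken for one another, which is usually more informative for the discussion than the accuracy figure alone.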
DATA 609 HW7
library(e1071) ## Warning: package 'e1071' was built under R version 4.1.2 library(dplyr) ## Warning: package 'dplyr' was built under R version 4.1.2 ## ## Attaching package: 'dplyr' ## The following objects are masked from 'package:stats': ## ## filter, lag ## The following objects are masked from 'package:base': ## ## intersect, setdif...
1763 sym R (5557 sym/19 pcs)
DATA 609 HW6
Problem 1
Use a data set such as PlantGrowth in R to calculate three different distance metrics and discuss the results.
Manhattan
Euclidean
Canberra
man <- dist(PlantGrowth, method = 'manhattan', diag = FALSE, upper = FALSE)
euc <- dist(PlantGrowth, method = 'euclidean', diag = FALSE, upper = FALSE)
can <- dist(PlantGrowth, method = 'canbe...
2814 sym 2 img
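As a rough illustration of how the three metrics in DATA 609 HW6, Problem 1 compare, the sketch below computes them on the numeric `weight` column of PlantGrowth only. Dropping the `group` factor is an assumption made here so that `dist()` receives purely numeric input; it is not necessarily what the original analysis did.

```r
# PlantGrowth ships with base R (datasets package): weight (numeric), group (factor).
w <- PlantGrowth["weight"]  # keep a one-column data frame of numeric values

man <- dist(w, method = "manhattan")
euc <- dist(w, method = "euclidean")
can <- dist(w, method = "canberra")

# With a single numeric column, Manhattan and Euclidean both reduce to |a - b|.
same_in_1d <- isTRUE(all.equal(as.numeric(man), as.numeric(euc)))

# Canberra scales each difference by the size of the values: |a - b| / (|a| + |b|).
a <- w$weight[1]
b <- w$weight[2]
can_by_hand <- abs(a - b) / (abs(a) + abs(b))

print(same_in_1d)
print(c(from_dist = as.matrix(can)[1, 2], by_hand = can_by_hand))
```

The point of the comparison is that Manhattan and Euclidean distances only diverge once there is more than one dimension, while Canberra is a relative measure and so weights differences between small values more heavily.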