Publications by Matthew Tillmawitz
Data 607 week 7 assignment
The Data The data is stored in my GitHub repository in each of the three formats requested. The structure of the data is slightly different in each format due to the differing formatting requirements but is fundamentally the same information. xml_address = "https://raw.githubusercontent.com/Tillmawitz/data_607/refs/heads/main/assignment_7/books.xml" html_addr...
Data 624 Homework 5
8.1 8.1.1 By using the ETS function with an additive error and no trend or season we model simple exponential smoothing. The optimal α is 0.3221247 and the optimal l[0] is 100646.6 for this model. vic_pigs <- aus_livestock |> filter(Animal == "Pigs", State == "Victoria") |> select(Month, Count) pig_model <- vic_pigs |> model(ETS(Count ~ e...
Data 607 Project 2 Buffalo Weather Cleanup
Initial Analysis This is a fun data set to work with: it is easy for the human eye to interpret, but it needs serious coercion before a machine can analyze it. Kevin Havis was kind enough to provide a csv in his post and deserves credit for that portion. As the raw data can be easily copied from the source into ...
Data 607 Project 2 Movies Cleanup
Reading the Data The file is actually semicolon separated, but it is simple to read in using read_delim. An initial look shows that overall the data is well structured, but would benefit from collapsing the genre data. raw_data <- read_delim('https://raw.githubusercontent.com/Tillmawitz/data_607/refs/heads/main/project_2/movies.csv', delim = ";") #...
Data 624 Assignment 4
3.1 3.1.1 Looking at the distributions of the predictors in the Glass dataset, we immediately see some interesting scenarios. The Aluminum content appears to be the most normally distributed with Sodium, Calcium, and the Refractive Index being largely normal with a right skew. The Silicon and Magnesium measures also have fairly normal distribution...
Data 607 Assignment 5
Read in data The data is read in from the csv that was created and can be viewed in the github repository this project resides in. Some filtering is done to remove empty rows in the csv and fill the airline name in. raw_data <- read_csv("flights.csv") marshal <- raw_data |> filter(!if_all(names(raw_data), ~ is.na(.))) |> fill(...1) |> rename(...
Data 607 Project 1
Reading the data This is a project that reads chess tournament results and writes a summary to a csv. To begin, the data is read in, skipping the first several rows that contain what would be the headers. We also want to filter out the dividing rows (those consisting of the “-” character) and trim any leading or trailing whitespace in the colu...
Data 624 Assignment 3
5.1 5.1.1 The Australian population data is most appropriately modeled using the drift method, and we have forecast the population growth for 4 years. aus_pop <- global_economy |> filter(Country == "Australia") |> select(Country, Year, Population) aus_pop_fc <- aus_pop |> model(RW(Population ~ drift())) |> forecast(h = 4) aus_pop_fc |> ...
Data 624 Assignment 2
3.1 Plotting the GDP per capita for all countries over time generates a rather cluttered graph with a legend that must be omitted due to its sheer size. There are many ways to simplify this graph, but for the purpose of this question we only care about the countries with the “highest” GDP per capita. Given that this can be interpreted dif...
Data 607 Assignment 3
Instructions and the relevant questions can be found in the Instructions.txt file in the parent folder of this project. Question 1 There are only three majors in the 538 dataset of majors found in the majors-list.csv file that contain “DATA” or “STATISTICS”. college_majors <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data...