Publications by Matthew Lucich

4.19 Underage drinking, Part II

07.10.2020

We learned in Exercise 4.17 that about 70% of 18-20 year olds consumed alcoholic beverages in any given year. We now consider a random sample of fifty 18-20 year olds.

Part A

How many people would you expect to have consumed alcoholic beverages? And with what standard deviation?

\(\mu = np\)

n <- 50
p <- 0.697
# Average
n*p
## [1] 34.85

\(\sigm...
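The expected count in the excerpt follows from treating the sample as binomial. A minimal sketch of that calculation, assuming the fifty observations are independent with the same probability `p`; the names `mu` and `sigma` are chosen here for illustration and are not taken from the original post:

```r
# Binomial model: number of drinkers among n sampled 18-20 year olds
n <- 50      # sample size, from the exercise
p <- 0.697   # proportion who drank, from the excerpt

mu <- n * p                      # expected count: np
sigma <- sqrt(n * p * (1 - p))   # standard deviation: sqrt(np(1 - p))

print(mu)
print(sigma)
```

Under these assumptions the mean matches the 34.85 shown in the excerpt, and the standard deviation comes out at about 3.25 people.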

Assignment 2 – Character Manipulation & Data Processing

21.02.2021

library(tidyverse)

1) "DATA" or "STATISTICS" College Majors

# Load college majors CSV
college_majors <- read.csv('majors-list.csv')
# Preview data
glimpse(college_majors)
# Filter for only majors containing "DATA" or "STATISTICS"
data_or_stat_majors <- college_majors %>% filter(grepl('DATA|STATISTICS', Major))
data_or_stat_majors

2) Transform...
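The filter in the excerpt relies on the `|` alternation in a regular expression. A small self-contained sketch of the same idea in base R, using a made-up data frame in place of `majors-list.csv` (the `Major` values below are hypothetical and only illustrate the match):

```r
# Hypothetical stand-in for majors-list.csv; these Major values are illustrative only
college_majors <- data.frame(
  Major = c("DATA SCIENCE", "APPLIED STATISTICS", "ENGLISH", "BIOLOGY"),
  stringsAsFactors = FALSE
)

# grepl() returns TRUE for each Major that matches either alternative
keep <- grepl("DATA|STATISTICS", college_majors$Major)

# Subset rows with the logical vector; drop = FALSE keeps a data frame
data_or_stat_majors <- college_majors[keep, , drop = FALSE]

print(data_or_stat_majors)
```

Base subsetting is used here only to avoid a package dependency; `dplyr::filter()` with the same `grepl()` call, as in the excerpt, selects the same rows.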

Assignment 2 – SQL and R

14.02.2021

library(DBI)
library(dplyr)
library(stringr)

Overview and Workflow

To develop a common pipeline, data was extracted from a CSV, loaded into multiple MySQL database tables, and briefly explored for high-level insights. First, consideration was put into the database schema, which ended up including three tables in order to enforce table normalizat...

Loading Data into a Data Frame

06.02.2021

library(tidyverse)
library(ggplot2)

Overview

The article "You Can’t Trust What You Read About Nutrition" explains some of the main reasons nutrition studies and the reporting around them are problematic: the studies rely on questionable survey data and are prone to p-hacking. With regard to the latter issue, spurious associations a...

Project 1 - Chess Data Analysis

27.02.2021

library(tidyverse)

Load and preview chess data

# Load data
cross_table <- read.delim('tournamentinfo.txt')
# Check type
typeof(cross_table)
## [1] "list"
# Preview data
head(cross_table,12)
## X.........................................................................................
## 1 Pair | Player Name |Total|Round|...
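The `"list"` result in the excerpt can look surprising for tabular data. A brief sketch of why it appears, using a tiny made-up data frame (`df` is hypothetical and unrelated to `tournamentinfo.txt`):

```r
# A data frame is stored internally as a list of equal-length columns
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

storage_type <- typeof(df)   # low-level storage type
object_class <- class(df)    # the class that gives it table behavior

print(storage_type)
print(object_class)
```

So `typeof()` reports the underlying storage, while `class()` is the check to use when confirming that `read.delim()` returned a data frame.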

Tidyverse: using stringr, dplyr, and tibble to clean up catch phrases

03.04.2021

Cleaning up catch phrases from classic movies

Source: https://www.kaggle.com/thomaskonstantin/150-famous-movie-catchphrases-with-context?select=Catchphrase.csv

A text-only dataset was chosen to demonstrate efficient string-manipulation functions from stringr. Additionally, most examples contain data stored in a tibble and data management func...

Project 2 – Dataset 3: Canada Labour Force Characteristics

14.03.2021

library(tidyverse)
library(ggplot2)
library(DT)

Overview

The raw Canada Labour Force Characteristics dataset used in this project is far from ready for data analysis. To change this, several data cleaning and transformation tactics are employed. Extraneous rows are removed in lines 36, 65, 73, and 74. Columns are renamed ...

Project 2 – Dataset 2: Congressional Seats by State

14.03.2021

library(tidyverse)
library(ggplot2)
library(DT)

Overview

The approach below starts by loading CSV data from the Brookings Institution, titled 1-1 Apportionment of Congressional Seats, by Region and State, 1910 - 2010. The dataset includes unnecessary header lines, so the skip parameter is passed when loading the data frame. There are ...
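The use of `skip` described in the overview can be sketched as follows. This is an illustration only: the inline text, column names, and the value `skip = 2` are hypothetical and do not reflect the actual Brookings file or the post's code.

```r
# Hypothetical raw text standing in for the CSV: two title lines precede the real header
raw_csv <- "Table title line
Source note line
State,Seats
Alpha,3
Beta,5"

# skip = 2 drops the first two lines, so the third line is read as the header
seats <- read.csv(text = raw_csv, skip = 2)

print(seats)
```

With a real file, the `text` argument would be replaced by the file path, and the `skip` count would be set to however many lines precede the true header row.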

Project 2 - Dataset 1 - GDP % Change by Country

13.03.2021

library(tidyverse)
library(ggplot2)
library(DT)

Overview

The objective of this project is to load, tidy, transform, and analyze a (potentially wide) dataset. The approach below starts off with loading World Bank GDP percent change data by country from a CSV. The initial data cleanup includes removing rows with all NAs for the GDP percent change ...

Document

07.03.2021

library(tidyverse)
library(ggplot2)

Overview

The first objective in this process is loading the data, followed by cleaning it up so that it takes on an expected dataframe form (e.g. no empty rows and no null values where they are not appropriate). Next, we transform the data, moving from wide to long so that it is easier to analyze. Conversely, we widen t...
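The wide-to-long and long-to-wide steps mentioned in the overview can be illustrated with tidyr. This is a sketch under assumptions: the excerpt does not show which functions the post uses, so `pivot_longer()` and `pivot_wider()` are one plausible choice, and the data frame and column names below are hypothetical.

```r
library(tidyr)

# Hypothetical wide data: one row per region, one column per year
wide <- data.frame(
  region = c("A", "B"),
  `2019` = c(1.5, 2.0),
  `2020` = c(1.7, 2.4),
  check.names = FALSE   # keep the year column names as written
)

# Wide to long: every year column becomes a (year, value) pair of rows
long <- pivot_longer(wide, cols = -region, names_to = "year", values_to = "value")

# Long to wide: spread the year values back out into separate columns
wide_again <- pivot_wider(long, names_from = year, values_from = value)

print(long)
print(wide_again)
```

The long form has one row per region and year, which suits grouping and plotting; the wide form is often easier to read as a table.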
