Publications by Christian Thieme

Final Project - Student Performance Analysis - R Bridge

19.01.2020

Student Performance Analysis. Overview of Dataset: This dataset contains the test results for 1,000 high school students in math, reading, and writing. In addition, it contains the following socioeconomic dimensions: gender (male, female), race/ethnicity (given as groups A - E), parental level of education (some high school, high s...

Week 2 Assignment R Bridge

28.12.2019

Question 1: Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes: #importing libraries needed for assignment library(readr) library(plyr) library(dplyr) # documentation for dataset found at https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/doc/boot/clar...
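
A minimal base-R sketch of Question 1. It uses the built-in mtcars data as a stand-in, because the assignment's own dataset is only partly named above; the columns mpg and hp are illustrative choices, not the original attributes.

```r
# mtcars ships with base R and stands in for the assignment's dataset here
data(mtcars)
print(summary(mtcars[, c("mpg", "hp")]))   # min, quartiles, mean, max per column

mpg_mean   <- mean(mtcars$mpg)
mpg_median <- median(mtcars$mpg)
hp_mean    <- mean(mtcars$hp)
hp_median  <- median(mtcars$hp)

print(sprintf("mpg: mean = %.2f, median = %.2f", mpg_mean, mpg_median))
print(sprintf("hp:  mean = %.2f, median = %.2f", hp_mean, hp_median))
```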

Week 1 Assignment - R Bridge Course

18.12.2019

1. Write a loop that calculates 12-factorial: val = 1 for (i in 1:12) { val <- sum(i * val) } sprintf("12! = %f", val) ## [1] "12! = 479001600.000000" 2. Show how to create a numeric vector that contains the sequence from 20 to 50 by 5: num_vect <- seq(from = 20, to= 50, by= 5) class(num_vect) #display data type to ensure it is numeric ...
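
A more idiomatic take on the two snippets above: the loop updates the running product with val * i directly (the sum() wrapper in the excerpt is harmless but unnecessary), and base R's prod() and factorial() act as cross-checks.

```r
# 12! with an explicit loop: multiply the running product by each i
val <- 1
for (i in 1:12) {
  val <- val * i
}
print(sprintf("12! = %.0f", val))   # "12! = 479001600"

# Cross-checks against base R
stopifnot(val == prod(1:12), isTRUE(all.equal(val, factorial(12))))

# Numeric vector from 20 to 50 in steps of 5
num_vect <- seq(from = 20, to = 50, by = 5)
print(class(num_vect))   # "numeric"
print(num_vect)          # 20 25 30 35 40 45 50
```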

Week 3 Assignment DATA607 - R Character Manipulation

15.02.2020

Week 3 Assignment - R Character Manipulation. The assignment below introduces character extraction and manipulation in R using regular expressions. The examples show several ways to use regex, such as identifying rows of a data frame that contain certain words, extracting key data from messy datasets, and using ca...
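
A small base-R sketch of the first two regex tasks described above (filtering rows by keyword, extracting data from messy text), plus a capture-group substitution as an extra illustration. The data frame, the messy string, and the name being reordered are hypothetical toy inputs, not the assignment's data.

```r
# Hypothetical data frame: keep the rows whose text mentions DATA or STATISTICS
majors <- data.frame(
  major = c("DATA SCIENCE", "ECONOMICS", "STATISTICS AND DECISION SCIENCE", "HISTORY"),
  stringsAsFactors = FALSE
)
hits <- majors$major[grepl("DATA|STATISTICS", majors$major)]
print(hits)

# Pull every run of digits out of a messy string
messy <- "apple12banana7cherry105"
numbers <- as.integer(regmatches(messy, gregexpr("[0-9]+", messy))[[1]])
print(numbers)   # 12 7 105

# Capture groups: reorder "Last, First" into "First Last"
swapped <- sub("^(\\w+), (\\w+)$", "\\2 \\1", "Doe, Jane")
print(swapped)   # "Jane Doe"
```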

Week 5 Assignment DATA607 - Tidying and Transforming Data with tidyr

29.02.2020

Tidying and Transforming Data with tidyr. Introduction: The purpose of this assignment is to (1) demonstrate how to transform data between wide and long formats with tidyr; (2) demonstrate how to tidy messy data using tidyr, handling single entries spread over multiple lines and missing data; and (3) perform data analysis using ggplot. Our dataset includes arrival delays ...
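
A minimal tidyr sketch of the same moves: fill() repairs a column whose value is written only on the first of several lines, pivot_longer() goes from wide to long, and pivot_wider() goes back. The airline names and flight counts are made-up toy values, not the assignment's data, and the pivot_* functions assume tidyr 1.0 or later (older code often used gather() and spread()).

```r
library(tidyr)

# Hypothetical raw layout: each airline name appears only on its first line
raw <- data.frame(
  airline = c("A", NA, "B", NA),
  status  = c("on time", "delayed", "on time", "delayed"),
  LA      = c(500, 60, 700, 120),
  Seattle = c(1800, 300, 200, 60),
  stringsAsFactors = FALSE
)

# fill() copies the last non-missing value downward, repairing the airline column
wide <- fill(raw, airline)

# Wide -> long: one row per airline / status / city combination
long <- pivot_longer(wide, cols = c(LA, Seattle),
                     names_to = "city", values_to = "flights")

# Long -> wide with one column per status, then a derived delay rate
rates <- pivot_wider(long, names_from = status, values_from = flights)
rates$delay_rate <- rates$delayed / (rates$delayed + rates$`on time`)
print(rates)
```

The long format is usually the convenient one for ggplot, since city and status become ordinary columns that can be mapped to aesthetics.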

Using purrr::map() Instead of For Loops in R

26.03.2020

Using purrr::map() Instead of For Loops in R In many other programming languages, for loops are extremely important. However, R is a functional programming language, which means that R has the ability “to wrap up for loops in a function, and call that function instead of using the for loop directly” (R for Data Science, pg. 322). Many people ...
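
A side-by-side sketch of the idea, using the built-in mtcars data as an assumed example: both versions compute every column's mean, but map_dbl() hides the preallocation and indexing that the loop spells out.

```r
library(purrr)

# For loop: preallocate the output, then fill it one column at a time
col_means_loop <- vector("double", ncol(mtcars))
for (i in seq_along(mtcars)) {
  col_means_loop[[i]] <- mean(mtcars[[i]])
}
names(col_means_loop) <- names(mtcars)

# map_dbl(): apply mean() to each column and return a named double vector
col_means_map <- map_dbl(mtcars, mean)

print(all.equal(col_means_loop, col_means_map))   # TRUE
```

map_dbl() guarantees a double vector back; map() returns a list, and map_chr(), map_int() and map_lgl() cover other output types.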

Tidyverse Extend Assignment - Lubridate

11.04.2020

Dates are a frequent feature in data analysis and data science projects. In this vignette we'll look at the lubridate package and perform a few date transformations. library(tidyverse) library(lubridate) Load information from Kaggle's Hourly Energy Consumption dataset (link to data description): #energy <- read.csv('C:/Users/user/Documents/00_...
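
A few common lubridate transformations on a hypothetical vector of timestamps. The "YYYY-MM-DD HH:MM:SS" layout is an assumption about the energy data, since the file path above is truncated; the weekday labels from wday() depend on the system locale.

```r
library(lubridate)

# Hypothetical hourly timestamps (layout assumed, not taken from the Kaggle file)
stamps <- c("2018-01-01 00:00:00", "2018-01-01 13:00:00", "2018-07-04 18:00:00")

dt <- ymd_hms(stamps)                      # parse to POSIXct; time zone defaults to UTC

hours  <- hour(dt)                         # 0 13 18
months <- month(dt)                        # 1 1 7
days   <- wday(dt, label = TRUE)           # ordered factor of weekday names
starts <- floor_date(dt, unit = "month")   # round each time down to the 1st of its month

# Elapsed time between the first two readings, expressed in hours
gap <- as.numeric(difftime(dt[2], dt[1], units = "hours"))   # 13

print(data.frame(dt, hours, months, days, starts))
```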

DATA605 Week 1 Assignment - Vectors, Matrices, and Systems of Equations

29.08.2020

Problem Set 1: You can think of vectors as representing many dimensions of related information. For instance, Netflix might store all the ratings a user gives to movies in a vector. This is clearly a vector of very large dimension (in the millions) and very sparse, as the user might have rated only a few movies. Similarly, Amazon might store the ite...

DATA605 - Week 2 - Transpose Proof, Matrix Decomposition function

04.09.2020

Problem Set 1, Question 1: Proof: Matrix multiplication is NOT commutative in general. A full proof rests on the fact that square matrices form a non-commutative ring under multiplication, which is beyond the scope of this course. However, at its core, for all matrices besides two by tw...
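
A quick numeric check in R of the non-commutativity point, using a hypothetical 2 x 2 matrix: t(A) %*% A and A %*% t(A) differ for this A, while a symmetric matrix is one case where they agree (matrices satisfying A^T A = A A^T are called normal). A single example illustrates the claim; it does not prove it.

```r
# Hypothetical non-symmetric 2 x 2 matrix
A <- matrix(c(1, 2,
              3, 4), nrow = 2, byrow = TRUE)

AtA <- t(A) %*% A   # 10 14 / 14 20
AAt <- A %*% t(A)   #  5 11 / 11 25
print(AtA)
print(AAt)
print(isTRUE(all.equal(AtA, AAt)))   # FALSE

# Symmetric S: t(S) equals S, so both products reduce to S %*% S
S <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)
print(isTRUE(all.equal(t(S) %*% S, S %*% t(S))))   # TRUE
```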

DATA605 - Week 5 - Probability Distributions

25.09.2020

Probability Distributions Choose independently two numbers B and C at random from the interval [0, 1] with uniform density. Prove that B and C are proper probability distributions. Note that the point (B,C) is then chosen at random in the unit square. We will use the runif function in R to pick two numbers from the interval [0,1] with uniform de...
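
A minimal simulation sketch of the setup above. The seed, the sample size n, and the example event B + C < 1/2 are assumptions chosen for illustration; because (B, C) is uniform on the unit square, that probability equals the area of the triangle below the line b + c = 1/2, which is 1/8.

```r
set.seed(605)   # arbitrary seed so the draws are reproducible
n <- 10000      # assumed number of simulated points

B <- runif(n, min = 0, max = 1)
C <- runif(n, min = 0, max = 1)

# Every draw lies in [0, 1]; each tenth of the interval should hold roughly 10% of draws
print(range(B))
print(range(C))
print(table(cut(B, breaks = seq(0, 1, by = 0.1))) / n)

# The pairs (B, C) scatter evenly over the unit square
plot(B, C, pch = ".", asp = 1, main = "(B, C) uniform on the unit square")

# Monte Carlo estimate of P(B + C < 1/2); the exact value is 1/8 = 0.125
p_hat <- mean(B + C < 0.5)
print(p_hat)
```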
