Publications by Christian Thieme

DATA606 - Final Project - What factors are most predictive of stress in college students?

13.05.2020

Introduction The purpose of this statistical analysis is to answer the following question: Which factors are most predictive of depression in college students? Depression appears to be increasingly common among college students. Through several community service activities I regularly participate in, I have had, and continue to ha...

29861 sym R (22063 sym/94 pcs) 23 img 1 tbl

Project 4: Document Classification - Using Machine Learning to Build a SPAM Predictor

26.04.2020

Introduction The purpose of this project is to build a classification model that can accurately distinguish spam email messages from ham email messages. We will do this by using pre-classified email messages to build a training set and then build a predictive model to classify unseen email messages as either spam or ham. In order to build this predi...

11478 sym R (20969 sym/75 pcs) 11 img

Thieme-Proposal DATA606

18.04.2020

library(tidyverse)
library(psych)
Data Preparation
# load data
data <- readr::read_csv("https://raw.githubusercontent.com/christianthieme/MSDS-DATA606/master/Analysis%20Project/depression.csv")
head(data)
## # A tibble: 6 x 50
## inter_dom Region Gender Academic Age Age_cate Stay Stay_Cate Japanese
## <chr> <chr> <chr> <chr> ...

2134 sym R (3674 sym/27 pcs) 2 img

Recommender Systems Analysis - Udemy’s Recommender Engine

15.04.2020

Introduction E-Learning is an industry that has seen tremendous growth over the past decade. Continuing education, even past undergraduate and graduate degrees, is now the norm for many vocations, particularly in the technology field. With the rapid rise and shifts in technologies, it is important to constantly be learning and experimenting to ke...

11481 sym 7 img

Week 10 Assignment DATA607 - Sentiment Analysis

05.04.2020

Sentiment Analysis “Sentiment Analysis is the process of computationally identifying and categorizing opinions in a piece of text, especially in order to determine whether the writer’s attitude towards a particular topic, product, etc. is positive, negative, or neutral.” - Oxford Dictionary The purpose of this project is two-fold: First, to...

7618 sym R (16264 sym/67 pcs) 11 img

Week 9 Assignment - Working with Web APIs

27.03.2020

Introduction Being able to interact with and extract data from APIs is a critical skill for a data scientist. For this project, I will work with The New York Times website API. In looking at the documentation available to developers, there are several different APIs to choose from. As a father with two young daughters who love to read, the Bo...

4703 sym R (38195 sym/20 pcs)

Week 7 Assignment - Working with HTML, XML, and JSON in R

15.03.2020

Working with HTML, XML, and JSON in R Introduction The purpose of this project is to demonstrate knowledge of HTML, XML, and JSON, as well as how to parse and extract information from each. As part of this project I manually created three separate files: an HTML file, an XML file, and a JSON file. These files all contain the same information, bu...

5464 sym R (9589 sym/14 pcs) 3 img

Project 2 - Data Transformation

07.03.2020

Introduction The purpose of this project is to demonstrate the ability to transform data from various wide formats into a more digestible format for analysis. As part of the project, I will also clean/tidy the data and perform analysis. Below you will see three different data sets that were provided by fellow classmates. In addition to providing ...

9439 sym R (10964 sym/21 pcs) 8 img

Week 4 Project 1 - Chess Tournament Data - Regular Expressions

23.02.2020

Project 1: Chess Tournament Data In this project we will take a raw text file containing the results of a chess tournament, extract key information from the file, and perform some calculations. What makes this project particularly challenging is that each entry in the file (a single chess player) has data points spanning two rows. Our task will ...

7439 sym R (7865 sym/18 pcs) 5 img

DATA607 Week 1 Assignment

02.02.2020

Where Does America Stand on Gun Policy? Overview of the Data This dataset contains polling results (in the form of “percent in favor”) for 8 questions related to gun policy in the US. The dataset was gathered by FiveThirtyEight as part of their article “Do You Know Where America Stands On Guns?” The original dataset can be found on their GitH...

5846 sym R (3754 sym/11 pcs) 3 img