Publications by Eric Lehmphul

DATA 622: Homework 4

23.12.2022

DATA622: Homework 4. Contents: Objective; Data Source; Read in Data; Glimpse at Data; Normalize Data; Explore Data; Frequency of Labels; Explore Images; Machine Learning; XG Boost; CNN; Which Performed Better? (XG Boost vs CNN); Conclusion. Eric Lehmphul, 12/12/2022. library(tidyverse) library(te...

8346 sym 7 img

DATA606: Final Project

09.12.2021

Final Project: Cardiovascular Disease. Contents: Abstract; Libraries; Part 1 - Introduction; Part 2 - Data; Part 3 - Exploratory data analysis; Part 4 - Logistic Regression; Part 5 - Conclusion; References. Eric Lehmphul, 11/30/2021. Abstract: Cardiovascular disease is a major concern arou...

8210 sym R (11537 sym/25 pcs) 5 img 2 tbl

Project 4: Document Classification

14.11.2021

DATA 607: Project 4. Contents: Project Task Overview; Steps taken to Classify Emails as Ham or Spam; Obtain Data; Read and Store text in the Files; Preprocess the emails to more easily classify; Creating Training and Testing Data; Classify Emails; Conclusion. Eric Lehmphul, 11/13/2021. library(tidyverse) library(R.utils) library(tm) lib...

4155 sym R (4778 sym/21 pcs)

DATA 607: Discussion 11

04.11.2021

Pandora Music: Pandora is a popular music streaming service similar to Spotify, Amazon Music, and Apple Music. Pandora offers a highly personalized listening experience to each user through the Music Genome Project and Podcast Genome Project (“About Pandora”). Scenario Analysis: Who are Pandora’s target users? Pandora targets anyone who enj...

3637 sym

DATA 606: Project Proposal

01.11.2021

Data Preparation: The dataset was retrieved from Kaggle: https://www.kaggle.com/sulianova/cardiovascular-disease-dataset. It covers cardiovascular disease and related variables of interest. library(tidyverse) library(GGally) # load data url <- "https://raw.githubusercontent.com/SaneSky109/DATA606/main/Data_Project/Data/cardio_train....

6625 sym R (8343 sym/35 pcs) 17 img

DATA 607: Data Science in Context

27.10.2021

DATA 607: Data Science in Context - K Nearest Neighbors Algorithm. Eric Lehmphul, 10/27/2021. KNN: Supervised learning algorithm used in both classification and regression problems. In classification problems, new data points will be classified in a particular class. In regression problems, new data points will be labeled based on the average va...

2665 sym R (4994 sym/18 pcs) 5 img

DATA 607 Tidyverse CREATE: stringr

25.10.2021

Using stringr to handle Character and String Data. Overview: This is a brief overview of the stringr package from Hadley Wickham’s Tidyverse. Strings and characters are frequent data types that a data scientist encounters. The stringr package simplifies data manipulation involving string and character data types. Below are a handful of useful fu...

3349 sym R (5854 sym/24 pcs)

Team 2: Project 3 Part 1

10.10.2021

Team Collaboration: Our team decided to use Slack for written communication, Zoom for team meetings, and Asana to assign tasks to individuals and set deadlines. Code and documents will be shared through two GitHub repositories: https://github.com/baruab/msdsrepo/tree/main/Project_3_607 https://github....

2315 sym R (490 sym/1 pcs) 1 img

DATA 607: Project 2

03.10.2021

Introduction: The goal of this assignment is to provide practice in preparing different datasets for downstream analysis work. The three datasets were chosen from the Discussion 5 board. Student Testing: This dataset was provided by me, Eric Lehmphul. It is a toy dataset that I created which holds student name, student’s testing score...

4651 sym R (15184 sym/59 pcs) 3 img

Week 5 HW

26.09.2021

Overview: In this assignment, I cleaned the untidy COVID data set provided and performed the analysis necessary to answer the questions provided in the spreadsheet. Code: Data Cleaning and Preprocessing. Read the data from GitHub: library(tidyverse) ## -- Attaching packages --------------------------------------- tidyverse 1.3.1 -- ## v ggplot2 3....

2166 sym R (4482 sym/23 pcs)