Publications by Jeff Shamp

Spam Ham Classifier - Data 607 - Project 4

10.04.2020

Spam vs Ham - Project 4 - Data 607 Jeff Shamp 2020-04-13 Cleaning and Data Preparation First, we will import, clean, and process the data for classification. Datasets I found a large library of datasets here. Below is an excerpt from the site regarding a library of resources. The email spam messages are collected from: The ENRON email archive...

7368 sym R (9840 sym/27 pcs) 1 img

607 HW 10 - Moby Dick

01.04.2020

607 - HW 10 - Text Mining Jeff Shamp 2020-04-03 Text Mining We are to start this assignment by producing a working version of the example used in “Text Mining with R”, chapter 2. Base Code First let’s simply reproduce the code so that things are in the correct place. tidy_books <- austen_books() %>% group_by(book) %>% mutate(linenumbe...

4612 sym R (5731 sym/16 pcs) 5 img

tidyverse_extent

31.03.2020

GGPlot Recipe Extension Jeff Shamp via Sam Bellows 2020-04-01 if (!file.exists('president_approval_polls.csv')){ download.file('https://projects.fivethirtyeight.com/polls-page/president_approval_polls.csv', 'president_approval_polls.csv') } df <- read.csv('president_approval_polls.csv') GGPlot Vignette GGPlot is centered around its base functi...

4104 sym R (3276 sym/21 pcs) 11 img

tidyverse_create

31.03.2020

dbplyr - Tidyverse - Data 607 Jeff Shamp 2020-03-31 dbplyr - Out-Source Your Data With TidyVerse dbplyr is a great tool for unloading heavy amounts of data from RStudio and onto a SQL server. dbplyr is a good choice when: You have a data-heavy environment that is slowing down your local machine Your knits are taking forever You have many files th...

3778 sym R (3140 sym/11 pcs)

data 607 project 2 part 3 - Migration

02.03.2020

Tidy Data - Project 2 Part 3 - UN Migration Data Jeff Shamp 2020-03-08 Using tidyr, dplyr, dbplyr, RMySQL, ggplot2, stringr, RCurl UN Immigration Data I will be using the data source from Subhalaxmi Rout regarding UN Migration Data. This data set is huge, so I’m only looking at one table, which corresponds to the year 1990. Once a pipeline ca...

2070 sym R (5354 sym/11 pcs)

data 607 HW 4 - flights and tidy data

26.02.2020

Data 607 HW4 - tidy data Jeff Shamp 2020-03-01 Task - tidy the data I did some casual code reviews with Sam, Angel, and Layla and picked up a few ideas about how to tidy data from them. For this assignment we are to load in a .csv file with the data shown from Blackboard and tidy the data from wide to long form using tidyr and dplyr. Then perfor...

2955 sym R (3907 sym/9 pcs) 3 img
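
The wide-to-long step described in the HW4 entry above can be sketched roughly as follows. This is a minimal illustration, not the author's code: the data frame `wide`, its column names (`airline`, `status`, `city_x`, `city_y`), and all values are hypothetical stand-ins for the Blackboard .csv, which is not reproduced in the excerpt.

```r
library(tidyr)
library(dplyr)

# Hypothetical wide-form data: one column per destination city.
wide <- data.frame(
  airline = c("A", "A", "B", "B"),
  status  = c("on time", "delayed", "on time", "delayed"),
  city_x  = c(100, 10, 200, 40),
  city_y  = c(50, 5, 80, 20)
)

# Wide -> long: one row per airline/status/city combination.
long <- wide %>%
  pivot_longer(
    cols      = c(city_x, city_y),
    names_to  = "city",
    values_to = "flights"
  )

# Example analysis on the tidy form: total flights per airline and status.
summary_tbl <- long %>%
  group_by(airline, status) %>%
  summarise(total = sum(flights), .groups = "drop")
```

Once the data is in long form, grouping by any combination of `airline`, `status`, and `city` needs no further reshaping.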

HW2 - PS1 & PS2

06.02.2020

Problem Set 1 Problem 1 - Prove \(A^{T}A\neq AA^{T}\) Proof by contradiction: Assume: \(A^{T}A = AA^{T}\) Then for rows i in A and columns j in \(A^{T}\), \(A_{i1}\cdot A^{T}_{1j} = A^{T}_{i1}\cdot A_{1j}\). This is true if and only if \(A_{i1} = A^{T}_{1j}\), such that \(A = A^{T}\). Therefore, the assumption is false outside of the rare case of i...

1393 sym R (825 sym/5 pcs)
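
The claim in Problem 1 of the entry above can be checked numerically with a single counterexample. The sketch below is illustrative only: the matrices `A` and `S` are hypothetical choices, not taken from the problem set.

```r
# Hypothetical non-symmetric matrix used as a counterexample.
A <- matrix(c(1, 2,
              0, 1), nrow = 2, byrow = TRUE)

AtA <- t(A) %*% A   # A^T A
AAt <- A %*% t(A)   # A A^T

# The two products differ for this A.
products_equal <- isTRUE(all.equal(AtA, AAt))

# A symmetric matrix (S equal to its transpose) does give equal products.
S <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)
symmetric_equal <- isTRUE(all.equal(t(S) %*% S, S %*% t(S)))
```

Here `products_equal` is `FALSE` and `symmetric_equal` is `TRUE`, so the products differ in general but agree when the matrix equals its transpose.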

jshamp_data_605_HW4

20.02.2020

Problem Set 1 Starting with matrix A as described in the assignment. A<- matrix(c(1,2,3,-1,0,4), byrow=T, nrow=2) A ## [,1] [,2] [,3] ## [1,] 1 2 3 ## [2,] -1 0 4 Computing X and Y, \(X=AA^{T}\) and \(Y=A^{T}A\) X<- A%*%t(A) Y<- t(A)%*%A Using the built-in functions in R to compute the eigenvectors. Showing the eigenvectors ...

1316 sym R (2184 sym/30 pcs)
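
A minimal sketch of the computation described in the entry above, using the same matrix A. The comparison against `svd()` is an assumption about where the assignment goes next; the excerpt is truncated before that point.

```r
# Matrix A as given in the excerpt.
A <- matrix(c(1, 2, 3, -1, 0, 4), byrow = TRUE, nrow = 2)

X <- A %*% t(A)   # 2 x 2
Y <- t(A) %*% A   # 3 x 3

# Both products are symmetric, so eigen() returns real eigenvalues
# sorted in decreasing order.
eig_X <- eigen(X, symmetric = TRUE)
eig_Y <- eigen(Y, symmetric = TRUE)

# Assumed next step: the squared singular values of A should match
# the eigenvalues of X and the nonzero eigenvalues of Y.
sv <- svd(A)
squared_singular <- sv$d^2
```

The columns of `eig_X$vectors` and `eig_Y$vectors` hold the eigenvectors. Individual eigenvectors may differ in sign from the corresponding columns of `sv$u` and `sv$v`, since an eigenvector is only determined up to a scalar multiple.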

Data 607 project 2 part 1- Bank Stocks

02.03.2020

Tidy Data - Project 2 Part 1 - Bank Stocks Jeff Shamp 2020-03-08 Using tidyr, dplyr, ggplot2, stringr, RCurl Tidy Data I will be using the data source from data I posted since I never had to tidy it (thanks to python/pandas). Load Data b<-getURL( "https://raw.githubusercontent.com/Shampjeff/cuny_msds/master/DATA_607/data/banks.csv") df_banks<-...

2705 sym R (3191 sym/7 pcs) 3 img

data 607 project 2 part 2 - Unicef

02.03.2020

Tidy Data - Project 2 part 2 - UNICEF Jeff Shamp 2020-03-08 Using tidyr, dplyr, ggplot2, stringr, RCurl Tidy Data I will be using the data source from Sam Bellows regarding UNICEF child mortality rates Load Data Looking at the initial state of the data. u<-getURL( "https://raw.githubusercontent.com/Shampjeff/cuny_msds/master/DATA_607/data/unic...

3084 sym R (2085 sym/7 pcs) 4 img