Publications by Leo Yi & Christopher Bloome

Data607 Final Presentation

12.05.2020

Introduction: We want to research basic education and its impact on different aspects of society. There’s a common understanding that education is beneficial for society and that, as people become more educated, society becomes more civilized. This is paired with the idea that when evaluating a scale of animal instincts to conscious...

Data607 Final Project Draft_20200502

03.05.2020

Introduction: We want to research basic education and its impact on different aspects of society. There’s a common understanding that education is beneficial for society and that, as people become more educated, society becomes more civilized. This is paired with the idea that when evaluating a scale of animal instincts to conscious...

Data606 Project

02.05.2020

May 2, 2020. Part 1 - Introduction: Does the GDP per capita of a country in a given year predict the number of doctors per thousand people? In this presentation, we’ll explore the idea that richer countries have more doctors. It seems like a fair assumption, but let’s use data to determine whether it is actually true. If there is a rel...

Data607 Project 4

21.04.2020

Spam or Ham: Today, we’ll be looking at a group of spam and ham (non-spam) messages. We’ll gather the data into a single dataframe and then split that dataset into two sections: one to train a model and another to test it. The files were downloaded from here according to the project instructions found on Blackboard and decompressed into m...

Data607 Assignment 12

14.04.2020

YouTube Recommender Systems: YouTube is undoubtedly the best-known online video content provider. Multiple recommender systems contribute to YouTube’s success. The goal is to keep users engaged for as long as possible and maximize the number of ads they see. A few of the systems they use to recommend content include: What con...

Data607 Assignment 10

28.03.2020

tidytextmining ch 2: This week’s assignment is to run the primary code for chapter 2 of ‘Text Mining with R’, which can be found here. First, we’ll run the code found on the site to demonstrate sentiment analysis in R on books written by Jane Austen. Afterwards, we’ll extend the practice using our own example and another sentiment lexicon....

Tidyverse Create

27.03.2020

Universe? Tidyverse! Today, we’ll be demonstrating some of the uses of the tidyverse. We’ll be using dplyr, tidyr, stringr, and ggplot2 to take a dataset and perform some exploratory analysis. The tidyverse is a wonderful set of tools for taking a dataset of any type, transforming it, and presenting findings. Being flexible and powerful allows user...

Data607 Assignment 9

24.03.2020

NYT Movie Reviews: This week’s assignment is to access the NYT web API. We’ll get data in JSON format and convert it into a dataframe. Let’s load the packages we’ll use: library(httr) library(jsonlite) library(dplyr) ...

Data607 Project 3

22.03.2020

Overview: What are the most valued data science skills? The dataset used to answer this question was sourced from Kaggle: https://www.kaggle.com/elroyggj/indeed-dataset-data-scientistanalystengineer. It contains information from 5,715 data-science-related job postings on the job-listings site Indeed. It includes information like the job title, d...

Project 3 test run

21.03.2020

3/11/2020 Soft Skills
# Read the soft-skill keywords into a character vector named 'keywords'
keywords <- unname(unlist(read.csv("https://raw.githubusercontent.com/chilleundso/DATA607/master/Project3/softskills.csv", stringsAsFactors = FALSE)))
# Initiates a new column for each keyword; also builds an index of those columns
df[keywords] <- NA
keywordColIndex <- seq(length(df)-length(keywords)+1,length(df...