Publications by Leo Yi & Christopher Bloome
Data607 Final Presentation
Introduction We want to research the association of basic education and its impact on different aspects of society. There’s a common understanding that education is beneficial for society and that as people become more educated, society becomes more civilized. This is paired with the idea that when evaluating a scale of animal instincts to conscious...
17646 sym R (34326 sym/104 pcs) 37 img
Data607 Final Project Draft_20200502
Introduction We want to research the association of basic education and its impact on different aspects of society. There’s a common understanding that education is beneficial for society and that as people become more educated, society becomes more civilized. This is paired with the idea that when evaluating a scale of animal instincts to conscious...
11103 sym R (19577 sym/58 pcs) 18 img
Data606 Project
May 2, 2020 Part 1 - Introduction Does the GDP per capita of a country in a given year predict the number of doctors per thousand people? In this presentation, we’ll explore the idea that richer countries have more doctors. It seems like a fair assumption, but let’s use data to determine whether this is actually true. If there is a rel...
3906 sym R (2484 sym/9 pcs) 5 img
Data607 Project 4
Spam or Ham Today, we’ll be looking at a group of spam and ham (not spam) messages. We’ll gather the data into a single dataframe and then separate that dataset into two sections: one to train a model and another to test it. The files were downloaded from here according to the project instructions found on Blackboard and decompressed into m...
1541 sym R (2201 sym/10 pcs)
Data607 Assignment 12
YouTube Recommender Systems YouTube is undoubtedly the best-known online video content provider. There are multiple recommender systems in play contributing to YouTube’s success. The goal is to keep users engaged for as long as possible and maximize the ads that they see. A few of the systems they use to recommend content include: What con...
5934 sym
Data607 Assignment 10
tidytextmining ch 2 This week’s assignment is to run the primary code for chapter 2 of ‘Text Mining with R’, which can be found here. First, we’ll run the code found on the site to demonstrate sentiment analysis in R on books written by Jane Austen. Afterwards, we’ll extend the practice using our own example and another sentiment lexicon....
1386 sym R (11863 sym/69 pcs) 9 img
Tidyverse Create
Universe? Tidyverse! Today, we’ll be demonstrating some of the uses of the tidyverse. We’ll be using dplyr, tidyr, stringr, and ggplot2 to take a dataset and perform some exploratory analysis. The tidyverse is a wonderful tool for taking a dataset of any type, transforming it, and presenting findings. Being flexible and powerful allows user...
5508 sym R (5764 sym/25 pcs) 4 img
Data607 Assignment 9
NYT Movie Reviews This week’s assignment is to access the NYT web API. We’ll get data in JSON format and convert it into a dataframe. Let’s load the packages we’ll use: library(httr) library(jsonlite) ## Warning: package 'jsonlite' was built under R version 3.6.3 library(dplyr) ## ## Attaching package: 'dplyr' ## The following objects ...
717 sym R (4295 sym/11 pcs)
Data607 Project 3
Overview What are the Most Valued Data Science Skills? The dataset used to answer this question was sourced from Kaggle: https://www.kaggle.com/elroyggj/indeed-dataset-data-scientistanalystengineer. It contains information from 5,715 data science-related job postings on the job-listings site Indeed. It includes information like the job title, d...
4796 sym R (29956 sym/89 pcs) 34 img 3 tbl
Project 3 test run
3/11/2020 Soft Skills #keywords as 'keywords' keywords<-unname(unlist(read.csv("https://raw.githubusercontent.com/chilleundso/DATA607/master/Project3/softskills.csv", stringsAsFactors = FALSE))) #Initiates new columns for each keyword; also makes list index list df[keywords] <- NA keywordColIndex <- seq(length(df)-length(keywords)+1,length(df...
912 sym R (12169 sym/48 pcs) 11 img 1 tbl