Publications by Keith Colella

Fuzzy Join Vignette

19.04.2023

library(tidyverse)
library(fuzzyjoin)

Overview

This vignette introduces the fuzzyjoin package, which joins two datasets based on imperfect matches. This is especially helpful when combining datasets that lack a shared unique key. We will use data related to candidates running in the 2022 election for the House of Representatives. Specific...

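To make the idea of an imperfect-match join concrete, here is a minimal base-R sketch. It pairs rows whose keys fall within a small edit distance, using `utils::adist()` (generalized Levenshtein distance). This is a swapped-in illustration of the concept, not fuzzyjoin's own implementation. The helper `fuzzy_inner_join()`, the toy tables and the `max_dist = 2` threshold are all hypothetical.

```r
# Hypothetical helper (not part of fuzzyjoin): inner-join x and y on a text
# key, keeping every pair whose Levenshtein distance is <= max_dist
fuzzy_inner_join <- function(x, y, by, max_dist = 2) {
  # Distance matrix: rows index keys in x, columns index keys in y
  d <- adist(x[[by]], y[[by]], ignore.case = TRUE)
  # Row/column positions of every pair within the threshold
  hits <- which(d <= max_dist, arr.ind = TRUE)
  left <- x[hits[, "row"], , drop = FALSE]
  right <- y[hits[, "col"], setdiff(names(y), by), drop = FALSE]
  rownames(left) <- NULL
  rownames(right) <- NULL
  out <- cbind(left, right)
  out[[paste0(by, ".y")]] <- y[[by]][hits[, "col"]]  # matched key from y
  out$dist <- d[hits]                                 # edit distance of each match
  out
}

# Hypothetical toy tables whose names are spelled slightly differently
candidates <- data.frame(
  name  = c("Jane Smith", "John O'Brien", "Maria Garcia"),
  state = c("NY", "CA", "TX"),
  stringsAsFactors = FALSE
)
filings <- data.frame(
  name     = c("Jane Smyth", "John OBrien", "Mario Lopez"),
  district = c("NY-01", "CA-12", "TX-07"),
  stringsAsFactors = FALSE
)

matched <- fuzzy_inner_join(candidates, filings, by = "name", max_dist = 2)
print(matched)
```

In practice, fuzzyjoin's `stringdist_inner_join(x, y, by = "name", max_dist = 2)` covers the same case. It also supports other distance methods through the stringdist package.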

Week 10 - NLP and “Text Mining with R”

02.04.2023

Reperformance of Textbook Exercises

We begin by re-performing the text mining and sentiment analysis from Chapter 2 of Silge and Robinson’s “Text Mining with R” (https://www.tidytextmining.com/). I’ve directly leveraged the code and snippets of explanatory text from their book. In Part 2, I’ll complete the assignment by extending the an...

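Chapter 2 of the book scores text in two steps. It inner-joins word tokens against a sentiment lexicon (such as Bing), then nets positive counts against negative ones. This base-R sketch mirrors that pipeline so it runs without tidytext, standing in for `unnest_tokens()` and `inner_join()`. The mini-lexicon and the documents are hypothetical placeholders, not the book's data.

```r
# Hypothetical mini-lexicon standing in for a real one such as Bing
lexicon <- data.frame(
  word      = c("good", "great", "happy", "bad", "terrible", "sad"),
  sentiment = c(rep("positive", 3), rep("negative", 3)),
  stringsAsFactors = FALSE
)

# Hypothetical documents (e.g. tweets or book sections), one per row
docs <- data.frame(
  doc_id = 1:3,
  text   = c("A great, happy day", "Bad news and sad faces", "Good but terrible ending"),
  stringsAsFactors = FALSE
)

# Tokenize: one row per lowercase word, keeping the document id
tokens <- do.call(rbind, lapply(seq_len(nrow(docs)), function(i) {
  words <- unlist(strsplit(tolower(docs$text[i]), "[^a-z']+"))
  data.frame(doc_id = docs$doc_id[i], word = words[words != ""],
             stringsAsFactors = FALSE)
}))

# Inner join keeps only words that appear in the lexicon
scored <- merge(tokens, lexicon, by = "word")

# Count positive and negative words per document, then take the difference
counts <- table(factor(scored$doc_id, levels = docs$doc_id),
                factor(scored$sentiment, levels = c("negative", "positive")))
net <- counts[, "positive"] - counts[, "negative"]
print(net)
```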

Week 10 - NLP and Congressional Candidate Tweets

31.03.2023

Assignment

We’ve reviewed the sentiment analysis from Chapter 2 of Silge and Robinson’s “Text Mining with R” (https://www.tidytextmining.com/). Now, I’ll perform similar analysis on another corpus. The corpus I’ve chosen is a collection of ~280,000 tweets from 424 congressional candidates from the 2022 election cycle (hosted here on ...


Data607 - Week 9 - Web APIs

27.03.2023

Assignment

Your task is to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it into an R DataFrame.

Setup

In addition to standard tidyverse usage, we’ll leverage the jsonlite package to query the API, lubridate to format dates, and kableExtra to display our results.

library(tidyverse)
li...


Week 7 - Web Technologies

13.03.2023

library(tidyverse)
library(rvest)
library(xml2)
library(jsonlite)

HTML

Read in the file

html <- read_html('https://raw.githubusercontent.com/kac624/cuny/main/D607/data/week7_books.html')

Explore

html %>% html_elements('title')
## {xml_nodeset (1)}
## [1] <title>This page has a table for D607.</title>
html %>% html_elements('td')
## {xml_nodese...


Project 2 - Dataset 2

06.03.2023

library(tidyverse)
library(reshape2)

Read in CSV TBD

data <- read_csv('https://github.com/jwilber/Bob_Ross_Paintings/raw/master/data/bob_ross_paintings.csv')
## New names:
## Rows: 403 Columns: 28
## ── Column specification
## ──────────────────────────────────────...


Project 2 - Dataset 3

06.03.2023

library(tidyverse)
library(reshape2)

Read in CSV TBD

data <- read_csv('https://github.com/kac624/cuny/raw/main/D607/data/healthcare_empl.csv', skip = 3)
## New names:
## Rows: 48 Columns: 15
## ── Column specification
## ──────────────────────────────────...


Project 2 - Dataset 1

06.03.2023

library(tidyverse)
library(arrow)
library(lubridate)
library(sf)
library(cowplot)

Introduction and Exploratory Data Analysis

I’ll focus on a massive dataset detailing all taxi rides in New York City since 2009. The data is maintained by the NYC government at the following site: https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page ...


Extra Credit - ELO Calculations

28.02.2023

library(tidyverse)

Assignment

Based on the difference in ratings between the chess players and each of their opponents in our Project 1 tournament, calculate each player’s expected score (e.g. 4.3) and the difference from their actual score (e.g. 4.0). List the five players who most overperformed relative to their expected score, and the five pl...

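The expected score in this assignment follows from the standard Elo model. A player rated R_A facing an opponent rated R_B is expected to score E = 1 / (1 + 10^((R_B - R_A) / 400)) in that game. Summing E over every opponent gives the tournament-level expected score, which can then be compared with the actual score. This base-R sketch assumes that standard 400-point formula. The function names and all ratings and scores are hypothetical, not taken from the Project 1 data.

```r
# Expected score of a single game under the standard Elo model (400-point scale).
# Vectorized: opponent_rating may be a vector of ratings.
elo_expected <- function(player_rating, opponent_rating) {
  1 / (1 + 10^((opponent_rating - player_rating) / 400))
}

# Expected tournament score: sum of per-game expectations over all opponents
expected_total <- function(player_rating, opponent_ratings) {
  sum(elo_expected(player_rating, opponent_ratings))
}

# Hypothetical example: a 1600-rated player who faced four opponents
opponents <- c(1400, 1600, 1800, 1500)
expected  <- expected_total(1600, opponents)
actual    <- 2.5  # hypothetical actual points scored

print(round(expected, 2))           # expected score
print(round(actual - expected, 2))  # positive = overperformed
```

Because `elo_expected()` is vectorized, one call covers every opponent. Applied per player, for example with `sapply()`, it yields the expected-versus-actual differences needed to rank over- and underperformers.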

Week 5 - Tidying and Transforming Data

27.02.2023

library(tidyverse)
library(reshape2)
library(scales)

Read in CSV

First, I’ll read in the data from GitHub. The flights data comes in .csv format, formatted exactly as provided in the assignment.

data <- read_csv('https://raw.githubusercontent.com/kac624/cuny/main/D607/data/week5_flights.csv')
## New names:
## Rows: 5 Columns: 7
## ── Co...
