Publications by Sam Reeves

Text Mining with Jane Austen

17.04.2021

The task I’m going to take some code that gives a basic example of text mining with Jane Austen novels and extend it to a new corpus and a new lexicon. I’ll also bring in a new lexicon for sentiment analysis in another file. The legal bit The Jane Austen code is from Chapter 2: Looks at Sentiment Analysis in Text Mining with R by Julia Silge ...

2432 sym R (5629 sym/25 pcs) 5 img

Text Mining with Cartman

17.04.2021

I have found some domain-specific lexicons created from some of the most popular Reddit communities, made available by William Leif under the Apache License v2.0. https://github.com/williamleif/socialsent We will be using Eric Cartman’s dialogue from seasons 1-19 of South Park to compare these lexicons. This information is available at https://gith...

1732 sym R (4411 sym/18 pcs)

absoluteBagels

11.04.2021

Picking an API After getting an API key from developer.nytimes.com/apis, I have opted to use the Article Search API. I want to know about all the NYTimes articles concerning the very center of the universe: Absolute Bagels on the UWS. When I left NYC, I would sometimes wake up from bagel dreams, chewing on a wet pillow. Maybe this will help me re...

948 sym R (3303 sym/6 pcs)

Document

22.03.2021

In this assignment, we will compare the data frame objects created by different file formats. At this point, I can’t really tell if I love or hate JSON data (it’s got to be one or the other). I suppose we will find out. library(RCurl) library(jsonlite) library(rvest) library(XML) library(janitor) Here are the files…. The JSON data loads qui...

352 sym R (2185 sym/7 pcs)

tidyAndTransform

06.03.2021

Load libraries and import .csv library(tidyr) library(dplyr) dat <- read.csv("https://raw.githubusercontent.com/TheWerefriend/data607/master/week5/numbersense.csv") Reshape the data Separate the two airlines, and create another column of totals. Create a tibble with all the data, and preserve the city names as a column. alaska <- t(dat[1:2, 3:7]...

468 sym R (967 sym/4 pcs)

Data Science in Context

03.03.2021

Kyats: The Worst Currency Ever Kyat is the national currency in Myanmar. The Central Bank of Myanmar sets one daily exchange rate for each of 38 currencies. Some days it does not issue a new rate. The country has no real credit system. I once helped my friend’s father make a down payment on some real estate… We borrowed 7 garbage bags full of cash...

2489 sym R (5260 sym/9 pcs)

project2

16.03.2021

Exposition On 2/2/2021, a coup d’état saw the arrest of the elected leader of Myanmar, Aung San Suu Kyi. The national military body of the country, which had reserved for itself “25%” political power under the nominal democratic regime, took over many key industries and government departments over which it previously had no jurisdiction. Th...

5193 sym R (6331 sym/22 pcs)

DOGE_USD

11.04.2021

We’re going to showcase some features of various packages from the Tidyverse! Here we have data concerning the DogeCoin/USD exchange rates according to Yahoo Finance from 2014 to roughly the present. Using magrittr, dplyr, lubridate, and ggplot2, this should be a snap. Let’s get started. Load the data. rates <- read.csv("https://raw.githubus...

1256 sym R (1259 sym/16 pcs) 2 img

(Not) Predicting Eth/USD

16.05.2021

The Premise We want to predict the performance of Ethereum against USD using historical data. Generally, this practice is frowned upon because each new trading day changes the business cycles that produce following rates. However, for the sake of this assignment, instead of using regressors, we are going to try to create a model based on categori...

4214 sym R (6437 sym/28 pcs) 3 img

Spam Hunter

01.05.2021

The Project We have been tasked with creating a model to correctly separate spam emails from ham emails. Apparently, ham is industry jargon for the emails you actually want to receive. We will try to build a model using boilerplate text analysis. First, we create a corpus and document-term matrix from the bank of labeled emails. The Data SpamAssa...

2184 sym R (5257 sym/18 pcs)