Publications by Keith Rafferty

Final_project_Data607

09.12.2024

Overview of Data Set: Whiskey Ratings
It’s the season of fine food and drinks, especially spirits, which are both suitable gifts and celebratory libations. There are hundreds, even thousands, of different whiskeys on the market, encompassing a range of types, qualities, and tastes. Price point, in particular, is of interest since it’s not al...

Project4

02.12.2024

Assignment Overview and Data Import
The goal here is to develop a classification model for determining whether a message is spam (i.e., dangerous) or ham (i.e., safe). I found and selected a spam and ham dataset of 5,572 text messages freely available via Kaggle (https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset), which I downloaded ...

Assignment_discussion_11

11.11.2024

Design Scenario for YouTube’s Recommender System
YouTube is among the most popular websites in the world, perhaps rivaled only by Google in terms of active users and site visits per month. Given this prominent position, YouTube’s system for making video recommendations has been under the spotlight in recent years, receiving particular criti...

Assignment_10

04.11.2024

Sentiment Analysis of the Inaugural Speeches of Presidents Barack Obama and Donald Trump
It is the eve of another polarizing presidential election in the United States. The winner of the election will be inaugurated on January 20, 2025, and will then have the opportunity to speak to the nation for the first time as President via the inaugural add...

Week9_assignment_KJR

27.10.2024

Overview
I set up an account with the NYT’s developer site and received a key to use with their APIs. For my API request, I selected a request for the articles most shared on Facebook over the last day. The data is successfully requested, converted to JSON format, and then converted into a tibble data frame. knitr::opts_chunk$set(echo = TRUE) library...
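A minimal base-R sketch of how such a request URL might be assembled. It assumes the NYT Most Popular API's `shared/{period}/{share_type}.json` endpoint pattern and its `api-key` query parameter; `build_most_shared_url` is a hypothetical helper and `"YOUR_KEY"` is a placeholder. The network call and the JSON-to-tibble conversion are deliberately left out so the block runs offline.

```r
# Hypothetical helper: build the NYT "most shared" request URL.
# Assumed endpoint pattern: /svc/mostpopular/v2/shared/{period}/{share_type}.json
build_most_shared_url <- function(api_key, period = 1, share_type = "facebook") {
  # Assumption: the API offers look-back windows of 1, 7, or 30 days
  stopifnot(period %in% c(1, 7, 30))
  paste0(
    "https://api.nytimes.com/svc/mostpopular/v2/shared/",
    period, "/", share_type, ".json",
    # Percent-encode the key in case it contains reserved characters
    "?api-key=", utils::URLencode(api_key, reserved = TRUE)
  )
}

url <- build_most_shared_url("YOUR_KEY")  # placeholder, not a real key
print(url)
```

Keeping URL construction in a small function makes it easy to switch to a 7- or 30-day window without retyping the endpoint.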

Assignment_7

14.10.2024

Overview
The manually formatted HTML, XML, and JSON files contain information for three books. Each book entry contains the title of the book, the author(s), the page count, and the year the book was first published. Using various packages, the data from each file structure is retrieved from GitHub, imported, and then converted into a data fram...

Project2_World_dev_indicators

07.10.2024

Overview of Tidying and Example Analysis
The data set has a few opportunities for tidying. The column “Country_Code”, which contains a three-letter code to indicate the country, is dropped since it is both redundant with and less informative than the preceding “Country_Name” column. The column “Series_Name” is dropped because it is too...
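A minimal base-R sketch of the column drops described above. It uses a small hypothetical data frame: only the `Country_Name`, `Country_Code`, and `Series_Name` columns come from the text, while the `Value` column and all cell values are placeholders. The original analysis may use different packages (for example dplyr's `select()`).

```r
# Hypothetical rows mimicking the layout described in the text
wdi <- data.frame(
  Country_Name = c("Country A", "Country B"),
  Country_Code = c("AAA", "BBB"),
  Series_Name  = c("Indicator X", "Indicator X"),
  Value        = c(1.0, 2.0),   # placeholder numbers, illustration only
  stringsAsFactors = FALSE
)

# Drop the redundant code column and the Series_Name label
drop_cols <- c("Country_Code", "Series_Name")
wdi_tidy  <- wdi[, setdiff(names(wdi), drop_cols), drop = FALSE]

print(names(wdi_tidy))
```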

Assignment5_resub

07.10.2024

Data Import, Tidying, and Transformation
The data is imported as a CSV file and converted to a data frame. The third row is dropped from the data frame because it is an empty row that served as a spacer between the data for the individual airlines in the CSV file. Next, the airline names are duplicated to fill in the missing values in rows 2 an...
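A minimal base-R sketch of the two steps described above: dropping the empty spacer row, then filling each blank airline name with the nearest name above it ("fill down"). The airline names, the `status` and `city_1` columns, and the counts are hypothetical placeholders; only the spacer-row and missing-name layout comes from the text. `fill_down` is a hypothetical helper; tidyr's `fill()` is a packaged alternative.

```r
# Hypothetical raw layout: each airline name appears only on the first row
# of its block, and an empty spacer row (row 3) separates the blocks
raw <- data.frame(
  airline = c("Airline A", "", "", "Airline B", ""),
  status  = c("on time", "delayed", "", "on time", "delayed"),
  city_1  = c(10, 2, NA, 30, 4),   # placeholder counts
  stringsAsFactors = FALSE
)

# 1. Drop the empty spacer row
flights <- raw[-3, ]

# 2. Carry the last non-missing value forward into blank cells
fill_down <- function(x) {
  x[which(x == "")] <- NA          # treat empty strings as missing
  idx  <- cumsum(!is.na(x))        # position of the latest non-missing value
  vals <- x[!is.na(x)]
  c(NA, vals)[idx + 1]             # leading blanks (idx == 0) stay NA
}

flights$airline <- fill_down(flights$airline)
rownames(flights) <- NULL
print(flights)
```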

Project2_Buffalo_Snow

07.10.2024

Overview of Tidying and Example Analysis
The data set has a number of elements that require tidying. First, the header row is repeated at regular intervals throughout the data set, so these rows need to be removed. Second, for some observations, the snowfall amount is recorded as “T”, indicating a trace amount of snow; however, trace amounts do not...
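A minimal base-R sketch of the first tidying step only: removing header rows that reappear inside the data. The column names (`SEASON`, `OCT`) and values are hypothetical placeholders. The “T” trace entries are intentionally left untouched here, because how they are handled is a separate decision.

```r
# Hypothetical raw table in which the header row reappears as a data row
snow <- data.frame(
  SEASON = c("1940-41", "1941-42", "SEASON", "1942-43"),
  OCT    = c("0.0", "T", "OCT", "1.2"),   # placeholder values
  stringsAsFactors = FALSE
)

# A repeated header row is one whose cells all equal the column names
is_header_row <- apply(snow, 1, function(row) all(row == names(snow)))
snow <- snow[!is_header_row, ]
rownames(snow) <- NULL

print(snow)   # "T" entries remain as-is at this stage
```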

Project2_world_population

07.10.2024

Overview of Tidying and Example Analysis
The data set has a couple of opportunities for tidying. The column “CCA3”, which contains a three-letter code to indicate the country, is dropped since it is both redundant with and less informative than the “Country/Territory” column. Arguably other columns could also be dropped depending on the analy...
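A minimal base-R sketch of one way to confirm that “CCA3” is redundant before dropping it: the column is safe to remove if codes and country names map one-to-one. The `Population` column and all values are hypothetical placeholders; only the `CCA3` and `Country/Territory` names come from the text.

```r
# Hypothetical rows; check.names = FALSE keeps the slash in the column name
pop <- data.frame(
  CCA3 = c("AAA", "BBB", "CCC"),
  `Country/Territory` = c("Country A", "Country B", "Country C"),
  Population = c(100, 200, 300),   # placeholder numbers
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# One-to-one mapping: the number of distinct (code, name) pairs equals
# both the number of distinct codes and the number of distinct names
n_pairs     <- nrow(unique(pop[, c("CCA3", "Country/Territory")]))
n_codes     <- length(unique(pop$CCA3))
n_countries <- length(unique(pop$`Country/Territory`))
one_to_one  <- n_pairs == n_codes && n_pairs == n_countries

if (one_to_one) {
  pop$CCA3 <- NULL   # drop the less informative code column
}
print(names(pop))
```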