Publications by Shoshana Farber

Tidyverse CREATE

17.04.2023

The goal of this project is to create a vignette demonstrating one or more features of a tidyverse package in R. I focus on several functions from the tidyr package and use ggplot2 to show how these transformations can be helpful. Loading the Libraries: library(tidyr), library(ggplot2), library(dplyr). The...

7243 sym R (6534 sym/14 pcs) 2 img

DATA 607 - Assignment 10

05.04.2023

This assignment replicates and expands on the sentiment analysis code provided in Chapter 2 of Text Mining with R: A Tidy Approach. We start by getting the provided code to work and then extend it in two ways: (1) working with a different corpus and (2) incorporating at least one additional sentiment lexicon. Loading Jane Aust...

6389 sym Python (8782 sym/23 pcs) 11 img

DATA 607 - Assignment 9

26.03.2023

Sections: Connecting and Requesting from the API; Tidying the Data; Some Analysis; Conclusions. Shoshana Farber, March 26, 2023. The New York Times web site provides a rich set of APIs, as described here: https://developer.nytimes.com/apis. You’ll need to start by signing up for an AP...

2355 sym 2 img 2 tbl

DATA 607 - Project 3

21.03.2023

Introduction. This work is the fourth of five stages of an analysis whose main objective was to identify the most valued data science skills. Our approach was to collect job postings from various job boards and extract the skills from the postings. The purpose of this specific file’s work is to analyze the skills extracted from th...

4291 sym Python (7689 sym/15 pcs) 11 img 4 tbl

DATA 607 - EC 4

21.03.2023

Working with the two JSON files available through the API at nobelprize.org, ask and answer 4 interesting questions, e.g. “Which country ‘lost’ the most Nobel laureates (who were born there but received their Nobel prize as a citizen of a different country)?” Requesting from the API V1: # laureates api V1 url_v1 <- GET('http://api.no...

2622 sym Python (6279 sym/17 pcs) 3 img 6 tbl

DATA 607 - Assignment 7

13.03.2023

Pick three of your favorite books on one of your favorite subjects. At least one of the books should have more than one author. For each book, include the title, authors, and two or three other attributes that you find interesting. Take the information that you’ve selected about these three books, and separately create three files which stor...

874 sym 3 tbl

DATA 607 - EC 3

08.03.2023

Loading the Data. The chart above describes August 2021 data for Israeli hospitalization (“Severe Cases”) rates for people under 50 (assume “50 and under”) and over 50, for both unvaccinated and fully vaccinated populations. israeli_vaccination <- read.csv(url("https://raw.githubusercontent.com/ShanaFarber/cuny-sps/master/DATA_607/israel...

3994 sym 3 tbl

DATA 607 - Project 2

06.03.2023

MTA Ridership Data Frame. This data frame was provided by John Cruz and comes from the DATA.NY.GOV website. It shows the daily number of MTA riders on buses, subways, trains, bridges, and tunnels beginning in 2020. A full description can be found here. John’s suggested analysis of the data would be to show how commuter travel varies bas...

8403 sym Python (17946 sym/28 pcs) 6 img 3 tbl

DATA 607 - EC 2

01.03.2023

Load and Clean Data. I reused some of the code from Project 1 to load and clean the data frame: raw <- readLines(url("https://raw.githubusercontent.com/ShanaFarber/cuny-sps/master/DATA_607/project1/player_stats.txt")); player_names <- unlist(str_extract_all(raw[-c(1, 2, 3, 4)], "([A-Z])+\\s([A-Z](\\s)?)*([A-Z])+")); totals <- unlist(str_extract_all...

2987 sym Python (9618 sym/14 pcs) 5 tbl

DATA 607 - Assignment 5

27.02.2023

Loading the Data. I loaded the data with na.strings set so that any empty character cell would be filled with NA. untidy_flights <- read.csv(url("https://raw.githubusercontent.com/ShanaFarber/cuny-sps/master/DATA_607/Assignment5/flights.csv"), na.strings=c("")) untidy_flights ## X X.1 Los.Angeles Pheonix San.Diego Sa...

1503 sym Python (4489 sym/14 pcs) 2 img 4 tbl