Publications by John Cruz
Working with NY Times API
Import Libraries: library(tidyverse), library(glue), library(jsonlite), library(lubridate), library(ggrepel). Introduction: The New York Times (NYT) provides access to its data through the Times API. With it, trends or decisions reflected in its published articles can be analyzed and visualized. This report will be a general use...
Favorite Books
Introduction: This project focuses on working with different file types for analysis. I will manually create HTML, XML, and JSON files that store three of my favorite books related to data science and programming. Each file will store the title, author(s), publisher, publication date, ISBN-13, and best sellers rank from Amazon’s website....
MTA Daily Ridership
Introduction: The Metropolitan Transportation Authority (MTA) provides a daily ridership dataset containing systemwide ridership and traffic estimates for subways, buses, Long Island Rail Road, Metro-North Railroad, Access-A-Ride, and Bridges and Tunnels. The data runs from March 1, 2020 (April 1, 2020 for LIRR and Metro-North) through the current date, and...
Pokemon Pokedex
Introduction: The Serebii website provides a list, called the National Pokedex, of all the Pokemon in all the games. The table lists each Pokemon with a unique ID number, name, type, abilities, and base stats such as attack and defense. Data Sources: Pokemon Stats. Let's build a frequency chart of how many Pokemon fall into each type....
CDC Health Care Employment 2000-2020
Introduction: The Centers for Disease Control and Prevention (CDC), through the National Center for Health Statistics (NCHS), released data about health care employment and wages within the United States between 2000 and 2020. The selected occupations range between two categories of health care practitioners and technical roles such as physician assistan...
Probability: Kobe Bryant and 'Hot Hand'
The Hot Hand: Basketball players who make several baskets in succession are described as having a hot hand. Fans and players have long believed in the hot hand phenomenon, which refutes the assumption that each shot is independent of the next. However, a 1985 paper by Gilovich, Vallone, and Tversky collected evidence that contradicted this belief an...
Transforming Wide Data
Introduction: The objective is to transform a wide-format data structure into a long, ‘tidy’ format that makes the analysis easier. The data are flight arrival counts for two airlines in five different cities. Required Libraries: library(tidyverse), library(pollster). Import Data: Import wide format of CSV data df...
Chess Text File to CSV
Introduction: The objective is to import a text file containing the results of a chess tournament, extract the information needed, and export it to a CSV file that could be used in a database. A preview of the text file being used. Data Dictionary (Column: Description): player_name: player’s full name; player_state: player’s state; total_pts: total poin...
Character Manipulation
Required libraries: library(tidyverse), library(rvest). Question 1: Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset, provide code that identifies the majors that contain either “DATA” or “STATISTICS”. url <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv" df <- read_cs...
How Americans Like Their Steak
Overview: Walt Hickey from FiveThirtyEight collected data from people within the United States to see if a risk-averse person would be more likely to order a steak well done. The analysis found no evidence that a bigger risk taker would prefer their steak rare. ‘FiveThirtyEight Article’ ‘Data Source’ Required libraries: library(pande...