Publications by Gregg Maloy
Week 9 Assignment
Part 1: Introduction The New York Times web site provides a rich set of APIs, as described here: https://developer.nytimes.com/apis You’ll need to start by signing up for an API key. Your task is to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it into an R DataFrame. For this assig...
2335 sym 2 img
Document
Introduction The purpose of this assignment was to: 1. collaborate in teams of 5 or less, 2. together locate a data set which would help answer the question “Which are the most valued data science skills?”, 3. together conduct an analysis which would answer the question “Which are the most valued data science skills?”, and 4. presen...
4648 sym 11 img
Document
Part 1: Introduction The purpose of this assignment was to record attributes from three books in three different markup languages: HTML, XML and JSON. The attributes were written into a table structure in each respective file. Finally, these files were ingested into R and displayed. Each file was written by hand using W3schools.com as a reference...
1448 sym
Data_607_Project_2_NYC_Transportation
Introduction The goal of this assignment was to practice cleaning and manipulation of datasets for downstream analysis work. For the assignment, an untidy CSV was obtained from the data.ny.gov portal (https://data.ny.gov/Transportation/MTA-Daily-Ridership-Data-Beginning-2020/vxuj-8kew) and loaded into R. The dataset consists of NYC MTA daily ridershi...
2519 sym 3 img
Data_607_Project_2_TB_HDI
Introduction The goal of this assignment was to practice cleaning and manipulation of datasets for downstream analysis work. For the assignment, an untidy CSV was created and loaded into R. The dataset consists of ‘Tuberculosis (TB) incidence rates’ per 100,000 and corresponding ‘Human Development Index (HDI) rates’ for Poland, Germany, H...
2249 sym 2 img
Data_607_Project_2_Healthcare_Occupation_Salary_and_Employment
Introduction The goal of this assignment was to practice cleaning and manipulation of datasets for downstream analysis work. The dataset used in this RMD file can be found at: https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Publications/Health_US/hus20-21tables/hcempl.xlsx . The dataset consists of employment numbers and mean salaries for health...
3160 sym Python (3682 sym/11 pcs) 2 img
Data_607_Assignment_5
Introduction The purpose of this assignment was to tidy and transform data using the tidyr and dplyr packages. For the assignment, an untidy CSV was created and loaded into R. Part 1: Load File and Inspection Below, the untidy CSV was loaded into R via the read_csv command and placed into a dataframe. There were numerous data quality issues which ...
2413 sym Python (4824 sym/13 pcs) 3 img
Data_607_Project_1
Introduction The purpose of this project is to generate a CSV file from a partially structured text file. The end CSV file should be suitable for ingestion into a SQL database, such as MS SQL Server. Part 1: Load File and Inspection The file was loaded into RStudio from GitHub and consists of the results of a chess tournament. There appears to be s...
2842 sym
Data_607_Assignment_3
Introduction The objective of this assignment is to use regular expressions to manipulate and analyze strings in R, as well as to become familiar with functions/packages utilized in string manipulation. Question 1 Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-g...
2953 sym
Data_607_Assignment_1
Part 1 - Introduction The article ‘Why Many Americans Don’t Vote’ describes a survey and subsequent analysis which sought to provide insights on voter history. The survey captured the responses of 8,327 unique individuals on questions which range from demographic information to questions on elections and political perceptions. The su...
2242 sym Python (1358 sym/3 pcs)