Publications by rachelgreenlee
Data607 - Assignment Week 1
Overview In June of 2020 FiveThirtyEight published an article discussing how voter registration started out strong in early 2020, but dropped dramatically once COVID hit. This data set compares 2016 and 2020 voter registration for January through April or May (depending on locale) in 11 states and Washington DC. FiveThirtyEight obtained the data ...
2344 sym R (1874 sym/10 pcs) 1 img
Data607_Project3_TeamDAREZ
Introduction As recently as March of 2019, we were still hearing repeatedly that the demand for data scientists is not being met. The University of Pennsylvania states, “Data analytics is becoming mission-critical to more and more businesses.” They quote LinkedIn co-founder Allen Blue saying, “There are very few data scientists out here passing out...
28815 sym R (33317 sym/125 pcs) 10 img
DATA607 Assignment 5
Introduction I start with a screenshot of a small dataset and the goal is to put it into a tidy data format and then perform analysis to compare the arrival delays for the two airlines. Step 1 - Reproduce data in MySQL and import to R For the sake of practice, I’ll create two separate tables in MySQL, one for each airline. In MySQL I create th...
4739 sym R (1943 sym/11 pcs) 4 img
Practice Problem 3.41
3.41 HIV in Swaziland. Swaziland has the highest HIV prevalence in the world: 25.9% of this country’s population is infected with HIV. The ELISA test is one of the first and most accurate tests for HIV. For those who carry HIV, the ELISA test is 99.7% accurate. For those who do not carry HIV, the test is 92.6% accurate. If an individual from Sw...
803 sym 1 img
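The excerpt for Practice Problem 3.41 is cut off before the question is stated. As a minimal sketch, assuming the question asks for the probability that a person who tests positive actually carries HIV, Bayes' theorem can be applied to the three figures quoted in the problem. The variable names are illustrative and not taken from the original write-up.

```r
# Figures quoted in the problem statement
prevalence  <- 0.259  # P(HIV)
sensitivity <- 0.997  # P(positive | HIV)
specificity <- 0.926  # P(negative | no HIV)

# Assumption: the truncated question asks for P(HIV | positive)
true_pos  <- prevalence * sensitivity              # carries HIV and tests positive
false_pos <- (1 - prevalence) * (1 - specificity)  # no HIV but tests positive
ppv <- true_pos / (true_pos + false_pos)

round(ppv, 4)
```

Under that assumption the result is roughly 0.82, noticeably lower than the 99.7% accuracy figure, because false positives among the larger uninfected group are not negligible.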
Project 1 - DATA607
Step 1 - Import the text file I access the .txt file on my GitHub repo. I skip the first 4 lines, as they are dashes and headers. To remove the lines that consist entirely of dashes every third row, I can write the pattern True, True, False so that it takes the first and second rows, skips the third, and repeats. Now we read that into a table, wit...
1455 sym R (7770 sym/11 pcs)
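The True, True, False pattern described in Project 1 relies on R recycling a short logical vector along a longer one. The following is a minimal base R sketch of that idea, not necessarily the author's exact code; the `lines` vector is a hypothetical stand-in for the file contents after the header has been skipped.

```r
# Hypothetical lines standing in for the file after the header is skipped
lines <- c("row 1a", "row 1b", "-----",
           "row 2a", "row 2b", "-----")

# The logical index is recycled along `lines`:
# keep, keep, drop, keep, keep, drop, ...
kept <- lines[c(TRUE, TRUE, FALSE)]

kept
```

Because the index repeats every three elements, every third line (the dashes) is dropped without having to compute row numbers explicitly.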
DATA607 Assignment 3
Introduction Problem 1 Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS”. First, import the CSV provided by FiveThirtyEight on GitHub. major...
2425 sym R (1945 sym/17 pcs)
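One way to approach Problem 1 of Assignment 3 is a regular expression with alternation. The sketch below uses base R's `grep()` on a small hypothetical vector of major names rather than the actual FiveThirtyEight CSV, so the values are for illustration only.

```r
# Hypothetical sample of major names; the real dataset lists 173 majors
majors <- c("COMPUTER SCIENCE",
            "STATISTICS AND DECISION SCIENCE",
            "DATA PROCESSING",
            "ZOOLOGY")

# "DATA|STATISTICS" matches any string containing either word;
# value = TRUE returns the matching strings instead of their positions
matches <- grep("DATA|STATISTICS", majors, value = TRUE)

matches
```

With the real data, `majors` would instead be the column of major names from the imported data frame.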
DATA607_Proj2_fooddata
Introduction We choose to work with the dataset Rachel posted in last week’s discussion forum. Rachel found a dataset on Kaggle, originally from the Food and Agriculture Organization of the United Nations, that gives food production data for 245 countries. It shows what food items were produced for humans vs animals from 1961-2013. The years in...
6091 sym R (9788 sym/19 pcs) 7 img
Assignment 7
Introduction In this assignment I used Notepad to write the data for 3 books in 3 different formats: HTML, XML, and JSON. Next, I load each of these files into R and place them in separate data frames. XML Data Using the XML and RCurl packages (as xmlParse wouldn’t accept my file as XML without RCurl), I access the .xml from my GitHub, parse it, an...
2224 sym R (1997 sym/7 pcs)
Assignment 9
Purpose For this assignment I’ll be learning how to read data in from a New York Times API that requires an access key, in order to look at data from their bestseller book lists for the week of my birthday this past year. Reading in the Data First I’ll need to install some packages to access, manipulate, and display the data library(httr...
2869 sym R (1446 sym/5 pcs)
Using ggExtra for Exploratory Plotting
Introduction This vignette will take a quick peek at two useful data exploration plot types provided in the ggExtra package using a UFO sightings dataset. We aren’t going to worry about style or labels, just some quick plots to explore your data well before further analysis and presenting findings to others. Setup To have some fun, I picked a ...
1241 sym R (3032 sym/10 pcs) 4 img