Publications by Cassie Boylan DH Kim Alexis M

Project 3B

18.10.2020

Data Loading library(readxl) library(httr) library(dplyr) library(ggplot2) library(lubridate) library(tidyverse) library(scales) library(janitor) library(epiDisplay) Data Source (Excel file) retailURL <- "http://archive.ics.uci.edu//ml//machine-learning-databases//00502//online_retail_II.xlsx" GET(retailURL, write_disk(tempFileName <- t...

7440 sym R (15395 sym/49 pcs) 4 img

DATA 607 - Project 2

04.10.2020

library(tidyverse) library(ggplot2) library(dplyr) library(stringr) library(readr) library(data.table) library(visdat) library(RCurl) Dataset 1 - Global TB rates NAMES_tb <- read.table("https://raw.githubusercontent.com/rodrigomf5/Tidydata/master/tb.csv", nrow = 1, stringsAsFactors = FALSE, sep = ",") DATA_tb <- read.table("https://raw....

3233 sym R (5750 sym/37 pcs) 3 img

DATA 606 - Hmwk 4

28.09.2020

Area under the curve, Part I. (4.1, p. 142) What percent of a standard normal distribution \(N(\mu=0, \sigma=1)\) is found in each region? Be sure to draw a graph. \(Z < -1.35\): .0885 -> 8.85%. \(Z > 1.48\): 1 - .9306 = .0694 -> 6.94%. \(-0.4 < Z < 1.5\): .9332 - .3446 = .5886 -> 58.86%. \(|Z| > 2\): 2 × (1 - .9772) = .0456 -> 4.56% (both tails). Triathlon times, Part I (...

7721 sym R (57 sym/1 pcs) 7 img
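
The areas asked for in the homework excerpt above can be checked without a Z table. A minimal sketch, assuming only base R's `pnorm`; the variable names are illustrative and not taken from the original homework:

```r
# Areas under the standard normal curve, N(mu = 0, sigma = 1).
# pnorm(q) returns P(Z < q); variable names here are illustrative.

p_below   <- pnorm(-1.35)                # P(Z < -1.35)
p_above   <- 1 - pnorm(1.48)             # P(Z > 1.48)
p_between <- pnorm(1.5) - pnorm(-0.4)    # P(-0.4 < Z < 1.5)
p_tails   <- 2 * (1 - pnorm(2))          # P(|Z| > 2), both tails

areas <- c(below = p_below, above = p_above,
           between = p_between, tails = p_tails)

# Express each area as a percentage, rounded to two decimals.
print(round(100 * areas, 2))
```

Note that \(|Z| > 2\) covers two symmetric tails, so the single upper-tail area has to be doubled.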

DATA 606 - Lab 3

14.09.2020

library(tidyverse) library(openintro) Exercise 1 glimpse(kobe_basket) ## Rows: 133 ## Columns: 6 ## $ vs <fct> ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ORL, ... ## $ game <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1... ## $ quarter <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3......

5226 sym R (2223 sym/18 pcs) 2 img

DATA 606 - Lab 4

28.09.2020

Load packages library(tidyverse) library(openintro) head(fastfood) ## # A tibble: 6 x 17 ## restaurant item calories cal_fat total_fat sat_fat trans_fat cholesterol ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 Mcdonalds Arti~ 380 60 7 2 0 95 ## 2 Mcdonalds ...

19508 sym R (5121 sym/50 pcs) 33 img

Stat Lab 5B - Confidence Intervals

05.10.2020

library(tidyverse) library(openintro) library(infer) set.seed(74229) Exercise 1 55% of my sample, or 33 people, think climate change affects their local community. us_adults <- tibble( climate_change_affects = c(rep("Yes", 62000), rep("No", 38000)) ) n <- 60 samp <- us_adults %>% sample_n(size = n) samp %>% count(climate_change_af...

9788 sym R (1017 sym/8 pcs)
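
The sample proportion reported in Exercise 1 (33 of 60, or 55%) leads naturally to an interval estimate. The lab loads `infer`, so it may well use a bootstrap interval; the sketch below instead uses the normal-approximation formula \(\hat{p} \pm z^{*}\sqrt{\hat{p}(1-\hat{p})/n}\) in base R. That choice is an assumption made for illustration, not the author's method, and the variable names are hypothetical.

```r
# Normal-approximation 95% confidence interval for a proportion.
# Counts come from the excerpt: 33 "Yes" answers in a sample of 60.
n     <- 60
yes   <- 33
p_hat <- yes / n                          # sample proportion

se    <- sqrt(p_hat * (1 - p_hat) / n)    # standard error of p_hat
z     <- qnorm(0.975)                     # critical value for 95% confidence

ci <- c(lower = p_hat - z * se, upper = p_hat + z * se)
print(round(ci, 3))
```

The approximation is reasonable here because both the number of "Yes" answers (33) and the number of "No" answers (27) are at least 10.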

Stat Lab 5A - Sampling Distributions

05.10.2020

library(tidyverse) library(openintro) library(infer) library(ggplot2) set.seed(74229) global_monitor <- tibble( scientist_work = c(rep("Benefits", 80000), rep("Doesn't benefit", 20000)) ) Exercise 1 Given that the sample was taken randomly (assuming independence) and that its size is > 30, we can assume that the sampling distribution will approximate popul...

8090 sym R (2456 sym/12 pcs) 1 img
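
The reasoning in Exercise 1 can be explored by simulation. A minimal sketch in base R, reusing the population from the excerpt (80,000 "Benefits", 20,000 "Doesn't benefit"); the sample size of 50 and the 5,000 repetitions are assumptions chosen for illustration, and `replicate` stands in for the `infer` workflow the lab itself uses:

```r
set.seed(74229)

# Population from the excerpt: 80% "Benefits", 20% "Doesn't benefit".
population <- c(rep("Benefits", 80000), rep("Doesn't benefit", 20000))

n    <- 50     # sample size (assumed for illustration)
reps <- 5000   # number of repeated samples (assumed for illustration)

# Proportion of "Doesn't benefit" in each simulated sample.
p_hats <- replicate(
  reps,
  mean(sample(population, size = n) == "Doesn't benefit")
)

# Centre and spread of the simulated sampling distribution.
print(c(mean = mean(p_hats), sd = sd(p_hats)))

hist(p_hats, main = "Sampling distribution of p-hat", xlab = "p-hat")
```

The simulated proportions should centre near the population value of 0.2, with a standard deviation close to \(\sqrt{0.2 \times 0.8 / 50}\).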

DATA 606 - Lab 6

12.10.2020

library(tidyverse) library(openintro) library(infer) set.seed(74226) Exercise 1 4792 have reported 0 days; 925 have reported 1-2 days; 4646 do not drive; 827 have reported 30 days, that is, texting and driving every day in the past 30 days. yrbss %>% count(text_while_driving_30d, sort=TRUE) ## # A tibble: 9 x 2 ## text_while_driving_30d n ##...

8696 sym R (4594 sym/23 pcs) 1 img

Stat Lab 7 - Numerical Inference

18.11.2020

library(tidyverse) library(openintro) library(infer) Exercise 1 The cases (rows) in this dataset are students surveyed in the YRBSS for a particular year. There are 13583 cases in this dataset. glimpse(yrbss) ## Rows: 13,583 ## Columns: 13 ## $ age <int> 14, 14, 15, 15, 15, 15, 15, 14, 15, 15, 15... ## $ gender ...

17273 sym R (6498 sym/54 pcs) 7 img

Stat Lab 8 - Linear Regression

18.11.2020

library(tidyverse) library(openintro) Exercise 1 The dimensions of this dataset are 123 columns or variables and 1458 rows or observations. dim(hfi) ## [1] 1458 123 #glimpse(hfi) Exercise 2 Relationships between 2 numerical variables are usually best shown in scatterplots. The relationship between pf_expression_control and pf_score can be est...

15277 sym R (5179 sym/31 pcs) 7 img