Publications by George Cruz

Project 3

17.10.2020

DATA SCIENCE SKILLS THAT MATTER. Our group set out to answer the question in the title. Our approach was to identify the terms most commonly tagged alongside the Twitter hashtag #Datascience, which gives the keywords most often associated with ‘data science’. The top 20 de-duped keywords will se...

9570 sym R (2700 sym/5 pcs) 16 img

DS606-Lab 5b

05.10.2020

library(tidyverse) library(openintro) library(infer) Exercise 1 us_adults <- tibble( climate_change_affects = c(rep("Yes", 62000), rep("No", 38000)) ) n <- 60 samp <- us_adults %>% sample_n(size = n) samp %>% count(climate_change_affects) %>% mutate(p = n /sum(n)) ## # A tibble: 2 x 3 ## climate_change_affects n p ## ...

7437 sym R (3366 sym/16 pcs)

DS606-Lab 5a

04.10.2020

library(tidyverse) library(openintro) library(infer) global_monitor <- tibble( scientist_work = c(rep("Benefits", 80000), rep("Doesn't benefit", 20000)) ) global_monitor %>% count(scientist_work) %>% mutate(p = n /sum(n)) ## # A tibble: 2 x 3 ## scientist_work n p ## <chr> <int> <dbl> ## 1 Benefits ...

14226 sym R (5580 sym/25 pcs) 5 img

DS607-Project 1

19.09.2020

Project Outline In this project, you’re given a text file with chess tournament results where the information has some structure. Your job is to create an R Markdown file that generates a .CSV file (that could for example be imported into a SQL database) with the following information for all of the players: -Player’s Name -Player’s State, ...

2062 sym R (10589 sym/18 pcs)

DS607 - HW 3

13.09.2020

Homework 3 1. Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS”. library(tidyverse) theURL <- "https://raw.githubusercontent.com/fivethirtyeig...

4147 sym R (1569 sym/7 pcs)
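
The matching task described in the DS607 - HW 3 entry can be sketched in base R with a single regular expression. This is only an illustration, not the author's code: the `majors` vector below is a small hand-picked sample assumed for the example rather than the full 173-row dataset, and the name `matching_majors` is hypothetical.

```r
# Illustrative sample of major names (assumption: NOT the full dataset).
majors <- c(
  "GENERAL AGRICULTURE",
  "COMPUTER PROGRAMMING AND DATA PROCESSING",
  "STATISTICS AND DECISION SCIENCE",
  "ACCOUNTING"
)

# Keep the entries whose name contains either "DATA" or "STATISTICS".
matching_majors <- grep("DATA|STATISTICS", majors, value = TRUE)
print(matching_majors)
```

With the tidyverse already loaded, the same filter could be written with `stringr::str_detect()` inside `dplyr::filter()`; the base R version is used here only to keep the sketch free of dependencies.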

DS606-HW3

14.09.2020

library(tidyverse) library(openintro) Rolling Dice. If you roll a pair of fair dice, what is the probability of getting a sum of: 1 = The probability is 0, as the lowest possible sum is 2. 5 = 4/36, as there are 4 possible combinations that add up to 5. 12 = 1/36, as there is only 1 possible combination that adds up to 12. Poverty and Lan...

14867 sym R (2263 sym/40 pcs) 2 img
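
The dice answers in the DS606-HW3 entry can be checked by enumerating all 36 equally likely outcomes. The sketch below is an assumed verification, not part of the original homework; `rolls` and `prob_sum` are hypothetical names.

```r
# Enumerate every outcome for a pair of fair six-sided dice (36 rows).
rolls <- expand.grid(die1 = 1:6, die2 = 1:6)
rolls$total <- rolls$die1 + rolls$die2

# Probability of a given sum = share of the 36 outcomes that produce it.
prob_sum <- function(target) mean(rolls$total == target)

prob_sum(1)   # no outcome sums to 1, since the smallest sum is 2
prob_sum(5)   # (1,4), (2,3), (3,2), (4,1): 4 of 36
prob_sum(12)  # only (6,6): 1 of 36
```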

DS606-Lab3

14.09.2020

library(tidyverse) library(openintro) Exercise 1 A streak of 1 means 1 hit. A streak of zero means a miss. Exercise 2 kobe_streak <- calc_streak(kobe_basket$shot) ggplot(data = kobe_streak, aes(x = length)) + geom_bar() The distribution is right-skewed. The typical streak length was 0 and the longest streak is 4. Exercise 3 coin_outcomes...

3490 sym R (1795 sym/14 pcs) 2 img

ds607 - HW5

28.09.2020

Data 607: Tidying and transforming Data. [Figure: initial chart] The chart above describes arrival delays for two airlines across five destinations. Your task is to: Create a .CSV file (or optionally, a MySQL database!) that includes all of the information above. You’re encouraged to use a “wide” structure similar to how the information appears ab...

5793 sym R (4003 sym/19 pcs) 3 img

DS606-HW4

30.09.2020

Area under the curve, Part I. (4.1, p. 142) What percent of a standard normal distribution \(N(\mu=0, \sigma=1)\) is found in each region? Be sure to draw a graph. \(Z < -1.35\) scales::percent(pnorm(-1.35), accuracy = 0.01) ## [1] "8.85%" normalPlot(mean=0, sd = 1, bounds = c(-1.35, Inf), tails = TRU...

6495 sym R (2107 sym/45 pcs) 7 img
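
The first region in the DS606-HW4 entry, \(Z < -1.35\), can also be computed without the `scales` package. This is a minimal base R sketch under that assumption; `p_below` and `p_above_mirror` are hypothetical names.

```r
# Area to the left of z = -1.35 under the standard normal curve.
p_below <- pnorm(-1.35)

# By symmetry, the area to the right of z = +1.35 is the same.
p_above_mirror <- pnorm(1.35, lower.tail = FALSE)

# Format as a percentage with two decimals, matching the 8.85% in the entry.
sprintf("%.2f%%", 100 * p_below)
```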

DS606-Presentation

30.09.2020

R Markdown 3.39 Grade distributions. Each row in the table below is a proposed grade distribution for a class. Identify each as a valid or invalid probability distribution, and explain your reasoning. Rows: 0.3 0.3 0.3 0.2 0.1; 0 0 1 0 0; 0.3 0.3 0.3 0 0; 0.3 0.5 0.2 0.1 -0.1; 0.2 0.4 0.2 0.1 0.1; 0 -0.1 1.1 0 0. is_valid <- function(vec) { result...

325 sym R (950 sym/13 pcs)
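
The validity check asked for in the DS606-Presentation entry comes down to two conditions: every probability lies between 0 and 1, and the probabilities sum to 1. The sketch below is one possible way to test the six rows from the excerpt. It is not the author's `is_valid` function, whose body is truncated in the listing; `check_distribution`, `grade_rows`, and the tolerance `tol` are assumptions made for this example.

```r
# Hypothetical helper (not the author's is_valid): TRUE when every entry is
# in [0, 1] and the entries sum to 1 within a small floating-point tolerance.
check_distribution <- function(probs, tol = 1e-8) {
  all(probs >= 0 & probs <= 1) && abs(sum(probs) - 1) < tol
}

# The six proposed grade distributions from the excerpt, one vector per row.
grade_rows <- list(
  c(0.3, 0.3, 0.3, 0.2, 0.1),   # sums to 1.2
  c(0, 0, 1, 0, 0),             # sums to 1, all entries in range
  c(0.3, 0.3, 0.3, 0, 0),       # sums to 0.9
  c(0.3, 0.5, 0.2, 0.1, -0.1),  # contains a negative probability
  c(0.2, 0.4, 0.2, 0.1, 0.1),   # sums to 1, all entries in range
  c(0, -0.1, 1.1, 0, 0)         # negative entry and an entry above 1
)

validity <- vapply(grade_rows, check_distribution, logical(1))
print(validity)
```

Comparing the sum against 1 with a tolerance rather than `==` avoids false negatives from floating-point rounding when decimals such as 0.1 are added.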