Publications by David Blumenstiel

Data 607 Homework 5

29.02.2020

Tidying Data. Here, we will take some messy data and tidy it. Importing the untidy data: #Imports the csv (which I put on github) dirty <- read.csv('https://raw.githubusercontent.com/davidblumenstiel/data/master/Flight.csv') dirty ## X X.1 Los.Angeles Pheonix San.Diego San.Franscisco Seattle ## 1 ALASKA on time 497 221 ...

409 sym R (3163 sym/11 pcs) 1 img

Data 607 Project 1

22.02.2020

In this project, you’re given a text file with chess tournament results where the information has some structure. Your job is to create an R Markdown file that generates a .CSV file (that could for example be imported into a SQL database) with the following information for all of the players: Player’s Name, Player’s State, Total Number of ...

959 sym R (12367 sym/15 pcs)

Data 606 Homework 3

16.02.2020

Dice rolls. (3.6, p. 92) If you roll a pair of fair dice, what is the probability of getting a sum of 1? 0, because 1 + 1 = 2 is the minimum possible sum. Getting a sum of 5? 0.111 (4/36): 4 out of 36 possibilities. Getting a sum of 12? 0.028 (1/36): 1 out of 36 possibilities. Poverty and language. (3.8, p. 93) The American Community Survey is an ongoing...
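The dice answers above can be checked by brute-force enumeration. The following is a minimal sketch in base R, not the author's own code; the names `rolls`, `sums`, and `prob_sum` are illustrative assumptions.

```r
# Enumerate all 36 equally likely outcomes for a pair of fair dice
rolls <- expand.grid(die1 = 1:6, die2 = 1:6)
sums <- rolls$die1 + rolls$die2

# Probability of a given sum = share of outcomes that produce it
# (prob_sum is a hypothetical helper, not from the original homework)
prob_sum <- function(target) mean(sums == target)

prob_sum(1)   # 0, since the minimum sum is 2
prob_sum(5)   # 4/36, about 0.111
prob_sum(12)  # 1/36, about 0.028
```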

7068 sym

Data 606 Lab 2

08.02.2020

Some define Statistics as the field that focuses on turning information into knowledge. The first step in that process is to summarize and describe the raw information - the data. In this lab, you will gain insight into public health by generating simple graphical and numerical summaries of a data set collected by the Centers for Disease Control ...

15971 sym R (5449 sym/94 pcs) 13 img

CUNY MSDS Data 607 Homework 1

30.01.2020

“Here’s What Your Part Of America Eats On Thanksgiving” https://fivethirtyeight.com/features/heres-what-your-part-of-america-eats-on-thanksgiving/ This article details the results of a survey about what we do and eat for Thanksgiving. The article discusses what types of side dishes are served in different parts of the country, and what peo...

863 sym R (1991 sym/3 pcs)

MSDS 2020 R Bridge HW-2

09.01.2020

Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes. path <- 'https://raw.githubusercontent.com/davidblumenstiel/nutsdataset/master/nuts' nuts <- read.csv(path) print(summary(nuts)) ## X cones ntrees dbh ## Min. : 1.00 ...

882 sym R (8013 sym/36 pcs)

R Bridge Final Project

18.01.2020

What makes a diamond valuable? This dataset contains descriptions of over 50,000 diamonds. From it, different characteristics of diamonds and their relationship to price will be examined. #Imports the data from github (was already up there) path <- 'https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/ggplot2/diamonds.csv' df <- re...

4173 sym R (7304 sym/40 pcs) 23 img

Data 606 Homework 2

08.02.2020

Stats scores. (2.33, p. 78) Below are the final exam scores of twenty introductory statistics students. 57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94 Create a box plot of the distribution of these scores. The five number summary provided below may be useful. boxplot(scores) fivenum(scores) ## [1] 57.0 72.5 78.5 ...
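The excerpt calls `boxplot(scores)` and `fivenum(scores)` without showing where `scores` is defined. A minimal sketch, assuming `scores` is simply a numeric vector of the twenty values listed in the exercise:

```r
# The twenty exam scores from the exercise text
# (assumed to be how `scores` was defined; the excerpt does not show it)
scores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78,
            79, 79, 81, 81, 82, 83, 83, 88, 89, 94)

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
fivenum(scores)

# Box plot of the distribution of the scores
boxplot(scores)
```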

6239 sym R (715 sym/31 pcs) 5 img

Data 607 Homework 3

15.02.2020

1: Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS” #Loads and subsets the dataset address <- "https://raw.githubusercontent.com/fivethirtye...
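One way to pick out majors containing either word is a regular expression with alternation. The sketch below uses a small, hand-typed illustrative vector rather than the full dataset (the dataset address is truncated in the excerpt), and it may differ from the author's approach; `majors` and `matches` are assumed names.

```r
# Illustrative sample of major names; NOT loaded from the actual dataset
majors <- c("COMPUTER PROGRAMMING AND DATA PROCESSING",
            "STATISTICS AND DECISION SCIENCE",
            "GENERAL BIOLOGY",
            "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS")

# "DATA|STATISTICS" matches any string containing either word;
# value = TRUE returns the matching strings instead of their indices
matches <- grep("DATA|STATISTICS", majors, value = TRUE)
print(matches)
```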

2584 sym R (2626 sym/19 pcs)

Data 606 Lab 5b

07.03.2020

Sampling from Ames, Iowa. If you have access to data on an entire population, say the size of every house in Ames, Iowa, it’s straightforward to answer questions like, “How big is the typical house in Ames?” and “How much variation is there in sizes of houses?” If you have access to only a sample of the population, as is often the case...

6957 sym R (1183 sym/17 pcs) 3 img