Publications by Jordan Glendrange

Data 622 Homework 1

10.10.2022

Data Exploration: First, we use the glimpse function to get a better understanding of our data. The first sales record file has dimensions (1000, 14) and the second has (1000000, 14). Both data sets have the same number of columns, and their column names and data types match. df1 <- read.csv("1000 Sales Records.csv") df2 <- read.csv("50000 Sales Records.csv") glim...
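The comparison described in this excerpt (same columns, same data types, different row counts) can be checked programmatically. The following is a minimal base R sketch, not the author's code: the two small data frames and the helper `same_schema` are hypothetical stand-ins, since the actual sales files are not reproduced here.

```r
# Hypothetical stand-ins for the two sales files (assumption: not the real data).
small <- data.frame(Region = c("Asia", "Europe"), Units = c(10L, 20L))
large <- data.frame(Region = c("Asia", "Europe", "Africa"), Units = c(5L, 7L, 9L))

# Hypothetical helper: TRUE when two data frames have the same column
# names (in the same order) and the same class for each column.
same_schema <- function(a, b) {
  identical(names(a), names(b)) &&
    identical(sapply(a, class), sapply(b, class))
}

dim(small)                 # rows and columns of the first data frame
dim(large)                 # rows and columns of the second data frame
same_schema(small, large)  # do the column names and types match?
```

With the real files, `small` and `large` would be replaced by the data frames returned from `read.csv`.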

1645 sym 5 img

Data 622 Homework 2

28.10.2022

Data Exploration: First, we use the glimpse function to get a better understanding of our data. The first sales record file has dimensions (1000, 14) and the second has (1000000, 14). Both data sets have the same number of columns, and their column names and data types match. df <- read.csv("adult.csv") glimpse(df) ## Rows: 32,561 ## Columns: 15 ## $ age ...

307 sym 3 img

Data 622 Homework 3

11.12.2022

Data Exploration: First, we use the glimpse function to get a better understanding of our data. The first sales record file has dimensions (1000, 14) and the second has (1000000, 14). Both data sets have the same number of columns, and their column names and data types match. df <- read.csv("adult.csv") glimpse(df) ## Rows: 32,561 ## Columns: 15 ## $ age ...

337 sym 4 img

Data 622 Homework 4

14.12.2022

Data Exploration: I chose a data set from Kaggle. Each observation represents one survey response. Most of the columns are binary and describe the health of an individual. The goal is to predict whether the person has diabetes, has hypertension, or has ever had a stroke. df <- read.csv('health_data.csv') head(df) ## Age Sex HighChol CholCheck ...

2204 sym 6 img

Data607 Homework 1

06.02.2021

Dataset: The data set I chose is titled “How Unpopular is Donald Trump?” The source can be found here: https://projects.fivethirtyeight.com/trump-approval-ratings/ A copy of the data set is hosted on my GitHub account. approval_poll <- read.csv("https://raw.githubusercontent.com/jglendrange/DATA607/main/approval_polllist.csv", TRUE, ",") head(approval_pol...

1259 sym R (4190 sym/7 pcs) 3 img

Data 607 Project 1

28.02.2021

Read text file: The tournament file is saved on my GitHub account, and I pull it in using the function “read.delim”. I tried multiple separators but decided on “\t”. tournamentInfo <- read.delim("https://raw.githubusercontent.com/jglendrange/DATA607/main/tournamentinfo.txt", header=FALSE, sep="\t") head(tournamentInfo) ## ...
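To illustrate how `read.delim` behaves with `header=FALSE` and `sep="\t"`, here is a small sketch that parses an inline string instead of the remote file. The sample text is made up for illustration and does not reflect the layout of the actual tournament file.

```r
# Hypothetical tab-separated sample (assumption: not the real tournament data).
sample_text <- "1\tPLAYER ONE\t6.0\n2\tPLAYER TWO\t5.5"

# text= is passed through to read.table, so no file or URL is needed here.
parsed <- read.delim(text = sample_text, header = FALSE, sep = "\t")

dim(parsed)    # number of rows and columns that were parsed
names(parsed)  # with header = FALSE, R assigns default names V1, V2, ...
```

If the chosen separator does not occur in the file, each line is read into a single column, so checking `dim()` is a quick way to confirm that a separator is splitting the rows as intended.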

1883 sym R (6501 sym/16 pcs)

Data 607 Homework 3

21.02.2021

Problem 1: Here I bring the data in using the rvest and tidyverse libraries. html_table reads every table on the page into a list. Since I only have one table, I select the first element. The “[-1,]” removes the first row of data since it was not relevant. The data is pretty messy, so for the purpose of the problem I remove all the other col...

1599 sym R (1699 sym/8 pcs)

Data 606 Lab 3

19.02.2021

library(tidyverse) library(openintro) Exercise 1 A streak of length 1 would be making 1 shot and then missing the next shot. A streak of length 0 is missing a shot when your previous shot is also a miss. kobe_streak <- calc_streak(kobe_basket$shot) Exercise 2 The distribution of Kobe’s streaks is right skewed and peaks at 0. The median is len...
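The streak definition above (a run of consecutive hits that ends at a miss, with a miss following a miss counting as a streak of length 0) can be sketched in base R. The function `streak_lengths` and the sample shot vector are hypothetical and only illustrate the idea; this is not presented as the source of `calc_streak`.

```r
# Hypothetical helper: streak lengths for a vector of "H" (hit) / "M" (miss).
streak_lengths <- function(shots) {
  hit <- as.integer(shots == "H")
  # Pad with a miss on both ends so the first and last streaks are counted.
  padded <- c(0L, hit, 0L)
  miss_pos <- which(padded == 0L)
  # The number of hits between two consecutive misses is one streak.
  diff(miss_pos) - 1L
}

# Made-up shot sequence for illustration.
shots <- c("H", "M", "M", "H", "H", "M", "H")
streak_lengths(shots)  # 1 0 2 1
```

In this sample, the second miss directly follows a miss and contributes the streak of length 0, while the two consecutive hits contribute the streak of length 2.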

4141 sym R (781 sym/11 pcs) 2 img

Data606 Lab 2

12.02.2021

library(tidyverse) library(openintro) library(ggplot2) Exercise 1 Looking at the three histograms, we can see that bin sizes of 30 and 150 reveal little. However, when we set the bin size to 15, the first bin no longer has the highest frequency. The most common value is around 0, while a smaller set of times falls below 0. ggpl...

5878 sym R (2935 sym/20 pcs) 6 img

Lab 1: Intro to R

07.02.2021

library(tidyverse) library(openintro) Exercise 1 arbuthnot$girls ## [1] 4683 4457 4102 4590 4839 4820 4928 4605 4457 4952 4784 5332 5200 4910 4617 ## [16] 3997 3919 3395 3536 3181 2746 2722 2840 2908 2959 3179 3349 3382 3289 3013 ## [31] 2781 3247 4107 4803 4881 5681 4858 4319 5322 5560 5829 5719 6061 6120 5822 ## [46] 5738 5717 5847 6203 6033 ...

4490 sym R (2493 sym/16 pcs) 3 img