Publications by William Jasmine

Data 607 - Project 2 - Data Transformation

08.10.2022

Introduction: This project includes work done to tidy three “messy” data sets that originated as CSV files. The sections below outline the steps required to clean these data sets and then analyze each of the resulting “clean” data frames to answer a specific research question. The three .csv files used, as well as a description of the da...

Data 606 - Lab 5b - Confidence Intervals

08.10.2022

If you have access to data on an entire population, say the opinion of every adult in the United States on whether or not they think climate change is affecting their local community, it’s straightforward to answer questions like, “What percent of US adults think climate change is affecting their local community?”. Similarly, if you had dem...

Data 606 - Lab 5a - Sampling Distributions

08.10.2022

In this lab, you will investigate the ways in which the statistics from a random sample of data can serve as point estimates for population parameters. We’re interested in formulating a sampling distribution of our estimate in order to learn about the properties of the estimate, such as its distribution. Setting a seed: We will take some rando...

Data 606 - Lab 6 - Inference for Categorical Data

14.10.2022

Getting Started: Load packages. In this lab, we will explore and visualize the data using the tidyverse suite of packages and perform statistical inference using infer. The data can be found in the companion package for OpenIntro resources, openintro. Let’s load the packages. library(tidyverse) library(openintro) library(infer) data('yrbss', p...

Data 607 - Assignment 5 - Working with Document File Formats

17.10.2022

Introduction: This assignment involves creating three document files of different types (.html, .xml, and .json) and loading them into R data frames. As such, the following three files were created by hand: physics_books.html, physics_books.xml, and physics_books.json. They each contain the same information (details regarding three physics...

Data 606 - Lab 7 - Inference for Numerical Data

22.10.2022

Getting Started: Load packages. In this lab, we will explore and visualize the data using the tidyverse suite of packages and perform statistical inference using infer. The data can be found in the companion package for OpenIntro resources, openintro. Let’s load the packages. library(tidyverse) library(openintro) library(infer) seed <- 1234 T...

Data 606 - Lab 7b - ANOVA

22.10.2022

In a previous lab, we introduced the two-group independent \(t\)-test as a method for comparing the means of two groups. In some settings, it is useful to compare the means across more than two groups. The methodology behind a two-group independent \(t\)-test can be generalized to a procedure called analysis of variance (ANOVA). Assessing whether...

Data 607 - Project 3 - Most Valued Data Science Skills

24.10.2022

Introduction: The work presented in this analysis is meant to answer the question: “What are the most valued data science skills?” While this is a broad question, there are a number of preexisting data sets that can help answer it. The data used here comes from Kaggle, and comprises the results of a su...

Data 607 - Assignment 6 - Working With APIs

26.10.2022

Introduction: This document outlines the process of retrieving data from an API so that it can be analyzed in R. In this case, we will be using the New York Times APIs as our example. More specifically, we will be pulling data from NYT’s Times Newswire API, which “provides an up-to-the-minute stream of published articles.” The steps outlined...

Data 607 - Tidyverse CREATE Assignment

27.10.2022

Introduction: This document attempts to show how to make certain functional programming processes in R more efficient using the Tidyverse’s purrr library. The data used in this case comes from FiveThirtyEight’s data set containing predictions for the 2022-2023 NBA season, which can be found on GitHub. To make the predictions FiveT...
