Publications by Ken Wood

Statistical Inference in R Course Project - Part 2

06.10.2020

Introduction: For the second portion of the course project, we’re going to analyze the ToothGrowth data in the R datasets package. Specifically, we will: (1) load the ToothGrowth data and perform some basic exploratory data analyses; (2) provide a basic summary of the data; (3) use confidence intervals and/or hypothesis tests to compare tooth growth by sup...

2430 sym R (2529 sym/13 pcs) 3 img
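
The comparison described in the ToothGrowth entry above can be sketched roughly as follows. This is a minimal illustration, not the author's analysis: it assumes the truncated "sup..." refers to the dataset's `supp` column, and the Welch two-sample t-test is one reasonable choice rather than the method the report necessarily uses.

```r
# ToothGrowth ships with R's built-in datasets package
data(ToothGrowth)

# Basic summary: mean tooth length for each supplement type and dose
summary_by_group <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(summary_by_group)

# Assumption: compare tooth length between the two supplement types
# with a Welch two-sample t-test (unequal variances, R's default)
result <- t.test(len ~ supp, data = ToothGrowth)

# 95% confidence interval for the difference in mean length
print(result$conf.int)
print(result$p.value)
```

If the interval contains zero, the test does not reject equal mean growth at the 5% level; the same call can be repeated within each dose level by subsetting the data first.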

Getting and Cleaning Data in R - Course Project

06.10.2020

# Create one R script called run_analysis.R that does the following:
# 1. Merges the training and the test sets to create one data set.
# 2. Extracts only the measurements on the mean and standard deviation for each measurement.
# 3. Uses descriptive activity names to name the activities in the data set
# 4. Appropriately labels the data set wit...

5 sym R (5355 sym/10 pcs)
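
The first two steps listed in the entry above (merging the two sets, then keeping only mean and standard deviation measurements) might look something like this. The data frames, column names, and values here are hypothetical stand-ins, since the entry does not show the real files.

```r
# Hypothetical stand-ins for the training and test sets; the real column
# names and values are not shown in the entry above.
train <- data.frame(
  subject      = c(1, 2),
  `acc-mean()` = c(0.28, 0.27),
  `acc-std()`  = c(-0.99, -0.98),
  `acc-max()`  = c(-0.93, -0.94),
  check.names  = FALSE
)
test <- data.frame(
  subject      = c(3, 4),
  `acc-mean()` = c(0.26, 0.29),
  `acc-std()`  = c(-0.97, -0.96),
  `acc-max()`  = c(-0.92, -0.95),
  check.names  = FALSE
)

# Step 1: merge the two sets by stacking their rows
merged <- rbind(train, test)

# Step 2: keep the identifier column plus any mean() or std() measurement
is_mean_or_std <- grepl("mean\\(\\)|std\\(\\)", names(merged))
extracted <- merged[, is_mean_or_std | names(merged) == "subject"]

print(extracted)
```

Selecting columns by a name pattern avoids hard-coding column positions, which matters when the measurement list is long.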

R Programming - Week 2 Assignment

06.10.2020

pollutantmean <- function(directory, pollutant, id = 1:332) {
  # Create a list of files in the directory argument
  files_list <- list.files(directory, full.names = TRUE)
  df <- data.frame()  # creates an empty data frame
  # Loop through the files, rbinding them together
  for (i in id) {
    df <- rbind(df, read.csv(files_list[i]))
  }
  # S...

5 sym R (7067 sym/29 pcs)

Data Science Capstone in R - Week 2 Analysis Alternate

07.10.2020

Instructions: The goal of this project is to display that we’ve become familiar with the data and that we are on track to create our prediction algorithm. This report (to be submitted on R Pubs (http://rpubs.com/)) explains our exploratory analysis and our goals for the eventual app and algorithm. This document should be concise and explain only...

2901 sym R (4310 sym/21 pcs) 3 img

Data Science Capstone in R - Week 3 N-gram Generator

18.10.2020

As noted earlier, a corpus is a body of text from which we build and test LMs. rm(list = ls()) library(quanteda) ## Package version: 2.1.2 ## Parallel computing: 2 of 4 threads used. ## See https://quanteda.io for tutorials and examples. ## ## Attaching package: 'quanteda' ## The following object is masked from 'package:utils': ## ## View l...

852 sym R (3523 sym/22 pcs)

Data Science Capstone in R - Shiny App Presentation

16.11.2020

11/15/2020. Executive Summary: Natural Language Processing App. All code implemented using R (hosted at https://www.shinyapps.io). Goal: Predict third word of tri-gram given two leading words. Use Katz Back-Off Method for predictions. Provide list of word predictions along with probabilities. Training Corpus & Prediction Method: Three ...

1045 sym 1 img
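
To make the back-off idea in the entry above concrete, here is a deliberately simplified sketch in base R. It is not the Katz method itself: there is no discounting or redistribution of probability mass, only a fallback from trigram to bigram to unigram relative frequencies. The toy corpus and all function names are hypothetical.

```r
# Toy corpus (hypothetical), tokenised on whitespace
tokens <- strsplit("the cat sat on the mat the cat ate the fish", " ")[[1]]

# Build n-grams of size n as space-joined strings
make_ngrams <- function(tokens, n) {
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  vapply(starts,
         function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

unigrams <- table(tokens)
bigrams  <- table(make_ngrams(tokens, 2))
trigrams <- table(make_ngrams(tokens, 3))

# Relative frequencies of the words that follow `prefix` in an n-gram table
next_word_probs <- function(ngram_table, prefix) {
  key  <- paste0(prefix, " ")
  hits <- ngram_table[startsWith(names(ngram_table), key)]
  if (length(hits) == 0) return(NULL)
  probs <- as.numeric(hits) / sum(hits)
  names(probs) <- substring(names(hits), nchar(key) + 1)
  sort(probs, decreasing = TRUE)
}

# Simplified back-off (no Katz discounting): trigram, then bigram, then unigram
predict_third_word <- function(w1, w2) {
  probs <- next_word_probs(trigrams, paste(w1, w2))
  if (is.null(probs)) probs <- next_word_probs(bigrams, w2)
  if (is.null(probs)) {
    probs <- as.numeric(unigrams) / sum(unigrams)
    names(probs) <- names(unigrams)
    probs <- sort(probs, decreasing = TRUE)
  }
  probs
}

print(predict_third_word("the", "cat"))  # trigram evidence available
print(predict_third_word("xyz", "the"))  # falls back to bigrams
```

A full Katz implementation would additionally discount the observed counts and scale the lower-order estimates by the left-over probability mass, so that the result is a proper distribution across all orders.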

Bayesian Statistics - Data Analysis Project Rubric

07.06.2021

Part 1: Data (2 points). 1 pt for correct reasoning for generalizability – Answer should discuss whether random sampling was used. Learners might discuss any reservations; those should be well justified. 1 pt for correct reasoning for causality – Answer should discuss whether random assignme...

3595 sym

Bayes Regression

03.06.2021

This second lab will deal with model assumptions, selection, and interpretation. The concepts tested here will prove useful for the final peer assessment, which is much more open-ended. First, let us load the data: load("ames_train.Rdata") library(MASS) library(dplyr) library(ggplot2) library(plotly) library(devtools) library(statsr) library(broo...

5496 sym R (8488 sym/22 pcs) 1 img

Bayesian Inference Lab

28.05.2021

Bayesian Inference. Getting Started: In this lab we will review exploratory data analysis using the ggplot2 package for data visualization, which is included in the tidyverse. The main focus of this lab is to be able to obtain and interpret credible intervals and hypothesis tests using Bayesian methods for numerical variables. The data and functio...

18166 sym R (13841 sym/46 pcs) 8 img 2 tbl

Bayesian Statistics - Week 2 Practice Quiz

26.05.2021

Question 5: You are hired as a data analyst by politician A. She wants to know the proportion of people in Metrocity who favor her over politician B. From previous poll numbers, you place a Beta(40,60) prior on the proportion. From polling 200 randomly sampled people in Metrocity, you find that 103 people prefer politician A to politician B. What...

1333 sym R (276 sym/6 pcs)
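
The setup in the quiz entry above is a standard Beta-Binomial conjugate update, which can be sketched as below. The excerpt cuts off before the actual question, so this only shows the posterior that follows from the stated prior and poll; the 95% credible interval is an added illustration, not something the excerpt asks for.

```r
# Prior on the proportion favouring politician A: Beta(40, 60)
prior_a <- 40
prior_b <- 60

# Poll result: 103 of 200 sampled people prefer politician A
n         <- 200
successes <- 103

# Conjugate update: add successes to a, failures to b
post_a <- prior_a + successes        # 40 + 103 = 143
post_b <- prior_b + (n - successes)  # 60 + 97  = 157

# Posterior mean of the proportion
post_mean <- post_a / (post_a + post_b)

# Assumption for illustration: equal-tailed 95% credible interval
cred_int <- qbeta(c(0.025, 0.975), post_a, post_b)

print(c(post_a = post_a, post_b = post_b))
print(post_mean)
print(cred_int)
```

The posterior is Beta(143, 157), so the prior behaves like 100 earlier observations pooled with the 200 new ones.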