Publications by Ken Wood
Statistical Inference in R Course Project - Part 2
Introduction: For the second portion of the course project, we analyze the ToothGrowth data in the R datasets package. Specifically, we will: (1) load the ToothGrowth data and perform some basic exploratory data analyses; (2) provide a basic summary of the data; (3) use confidence intervals and/or hypothesis tests to compare tooth growth by sup...
2430 sym R (2529 sym/13 pcs) 3 img
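As a rough illustration of the ToothGrowth analysis outlined in the entry above, the sketch below summarizes the data and runs one comparison. The preview is cut off, so the choice to compare by the `supp` column with a Welch t-test is an assumption of this sketch, not necessarily what the report does.

```r
# ToothGrowth ships with base R's datasets package.
data(ToothGrowth)

# Basic summary: mean tooth length for each supplement/dose combination.
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(group_means)

# Welch two-sample t-test comparing tooth length by supplement type.
# Treating the variances as unequal is an assumption of this sketch.
supp_test <- t.test(len ~ supp, data = ToothGrowth, var.equal = FALSE)
print(supp_test$conf.int)
print(supp_test$p.value)
```

A confidence interval that contains zero would mean the data do not rule out equal mean growth for the two supplements at that confidence level.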
Getting and Cleaning Data in R - Course Project
# Create one R script called run_analysis.R that does the following:
# 1. Merges the training and the test sets to create one data set.
# 2. Extracts only the measurements on the mean and standard deviation for each measurement.
# 3. Uses descriptive activity names to name the activities in the data set.
# 4. Appropriately labels the data set wit...
5 sym R (5355 sym/10 pcs)
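The first three steps listed for run_analysis.R can be sketched on toy data. The data frames, column names, and activity labels below are hypothetical stand-ins; the real script works on the course data files, which are not shown in this listing.

```r
# Toy stand-ins for the training and test sets (hypothetical values and names).
train <- data.frame(subject = c(1, 2),
                    activity = c(1, 2),
                    acc_mean_x = c(0.28, 0.27),
                    acc_std_x = c(-0.99, -0.98),
                    acc_max_x = c(0.91, 0.93))
test <- data.frame(subject = c(3, 4),
                   activity = c(2, 1),
                   acc_mean_x = c(0.26, 0.29),
                   acc_std_x = c(-0.97, -0.96),
                   acc_max_x = c(0.90, 0.92))

# Step 1: merge the training and test sets by stacking their rows.
merged <- rbind(train, test)

# Step 2: keep the id columns plus any column whose name contains "mean" or "std".
keep <- grepl("subject|activity|mean|std", names(merged))
extracted <- merged[, keep]

# Step 3: replace activity codes with descriptive names (labels are hypothetical).
extracted$activity <- factor(extracted$activity,
                             levels = c(1, 2),
                             labels = c("WALKING", "SITTING"))
print(extracted)
```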
R Programming - Week 2 Assignment
pollutantmean <- function(directory, pollutant, id = 1:332) {
  # Create a list of files in the directory argument
  files_list <- list.files(directory, full.names = TRUE)
  df <- data.frame()  # creates an empty data frame
  # Loop through the files, rbinding them together
  for (i in id) {
    df <- rbind(df, read.csv(files_list[i]))
  }
  # S...
5 sym R (7067 sym/29 pcs)
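The pollutantmean preview above is truncated, so the following is only a sketch of one way such a function could finish. It assumes the goal is the mean of the chosen pollutant column with missing values dropped. The function name `pollutant_mean_sketch`, the demo directory, and the CSV contents are all hypothetical.

```r
# Build a throwaway directory with two small monitor files (made-up values).
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(sulfate = c(1, NA, 3), nitrate = c(2, 4, NA), ID = 1),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(5, 7, NA), nitrate = c(NA, 6, 8), ID = 2),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

# Hypothetical variant: read each requested file once, bind them in a single
# call, then average the chosen pollutant while skipping missing values.
pollutant_mean_sketch <- function(directory, pollutant, id) {
  files_list <- list.files(directory, full.names = TRUE)
  frames <- lapply(files_list[id], read.csv)
  combined <- do.call(rbind, frames)
  mean(combined[[pollutant]], na.rm = TRUE)
}

sulfate_mean <- pollutant_mean_sketch(demo_dir, "sulfate", 1:2)
print(sulfate_mean)
```

Binding all frames in one `do.call(rbind, ...)` avoids regrowing the data frame on every loop iteration, which matters more as the number of files increases.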
Data Science Capstone in R - Week 2 Analysis Alternate
Instructions: The goal of this project is to demonstrate that we’ve become familiar with the data and that we are on track to create our prediction algorithm. This report (to be submitted on R Pubs (http://rpubs.com/)) explains our exploratory analysis and our goals for the eventual app and algorithm. This document should be concise and explain only...
2901 sym R (4310 sym/21 pcs) 3 img
Data Science Capstone in R - Week 3 N-gram Generator
As noted earlier, a corpus is a body of text from which we build and test language models (LMs).
rm(list = ls())
library(quanteda)
## Package version: 2.1.2
## Parallel computing: 2 of 4 threads used.
## See https://quanteda.io for tutorials and examples.
##
## Attaching package: 'quanteda'
## The following object is masked from 'package:utils':
##
##     View
l...
852 sym R (3523 sym/22 pcs)
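To make the idea of generating n-grams from a corpus concrete, here is a minimal base R sketch. The report itself loads quanteda; this example deliberately avoids it, and the two-sentence corpus and the helper names `tokenize` and `make_ngrams` are hypothetical.

```r
# Tiny made-up corpus; the source builds its corpus with quanteda instead.
corpus_text <- c("the cat sat on the mat", "the cat ate the fish")

# Split a sentence into lowercase word tokens.
tokenize <- function(text) {
  strsplit(tolower(text), "[^a-z']+")[[1]]
}

# Slide a window of width n over the tokens and paste each window together.
make_ngrams <- function(tokens, n) {
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  vapply(starts, function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Count bigrams across the whole corpus, most frequent first.
all_bigrams <- unlist(lapply(corpus_text, function(s) make_ngrams(tokenize(s), 2)))
bigram_counts <- sort(table(all_bigrams), decreasing = TRUE)
print(head(bigram_counts))
```

N-grams are built per sentence here, so no bigram spans a sentence boundary.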
Data Science Capstone in R - Shiny App Presentation
11/15/2020
Executive Summary
Natural Language Processing App
All code implemented using R
Hosted at https://www.shinyapps.io
Goal: Predict third word of tri-gram given two leading words
Use Katz Back-Off Method for predictions
Provide list of word predictions along with probabilities
Training Corpus & Prediction Method
Three ...
1045 sym 1 img
Bayesian Statistics - Data Analysis Project Rubric
Part 1: Data (2 points). 1 pt for correct reasoning for generalizability: the answer should discuss whether random sampling was used. Learners might discuss any reservations; those should be well justified. 1 pt for correct reasoning for causality: the answer should discuss whether random assignme...
3595 sym
Bayes Regression
This second lab will deal with model assumptions, selection, and interpretation. The concepts tested here will prove useful for the final peer assessment, which is much more open-ended. First, let us load the data:
load("ames_train.Rdata")
library(MASS)
library(dplyr)
library(ggplot2)
library(plotly)
library(devtools)
library(statsr)
library(broo...
5496 sym R (8488 sym/22 pcs) 1 img
Bayesian Inference Lab
Bayesian Inference. Getting Started: In this lab we will review exploratory data analysis using the ggplot2 package for data visualization, which is included in the tidyverse. The main focus of this lab is to be able to obtain and interpret credible intervals and hypothesis tests using Bayesian methods for numerical variables. The data and functio...
18166 sym R (13841 sym/46 pcs) 8 img 2 tbl
Bayesian Statistics - Week 2 Practice Quiz
Question 5: You are hired as a data analyst by politician A. She wants to know the proportion of people in Metrocity who favor her over politician B. From previous poll numbers, you place a Beta(40,60) prior on the proportion. From polling 200 randomly sampled people in Metrocity, you find that 103 people prefer politician A to politician B. What...
1333 sym R (276 sym/6 pcs)
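The setup in Question 5 above is a standard Beta-Binomial conjugate update, which can be worked through in a few lines. The question text is cut off, so the posterior mean and 95% credible interval below are illustrative summaries, not necessarily what the quiz asks for.

```r
# Prior Beta(40, 60) and poll result taken from the question text.
prior_a <- 40
prior_b <- 60
n_polled <- 200
n_favor_a <- 103

# Conjugate update: add successes to the first parameter, failures to the second.
post_a <- prior_a + n_favor_a
post_b <- prior_b + (n_polled - n_favor_a)

# Posterior mean and a 95% equal-tailed credible interval (illustrative summaries).
post_mean <- post_a / (post_a + post_b)
cred_int <- qbeta(c(0.025, 0.975), post_a, post_b)
print(c(post_a, post_b))
print(post_mean)
print(cred_int)
```

With 103 in favor and 97 against, the posterior is Beta(143, 157).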