Publications by Ken Wood

Practical Machine Learning in R - Quiz 2

05.10.2020

Question 1
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
library(AppliedPredictiveModeling)
data(AlzheimerDisease)
What is the set of commands that will create non-overlapping training and test sets with about 50% of the observations assigned to each?
adData = data.frame(diagnosis,predictors)
trainIndex...

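A base-R sketch of the kind of split the question asks for, stratified on the outcome so that each class is divided roughly in half between training and test sets. It uses the built-in iris data and its Species column as stand-ins for adData and diagnosis (an assumption, so the example runs without AppliedPredictiveModeling). In caret the same idea is usually written with createDataPartition(y, p = 0.5, list = FALSE).

```r
# iris stands in for adData and Species for diagnosis (assumption, for a self-contained example)
set.seed(3433)
df <- iris

# Within each class, sample half of that class's row indices for training
train_idx <- unlist(lapply(split(seq_len(nrow(df)), df$Species), function(rows) {
  rows[sample.int(length(rows), size = floor(length(rows) * 0.5))]
}))

training <- df[train_idx, ]
testing  <- df[-train_idx, ]   # every row not sampled goes to the test set, so the sets cannot overlap

table(training$Species)        # 25 rows per class in the training set
```

Sampling within each class keeps the class proportions of the outcome about the same in both halves, which a plain sample() over all rows does not guarantee.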

Regression Models - Course Project

05.10.2020

Executive Summary
Motor Trend, a magazine about the automobile industry, wants to examine a data set covering a collection of cars to learn more about fuel mileage. They are interested in exploring the relationship between a set of variables and miles per gallon (MPG, the outcome). Specifically, they are interested in answering the following two questions: ...

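A minimal sketch of a multivariable regression with MPG as the outcome, assuming the built-in mtcars data (which was extracted from Motor Trend magazine) and an illustrative choice of predictors, weight (wt) and transmission type (am). The author's actual model selection is not shown in this excerpt.

```r
data(mtcars)

# Illustrative predictors (assumption): weight in 1000 lbs and transmission (am: 0 = automatic, 1 = manual)
fit <- lm(mpg ~ wt + factor(am), data = mtcars)

# Estimated effects, standard errors and p-values
summary(fit)$coefficients

# 95% confidence intervals for each coefficient
confint(fit)
```

Wrapping am in factor() makes the transmission coefficient read as the adjusted MPG difference of manual versus automatic cars at the same weight.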

Reproducible Research in R - Week 4 Assignment

06.10.2020

Executive Summary
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. This analysis leverages the U.S. National Oceanic and Atmo...

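One way to rank event types by their public-health impact is to sum fatalities and injuries per event type with aggregate(). In this sketch the records are made-up toy values, and the column names EVTYPE, FATALITIES and INJURIES are assumed to match the NOAA storm file.

```r
# Hypothetical toy records; column names assumed to mirror the NOAA storm data
storm <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(5, 1, 2, 10, 0),
  INJURIES   = c(30, 4, 8, 20, 2)
)

# Total fatalities and injuries per event type
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = storm, FUN = sum)
health$TOTAL <- health$FATALITIES + health$INJURIES

# Most harmful event types first
health <- health[order(-health$TOTAL), ]
health
```

The same pattern applies to economic damage, although the real property-damage columns would first need their magnitude codes converted to numbers.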

Statistical Inference in R Course Project - Part 2

06.10.2020

Introduction
For the second portion of the course project, we’re going to analyze the ToothGrowth data in the R datasets package. Specifically, we will: (1) load the ToothGrowth data and perform some basic exploratory data analyses; (2) provide a basic summary of the data; (3) use confidence intervals and/or hypothesis tests to compare tooth growth by sup...

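A sketch of the comparisons the introduction lists, using ToothGrowth from the datasets package: a two-sample t-test of tooth length by supplement type, then the same test within each dose. Using Welch's test (unequal variances, the t.test() default) and not adjusting the per-dose p-values for multiple comparisons are assumptions of this sketch, not necessarily the author's choices.

```r
data(ToothGrowth)   # columns: len, supp (OJ or VC), dose (0.5, 1, 2)
summary(ToothGrowth)

# Welch two-sample t-test of tooth length by supplement, with a 95% CI for the mean difference
supp_test <- t.test(len ~ supp, data = ToothGrowth)
supp_test$conf.int

# Repeat the comparison within each dose level and keep the p-values
p_by_dose <- sapply(split(ToothGrowth, ToothGrowth$dose), function(d) {
  t.test(len ~ supp, data = d)$p.value
})
p_by_dose
```

Splitting by dose matters because pooling all doses mixes the dose effect into the supplement comparison.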

Getting and Cleaning Data in R - Course Project

06.10.2020

# Create one R script called run_analysis.R that does the following:
# 1. Merges the training and the test sets to create one data set.
# 2. Extracts only the measurements on the mean and standard deviation for each measurement.
# 3. Uses descriptive activity names to name the activities in the data set.
# 4. Appropriately labels the data set wit...

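Steps 1 to 3 of the script can be sketched on tiny toy tables: stack the training and test rows, keep only the mean() and std() columns by a regular expression on their names, and map numeric activity codes to labels. The feature names, activity codes and label subset below are hypothetical stand-ins for the real files.

```r
# Hypothetical toy stand-ins for the training and test measurement tables
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-mad()-X")
train <- as.data.frame(matrix(1:6, nrow = 2))
names(train) <- features
test <- as.data.frame(matrix(7:12, nrow = 2))
names(test) <- features

# Step 1: merge by stacking the test rows under the training rows
merged <- rbind(train, test)

# Step 2: keep only columns whose names contain "-mean()" or "-std()"
keep <- grepl("-(mean|std)\\(\\)", names(merged))
extracted <- merged[, keep]

# Step 3: replace numeric activity codes with descriptive names (hypothetical codes and labels)
activity_codes  <- c(1, 3, 2, 1)
activity_labels <- c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS")
extracted$activity <- factor(activity_labels[activity_codes], levels = activity_labels)
extracted
```

Escaping the parentheses in the pattern keeps names such as meanFreq() out of the selection.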

R Programming - Week 2 Assignment

06.10.2020

pollutantmean <- function(directory, pollutant, id=1:332) {
  # Create a list of files in the directory argument
  files_list <- list.files(directory, full.names = TRUE)
  df <- data.frame() # creates an empty data frame
  # Loop through the files, rbinding them together
  for (i in id) {
    df <- rbind(df, read.csv(files_list[i]))
  }
  # S...

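A self-contained sketch of the same function, assuming it should return the mean of the chosen pollutant column across the selected monitor files with missing values ignored. It reads all requested files with lapply() and binds them once, which avoids growing a data frame inside the loop. The two small CSV files written to a temporary directory are hypothetical test data.

```r
pollutantmean <- function(directory, pollutant, id = 1:332) {
  # Full paths of the monitor files, in sorted order (001.csv, 002.csv, ...)
  files_list <- list.files(directory, full.names = TRUE)
  # Read only the requested files, then bind them into one data frame in a single step
  df <- do.call(rbind, lapply(files_list[id], read.csv))
  # Mean of the requested pollutant column, ignoring missing readings
  mean(df[[pollutant]], na.rm = TRUE)
}

# Hypothetical demo data in a fresh temporary directory
tmp <- tempfile("specdata_demo_")
dir.create(tmp)
write.csv(data.frame(sulfate = c(1, NA, 3), nitrate = c(2, 2, NA)),
          file.path(tmp, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(5, 7), nitrate = c(NA, 4)),
          file.path(tmp, "002.csv"), row.names = FALSE)

pollutantmean(tmp, "sulfate", 1:2)   # 4
```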

Data Science Capstone in R - Week 2 Analysis Alternate

07.10.2020

Instructions
The goal of this project is to show that we’ve become familiar with the data and that we are on track to create our prediction algorithm. This report, to be submitted on R Pubs (http://rpubs.com/), explains our exploratory analysis and our goals for the eventual app and algorithm. This document should be concise and explain only...

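A typical first exploratory step is counting word frequencies. This base-R sketch uses two hypothetical sample sentences and a simple regular-expression tokenizer, which is an assumption and not necessarily the tokenizer used in the report.

```r
# Hypothetical sample lines standing in for the corpus
text <- c("The cat sat on the mat.", "The dog sat.")

# Lower-case, split on anything that is not a letter or apostrophe, drop empty tokens
tokens <- unlist(strsplit(tolower(text), "[^a-z']+"))
tokens <- tokens[nzchar(tokens)]

# Word counts, most frequent first
freq <- sort(table(tokens), decreasing = TRUE)
head(freq, 3)
```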

Data Science Capstone in R - Week 3 N-gram Generator

18.10.2020

As noted earlier, a corpus is a body of text from which we build and test language models (LMs).
rm(list = ls())
library(quanteda)
## Package version: 2.1.2
## Parallel computing: 2 of 4 threads used.
## See https://quanteda.io for tutorials and examples.
##
## Attaching package: 'quanteda'
## The following object is masked from 'package:utils':
##
## View l...

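The post loads quanteda, whose tokens_ngrams() builds n-grams from a tokens object. As a dependency-free illustration of what an n-gram generator produces, the sketch below slides a window of n words over a token vector and joins each window with "_"; make_ngrams is a hypothetical helper name.

```r
# Hypothetical helper: all contiguous n-word sequences, joined with `sep`
make_ngrams <- function(words, n, sep = "_") {
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = sep), character(1))
}

words <- c("a", "corpus", "is", "a", "body", "of", "text")
bigrams  <- make_ngrams(words, 2)   # "a_corpus" "corpus_is" "is_a" "a_body" "body_of" "of_text"
trigrams <- make_ngrams(words, 3)   # starts with "a_corpus_is"
```

A sequence of k words yields k - n + 1 n-grams, so the trigram table is always slightly shorter than the bigram table built from the same text.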

Data Science Capstone in R - Shiny App Presentation

16.11.2020

11/15/2020
Executive Summary
Natural Language Processing App
All code implemented using R
Hosted at https://www.shinyapps.io
Goal: predict the third word of a trigram given the two leading words
Uses the Katz back-off method for predictions
Provides a list of word predictions along with their probabilities
Training Corpus & Prediction Method
Three ...

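The slides name Katz back-off, which requires discounted counts and back-off weights. To show only the back-off control flow, the sketch below swaps in a simplified form of the "stupid backoff" scheme (Brants et al., 2007): use trigram continuations of the two leading words if any exist, otherwise fall back to bigram continuations of the last word scaled by alpha, otherwise to unigram frequencies scaled by alpha squared. Unlike Katz back-off, its scores are not normalized probabilities. The toy counts and the names tri, bi, uni, continuations and predict_next are hypothetical.

```r
# Hypothetical toy n-gram counts; keys are words joined by "_"
tri <- c(i_love_you = 3, i_love_it = 1)
bi  <- c(love_you = 4, love_it = 2, love_me = 2)
uni <- c(you = 10, it = 8, me = 5, the = 20)

# Relative frequency of each word that follows `prefix` in `counts`
continuations <- function(counts, prefix) {
  hits <- counts[startsWith(names(counts), prefix)]
  setNames(hits / sum(hits), substring(names(hits), nchar(prefix) + 1))
}

predict_next <- function(w1, w2, alpha = 0.4) {
  # Trigram level: words seen after "w1 w2"
  scores <- continuations(tri, paste0(w1, "_", w2, "_"))
  # Back off to bigrams on w2 only when the trigram context is unseen
  if (length(scores) == 0) scores <- alpha * continuations(bi, paste0(w2, "_"))
  # Last resort: unigram frequencies
  if (length(scores) == 0) scores <- alpha^2 * uni / sum(uni)
  sort(scores, decreasing = TRUE)
}

predict_next("i", "love")    # trigram hit: you 0.75, it 0.25
predict_next("we", "love")   # bigram back-off: you 0.2, then it and me at 0.1
```

alpha = 0.4 is the value suggested for stupid backoff; a Katz implementation would instead derive the back-off weight from the probability mass removed by discounting.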

Bayesian Statistics - Data Analysis Project Rubric

07.06.2021

Part 1: Data (2 points)
1 pt for correct reasoning for generalizability: the answer should discuss whether random sampling was used. Learners might discuss any reservations; these should be well justified.
1 pt for correct reasoning for causality: the answer should discuss whether random assignme...
