Publications by Ken Wood

Data Science Capstone in R - Week 3 Quiz

18.10.2020

rm(list = ls())
library(quanteda)
## Package version: 2.1.2
## Parallel computing: 2 of 4 threads used.
## See https://quanteda.io for tutorials and examples.
##
## Attaching package: 'quanteda'
## The following object is masked from 'package:utils':
##
##     View
library(data.table)
library(dplyr)
##
## Attaching package: 'dplyr'
## The foll...

2162 sym R (18436 sym/46 pcs)

Katz Backoff Prediction Model - Week 3

17.10.2020

Applying the Katz Backoff Algorithm: As noted earlier, a corpus is a body of text from which we build and test LMs.
rm(list = ls())
library(quanteda)
## Package version: 2.1.2
## Parallel computing: 2 of 4 threads used.
## See https://quanteda.io for tutorials and examples.
##
## Attaching package: 'quanteda'
## The following object is masked fr...

5764 sym R (19407 sym/54 pcs)

Katz Backoff Example

12.10.2020

Example of Applying the Algorithm: The Little Corpus That Could
As noted earlier, a corpus is a body of text from which we build and test LMs. To illustrate how the mathematical formulation of the KBO Trigram model works, it’s helpful to look at a simple corpus that is small enough to easily keep track of the n-gram counts, but large enough to ...

6585 sym R (18310 sym/51 pcs)
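The n-gram bookkeeping mentioned in the Katz Backoff Example entry above can be sketched in base R. This is only an illustration of counting bigrams and trigrams in a tiny corpus. The three sentences and the helper `count_ngrams` are hypothetical and are not taken from the original notebook, whose own corpus is truncated in this listing. The discounting and backoff steps of the KBO model are not shown.

```r
# Hypothetical toy corpus; the notebook's actual corpus is not shown in this listing.
corpus <- c("the cat sat", "the cat ran", "the dog sat")

# Hypothetical helper: count all n-grams of order n, sentence by sentence.
count_ngrams <- function(sentences, n) {
  grams <- unlist(lapply(strsplit(sentences, " ", fixed = TRUE), function(tokens) {
    # A sentence shorter than n tokens contributes no n-grams.
    if (length(tokens) < n) return(character(0))
    starts <- seq_len(length(tokens) - n + 1)
    vapply(starts,
           function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  table(grams)
}

bigrams  <- count_ngrams(corpus, 2)
trigrams <- count_ngrams(corpus, 3)

print(bigrams)
print(trigrams)
```

With counts like these in hand, a backoff model can check whether a trigram was observed and fall back to the bigram counts when it was not.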

R Programming - Week 4 Assignment

06.10.2020

rm(list=ls(all=TRUE))
best <- function(state, outcome) {
  ## Read outcome data
  df <- read.csv("outcome-of-care-measures.csv", na.strings="Not Available", stringsAsFactors=FALSE, colClasses = "character")
  ## Check that state and outcome arguments are valid
  if ((state %in% df$State == TRUE) && (outcome %in% c("heart attack","heart failure",...

5 sym R (9681 sym/28 pcs)

Exploratory Data Analysis in R - Week 4 Assignment

06.10.2020

# Load the raw data files.
# These lines of code will take a little time to execute, so please be patient!
NEI <- readRDS("exdata-data-NEI_data/summarySCC_PM25.rds")
SCC <- readRDS("exdata-data-NEI_data/Source_Classification_Code.rds")
merged_df <- merge(NEI,SCC,by="SCC")
Questions
We will address the following questions and tasks in our explor...

1393 sym R (3271 sym/11 pcs) 6 img

R Programming - Week 3 Assignment

06.10.2020

For large square matrices, it may take too long to compute the inverse, especially if it has to be computed repeatedly (e.g. in a loop). If the contents of the matrix do not change, it may make sense to cache the matrix inverse so that, when we need it again, it can be looked up in the cache rather than recomputed. In this Programming Assignment...

1449 sym R (1222 sym/21 pcs)
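The caching idea described in the R Programming Week 3 entry above can be sketched with a pair of closures. This is a minimal sketch, not the notebook's own solution, which is truncated in this listing. The names `make_cache_matrix` and `cache_solve` are hypothetical, and the matrix is assumed to be square and invertible.

```r
# Hypothetical names; assumes the stored matrix is square and invertible.
make_cache_matrix <- function(x = matrix()) {
  inv <- NULL                        # cached inverse, NULL until computed
  set <- function(y) {
    x <<- y
    inv <<- NULL                     # invalidate the cache when the matrix changes
  }
  get <- function() x
  set_inverse <- function(value) inv <<- value
  get_inverse <- function() inv
  list(set = set, get = get,
       set_inverse = set_inverse, get_inverse = get_inverse)
}

cache_solve <- function(cm, ...) {
  inv <- cm$get_inverse()
  if (!is.null(inv)) {
    return(inv)                      # cache hit: skip the recomputation
  }
  inv <- solve(cm$get(), ...)        # cache miss: compute and store
  cm$set_inverse(inv)
  inv
}

m <- make_cache_matrix(matrix(c(2, 0, 0, 2), nrow = 2))
first  <- cache_solve(m)             # computes the inverse
second <- cache_solve(m)             # returns the cached copy
```

Calling `m$set()` with a new matrix clears the cached inverse, so a stale result is never returned.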

Getting and Cleaning Data in R - Code Book Project

06.10.2020

Fitbit devices use a 3-axis accelerometer to determine body movement and actions. This sensor also allows a device to determine the frequency, duration, intensity, and patterns of a person’s movement. Below is a table of tokens used in the dataset feature names and a brief description of each token.
Token | Description
Body | Signal based on the ...

1375 sym

Getting and Cleaning Data in R - Week 2 Quiz

06.10.2020

library(httr)
library(jsonlite)
# Question #1
# A. Find OAuth settings for github:
# http://developer.github.com/v3/oauth/
oauth_endpoints("github")
## <oauth_endpoint>
##  authorize: https://github.com/login/oauth/authorize
##  access:    https://github.com/login/oauth/access_token
# B. To make your own application, register at
# https:/...

5 sym R (3553 sym/23 pcs)

Getting and Cleaning Data in R - Week 3 Quiz

06.10.2020

# Question #1 - Create a logical vector that identifies the households on greater than 10 acres who sold more than $10,000 worth of agriculture products. Assign that logical vector to the variable agricultureLogical. Apply the which() function like this to identify the rows of the data frame where the logical vector is TRUE. which(agricultureL...

5 sym R (4362 sym/26 pcs)
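The logical-vector pattern in Question #1 of the Week 3 Quiz entry above can be illustrated on a toy data frame. The data frame `households` and its columns `acres` and `ag_sales` are hypothetical stand-ins. The quiz's real dataset and its coded column names are not shown in this listing, so this sketch only demonstrates how `which()` turns a logical vector into row indices.

```r
# Hypothetical data; the quiz's actual survey data and column names are not shown here.
households <- data.frame(
  acres    = c(2, 15, 12, 30, 8),
  ag_sales = c(500, 20000, 3000, 15000, 50000)
)

# TRUE where a household has more than 10 acres AND more than $10,000 in sales.
agricultureLogical <- households$acres > 10 & households$ag_sales > 10000

# which() returns the row indices where the logical vector is TRUE.
matching_rows <- which(agricultureLogical)
print(matching_rows)
```

Note that `which()` silently drops `NA` entries, which is convenient when the underlying columns contain missing values.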

Developing Data Products in R - Week 3 Project

05.10.2020

9/30/2020. Histogram of Housing Prices; Box Plot of Saleprice vs. Home Condition; Scatter Plot of Saleprice vs. Lot Area ...

135 sym