Publications by Joey Campbell
R Lab: Clustering
K-Means Clustering: The function kmeans() performs K-means clustering in R. We begin with a simple simulated example in which there truly are two clusters in the data: the first 25 observations have a mean shift relative to the next 25 observations. set.seed(2); x=matrix(rnorm(50*2), ncol=2); x[1:25,1]=x[1:25,1]+3; x[1:25,2]=x[1:25,2]-4 We now pe...
10265 sym R (2332 sym/25 pcs) 4 img
Linear Regression R Lab
Libraries The library() function is used to load libraries, or groups of functions and data sets that are not included in the base R distribution. Basic functions that perform least squares linear regression and other simple analyses come standard with the base distribution, but more exotic functions require additional libraries. Here we load the...
31271 sym R (12488 sym/68 pcs) 11 img
Introduction to R Lab
In this lab, we will introduce some simple R commands. The best way to learn a new language is to try out the commands. R can be downloaded from http://cran.r-project.org/. Basic Commands R uses functions to perform operations. To run a function called funcname, we type funcname(input1, input2), where the inputs (or arguments) input1 and input2 ...
33556 sym R (7107 sym/106 pcs) 23 img
Case Study WHO Example Questions
This code is repeated from the Tidy Data Case Study because it is needed by the exercises. library(tidyverse) who1 <- who %>% gather(new_sp_m014:newrel_f65, key = "key", value = "cases", na.rm = TRUE) glimpse(who1) Observations: 76,046 Variables: 6 $ country <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanist...
11717 sym R (2463 sym/18 pcs) 1 img
Tidy Data Case Study
Let’s pull together everything you’ve learned to tackle a realistic data tidying problem. The tidyr::who dataset contains tuberculosis (TB) cases broken down by year, country, age, gender, and diagnosis method. The data comes from the 2014 World Health Organization Global Tuberculosis Report, available at http://www.who.int/tb/country/data/do...
11067 sym R (1048 sym/11 pcs)
googleVis
R Interface to Google Charts: The googleVis package provides an interface between R and Google’s chart tools. It allows users to create web pages with interactive charts based on R data frames. Charts are displayed locally via the R HTTP help server. A modern browser with an Internet connection is required, and some charts also need a Flash player. T...
2073 sym R (2727 sym/16 pcs) 1 tbl
Plotly
Getting Started with Plotly for R: Plotly is a free and open-source graphing library for R. Plotly’s R graphing library makes interactive, publication-quality graphs. ...
4828 sym R (3609 sym/22 pcs) 1 img
SVM with CARET
The Support Vector Machine (or SVM) is a useful classification technique. Support vector machine methods can handle both linear and non-linear class boundaries, and they can be used for both two-class and multi-class classification problems. In real-life data, the separation boundary is generally nonlinear. Technically, the SVM algorithm performs a non-...
11331 sym R (8128 sym/22 pcs) 1 img
Lab 8 Trees Based Method
Fitting Classification Trees: Recursive partitioning is a fundamental tool in data mining. It helps us explore the structure of a set of data while developing easy-to-visualize decision rules for predicting a categorical outcome (classification tree). Classification trees (as described by Breiman, Friedman, Olshen, and Stone) can be generated through the rpart ...
25498 sym R (20193 sym/44 pcs) 8 img
STA4143_mt_caret
library('tidyverse'); library('caret'); library('modelr'); set.seed(303) Part 2: Regression (40 Points). The table below displays catalog-spending data for the first few of 200 randomly selected individuals from a very large (over 20,000 households) database. The variable of particular interest is catalog spending as measured by the Spending Ra...
21679 sym R (23729 sym/48 pcs) 4 img 2 tbl