Publications by Daniel Moscoe
DATA 607: Tidying and Transforming Data
Introduction. Data is tidy if each variable occupies one column, each observation occupies one row, and each cell contains exactly one value. Working with tidy data is advantageous because tidy data is more amenable to transformation and analysis. Many widely used statistical packages presume that input data will be more or less tidy. Combining a...
4336 sym R (3450 sym/14 pcs) 1 img
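The three tidy-data criteria in the entry above can be illustrated with a minimal sketch. The table, its column names (student, exam1, exam2), and its values are hypothetical and are not taken from the publication; the sketch assumes the tidyr package is installed.

```r
library(tidyr)

# Hypothetical untidy table: one row per student, one column per exam.
# The column names "exam1" and "exam2" are really values of a variable (exam).
untidy <- data.frame(
  student = c("A", "B"),
  exam1 = c(90, 85),
  exam2 = c(80, 95)
)

# Reshape so that each row is one (student, exam) observation
# and each variable (student, exam, score) has its own column.
tidy <- pivot_longer(
  untidy,
  cols = c(exam1, exam2),
  names_to = "exam",
  values_to = "score"
)

print(tidy)
```

After reshaping, each of the four student/exam pairs occupies its own row, which is the form most grouping and plotting functions expect.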
DATA607 Wk7: HTML, JSON, XML
Introduction. Importing data from a variety of formats is an essential skill in R. In this assignment, I create a simple table of my favorite statistics textbooks in several formats (HTML, JSON, XML). Then I import each format into R and store it as a dataframe. I used the packages below. library(tidyverse) library(XML) library(xml2)...
1431 sym R (3145 sym/13 pcs)
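As a rough illustration of the JSON case described in the entry above, the sketch below parses a small JSON array into a dataframe. It assumes the jsonlite package, which is not visible in the truncated package list, and the book titles and years are placeholders rather than the author's data.

```r
library(jsonlite)

# Placeholder JSON: an array of objects, one object per book.
json_text <- '[
  {"title": "Book A", "year": 2001},
  {"title": "Book B", "year": 2010}
]'

# fromJSON() simplifies an array of flat objects into a data frame.
books <- fromJSON(json_text)

print(books)
```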
Lubridate Vignette
Introduction. Lubridate contains functions that make it easier to work with dates and times. In this vignette we’ll use Lubridate to assist with the following tasks: create date/time objects from strings; create date/time objects from individual components; use accessors to get/set individual components of a date/time object; use durations to p...
4385 sym R (5202 sym/34 pcs) 1 img
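The tasks listed in the vignette entry above might look like the following minimal sketch. The dates are arbitrary examples and are not taken from the vignette; the duration step shows only one simple use, since the original description is truncated.

```r
library(lubridate)

# Create a date from a string.
d <- ymd("2021-03-14")

# Create a date-time from individual components (UTC by default).
dt <- make_datetime(2021, 3, 14, 15, 9, 26)

# Accessors get individual components...
month(d)
hour(dt)

# ...and can also set them.
year(d) <- 2022

# A duration is an exact span of time measured in seconds.
later <- dt + ddays(1)
```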
Using a Web API
Introduction. In this assignment, I demonstrate a pair of functions that together return a dataframe containing the contents of a specified New York Times bestseller list. The functions query an NYT API, which returns the data in JSON format. Setup. The jsonlite package pro...
2056 sym R (1089 sym/5 pcs) 2 tbl
DATA 605 Wk 12 discussion
library(tidyverse)...
3491 sym R (6136 sym/19 pcs) 3 img
DATA 607 Recommender Systems Discussion
Introduction. Amazon.com uses recommender systems to customize its store for every user. “It’s as if you walked into a store and the shelves started rearranging themselves, with what you might want moving to the front, and what you’re unlikely to be interested in shuffling further away” (Two Decades of Recommender Systems at Amazon.com)....
7907 sym
DATA 607 Final Project: Churn Analysis
Introduction. Churn analysis is a fundamental problem in data science. The investigator obtains information on customer behavior and attributes and uses it to predict whether a customer will terminate a contract. In this study, I conduct a churn analysis based on simulated cell phone customer data from a Kaggle competition,...
22161 sym R (31786 sym/82 pcs) 16 img
DATA 605 Final Exam
library(GGally) library(MASS) library(modelr) library(tidyverse) library(stats) set.seed(210514) Using R, generate a random variable \(X\) that has \(10,000\) random uniform numbers from \(1\) to \(N\), where \(N\) can be any number of your choosing greater than or equal to \(6\). Then generate a random variable \(Y\) that has \(10,000\) ra...
12558 sym R (40607 sym/86 pcs) 8 img
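The first step of the exam prompt above, generating X, could be sketched as follows. The choice N = 6 is an assumption (the prompt allows any N of at least 6), as is reading "random uniform numbers" as continuous draws via runif(); the seed is the one shown in the entry. The definition of Y is truncated in the listing, so it is not reproduced here.

```r
set.seed(210514)

# Assumed choice: the prompt allows any N >= 6.
N <- 6

# 10,000 draws from a continuous uniform distribution on [1, N].
X <- runif(10000, min = 1, max = N)

summary(X)
```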
DATA 624 Proj 1
xlsx_path <- "raw_data.xlsx" raw <- readxl::read_xlsx(xlsx_path) [This file examines variables S05Var03, S06Var05, and S06Var07.] Exploratory Visualization This section contains initial visualizations of S05Var03, S06Var05, and S06Var07. These visualizations provide the basis for initial commentary and suggest a roadmap for the analysis that co...
7860 sym R (11918 sym/48 pcs) 22 img
DATA624_proj1
xlsx_path <- "raw_data.xlsx" raw <- readxl::read_xlsx(xlsx_path) [This file examines variables S05Var03, S06Var05, and S06Var07.] Exploratory Visualization This section contains initial visualizations of S05Var03, S06Var05, and S06Var07. These visualizations provide the basis for initial commentary and suggest a roadmap for the analysis that co...
7843 sym R (11433 sym/48 pcs) 22 img