Publications by Daniel Moscoe

DATA 607: Tidying and Transforming Data

06.03.2021

Introduction: Data is tidy if each variable occupies one column, each observation occupies one row, and each cell contains exactly one value. Working with tidy data is advantageous because tidy data is more amenable to transformation and analysis. Many widely used statistical packages presume that input data will be more or less tidy. Combining a...

4336 sym R (3450 sym/14 pcs) 1 img
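
The three tidy-data conditions listed in the entry above can be illustrated with a small reshape. This is a minimal sketch, not the author's code: the table `scores_wide` and its column names are hypothetical, and it assumes the tidyr package is installed.

```r
library(tidyr)

# Hypothetical untidy table: the "test" variable is spread across two columns.
scores_wide <- data.frame(
  student = c("A", "B"),
  test1 = c(90, 75),
  test2 = c(85, 80)
)

# Reshape so that each row is one (student, test) observation
# and each column holds exactly one variable.
scores_tidy <- pivot_longer(
  scores_wide,
  cols = c(test1, test2),
  names_to = "test",
  values_to = "score"
)

print(scores_tidy)
```

After the reshape, `test` and `score` are ordinary columns, so grouping or filtering by test no longer requires naming individual columns.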

DATA607 Wk7: HTML, JSON, XML

19.03.2021

Introduction: Importing data from a variety of formats is an essential skill in R. In this assignment, I create a simple table of my favorite statistics textbooks in three formats (HTML, JSON, XML). Then I import each format into R and store it as a dataframe. I use the packages below. library(tidyverse) library(XML) library(xml2)...

1431 sym R (3145 sym/13 pcs)
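
For the XML case, the import step described in the entry above might look roughly like the following. This is a sketch under stated assumptions: the element names (`books`, `book`, `title`, `year`) and the placeholder values are hypothetical, and it uses only the xml2 package, which may differ from how the original assignment parses the file.

```r
library(xml2)

# Hypothetical XML source; the real assignment reads its own file.
xml_string <- "<books>
  <book><title>Textbook A</title><year>2001</year></book>
  <book><title>Textbook B</title><year>2015</year></book>
</books>"

doc <- read_xml(xml_string)

# Select every <book> node, then pull one child value per node.
book_nodes <- xml_find_all(doc, "//book")
books <- data.frame(
  title = xml_text(xml_find_first(book_nodes, "./title")),
  year = as.integer(xml_text(xml_find_first(book_nodes, "./year")))
)

print(books)
```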

Lubridate Vignette

07.04.2021

Introduction: Lubridate contains functions that make it easier to work with dates and times. In this vignette we’ll use Lubridate to assist with the following tasks: create date/time objects from strings; create date/time objects from individual components; use accessors to get/set individual components of a date/time object; use durations to p...

4385 sym R (5202 sym/34 pcs) 1 img
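
The tasks named in the vignette summary above can be sketched in a few lines. The dates and variable names here are illustrative assumptions, not taken from the vignette, and the example assumes the lubridate package is installed.

```r
library(lubridate)

# Create a date from a string.
d <- ymd("2021-04-07")

# Create a date-time from individual components (UTC by default).
dt <- make_datetime(year = 2021, month = 4, day = 7, hour = 9, min = 30)

# Accessors read a component of a date...
m <- month(d)

# ...and the same accessors can set a component in place.
year(d) <- 2022

# A duration is a fixed span of seconds; adding one day shifts the date-time.
later <- dt + ddays(1)

print(d)
print(later)
```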

Using a Web API

10.04.2021

Introduction: In this assignment, I demonstrate a pair of functions that together query an NYT API for a specified New York Times bestseller list, collect the data in JSON format, and return the contents of the list as a dataframe. Setup: The jsonlite package pro...

2056 sym R (1089 sym/5 pcs) 2 tbl
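
The JSON-to-dataframe step mentioned in the entry above can be sketched with jsonlite. This is only an illustration: the JSON payload below is a made-up stand-in, not the actual NYT API response shape, and no network request is made.

```r
library(jsonlite)

# Hypothetical payload; the real API response will be structured differently.
json_text <- '{"results": {"books": [
  {"rank": 1, "title": "BOOK ONE", "author": "Author A"},
  {"rank": 2, "title": "BOOK TWO", "author": "Author B"}
]}}'

# fromJSON simplifies an array of similar objects into a data frame by default.
parsed <- fromJSON(json_text)
books <- parsed$results$books

print(books)
```

In a live version, the string would come from an HTTP request to the API rather than a literal, and the path into the parsed list would follow the real response structure.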

DATA 605 Wk 12 discussion

20.04.2021

library(tidyverse) ## -- Attaching packages --------------------------------------- tidyverse 1.3.0 -- ## v ggplot2 3.3.3 v purrr 0.3.4 ## v tibble 3.1.0 v dplyr 1.0.4 ## v tidyr 1.1.2 v stringr 1.4.0 ## v readr 1.4.0 v forcats 0.5.1 ## Warning: package 'tibble' was built under R version 4.0.4 ## -- Conflicts ---------...

3491 sym R (6136 sym/19 pcs) 3 img

DATA 607 Recommender Systems Discussion

20.04.2021

Introduction: Amazon.com uses recommender systems to customize its store for every user. “It’s as if you walked into a store and the shelves started rearranging themselves, with what you might want moving to the front, and what you’re unlikely to be interested in shuffling further away” (Two Decades of Recommender Systems at Amazon.com)....

7907 sym

DATA 607 Final Project: Churn Analysis

11.05.2021

Introduction: Churn analysis is a fundamental problem in data science. The investigator obtains information on customer behavior and attributes and uses it to predict whether the customer will terminate a contract. In this study, I conduct a churn analysis based on simulated cell phone customer data from a Kaggle competition,...

22161 sym R (31786 sym/82 pcs) 16 img

DATA 605 Final Exam

16.05.2021

library(GGally); library(MASS); library(modelr); library(tidyverse); library(stats); set.seed(210514)

Using R, generate a random variable \(X\) that has \(10{,}000\) random uniform numbers from \(1\) to \(N\), where \(N\) can be any number of your choosing greater than or equal to \(6\). Then generate a random variable \(Y\) that has \(10{,}000\) ra...

12558 sym R (40607 sym/86 pcs) 8 img
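
The first step of the exam prompt above, generating \(X\), can be sketched in base R. Two assumptions are made here: the uniform distribution is taken to be continuous, and \(N = 6\) is chosen as the smallest value the prompt allows. The definition of \(Y\) is cut off in the excerpt, so it is not reproduced.

```r
# Seed taken from the excerpt so the draw is reproducible.
set.seed(210514)

# Assumption: N = 6, the smallest value permitted by the prompt.
N <- 6

# 10,000 draws from a continuous uniform distribution on [1, N].
X <- runif(10000, min = 1, max = N)

summary(X)
```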

DATA 624 Proj 1

27.06.2021

xlsx_path <- "raw_data.xlsx"; raw <- readxl::read_xlsx(xlsx_path)

[This file examines variables S05Var03, S06Var05, and S06Var07.]

Exploratory Visualization: This section contains initial visualizations of S05Var03, S06Var05, and S06Var07. These visualizations provide the basis for initial commentary and suggest a roadmap for the analysis that co...

7860 sym R (11918 sym/48 pcs) 22 img

DATA624_proj1

25.06.2021

xlsx_path <- "raw_data.xlsx"; raw <- readxl::read_xlsx(xlsx_path)

[This file examines variables S05Var03, S06Var05, and S06Var07.]

Exploratory Visualization: This section contains initial visualizations of S05Var03, S06Var05, and S06Var07. These visualizations provide the basis for initial commentary and suggest a roadmap for the analysis that co...

7843 sym R (11433 sym/48 pcs) 22 img