Publications by atmathew

Weekly R-Tips: Visualizing Predictions

04.02.2016

Let’s say that we estimated a linear regression model on time series data with lagged predictors. The goal is to estimate sales as a function of inventory, search volume, and media spend from two months ago. After using the lm function to perform linear regression, we predict sales using values from two months ago.
frmla <- sales ~ inventory + sea...
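
A minimal sketch of the setup this excerpt describes. It uses simulated data because the post's dataset is not shown here. The column names (inventory, search, media, the _l2 suffix) and the helper lag2() are hypothetical. Each predictor is shifted down two rows, so the model relates this month's sales to values from two months earlier. The last two observed months then supply the inputs for the next two forecasts.

```r
# Sketch only: simulated monthly data with hypothetical column names.
set.seed(42)
n <- 36
df <- data.frame(
  inventory = rnorm(n, mean = 100, sd = 10),
  search    = rnorm(n, mean = 50, sd = 5),
  media     = rnorm(n, mean = 20, sd = 3)
)

# Hypothetical helper: shift a series down by two positions (two months),
# padding the start with NA.
lag2 <- function(x) c(NA, NA, head(x, -2))

df$inventory_l2 <- lag2(df$inventory)
df$search_l2    <- lag2(df$search)
df$media_l2     <- lag2(df$media)

# Simulated outcome that depends on the two-month-old predictors.
df$sales <- 5 + 0.8 * df$inventory_l2 + 1.2 * df$search_l2 +
  2.0 * df$media_l2 + rnorm(n)

# lm() drops the first two rows (NA lags) under its default na.action.
fit <- lm(sales ~ inventory_l2 + search_l2 + media_l2, data = df)

# Months n+1 and n+2 use predictor values observed in months n-1 and n.
newdata <- data.frame(
  inventory_l2 = df$inventory[(n - 1):n],
  search_l2    = df$search[(n - 1):n],
  media_l2     = df$media[(n - 1):n]
)
pred <- predict(fit, newdata = newdata)
```

The two-row shift is what makes out-of-sample prediction possible. The inputs needed for the next two months are already observed when the forecast is made.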

R Programming Notes

17.02.2016

I’ve been on a note-taking binge recently. This post covers a variety of topics related to programming in R. The contents were gathered from many sources and structured as a reference guide to a number of useful R functions.
DO.CALL
The do.call function executes a function call o...
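
A small illustration of what do.call() does, using made-up data frames. It calls a function with the elements of a list as its arguments. This is handy when the number of arguments is only known at run time, for example when binding a list of data frames together.

```r
# A list of data frames, e.g. one per file or per group (made-up data).
parts <- list(
  data.frame(id = 1:2, value = c(10, 20)),
  data.frame(id = 3:4, value = c(30, 40)),
  data.frame(id = 5, value = 50)
)

# Equivalent to rbind(parts[[1]], parts[[2]], parts[[3]]),
# but works for a list of any length.
combined <- do.call(rbind, parts)

# Named list elements become named arguments:
# this is the same call as paste("a", "b", sep = "-").
label <- do.call(paste, list("a", "b", sep = "-"))
```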

Batch Forecasting in R

29.02.2016

Given a data frame with multiple columns that contain time series data, let’s say that we want to run an automatic forecasting algorithm on a number of columns. Furthermore, we want to train the model on a particular number of observations and assess how well it forecasts future values. Based upon those testing procedures, we ...
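
A minimal sketch of this batch workflow, assuming simulated columns. It swaps in stats::ar(), which selects an autoregressive order by AIC, as a stand-in for whatever automatic forecasting algorithm the post uses. The helper forecast_column() and the 48/12 train/test split are illustrative choices. Each column is trained on its first n_train observations and scored by mean absolute error on the rest.

```r
# Simulated data frame with three time series columns (hypothetical).
set.seed(1)
n <- 60
dat <- data.frame(
  s1 = cumsum(rnorm(n)),                                  # random walk
  s2 = as.numeric(arima.sim(list(ar = 0.7), n = n)),      # AR(1) process
  s3 = 10 + sin(seq_len(n) / 3) + rnorm(n, sd = 0.2)      # smooth cycle plus noise
)
n_train <- 48
h <- n - n_train  # forecast horizon = size of the holdout

# Hypothetical helper: fit on the training window, forecast h steps ahead,
# and return the selected AR order and the holdout mean absolute error.
forecast_column <- function(x, n_train, h) {
  train <- x[seq_len(n_train)]
  test  <- x[(n_train + 1):(n_train + h)]
  fit   <- ar(train, aic = TRUE)                          # order chosen by AIC
  pred  <- predict(fit, newdata = train, n.ahead = h)$pred
  c(order = fit$order, mae = mean(abs(test - as.numeric(pred))))
}

# One row per column: selected order and test-set MAE.
results <- t(sapply(dat, forecast_column, n_train = n_train, h = h))
```

The results matrix can then be used to compare accuracy across columns before committing to final forecasts.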

Introduction to the RMS Package

04.07.2016

The rms package offers a variety of tools to build and evaluate regression models in R. Originally named ‘Design’, the package accompanies the book “Regression Modeling Strategies” by Frank Harrell, which is essential reading for anyone who works in the ‘data science’ space. Over the past year or so, I have transitioned my personal mo...

Statistical Reading Rainbow

16.10.2016

For those of us who received statistical training outside of statistics departments, that training often emphasized procedures over principles. This meant that we learned about various statistical techniques and how to perform analysis in a particular statistical software package, but glossed over the mechanisms and mathematical statistics underlying these pract...

R Programming Notes – Part 2

17.07.2017

In an older post, I discussed a number of functions that are useful for programming in R. I wanted to expand on that topic by covering other useful functions, packages, and tools. Over the past year, I have been working as an R programmer, and these are some of the lessons that have become fundamental to my work.
IS TRUE and IS FALS...
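
A short base R illustration of why isTRUE() and isFALSE() are useful in conditions. They always return a single TRUE or FALSE, even when the input is NA, NULL, or longer than one. In those cases a bare if () would error or misbehave. The options list below is hypothetical, and isFALSE() requires R 3.5.0 or later.

```r
checks <- c(
  single_true  = isTRUE(TRUE),           # TRUE
  two_trues    = isTRUE(c(TRUE, TRUE)),  # FALSE: not a single value
  missing      = isTRUE(NA),             # FALSE: NA is not TRUE
  null_value   = isTRUE(NULL),           # FALSE: length zero
  single_false = isFALSE(FALSE),         # TRUE
  missing_f    = isFALSE(NA)             # FALSE: NA is not FALSE either
)

# Hypothetical options list with no "debug" entry: opts$debug is NULL,
# so if (opts$debug) would error, while isTRUE() simply returns FALSE.
opts <- list(verbose = TRUE)
debug_on <- isTRUE(opts$debug)
```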

Data.Table by Example – Part 1

26.09.2017

For many years, I actively avoided the data.table package and preferred to use the tools available in either base R or dplyr for data aggregation and exploration. However, over the past year, I have come to realize that this was a mistake. Data tables are incredible and provide R users with a syntactically concise and efficient data structure...

Data.Table by Example – Part 2

26.09.2017

In part one, I provided an initial walkthrough of some nice features that are available within the data.table package. In particular, we saw how to filter data and count rows by date.
dat = fread("rows.csv")
names(dat) <- gsub(" ", "_", names(dat))
dat[1:3]
Let us now add a few columns to our dataset on reported crimes in the ci...

Data.Table by Example – Part 3

30.09.2017

For this final post, I will cover some advanced topics and discuss how to use data tables within user-defined functions. Once again, let’s use the Chicago crime data.
dat = fread("rows.csv")
names(dat) <- gsub(" ", "_", names(dat))
dat[, c("value1", "value2", "value3") := sample(1:50, nrow(dat), replace=TRUE)]
dat[1:3]
Let’s start by s...

Packages for Getting Started with Time Series Analysis in R

18.02.2018

A. Motivation
During the recent RStudio Conference, an attendee asked the panel about the tidyverse's lack of support for time series data. Having spent the majority of my career on time series problems, I found this somewhat surprising because R already has a great suite of tools for visualizing, manipulating,...