Publications by John Mount

Some R Guides: tidyverse and data.table Versions

10.10.2018

Saghir Bashir of ilustat recently shared a nice getting-started guide to R and the tidyverse. In addition, they were generous enough to link to Dirk Eddelbuettel’s later adaptation of the guide to use data.table. This type of cooperation and user choice is what keeps the R community vital. Please encourage it. (Heck, please insist on it!)

739 sym 4 img

Piping into ggplot2

13.10.2018

In our wrapr pipe RJournal article we used piping into ggplot2 layers/geoms/items as an example. Being able to use the same pipe operator for data processing steps and for ggplot2 layering is a question that comes up from time to time (for example: Why can’t ggplot2 use %>%?). In fact the primary ggplot2 package author wishes that magrittr pipi...

5465 sym R (2051 sym/13 pcs) 4 img

Quasiquotation in R via bquote()

16.10.2018

In August of 2003 Thomas Lumley added bquote() to R 1.8.1. This gave R and R users an explicit Lisp-style quasiquotation capability. bquote() and quasiquotation are actually quite powerful. Professor Thomas Lumley should get, and should continue to receive, a lot of credit and thanks for introducing the concept into R. In fact bquote() is already...

7150 sym R (2990 sym/11 pcs)
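
As a minimal illustration of the quasiquotation idea in the excerpt above, the sketch below uses only base R's bquote(). The variable names (x, y, expr, result) are chosen here for illustration and are not taken from the post.

```r
# Quasiquotation with base R's bquote():
# everything is quoted except the parts wrapped in .(), which are
# evaluated immediately and spliced into the expression.
x <- 5
expr <- bquote(y + .(x))   # x is replaced by its current value, y stays a symbol
print(expr)

# The resulting expression can be evaluated later, once y exists.
y <- 10
result <- eval(expr)
print(result)
```

Changing x after the call to bquote() does not change expr, because the value was substituted when the expression was built.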

Designing Transforms for Data Reshaping with cdata

25.10.2018

Authors: John Mount and Nina Zumel, 2018-10-25. As a follow-up to our previous post, this post goes a bit deeper into reasoning about data transforms using the cdata package. The cdata package demonstrates the “coordinatized data” theory and includes an implementation of the “fluid data” methodology for general data re-shaping. cdata adher...

5879 sym R (1339 sym/3 pcs) 12 img

Conway’s Game of Life in R: Or On the Importance of Vectorizing Your R Code

28.10.2018

R is an interpreted programming language with vectorized data structures. This means a single R command can ask for very many arithmetic operations to be performed. This also means R computation can be fast. We will show an example of this using Conway’s Game of Life. Conway’s Game of Life is one of the most interesting examples of cellul...

2723 sym R (1392 sym/2 pcs) 4 img
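
To make the vectorization point in the excerpt above concrete, here is a rough base R sketch of one Game of Life generation computed with whole-matrix operations. It is not the code from the post: the function name life_step, the zero-padded (dead) border, and the blinker test pattern are all assumptions made for this example.

```r
# One generation of Conway's Game of Life using whole-matrix arithmetic.
# Assumption of this sketch: cells beyond the board edge are always dead.
life_step <- function(board) {
  nr <- nrow(board)
  nc <- ncol(board)
  # Pad with a border of dead cells so shifted views stay in bounds.
  padded <- matrix(0L, nrow = nr + 2L, ncol = nc + 2L)
  padded[2:(nr + 1L), 2:(nc + 1L)] <- board
  # Add up the eight shifted copies of the board to count live neighbors.
  neighbors <- matrix(0L, nrow = nr, ncol = nc)
  for (dr in -1:1) {
    for (dc in -1:1) {
      if (dr != 0L || dc != 0L) {
        neighbors <- neighbors +
          padded[(2:(nr + 1L)) + dr, (2:(nc + 1L)) + dc, drop = FALSE]
      }
    }
  }
  # Birth on exactly 3 neighbors; survival on 2 or 3 neighbors.
  alive <- (neighbors == 3L) | (board == 1L & neighbors == 2L)
  matrix(as.integer(alive), nrow = nr, ncol = nc)
}

# A "blinker": a bar of three live cells that flips orientation each step.
blinker <- matrix(0L, nrow = 5, ncol = 5)
blinker[3, 2:4] <- 1L
next_gen <- life_step(blinker)
print(next_gen)
```

The loop runs only eight times regardless of board size; each iteration is a single matrix addition over all cells, which is where the speed comes from.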

Use Pseudo-Aggregators to Add Safety Checks to Your Data-Wrangling Workflow

30.10.2018

One of the concepts we teach in both Practical Data Science with R and in our theory of data shaping is the importance of identifying the roles of columns in your data. For example, to think in terms of multi-row records it helps to identify: Which columns are keys (together identify rows or records). Which columns are data/payload (are consider...

3890 sym R (2878 sym/28 pcs)

The blocks and rows theory of data shaping

01.11.2018

We have our latest note on the theory of data wrangling up here. It discusses the roles of “block records” and “row records” in the cdata data transform tool. With that and the theory of how to design transforms, we think we have a pretty complete description of the system.

679 sym 2 img

coalesce with wrapr

03.11.2018

coalesce is a classic, useful SQL operator that picks the first non-NULL value in a sequence of values. We thought we would share a nice version of it for picking non-NA values in R, with convenient infix operator notation: wrapr::coalesce(). Here is a short example of it in action: library("wrapr") NA %?% 0 # [1] 0 A more substantial application is the f...

726 sym R (806 sym/2 pcs)
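
For readers without wrapr installed, the coalescing idea in the excerpt above can be approximated in base R. The helper coalesce_na and the operator %or% below are hypothetical names invented for this sketch; they are not wrapr's implementation and may differ from it in edge cases.

```r
# Hypothetical base R helper (not wrapr's code): element-wise coalesce.
# Keeps each value of x unless it is NA, in which case it falls back to y.
coalesce_na <- function(x, y) {
  ifelse(is.na(x), y, x)
}

# A hypothetical infix spelling of the same helper.
`%or%` <- coalesce_na

single <- NA %or% 0
filled <- c(1, NA, 3) %or% 0
print(single)
print(filled)
```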

R tip: Make Your Results Clear with sigr

04.11.2018

R is designed to make working with statistical models fast, succinct, and reliable. For instance building a model is a one-liner: model <- lm(Petal.Length ~ Sepal.Length, data = iris) And producing a detailed diagnostic summary of the model is also a one-liner: summary(model) # Call: # lm(formula = Petal.Length ~ Sepal.Length, data = iris) # #...

1153 sym R (841 sym/4 pcs)

How to de-Bias Standard Deviation Estimates

12.11.2018

This note is about attempting to remove the bias brought in by using sample standard deviation estimates to estimate an unknown true standard deviation of a population. We establish there is a bias, concentrate on why it is not important to remove it for reasonably sized samples, and (despite that) give a very complete bias management solution. ...

8536 sym R (493 sym/3 pcs) 16 img
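
The bias described in the excerpt above can be seen in a small simulation. The sketch below is not the method from the post: it assumes normally distributed data, a sample size of 5, and uses the standard c4(n) correction factor for normal samples, all of which are choices made here for illustration.

```r
# Simulation sketch (assumes normal data; not the post's own method).
set.seed(2018)
n <- 5        # small sample size, where the bias is easiest to see
sigma <- 1    # true population standard deviation

# Average the sample standard deviation over many repeated samples.
sds <- replicate(20000, sd(rnorm(n, mean = 0, sd = sigma)))
mean_sd <- mean(sds)

# c4(n): expected value of sd() divided by sigma for normal samples of size n.
c4 <- sqrt(2 / (n - 1)) * gamma(n / 2) / gamma((n - 1) / 2)
corrected <- mean_sd / c4

print(c(mean_sd = mean_sd, c4 = c4, corrected = corrected))
```

The factor c4 approaches 1 as n grows, which is consistent with the excerpt's point that the correction matters little for reasonably sized samples.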