Publications by John Mount

dplyr in Context

06.05.2017

Introduction: Beginning R users often come to the false impression that the popular packages dplyr and tidyr are both all of R and sui generis inventions (in that they might be unprecedented and there might be no other reasonable way to get the same effects in R). These packages and their conventions are high-value, but they are results of evolution ...


On indexing operators and composition

18.05.2017

In this article I will discuss array indexing, operators, and composition in depth. If you work through this article you should end up with a very deep understanding of array indexing and the deep interpretation available when we realize indexing is an instance of function composition (or an example of permutation groups or semigroups: some very...


New series: R and big data (concentrating on Spark and sparklyr)

20.05.2017

Win-Vector LLC has recently been teaching how to use R with big data through Spark and sparklyr. We have also been helping clients become productive on R/Spark infrastructure through direct consulting and bespoke training. I thought this would be a good time to talk about the power of working with big data using R, share some hints, and even ad...


Managing Spark data handles in R

26.05.2017

When working with big data with R (say, using Spark and sparklyr) we have found it very convenient to keep data handles in a neat list or data_frame. Please read on for our handy hints on keeping your data handles neat. When using R to work over a big data system (such as Spark) much of your work is over “data handles” and not actual data (...


Summarizing big data in R

30.05.2017

Our next “R and big data tip” is: summarizing big data. We always say “if you are not looking at the data, you are not doing science”, and for big data you are very dependent on summaries (as you can’t actually look at everything). Simple question: is there an easy way to summarize big data in R? The answer is: yes, but we suggest you u...


In defense of wrapr::let()

01.06.2017

Saw this the other day: In defense of wrapr::let() (originally part of replyr, and still re-exported by that package) I would say: let() was deliberately designed for a single real-world use case: working with data when you don’t know the column names when you are writing the code (i.e., the column names will come later in a variable). We ca...


R summary() got better!

04.06.2017

Here is a really nice feature found in the current 3.4.0 version of R: summary() has become a lot more reasonable.
    summary(15555)
    #    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    #   15555   15555   15555   15555   15555   15555
Please read on for some background. In older versions of R (say R 3.3.1) the above code gave the following undesi...


There is usually more than one way in R

05.06.2017

Python has a fairly famous design principle (from “PEP 20 -- The Zen of Python”): There should be one-- and preferably only one --obvious way to do it. Frankly in R (especially once you add many packages) there is usually more than one way. As an example we will talk about the common R functions: str(), head(), and the tibble package’s...


More on safe substitution in R

07.06.2017

Let’s worry a bit about substitution in R. Substitution is very powerful, which means it can be both used and misused. However, that does not mean every use is unsafe or a mistake. From Advanced R: substitute: We can confirm the above code performs no substitution:
    a <- 1
    b <- 2
    substitute(a + b + z)
    ## a + b + z
And it appears the effect...


Campaign Response Testing no longer published on Udemy

08.06.2017

Our free video course Campaign Response Testing is no longer published on Udemy. It remains available for free on YouTube with all source code available from GitHub. I’ll try to correct bad links as I find them. Please read on for the reasons. Udemy recently unilaterally instituted a new policy on free courses: “When a free course has a ...
