Publications by John Mount

R Tip: Force Named Arguments

22.02.2018

R tip: force the use of named arguments when designing function signatures. R’s named function argument binding is a great aid in writing correct programs. It is a good idea, if practical, to force optional arguments to only be usable by name. To do this, declare the additional arguments after “...” and enforce that none got lost in the ...

2197 sym R (475 sym/1 pcs)
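
As a minimal sketch of the idea in the excerpt above: optional arguments are declared after `...`, and the function refuses anything that ends up in `...`. The function name `scale_values` and its arguments are hypothetical, chosen only for illustration; the original post may use a different checking helper.

```r
# Hypothetical example function: the name and arguments are illustrative,
# not taken from the original post.
scale_values <- function(x, ..., center = TRUE, scale = 1) {
  # Anything captured by `...` is an unexpected positional or mis-named
  # argument, so stop instead of silently ignoring it.
  if (length(list(...)) > 0) {
    stop("scale_values: unexpected arguments; pass optional arguments by name")
  }
  if (center) {
    x <- x - mean(x)
  }
  x * scale
}

# Optional argument bound by name: works.
ok <- scale_values(c(1, 2, 3), scale = 2)

# Optional argument passed by position: lands in `...` and is rejected.
res <- tryCatch(
  scale_values(c(1, 2, 3), FALSE),
  error = function(e) conditionMessage(e)
)
```

Because `center` and `scale` come after `...`, R can only bind them by exact name, so a misspelled name is caught by the same check.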

Is R base::subset() really that bad?

23.02.2018

Is R base::subset() really that bad? Notes discussing subset() often refer to the following text (from help(subset), referred to in examples: 1, 2): Warning This is a convenience function intended for use interactively. For programming it is better to use the standard sub-setting functions like [, and in particular the non-standard evaluation o...

5466 sym R (1835 sym/3 pcs) 2 img
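
To make the contrast in the help(subset) warning concrete, here is a small sketch comparing the interactive style with standard sub-setting using `[`. The data frame `d` is made up for illustration.

```r
# Illustrative data; not from the original post.
d <- data.frame(x = c(1, 2, 3, 4), y = c("a", "b", "c", "d"))

# Interactive style: the condition `x > 2` is evaluated inside `d`
# (non-standard evaluation).
s1 <- subset(d, x > 2)

# Programmatic style: standard evaluation with `[`, naming the column
# explicitly and keeping the result a data.frame.
s2 <- d[d$x > 2, , drop = FALSE]
```

Both calls select the same two rows here; the difference is in how the column name `x` is resolved, which matters once the code moves inside a function.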

Wanted: cdata Test Pilots

25.02.2018

I need a few volunteers to “test pilot” the development version of the R package cdata, please. [Image: Jacqueline Cochran. At the time of her death, no other pilot held more speed, distance, or altitude records in aviation history than Cochran.] Our cdata package has an upcoming new feature called “build_frame()” that allows for the very...

2141 sym R (513 sym/4 pcs) 2 img

R Tip: Use drop = FALSE with data.frames

27.02.2018

Another R tip. Get in the habit of using drop = FALSE when indexing (using [ , ] on) data.frames. [Image: Prince Rupert’s drops (Wikimedia Commons)] In R, single column data.frames are often converted to vectors when manipulated. For example:

    d <- data.frame(x = seq_len(3))
    print(d)
    #>   x
    #> 1 1
    #> 2 2
    #> 3 3

    # not a data frame!
    d[order(-d$x)...

2416 sym R (261 sym/3 pcs) 2 img
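
A short sketch of the tip above, using the same single-column data.frame as the excerpt. The variable names `v` and `d2` are illustrative.

```r
d <- data.frame(x = seq_len(3))

# Without drop = FALSE the single column collapses to a plain vector.
v <- d[order(-d$x), ]

# With drop = FALSE the result stays a data.frame.
d2 <- d[order(-d$x), , drop = FALSE]

is.data.frame(v)   # FALSE
is.data.frame(d2)  # TRUE
```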

R Tip: Make Arguments Explicit in magrittr/dplyr Pipelines

01.03.2018

I think this is the R Tip that is going to be the most controversial yet. Its potential pitfalls include: it is a style prescription (which makes it different than and less immediately useful than something of the nature of R Tip: Force Named Arguments), and it is heterodox (this is not how magrittr/dplyr is taught by the original authors, and n...

2362 sym R (207 sym/2 pcs)

Speaking on New Tools for R at Big Data Scale

03.03.2018

I would like to thank LinkedIn for letting me speak with some of their data scientists and analysts. [Image: John Mount discussing rquery SQL generation at LinkedIn.] If you have a group using R at database or Spark scale, please reach out (jmount at win-vector.com). We (Win-Vector LLC) have some great new tools I’d love to speak on and share. I’...

961 sym 2 img

R Tip: Get Out of the Habit of Calling View() Directly

04.03.2018

R tip: get out of the habit of calling View() directly. View() only works correctly in interactive environments, not currently in RMarkdown contexts. It is better to call something that safely dispatches to View() in an interactive session and to something else in a non-interactive session. The following code will work interactive...

976 sym R (792 sym/1 pcs)
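
One possible shape for such a dispatching function is sketched below. The helper name `show_table` is hypothetical, and printing the first rows is an assumed fallback; the original post's code is truncated above and may differ.

```r
# Hypothetical helper: the name and the non-interactive fallback are
# assumptions made for this sketch.
show_table <- function(d, ...) {
  if (interactive()) {
    # Interactive session: open the data viewer.
    View(d, ...)
  } else {
    # Non-interactive session (for example, rendering a document):
    # print the first few rows instead.
    print(utils::head(d))
  }
  invisible(d)
}

show_table(data.frame(x = 1:3))
```

Returning the input invisibly keeps the helper usable in the middle of a longer expression.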

R Tip: Use vector(mode = "list") to Pre-Allocate Lists

06.03.2018

Another R tip. Use vector(mode = "list") to pre-allocate lists.

    result <- vector(mode = "list", 3)
    print(result)
    #> [[1]]
    #> NULL
    #>
    #> [[2]]
    #> NULL
    #>
    #> [[3]]
    #> NULL

The above used to be critical for writing performant R code (R seems to have greatly improved incremental list growth over the years). It remains a convenient thing to know....

1422 sym R (227 sym/2 pcs)
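
A minimal sketch of how a pre-allocated list is typically filled in a loop. The computation (`i^2`) is a stand-in chosen for illustration.

```r
n <- 3

# Pre-allocate a list of n NULL slots.
result <- vector(mode = "list", length = n)

# Fill each slot by index instead of growing the list one element at a time.
for (i in seq_len(n)) {
  result[[i]] <- i^2
}
```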

R Tip: Introduce Indices to Avoid for() Class Loss Issues

08.03.2018

Here is an R tip. Use loop indices to avoid for()-loops damaging classes. Below is an R annoyance that occurs again and again: vectors lose class attributes when you iterate over them in a for()-loop.

    d <- c(Sys.time(), Sys.time())
    print(d)
    #> [1] "2018-02-18 10:16:16 PST" "2018-02-18 10:16:16 PST"
    for(di in d) {
      print(di)
    }
    #> [1] 1518977777...

1829 sym R (287 sym/2 pcs)
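
The index-based workaround named in the tip can be sketched as follows. Fixed timestamps are used here instead of Sys.time() so the example is reproducible; the variable names `lost` and `kept` are illustrative.

```r
# Fixed timestamps (an assumption for reproducibility).
d <- as.POSIXct(c("2018-02-18 10:16:16", "2018-02-18 10:16:17"), tz = "UTC")

# Iterating over the values directly strips the class attribute.
lost <- character(0)
for (di in d) {
  lost <- c(lost, class(di))
}

# Iterating over indices and extracting with [[ ]] keeps the class.
kept <- logical(0)
for (i in seq_along(d)) {
  di <- d[[i]]
  kept <- c(kept, inherits(di, "POSIXct"))
}
```

After the first loop `lost` holds "numeric" for each element, while the second loop sees genuine POSIXct values.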

R Tip: Use the vtreat Package For Data Preparation

11.03.2018

If you are working with predictive modeling or machine learning in R this is the R tip that is going to save you the most time and deliver the biggest improvement in your results. R Tip: Use the vtreat package for data preparation in predictive analytics and machine learning projects. When attempting predictive modeling with real-world data you ...

2608 sym 6 img