Publications by John Mount
R annoyances
Readers returning to our blog will know that Win-Vector LLC is fairly “pro-R.” You can take that to mean “in favor of R” or “professionally using R” (both statements are true). Some days we really don’t feel that way. Consider the following snippet of R code where we create a list with a single element named “x” that refers ...
5289 sym R (560 sym/8 pcs)
Must Have Software
Having worked with Unix (BSD, HPUX, IRIX, Linux and OSX) and Windows (NT4, 2000, XP, Vista and 7) for quite a while, I have seen a lot of different software tools. I would like to quickly exhibit my “must have” list. These are the packages that I find to be the single “must have offerings” in a number of categories. I have avoided some cat...
3519 sym
Learn Logistic Regression (and beyond)
One of the current best tools in the machine learning toolbox is the 1930s statistical technique called logistic regression. We explain how to add professional quality logistic regression to your analytic repertoire and describe a bit beyond that. A statistical analyst working on data tends to deliberately start simple and move cautiously to more co...
16537 sym R (2776 sym/8 pcs) 12 img
The cranky guide to trying R packages
This is a tutorial on how to try out a new package in R. The summary is: expect errors, search out errors and don’t start with the built-in examples or real data. Suppose you want to try out a novel statistical technique? A good fraction of the time R is your best bet for a first trial. Take as an example general additive models (“Generali...
8319 sym R (2769 sym/11 pcs) 8 img
Your Data is Never the Right Shape
One of the recurring frustrations in data analytics is that your data is never in the right shape. Worst case: you are not aware of this and every step you attempt is more expensive, less reliable and less informative than you would want. Best case: you notice this and have the tools to reshape your data. There is no final “right shape.” ...
12822 sym R (1529 sym/6 pcs) 8 img
Programmers Should Know R
Programmers should definitely know how to use R. I don’t mean they should switch from their current language to R, but they should think of R as a handy tool during development. Again and again I find myself working with Java code like the following. public class SomeBigProject1 { public static double logStirlingApproximation(final int n) { ...
4409 sym R (1731 sym/9 pcs) 6 img
Win-Vector starts submitting content to r-bloggers.com
We have been consistently impressed by and enjoyed the wealth of R wisdom available on the R-bloggers aggregation site. Therefore Win-Vector LLC is granting the right to reformat and redistribute (with attribution and link) our blog’s R content in the R-bloggers site and feeds. We hope to see our R content shared through this network. Related...
790 sym
Why I don’t like Dynamic Typing
A lot of people consider the static typing found in languages such as C, C++, ML, Java and Scala as needless hairshirtism. They consider the dynamic typing of languages like Lisp, Scheme, Perl, Ruby and Python as a critical advantage (ignoring other features of these languages and other efforts at generic programming such as the STL). I strongly...
9314 sym R (1028 sym/5 pcs)
Modeling Trick: the Signed Pseudo Logarithm
Much of the data that the analyst uses exhibits extraordinary range. For example: incomes, company sizes, popularity of books and any “winner takes all process” (see: Living in A Lognormal World). Tukey recommended the logarithm as an important “stabilizing transform” (a transform that brings data into a more usable form prior to gener...
7757 sym 4 img
How to remember point shape codes in R
I suspect I am not unique in not being able to remember how to control the point shapes in R. Part of this is a documentation problem: no package ever seems to write the shapes down. All packages just use the “usual set” that derives from S-Plus and was carried through base-graphics, to grid, lattice and ggplot2. The quickest way out of th...
1269 sym 4 img