Publications by John Mount

Separating Code from Presentation in Jupyter Notebooks

30.04.2022

One of the great conveniences of performing a data science style analysis using Jupyter is that Jupyter notebooks are literate containers that combine code, text, results, and graphs. This is also one of the pain points in working with Jupyter notebooks with partners or with source control. That is: Jupyter notebooks are JSON (which rapidly becom...

3921 sym R (249 sym/4 pcs) 2 img

Survive R

29.09.2009

New PDF slides version (presented at the Bay Area R Users Meetup October 13, 2009). We at Win-Vector LLC appear to like R a bit more than some of our, perhaps wiser, colleagues (see: Choose your weapon: Matlab, R or something else? and R and data). While we do like R (see: Exciting Technique #1: The “R” language) we also understand the ne...

4527 sym

R examine objects tutorial

21.11.2009

This article is a quick, concrete example of how to use the techniques from Survive R to lower the steepness of The R Project for Statistical Computing’s learning curve (so an apology to all readers who are not interested in R). What follows is for people who already use R and want to achieve more control of the software. I am a fan of the R. T...

7447 sym R (1917 sym/11 pcs) 4 img

CRU graph yet again (with R)

13.12.2009

IowaHawk has an excellent article attempting to reproduce the infamous CRU climate graph using OpenOffice: Fables of the Reconstruction. We thought we would show how to produce similarly bad results using R. If the re-constructed technique is close to what was originally done then so many bad moves were taken that you can’t learn much of any...

4631 sym R (3279 sym/5 pcs) 8 img

R annoyances

20.03.2010

Readers returning to our blog will know that Win-Vector LLC is fairly “pro-R.” You can take that to mean “in favor of R” or “professionally using R” (both statements are true). Some days we really don’t feel that way. Consider the following snippet of R code where we create a list with a single element named “x” that refers ...

5289 sym R (560 sym/8 pcs)

Must Have Software

28.05.2010

Having worked with Unix (BSD, HPUX, IRIX, Linux and OSX) and Windows (NT4, 2000, XP, Vista and 7) for quite a while, I have seen a lot of different software tools. I would like to quickly exhibit my “must have” list. These are the packages that I find to be the single “must have offerings” in a number of categories. I have avoided some cat...

3519 sym

Learn Logistic Regression (and beyond)

23.11.2010

One of the current best tools in the machine learning toolbox is the 1930s statistical technique called logistic regression. We explain how to add professional quality logistic regression to your analytic repertoire and describe a bit beyond that. A statistical analyst working on data tends to deliberately start simple and move cautiously to more co...

16537 sym R (2776 sym/8 pcs) 12 img

The cranky guide to trying R packages

13.02.2011

This is a tutorial on how to try out a new package in R. The summary is: expect errors, search out errors, and don’t start with the built-in examples or real data. Suppose you want to try out a novel statistical technique. A good fraction of the time R is your best bet for a first trial. Take as an example generalized additive models (“Generali...

8319 sym R (2769 sym/11 pcs) 8 img

Your Data is Never the Right Shape

31.07.2011

One of the recurring frustrations in data analytics is that your data is never in the right shape. Worst case: you are not aware of this and every step you attempt is more expensive, less reliable and less informative than you would want. Best case: you notice this and have the tools to reshape your data. There is no final “right shape.” ...

12822 sym R (1529 sym/6 pcs) 8 img

Programmers Should Know R

06.08.2011

Programmers should definitely know how to use R. I don’t mean they should switch from their current language to R, but they should think of R as a handy tool during development. Again and again I find myself working with Java code like the following. public class SomeBigProject1 { public static double logStirlingApproximation(final int n) { ...

4409 sym R (1731 sym/9 pcs) 6 img