Publications by John Mount
Automatic bias correction doesn’t fix omitted variable bias
Page 94 of Gelman, Carlin, Stern, Dunson, Vehtari, Rubin “Bayesian Data Analysis” 3rd Edition (which we will call BDA3) provides a great example of what happens when common broad frequentist bias criticisms are over-applied to predictions from ordinary linear regression: the predictions appear to fall apart. BDA3 goes on to exhibit what migh...
21589 sym 10 img
Reading the Gauss-Markov theorem
What is the Gauss-Markov theorem? From “The Cambridge Dictionary of Statistics” B. S. Everitt, 2nd Edition: A theorem that proves that if the error terms in a multiple regression have the same variance and are uncorrelated, then the estimators of the parameters in the model produced by least squares estimation are better (in the sense of ha...
19338 sym R (2931 sym/1 pcs) 6 img
Factors are not first-class citizens in R
The primary user-facing data types in the R statistical computing environment behave as vectors. That is: one dimensional arrays of scalar values that have a nice operational algebra. There are additional types (lists, data frames, matrices, environments, and so-on) but the most common data types are vectors. In fact vectors are so common in R...
20196 sym 6 img
Excel spreadsheets are hard to get right
Any practicing data scientist is going to eventually have to work with a data stored in a Microsoft Excel spreadsheet. A lot of analysts use this format, so if you work with others you are going to run into it. We have already written how we Excel-like-formats-to-exchange-data/”>don’t recommend using Excel-line formats to exchange data. Bu...
6074 sym 16 img
A comment on preparing data for classifiers
I have been working through (with some honest appreciation) a recent article comparing many classifiers on many data sets: “Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?” Manuel Fernández-Delgado, Eva Cernadas, Senén Barro, Dinani Amorim; 15(Oct):3133−3181, 2014 (which we will call “the DWN paper” in ...
12384 sym 4 img
Is there a Kindle edition of Practical Data Science with R?
We have often been asked “why is there no Kindle edition of Practical Data Science with R on Amazon.com?” The short answer is: there is an edition you can read on your Kindle: but it is from the publisher Manning (not Amazon.com). The long answer is: when Amazon.com supplies a Kindle edition readers have to deal with the following: Amazon....
4132 sym 4 img
R bracket is a bit irregular
While skimming Professor Hadley Wickham’s Advanced R I got to thinking about nature of the square-bracket or extract operator in R. It turns out “[,]” is a bit more irregular than I remembered. The subsetting section of Advanced R has a very good discussion on the subsetting and selection operators found in R. In particular it raises the ...
10150 sym 6 img
Check your return types when modeling in R
Just a warning: double check your return types in R, especially when using different modeling packages. We consider ourselves pretty familiar with R. We have years of experience, many other programming languages to compare R to, and we have taken Hadley Wickham’s Master R Developer Workshop (highly recommended). We already knew R’s predict...
7323 sym 2 img
Announcing: Introduction to Data Science video course
Win-Vector LLC’s Nina Zumel and John Mount are proud to announce their new data science video course Introduction to Data Science is now available on Udemy. We designed the course as an introduction to an advanced topic. The course description is: Use the R Programming Language to execute data science projects and become a data scientist. Imp...
3823 sym 6 img
The Win-Vector R data science value pack
Win-Vector LLC is proud to announce the R data science value pack. 50% off our video course Introduction to Data Science (available at Udemy) and 30% off Practical Data Science with R (from Manning). Pick any combination of video, e-book, and/or print-book you want. Instructions below. Please share and Tweet! For 50% off the video course Intr...
1361 sym 6 img