Publications by John Mount
You don’t need to understand pointers to program using R
R is a statistical analysis package based on writing short scripts or programs (versus being based on GUIs like spreadsheets or directed workflow editors). I say “writing short scripts” because R’s programming language (itself called S) is a bit of an oddity that you really wouldn’t be using except it gives you access to superior analyti...
Old tails: a crude power law fit on ebook sales
We use R to take a very brief look at the distribution of e-book sales on Amazon.com. Recently Hugh Howey shared some e-book sales data spidered from Amazon.com: The 50k Report. The data is largely a single scrape of statistics about various anonymized books. Howey’s analysis tries to break sales down by declared category and source, but ther...
A bit of the agenda of Practical Data Science with R
The goal of Zumel/Mount: Practical Data Science with R is to teach, through guided practice, the skills of a data scientist. We define a data scientist as the person who organizes client input, data, infrastructure, statistics, mathematics and machine learning to deploy useful predictive models into production. Our plan to teach is to: Order t...
A clear picture of power and significance in A/B tests
A/B tests are one of the simplest reliable experimental designs. Controlled experiments embody the best scientific design for establishing a causal relationship between changes and their influence on user-observable behavior. “Practical guide to controlled experiments on the web: listen to your customers not to the HIPPO” Ron Kohavi, Randa...
R has some sharp corners
R is definitely our first choice go-to analysis system. In our opinion you really shouldn’t use something else until you have an articulated reason (be it a need for larger data scale, different programming language, better data source integration, or something else). The advantages of R are numerous: Single integrated work environment. Powe...
Save 45% on Practical Data Science with R (expires May 21, 2014)
Please share this generous deal from Manning Publications: save 45% on Practical Data Science with R through May 21, 2014. Please tweet, forward and share!
How does Practical Data Science with R stand out?
There are a lot of good books on statistics, machine learning, analytics, and R. So it is valid to ask: how does Practical Data Science with R stand out? Why should a data scientist or an aspiring data scientist buy it? We admit, it isn’t the only book we own. Some relevant books from the Win-Vector LLC company library include: And a few mo...
R style tip: prefer functions that return data frames
While following up on Nina Zumel’s excellent Trimming the Fat from glm() Models in R I got to thinking about code style in R. And I realized: you can make your code much prettier by designing more of your functions to return data.frames. That may seem needlessly heavy-weight, but it has a lot of down-stream advantages. The usual mental model...
R minitip: don’t use data.matrix when you mean model.matrix
A quick R mini-tip: don’t use data.matrix when you mean model.matrix. If you do so you may lose (without noticing) a lot of your model’s explanatory power (due to poor encoding). For some modeling tasks you end up having to prepare a special expanded data matrix before calling a given machine learning algorithm. For example the randomFores...
Frequentist inference only seems easy
Two of the most common methods of statistical inference are frequentism and Bayesianism (see Bayesian and Frequentist Approaches: Ask the Right Question for some good discussion). In both cases we are attempting to perform reliable inference of unknown quantities from related observations. And in both cases inference is made possible by introdu...