Publications by Joseph Rickert
Plotting Time Series in R using Yahoo Finance data
by Joseph Rickert I recently rediscovered the Timely Portfolio post on R Financial Time Series Plotting. If you are not familiar with this gem, it is well worth the time to stop and have a look at it now. Not only does it contain some useful examples of time series plots mixing different combinations of time series packages (ts, zoo, xts) with m...
3054 sym R (1387 sym/3 pcs) 2 img
Looking after Datasets
by Antony Unwin, University of Augsburg, Germany. David Moore's definition of data: numbers that have been given a context. Here is some context for the finch dataset: Fig 1: Illustrations of the beaks of four of Darwin's finches from “The Voyage of the Beagle”. Note that only one of these (fortis) is included in the dataset. R's package sys...
7599 sym 6 img
How do you know if your model is going to work? Part 1: The Problem
by John Mount and Nina Zumel of Win-Vector LLC “Essentially, all models are wrong, but some are useful.” George Box Here's a caricature of a data science project: your company or client needs information (usually to make a decision). Your job is to build a model to predict that information. You fit a model, per...
1863 sym 2 img
How do you know if your model is going to work? Part 2: In-training set measures
by John Mount and Nina Zumel When fitting and selecting models in a data science project, how do you know that your final model is good? And how sure are you that it's better than the models that you rejected? In this Part 2 of our four-part mini-series “How do you know if your model is going to work?” we devel...
3962 sym 4 img
The New Microsoft Data Science User Group Program
by Joseph Rickert We are very pleased to announce that Microsoft will not only continue the Revolution Analytics’ tradition of supporting R user groups worldwide, but is expanding the scope of the user group program. The new 2016 Microsoft Data Science User Group Sponsorship Program is open to all user groups that are passionate about open-sour...
2488 sym
Reading Financial Time Series Data with R
by Joseph Rickert In a recent post focused on plotting time series with the new dygraphs package, I did not show how easy it is to read financial data into R. However, in a thoughtful comment to the post, Achim Zeileis pointed out a number of features built into the basic R time series packages that everyone ought to know. In this post, I will j...
3587 sym R (1849 sym/6 pcs) 6 img
How do you know if your model is going to work? Part 4: Cross-validation techniques
by John Mount and Nina Zumel. In this article we conclude our four-part series on basic model testing. When fitting and selecting models in a data science project, how do you know that your final model is good? And how sure are you that it's better than the models that you rejected? In this concluding Part 4 of our...
3237 sym 2 img
The R Consortium Gears Up For Business
by Joseph Rickert This week, the Infrastructure Steering Committee (ISC) of the R Consortium unanimously elected Hadley Wickham as its chair, thereby also giving Hadley a seat on the R Consortium board of directors. Congratulations Hadley!! This is a major step forward towards putting the R Consortium in business. Not only is the ISC the group th...
3932 sym 2 img
Why Big Data? Learning Curves
by Bob Horton, Microsoft Senior Data Scientist. Learning curves are an elaboration of the idea of validating a model on a test set, and have been widely popularized by Andrew Ng’s Machine Learning course on Coursera. Here I present a simple simulation that illustrates this idea. Imagine you use a sample of your data to train a model, then use the...
5578 sym R (1520 sym/7 pcs) 2 img
R User Groups Highlight R Creativity
by Joseph Rickert I have been a big fan of R user groups since I attended my first meeting. There is just something about the vibe of being around people excited about what they are doing that feels good. From a speaker's perspective, presenting at an R user group meeting must be the rough equivalent of doing “stand-up” at a club where ...
3428 sym 6 img