Publications by Joseph Rickert

How do you know if your model is going to work? Part 2: In-training set measures

08.09.2015

by John Mount and Nina Zumel When fitting and selecting models in a data science project, how do you know that your final model is good? And how sure are you that it's better than the models that you rejected? In this Part 2 of our four-part mini-series “How do you know if your model is going to work?” we devel...

The New Microsoft Data Science User Group Program

10.09.2015

by Joseph Rickert We are very pleased to announce that Microsoft will not only continue the Revolution Analytics’ tradition of supporting R user groups worldwide, but is expanding the scope of the user group program. The new 2016 Microsoft Data Science User Group Sponsorship Program is open to all user groups that are passionate about open-sour...

Reading Financial Time Series Data with R

17.09.2015

by Joseph Rickert In a recent post focused on plotting time series with the new dygraphs package, I did not show how easy it is to read financial data into R. However, in a thoughtful comment to the post, Achim Zeileis pointed out a number of features built into the basic R time series packages that everyone ought to know. In this post, I will j...

How do you know if your model is going to work? Part 4: Cross-validation techniques

22.09.2015

by John Mount and Nina Zumel. In this article we conclude our four-part series on basic model testing. When fitting and selecting models in a data science project, how do you know that your final model is good? And how sure are you that it's better than the models that you rejected? In this concluding Part 4 of our...

The R Consortium Gears Up For Business

24.09.2015

by Joseph Rickert This week, the Infrastructure Steering Committee (ISC) of the R Consortium unanimously elected Hadley Wickham as its chair, thereby also giving Hadley a seat on the R Consortium board of directors. Congratulations, Hadley! This is a major step forward towards putting the R Consortium in business. Not only is the ISC the group th...

Why Big Data? Learning Curves

29.09.2015

by Bob Horton, Microsoft Senior Data Scientist Learning curves are an elaboration of the idea of validating a model on a test set, and have been widely popularized by Andrew Ng’s Machine Learning course on Coursera. Here I present a simple simulation that illustrates this idea. Imagine you use a sample of your data to train a model, then use the...

R User Groups Highlight R Creativity

01.10.2015

by Joseph Rickert I have been a big fan of R user groups since I attended my first meeting. There is just something about the vibe of being around people excited about what they are doing that feels good. From a speaker's perspective, presenting at an R User Group meeting must be the rough equivalent of doing “stand-up” at a club where ...

Learning R: Index of Online R Courses, October 2015

08.10.2015

by Joseph Rickert Early October: somewhere the leaves are turning brilliant colors, temperatures are cooling down and that back-to-school feeling is in the air. And for more people than ever before, it is going to seem like a good time to commit to really learning R. I have some suggestions for R courses below, but first: What does it mean to...

Using miniCRAN in Azure ML

13.10.2015

by Michele Usuelli, Microsoft Data Scientist Azure Machine Learning Studio is a drag-and-drop tool for deploying data-driven solutions. It contains pre-built items, including data preparation tools and Machine Learning algorithms. In addition, it allows you to include custom R and Python scripts. In order to build powerful R tools, you might want to use som...

The 5th Tribe, Support Vector Machines and caret

15.10.2015

by Joseph Rickert In his new book, The Master Algorithm, Pedro Domingos takes on the heroic task of explaining machine learning to a wide audience and classifies machine learning practitioners into 5 tribes*, each with its own fundamental approach to learning problems. To the 5th tribe, the analogizers, Pedro ascribes the Support Vector Machine...