Publications by Joseph Rickert
Fun with Simpson’s Paradox: Simulating Confounders
Bob Horton, Sr. Data Scientist, Microsoft Wikipedia describes Simpson’s paradox as “a trend that appears in different groups of data but disappears or reverses when these groups are combined.” Here is the figure from the top of that article (you can click on the image in Wikipedia then follow the “more details” link to find the R code use...
4577 sym R (1970 sym/8 pcs) 6 img 1 tbl
Mapping out Marriott’s Starwood Acquisition
by Michael Helbraun The software business includes travel, and that means hotels. The news that Marriott was acquiring Starwood was of particular interest to me – especially since more than 75% of my 95 nights so far this year on the road have been spent with one of those two companies. While other folks can evaluate if the deal makes sense f...
3947 sym 10 img
R User Group Activity 2015
by Joseph Rickert 2015 has been a good year for R user groups, both in terms of activity and the number of new groups founded. The plot below, which runs from 12/30/2012 through the week beginning Monday 11/23/2015, shows that the number of weekly meetings continues to drift up and to the right. You can see the seasonal pattern of fewer meetings in the ...
2017 sym 4 img
Exploring Recursive CTEs with sqldf
by Bob Horton, Sr. Data Scientist at Microsoft Common table expressions (CTEs, or “WITH clauses”) are a syntactic feature in SQL that makes it easier to write and use subqueries. They act as views or temporary tables that are only available during the lifetime of a single query. A more sophisticated feature is the “recursive CTE”, which is...
3384 sym R (3318 sym/13 pcs) 2 img 1 tbl
Feature Selection with caret’s Genetic Algorithm Option
by Joseph Rickert If there is anything that experienced machine learning practitioners are likely to agree on, it would be the importance of careful and thoughtful feature engineering. The judicious selection of which predictor variables to include in a model often has a more beneficial effect on overall classifier performance than the choice of ...
7286 sym R (4682 sym/5 pcs) 4 img
Fun with ddR: Using Distributed Data Structures in R
by Edward Ma and Vishrut Gupta (Hewlett Packard Enterprise) A few weeks ago, we revealed ddR (Distributed Data-structures in R), an exciting new project started by R-Core, Hewlett Packard Enterprise, and others that provides a fresh new set of computational primitives for distributed and parallel computing in R. The package sets the seed for what...
5604 sym 2 img
Wald’s graphical sequential inspection procedure
by John Mount, Ph.D., Data Scientist at Win-Vector LLC Our most recent article was a dynamic programming solution to the A/B test problem. Explicitly solving such dynamic programs is a long and tedious process, so you are well served by finding and introducing clever invariants to track (something better than just raw win-rates). This clever idea, ...
6954 sym 6 img
Trade-offs to consider when reading a large dataset into R using the RevoScaleR package
by Seth Mottaghinejad, Data Scientist at Microsoft R and big data There are many R packages dedicated to letting users (or useRs if you prefer) deal with big data in R. (We will intentionally avoid using proper case for 'big data', because (1) the term has been somewhat hackneyed, and (2) for the sake of this article we can think of big data as a...
17064 sym 2 img 1 tbl
Looking forward to 2016
by Joseph Rickert The following map of all of the R user groups listed in Microsoft's Local R User Group Directory is a good way to visualize the R world as we rocket into 2016. As a member of the useR!2016 planning committee, foremost in my mind right now is that in just a few months people will be coming to Stanford from all points plotted and al...
3099 sym 2 img
7th Meeting of Spanish R Users. 5-6 November 2015. Salamanca (Spain)
By Virgilio Gómez Rubio, Spanish R Users Organizing Committee As every autumn since 2009, Spanish R users gathered at their annual meeting. It is organised by the Spanish R users group ‘Comunidad R-Hispano’ and took place on 5-6 November in the historic city of Salamanca. The 7th Meeting of Spanish R Users attracted more than 100 R enthusiasts and...
2964 sym 2 img