Publications by Joseph Rickert

Using Azure as an R data source, Part 1

12.05.2015

by Gregory Vandenbrouck, Software Engineer at Microsoft. This post is the first in a series that covers pulling data from various Windows Azure hosted storage solutions (such as MySQL or Microsoft SQL Server) to an R client on Windows or Linux. We’ll start with a relatively simple case of pulling data from SQL Azure to an R client on Windows. C...

4713 sym R (1846 sym/5 pcs) 4 img

A first look at htmlwidgets

14.05.2015

by Joseph Rickert A strong case can be made that base R graphics, supplemented with either the lattice library or ggplot2 for plotting by subgroups, provides everything a statistician might need for both exploratory data analysis and for developing clear, crisp graphics for communicating results. However, it is abundantly clear that web-based graphics, driv...

3775 sym R (970 sym/2 pcs)

Fast parallel computing with Intel Phi coprocessors

19.05.2015

by Andrew Ekstrom, recovering physicist, applied mathematician and graduate student in applied stats and systems engineering. We know that R is a great system for performing statistical analysis. The price is quite nice too 😉. As a graduate student, I need a cheap replacement for Matlab and/or Maple. Well, R can do that too. I’m running a larg...

10296 sym 2 img

First Day Highlights from the Extremely Large Databases Conference

21.05.2015

by Joseph Rickert The 8th XLDB (Extremely Large Databases) Conference opened at Stanford on Tuesday with an outstanding program. This conference has been providing leadership in the “Big Data” world since its first workshop, which was held in 2007. For example, the summary report for that year notes: “Both communities (industry and scie...

4243 sym 4 img

Situational Baseball: Analyzing Runs Potential Statistics

26.05.2015

by Mark Malter A few weeks ago, I wrote about my Baseball Stats R Shiny application, where I demonstrated how to calculate runs expectancies based on the 24 possible bases/outs states for any plate appearance. In this article, I’ll explain how I expanded on that to calculate the probability of winning the game, based on the current score/inni...

3481 sym

RevoScaleR’s Naive Bayes Classifier rxNaiveBayes()

28.05.2015

by Joseph Rickert Because of its simplicity and good performance over a wide spectrum of classification problems, the Naïve Bayes classifier ought to be on everyone's short list of machine learning algorithms. Now, with version 7.4, we have a high-performance Naïve Bayes classifier in Revolution R Enterprise too. Like all Parallel External Mem...

4148 sym R (6255 sym/9 pcs) 2 img

Using Azure as an R datasource: Part 2 – Pulling data from MySQL/MariaDB

02.06.2015

by Gregory Vandenbrouck, Software Engineer, Microsoft. This post is the second in a series that covers pulling data from various Windows Azure hosted storage solutions (such as MySQL or Microsoft SQL Server) to an R client on Windows or Linux. Last time we covered pulling data from SQL Azure to an R client on Windows. This time we’ll be pulling d...

4921 sym R (1700 sym/6 pcs) 4 img

Some Impressions from R Finance 2015

04.06.2015

by Joseph Rickert The R/Finance 2015 Conference wrapped up last Saturday at UIC. It has been seven years already, but R/Finance still has the magic! – mostly very high quality presentations and the opportunity to interact and talk shop with some of the most accomplished R developers, financial modelers and even a few industry legends such a...

5327 sym 2 img 1 tbl

R User Groups are Everywhere

11.06.2015

by Joseph Rickert In a little over three weeks useR! 2015 will convene in Aalborg, Denmark and I am looking forward to being there and learning and talking about R user groups. The following map shows the big picture for R User Groups around the world. However, it is very difficult to keep it up to date. Just after the map “went to press” I ...

2038 sym R (2226 sym/2 pcs) 2 img

Pairwise-complete correlation considered dangerous

16.06.2015

by B. W. Lewis This note warns about potentially misleading results when using the use=pairwise.complete.obs and related options in R’s cor and cov functions. Pitfalls are illustrated using a very simple pathological example followed by a brief list of alternative ways to deal with missing data and some references about them. Known unknowns R ...

5908 sym R (1194 sym/6 pcs)