Publications by Joseph Rickert
A simple statnet model of CRAN
by Joseph Rickert. In a recent post on creating JavaScript network graphs directly from R, my colleague and fellow blogger Andrie de Vries included a link to a saved graph of CRAN. Here, I will use that same graph (network) to build a simple exponential random graph model using functions from the igraph package and the network and ergm pac...
How do you know if your data has signal?
by Nina Zumel, Data Scientist, Win-Vector LLC. (Image by Liz Sullivan, Creative Commons. Source: Wikimedia.) An all-too-common approach to modeling in data science is to throw all possible variables at a modeling procedure and “let the algorithm sort it out.” This is tempting when you are not sure what are the true causes or predictors of the pheno...
R News From JSM 2015
by Joseph Rickert. We can declare 2015 the year that R went mainstream at the JSM (Joint Statistical Meetings). There is no doubt about it: the calculations, visualizations, and deep thinking of a great many of the world's statisticians are rendered or expressed in R, and the JSM is with the program. In 2013 I was happy to have stumbled into a talk where an FDA statistici...
Using Azure as an R datasource, Part 4 – Pulling data from SQL Server to Linux
by Gregory Vandenbrouck, Software Engineer, Microsoft. This post is the fourth in a series that covers pulling data from Microsoft SQL Server or MySQL/MariaDB on Azure to an R client on Windows or Linux. In the previous posts, we covered pulling data from SQL Server to Windows and from MySQL/MariaDB to both Windows and Linux. This time we’ll be p...
5 New R Packages for Data Scientists
by Joseph Rickert. One great beauty of the R ecosystem, and perhaps the primary reason for R’s phenomenal growth, is the system for contributing new packages. This, coupled with the rock-solid stability of CRAN, R’s primary package repository, gives R a great advantage. However, anyone with enough technical know-how to formulate a proper submis...
Following up on news stories with choroplethr and R
by Ari Lamstein, a consultant specializing in software engineering and data analysis, and author of the free email course Learn to Map Census Data in R. One of my favorite things about R is that it allows me to follow up on interesting news stories. Consider this interview on EconTalk about the history of fracking in America. Russ Roberts interview...
Plotting Time Series in R using Yahoo Finance data
by Joseph Rickert. I recently rediscovered the Timely Portfolio post on R Financial Time Series Plotting. If you are not familiar with this gem, it is well worth the time to stop and have a look at it now. Not only does it contain some useful examples of time series plots mixing different combinations of time series packages (ts, zoo, xts) with m...
Looking after Datasets
by Antony Unwin, University of Augsburg, Germany. David Moore's definition of data: numbers that have been given a context. Here is some context for the finch dataset. Fig. 1: Illustrations of the beaks of four of Darwin's finches from “The Voyage of the Beagle.” Note that only one of these (fortis) is included in the dataset. R's package sys...
How do you know if your model is going to work? Part 1: The Problem
by John Mount and Nina Zumel of Win-Vector LLC. “Essentially, all models are wrong, but some are useful.” (George Box) Here's a caricature of a data science project: your company or client needs information (usually to make a decision). Your job is to build a model to predict that information. You fit a model, per...
How do you know if your model is going to work? Part 2: In-training set measures
by John Mount and Nina Zumel. When fitting and selecting models in a data science project, how do you know that your final model is good? And how sure are you that it's better than the models that you rejected? In this Part 2 of our four-part mini-series “How do you know if your model is going to work?” we devel...