Publications by David Smith
Superheat: supercharged heatmaps for R
The heatmap is a useful graphical tool in any data scientist's arsenal. It represents numeric data that naturally aligns to a 2-dimensional grid, with the value of each cell shown as a color. It's a natural fit for data that's in a grid already (say, a correlation matrix). But it's also useful for...
2631 sym 6 img
In case you missed it: January 2017 roundup
In case you missed them, here are some articles from January of particular interest to R users. The Data Science Virtual Machine on Azure has been updated with the latest Microsoft R Server, and adds RStudio and JuliaPro. A crowdsourced list of local R user groups and community events, maintained by Colin Gillespie. Resources for searching R pa...
2603 sym
Retail customer analytics with SQL Server R Services
In the hyper-competitive retail industry, intelligence about your customers is key. You need to be able to find the right customers, understand what types of customers you have, and know how to keep the best ones. Three solutions based around R and SQL Server R Services will help you do exactly that. To find the right customers, you need to ...
2828 sym 4 img
ModernDive: A free introduction to statistics and data science with R
If you're thinking about teaching a course on statistics and data science using R, Chester Ismay and Albert Kim have created an online, open-source textbook for just that purpose. ModernDive is a textbook that instructs students how to: use R to explore and visualize data; use randomization and simulation to build inferential ideas; effect...
1724 sym 2 img
Job trends for R and Python
When we last looked at job trends from indeed.com, job listings for “R statistics” were on the rise but were still around half the volume of listings for “SAS statistics”. Three-and-a-half years later, R has overtaken SAS in job listings for “statistics”. I added Python to the search this time; job listings for “Python statistic...
1257 sym 4 img
Update on R Consortium Projects
On January 31, the R Consortium presented a webinar with updates on various projects that have been funded (thanks to the R Consortium member dues) and are underway. Each project was presented by the project leader, a member of the R community. You can watch the recording of the webinar here, but here's a brief summary of what was covered, groupe...
4043 sym 14 img
A comparison of deep learning packages for R
Oksana Kutina and Stefan Feuerriegel from the University of Freiburg recently published an in-depth comparison of four R packages for deep learning. The packages reviewed were: MXNet: The R interface to the MXNet deep learning library. (The blog post refers to an older name for the package, MXNetR.) darch: An R package for deep architectures and r...
2340 sym
Galaxy classification with deep learning and SQL Server R Services
One of the major “wow!” moments in the keynote where SQL Server 2016 was first introduced was a demo that automated the process of classifying images of galaxies in a huge database of astronomical images. The SQL Server Blog has since published a step-by-step tutorial on implementing the galaxy classifier in SQL Server (and the code is also ...
1329 sym 2 img
Performance improvements coming to R 3.4.0
R 3.3.3 (codename: “Another Canoe”) is scheduled for release on March 6. This is the “wrap-up” release of the R 3.3 series, which means it will include minor bug fixes and improvements, but eschew major new features. Major changes are coming though, with the subsequent release of R 3.4.0. While the NEWS file announcing updates in 3.4.0...
4192 sym
Six Articles on using R with SQL Server
Tomaž Kaštrun is a developer and data analyst working for the IT group at SPAR (the ubiquitous European chain of convenience stores) in Austria. He blogs regularly about using Microsoft R and SQL Server for data analysis, and recently published a roundup of his articles about R and SQL Server. Follow the link below for an overview of the arti...
1142 sym