Publications by David Smith
Three reasons to learn R today
If you're just getting started with data science, the Sharp Sight Labs blog argues that R is the best data science language to learn today. The blog post gives several detailed reasons, but the main arguments are: R is an extremely popular (arguably the most popular) data programming language, and ranks highly in several popularity surveys. Lear...
1657 sym
Analyzing emotions in video with R
In the run-up to the election last year, Ben Heubl from The Economist used the Emotion API to chart the emotions portrayed by the candidates during the debates (note: auto-play video in that link). In his walkthrough of the implementation, Ben used Python to process the video files, and R to create the charts from the sentiment scores generated ...
1539 sym 2 img
What can we learn from StackOverflow data?
StackOverflow, the popular Q&A site for programmers, provides useful information to nearly 5 million programmers worldwide with its database of questions and answers, not to mention the additional comments that other programmers provide. (You might be interested in the architecture, based on SQL Server 2016, required to deliver the 8.5 billion p...
2203 sym 10 img 1 tbl
The anatomy of a useful chart: NOAA’s flood forecasts
With thanks to NOAA's incredible data gathering and forecasting activities, I've been obsessed with this chart for the past few days: We used to live near the Napa river where this river gage is located, and still have many friends in the area. We were in the area last weekend, when a “pineapple express” weather event brought an atmospheric ...
5099 sym 6 img
In case you missed it: December 2016 roundup
In case you missed them, here are some articles from December of particular interest to R users. Power BI now has a gallery of custom visualizations built with R. Chicago's Department of Public Health uses R to prioritize health inspections at restaurants. A beautiful map of Swiss municipalities combined with a relief map of the mountains...
2728 sym
Microsoft R Server tips from the Tiger Team
The Microsoft R Server Tiger Team assists customers around the world to implement large-scale analytic solutions. Along the way, they discover useful tips and best practices, and share them on the Tiger Team blog. Here are a few recent tips from the Tiger Team on using Microsoft R Server: Gather metadata and explore numeric summaries of large ...
1561 sym
The fivethirtyeight R package
Andrew Flowers, quantitative editor of FiveThirtyEight.com, announced at last week's RStudio conference the availability of a new R package containing data and analyses from some of their data journalism features: the fivethirtyeight package. (Andrew's talk isn't yet online, but you can see him discuss several of these stories in his UseR!2016 p...
1926 sym
Git Gud with Git and R
If you're doing any kind of in-depth programming in the R language (say, creating a report in Rmarkdown, or developing a package) you might want to consider using a version-control system. And if you collaborate with another person (or a team) on the work, it makes things infinitely easier when it comes to coordinating changes. Amongst other bene...
2937 sym 2 img
Diversity in the R Community
In the follow-up to the useR! conference in Stanford last year, the Women in R Task force took the opportunity to survey the 900-or-so participants about their backgrounds, experiences and interests. With 455 responses, the recently-published results provide an interesting snapshot about the R community (or at least that subset able to travel to ...
1775 sym 2 img
Microsoft R Server in the News
Since the release of Microsoft R Server 9 last month, there's been quite a bit of news in the tech press about the capabilities it provides for using R in production environments. Infoworld's article, Microsoft’s R tools bring data science to the masses, takes a look back at Microsoft's vision for R since acquiring Revolution Analytics two year...
1889 sym