Publications by David Smith
Using R for Map-Reduce applications in Hadoop
Data Scientist Antonio Piccolboni recently published this comparison of the various languages and interfaces available for programming Big Data analysis tasks in the map-reduce framework. The interfaces he reviewed included: Java Hadoop (mature and efficient, but verbose and difficult to program); Cascading (brings an SQL-like flavor to Java prog...
2349 sym
Mapping airline flight networks with R
Inspired by the Facebook Social Network chart, FlowingData's Nathan Yau also turns to R to create a beautiful chart of the network of all flight connections between major airlines in the US. Like the Facebook chart, the chart reflects the intensity of the connections (here, the number of flights) between pairs of cities. Nathan explains: Bright...
1615 sym 8 img
How to access databases from R
From his presentation at the Greater Boston useR Group, R user Jeffrey Breen has shared some useful slides detailing how to bring data from relational databases like MySQL and Oracle into R. In fact, data from just about any relational database is accessible from R by sending an SQL query to the standard ODBC or JDBC interfaces. R packages also offer...
1209 sym
Propagation of the news of OBL’s death via Twitter
SocialFlow's blog has a great case study today on how news from a single tweet — in this case, speculation made an hour before the President's announcement that Osama bin Laden had been killed — can propagate through social networks. At 10:24 p.m. EST on Sunday May 1, Keith Urbahn tweeted: “So I'm told by a reputable person they have kill...
1382 sym 4 img
Registration open for Rmetrics Workshop on Computational Finance
The Rmetrics Association is once again holding its annual Workshop and Summer School on Computational Finance and Financial Engineering at Meielisalp (on Lake Thun in Switzerland) from June 26-30. Now in its fifth year, the workshop consists of Summer School-like tutorial sessions and a user/developer meeting: Both focus on topics from “Co...
1921 sym
Data Science Toolset discussion at Data Scientist Summit
Heads-up to anyone attending the sold-out Data Science Summit in Las Vegas this week: I'll be there tomorrow and Thursday for the conference and to discuss R on the panel discussion “Data Science Toolset – Recipes That Win” (more details about the panel discussion below.) I'm looking forward to meeting with the other R users there — tweet...
1791 sym
An essential vocabulary for the R language
The Oxford English Dictionary includes more than 600,000 words, yet most of us get by in our day-to-day lives with a vocabulary of just a few thousand. In a similar vein, the R language includes thousands of functions: when you start up R 2.13, you have 2382 functions at your disposal: > length(apropos(".", mode="function")) [1] 2382 This inclu...
1659 sym R (48 sym/1 pcs)
The R-Files: Martin Morgan
“The R-Files” is an occasional series from Revolution Analytics, where we profile prominent members of the R Community. Name: Martin Morgan; Profession: Senior Staff Scientist at Fred Hutchinson Cancer Research Center; Nationality: Canadian; Years Using R: 7; Known for: Director of the Bioconductor project. Martin Morgan is a Senior Staff Scien...
2877 sym 4 img 1 tbl
Reflections on Data Science Summit 2011
The Data Science Summit held in Las Vegas this week was outstanding; kudos and thanks to EMC/Greenplum for organizing the event. The energy of 150+ data scientists, coupled with a well-curated agenda of talks, created a strong sense of being on the cusp of a real revolution in the applications of data analysis. Here are just a few of the highligh...
4169 sym
Get Daily R tips on Twitter
John D. Cook, editor of the always-interesting and eclectic blog The Endeavour, has been posting facts about Statistics and distribution theory to the StatFact Twitter account on a daily basis for over a year now. He also curates a number of other daily tip services, and the newest one, RLangTip, promises daily tips about using the R lang...
911 sym