Publications by David Smith

Using R for Map-Reduce applications in Hadoop

04.05.2011

Data Scientist Antonio Piccolboni recently published this comparison of the various languages and interfaces available for programming Big Data analysis tasks in the map-reduce framework. The interfaces he reviewed included: Java Hadoop (mature and efficient, but verbose and difficult to program); Cascading (brings an SQL-like flavor to Java prog...

2349 sym

Mapping airline flight networks with R

05.05.2011

Inspired by the Facebook Social Network chart, FlowingData's Nathan Yau also turns to R to create a beautiful chart of the network of all flight connections between major airlines in the US: Like the Facebook chart, the chart reflects the intensity of the connections (here, the number of flights) between pairs of cities. Nathan explains: Bright...

1615 sym 8 img

How to access databases from R

05.05.2011

From his presentation at the Greater Boston useR Group, R user Jeffrey Breen has shared some useful slides detailing how to bring data from relational databases like MySQL and Oracle into R. In fact, data from just about any relational database is accessible from R by sending an SQL query to the standard ODBC or JDBC interfaces. R packages also offer...

1209 sym

Propagation of the news of OBL’s death via Twitter

06.05.2011

SocialFlow's blog has a great case study today on how news from a single tweet — in this case, speculation made an hour before the President's announcement that Osama bin Laden had been killed — can propagate through social networks. At 10:24 p.m. EST on Sunday May 1, Keith Urbahn tweeted: “So I'm told by a reputable person they have kill...

1382 sym 4 img

Registration open for Rmetrics Workshop on Computational Finance

09.05.2011

The Rmetrics Association is once again holding its annual Workshop and Summer School on Computational Finance and Financial Engineering at Meielisalp (on Lake Thun in Switzerland) from June 26-30. Now in its fifth year, the workshop consists of Summer School-like tutorial sessions and a user/developer meeting. Both focus on topics from “Co...

1921 sym

Data Science Toolset discussion at Data Scientist Summit

10.05.2011

Heads-up to anyone attending the sold-out Data Science Summit in Las Vegas this week: I'll be there tomorrow and Thursday for the conference and to discuss R in the panel discussion “Data Science Toolset – Recipes That Win” (more details about the panel discussion below). I'm looking forward to meeting with the other R users there — tweet...

1791 sym

An essential vocabulary for the R language

11.05.2011

The Oxford English Dictionary includes more than 600,000 words, yet most of us get by in our day-to-day lives with a vocabulary of just a few thousand. In a similar vein, the R language includes thousands of functions. When you start up R 2.13, you have 2382 functions at your disposal: > length(apropos(".", mode="function")) [1] 2382 This inclu...

1659 sym R (48 sym/1 pcs)

The R-Files: Martin Morgan

12.05.2011

“The R-Files” is an occasional series from Revolution Analytics, where we profile prominent members of the R Community.
Name: Martin Morgan
Profession: Senior Staff Scientist at Fred Hutchinson Cancer Research Center
Nationality: Canadian
Years Using R: 7
Known for: Director of the Bioconductor project
Martin Morgan is a Senior Staff Scien...

2877 sym 4 img 1 tbl

Reflections on Data Science Summit 2011

13.05.2011

The Data Science Summit held in Las Vegas this week was outstanding – kudos and thanks to EMC/Greenplum for organizing the event. The energy of 150+ data scientists coupled with a well-curated agenda of talks created a real sense of being on the cusp of a revolution in the applications of data analysis. Here are just a few of the highligh...

4169 sym

Get Daily R tips on Twitter

16.05.2011

John D Cook, editor of the always-interesting and eclectic blog The Endeavour, has been posting facts about Statistics and distribution theory to the StatFact Twitter account on a daily basis for over a year now. He also curates a number of other daily tip services and the newest one — RLangTip — promises daily tips about using the R lang...

911 sym