Publications by csgillespie

Paul Murrell – Introduction to Grid graphics (useR! 2011)

15.08.2011

Typically, I’m very bad at taking notes at conferences. This time around, I intend to make notes for some of the talks I attend at this year’s useR! 2011 conference. Below are the notes I made during this afternoon’s tutorial. Note: these are just my notes and aren’t meant to be a full introduction to Grid graphics. If you are...

Brian Ripley – The R Development Process (useR! 2011)

16.08.2011

These are my notes on the useR! 2011 invited talk. Brian Ripley has been a member of R core since 1998. The R Development Process – An insideR’s view. R timeline: JCGS paper submitted in 1995. 1997: CRAN (Mar), core team (Aug), CVS (Sept). R 1.0.0: Feb 2000, 2.8MB. Many people don’t take 0.X.X seriously. R 2.0.0: Oct 2004, 10MB (actually 1.10.0) ...

Kaleidoscope Ic (useR! 2011)

16.08.2011

These are my rough notes on the Kaleidoscope Ic session. David Smith – The R Ecosystem. David Smith works for Revolution Analytics. He gave a quick overview of the R project: useR!, The R Journal, and R-Forge. Social media are starting to play a part in R: Google+, Twitter, Stack Overflow, and the traditional R mailing list. The developer communi...

High Performance Computing

16.08.2011

Willem Ligtenberg – GPU computing and R. Why GPU computing? The theoretical GFLOPS of a GPU are three times higher than those of a CPU. Use GPUs for single instruction, multiple data (SIMD) problems. GPUs were initially developed for texture problems: for example, a wall smashed into lots of pieces, with each core handling a single piece. CUDA and FireStream are b...

Ulrike Gromping – Design of Experiments in R

16.08.2011

Example: car seat occupation. An algorithm must decide whether the airbag opens: it must open for an adult, but not for a small child or if the seat is empty (plus a few other cases I missed). Key questions are: What type of design? A 32-run regular fractional factorial. Response measurement: this depends on the dummy position, so repeat for 3 different dummy positions. Precision –...

Jonathan Rougier – Nomograms for visualising relationships between three variables (useR! 2011)

16.08.2011

Background: example of a nomogram, taken from Wikipedia. Donkeys in Kenya: it is tricky to find the weight of a donkey in the “field” (no pun intended!), so estimate the weight from a few measurements. Other covariates include age. Standard practice is to fit one model for adult donkeys, and slightly different models for young/old and ill donkeys. W...

Lee E. Edlefsen – Scalable Data Analysis in R (useR! 2011)

17.08.2011

The RevoScaleR package isn’t open source, but it is free for academic users. Collecting and storing data has outpaced our ability to analyze it. Can R cope with this challenge? The RevoScaleR package is part of Revolution R Enterprise and provides data management and data analysis. It uses multiple cores and should scale. Scalability Wh...

Kaleidoscope IIb (useR! 2011)

17.08.2011

L. Collingwood – RTextTools. RTextTools is a machine learning library for automated text classification. It builds on previous packages such as tm and randomForest. Use case: an undergraduate labels congressional bills but then quits; using the previously labelled data, automatically classify the remaining documents. The speaker gave a nice o...

Programming (useR! 2011)

17.08.2011

Ray Brownrigg – Tips and Tricks for young R programmers. Problem: calculate the distribution function of a bivariate Kolmogorov-Smirnov statistic. The basic exhaustive search is essentially three nested loops, so it is O(N^3). Fortran gives a single order of magnitude speed-up. Restructuring the R code to use a single loop is an order of magnitude faster than Fortran. Further improv...

Big data (useR! 2011)

18.08.2011

These are my notes from a session on Thursday morning. Unfortunately, I missed the first and last talks. J. Demmler – Challenges of working with a large database of routinely collected health data. The SAIL data bank holds over 1.9 billion (anonymous) entries. To use the data for research, they need to ensure that proper data security is observed. For exa...