Publications by csgillespie
Paul Murrell – Introduction to Grid graphics (useR! 2011)
Typically, I’m very bad at taking notes at conferences. This time around, I intend to make notes for some of the talks I attend at this year’s useR! 2011 conference. Below are the notes I made during this afternoon’s tutorial. Note: these are just my notes and aren’t meant to be a full introduction to Grid graphics. If you are...
Brian Ripley – The R Development Process (useR! 2011)
These are my notes on the useR! 2011 invited talk. Brian Ripley has been a member of R core since 1998. The R Development Process – An insideR’s view. R Timeline: JCGS paper submitted in 1995. 1997: CRAN (Mar), Core team (Aug), CVS (Sept). R 1.0.0, Feb 2000 – 2.8MB. Many people don’t take 0.X.X seriously. R 2.0.0, Oct 2004, 10MB (actually 1.10.0) ...
Kaleidoscope Ic (useR! 2011)
These are my rough notes on the Kaleidoscope Ic session. David Smith – The R Ecosystem (useR! 2011). David Smith works for Revolution Analytics. Quick overview of the R project – useR, the R Journal, and R-Forge. Social media is starting to play a part in R – Google+, Twitter, Stack Overflow, and the traditional R mailing list. The developer communi...
High Performance Computing
Willem Ligtenberg – GPU computing and R. Why GPU computing: the theoretical GFLOPS of a GPU is three times that of a CPU. Use GPUs for single instruction, multiple data (SIMD) problems. Initially, GPUs were developed for texture problems. For example, a wall smashed into lots of pieces: each core handled a single piece. CUDA and FireStream are b...
Ulrike Gromping – Design of Experiments in R
Example: car seat occupation. An algorithm must decide whether the airbag opens: it must open for an adult, but not for a small child or if the seat is empty (plus a few other cases I missed). Key questions are: What type of design: 32-run regular fractional factorial. Response measurement – depends on dummy position, so repeat for 3 different dummy positions. Precision –...
Jonathan Rougier – Nomograms for visualising relationships between three variables (useR! 2011)
Background: Example of a nomogram, taken from Wikipedia. Donkeys in Kenya. It is tricky to find the weight of a donkey in the “field” – no pun intended! So, using a few measurements, estimate the weight. Other covariates include age. Standard practice is to fit: for adult donkeys, and other slightly different models for young/old and ill donkeys. W...
Lee E. Edlefsen – Scalable Data Analysis in R (useR! 2011)
The RevoScaleR package isn’t open source, but it is free for academic users. Collecting and storing data has outpaced our ability to analyze it. Can R cope with this challenge? The RevoScaleR package is part of Revolution R Enterprise. This package provides data management and data analysis. It uses multiple cores and should scale. Scalability Wh...
Kaleidoscope IIb (useR! 2011)
L. Collingwood – RTextTools. RTextTools is a machine learning library for automated text classification. This package builds on previous packages such as tm and randomForest. Use case: an undergrad labels congressional bills but then quits. Using the previously labelled data, automatically classify the remaining documents. The speaker gave a nice o...
Programming (useR! 2011)
Ray Brownrigg – Tips and Tricks for young R programmers. Problem: calculate the distribution function of a bivariate Kolmogorov-Smirnov statistic. Essentially three loops. Basic exhaustive search is O(N^3). Fortran gives a single order of magnitude speed-up. Restructuring in R using a single loop is an order of magnitude faster than Fortran. Further improv...
Big data (useR! 2011)
Unfortunately, I missed the first and last talks. These are my notes from a session on Thursday morning. J. Demmler – Challenges of working with a large database of routinely collected health data. The SAIL data bank holds over 1.9 billion (anonymous) entries. To use the data for research, they need to ensure that proper data security is observed. For exa...