Publications by civilstat

Very gentle resource for speeding up R code

05.03.2015

Nathan Uyttendaele has written a great beginner’s guide to speeding up your R code. Abstract: Most calculations performed by the average R user are unremarkable in the sense that nowadays, any computer can crush the related code in a matter of seconds. But more and more often, heavy calculations are also performed using R, something especially ...

2982 sym 2 img

Small Area Estimation 101: old materials posted

03.04.2015

I never got around to polishing my Small Area Estimation (SAE) “101” tutorial materials that I promised a while ago. So here they are, though still unedited and not as clean / self-explanatory as I’d like. The slides introduce a few variants of the simplest area-level (Fay-Herriot) model, analyzing the same dataset in a few different ways. ...

2367 sym 2 img

Reader Morghulis

07.04.2015

TL;DR: Memento mori. After reading too much Seneca, I’m meditating on death like a statistician, by counting how many of GRRM’s readers did not even survive to see the HBO show (much less the end of the book series). Rough answer: around 40,000. No disrespect meant to Martin, his readers, or their families—it’s just a thought exercise tha...

7584 sym Python (1571 sym/2 pcs) 6 img

DotCity: a game written in R? and other statistical computer games?

28.06.2015

A while back I recommended Nathan Uyttendaele’s beginner’s guide to speeding up R code. I’ve just heard about Nathan’s computer game project, DotCity. It sounds like a statistician’s minimalist take on SimCity, with a special focus on demographic shifts in your population of dots (baby booms, aging, etc.). Furthermore, he’s planning t...

2448 sym 2 img

Two principled approaches to data visualization

09.07.2015

Yesterday I spoke at Stat Bytes, our student-run statistical computing seminar. My goal was to introduce two principled frameworks for thinking about data visualization: human visual perception and the Grammar of Graphics. (We also covered some relevant R packages: RColorBrewer, directlabels, and a gentle intro to ggplot2.) These are not the only...

7252 sym 4 img

“Don’t invert that matrix” – why and how

13.07.2015

The first time I read John Cook’s advice “Don’t invert that matrix,” I wasn’t sure how to follow it. I was familiar with manipulating matrices analytically (with pencil and paper) for statistical derivations, but not with implementation details in software. For reference, here are some simple examples in MATLAB and R, showing what to av...

6266 sym R (2996 sym/3 pcs) 36 img
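
For readers who want the gist of "don't invert that matrix" before clicking through, here is a minimal pure-Python sketch. It is not the MATLAB or R code from the post. The helper names `solve`, `inverse`, and `matvec` are made up for illustration, and the 2x2 system is a toy example. The idea: to get x from A x = b, solve the system directly instead of forming the inverse of A and multiplying. In R this corresponds to preferring `solve(A, b)` over `solve(A) %*% b`.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Work on an augmented copy so the inputs are left untouched.
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Pick the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def inverse(A):
    """Explicit inverse, built one column at a time (the thing to avoid)."""
    n = len(A)
    cols = [solve(A, [1.0 if i == j else 0.0 for i in range(n)])
            for j in range(n)]
    # cols[j] is column j of the inverse; transpose into rows.
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def matvec(A, v):
    """Multiply matrix A by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


A = [[4.0, 3.0], [6.0, 3.0]]
b = [10.0, 12.0]

x_direct = solve(A, b)                 # preferred: one solve
x_via_inverse = matvec(inverse(A), b)  # avoid: n solves plus a multiply
```

Both routes agree on this tiny well-conditioned system. The explicit inverse costs more work and, on larger or ill-conditioned matrices, tends to lose accuracy.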

About to teach Statistical Graphics and Visualization course at CMU

31.08.2015

I’m pretty excited for tomorrow: I’ll begin teaching the Fall 2015 offering of 36-721, Statistical Graphics and Visualization. This is a half-semester course designed primarily for students in our MSP program (Masters of Statistical Practice). A large part of the focus will be on useful principles and frameworks: human visual perception, the ...

3920 sym 2 img

Statistical Graphics and Visualization course materials

28.10.2015

I’ve just finished teaching the Fall 2015 session of 36-721, Statistical Graphics and Visualization. Again, it is a half-semester course designed primarily for students in the MSP program (Masters of Statistical Practice) in the CMU statistics department. I’m pleased that we also had a large number of students from other departments taking th...

3495 sym 4 img

Why bother with magrittr

31.10.2015

I’ve seen R users swooning over the magrittr package for a while now, but I couldn’t make heads or tails of all these scary %>% symbols. Finally I had time for a closer look, and it seems potentially handy indeed. Here’s the idea and a simple toy example. So, it can be confusing and messy to write (and read) functions from the inside out. T...

1901 sym 2 img
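
The inside-out problem is easy to show with a toy example. magrittr's `%>%` passes the value on its left as the first argument of the function on its right, so a chain reads in the order the steps happen. The sketch below is a rough Python analogue, not the example from the post. `pipe` is a hypothetical helper written here for illustration, not part of magrittr or any Python library.

```python
from functools import reduce
import math


def pipe(value, *funcs):
    """Pass value through funcs left to right, like x %>% f %>% g in R."""
    return reduce(lambda acc, f: f(acc), funcs, value)


values = [1, 4, 9, 16]

# Inside-out: the first step (sqrt) is buried in the middle of the line.
nested = round(sum(map(math.sqrt, values)) / len(values), 1)

# Left to right: each step appears in the order it is applied.
piped = pipe(
    values,
    lambda xs: [math.sqrt(x) for x in xs],  # take square roots
    lambda xs: sum(xs) / len(xs),           # average them
    lambda m: round(m, 1),                  # round to one decimal
)
```

Both versions compute the same number. The piped one only changes the reading order.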

Data sanity checks: Data Proofer (and R analogues?)

20.05.2016

I just heard about Data Proofer (h/t Nathan Yau), a test suite of sanity-checks for your CSV dataset. It checks a few basic things you’d really want to know but might forget to check yourself, like whether any rows are exact duplicates, or whether any columns are totally empty. There are things I always forget to check until they cause a bug, l...

1762 sym
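
The two checks named above (exact duplicate rows and totally empty columns) are simple enough to sketch with Python's standard `csv` module. This is not Data Proofer's code or API. `sanity_check` and the sample data are hypothetical, and the sketch assumes the first CSV row is a header.

```python
import csv
import io
from collections import Counter


def sanity_check(csv_text):
    """Report exact duplicate rows and totally empty columns in CSV text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]  # assumes the first row is a header

    # A row is a duplicate if the identical tuple of cells appears twice.
    counts = Counter(tuple(r) for r in body)
    duplicate_rows = [list(r) for r, n in counts.items() if n > 1]

    # A column is empty if every data row has a blank (or missing) cell there.
    empty_columns = [
        name for i, name in enumerate(header)
        if all(i >= len(r) or r[i].strip() == "" for r in body)
    ]
    return {"duplicate_rows": duplicate_rows, "empty_columns": empty_columns}


sample = "id,name,notes\n1,Ann,\n2,Bob,\n1,Ann,\n"
report = sanity_check(sample)
```

On this sample the report flags the repeated `1,Ann,` row and the blank `notes` column. A file with a header but no data rows would report every column as empty, which is worth handling separately in real use.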