Publications by Corey Chivers

Open Data Exchange 2013, April 6. Montreal

29.03.2013

UPDATE: The day was great! Many people are doing really amazing things with open data, and it was a pleasure to meet them. Here are my slides from the panel talk. Next Saturday, I’ll be sitting on a panel discussing future avenues for open data at ODX13. From the odx13 site: Odx13 is a mini-conference to discuss the successes and challenge...

2880 sym 4 img

A quick guide to non-transitive Grime Dice

07.04.2013

A very special package that I am rather excited about arrived in the mail recently. The package contained a set of 6-sided dice. These dice, however, don’t have the standard numbers one to six on their faces. Instead, they have assorted numbers between zero and nine. Here’s the exact configuration: red<-c(4,4,4,4,4,9) blue<-c(2,2,2,7,7,7) oli...

3685 sym R (110 sym/1 pcs) 6 img
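
As a rough illustration of the kind of calculation the Grime Dice entry above points at, here is a minimal sketch that enumerates every pair of faces for the two dice whose configurations are fully shown in the excerpt (red and blue). It is written in Python rather than the post's R, the helper name `win_probability` is hypothetical, and the third die is left out because its definition is cut off in the excerpt.

```python
from fractions import Fraction
from itertools import product

# Face values copied from the excerpt; the remaining dice are truncated there.
red = [4, 4, 4, 4, 4, 9]
blue = [2, 2, 2, 7, 7, 7]


def win_probability(a, b):
    """Exact probability that one roll of die `a` is strictly higher than one roll of die `b`."""
    # Each of the len(a) * len(b) face pairs is equally likely on fair dice.
    wins = sum(1 for x, y in product(a, b) if x > y)
    return Fraction(wins, len(a) * len(b))


p_red_beats_blue = win_probability(red, blue)
p_blue_beats_red = win_probability(blue, red)
print(p_red_beats_blue, p_blue_beats_red)
```

Exact fractions are used instead of simulation so that the result does not depend on a random seed; a Monte Carlo version would work just as well for dice with more faces.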

Mathematical abstraction and the robustness to assumptions

12.04.2013

I’ve been showing my new favourite toys to just about anyone foolish enough to actually engage me in conversation. I described how my shiny new set of non-transitive dice work here, complete with a map showing all the relevant probabilities. All was neat and tidy and wonderful until fellow ecologist, Aaron Ball, tried to burst my bubble. Too sm...

4380 sym 4 img

CAISN

07.05.2013

Reblogged from Zero to R Hero: Canadian Aquatic Invasive Species Network’s Annual General Meeting in Kananaskis, Alberta. May 03, 3:25-5:30. This 2-hour workshop will focus on how and why we do numerical simulation in R. Time permitting, we will also look at how to build and fit likelihood-based statistical models. We ask that you bring your la...

970 sym 4 img

What is probabilistic truth?

18.05.2013

I am currently working on a validation metric for binary prediction models. That is, models which make predictions about outcomes that can take on either of two possible states (e.g., dead/not dead, heads/tails, cat in picture/no cat in picture, etc.). The most commonly used metric for this class of models is AUC, which assesses the relative error ra...

2892 sym 6 img

What is probabilistic truth? Part 2 – Everything is conditional

24.05.2013

Read Part 1 When making a statement of the form “1/2 is the correct probability that this coin will land tails”, there are a few things which are left unsaid, but which are typically implied. The statement is one about the probability of an unknown event occurring, and it would seem reasonable to write this statement using probability notatio...

2650 sym 6 img

How likely is the NSA PRISM program to catch a terrorist?

06.06.2013

Recent revelations about PRISM, the NSA’s massive program of surveillance of civilian communications, have caused quite a stir. And rightfully so, as it appears that the agency has been granted warrantless direct access to just about any form of digital communication engaged in by American citizens, and that their access to such data has been gr...

3309 sym 6 img

From Whale Calls to Dark Matter: Competitive Data Science with R and Python

12.07.2013

Back in June I gave a fun talk at Montreal Python on some of my dabbling in the competitive data science scene. The good people at Savoir-faire Linux recorded the talk and have edited it all together into a pretty slick video. If you can spare twenty minutes or so, have a look. If you want the slides, head on over to my speakerdeck page. Related...

752 sym 6 img

Time-series forecasting: Bike Accidents

20.08.2013

About a year ago I posted this video visualization of all the reported accidents involving bicycles in Montreal between 2006 and 2010. In the process I also calculated and plotted the accident rate using a monthly moving average. The results followed a pattern that was for the most part to be expected. The rate shoots up in the spring, and decli...

2022 sym 10 img
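
The bike accidents entry above mentions smoothing the accident rate with a moving average. A minimal sketch of a trailing moving average is shown here in Python; the function name `moving_average`, the window length, and the counts are all made up for illustration and are not taken from the post or its data.

```python
def moving_average(values, window):
    """Trailing moving average: one output per full window of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])  # sum of the first full window
    averages = [running / window]
    for i in range(window, len(values)):
        # Slide the window forward: add the newest value, drop the oldest.
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages


# Hypothetical counts, purely illustrative (not the Montreal accident data).
counts = [2, 4, 6, 8, 10]
smoothed = moving_average(counts, window=3)
print(smoothed)
```

The output is shorter than the input by `window - 1` elements, because no average is emitted until a full window is available.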

Calculating AUC the hard way

10.10.2013

The Area Under the Receiver Operating Characteristic (ROC) Curve is a commonly used metric of model performance in machine learning and many other binary classification/prediction problems. The idea is to generate a threshold-independent measure of how well a model is able to distinguish between two possible outcomes. Threshold-independent here just means that for any...

2695 sym R (642 sym/1 pcs) 6 img
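
One way to make the AUC entry above concrete: AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes that directly by comparing every positive/negative pair. It is written in Python rather than the post's R, it is not necessarily the method the post itself uses, and the function name `auc_pairwise`, the labels, and the scores are hypothetical.

```python
def auc_pairwise(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    total = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos) * len(neg))


# Made-up labels and model scores, for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = auc_pairwise(labels, scores)
print(round(auc, 3))
```

The double loop costs time proportional to the number of positive/negative pairs, which is fine for small examples; a rank-based formulation would scale better for large datasets.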