Publications by ALT

Mickey Mouse Models

18.04.2011

My statistics professor once drew a little Markov chain on the board and called it “just a Mickey Mouse model,” because it was too simple to represent anything serious. To leave a comment for the author, please follow the link and comment on their blog: mickeymousemodels. R-bloggers.com offers daily e-mail updates abou...

580 sym 2 img

Flu Trends

18.04.2011

Not a model, but certainly Mickey Mousey: here’s some R code that plots Google’s US flu data:

df <- read.csv(url("http://www.google.org/flutrends/us/data.txt"), skip=11)
df$Date <- as.Date(df$Date)
dev.new(height=8, width=12)
# Leave a thin outer margin
par(oma=c(0.5, 0.5, 0.5, 0.5))
# Plot data; suppress x-axis
plot(df$Date,...

1541 sym 4 img

Logistic Regression & Factors in R

24.04.2011

Factors are R’s enumerated type. Suppose you define the variable cities (a vector of strings) whose possible values are “New York,” “Paris,” “London” and “Beijing.” Instead of representing each city as a string of characters, you might prefer to define an encoding, e.g. {1=”New York”, 2=”Paris”, 3=”Lo...

3702 sym 6 img
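
The integer encoding described in the factors excerpt above can be tried out directly. This is only a minimal sketch: the contents of the cities vector are made up for illustration, and the level order simply follows the order in which the excerpt lists the cities, which is an assumption beyond the point where the excerpt is cut off.

```r
# Illustrative data only; these particular values are an assumption.
cities <- c("Paris", "New York", "Beijing", "London", "Paris")

# Fix the level order explicitly so that the integer codes are predictable.
# (Assumed order: the order in which the excerpt lists the cities.)
city_levels <- c("New York", "Paris", "London", "Beijing")
cities_factor <- factor(cities, levels = city_levels)

# A factor stores integer codes plus a lookup table of labels.
codes <- as.integer(cities_factor)
labels <- levels(cities_factor)

print(codes)          # the integer code for each element
print(labels[codes])  # decoding the codes recovers the original strings
```

Without the explicit levels argument, factor() sorts the distinct values, so the codes would follow alphabetical order instead.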

Of Height and Speed in Tennis, or Fuzziness and Techiness in College

24.04.2011

I thought of this after reading this post and perhaps also this one, on the Cheap Talk blog. Here’s the puzzle: in general, being tall does not make you slow; but among professional tennis players, the tall athletes do tend to be relatively sluggish. Why does this happen? Cheap Talk gives a perfectly good written explanation,...

4152 sym 6 img
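
One way to see how the tennis puzzle above can arise is a small selection-effect simulation. The sketch below is not taken from the post: it assumes that height and speed are independent in the general population and that only people whose combined score (a hypothetical height + speed rule) lands in the top 1% turn professional.

```r
set.seed(1)
n <- 100000

# Assumption: standardized height and speed, independent in the population.
height <- rnorm(n)
speed <- rnorm(n)

# Hypothetical selection rule: only the top 1% by combined score turn pro.
skill <- height + speed
is_pro <- skill > quantile(skill, 0.99)

cor_all <- cor(height, speed)                  # whole population
cor_pro <- cor(height[is_pro], speed[is_pro])  # selected group only

print(cor_all)
print(cor_pro)
```

Under these assumptions the population correlation is close to zero while the correlation among the selected group is clearly negative: a shorter player needs extra speed to clear the same bar, and a taller one can get by with less.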

A Tiny Model of Evolution

25.04.2011

I’ve always wanted to write a(n overly) simple model of evolution. The assumptions are minimalistic: only one species, for which each individual’s genotype is represented as a one-dimensional real number, e.g. 7.4. Now, the fun stuff: I define a function mapping genotype to probability of reproduction, like this: You might wond...

6597 sym 4 img

Schelling’s Neighborhood Model

30.04.2011

The New York Times has created a beautiful visualization of the Census Bureau’s 2005-2009 American Community Survey data. The distribution of racial and ethnic groups in New York City is particularly fascinating: Chinatown appears in red toward the south-eastern end of Manhattan; Harlem, above Central Park, is solidly blue; nearby, ...

2792 sym R (3909 sym/1 pcs) 8 img

A Little R Counter

11.06.2011

I recently read a great post about environments in R, which featured this little bit of code:

> createCounter <- function(value) { function(i) { value <<- value+i} }
> counter <- createCounter(0)
> counter(1)
> a <- counter(0)
> a
[1] 1
> counter(1)
> counter(1)
> a <- counter(1)
> a
[1] 4
> a <- counter(5)
> a
[1] 9

I found this partic...

713 sym R (495 sym/3 pcs) 2 img

On Crows

12.06.2011

Today I made the mistake of clicking on the “Next Blog” button, which took me to a rather inane post complaining that crows are (obviously) stupid (because they are sometimes hit by cars). I was reminded that crows are actually quite smart.

653 sym 2 img

Dependence and Correlation

13.06.2011

In everyday life I hear the word “correlation” thrown around far more often than “dependence.” What’s the difference? Correlation, in its most common form, is a measure of linear dependence; the catch is that not all dependencies are linear. The set of correlated random variables lies entirely within the larger set of ...

1519 sym R (1392 sym/6 pcs) 14 img
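
The gap between dependence and correlation described above is easy to demonstrate. In this minimal sketch, which is an illustration rather than the post's own code, y is completely determined by x and yet their linear correlation is close to zero, because the relationship is symmetric around zero.

```r
set.seed(42)

# x is symmetric around zero; y is a deterministic function of x.
x <- runif(100000, min = -1, max = 1)
y <- x^2

# Pearson correlation only measures linear dependence.
r_xy <- cor(x, y)

# The dependence shows up once the symmetry is removed.
r_abs <- cor(abs(x), y)

print(r_xy)
print(r_abs)
```

Here r_xy comes out near zero even though knowing x tells you y exactly, while r_abs is close to one.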

A Little Sampling Puzzle

18.06.2011

Suppose you have 10 objects from which you take a sample of size 20 (with replacement, or you’re in trouble). What’s the probability that each object was chosen at least once? Getting an answer via simulation is pleasantly easy:

f <- function(n=10, k=20) {
  x <- 1:n
  x.sample <- sample(x, size=k, replace=TRUE)
  return(length(u...

812 sym R (313 sym/2 pcs) 2 img
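
For the sampling puzzle above, the simulated answer can be cross-checked against an exact value from inclusion-exclusion: sum over j from 0 to n of (-1)^j * choose(n, j) * ((n - j) / n)^k. The sketch below is not from the post; the function names p_all_chosen and covers_all are hypothetical, and the simulation is a stand-in rather than a completion of the truncated code in the excerpt.

```r
# Exact probability that all n objects appear in k draws with replacement,
# via inclusion-exclusion over the number j of objects that are excluded.
p_all_chosen <- function(n = 10, k = 20) {
  j <- 0:n
  sum((-1)^j * choose(n, j) * ((n - j) / n)^k)
}

# Hypothetical simulation helper: TRUE if one sample of size k covers all n objects.
covers_all <- function(n = 10, k = 20) {
  draws <- sample.int(n, size = k, replace = TRUE)
  length(unique(draws)) == n
}

set.seed(123)
p_exact <- p_all_chosen()
p_sim <- mean(replicate(20000, covers_all()))

print(p_exact)
print(p_sim)
```

The two numbers should agree to within simulation noise, which gives a quick sanity check on either approach.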