Publications by Christopher Bare

Using R for Introductory Statistics, Chapter 5, hypergeometric distribution

21.02.2011

This is a little digression from Chapter 5 of Using R for Introductory Statistics that led me to the hypergeometric distribution. Question 5.13: A sample of 100 people is drawn from a population of 600,000. If it is known that 40% of the population has a specific attribute, what is the probability that 35 or fewer in the sample have that attribut...
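Question 5.13 can be worked directly in R. This is a minimal sketch that assumes the 40% figure splits the population into exactly 240,000 people with the attribute and 360,000 without. phyper() gives the exact hypergeometric answer, and pbinom() gives the binomial approximation, which treats the 100 draws as if they were made with replacement. The variable names are illustrative.

```r
# Question 5.13 setup: population of 600,000, 40% have the attribute
N <- 600000
m <- 0.4 * N          # members with the attribute (240,000)
n <- N - m            # members without it (360,000)
k <- 100              # sample size, drawn without replacement

# Exact answer: P(X <= 35) for the hypergeometric, phyper(q, m, n, k)
p_hyper <- phyper(35, m, n, k)

# Binomial approximation: treat the draws as independent, with p = 0.4
p_binom <- pbinom(35, size = k, prob = 0.4)

c(hypergeometric = p_hyper, binomial = p_binom)
```

Because the sample is only 100 out of 600,000, the finite-population correction (N - k)/(N - 1) is almost exactly 1, which is why the binomial approximation is so close here.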

Using R for Introductory Statistics, The Geometric distribution

13.03.2011

We’ve already seen two discrete probability distributions, the binomial and the hypergeometric. The binomial distribution describes the number of successes in a series of independent trials with replacement. The hypergeometric distribution describes the number of successes in a series of draws without replacement, where each draw depends on the ones before it. Chapter 6 of Usin...
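To make the with/without replacement distinction concrete, here is a small sketch using a made-up urn of 4 red and 6 white balls (the numbers are arbitrary, chosen only for illustration). dbinom() models drawing 5 balls with replacement, dhyper() models drawing 5 without replacement, and dgeom() gives the geometric distribution. Note that R's dgeom() counts the number of failures before the first success, not the total number of trials.

```r
# A small made-up urn: 4 red (success) and 6 white balls; draw 5 of them
red <- 4; white <- 6; draws <- 5
x <- 0:draws

# With replacement: each draw is an independent trial with p = 0.4
binom_probs <- dbinom(x, size = draws, prob = red / (red + white))

# Without replacement: hypergeometric, dhyper(x, m, n, k)
hyper_probs <- dhyper(x, m = red, n = white, k = draws)

round(rbind(binomial = binom_probs, hypergeometric = hyper_probs), 4)

# Geometric: dgeom(x, prob) is P(x failures before the first success)
dgeom(0:4, prob = 0.4)
```

Both distributions have the same mean (5 * 0.4 = 2), but the hypergeometric probabilities are more concentrated around it, and drawing 5 red balls is impossible without replacement because there are only 4.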

Using R for Introductory Statistics 6, Simulations

21.03.2011

R can easily generate random samples from a whole library of probability distributions. We might want to do this to gain insight into the distribution’s shape and properties. A tricky aspect of statistics is that results like the central limit theorem come with caveats, such as “…for sufficiently large n…”. Getting a feel for how large ...
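One way to get a feel for "sufficiently large n" is to simulate. The sketch below is an illustration, not the post's code: it draws repeated samples from a skewed exponential distribution (rate 1, so the mean and standard deviation are both 1), standardizes the sample means, and compares n = 2 against n = 50 with normal quantile plots. The sample sizes, replication count, and seed are arbitrary choices.

```r
set.seed(1)  # arbitrary seed, for reproducibility

# Draw 'reps' samples of size n from an exponential(rate = 1) distribution
# and return the standardized sample means.
sim_means <- function(n, reps = 2000) {
  means <- replicate(reps, mean(rexp(n, rate = 1)))
  # Exponential(1) has mean 1 and sd 1, so the mean of n draws has sd 1/sqrt(n)
  (means - 1) / (1 / sqrt(n))
}

z_small <- sim_means(2)
z_large <- sim_means(50)

# Compare each set of standardized means to the standard normal
op <- par(mfrow = c(1, 2))
qqnorm(z_small, main = "n = 2");  qqline(z_small)
qqnorm(z_large, main = "n = 50"); qqline(z_large)
par(op)
```

With n = 2 the points bend away from the line because the skew of the exponential is still visible; with n = 50 they should track the line much more closely.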

Environments in R

04.06.2011

One interesting thing about R is that you can get down into the insides fairly easily. You’re allowed to see more of how things are put together than in most languages. One of the ways R does this is by having first-class environments. At first glance, environments are simple enough. An environment is just a place to store variables – a set o...
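A few base R calls show environments in action. This is a minimal sketch, not code from the post, and the names e, bump, and make_counter are illustrative. It covers storing variables in an environment, the fact that environments are passed by reference rather than copied, and how a closure keeps its defining environment alive.

```r
# An environment is a set of name -> value bindings
e <- new.env()
assign("x", 42, envir = e)   # same effect as e$x <- 42
e$y <- "hello"

get("x", envir = e)                        # 42
ls(e)                                      # "x" "y"
exists("z", envir = e, inherits = FALSE)   # FALSE

# Environments are not copied when passed to a function,
# so bump() modifies the caller's environment in place
bump <- function(env) {
  env$x <- env$x + 1
  invisible(NULL)
}
bump(e)
e$x                          # 43

# A closure keeps the environment it was created in
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1      # <<- assigns in the enclosing environment
    count
  }
}
counter <- make_counter()
counter(); counter()
environment(counter)$count   # 2
```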

Drawing heatmaps in R

24.06.2011

A while back, while reading chapter 4 of Using R for Introductory Statistics, I fooled around with the mtcars dataset, which gives mechanical and performance properties of cars from the early 1970s. Let’s plot this data as a hierarchically clustered heatmap.

# scale data to mean=0, sd=1 and convert to matrix
mtscaled <- as.matrix(scale(mtcars))
# c...
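The excerpt's code stops after the scaling step. One plausible way to continue, sketched here with base R's hclust() and heatmap() (the post itself may use different functions or settings), is to cluster the rows and columns and pass the resulting dendrograms to heatmap(). The color palette and margins are arbitrary choices.

```r
# Scale each column to mean 0, sd 1 and convert to a matrix
mtscaled <- as.matrix(scale(mtcars))

# Hierarchically cluster the cars (rows) and the variables (columns)
row_clust <- hclust(dist(mtscaled))
col_clust <- hclust(dist(t(mtscaled)))

# Draw the heatmap with both dendrograms; scale = "none" because
# the columns are already standardized
heatmap(mtscaled,
        Rowv = as.dendrogram(row_clust),
        Colv = as.dendrogram(col_clust),
        scale = "none",
        col = colorRampPalette(c("blue", "white", "red"))(64),
        margins = c(5, 10))
```

heatmap() rescales rows by default (scale = "row"), which would undo the column standardization, so the sketch switches that off.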

Notes on Engineering Data Analysis (with R and ggplot2)

08.07.2011

Hadley Wickham gave a Google Tech Talk a couple weeks back titled Engineering Data Analysis (with R and ggplot2). These are my notes. The data analysis cycle is to iteratively transform, visualize and model. Leading into the cycle is data access and the output of the process is knowledge, insight and understanding which can be communicated to oth...

MySQL and R

15.08.2011

Using MySQL with R is pretty easy with RMySQL. Here are some notes to keep me straight on a few things I always get snagged on. Most folks are going to want to analyze data that’s already in a MySQL database. Being a little bass-ackwards, I often want to go the other way. One reason to do this is to do some analysis in R and make t...
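Going the other way, from an R data frame into a database table, mostly comes down to DBI's dbWriteTable(), which RMySQL implements. To keep this sketch runnable without a MySQL server, it swaps in an in-memory SQLite database through the RSQLite package; that substitution is mine, not the post's. The table name cars and the commented-out RMySQL connection line with placeholder credentials are hypothetical.

```r
library(DBI)

# Stand-in for a MySQL server: an in-memory SQLite database.
# With RMySQL the connection would look something like (placeholders):
#   con <- dbConnect(RMySQL::MySQL(), dbname = "...", user = "...", password = "...")
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Keep the car names as a regular column rather than row names
cars_df <- data.frame(model = rownames(mtcars), mtcars, row.names = NULL)

# Push the R data frame into the database as a new table
dbWriteTable(con, "cars", cars_df)
dbListTables(con)

# Query it back with SQL to check that the table landed
res <- dbGetQuery(con,
  "SELECT cyl, COUNT(*) AS n, AVG(mpg) AS avg_mpg
   FROM cars GROUP BY cyl ORDER BY cyl")
res

dbDisconnect(con)
```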

String functions in R

25.08.2011

Here’s a quick cheat-sheet on string manipulation functions in R, mostly cribbed from Quick-R’s list of String Functions with a few additional links.

substr(x, start=n1, stop=n2)
grep(pattern, x, value=FALSE, ignore.case=FALSE, fixed=FALSE)
gsub(pattern, replacement, x, ignore.case=FALSE, fixed=FALSE)
gregexpr(pattern, text, ignore.case=FA...
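Here are a few of those functions applied to small made-up inputs, plus regmatches() and strsplit() from base R, which are not in the visible part of the list. The expected results are in the comments; the fixed=TRUE pair shows why that flag matters when a pattern contains regex metacharacters.

```r
x <- c("apple pie", "banana split", "cherry tart")

substr("banana split", start = 1, stop = 6)    # "banana"
grep("an", x)                                  # 2 (index of the match)
grep("an", x, value = TRUE)                    # "banana split"
gsub("a", "A", "banana")                       # "bAnAnA"

# "." is a regex metacharacter unless fixed = TRUE
gsub(".", "-", "a.b.c")                        # "-----"
gsub(".", "-", "a.b.c", fixed = TRUE)          # "a-b-c"

# gregexpr() finds every match; regmatches() extracts the matched text
gregexpr("an", "banana")[[1]]                  # matches start at 2 and 4
regmatches("banana", gregexpr("an", "banana"))[[1]]   # "an" "an"

strsplit("a,b,c", ",")[[1]]                    # "a" "b" "c"
```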

Hipster programming languages

26.09.2011

If you look at the programming languages that are popular these days, a few patterns emerge. I’m not talking about languages that have the most hits on the job sites. I’m talking about what the cool kids are coding in – the folks that hang out on Hacker News or at Strange Loop. Languages like Clojure, Scala and CoffeeScript. What do these d...

International Open Data Hackathon

05.12.2011

This past Saturday, I hung out at the Seattle branch of the International Open Data Hackathon. The event was hosted at the Pioneer Square office of Socrata, a small company that helps governments provide public open data. A pair of data analysts from Tableau were showing off a visualization for the Washington Post’s FactChecker blog called Comp...