Publications by Mollie

Word Clouds in R

13.09.2012

Thanks to the wordcloud package, it’s super easy to make a word cloud or tag cloud in R. In this case, the words have been counted already. If you are starting with plain text, you can use the text mining package tm to obtain the counts. Other bloggers have provided good examples of this. I’ll just be covering the simple case wher...
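The excerpt starts from words that are already counted. For the plain-text case, here is a minimal sketch that does the counting in base R instead of tm (a swapped-in technique, not the post's method) on a made-up text vector. It then passes the words and counts to wordcloud(), but only if that package is installed.

```r
# Hypothetical sample text standing in for a real document
txt <- c("R makes word clouds easy",
         "word clouds show word frequency",
         "R packages make R easy")

# Base R counting (an alternative to tm): lower-case, split on runs of
# non-letters, drop empty strings, then tabulate
words  <- unlist(strsplit(tolower(txt), "[^a-z]+"))
words  <- words[nzchar(words)]
counts <- sort(table(words), decreasing = TRUE)

# One row per word: a word column and a count column
freq_df <- data.frame(word = names(counts), freq = as.integer(counts),
                      stringsAsFactors = FALSE)
print(freq_df)

# Draw the cloud only when the wordcloud package is installed
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = freq_df$word, freq = freq_df$freq,
                       min.freq = 1, random.order = FALSE)
}
```

With random.order = FALSE the most frequent words are placed in the center of the cloud.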

Descriptive Statistics of Groups in R

20.09.2012

The sleep data set, provided by the datasets package, shows the effects of two different drugs on ten patients. Extra is the increase in hours of sleep; group is the drug given, 1 or 2; and ID is the patient ID, 1 to 10. I’ll be using this data set to show how to compute descriptive statistics of groups within a data set, when the ...
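A minimal sketch of group-wise descriptive statistics on sleep using only base R: tapply() for one statistic per group, aggregate() for several at once, and by() for a full summary(). The choice of functions is an assumption; the post's own approach is cut off in the excerpt.

```r
data(sleep)  # columns: extra, group (factor 1/2), ID

# Mean increase in sleep for each drug group
group_means <- tapply(sleep$extra, sleep$group, mean)
print(group_means)

# Several statistics per group at once; the result's extra column is a matrix
group_stats <- aggregate(extra ~ group, data = sleep,
                         FUN = function(x) c(mean = mean(x), sd = sd(x), n = length(x)))
print(group_stats)

# Five-number summary plus mean for each group
by(sleep$extra, sleep$group, summary)
```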

Histogram + Density Plot Combo in R

27.09.2012

Plotting a histogram using hist from the graphics package is pretty straightforward, but what if you want to view the density plot on top of the histogram? This combination of graphics can help us compare the distributions of groups. Let’s use some of the data included with R in the datasets package. It will help to have two things...
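Here is a minimal sketch of the combination, assuming the beaver1 and beaver2 body temperature data from datasets. The histogram is drawn on the density scale so the kernel density curves share its y-axis, and the axis limits are computed up front so neither curve gets clipped. The colors are arbitrary.

```r
data(beaver1)
data(beaver2)

# Compute the pieces first so the axis limits can fit all of them
h  <- hist(beaver1$temp, plot = FALSE)
d1 <- density(beaver1$temp)
d2 <- density(beaver2$temp)

# freq = FALSE draws bar areas that sum to 1, matching the density curves
plot(h, freq = FALSE, col = "peachpuff", border = "black",
     xlim = range(d1$x, d2$x), ylim = c(0, max(h$density, d1$y, d2$y)),
     main = "Beaver body temperature", xlab = "Temperature (degrees C)")
lines(d1, lwd = 2, col = "chocolate3")
lines(d2, lwd = 2, lty = 2, col = "royalblue")
legend("topright", legend = c("beaver1 density", "beaver2 density"),
       col = c("chocolate3", "royalblue"), lty = c(1, 2), lwd = 2)
```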

Adding Measures of Central Tendency to Histograms in R

04.10.2012

Building on the basic histogram with a density plot, we can add measures of central tendency (in this case, mean and median) and a legend. Like last time, we’ll use the beaver data from the datasets package.

hist(beaver1$temp, # histogram
     col = "peachpuff", # column color
     border = "black",
     prob = TRUE, # show densities instead o...
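A minimal sketch of the finished plot, assuming mean and median as the two measures and arbitrary line colors: abline(v = ...) draws a vertical line at each value, and legend() labels the density curve and both lines.

```r
data(beaver1)
temp <- beaver1$temp

# freq = FALSE is the non-abbreviated form of prob = TRUE
hist(temp, col = "peachpuff", border = "black", freq = FALSE,
     xlab = "Temperature (degrees C)", main = "beaver1 body temperature")
lines(density(temp), lwd = 2, col = "chocolate3")

# Vertical reference lines for the two measures of central tendency
temp_mean   <- mean(temp)
temp_median <- median(temp)
abline(v = temp_mean,   col = "royalblue", lwd = 2)
abline(v = temp_median, col = "red",       lwd = 2, lty = 2)

legend("topright", legend = c("density", "mean", "median"),
       col = c("chocolate3", "royalblue", "red"), lty = c(1, 1, 2), lwd = 2)
```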

Random Name Generator in R

11.10.2012

Just for the heck of it, let’s recreate my Reality TV Show Name Generator in R. This isn’t really the sort of thing you’d normally do in R, but we can try out a bunch of different functions this way: random integers/sampling, concatenation, sorting, and determining the length of an object. First, let’s create a dictionary for R...
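The original dictionary isn't reproduced here, so this minimal sketch uses hypothetical word lists. It touches each function family the post mentions: sample.int() and sample() for random draws, paste() for concatenation, sort(), and length().

```r
set.seed(42)  # make the random draws reproducible

# Hypothetical word lists, not the post's actual dictionary
adjectives <- c("Extreme", "Real", "Ultimate", "Celebrity", "Hidden")
nouns      <- c("Makeover", "Kitchen", "Island", "Dance Off", "Survivor")

# Sorting and measuring the dictionary
adjectives <- sort(adjectives)
n_adj <- length(adjectives)

# sample.int() draws a random integer index; sample() picks an element directly
make_name <- function() {
  adj  <- adjectives[sample.int(n_adj, 1)]
  noun <- sample(nouns, 1)
  paste(adj, noun)  # concatenate with a single space
}

show_names <- replicate(3, make_name())
print(show_names)
```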

Palettes in R

25.10.2012

In its simplest form, a palette in R is just a vector of colors. This vector can include hex triplets or R color names. The default palette can be seen through palette():

> palette("default") # you'll only need this line if you've previously changed the palette from the default
> palette()
[1] "black"   "red"     "green3"...
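A minimal sketch of working with palettes in base R, using three hex colors plus an arbitrary gray. One caveat: the "black", "red", "green3" default shown above is the R 3.x palette. Since R 4.0.0 the default is different, and palette("R3") brings back the old one.

```r
old <- palette()  # save the current palette so it can be restored

# Hex triplets and R color names can be mixed in one palette
palette(c("#1B9E77", "#D95F02", "#7570B3", "gray40"))
print(palette())

# Integer colors index into the palette: col = 2 now means "#D95F02"
barplot(1:4, col = 1:4)

# colorRampPalette() returns a function that interpolates n colors
blues <- colorRampPalette(c("white", "navy"))
ramp  <- blues(5)
print(ramp)

# R >= 4.0.0: "R3" is the pre-4.0 default palette
palette("R3")
r3 <- palette()
palette(old)  # put the original palette back
```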

Sorting Within Lattice Graphics in R

29.11.2012

Default

By default, lattice sorts the observations by the axis values, starting at the bottom left. For example,

library(lattice)
colors = c("#1B9E77", "#D95F02", "#7570B3")
dotplot(rownames(mtcars) ~ mpg, data = mtcars, col = colors[1], pch = 1)

produces:

[Figure: Default lattice dotplot]

(Note: The rownames(mtcars) bit is just because of how this data set...
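One common way to override the default order is to turn the labels into a factor and set its levels with reorder(). This is a minimal sketch of that technique, not necessarily how the post goes on to do it.

```r
library(lattice)
colors <- c("#1B9E77", "#D95F02", "#7570B3")

# reorder() sets the factor levels by mpg; lattice draws the first level at
# the bottom, so the dots run from lowest mpg (bottom) to highest (top)
cars_df <- data.frame(car = factor(rownames(mtcars)), mpg = mtcars$mpg)
cars_df$car <- reorder(cars_df$car, cars_df$mpg)

p <- dotplot(car ~ mpg, data = cars_df, col = colors[1], pch = 1,
             xlab = "Miles per gallon")
print(p)  # needed inside functions, loops, or source()d scripts
```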

Mapping GPS Tracks in R

13.12.2012

This is an explanation of how I used R to combine all my GPS cycling tracks from my Garmin Forerunner 305.

Converting to CSV

You can convert pretty much any GPS data to .csv by using GPSBabel. For importing directly from my Garmin, I used the command:

gpsbabel -t -i garmin -f usb: -o unicsv -F out.csv

[Note: you’ll probably need to wor...
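Once the tracks are in CSV form, a minimal sketch of combining and plotting them in base R might look like this. It assumes each file has Latitude and Longitude columns (check the header your GPSBabel version actually writes). It also writes two tiny made-up tracks to a temporary folder so it runs without a GPS device.

```r
# ASSUMPTION: column names Latitude/Longitude; coordinates below are made up
track_dir <- file.path(tempdir(), "tracks")
dir.create(track_dir, showWarnings = FALSE)
write.csv(data.frame(Latitude = c(45.50, 45.51, 45.52),
                     Longitude = c(-122.68, -122.67, -122.66)),
          file.path(track_dir, "ride1.csv"), row.names = FALSE)
write.csv(data.frame(Latitude = c(45.53, 45.52, 45.50),
                     Longitude = c(-122.65, -122.66, -122.69)),
          file.path(track_dir, "ride2.csv"), row.names = FALSE)

# Read every CSV in the folder and tag each row with its source file
files <- list.files(track_dir, pattern = "\\.csv$", full.names = TRUE)
tracks <- do.call(rbind, lapply(files, function(f) {
  d <- read.csv(f)
  d$track <- basename(f)
  d
}))

# Empty plot on common axes; asp corrects for a degree of longitude being
# shorter than a degree of latitude by a factor of cos(latitude)
plot(Latitude ~ Longitude, data = tracks, type = "n",
     asp = 1 / cos(mean(tracks$Latitude) * pi / 180))

# One semi-transparent line per track, so overlapping rides look darker
for (trk in split(tracks, tracks$track)) {
  lines(trk$Longitude, trk$Latitude,
        col = adjustcolor("red", alpha.f = 0.5), lwd = 2)
}
```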

Stacked Bar Charts in R

10.01.2013

Reshape Wide to Long

Let’s use the Loblolly dataset from the datasets package. These data track the growth of some loblolly pine trees.

> Loblolly[1:10,]
   height age Seed
1    4.51   3  301
15  10.89   5  301
29  28.72  10  301
43  41.74  15  301
57  52.70  20  301
71  60.92  25  301
2    4.55   3  303
16  10.92   5...
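Since Loblolly has one row per tree per age, barplot() needs it spread into a matrix first. Here is a minimal sketch that uses xtabs() for that step (an assumption, not necessarily the reshaping the post uses). It stacks the 14 trees' heights within each age purely to illustrate the chart type.

```r
# Keep a plain data frame (Loblolly also carries groupedData classes)
lob <- as.data.frame(Loblolly)

# One row per Seed, one column per age; each cell holds that tree's height
heights <- xtabs(height ~ Seed + age, data = lob)

# barplot() stacks the rows of the matrix within each column (age)
barplot(heights, col = hcl.colors(nrow(heights)), border = NA,
        xlab = "Age (years)", ylab = "Stacked height (ft)")
```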

Calculating Gini Coefficients for a Number of Locales at Once in R

17.01.2013

The Gini coefficient is a measure of the inequality of a distribution, most commonly used to compare inequality in income or wealth among countries. Let’s first generate some random data to analyze. You can download my random data or use the code below to generate your own. Of course, if you generate your own, your graphs and results...
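Here is a minimal sketch of computing one Gini coefficient per locale. It assumes a hand-written gini() function rather than any particular package, plus made-up lognormal incomes for three hypothetical locales A, B, and C. The function uses the sorted-values form G = sum((2i - n - 1) x_(i)) / (n sum(x)), where x_(i) is the i-th smallest value, and tapply() applies it to every locale in one call.

```r
# Sample Gini coefficient of a non-negative vector (hypothetical helper)
gini <- function(x) {
  x <- sort(x)
  n <- length(x)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

set.seed(1)
# Hypothetical incomes: a larger sdlog means a more unequal locale
incomes <- data.frame(
  locale = rep(c("A", "B", "C"), each = 200),
  income = c(rlnorm(200, meanlog = 10, sdlog = 0.3),
             rlnorm(200, meanlog = 10, sdlog = 0.8),
             rlnorm(200, meanlog = 10, sdlog = 1.2))
)

# One Gini coefficient per locale
gini_by_locale <- tapply(incomes$income, incomes$locale, gini)
print(round(gini_by_locale, 3))
```

With this sample formula, perfect equality gives 0, and one person holding everything gives (n - 1)/n, which approaches 1 as n grows.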
