Publications by Mollie
Word Clouds in R
Thanks to the wordcloud package, it’s super easy to make a word cloud or tag cloud in R. In this case, the words have been counted already. If you are starting with plain text, you can use the text mining package tm to obtain the counts. Other bloggers have provided good examples of this. I’ll just be covering the simple case wher...
1526 sym R (675 sym/3 pcs) 8 img 4 tbl
Descriptive Statistics of Groups in R
The sleep data set, provided by the datasets package, shows the effects of two different drugs on ten patients. Extra is the increase in hours of sleep; group is the drug given, 1 or 2; and ID is the patient ID, 1 to 10. I’ll be using this data set to show how to perform descriptive statistics of groups within a data set, when the ...
1518 sym R (759 sym/2 pcs)
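As a rough illustration of the grouped descriptive statistics this entry describes, here is a minimal sketch using only base R. The choice of `aggregate()` and `tapply()` is an assumption; the excerpt is cut off before it names the functions the post actually uses.

```r
# Assumption: base R only; the original post may use a different approach.
# `sleep` ships with the datasets package (columns: extra, group, ID).

# Mean increase in sleep for each drug group
group_means <- aggregate(extra ~ group, data = sleep, FUN = mean)

# Standard deviation of the increase for each drug group
group_sds <- tapply(sleep$extra, sleep$group, sd)

print(group_means)
print(group_sds)
```

`aggregate()` returns a data frame with one row per group, which is handy for further merging; `tapply()` returns a named vector, which is handy for quick inspection.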
Histogram + Density Plot Combo in R
Plotting a histogram using hist from the graphics package is pretty straightforward, but what if you want to view the density plot on top of the histogram? This combination of graphics can help us compare the distributions of groups. Let’s use some of the data included with R in the package datasets. It will help to have two things...
1237 sym R (1094 sym/3 pcs) 6 img
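The histogram-plus-density idea from this entry can be sketched in a few lines of base R. The data set (`beaver1`) is borrowed from the next entry in this list, and the colors and labels are illustrative assumptions rather than the post's exact code.

```r
# Assumption: beaver1 from the datasets package; colors and titles are illustrative.
temps <- beaver1$temp

# prob = TRUE puts the histogram on the density scale so the curve is comparable
hist(temps,
     prob = TRUE,
     col = "peachpuff",
     border = "black",
     main = "Beaver 1 body temperature",
     xlab = "Temperature")

# Overlay the kernel density estimate on the existing histogram
dens <- density(temps)
lines(dens, lwd = 2, col = "chocolate3")
```

Without `prob = TRUE` the histogram shows counts, and the density curve would sit flat near zero.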
Adding Measures of Central Tendency to Histograms in R
Building on the basic histogram with a density plot, we can add measures of central tendency (in this case, mean and median) and a legend. Like last time, we’ll use the beaver data from the datasets package.
hist(beaver1$temp, # histogram
  col = "peachpuff", # column color
  border = "black",
  prob = TRUE, # show densities instead o...
1145 sym R (664 sym/5 pcs) 4 img
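Since the excerpt above is cut off mid-call, here is a self-contained sketch of the same idea: a density-scaled histogram with vertical lines for the mean and median, plus a legend. Line colors, widths, and the legend position are assumptions, not the post's exact settings.

```r
# Assumption: line colors, widths, and legend placement are illustrative choices.
temps <- beaver1$temp
temp_mean <- mean(temps)
temp_median <- median(temps)

hist(temps, prob = TRUE, col = "peachpuff", border = "black",
     main = "Beaver 1 body temperature", xlab = "Temperature")
lines(density(temps), lwd = 2, col = "chocolate3")

# Vertical reference lines for the two measures of central tendency
abline(v = temp_mean, col = "royalblue", lwd = 2)
abline(v = temp_median, col = "red", lwd = 2)

legend("topright",
       legend = c("Density", "Mean", "Median"),
       col = c("chocolate3", "royalblue", "red"),
       lwd = 2)
```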
Random Name Generator in R
Just for the heck of it, let’s recreate my Reality TV Show Name Generator in R. This isn’t really the sort of thing you’d normally do in R, but we can try out a bunch of different functions this way: random integers/sampling, concatenation, sorting, and determining the length of an object. First, let’s create a dictionary for R...
2338 sym R (3304 sym/8 pcs)
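The functions this entry mentions (sampling, concatenation, sorting, and length) can be tried out with a tiny sketch. The word lists below are hypothetical stand-ins; the post's actual dictionary is not shown in the excerpt.

```r
# Assumption: these word lists are invented for illustration only.
set.seed(42)  # fixed seed so repeated runs pick the same words

adjectives <- c("Real", "Extreme", "Celebrity", "Amazing")
nouns <- c("Housewives", "Chefs", "Survivors", "Bakers")

# sample() draws one random element; paste() concatenates with a space
show_name <- paste(sample(adjectives, 1), sample(nouns, 1))

sorted_nouns <- sort(nouns)   # alphabetical order
n_nouns <- length(nouns)      # number of entries in the vector

print(show_name)
```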
Palettes in R
In its simplest form, a palette in R is a vector of colors. This vector can include hex triplets or R color names. The default palette can be seen through palette():
> palette("default") # you'll only need this line if you've previously changed the palette from the default
> palette()
[1] "black" "red" "green3"...
2853 sym R (1150 sym/10 pcs) 16 img 1 tbl
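To make the "a palette is a vector of colors" point concrete, here is a small sketch that swaps in a custom palette and then restores the default. The three colors are arbitrary examples, and the exact default colors printed depend on your R version.

```r
# Assumption: the custom colors are arbitrary; default values vary by R version.
palette("default")
default_cols <- palette()      # current (default) palette as a character vector

# A palette can mix hex triplets and R color names
my_cols <- c("#1B9E77", "darkorange", "#7570B3")
palette(my_cols)

# Integer colors now index into the custom palette
barplot(rep(1, 3), col = 1:3, names.arg = my_cols)

palette("default")             # restore the default when finished
```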
Sorting Within Lattice Graphics in R
Default
By default, lattice sorts the observations by the axis values, starting at the bottom left. For example,
library(lattice)
colors = c("#1B9E77", "#D95F02", "#7570B3")
dotplot(rownames(mtcars) ~ mpg, data = mtcars, col = colors[1], pch = 1)
produces:
[Figure: Default lattice dotplot]
(Note: The rownames(mtcars) bit is just because of how this data set...
1329 sym R (678 sym/5 pcs) 10 img 6 tbl
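The excerpt stops before showing how to change the default ordering. One common way to do it, offered here as an assumption rather than the post's own method, is to wrap the labels in `reorder()` so the factor levels follow the plotted value.

```r
# Assumption: reorder() is one common approach; the post may sort differently.
library(lattice)

colors <- c("#1B9E77", "#D95F02", "#7570B3")

# reorder() sets the factor levels by mpg, so the cars appear in mpg order
p <- dotplot(reorder(rownames(mtcars), mpg) ~ mpg,
             data = mtcars,
             col = colors[1],
             pch = 1)

print(p)  # lattice objects must be printed explicitly inside scripts
```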
Mapping GPS Tracks in R
This is an explanation of how I used R to combine all my GPS cycling tracks from my Garmin Forerunner 305.
Converting to CSV
You can convert pretty much any GPS data to .csv by using GPSBabel. For importing directly from my Garmin, I used the command:
gpsbabel -t -i garmin -f usb: -o unicsv -F out.csv
[Note: you’ll probably need to wor...
2105 sym R (486 sym/5 pcs) 2 img 1 tbl
Stacked Bar Charts in R
Reshape Wide to Long
Let’s use the Loblolly dataset from the datasets package. These data track the growth of some loblolly pine trees.
> Loblolly[1:10,]
   height age Seed
1    4.51   3  301
15  10.89   5  301
29  28.72  10  301
43  41.74  15  301
57  52.70  20  301
71  60.92  25  301
2    4.55   3  303
16  10.92   5...
1438 sym R (1568 sym/5 pcs) 2 img 1 tbl
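For a sense of how a stacked bar chart can be built from this data, here is a base R sketch. Using `xtabs()` to build the matrix and `barplot()` to draw it is an assumption; the post may rely on a reshaping or plotting package instead.

```r
# Assumption: base R xtabs() + barplot(); the post may use other packages.
# Build a matrix with one row per age and one column per seed source
height_by_seed <- xtabs(height ~ age + Seed, data = Loblolly)

# barplot() stacks the rows of a matrix within each column by default
barplot(height_by_seed,
        xlab = "Seed source",
        ylab = "Height, stacked by age",
        legend.text = rownames(height_by_seed),
        args.legend = list(title = "Age", x = "topleft"))
```

Each bar is one seed source, and each segment is the height recorded at one age.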
Calculating Gini Coefficients for a Number of Locales at Once in R
The Gini coefficient is a measure of the inequality of a distribution, most commonly used to compare inequality in income or wealth among countries. Let’s first generate some random data to analyze. You can download my random data or use the code below to generate your own. Of course, if you generate your own, your graphs and results...
1691 sym R (716 sym/6 pcs) 2 img 1 tbl
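As a sketch of the idea in this entry, the code below defines a small Gini function and applies it to every locale in one call. The `gini()` helper, the locale names, and the log-normal incomes are all assumptions made for illustration; the post's own data and any package it uses are not shown in the excerpt.

```r
# Assumption: gini() is a hand-written helper; locales and incomes are simulated.
gini <- function(x) {
  x <- sort(x)
  n <- length(x)
  # Standard formula for sorted values: 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
  2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n
}

set.seed(1)
incomes <- data.frame(
  locale = rep(c("A", "B", "C"), each = 100),
  income = rlnorm(300, meanlog = 10, sdlog = 1)
)

# One Gini coefficient per locale in a single call
gini_by_locale <- tapply(incomes$income, incomes$locale, gini)
print(gini_by_locale)
```

A value of 0 means every income is identical; values closer to 1 mean income is concentrated in fewer hands.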