Publications by Ralph
Useful functions for data frames in R
This post will consider some useful functions for dealing with data frames during data processing and validation. Consider an artificial data set created using the expand.grid function where there are duplicate rows in the data frame. > des = expand.grid(A = c(2,2,3,4), B = c(1,3,5,5,7)) > des A B 1 2 1 2 2 1 3 3 1 4 4 1 5 2 3 6 2 3 7 3 3...
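The excerpt is cut off before it names the functions it goes on to use, so the following is only a minimal sketch of one way to find and drop the duplicate rows. It assumes the base R functions duplicated and unique, and the names dup_flags and des_unique are made up for illustration.

```r
# Rebuild the artificial data set from the excerpt: the repeated levels
# (2 in A, 5 in B) make expand.grid produce duplicate rows.
des <- expand.grid(A = c(2, 2, 3, 4), B = c(1, 3, 5, 5, 7))

# duplicated() flags every row that repeats an earlier row
dup_flags <- duplicated(des)

# unique() keeps only the first occurrence of each distinct row
des_unique <- unique(des)

nrow(des)         # total rows, duplicates included
sum(dup_flags)    # rows flagged as repeats of an earlier row
nrow(des_unique)  # distinct rows that remain
```

Because duplicated returns a logical vector, des[!dup_flags, ] gives the same rows as unique(des), which is handy when the flags are also needed for a validation report.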
Melt
There are many situations where data is presented in a format that is not ready for exploratory data analysis or for a desired statistical method. The reshape2 package for R provides useful functionality to avoid having to hack data around in a spreadsheet prior to import into R. The melt function takes data in wide format and ...
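As a rough illustration of the wide-to-long reshaping described above, the sketch below applies melt from reshape2 to a small made-up data frame. The data, the column names (subject, visit1, visit2) and the output names (visit, score) are all hypothetical and are not taken from the original post.

```r
library(reshape2)

# Hypothetical wide-format data: one row per subject, one column per visit
wide <- data.frame(
  subject = c("s1", "s2", "s3"),
  visit1  = c(5.1, 4.8, 6.0),
  visit2  = c(5.4, 4.9, 6.3)
)

# melt() stacks the measurement columns into name/value pairs,
# repeating the id column for each stacked row
long <- melt(wide, id.vars = "subject",
             variable.name = "visit", value.name = "score")

long
```

Each subject now appears once per visit, which is the layout most modelling and plotting functions in R expect.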
Theme Elements in ggplot2
This website provides a simple summary of the theme elements that can be set within ggplot2. There should be sufficient information here to change the default settings for graphs within the ggplot2 package.
Split strings based on a character in the string
R has various facilities for string manipulation including the strsplit function to divide a string into substrings based on matching to another string. A simple example is shown below > strsplit("<td class=\"objectName\"><a href=\"/path/test.html\" target=\"\" title=\"An Object\" class=\"myObject\">Stuff</a></td>", "<") [[1]] [1] "" [2] "td cl...
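The behaviour described above can be tried on a shorter string. This is a minimal sketch: the html value is a hypothetical, shortened stand-in for the fragment in the excerpt, and the second call is an added illustration of the fixed argument rather than something the excerpt shows.

```r
# Hypothetical shortened version of the HTML fragment in the excerpt
html <- "<td><a href=\"/path/test.html\">Stuff</a></td>"

# strsplit() returns a list with one character vector per input string,
# so [[1]] extracts the pieces for this single string
pieces <- strsplit(html, "<")[[1]]
pieces

# The string starts with the delimiter, so the first piece is empty
pieces[1] == ""

# The split argument is a regular expression by default; fixed = TRUE
# treats it as literal text, which matters for characters such as "."
dotted <- strsplit("a.b.c", ".", fixed = TRUE)[[1]]
dotted
```

Without fixed = TRUE the "." would be read as a regular expression matching any character, so the literal form is the safer choice when the delimiter is plain text.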
Seasonal Trend Decomposition in R
The Seasonal Trend Decomposition using Loess (STL) is an algorithm that was developed to divide a time series into three components: trend, seasonality and remainder. The methodology was presented by Robert Cleveland, William Cleveland, Jean McRae and Irma Terpenning in the Journal of Official Statistics in 1990. The STL is ...
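To make the three components concrete, here is a minimal sketch using the stl function from the stats package. The co2 series that ships with R is an assumed stand-in, since the excerpt does not show which data the post analyses, and s.window = "periodic" is one possible setting rather than the post's own choice.

```r
# co2 is a monthly time series from the datasets package; it stands in
# here for whatever series the original post analysed.
fit <- stl(co2, s.window = "periodic")

# The fitted components are stored as columns of a multivariate time series
components <- fit$time.series
colnames(components)
head(components)

# The three components add back up to the observed series
recombined <- rowSums(components)
all.equal(as.numeric(recombined), as.numeric(co2))
```

Calling plot(fit) draws the observed series and the three components in stacked panels, which is usually the quickest way to judge whether the seasonal window is sensible.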
Word Clouds using Text Mining
There was an interesting post on a blog which showed how straightforward it is to use the text mining tools (tm) from R along with the wordcloud package to create Word Clouds. Following the example from this page I processed the text of the Golden Asse book (found at Project Gutenberg) to generate a word cloud. aFile = readLines("goldenasse.txt"...
Google Maps and ggmap
The ggmap package can be used to access maps from the Google Maps API and there are a number of examples on various statistics-related blogs. The ggmap package has a function get_map that can download maps from various sources including Google Maps. require(ggmap) The first example specifies the longitude and la...
Installing R on Ubuntu
The R statistical software is provided either as source code or pre-compiled binary files. In the majority of cases the binaries are sufficient but there may be situations where it is necessary to compile the software from source code and this post describes the steps required on an Ubuntu Linux system. This post has taken information from variou...
Data Visualisation and Communication Assignment 2
Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed. Original. Source: Reddit/dataisbeautiful (2022). Objective: The objective of the original visualisation is to present the fluctuating US inflation rate, emphasising the relative magnitude of current inflation rates. The target audience of this visua...