Publications by inkhorn82
Bar Graph Colours That Work Well
Ever since I started using ggplot2 more often at work to make graphs, I’ve realized something about the use of colour in bar graphs vs. dot plots: When I’m looking at a graph displayed on the brilliant Viewsonic monitor I’m using at work, the same relatively intense colours that work well in a dot plot start to bother me in a bar gra...
Are scatterplots too complex for lay folks?
Usually, I like to write about the solutions to problems I’ve had, but today I only have a problem to write about. This is the second research job I’ve had outside of academia, and in both cases I’ve met with resistance when I’ve tried to display bivariate relations using scatterplots. For example, a colleague came past my work computer ...
Load Packages Automatically in RStudio
I recently finished a long stretch of work on a particular project that required me to draw upon four R packages. Each time I got back to my work on the project, I’d have to load the packages manually, as needed. It got really annoying and constantly made me wonder whether there was some way that I could just get these packages loaded autom...
ggplot2: Creating a custom plot with two different geoms
This past week for work I had to create some plots to show the max, min, and median of a measure across the levels of a qualitative variable, and show the max and min of the same variable within a subgroup of the dataset. To illustrate what I mean, I took a fun dataset from the Data and Story Library and recreated the plot that I made at work....
Using R from Inside Statistica
I’ve been spending a lot of time in the last month or so on projects at work that aren’t statistics related, hence the lack of posts! In the interim, I had to do some serious research on handling datasets bigger than the last one I worked with (the one that kept threatening to max out my 8 gigs of RAM!). I kept trying to practice working with R ...
Processing Data from a Statistica Worksheet Using R
Context: I work with data from non-profit organizations, and so a big concern in many of my analyses is if and how much people are donating from one year to the next. One of the things I normally like to do in my analyses is get a value for each person that represents how much their yearly donations are increasing or decreasing on average fo...
A Return to Reliable R
The saga with Statistica continues: Statistica kept crashing on me while doing my data processing. One of the big problems was a wonderful bug that occurred when some of my text data variables were coded (unsurprisingly) as text! Under this condition, I would only be able to add a certain small number of extra variables when I needed to make ...
Big data analysis, for free, in R (or “How I learned to load, manipulate, and save data using the ff package”)
Before choosing to support the purchase of Statistica at my workplace, I came across the ff package as an option for working with really big datasets (with special attention paid to ff dataframes, or ffdf). It looked like a good option to use, allowing dataframes with multiple data types and way more rows than if I were loading such a dataset int...
A function to find the “Penultimax”
Penulti-what? Let me explain: Today I had to iteratively go through each row of a donor history dataset and compare a donor’s maximum yearly donation total to the second highest yearly donation total. In even more concrete terms, for each row I had to compare the maximum value across 5 columns against the next highest number. This seemed ...
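The row-wise comparison described in the penultimax excerpt can be sketched roughly as follows. This is only an illustration, not the function from the post itself: the `penultimax` body, the `donors` data frame, and its column names `y1` to `y5` are all made up here, and the sketch assumes that tied values count separately (so a donor whose two best years are equal gets a penultimax equal to the maximum).

```r
# Hypothetical helper: second-highest value in a numeric vector.
# Assumption: ties count separately, so c(50, 50, 10) gives 50.
penultimax <- function(x) {
  x <- x[!is.na(x)]                      # ignore years with no recorded total
  if (length(x) < 2) return(NA_real_)    # nothing to compare the maximum against
  sort(x, decreasing = TRUE)[2]
}

# Made-up donor history: one row per donor, one column per year.
donors <- data.frame(
  y1 = c(10, 0, 25),
  y2 = c(40, 5, NA),
  y3 = c(30, 5, NA),
  y4 = c(20, 0, NA),
  y5 = c(50, 1, NA)
)

year_cols <- c("y1", "y2", "y3", "y4", "y5")

# Apply both summaries across the five yearly columns of every row.
donors$max_gift   <- apply(donors[, year_cols], 1, max, na.rm = TRUE)
donors$penultimax <- apply(donors[, year_cols], 1, penultimax)
```

If the second-highest distinct value is wanted instead, one option would be to call `unique()` on the vector before sorting; which behaviour is appropriate depends on how tied years should be treated.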
Know Your Dataset: Specifying colClasses to load up an ffdf
When I finally figured out how to successfully use the ff package to load data into R, I was apparently working with relatively pain-free data to load up through read.csv.ffdf (see my previous post). Just this past Sunday, I naively followed my own post to load a completely new dataset (over 400,000 rows and about 180 columns) for analysis. U...