Publications by Very statisticious on Very statisticious

A closer look at replicate() and purrr::map() for simulations

04.06.2018

I’ve done a couple of posts so far on simulations, here and here, where I demonstrate how to build a function for simulating data from a defined linear model and then explore long-run behavior of models fit to the simulated datasets. The focus of those posts was on the general simulation process, and I didn’t go into much detail on the specif...

9942 sym R (6255 sym/16 pcs) 2 img
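
As a rough illustration of the workflow that teaser describes (this is a minimal sketch, not the code from the post), the snippet below defines a hypothetical helper, `sim_slope()`, that simulates one dataset from a simple linear model and returns the fitted slope, then repeats it with `replicate()`. The function name, parameter values, and seed are all assumptions chosen for the example.

```r
# Hypothetical helper: simulate one dataset from y = b0 + b1 * x + error,
# fit a linear model, and return the estimated slope.
sim_slope <- function(n = 30, b0 = 1, b1 = 0.5, sigma = 2) {
  x <- runif(n, min = 0, max = 10)
  y <- b0 + b1 * x + rnorm(n, mean = 0, sd = sigma)
  unname(coef(lm(y ~ x))["x"])
}

set.seed(16)  # arbitrary seed, only so the sketch is reproducible

# replicate() re-evaluates the expression on every iteration,
# so each run simulates a fresh dataset and fits a new model.
slopes <- replicate(n = 1000, expr = sim_slope())

# Long-run behavior: the average estimate should sit close to b1 = 0.5
mean(slopes)
```

If purrr is installed, `purrr::map_dbl(1:1000, ~ sim_slope())` should give the same kind of result, with the index passed to the function simply ignored.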

Time after time: calculating the autocorrelation function for uneven or grouped time series

26.06.2018

I first learned how to check for autocorrelation via autocorrelation function (ACF) plots in R in a class on time series. However, the examples we worked on were all single, long term time series with no missing values and no groups. I figured out later that calculating the ACF when the sampling through time is uneven or there are distinct time se...

11059 sym R (4813 sym/18 pcs) 4 img

Simulate! Simulate! – Part 3: The Poisson edition

17.07.2018

One of the things I like about simulations is that, with practice, they can be a quick way to check your intuition about a model or relationship. My most recent example is based on a discussion with a student about quadratic effects. I’ve never had a great grasp on what the coefficients that define a quadratic relationship mean. Luckily there i...

8182 sym R (599 sym/9 pcs) 4 img

Creating legends when aesthetics are constants in ggplot2

18.07.2018

In general, if you want to map an aesthetic to a variable and get a legend in ggplot2 you do it inside aes(). If you want to set an aesthetic to a constant value, like making all your points purple, you do it outside aes(). However, there are situations where you might want to set an aesthetic for a layer to a constant but you also want a legend ...

6372 sym R (5384 sym/9 pcs) 14 img

Automating exploratory plots with ggplot2 and purrr

19.08.2018

When you have a lot of variables and need to make a lot of exploratory plots it’s usually worthwhile to automate the process in R instead of manually copying and pasting code for every plot. However, the coding approach needed to automate plots can look pretty daunting to a beginner R user. It can look so daunting, in fact, that it can appear easi...

12863 sym R (4360 sym/26 pcs) 26 img

Getting started simulating data in R: some helpful functions and how to use them

28.08.2018

I’ve been trying to participate a little more in the R community outside of my narrow professional world, so when the co-organizer of the Eugene R Users Group invited me to come talk at one of their meet-ups I agreed (even though it involved public speaking!). I started out thinking I’d talk about doing simulations. But could I do that i...

25730 sym R (8212 sym/51 pcs) 6 img

The log-0 problem: analysis strategies and options for choosing c in log(y + c)

18.09.2018

I periodically find myself having long conversations with consultees about 0’s. Why? Well, the basic suite of statistical tools many of us learn first involves the normal distribution (for the errors). The log transformation tends to feature prominently for working with right-skewed data. Since log(0) returns -Infinity, a common first reaction ...

13939 sym R (7131 sym/17 pcs) 2 img
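
To make the problem in that teaser concrete, here is a small sketch with made-up values (not data from the post). It shows that R returns `-Inf` for `log(0)` and that the transformed value of a zero depends heavily on the constant `c`. The two constants used here are arbitrary illustrations, not recommendations.

```r
# Made-up right-skewed values that include a zero
y <- c(0, 1, 10, 100)

# The zero becomes -Inf, which most modeling functions cannot use
log(y)

# Adding a constant c before taking logs keeps every value finite
shifted_one   <- log(y + 1)      # c = 1 (arbitrary)
shifted_small <- log(y + 0.001)  # c = 0.001 (arbitrary)

# The zero maps to very different values depending on c,
# while the largest observation barely moves
rbind(shifted_one, shifted_small)
```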

Analysis essentials: An example directory structure for an analysis using R

28.10.2018

There are a lot of practical skills involved in doing an analysis that are essential but that I rarely (never?) see included in the curriculum, statistics or otherwise. These are skills like how to organize your data, how to approach QAQC, and how to set up a naming algorithm for files. We all need to do these things, but too often we end up lear...

8727 sym R (421 sym/4 pcs)

How to plot fitted lines with ggplot2

15.11.2018

Most analyses aren’t really done until we’ve found a way to visualize the results graphically, and I’ve recently been getting some questions from students on how to plot fitted lines from models. There are some R packages that are made specifically for this purpose; see packages effects and visreg, for example. If using the ggplot2 package ...

11168 sym R (8908 sym/26 pcs) 14 img

Lots of zeros or too many zeros?: Thinking about zero inflation in count data

05.03.2019

In a recent lecture I gave a basic overview of zero-inflation in count distributions. My main take-home message to the students that I thought worth posting about here is that having a lot of zero values does not necessarily mean you have zero inflation. Zero inflation is when there are more 0 values in the data than the distribution allows for. ...

7061 sym R (3312 sym/16 pcs) 4 img
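
The point that many zeros need not mean zero inflation can be checked with a quick simulation. The sketch below is an assumed example, not taken from the post: it draws counts from a plain Poisson distribution with a small mean (0.5, chosen arbitrarily) and compares the observed share of zeros with the share the distribution itself expects.

```r
set.seed(16)   # arbitrary seed for reproducibility
lambda <- 0.5  # assumed small mean; no zero inflation is built in

# Counts from an ordinary Poisson distribution
y <- rpois(10000, lambda)

# Observed proportion of zeros in the simulated counts
obs_zero <- mean(y == 0)

# Proportion of zeros the Poisson distribution allows for: exp(-lambda)
exp_zero <- dpois(0, lambda)

# Both are roughly 0.61: lots of zeros, but no more than expected
c(observed = obs_zero, expected = exp_zero)
```

Zero inflation would show up as an observed proportion clearly above the expected one, which is not the case here.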