Publications by Maxwell B. Joseph
Quantifying uncertainty around R-squared for generalized linear mixed models
People love $R^2$. So when Nakagawa and Schielzeth published an article in the journal Methods in Ecology and Evolution earlier this year, ecologists, who increasingly use generalized linear mixed models (GLMMs), rejoiced. Now there’s an R function that automates $R^2$ calculations for GLMMs fit with the lme4 package. $R^2$ is usually ...
4257 sym R (9403 sym/14 pcs) 4 img 7 tbl
Animating the Metropolis algorithm
The Metropolis algorithm and its generalization, the Metropolis-Hastings algorithm, provide elegant methods for obtaining sequences of random samples from complex probability distributions. When I first read about modern MCMC methods, I had trouble visualizing the convergence of Markov chains in higher-dimensional cases. So, I thought I might put to...
1310 sym R (2998 sym/4 pcs) 2 img 2 tbl
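For readers who want a feel for what the Metropolis algorithm does before opening the post, here is a minimal random-walk sketch in Python. It is not the post's R code: the function names (`metropolis`, `log_std_normal`), the standard normal target, and the step size are all assumptions chosen for illustration.

```python
import math
import random


def metropolis(log_target, start, n_samples, step_sd=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target.

    log_target returns the log of the (possibly unnormalized) target density.
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    current = start
    current_lp = log_target(current)
    samples = []
    for _ in range(n_samples):
        # Propose a move from a symmetric Gaussian centered on the current state.
        proposal = current + rng.gauss(0.0, step_sd)
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        # A rejected proposal repeats the current state in the chain.
        samples.append(current)
    return samples


def log_std_normal(x):
    """Log density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


# Start far from the mode on purpose, then discard the early draws as burn-in.
draws = metropolis(log_std_normal, start=5.0, n_samples=20000)
kept = draws[2000:]
```

Because the proposal is symmetric, the acceptance ratio needs only the target density, which is what distinguishes plain Metropolis from the more general Metropolis-Hastings rule.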
How heavy is the Siberut macaque? A Bayesian phylogenetic approach
Among-species comparisons can include phylogenetic information to account for non-independence arising from shared evolutionary history. Often, phylogenetic topologies and branch lengths are not known exactly, but are estimated with uncertainty. This uncertainty can be accounted for using methods recently described in a neat paper called Bayesia...
3298 sym R (2135 sym/8 pcs) 6 img 4 tbl
R and my divorce from Word
Being in grad school, I do a lot of scholarly writing that requires associated or embedded R analyses, figures, and tables, plus bibliographies. Microsoft Word makes this unnecessarily difficult. Many tools are now available to break free from the tyranny of Word. The ones I like involve writing an article in markdown format, integrating all da...
1991 sym 2 img
Errors-in-variables models in Stan
In a previous post, I gave a cursory overview of how prior information about covariate measurement error can reduce bias in linear regression. In the comments, Rasmus Bååth asked about estimation in the absence of strong priors. Here, I’ll describe a Bayesian approach for estimating and correcting for covariate measurement error using a lat...
3883 sym R (4319 sym/12 pcs) 12 img 6 tbl
Better living through zero-one inflated beta regression
Dealing with proportion data on the interval $[0, 1]$ is tricky. I realized this while trying to explain variation in vegetation cover. Unfortunately, this is a true proportion and can’t be made into a binary response. Further, true 0s and 1s rule out beta regression. You could arcsine square root transform the data (but shouldn’t; Wart...
3123 sym R (3529 sym/4 pcs) 2 img 2 tbl
Stochastic search variable selection in JAGS
Stochastic search variable selection (SSVS) identifies promising subsets of multiple regression covariates via Gibbs sampling (George and McCulloch 1993). Here’s a short SSVS demo with JAGS and R. Assume we have a multiple regression problem in which we suspect only a subset of the elements of $\boldsymbol{\beta}$ are non-zero, i.e. some of the covari...
1800 sym R (2583 sym/6 pcs) 2 img 3 tbl
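As a rough guide to what SSVS involves, one common way to write the setup, following the spike-and-slab formulation in George and McCulloch (1993), is sketched below. The symbols $\gamma_j$, $\tau_j$, $c_j$, and $p_j$ do not appear in the excerpt above and are introduced here for illustration; the post itself may use a different parameterization.

$$
\begin{aligned}
y_i &\sim \mathrm{N}(\boldsymbol{x}_i^\top \boldsymbol{\beta},\ \sigma^2), \\
\beta_j \mid \gamma_j &\sim (1 - \gamma_j)\,\mathrm{N}(0,\ \tau_j^2) + \gamma_j\,\mathrm{N}(0,\ c_j^2 \tau_j^2), \\
\gamma_j &\sim \mathrm{Bernoulli}(p_j).
\end{aligned}
$$

With $\tau_j$ small and $c_j$ large, $\gamma_j = 0$ pins $\beta_j$ near zero and $\gamma_j = 1$ lets it vary freely, so the posterior mean of $\gamma_j$ can be read as an inclusion probability for covariate $j$.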
Shiny variance inflation factor sandbox
In multiple regression, strong correlation among covariates increases the variance, and thus the uncertainty, of estimated regression coefficients. Variance inflation factors (VIFs) are one tool used to flag problematic collinearity among covariates. In teaching students about VIFs, it may be useful to have some interactive supplementary ...
1292 sym R (3082 sym/4 pcs) 2 tbl
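For reference, the standard definition of the VIF for covariate $j$ in a linear model with an intercept is given below. Here $R_j^2$ is the coefficient of determination from regressing covariate $j$ on all the other covariates, $s_j^2$ is the sample variance of covariate $j$, and $n$ is the sample size; this notation is an assumption of this note, not taken from the post.

$$
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\mathrm{Var}(\hat{\beta}_j) = \frac{\sigma^2}{(n - 1)\, s_j^2} \cdot \mathrm{VIF}_j .
$$

As a quick check, $R_j^2 = 0.9$ gives $\mathrm{VIF}_j = 10$: the sampling variance of $\hat{\beta}_j$ is ten times what it would be if covariate $j$ were uncorrelated with the others.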
Multilevel modeling of community composition with imperfect detection
This is a guest post generously provided by Joe Mihaljevic. A common goal of community ecology is to understand how and why species composition shifts across space. Standard techniques for determining which environmental covariates might drive such shifts typically rely on ordination of community data to reduce its dimensionality. These techniques i...
4796 sym R (2749 sym/4 pcs) 6 img 2 tbl
Spatial data extraction around buffered points in R
Quantifying spatial data (e.g. land cover) around points can be done in a variety of ways, some of which require considerable amounts of patience, clicking around, and/or cash for a license. Here’s a bit of code that I cobbled together to quickly extract land cover data from the National Land Cover Database for buffered regions around points (...
1183 sym R (2772 sym/4 pcs) 2 tbl