Publications by kjytay
Attributes in R
In R, objects are allowed to have attributes, which are a way for users to attach additional information to an R object. There are a few reasons why one might want to use attributes. One reason that I encountered recently was to ensure that the type of object returned from a function remains consistent across a range of function options. For example,...
2173 sym R (1225 sym/9 pcs)
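As a small illustration of the attributes idea in the excerpt above, here is a minimal sketch in base R. The "units" and "n" attribute names and the col_means() helper are hypothetical, made up for this example and not taken from the post; the point is only that a function can always return the same type (a numeric vector) and carry optional extras as attributes.

```r
# Tag a numeric vector with a (hypothetical) "units" attribute
x <- c(1.5, 2.0, 3.2)
attr(x, "units") <- "cm"

units_of_x <- attr(x, "units")  # retrieve a single attribute by name
all_attrs  <- attributes(x)     # named list of all attributes on x

# Hypothetical helper: the return value is always a numeric vector;
# extra information is attached as an attribute only when requested.
col_means <- function(m, keep_n = FALSE) {
  out <- colMeans(m)
  if (keep_n) attr(out, "n") <- nrow(m)
  out
}

res <- col_means(matrix(1:6, nrow = 3), keep_n = TRUE)
```

Callers that only need the means can use res like any numeric vector, while attr(res, "n") recovers the extra detail.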
How is the F-statistic computed in anova() when there are multiple models?
Background: In the linear regression context, it is common to use the F-test to test whether a proposed regression model fits the data well. Say we have $p$ predictors, and we are comparing the model fit for (1) linear regression where $\beta_1, \dots, \beta_q$ are allowed to vary freely but $\beta_{q+1}, \dots, \beta_p$ are fixed at zero, vs. (2) linear regression where $\beta_1, \dots, \beta_p$ are allowed to vary freely. ($q$ is some...
3916 sym R (985 sym/3 pcs) 32 img
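To make the question in this post's title concrete, the sketch below compares F-statistics computed by hand with those reported by anova() for three nested models. The models and the use of the built-in mtcars data are my own choices for illustration, not the post's. It relies on my reading of the anova.lm documentation, which states that the F statistic compares the mean square for a row to the residual sum of squares for the largest model considered.

```r
# Three nested linear models on the built-in mtcars data
fit1 <- lm(mpg ~ wt, data = mtcars)
fit2 <- lm(mpg ~ wt + hp, data = mtcars)
fit3 <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Residual sums of squares and residual degrees of freedom
rss <- c(deviance(fit1), deviance(fit2), deviance(fit3))
rdf <- c(df.residual(fit1), df.residual(fit2), df.residual(fit3))

# Denominator: residual mean square of the largest model (fit3)
sigma2_big <- rss[3] / rdf[3]

# Numerator for each row: drop in RSS divided by drop in residual df
f_manual <- (-diff(rss) / -diff(rdf)) / sigma2_big

# F-statistics reported by anova() (the first row is NA)
f_anova <- anova(fit1, fit2, fit3)$F[-1]
```

Under these assumptions f_manual and f_anova should agree up to floating-point error. Note that the F-statistic for the fit1 vs. fit2 row uses fit3's residual mean square, so it generally differs from what anova(fit1, fit2) alone would report.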
Some notes when using dot-dot-dot (…) in R
When writing functions in R, the ... argument is a special argument useful for passing an unknown number of arguments to another function. This is widely used in R, especially in generic functions such as plot(), print(), and apply(). Hadley Wickham’s Advanced R has a nice short section on the uses of ... and some potential pitfalls when using ......
2527 sym R (503 sym/6 pcs)
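A minimal sketch of the ... mechanism described in the excerpt above. The summarise_mean() wrapper is hypothetical and written only for this example; it forwards any extra arguments to mean(), and the last call shows one well-known pitfall: a misspelled argument name is absorbed by ... and silently ignored.

```r
# Hypothetical wrapper: forwards any extra arguments to mean()
summarise_mean <- function(x, ...) {
  extra <- list(...)  # capture the extra arguments as a list
  list(n_extra = length(extra), value = mean(x, ...))
}

x <- c(1, 2, 3, 100)

res_plain <- summarise_mean(x)               # no extra arguments
res_trim  <- summarise_mean(x, trim = 0.25)  # trim is passed on to mean()

# Pitfall: "trimm" is a typo, so it matches no argument of mean.default();
# it is swallowed by ... and the untrimmed mean is returned without a warning.
res_typo  <- summarise_mean(x, trimm = 0.25)
```

Here res_typo$value equals res_plain$value rather than res_trim$value, which is why functions that accept ... are worth checking for misspelled argument names.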
A shiny app for exploratory data analysis
I recently learnt how to build basic R Shiny apps. To practice using Shiny, I created a simple app that you can use to perform simple exploratory data analysis. You can use the app here to play around with the diamonds dataset from the ggplot2 package. To use the app for your own dataset, download the raw R code here (just the app.R file) and ass...
3507 sym 20 img
Exploring the game “First Orchard” with simulation in R
My daughter received the board game First Orchard as a Christmas present and she’s hooked on it so far. In playing the game with her, a few probability/statistics questions came to mind. This post outlines how I answered some of them using simulation in R. All code for this blog post can be found here. (In my googling I found that Matt Lane has...
6811 sym R (567 sym/1 pcs) 14 img
Simulating the dice game “Toss Up!” in R
Toss Up! is a very simple dice game that I’ve always wanted to simulate but never got around to doing so (until now!). This post outlines how to simulate a Toss Up! game in R, as well as how to evaluate the effectiveness of different game strategies. All the code for this blog post is available here. Rules: The official rules for Toss Up! are a...
7421 sym R (7837 sym/9 pcs) 2 img
glmnet v4.1: regularized Cox models for (start, stop] and stratified data
My latest work on the glmnet package has just been pushed to CRAN! In this release (v4.1), we extend the scope of regularized Cox models to include (start, stop] data and strata variables. In addition, we provide the survfit method for plotting survival curves based on the model (as the survival package does). Why is this a big deal? As explaine...
1472 sym
The Mendoza line
The Mendoza Line is a term from baseball. Named after Mario Mendoza, it refers to the threshold of incompetent hitting. It is frequently taken to be a batting average of .200, although all the sources I looked at made sure to note that Mendoza’s career average was actually a little better: .215. This post explores a few questions related to the...
3970 sym R (3776 sym/11 pcs) 8 img
covidcast package for COVID-19-related data
(This is a PSA post, where I share a package that I think might be of interest to the community but that I haven’t looked too deeply into myself.) Today I learnt of the covidcast R package, which provides access to the COVIDcast Epidata API published by the Delphi group at Carnegie Mellon University. According to the covidcast R package websi...
2634 sym R (1145 sym/5 pcs) 6 img
What is a sunflower plot?
A sunflower plot is a type of scatterplot which tries to reduce overplotting. When there are multiple points that have the same (x, y) values, a sunflower plot draws just one point there, with little edges (or “petals”) coming out from the point to indicate how many points are really there. It’s best to see this via an example. Here is a p...
2292 sym R (838 sym/8 pcs) 16 img
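A minimal sketch of the sunflower plot idea from the excerpt above, using base R's sunflowerplot() and xyTable(). The toy data are made up for illustration and are not the post's example.

```r
# Toy data with repeated (x, y) pairs: (1, 1) three times, (2, 2) twice
x <- c(1, 1, 1, 2, 2, 3)
y <- c(1, 1, 1, 2, 2, 3)

# xyTable() tabulates how many points share each (x, y) location;
# it returns a list with components x, y and number.
counts <- xyTable(x, y)

# sunflowerplot() draws one symbol per distinct location and adds
# petals where more than one point coincides.
sunflowerplot(x, y, main = "Sunflower plot of toy data")
```

In this sketch the point at (1, 1) should get three petals, (2, 2) two petals, and (3, 3) a plain point.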