Publications by Mollie
Truncate by Delimiter in R
Sometimes, you only need to analyze part of the data stored as a vector. In this example, there is a list of patents. Each patent has been assigned to one or more patent classes. Let’s say that we want to analyze the dataset based on only the first patent class listed for each patent.
patents <- data.frame( patent = 1:30, class =...
1407 sym R (666 sym/2 pcs)
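For the entry above, here is a minimal sketch of keeping only the first class in a delimited string. The excerpt's data is cut off, so the class values and the ";" delimiter below are assumptions made for illustration, and this is not necessarily the approach the post takes.

```r
# Hypothetical data: the original example's values are not shown,
# and the ";" delimiter is an assumption.
patents <- data.frame(
  patent = 1:3,
  class = c("A01;B02", "C03", "D04;E05;F06"),
  stringsAsFactors = FALSE
)

# Drop the first delimiter and everything after it,
# leaving only the first class listed for each patent.
patents$first_class <- sub(";.*$", "", patents$class)
patents$first_class
```

Rows with a single class contain no delimiter, so sub() leaves them unchanged.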
Perform a Function on Each File in R
Sometimes you might have several data files and want to use R to perform the same function across all of them. Or maybe you have multiple files and want to systematically combine them into one file without having to open each file and manually copy the data out. Fortunately, it’s not complicated to use R to systematically iterate acr...
619 sym R (45 sym/1 pcs)
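As a rough illustration of the idea in the entry above, the sketch below reads every CSV file in a folder and stacks the results. The folder, file names, and columns are all made up so the example can run on its own; the post's own code is not shown in the excerpt and may differ.

```r
# Write two small CSV files to a fresh temporary folder
# so the sketch is self-contained (hypothetical files and columns).
data_dir <- tempfile("example-files-")
dir.create(data_dir)
write.csv(data.frame(id = 1:2, value = c(10, 20)),
          file.path(data_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c(30, 40)),
          file.path(data_dir, "b.csv"), row.names = FALSE)

# List every CSV file in the folder, with full paths.
files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# Apply the same function (here, read.csv) to each file.
tables <- lapply(files, read.csv)

# Combine the per-file data frames into one.
combined <- do.call(rbind, tables)
combined
```

Any function that takes a file path can replace read.csv in the lapply() call, as long as each result has the same columns when you intend to rbind them.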
Custom Legend in R
This particular custom legend was designed with three purposes: (1) to effectively bin values based on a theoretical minimum and maximum value for that variable (e.g. -1 and 1 or 0 and 100); (2) to use a different interval notation than the default; and (3) to handle NA values. Even though this particular legend was designed with those needs, it should be s...
3060 sym R (2313 sym/10 pcs) 4 img 2 tbl
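The three purposes listed in the entry above can be approximated in base R with cut(). This is only a sketch under assumptions: the values, the 0.5 bin width, the "a to b" label style, and the "No data" label are all hypothetical and are not taken from the post.

```r
# Hypothetical values on a scale with a theoretical range of -1 to 1.
values <- c(-0.9, -0.2, 0.4, 1, NA)

# Bin on the theoretical minimum and maximum, not the observed range.
breaks <- seq(-1, 1, by = 0.5)

# Custom interval notation in place of the default "(a,b]" labels.
labels <- paste(head(breaks, -1), "to", breaks[-1])

binned <- cut(values, breaks = breaks, labels = labels, include.lowest = TRUE)

# Give NA its own explicit level so it can appear in a legend.
binned <- addNA(binned)
levels(binned)[is.na(levels(binned))] <- "No data"

as.character(binned)
```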
Line Breaks Between Words in Axis Labels in ggplot in R
Sometimes when plotting factor variables in R, the graphics can look pretty messy thanks to long factor levels. If the level attributes have multiple words, there is an easy fix to this that often makes the axis labels look much cleaner.
Without Line Breaks
Here’s the messy-looking example:
[Figure: No line breaks in axis labels]
And here’s the...
1413 sym R (535 sym/3 pcs) 8 img 4 tbl
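One way to get line breaks between words, sketched here in base R, is to replace the spaces in the factor levels with newline characters before plotting. The data frame and labels are hypothetical, and the excerpt does not show the post's own fix, so treat this as one possible approach.

```r
# Hypothetical data with long, multi-word factor levels.
df <- data.frame(
  group = factor(c("Very Long Label", "Another Long Label")),
  value = c(1, 2)
)

# Replace each space in the level names with a newline character.
levels(df$group) <- gsub(" ", "\n", levels(df$group))
levels(df$group)
```

Plotting df$group on an axis afterwards should then draw each word on its own line, since the newline characters are part of the labels.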
Check if a Variable Exists in R
If you use attach, it is easy to tell if a variable exists. You can simply use exists to check:
> attach(df)
> exists("varName")
[1] TRUE
However, if you don’t use attach (and I find you generally don’t want to), this simple solution doesn’t work.
> detach(df)
> exists("df$varName")
[1] FALSE
Instead of using exists, you can use ...
929 sym R (242 sym/4 pcs)
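The excerpt above is cut off before it names the alternative to exists, so the sketch below is not necessarily the author's solution. It shows one common way to check for a column without attach: testing the name against names(df). The data frame here is hypothetical.

```r
# Hypothetical data frame for illustration.
df <- data.frame(varName = 1:3, other = c("a", "b", "c"))

# Check column names directly; no attach() needed.
has_var <- "varName" %in% names(df)
has_missing <- "notThere" %in% names(df)

has_var
has_missing
```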
Compare Regression Results to a Specific Factor Level in R
Including a series of dummy variables in a regression in R is very simple. For example,
ols <- lm(weight ~ Time + Diet, data = ChickWeight)
summary(ols)
The above regression automatically includes a dummy variable for all but the first level of the Diet factor.
Call: lm(formula = weight ~ Time + Diet, data = ChickWeig...
1578 sym R (1763 sym/5 pcs)
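To compare against a level other than the first, one standard option is relevel(), which moves a chosen level to the reference position. The sketch below uses the same ChickWeight data as the entry above; picking Diet "2" as the reference is an arbitrary choice for illustration, and the post itself may use a different method.

```r
# Work on a plain data.frame copy of the built-in ChickWeight data.
cw <- as.data.frame(ChickWeight)

# Make Diet "2" the reference level (an arbitrary choice for illustration).
cw$Diet <- relevel(cw$Diet, ref = "2")

# Refit: dummy variables are now created for every level except "2".
ols_ref2 <- lm(weight ~ Time + Diet, data = cw)
names(coef(ols_ref2))
```

Each Diet coefficient is then interpreted relative to Diet 2 instead of Diet 1.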
ggplot Fit Line and Lattice Fit Line in R
Let’s add a fit line to a scatterplot!
Fit Line in Base Graphics
Here’s how to do it in base graphics:
ols <- lm(Temp ~ Solar.R, data = airquality)
summary(ols)
plot(Temp ~ Solar.R, data = airquality)
abline(ols)
[Figure: Fit line in base graphics in R]
Fit Line in ggplot
And here's how to do it in ggplot:
library(ggplot2)
ggplot(data = air...
915 sym R (376 sym/3 pcs) 6 img 3 tbl
Merge by City and State in R
Often, you’ll need to merge two data frames based on multiple variables. For this example, we’ll use the common case of needing to merge by city and state.
First, you need to read in both your data sets:
# import city coordinate data:
coords <- read.csv("cities-coords.csv", header = TRUE, sep = ",")
# import population data:
da...
1835 sym R (1915 sym/5 pcs)
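The merge step itself can be sketched with base R's merge() and a vector of key columns. The excerpt above ends before the merge, so the data frames, column names, and placeholder values below are assumptions made so the example runs on its own; they are not the contents of the post's CSV files.

```r
# Hypothetical stand-ins for the two data sets (placeholder values only).
coords <- data.frame(
  city  = c("Springfield", "Springfield", "Portland"),
  state = c("IL", "MO", "OR"),
  lat   = c(1, 2, 3),
  lon   = c(10, 20, 30)
)
pop <- data.frame(
  city       = c("Springfield", "Springfield", "Portland"),
  state      = c("IL", "MO", "ME"),
  population = c(100, 200, 300)
)

# Merge on both keys so same-named cities in different states stay distinct.
merged <- merge(coords, pop, by = c("city", "state"))
merged
```

Merging on city alone would pair each Springfield row with both Springfield populations, which is why both keys are passed to by. With the default inner join, the two Portland rows are dropped because their states differ.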
Deaths
library('tidyverse') ## -- Attaching packages --------------------------------------- tidyverse 1.3.1 -- ## v ggplot2 3.3.5 v purrr 0.3.4 ## v tibble 3.1.6 v dplyr 1.0.7 ## v tidyr 1.1.4 v stringr 1.4.0 ## v readr 2.1.0 v forcats 0.5.1 ## -- Conflicts ------------------------------------------ tidyverse_conflicts() -- ...
38 sym R (15199 sym/33 pcs)