Publications by Unknown
H is for haven
The tidyverse includes many packages meant to make importing, wrangling, analyzing, and visualizing data easier. The haven package allows you to import files from other statistical software, such as SPSS, SAS, and Stata. I learned SPSS in college and used it extensively in grad school. I ended up switching to R because SPSS was get...
I is for I Want to Learn More
This could have easily been a post about a function beginning with the letter I. But I wanted to take the opportunity to share some of the resources that really helped me learn R as well as I do. Obviously, practice and looking things up on Stack Overflow and GitHub as I encountered problems was incredibly useful. But those resources gave ...
J is for Join
Today, we’ll start digging into the wonderful world of joins! The tidyverse offers several different types of joins between two datasets, X and Y:
left_join: keeps all rows from X and adds columns from Y to any that match cases in X; if there is no matching record from Y, the Y columns will be NA for those cases
right_join: keep...
K is for Keep or Drop Variables
A few times in this series, I’ve wanted to display part of a dataset, such as key variables, like Title, Rating, and Pages. The tidyverse allows you to easily keep or drop variables, either temporarily or permanently, with the select function. For instance, we can use select along with other tidyverse functions to create a quick des...
L is for Log Transformation
When visualizing data, outliers and skewed data can have a huge impact, potentially making your visualization difficult to understand. We can use many of the tricks covered so far to deal with those issues, such as using filters to remove extreme values. But what if you want to display all values, even extreme ones? A log transformati...
M is for mutate
Today, we finally talk about the mutate function! I’ve used it a lot throughout the series so far, so it’s nice to get to discuss what it is and how it works. The mutate function is used any time you want to create or modify a variable. It works with pretty much any R function that creates/modifies variables, so you can wrap it around ...
N is for n_distinct
Today, we’ll start digging into some of the functions used to summarise data. The full summarise function will be covered for the letter S. For now, let’s look at one function from the tidyverse that can give some overall information about a dataset: n_distinct. This function counts the number of unique values in a vector or variab...
O is for order_by
This will be a quick post on another tidyverse function, order_by. I’ll admit, I don’t use this one as often as arrange. It can be useful, though, if you don’t want to permanently change the order of your dataset but want to use functions that require ordering the data. One example is the cumulative sum function (cumsum).
library...
P is for percent
We’ve used ggplots throughout this blog series, but today, I want to introduce another package that helps you customize scales on your ggplots – the scales package. I use this package most frequently to format scales as percent. There aren’t a lot of good ways to use percents with my dataset, but one example would be to calculat...
Q is for qplot versus ggplot
Two years ago, when I did Blogging A to Z of R, I talked about qplots. qplots are great for quick plots – which is why they’re named as such – because they use variable types to determine the best plot to generate. For instance, if I give it a single continuous variable, it will generate a histogram.
library(tidyverse)
## -- Att...