Publications by Unknown

H is for haven

09.04.2020

The tidyverse includes many packages meant to make importing, wrangling, analyzing, and visualizing data easier. The haven package allows you to import files from other statistical software, such as SPSS, SAS, and Stata. I learned SPSS in college and used it extensively in grad school. I ended up switching to R because SPSS was get...

2879 sym R (1454 sym/1 pcs) 2 img

I is for I Want to Learn More

10.04.2020

This could easily have been a post about a function beginning with the letter I. But I wanted to take the opportunity to share some of the resources that really helped me learn R as well as I have. Obviously, practice and looking things up on Stack Overflow and GitHub as I encountered problems was incredibly useful. But those resources gave ...

1524 sym 2 img

J is for Join

11.04.2020

Today, we’ll start digging into the wonderful world of joins! The tidyverse offers several different types of joins between two datasets, X and Y: left_join – keeps all rows from X and adds columns from Y to any that match cases in X; if there is no matching record from Y, the Y columns will be NA for those cases. right_join – keep...

3496 sym R (2003 sym/2 pcs) 2 img

K is for Keep or Drop Variables

13.04.2020

A few times in this series, I’ve wanted to display part of a dataset, such as key variables, like Title, Rating, and Pages. The tidyverse allows you to easily keep or drop variables, either temporarily or permanently, with the select function. For instance, we can use select along with other tidyverse functions to create a quick des...

1859 sym R (2154 sym/3 pcs) 2 img

L is for Log Transformation

14.04.2020

When visualizing data, outliers and skewed data can have a huge impact, potentially making your visualization difficult to understand. We can use many of the tricks covered so far to deal with those issues, such as using filters to remove extreme values. But what if you want to display all values, even extreme ones? A log transformati...

1002 sym R (1743 sym/2 pcs) 2 img

M is for mutate

15.04.2020

Today, we finally talk about the mutate function! I’ve used it a lot throughout the series so far, so it’s nice to get to discuss what it is and how it works. The mutate function is used any time you want to create or modify a variable. It works with pretty much any R function that creates/modifies variables, so you can wrap it around ...

2765 sym R (2254 sym/2 pcs) 2 img

N is for n_distinct

16.04.2020

Today, we’ll start digging into some of the functions used to summarise data. The full summarise function will be covered for the letter S. For now, let’s look at one function from the tidyverse that can give some overall information about a dataset: n_distinct. This function counts the number of unique values in a vector or variab...

2136 sym R (3538 sym/5 pcs) 2 img

O is for order_by

17.04.2020

This will be a quick post on another tidyverse function, order_by. I’ll admit, I don’t use this one as often as arrange. It can be useful, though, if you don’t want to permanently change the order of your dataset but want to use functions that require ordering the data. One example is the cumulative sum function (cumsum). library...

842 sym R (2079 sym/4 pcs) 2 img

P is for percent

18.04.2020

We’ve used ggplots throughout this blog series, but today, I want to introduce another package that helps you customize scales on your ggplots – the scales package. I use this package most frequently to format scales as percent. There aren’t a lot of good ways to use percents with my dataset, but one example would be to calculat...

676 sym R (1780 sym/2 pcs) 2 img

Q is for qplot versus ggplot

20.04.2020

Two years ago, when I did Blogging A to Z of R, I talked about qplots. qplots are great for quick plots – which is why they’re named as such – because they use variable types to determine the best plot to generate. For instance, if I give it a single continuous variable, it will generate a histogram. library(tidyverse) ## -- Att...

316 sym R (1349 sym/1 pcs) 2 img