Publications by Unknown

R is for read_

21.04.2020

The tidyverse is full of functions for reading data, beginning with “read_”. The read_csv I’ve used to access my reads2019 data is one example, falling under the read_delim functions. read_tsv allows you to quickly read in tab-delimited files. And you can also read in files with other delimiters, using read_delim and specifying ...
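As a minimal sketch of the three functions named above, the snippet below reads the same small table from comma-, tab-, and pipe-delimited text. The inline strings and the column names `title` and `pages` are made up for illustration and are not the reads2019 file; `I()` and `show_col_types` assume readr 2.0 or later.

```r
library(readr)

# Hypothetical inline data standing in for real files.
# I() tells readr to treat the string as literal data, not a path.
csv_text  <- I("title,pages\nBook A,320\nBook B,410\n")
tsv_text  <- I("title\tpages\nBook A\t320\nBook B\t410\n")
pipe_text <- I("title|pages\nBook A|320\nBook B|410\n")

# read_csv and read_tsv are read_delim with the delimiter preset
books_csv  <- read_csv(csv_text, show_col_types = FALSE)
books_tsv  <- read_tsv(tsv_text, show_col_types = FALSE)

# For any other delimiter, pass it explicitly
books_pipe <- read_delim(pipe_text, delim = "|", show_col_types = FALSE)
```

With a real file, the first argument would simply be the file path.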

1664 sym 2 img

S is for summarise

22.04.2020

Today, we’ll finally talk about summarise! It’s very similar to mutate, but instead of adding or altering a variable in a dataset, it aggregates your data, creating a new tibble with the columns containing your requested summary data. The number of rows will be equal to the number of groups from group_by (if you don’t specify an...
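To make the row-count behaviour concrete, here is a small sketch with a made-up tibble (the `genre` and `pages` columns are assumptions, not the actual reads2019 variables). Grouping by `genre` gives one row per genre; with no grouping at all, `summarise` returns a single row.

```r
library(dplyr)

# Hypothetical reading data: three books across two genres
reads <- tibble(
  genre = c("Fantasy", "Fantasy", "Mystery"),
  pages = c(300, 500, 250)
)

# One row per group: two genres, so two rows
genre_stats <- reads %>%
  group_by(genre) %>%
  summarise(books = n(), mean_pages = mean(pages), .groups = "drop")

# No grouping: the whole dataset collapses to one row
overall <- reads %>%
  summarise(books = n(), total_pages = sum(pages))
```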

1892 sym R (4373 sym/4 pcs) 1 tbl

T is for Themes

23.04.2020

One of the easiest ways to make a beautiful ggplot is by using a theme. ggplot2 comes with a variety of pre-existing themes. I’ll use the genre statistics summary table I created in yesterday’s post, and create the same chart with different themes. library(tidyverse) ## -- Attaching packages ---------------------------------------...
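A rough sketch of the idea, assuming a small stand-in summary table (the `genre_stats` values here are invented, not the table from the post): build the chart once, then add a different built-in theme to each copy.

```r
library(ggplot2)

# Hypothetical genre summary standing in for the real table
genre_stats <- data.frame(
  genre = c("Fantasy", "Mystery", "Sci-Fi"),
  books = c(12, 8, 5)
)

# Build the chart once so that only the theme changes between versions
base_plot <- ggplot(genre_stats, aes(x = genre, y = books)) +
  geom_col()

# Each of these themes ships with ggplot2
p_minimal <- base_plot + theme_minimal()
p_classic <- base_plot + theme_classic()
p_bw      <- base_plot + theme_bw()
```

Printing any of the three objects draws the same bars with a different look.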

1042 sym R (2205 sym/2 pcs) 6 img

U is for Useful Trick

24.04.2020

This will be a very short post for a line of code I’ve found unbelievably useful as I analyze data for work. I’m working with datasets containing millions of rows of data. (The most recent one I worked with had about 13 million records.) Because R loads datasets into memory, you can run out of RAM pretty quickly when working with ...

1417 sym R (38 sym/1 pcs) 2 img

V is for Verbs

25.04.2020

In this series, I’ve covered five terms for data manipulation: arrange, filter, mutate, select, and summarise. These are the verbs that make up the grammar of data manipulation. They all work with group_by to perform these functions groupwise. There are scoped versions of these verbs, which add _all, _if, or _at, that allow you to perform these ver...
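Here is a minimal sketch of the three scoped suffixes, using an invented tibble (column names are assumptions). Note that in dplyr 1.0 and later these scoped variants still work but have been superseded by `across()`.

```r
library(dplyr)

# Hypothetical data: one text column and two numeric columns
reads <- tibble(
  title  = c("Book A", "Book B"),
  pages  = c(300, 500),
  rating = c(4, 5)
)

# _if: apply the function only to columns that pass a test
means <- reads %>% summarise_if(is.numeric, mean)

# _at: apply the function to columns picked by name
ints <- reads %>% mutate_at(vars(pages, rating), as.integer)

# _all: apply the function to every column
chars <- reads %>% mutate_all(as.character)
```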

1581 sym R (2071 sym/3 pcs) 2 img

W is for Write and Read Data – Fast

27.04.2020

Once again, I’m dipping outside of the tidyverse, but this package and its functions have been really useful in getting data quickly in (and out) of R. For work, I have to pull in data from a few different sources, and manipulate and work with them to give me the final dataset that I use for much of my analysis. So that I don’t hav...

1958 sym R (2738 sym/2 pcs)

X is for scale_x

28.04.2020

These next two posts will deal with formatting scales in ggplot2 – x-axis, y-axis – so I’ll try to limit the amount of overlap and repetition. Let’s say I wanted to plot my reading over time, specifically as a cumulative sum of pages across the year. My x-axis will be a date. Since my reads2019 file initially formats my dates a...
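A sketch of a cumulative-pages plot with a date x-axis might look like the following. The data, the column names `date_read` and `pages`, and the ISO date strings are all assumptions for illustration; the real reads2019 file may store dates differently.

```r
library(dplyr)
library(ggplot2)

# Hypothetical reading log with dates stored as text
reads <- tibble(
  date_read = c("2019-01-15", "2019-03-02", "2019-06-20"),
  pages     = c(300, 500, 250)
) %>%
  mutate(date_read = as.Date(date_read)) %>%  # scale_x_date needs Date class
  arrange(date_read) %>%
  mutate(cum_pages = cumsum(pages))           # running total of pages

# Monthly breaks, labelled with abbreviated month names
p <- ggplot(reads, aes(x = date_read, y = cum_pages)) +
  geom_line() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b")
```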

525 sym R (1539 sym/2 pcs) 2 img

Y is for scale_y

29.04.2020

Yesterday, I talked about scale_x. Today, I’ll continue on that topic, focusing on the y-axis. The key to using any of the scale_ functions is to know what sort of data you’re working with (e.g., date, continuous, discrete). Yesterday, I talked about scale_x_date and scale_x_discrete. We often put these types of data on the x-axis,...
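As one illustration of matching the scale function to the data type, the sketch below pairs a continuous y variable with `scale_y_continuous`. The monthly page counts are invented, and the choice of limits, breaks, and comma labels is only an example of what the function can format.

```r
library(ggplot2)

# Hypothetical monthly page counts; month is discrete, pages is continuous
reads <- data.frame(
  month = factor(c("Jan", "Feb", "Mar"), levels = c("Jan", "Feb", "Mar")),
  pages = c(1200, 2500, 1800)
)

p <- ggplot(reads, aes(x = month, y = pages)) +
  geom_col() +
  # Continuous data on the y-axis, so the continuous scale applies
  scale_y_continuous(
    limits = c(0, 3000),
    breaks = seq(0, 3000, by = 500),
    labels = scales::label_comma()  # e.g. 1,500 instead of 1500
  )
```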

829 sym R (1970 sym/1 pcs) 2 img

Z is for Additional Axes

30.04.2020

Here we are at the last post in Blogging A to Z! Today, I want to talk about adding additional axes to your ggplot, using the options for fill or color. While these aren’t true z-axes in the geometric sense, I think of them as a third, z, axis. Some of you may be surprised to learn that fill and color are different, and that you coul...
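A small sketch of the difference, using made-up data (the `genre`, `books`, and `rating` columns are assumptions): for bars, `fill` sets the interior and `color` sets the outline, while for default points it is `color` that carries the mapping.

```r
library(ggplot2)

# Hypothetical genre summary
genre_stats <- data.frame(
  genre  = c("Fantasy", "Mystery", "Sci-Fi"),
  books  = c(12, 8, 5),
  rating = c(4.1, 3.8, 4.4)
)

# Bars: fill maps genre to the interior; color draws a fixed black outline
p_bars <- ggplot(genre_stats, aes(x = genre, y = books, fill = genre)) +
  geom_col(color = "black")

# Points: color carries the third variable
p_points <- ggplot(genre_stats, aes(x = books, y = rating, color = genre)) +
  geom_point(size = 3)
```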

1289 sym R (2293 sym/3 pcs) 2 img

Statistics Sunday: My 2019 Reading

03.05.2020

I’ve spent the month of April blogging my way through the tidyverse, while using my reading dataset from 2019 as the example. Today, I thought I’d bring many of those analyses and data manipulation techniques together to do a post about my reading habits for the year. library(tidyverse) ## -- Attaching packages -------------------...

503 sym R (2052 sym/4 pcs) 2 img