Publications by Unknown
Statistics Sunday: Mixed Effects Meta-Analysis
As promised, how to conduct mixed effects meta-analysis in R: Code used in the video is available here. And I’d recommend the following posts to provide background for this video: “What is meta-analysis?”, “Introduction to effect sizes”, “Variance and weights in meta-analysis”, and “Video on conducting fixed and random effects meta-analysis in R”. And th...
777 sym
Visualizing the Tallest Building in Each State
Via Digg: This data visualization, put together by takeasecond on Reddit, shows the tallest building in all 50 states in 2020. As the graph demonstrates, the current tallest building in America is New York’s One World Trade Center at 1,776 feet tall. In contrast, the shortest building on the list is the Decker Towers in Vermont at ju...
760 sym
Blogging A to Z: The A to Z of tidyverse
Announcing my theme for this year’s blogging A to Z! The tidyverse is a set of R packages for data science. The big thing about the tidyverse is making sure your data are tidy. What does that mean? Each row is an observation. Each column is a variable. Each cell contains only one value. When I first learned about the tidy approach, I though...
1349 sym 2 img
A is for arrange
The arrange function allows you to sort a dataset by one or more variables, either ascending or descending. This function is especially helpful if you plan on aggregating your data with summarize (which we’ll get to later), so you can select specific rows in that command. It’s similar to the Excel complex sort, where the order of e...
737 sym R (19 sym/1 pcs)
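A minimal sketch of the kind of sort the arrange teaser describes, assuming a small made-up reading log (the `books` data and its column names are hypothetical, not the dataset from the post). It sorts by one variable descending and breaks ties with a second variable ascending:

```r
library(dplyr)

# Hypothetical reading log; titles and columns are invented for illustration
books <- tibble(
  title  = c("Book A", "Book B", "Book C", "Book D"),
  pages  = c(320, 150, 320, 480),
  rating = c(4, 5, 3, 4)
)

# Sort by pages (longest first), then by rating (lowest first) within ties
sorted <- books %>%
  arrange(desc(pages), rating)

print(sorted)
```

As in a multi-level spreadsheet sort, the order of the arguments matters: later variables only break ties left by earlier ones.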
B is for bind_rows
Moving on to the letter B, today we’ll talk about merging datasets that contain the same variables but add new cases. This is easily done with bind_rows. Let’s say I realized I forgot to log some of the books I read last year, and I wanted to merge those into my existing dataset. I selected a handful of books from my to-read list...
1334 sym R (2868 sym/3 pcs) 2 img
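To illustrate the idea in the bind_rows teaser, here is a small sketch under the assumption of two made-up data frames that share the same variables (`reads` and `forgotten` are hypothetical names, not the post's data):

```r
library(dplyr)

# Existing log and the books that were never logged (both invented)
reads <- tibble(
  title = c("Book A", "Book B"),
  pages = c(320, 150)
)
forgotten <- tibble(
  title = "Book C",
  pages = 275
)

# Stack the new cases under the existing ones; columns are matched by name
all_reads <- bind_rows(reads, forgotten)

print(all_reads)
```

Because columns are matched by name rather than position, the two inputs do not need their variables in the same order.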
C is for coalesce
For the letter C, we’ll talk about the coalesce function. If you’re familiar with SQL, you may have seen this function before. It combines two or more variables into a single column, and is a way to deal with missing data. When you give it a list of variables, it selects the first non-missing value it finds. Because of that, order...
2182 sym R (2670 sym/3 pcs) 2 img
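The "first non-missing value" behavior described in the coalesce teaser can be sketched like this, assuming two hypothetical rating columns and a fallback of 0 (all invented for illustration):

```r
library(dplyr)

# Two hypothetical sources for the same rating, each with gaps
ratings <- tibble(
  my_rating     = c(5, NA, NA),
  friend_rating = c(4, 3, NA)
)

# For each row, take the first non-missing value, reading left to right;
# the trailing 0 is a fallback used only when both columns are missing
ratings <- ratings %>%
  mutate(final_rating = coalesce(my_rating, friend_rating, 0))

print(ratings)
```

Swapping the first two arguments would give `friend_rating` priority instead, which is why the order of the variables matters.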
D is for dummy_cols
For the letter D, I’m going to talk about the dummy_cols function, which isn’t actually part of the tidyverse, but hey: my posts, my rules. This function is incredibly useful for creating dummy variables, which are used in a variety of ways, including multiple regression with categorical variables. When conducting linear regressi...
3094 sym R (1823 sym/2 pcs) 2 img
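A rough sketch of dummy coding with dummy_cols, assuming the function comes from the fastDummies package and using a made-up `genre` column (the data here are hypothetical):

```r
library(fastDummies)

# Hypothetical data with one categorical variable
books <- data.frame(
  title = c("Book A", "Book B", "Book C"),
  genre = c("Fiction", "Nonfiction", "Fiction"),
  stringsAsFactors = FALSE
)

# Add one 0/1 column per category of genre, named genre_<category>
dummies <- dummy_cols(books, select_columns = "genre")

print(dummies)
```

For regression, one category usually has to be left out as the reference group; `remove_first_dummy = TRUE` is one way to do that.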
E is for Exposition Pipe
For the letter E, I want to talk about a set of operators provided by tidyverse (specifically the magrittr package) that makes for much prettier, easier-to-read code: pipes. The main pipe %>% pushes the object to the left of it forward into functions on the right, so that instead of coding f(x), it would be x %>% f(). This lets you ch...
2426 sym R (1616 sym/2 pcs) 2 img
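As a quick sketch of the pipes named in this entry, assuming magrittr is loaded directly and using invented data: the main pipe `%>%` passes the left-hand object into the function on the right, while the exposition pipe `%$%` exposes the columns of a data frame by name to the expression on the right.

```r
library(magrittr)

# Main pipe: x %>% f() is the same as f(x), so calls can be chained
x <- c(1, 4, 9)
sqrt_sum <- x %>% sqrt() %>% sum()   # equivalent to sum(sqrt(x))

# Hypothetical data for the exposition pipe
books <- data.frame(
  pages  = c(100, 200, 300),
  rating = c(3, 4, 5)
)

# Exposition pipe: refer to columns by name in functions with no data argument
pages_rating_cor <- books %$% cor(pages, rating)

print(sqrt_sum)
print(pages_rating_cor)
```

The second call is equivalent to `cor(books$pages, books$rating)`, just without repeating the data frame name.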
F is for filter
For the letter F – filters! Filters are incredibly useful, especially when combined with the main pipe %>%. I frequently use filters along with ggplot functions to chart a specific subgroup or remove missing cases or outliers. As one example, I could use a filter to chart only fiction books from my reading dataset. library(tidyverse...
483 sym R (4502 sym/3 pcs) 2 img 1 tbl
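A minimal sketch of filtering to a subgroup, assuming a made-up reading dataset with a 0/1 `fiction` flag (the data and column names are hypothetical, and the charting step is left out to keep the example short):

```r
library(dplyr)

# Hypothetical reading dataset; fiction is coded 1 = fiction, 0 = nonfiction
books <- tibble(
  title   = c("Book A", "Book B", "Book C"),
  fiction = c(1, 0, 1),
  pages   = c(320, 150, NA)
)

# Keep only the fiction books
fiction_only <- books %>%
  filter(fiction == 1)

# Conditions can be combined, e.g. fiction books with a known page count
fiction_complete <- books %>%
  filter(fiction == 1, !is.na(pages))

print(fiction_only)
print(fiction_complete)
```

Either result could then be piped on into a plotting call so the chart only shows the chosen subgroup.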
G is for group_by
For the letter G, I’d like to introduce a very useful function: group_by. This function lets you group data by one or more variables. By itself, it may not seem very useful, but it’s great when you start manipulating and summarizing data. That’s because many of the functions applied to data after you use group_by are done group...
2355 sym R (2499 sym/3 pcs) 2 img
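To show why group_by pays off once you start summarizing, here is a small sketch assuming a made-up dataset with a `genre` column (all names and values are hypothetical):

```r
library(dplyr)

# Hypothetical reading dataset
books <- tibble(
  title = c("Book A", "Book B", "Book C"),
  genre = c("Fiction", "Fiction", "Nonfiction"),
  pages = c(300, 200, 400)
)

# After group_by, summarise computes each statistic once per genre
by_genre <- books %>%
  group_by(genre) %>%
  summarise(
    n_books    = n(),
    mean_pages = mean(pages)
  )

print(by_genre)
```

Without the group_by step, the same summarise call would return a single row covering the whole dataset.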