Publications by Slawa Rokicki
The infamous apply function
For R beginners, the apply() function seems like a secret doorway into programming bliss. It seems so powerful, and yet, beyond reach. For those just starting out, examples of how to use apply() can really help with the intuition of how to harness its power. Here are some great ways to use apply() that can really help make R programming enjo...
3799 sym 8 img
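As a rough illustration of the apply() idea teased above, here is a minimal sketch. The matrix m and its values are made up for this example and do not come from the original post.

```r
# Hypothetical 3x2 matrix, invented for illustration only
m <- matrix(1:6, nrow = 3, ncol = 2)

# MARGIN = 1 applies the function to each row
row_sums <- apply(m, 1, sum)

# MARGIN = 2 applies the function to each column
col_means <- apply(m, 2, mean)

print(row_sums)
print(col_means)
```

The second argument is the part beginners tend to trip over: 1 means "by row" and 2 means "by column".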
From continuous to categorical
During data analysis, it is often super useful to turn continuous variables into categorical ones. In Stata you would do something like this:
gen catvar=0
replace catvar=1 if contvar>0 & contvar<=3
replace catvar=2 if contvar>3 & contvar<=5
etc. And then you would label your values like so:
label define agelabel 0 "0" 1 "1-3" 2 "3-5...
2696 sym 6 img
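One possible R counterpart to the Stata recoding above is cut(). This is only a sketch: the teaser is truncated, so the original post may take a different route, and the contvar values here are invented. Labels are left at cut()'s defaults rather than guessing at the truncated Stata labels.

```r
# Hypothetical continuous variable, invented for illustration
contvar <- c(0.5, 2, 3, 4.2, 5)

# Intervals are right-closed by default: (0,3] and (3,5]
# Values outside the breaks would become NA
catvar <- cut(contvar, breaks = c(0, 3, 5))

levels(catvar)      # the interval labels R generates
as.integer(catvar)  # underlying codes: 1 for (0,3], 2 for (3,5]
table(catvar)
```

The right-closed intervals line up with the Stata conditions contvar>0 & contvar<=3 and contvar>3 & contvar<=5.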
Quick and Easy Subsetting
Public health datasets can be enormous and difficult to look at. Often it is great to be able to only look at specific parts of the dataset, or to only run analysis on a specific part of a dataset. There are two ways that you can subset a dataset in R: using the subset() function, or using matrix indexing. The first way may sound easier, but the seco...
3371 sym 4 img
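The two subsetting approaches named above can be sketched side by side. The data frame mydata and its columns are hypothetical, made up for this example.

```r
# Hypothetical dataset, invented for illustration
mydata <- data.frame(
  id  = 1:5,
  age = c(23, 35, 41, 29, 52)
)

# Way 1: the subset() function
older_a <- subset(mydata, age > 30)

# Way 2: matrix-style indexing, [rows, columns]
# Leaving the column slot empty keeps all columns
older_b <- mydata[mydata$age > 30, ]

print(older_a)
print(older_b)
```

With no missing values, both return the same rows. They differ when the condition evaluates to NA: subset() drops those rows, while indexing returns them as rows of NAs.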
Summarizing Data
In this post, I’ll go over four functions that you can use to nicely summarize your data. Before any regression analysis, a descriptive analysis is key to understanding your variables and the relationships between them. Next week, I’ll have a post on plotting, so this post is limited to the summary(), table(), and aggregate() ...
3830 sym 18 img
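A minimal sketch of the three functions named in the teaser above, using a hypothetical data frame (the group and score columns are invented, not the post's data):

```r
# Hypothetical dataset, invented for illustration
mydata <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  score = c(10, 14, 9, 11, 13)
)

# Five-number summary plus the mean of a numeric column
print(summary(mydata$score))

# Frequency count of each category
counts <- table(mydata$group)
print(counts)

# Mean score within each group, using the formula interface
means <- aggregate(score ~ group, data = mydata, FUN = mean)
print(means)
```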
What a nice looking scatterplot!
This week, we look at plotting data using scatterplots. I’ll definitely have a post on other ways of plotting data, like boxplots or histograms. Our data from last week remains the same: First, a quick way to look at all of your continuous variables at once is just to do a plot command of your data. Here, I will subset the data to just take thr...
6939 sym 12 img
Getting data in and out of R
One of the great advantages of R is that it recognizes almost any data format that you can throw at it. There are a myriad of different possible file formats but I’ll concentrate on the four file types that we see almost exclusively in public health: Excel files, Stata .dta files, SAS transport files, and sas7bdat files. Luckily, there are a f...
4208 sym 4 img
Data types, part 1: Ways to store variables
I’ve been alluding to different R data types, or classes, in various posts, so I want to go over them in more detail. This is part 1 of a 3-part series on data types. In this post, I’ll describe and give a general overview of useful data types. In parts 2 and 3, I’ll show you in more detailed examples how you can use these data types to y...
4785 sym 20 img 1 tbl
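For orientation, here is a small sketch of the kinds of objects this series covers (scalars, vectors, matrices, dataframes, and lists). All names and values are hypothetical and chosen only for illustration.

```r
# All objects below are invented for illustration
x   <- 5                                        # a "scalar" is just a length-1 vector
v   <- c(1, 2, 3)                               # numeric vector
m   <- matrix(1:4, nrow = 2)                    # 2x2 matrix
df  <- data.frame(id = 1:2, score = c(90, 85))  # dataframe
lst <- list(v, m, df)                           # a list can hold objects of mixed types

# class() reports how R is storing each object
classes <- c(class(x), class(v), class(df), class(lst))
print(classes)
print(is.matrix(m))
```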
Data types part 2: Using classes to your advantage
Last week I talked about objects including scalars, vectors, matrices, dataframes, and lists. This post will show you how to use the objects (and their corresponding classes) you create in R to your advantage. First off, it’s important to remember that columns of dataframes are vectors. That is, if I have a dataframe called mydata, the colum...
5965 sym 14 img
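The point that columns of dataframes are vectors can be checked directly. In this sketch the dataframe name mydata follows the teaser, while the age column and its values are hypothetical.

```r
# Hypothetical contents for mydata, invented for illustration
mydata <- data.frame(age = c(23, 35, 41))

# Pulling out a column with $ gives an ordinary vector
age_is_vector <- is.vector(mydata$age)
age_class     <- class(mydata$age)

# So any function that works on vectors works on the column
age_mean <- mean(mydata$age)

print(age_is_vector)
print(age_class)
print(age_mean)
```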
Data types, part 3: Factors!
In this third part of the data types series, I’ll go over an important class that I have skipped so far: factors. Factors are categorical variables that are super useful in summary statistics, plots, and regressions. They basically act like dummy variables that R codes for you. So, let’s start off with some data: and let’s check out what kinds o...
4462 sym 22 img
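A minimal sketch of what a factor looks like under the hood. The sex variable here is made up for illustration and is not the post's data.

```r
# Hypothetical categorical variable, invented for illustration
sex <- factor(c("male", "female", "female", "male", "female"))

# Levels are sorted alphabetically by default
sex_levels <- levels(sex)

# Each observation is stored as an integer code pointing at a level
sex_codes <- as.integer(sex)

print(sex_levels)
print(sex_codes)
print(table(sex))
```

Because the levels sort alphabetically, "female" gets code 1 and "male" gets code 2, regardless of the order in which they appear in the data.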
Data types part 4: Logical class
First, an update: A commenter has asked me to post my code so that it is easier to practice the examples I show here. It will take me a little bit of time to get all of my code for past posts well-documented and readable, but I have uploaded the code and data for the last 4 posts, including this one, here: Code and Data download site. Unfortun...
5508 sym 22 img