Publications by R Views

Momentum Investing with R

28.05.2019

After an extended hiatus, Reproducible Finance is back! We’ll celebrate by changing focus a bit and coding up an investment strategy called Momentum. Before we even tiptoe in that direction, please note that this is not intended as investment advice and it’s not intended to be a script that can be implemented for trading. The goal is to explo...

12671 sym R (11895 sym/23 pcs) 6 img

April 2019: “Top 40” New CRAN Packages

29.05.2019

One hundred eighty-seven new packages made it to CRAN in April. Here are my picks for the “Top 40”, organized into ten categories: Biotechnology, Data, Econometrics, Machine Learning, Medicine, Science, Statistics, Time Series, Utilities, and Visualization. Biotechnology genpwr v1.00: Provides functions for power and sample size calculations...

11102 sym 44 img

Introducing DeclareDesign, a Platform for Research Design

03.06.2019

Graeme Blair is an Assistant Professor of Political Science at UCLA. Jasper Cooper is a Postdoctoral Research Associate at the Kahneman-Treisman Center for Behavioral Science and Public Policy at Princeton University. Alexander Coppock is an Assistant Professor of Political Science at Yale University. Macartan Humphreys is a Professor of Politica...

10128 sym R (3486 sym/10 pcs) 2 img 1 tbl

reticulate, virtualenv, and Python in Linux

09.06.2019

Roland Stevenson is a data scientist and consultant who may be reached on LinkedIn. reticulate is an R package that allows us to use Python modules from within RStudio. I recently found this functionality useful while trying to compare the results of different uplift models. Though I did have R’s uplift package producing Qini charts and metrics...

4064 sym R (1632 sym/4 pcs) 4 img

Equal Size kmeans

12.06.2019

We were recently presented with a problem where the decision maker wanted to understand how their data would naturally group together. The classic technique of k-means clustering was a natural choice; it’s well known, computationally efficient, and implemented in base R via the kmeans() function. Our problem has a slight wrinkle: the decision m...

5171 sym R (2239 sym/7 pcs) 4 img

A Gentle Introduction to tidymodels

18.06.2019

Recently, I had the opportunity to showcase tidymodels in workshops and talks. Because of my vantage point as a user, I figured it would be valuable to share what I have learned so far. Let’s begin by framing where tidymodels fits in our analysis projects. The diagram above is based on the R for Data Science book, by Wickham and Grolemund. The ...

9931 sym R (6403 sym/20 pcs) 8 img

May 2019: “Top 40” New CRAN Packages

24.06.2019

Two hundred twenty-two new packages made it to CRAN in May, and it was more of an effort than usual to select the “Top 40”. Nevertheless, here they are in nine categories: Computational Methods, Data, Machine Learning, Mathematics, Medicine, Science, Statistics, Utilities, and Visualization. Computational Methods dde v1.0.0: Implements a Dorma...

9846 sym 50 img

Imagine your Data Before You Collect It

30.06.2019

As data scientists, we are often presented with a dataset and are asked to use it to produce insights. We use R to wrangle, visualize, model, and produce tables and plots for sharing or publication. When we focus on the data in hand in this way, we don’t get to consider where the data came from. The sample size and the set of variables and thei...

5225 sym R (1949 sym/7 pcs) 4 img 3 tbl

Dividend Sleuthing with R

08.07.2019

Welcome to a mid-summer edition of Reproducible Finance with R. Today, we’ll explore the dividend histories of some stocks in the S&P 500. By way of history for all you young tech IPO and crypto investors out there: way back, a long time ago in the dark ages, companies used to take pains to generate free cash flow and then return some of that f...

9659 sym R (11284 sym/19 pcs) 12 img

Three Strategies for Working with Big Data in R

16.07.2019

For many R users, it’s obvious why you’d want to use R with big data, but not so obvious how. In fact, many people (wrongly) believe that R just doesn’t work very well for big data. In this article, I’ll share three strategies for thinking about how to use big data in R, as well as some examples of how to execute each of them. By default ...

10997 sym R (4809 sym/12 pcs) 8 img