Publications by kjytay

Using the scales package to change alpha in base R plots

17.08.2018

When you are plotting many points on one plot, changing the transparency, or alpha, of the points is often a good idea. For example, the plot below shows price vs. carat for each of 53,940 diamonds: library(ggplot2) ggplot(diamonds, aes(x = carat, y = price)) + geom_point() We get a mess/mass of points! We can see the general trend betwee...

1447 sym R (299 sym/3 pcs) 6 img
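
As a minimal sketch of the idea in the title, assuming the scales package is installed: scales::alpha() takes a colour and an opacity between 0 and 1 and returns a colour that base R's plot() accepts. The simulated data and the 0.1 opacity are illustrative choices, not taken from the post.

```r
library(scales)  # provides alpha(colour, alpha)

set.seed(1)
# Simulated stand-in for a crowded scatterplot (assumption: not the diamonds data)
n <- 5000
x <- rexp(n)
y <- 3 * x + rnorm(n)

# Black at 10% opacity, so regions with many overlapping points look darker
point_col <- alpha("black", 0.1)

plot(x, y, col = point_col, pch = 16,
     xlab = "x", ylab = "y", main = "Semi-transparent points in base R")
```

Lower opacity values suit denser plots; the right value is a judgment call that depends on how many points overlap.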

Exploring point distribution in the English Premier League

17.08.2018

I recently got hold of the table standings for the English Premier League (EPL) for the past 10 years. In this post, I want to explore the question: How similar are the point distributions across seasons? Code for the figures in this post can be found here. The first plot I made was a histogram for each Season: The default setting of 30 bins...

1729 sym 8 img

Different winners under different criteria

21.08.2018

A few posts ago (see here), I noted that there was a group of 7 teams in the English Premier League (EPL) that seems to be a cut above the rest: Arsenal, Chelsea, Everton, Liverpool, Manchester City, Manchester United and Tottenham Hotspur. The team logos. (Photo credit: kadanewsmag.com.ng) Who is the best among these 7 teams? In thinking about this quest...

2402 sym R (1963 sym/11 pcs) 4 img

Subsetting in the presence of NAs

06.10.2018

In R, we can subset a data frame df easily by putting the conditional in square brackets after df. For example, if I want all the rows in df which have value equal to 1 in the column colA, all I have to do is df[df$colA == 1, ] Recently, I realized that this approach can be problematic when there are NAs present in the data! For example, let df...

1220 sym R (445 sym/5 pcs)
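
To illustrate the pitfall described above, here is a small sketch using a made-up data frame (the values and the second column, colB, are assumptions for illustration). When the condition evaluates to NA, logical subsetting returns a row filled with NAs instead of dropping it. Wrapping the condition in which(), or testing is.na() explicitly, are two common ways to avoid this; the post itself may recommend a different fix.

```r
# Hypothetical data frame with one missing value in colA
df <- data.frame(colA = c(1, NA, 2, 1),
                 colB = c("a", "b", "c", "d"))

# Naive subsetting: the NA comparison yields NA, which produces an all-NA row
naive <- df[df$colA == 1, ]

# which() keeps only the positions where the condition is TRUE
safe_which <- df[which(df$colA == 1), ]

# Equivalent alternative: rule out NAs explicitly in the condition
safe_isna <- df[!is.na(df$colA) & df$colA == 1, ]

print(nrow(naive))       # includes the spurious all-NA row
print(nrow(safe_which))  # only the rows where colA is 1
```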

Obtaining the number of components from cross validation of principal components regression

14.10.2018

Principal components (PC) regression is a common dimensionality reduction technique in supervised learning. The R lab for PC regression in James et al.’s Introduction to Statistical Learning is a popular intro for how to perform PC regression in R: it is on p256-257 of the book (p270-271 of the PDF). As in the lab, the code below runs PC regre...

2240 sym R (1336 sym/4 pcs) 4 img
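
The following is a rough sketch of one way to read off the cross-validated number of components, assuming the pls package is installed. It uses the built-in mtcars data instead of the dataset from the lab, and picks the component count with the smallest CV error; this is an illustrative choice and not necessarily the approach the post takes.

```r
library(pls)  # provides pcr() and RMSEP()

set.seed(1)  # CV segments are random, so fix the seed for reproducibility

# PC regression of mpg on all other columns, with 10-fold CV (the pls default)
fit <- pcr(mpg ~ ., data = mtcars, scale = TRUE, validation = "CV")

# RMSEP()$val is an array indexed by estimate x response x model,
# where the first model is the intercept-only fit (0 components)
cv_rmsep <- RMSEP(fit, estimate = "CV")$val[1, 1, ]

# Subtract 1 because position 1 corresponds to 0 components
best_ncomp <- unname(which.min(cv_rmsep)) - 1
print(best_ncomp)
```

Taking the strict minimum tends to favor larger models; a one-standard-error rule is a more conservative alternative.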

Getting started Stamen maps with ggmap

25.10.2018

Spatial visualizations really come to life when you have a real map as a background. In R, ggmap is the package that you’ll want to use to get these maps. In what follows, we’ll demonstrate how to use ggmap with the Sacramento dataset in the caret package. For each city, we are going to keep track of the median price of houses in that city, t...

3118 sym R (2004 sym/9 pcs) 14 img

A deep dive into glmnet: penalty.factor

13.11.2018

The glmnet function (from the package of the same name) is probably the most used function for fitting the elastic net model in R. (It also fits the lasso and ridge regression, since they are special cases of elastic net.) The glmnet function is very powerful and has several function options that users may not know about. In a series of posts, I ...

2049 sym R (1169 sym/5 pcs) 14 img
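
As a hedged sketch of what penalty.factor does, assuming the glmnet package is installed: each entry multiplies the penalty applied to the corresponding coefficient, so a value of 0 leaves that variable unpenalized and always in the model. The simulated data and the choice of which variable to exempt are assumptions for illustration.

```r
library(glmnet)

set.seed(1)
n <- 100
p <- 5
x <- matrix(rnorm(n * p), n, p)
y <- 2 * x[, 1] + x[, 2] + rnorm(n)

# Exempt the first variable from the penalty; penalize the rest as usual.
# (glmnet internally rescales the penalty factors to sum to the number of variables.)
pf <- c(0, rep(1, p - 1))
fit <- glmnet(x, y, penalty.factor = pf)

# Rows: intercept then the 5 variables; columns: one per lambda value
beta <- as.matrix(coef(fit))

# The unpenalized variable has a nonzero coefficient along the whole path
print(all(beta[2, ] != 0))
```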

A deep dive into glmnet: standardize

15.11.2018

I’m writing a series of posts on various function options of the glmnet function (from the package of the same name), hoping to give more detail and insight beyond R’s documentation. In this post, we will focus on the standardize option. For reference, here is the full signature of the glmnet function: glmnet(x, y, family=c("gaussian","binomi...

3089 sym R (2021 sym/8 pcs) 54 img

Scraping NBA game data from basketball-reference.com

11.12.2018

I’m a casual NBA fan: I don’t have time to watch the games but enjoy viewing the highlights on Instagram/YouTube (especially Shaqtin’ A Fool!); I sometimes read game articles and analyses (e.g. Blogtable). Apart from the game being an amazing visual spectacle, it’s fun to drink in the deluge of stats that each game brings. I’m not even ...

5147 sym R (4528 sym/11 pcs) 6 img

Recreating the NBA lead tracker graphic

13.12.2018

For each NBA game, nba.com has a really nice graphic which tracks the point differential between the two teams throughout the game. Here is the lead tracker graphic for the game between the LA Clippers and the Phoenix Suns on 10 Dec 2018: Taken from https://www.nba.com/games/20181210/LACPHX#/matchup I thought it would be cool to try recreating th...

4640 sym R (4293 sym/9 pcs) 28 img