Publications by R on Data & The World

Matrix to LaTeX

15.08.2020

I recently had to work through some matrix operations in R and then write up the results in LaTeX. Formatting R output for LaTeX isn’t particularly hard, but it’s tedious, and because the output has a regular structure it seemed easy to automate. So I decided to try it in R, Python, and Julia. Matrices in LaTeX ...

4172 sym R (1562 sym/9 pcs)
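As a rough illustration of the kind of formatter the excerpt above describes, here is a minimal Python sketch. The function name `matrix_to_latex` and the default `bmatrix` environment are assumptions made for this example and are not taken from the post itself.

```python
def matrix_to_latex(rows, env="bmatrix"):
    """Format a list of equal-length rows as a LaTeX matrix environment.

    Hypothetical helper for illustration; not the post's own code.
    """
    if not rows or any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("rows must be non-empty and rectangular")
    # Cells in a row are joined with ' & '; every row ends with ' \\'.
    body = "\n".join(
        "  " + " & ".join(str(x) for x in r) + r" \\" for r in rows
    )
    return f"\\begin{{{env}}}\n{body}\n\\end{{{env}}}"


print(matrix_to_latex([[1, 2], [3, 4]]))
```

Passing `env="pmatrix"` would switch to round brackets; the `bmatrix` and `pmatrix` environments require the `amsmath` package on the LaTeX side.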

Hotelling’s T^2 in Julia, Python, and R

14.10.2020

The t-test is a common, reliable way to check for differences between two samples. When dealing with multivariate data, one can simply run t-tests on each variable and see if there are differences. However, this can lead to scenarios where individual t-tests suggest that there is no difference, although looking at all variables jointly will show a diffe...

5776 sym R (2512 sym/3 pcs) 2 img
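For reference, the standard two-sample form of the statistic named in the title can be written as follows. The notation is an assumption chosen for this note, not the post's own: \(n_1, n_2\) are the sample sizes, \(p\) the number of variables, \(\bar{x}_1, \bar{x}_2\) the sample mean vectors, and \(S_1, S_2\) the sample covariance matrices. The F distribution holds under the null hypothesis of equal means, assuming multivariate normality and a common covariance matrix.

```latex
\[
T^2 = \frac{n_1 n_2}{n_1 + n_2}\,
      (\bar{x}_1 - \bar{x}_2)^\top S_p^{-1} (\bar{x}_1 - \bar{x}_2),
\qquad
S_p = \frac{(n_1 - 1) S_1 + (n_2 - 1) S_2}{n_1 + n_2 - 2}
\]
\[
\frac{n_1 + n_2 - p - 1}{p\,(n_1 + n_2 - 2)}\; T^2 \;\sim\; F_{p,\; n_1 + n_2 - p - 1}
\]
```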

LDA vs QDA vs Logistic Regression

28.11.2020

There are plenty of methods to choose from for classification problems, all with their own strengths and weaknesses. This post will try to compare three of the more basic ones: linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and logistic regression. Theory: LDA and QDA Both LDA and QDA result from the same ideas, apart ...

8965 sym R (4703 sym/6 pcs) 4 img

Partial Regression Plots in Julia, Python, and R

13.03.2021

Partial regression plots (also called added variable plots, among other things) are a type of diagnostic plot for multiple linear regression models. More specifically, they attempt to show the effect of adding a new variable to an existing model by controlling for the effect of the predictors already in use. They’re useful for spottin...

3423 sym R (1641 sym/6 pcs) 10 img
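The idea in the excerpt above can be sketched numerically. The snippet below is a minimal illustration with made-up toy data and one existing predictor; the helper names `simple_ols` and `residuals` are hypothetical. It builds the two residual series that a partial regression plot would put on its axes.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def residuals(x, y):
    """Residuals of y after regressing it on x (with an intercept)."""
    a, b = simple_ols(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Toy data (made up): y depends exactly on x1 and x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [1.0 + 2.0 * a + 3.0 * b for a, b in zip(x1, x2)]

# Points of the partial regression plot for x2, controlling for x1.
res_x2 = residuals(x1, x2)  # horizontal axis: x2 with x1 removed
res_y = residuals(x1, y)    # vertical axis: y with x1 removed

_, partial_slope = simple_ols(res_x2, res_y)
print(round(partial_slope, 6))
```

In this toy setup the slope through the residual points recovers the coefficient of `x2` in the full model (3 here, up to floating-point error), which is the property that makes the plot a useful diagnostic.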

Booleans & NAs

06.05.2021

Missing values are inevitable in data science, and handling them is a constant issue. In Boolean logic, a missing value can behave quite differently depending on the order of the arguments and exactly how the expression is set up, which is not the case for many other data types. Whether this is useful or not depends on the scenario, but the behavior is something to keep in min...

2512 sym Python (677 sym/4 pcs)
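A small sketch of the order dependence mentioned above, using only built-in Python. It assumes `None` and a float NaN as stand-ins for a missing value; whether the post uses these or library-specific NA types is not visible from the excerpt.

```python
# `and` / `or` return one of their operands, so the position of a
# missing value (None here) changes the result.
print(None or True)    # True
print(True or None)    # True
print(None or False)   # False
print(False or None)   # None
print(None and False)  # None
print(False and None)  # False

nan = float("nan")     # NaN is truthy, unlike None
print(bool(nan))       # True
print(nan and False)   # False
print(False or nan)    # nan
```

The asymmetry comes from short-circuit evaluation: `or` returns its first truthy operand (or the last operand if none is truthy), and `and` returns its first falsy operand (or the last operand if all are truthy).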

The Four Pipes of magrittr

06.09.2021

The magrittr package is a part of the extended tidyverse, i.e., not one of the packages normally loaded. It is the one that supplies the pipe operator (%>%), but the package actually contains four pipe operators in total. All are intended to streamline and improve the readability of code, though the three non-basic ones are a bit...

4725 sym R (1669 sym/9 pcs) 4 img

Examination of the K-Means Broken-Line Method

10.09.2021

I recently encountered a 2018 paper called “The next-generation \(k\)-means algorithm”. It proposes and compiles advancements and theoretical justifications for \(k\)-means and \(k\)-medians clustering. One part that caught my eye was the proposed “broken-line algorithm” for finding the optimal number of clusters in \(k\)-means. Though it...

10424 sym R (5977 sym/7 pcs) 22 img

Markov Transition (Animated) Plots

06.11.2021

This is a quick post on animating how the transition matrix of a Markov chain changes between larger time steps, as well as showing the probability of the chain being in any specified state over time. This post uses the tidyverse, along with gganimate.
> library(tidyverse)
> library(magrittr) ## using some aliases not loaded by default ...

7160 sym R (3014 sym/14 pcs) 12 img
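The quantities the excerpt above refers to can be sketched without any plotting. In this minimal Python example the two-state matrix `P`, the starting distribution, and the helper names are all hypothetical: the n-step transition matrix is the matrix power of `P`, and the state probabilities after n steps are the starting distribution multiplied by that power.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_pow(p, n):
    """n-step transition matrix P^n for a non-negative integer n."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)]
              for i in range(size)]  # identity matrix = zero steps
    for _ in range(n):
        result = mat_mul(result, p)
    return result


# Hypothetical two-state chain; each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]
start = [[1.0, 0.0]]  # start in state 0 with certainty

for n in (1, 2, 10):
    dist = mat_mul(start, mat_pow(P, n))[0]
    print(n, [round(x, 4) for x in dist])
```

Repeated multiplication is enough for a short illustration; for large `n`, exponentiation by squaring or a linear algebra library would be the more usual choice.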

Binary Missing Value Imputation

20.11.2021

A few datasets that I’ve seen have come with several different columns representing binary responses to questions. Naturally, there are missing values scattered throughout, so some amount of imputation had to occur. I decided to try coding up a way to do this by picking the mode of rows that were as similar as possible to the row with missing v...

5799 sym R (3380 sym/5 pcs)
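One way to read the approach described above is as a nearest-neighbour mode imputation. The Python sketch below is only an interpretation under stated assumptions: the excerpt is cut off, so the distance measure (share of mismatches over columns observed in both rows), the neighbour count `k`, the tie handling, and the function name `impute_binary` are all choices made for this example rather than the post's method.

```python
from collections import Counter


def impute_binary(rows, target, col, k=3):
    """Guess rows[target][col] from the k most similar rows.

    Only rows with an observed value in `col` are considered. Similarity
    is the share of mismatches over the other columns that are observed
    (not None) in both rows. All of this is an assumed design.
    """
    def distance(a, b):
        shared = [(x, y) for i, (x, y) in enumerate(zip(a, b))
                  if i != col and x is not None and y is not None]
        if not shared:
            return float("inf")  # nothing to compare on
        return sum(x != y for x, y in shared) / len(shared)

    candidates = [r for i, r in enumerate(rows)
                  if i != target and r[col] is not None]
    nearest = sorted(candidates,
                     key=lambda r: distance(rows[target], r))[:k]
    # Mode of the neighbours' values; ties go to the value seen first.
    return Counter(r[col] for r in nearest).most_common(1)[0][0]


data = [
    [1, 0, 1, 1],
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 1, None],  # value to impute
]
print(impute_binary(data, target=5, col=3))
```

The sketch raises an `IndexError` if no other row has an observed value in the target column; a fuller version would need a fallback for that case.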