Publications by Florian Privé

Whether to use a data frame in R?

19.07.2018

In this post, I try to show you in which situations using a data frame is appropriate, and in which it’s not. Learn more with the Advanced R book. What is a data frame? A data frame is just a list of vectors of the same length, each vector being a column. This may convince you: str(iris) ## 'data.frame': 150 obs. of 5 variables: ## $ Sep...

2238 sym R (2695 sym/5 pcs)

Fast R functions to get first principal components

29.08.2018

In this post, I compare different approaches to get first principal components of large matrices in R. Comparison library(bigstatsr) library(tidyverse) Data # Create two matrices, one with some structure, one without n <- 20e3 seq_m <- c(1e3, 3e3, 10e3) sizes <- seq_along(seq_m) X <- E <- list() for (i in sizes) { m <- seq_m[i] U <- matrix(...

1898 sym R (4570 sym/8 pcs) 2 img

Predicting height based on DNA mutations

07.10.2018

In this post, I show some results of predicting height based on DNA mutations. This analysis aims at reproducing the analysis of this paper using my own analysis tools in R. I use a new dataset composed of 500,000 adults from the UK, genotyped at hundreds of thousands of DNA positions. This dataset is called the UK Biobank, and also provides some ...

2895 sym R (298 sym/1 pcs) 6 img

Choosing hyper-parameters in penalized regression

22.11.2018

In this post, I’m evaluating some ways of choosing hyper-parameters (\(\alpha\) and \(\lambda\)) in penalized linear regression. The same principles can be applied to other types of penalized regressions (e.g. logistic). Model In penalized linear regression, we find regression coefficients \(\hat{\beta}_0\) and \(\hat{\beta}\) that minimize th...

5509 sym R (3004 sym/3 pcs) 4 img

Using clustering to find points in an image

26.11.2018

In this post, I present my new package {img2coord}. This package can be used to retrieve coordinates from a scatter plot (as an image). devtools::install_github("privefl/img2coord") Have you ever made a plot, saved it as a png and moved on? When you come back to it, it is sometimes difficult to read the values from this plot, especially if there ...

3207 sym R (6597 sym/20 pcs) 30 img

Detecting outlier samples in PCA

21.08.2019

In this post, I present something I am currently investigating (feedback welcome!) and that I am implementing in my new package {bigutilsr}. This package can be used to detect outlier samples in Principal Component Analysis (PCA). remotes::install_github("privefl/bigutilsr") library(bigutilsr) I present three different statistics of outlierness ...

7919 sym R (3930 sym/29 pcs) 40 img