Publications by INWT-Blog-RBloggers

INWT’s guidelines for R code

25.01.2018

“It turns out that style matters in programming for the same reason that it matters in writing. It makes for better reading.” (Douglas Crockford, JavaScript: The Good Parts) Why do we need yet another style guide? “The reason to care about a style guide is just one thing: We want that our source code is not only interpretable by a comp...

6407 sym R (1364 sym/1 pcs)

Introducing the Kernelheaping Package

06.02.2018

In this blog article, I’d like to introduce univariate kernel density estimation for heaped (i.e., rounded or interval-censored) data with the Kernelheaping package. Interval-censored data are not unusual, for example in income surveys, where values are grouped for anonymisation or simplification. However, a simple task like plotting a density may fail...

2793 sym R (2018 sym/6 pcs) 8 img

smoothScatter with ggplot2

05.03.2018

The motivation for this plot is the function graphics::smoothScatter, which is basically a plot of a two-dimensional density estimator. In the following, I want to reproduce its features with ggplot2. smoothScatter: To have some data, I draw some random numbers from a two-dimensional normal distribution: library(ggplot2) library(MA...

1900 sym R (1045 sym/6 pcs) 8 img
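As a rough illustration of the idea in the smoothScatter entry above, the following sketch draws a two-dimensional density estimate with ggplot2. It is not the code from the post: the simulated data, the colour scale, and the names `dat` and `p` are assumptions made for this example, and it relies on `after_stat()`, which requires ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Simulated data (an assumption for this sketch): two correlated normal variables
set.seed(1)
n <- 1000
x <- rnorm(n)
dat <- data.frame(x = x, y = 0.5 * x + rnorm(n))

# Two-dimensional kernel density estimate drawn as a raster,
# loosely resembling what graphics::smoothScatter(dat$x, dat$y) shows
p <- ggplot(dat, aes(x, y)) +
  stat_density_2d(
    aes(fill = after_stat(density)),
    geom = "raster",
    contour = FALSE
  ) +
  scale_fill_gradient(low = "white", high = "steelblue") +
  theme_minimal()

# Printing p in an interactive session renders the plot
```

The raster geometry fills every grid cell with the estimated density, which is what gives the smooth look; overlaying `geom_point()` for outlying observations would be one way to get closer to the base R function.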

Design Patterns in R

04.04.2018

These notes are inspired by a talk by Stuart Sierra on Design Patterns in Functional Programming and some thoughts I found on F# for fun and profit, and are a reflection on how I use different strategies to solve things...

10053 sym R (4429 sym/18 pcs)

Introducing the Kernelheaping Package II

13.07.2018

In the first part of Introducing the Kernelheaping Package (https://www.inwt-statistics.com/read-blog/introducing-the-kernelheaping-package-512.html), I showed how to compute and plot kernel density estimates on rounded or interval censored data using the Kernelheapin...

614 sym R (4574 sym/3 pcs) 4 img

Do GPU-based Basic Linear Algebra Subprograms (BLAS) improve the performance of standard modeling techniques in R?

06.08.2018

Introduction: The speed or run-time of models in R can be a critical factor, especially considering the size and complexity of modern datasets. The number of data points as well as the number of features can easily be in the millions. Even relatively trivial modeling procedures can consume a lot of time, which is critical both for optimization an...

7998 sym R (197 sym/4 pcs) 6 img

Introducing the Kernelheaping Package III

25.09.2018

In the second part of this blog series, I showed how to compute spatial kernel density estimates based on area-level data. The Kernelheaping package also supports boundary-corrected kernel density estimation, which allows us to exclude areas where we know that the density must be zero. One example is estimating the population density wh...

3918 sym R (8504 sym/16 pcs) 6 img

Optimize your R Code using Memoization

11.10.2018

This article describes how you can apply a programming technique called memoization to speed up your R code and resolve performance bottlenecks. Wikipedia says: “In computing, […] memoisation is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached res...

8438 sym R (3278 sym/7 pcs)
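The memoization technique described in the entry above can be sketched in a few lines of base R. This is a hand-rolled illustration, not the approach taken in the post: the helper `memoize()` and the functions `slow_square()` and `fast_square()` are hypothetical names invented for this example, and the cache only handles functions of a single scalar argument.

```r
# Hypothetical helper: wrap a one-argument function with a result cache
memoize <- function(f) {
  cache <- new.env(parent = emptyenv())
  function(x) {
    key <- as.character(x)
    # Compute and store the result only if this input has not been seen yet
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(x), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

# Stand-in for an expensive computation; counts how often it really runs
calls <- 0
slow_square <- function(x) {
  calls <<- calls + 1
  Sys.sleep(0.1)
  x^2
}

fast_square <- memoize(slow_square)
first  <- fast_square(4)  # computed by slow_square
second <- fast_square(4)  # served from the cache
```

The second call returns the cached value without invoking `slow_square()` again. Memoization of this kind is only safe for functions whose result depends on nothing but their arguments.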

Cluster Analysis – Part 1: Introduction

06.11.2018

What is Cluster Analysis? Cluster analysis is a collective term for various algorithms to find group structures in data. The groups are called clusters and are usually not known a priori. In contrast, classification procedures assign the observations to already known groups (e.g., buyers and non-buyers). A classification is often performed with ...

6229 sym R (492 sym/3 pcs) 4 img
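To make the distinction in the entry above concrete, here is a minimal sketch of a cluster analysis on data where the groups are not passed to the algorithm. The simulated data, the choice of hierarchical clustering with complete linkage, and the names `obs`, `tree`, and `clusters` are assumptions for this example rather than anything taken from the post.

```r
# Two well-separated groups of 20 two-dimensional observations each
set.seed(42)
group_a <- matrix(rnorm(40, mean = 0), ncol = 2)
group_b <- matrix(rnorm(40, mean = 20), ncol = 2)
obs <- rbind(group_a, group_b)

# Hierarchical clustering on Euclidean distances; no group labels are supplied
tree <- hclust(dist(obs), method = "complete")

# Cut the dendrogram into two clusters and inspect the cluster sizes
clusters <- cutree(tree, k = 2)
table(clusters)
```

Because the two groups are far apart, the algorithm recovers them from the distances alone. A classification procedure, by contrast, would need the group labels as input for training.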

Cluster Analysis – Part 2: Hands On

21.11.2018

library(broom) library(cluster) library(dplyr) library(ggplot2) library(ggdendro) In the first part of this blog series, we examined the theoretical foundations of cluster analysis. In the following article we put the theory into practice using R. For the analysis in R, we will use the variables mpg (fuel consumption in mile...

4260 sym R (2852 sym/12 pcs) 12 img