Publications by Bogumił Kamiński
Comparing Banzhaf and Shapley-Shubik power indices
Last week I analyzed the Shapley-Shubik power index in R. I got several requests to write code calculating the Banzhaf power index. Here is the proposed code. Again I use data from the Warsaw School of Economics rector elections (the details are in my last post). I give the code for calculating the Shapley-Shubik and Banzhaf power indices below. # Constituenc...
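The post's own code is cut off in this excerpt. As a rough illustration of how the two indices can be computed side by side, here is a minimal sketch for a small weighted voting game. The function name `power_indices`, the weights and the quota are hypothetical and are not taken from the rector election data.

```r
# Minimal sketch (not the post's code): Banzhaf and Shapley-Shubik indices
# for a weighted voting game, by enumerating all coalitions.
# Assumes a positive quota, so the empty coalition never wins.
power_indices <- function(weights, quota) {
  n <- length(weights)
  swings <- numeric(n)   # raw Banzhaf swing counts
  shapley <- numeric(n)  # accumulated Shapley-Shubik values
  for (mask in 0:(2^n - 1)) {
    # Binary digits of mask decide which players are in the coalition
    members <- (mask %/% 2^(0:(n - 1))) %% 2 == 1
    total <- sum(weights[members])
    if (total < quota) next  # losing coalition: nobody is critical
    # A member is critical if the coalition loses without that member
    critical <- members & (total - weights < quota)
    size <- sum(members)
    swings <- swings + critical
    # Share of orderings in which the critical member arrives last in this coalition
    shapley <- shapley +
      critical * factorial(size - 1) * factorial(n - size) / factorial(n)
  }
  list(banzhaf = setNames(swings / sum(swings), names(weights)),
       shapley = setNames(shapley, names(weights)))
}

# Hypothetical game: three voters with weights 2, 1, 1 and quota 3
res <- power_indices(c(A = 2, B = 1, C = 1), quota = 3)
print(res)
```

In this toy game the two indices differ for voter A (3/5 under Banzhaf, 2/3 under Shapley-Shubik), which is the kind of difference the comparison is about. Full enumeration grows as 2^n, so this sketch only suits a small number of voters.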
Plotting conditional densities
Recently I read a post on Comparing all quantiles of two distributions simultaneously on R-bloggers. In the post the author plots two conditional density plots on one graph. I often use such a plot to visualize conditional densities of scores in binary prediction. After several times I had a problem with appropriate scaling of the ...
Generating all subsets of a set
Recently I calculated the Banzhaf power index, which required generating all subsets of a given set. The code given there was a bit complex, so I decided to write a simple function for it. As an example of its application I reproduce Figure 3.5 from Hastie et al. (2009). The figure shows RSS for all possible linear regress...
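The function from the post is not shown in this excerpt. One simple way to generate all subsets is to read the binary digits of the integers 0 to 2^n - 1 as membership flags. The sketch below assumes that approach; the name `all_subsets` is hypothetical.

```r
# Minimal sketch (not the post's function): all subsets of a vector x.
# Each integer mask in 0 .. 2^n - 1 encodes one subset in its binary digits.
all_subsets <- function(x) {
  n <- length(x)
  lapply(0:(2^n - 1), function(mask) {
    x[(mask %/% 2^(seq_len(n) - 1)) %% 2 == 1]
  })
}

subsets <- all_subsets(c("a", "b", "c"))
length(subsets)  # 2^3 = 8 subsets, including the empty set
```

The result is a list, so each subset can be passed on directly, for example as the set of predictors of one candidate regression.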
Animating Schelling’s segregation model
A recent blog post on Animations in R inspired me to write code that generates animations of a simulation model. For this task I chose Schelling’s segregation model. Having written the code, I found that a similar code was proposed one year ago. However, the model implementation is different, so I thought it would make a nice comparison. Her...
Visualizing tables in ggplot2
Recently I wanted to recreate assocplot using ggplot2. In the end I propose a simple way to visualize data arranged in two-way tables using geom_tile. I used the Titanic data set as an example, combining the age and sex dimensions to get two-way data. I plot residuals of the Chi-squared test (like in assocplot) on the left and probability o...
Porting cdplot to ggplot2
Last week I published a post on plotting tables in ggplot2. The next natural step is to port cdplot to allow simple visualization of categorical variables against a numerical predictor. The first part of the story covers binary variables. In this case the solution does not require using cdplot, as one can use a gam smoother. Here is ...
Emulating local static variables in R
Recently I was writing code to plot multiple ggplot2 plots on one page. I wanted to replicate the standard behavior of the plot function, which plots graphs in sequence according to the mfrow/mfcol option in par. The solution led me to think of emulating C-like local static variables in R. There are several solutions to this ...
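The excerpt does not say which solutions the post covers. One common way to get a variable that keeps its value between calls is a closure. The sketch below assumes that approach and may not match the post; the names `make_counter` and `next_slot` are hypothetical.

```r
# Minimal sketch (an assumption, not necessarily the post's solution):
# a closure keeps `count` alive between calls, like a C static local variable.
make_counter <- function() {
  count <- 0  # stored in the enclosing environment, invisible to other code
  function() {
    count <<- count + 1  # <<- updates the enclosing binding, not a local copy
    count
  }
}

next_slot <- make_counter()
next_slot()  # first call
next_slot()  # second call sees the value left by the first
```

Each call to `make_counter()` creates an independent state, so two counters never interfere with each other.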
Cross-validation variability example, part I
Recently I had a discussion with a student about the variability of results obtained from the cross-validation procedure. While the subject is well known, there are not many examples on the web showing it, so I have written a simple presentation of it. Results from cross-validation are reported as standard by the rpart procedure (printcp and plot...
You should not use split in production code
Recently I stumbled on a problem with the split function applied to a list of factors. The issue is that it might produce wrong splits when the splitting factors contain dots. Here is an example of the problem. Invoking the following code: df <- data.frame(x = rep(c("a", "a.b"), 3), y = rep(c("b.c...
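The example from the post is truncated here, so the sketch below uses hypothetical data instead. It shows why dots are risky: group labels are built by joining the factor values with ".", so two different pairs of values can end up with the same label. Building the grouping factor with a separator that does not occur in the data is one possible workaround (an assumption, not necessarily the fix proposed in the post).

```r
# Hypothetical data: two different (x, y) pairs whose dotted labels coincide
df <- data.frame(x = c("a", "a.b"), y = c("b.c", "c"), value = 1:2,
                 stringsAsFactors = FALSE)

# Labels joined with "." collide: both rows get the label "a.b.c"
dotted <- paste(df$x, df$y, sep = ".")
length(unique(dotted))

# A separator absent from the data keeps the two groups apart
safe_groups <- split(df, interaction(df$x, df$y, sep = "|", drop = TRUE))
names(safe_groups)
```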
Optimal sorting using rpart
Some time ago I read a nice post, Solving easy problems the hard way, where linear regression is used to solve an interesting puzzle. Following the idea, I used rpart to find an optimal decision tree for sorting five elements. It is well known that seven comparisons are enough to sort five elements. Interestingly it is possi...
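The figure of seven comparisons also matches the information-theoretic lower bound: a decision tree that sorts n elements must have at least n! leaves, so its depth is at least the base-2 logarithm of n!, rounded up. A quick check of that arithmetic (the variable names are illustrative):

```r
# Lower bound on worst-case comparisons needed to sort n elements
n <- 5
orderings <- factorial(n)                    # 120 possible orderings
min_comparisons <- ceiling(log2(orderings))  # log2(120) is about 6.9
min_comparisons
```

So seven comparisons are not only sufficient for five elements but also necessary in the worst case.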