Publications by Shirin's playgRound

New features in World Gender Statistics app

29.01.2017

In my last post, I built a Shiny app to explore World Gender Statistics. To make it a bit nicer and more convenient, I added a few more features: the drop-down menu for Years is now reactive, i.e. it only shows options with data (all NA years are removed); you can click on any country on the map to get information about which country it is, its p...

918 sym

Scratching the Surface of Gender Biases

05.02.2017

Today, I want to share my analysis of the World Gender Statistics dataset. Last week I already introduced my Shiny app, where you can explore 160 measurements for 164 countries over 56 years. This week I’ve included a statistical analysis of these countries and measurements and put some finishing touches on the app. Just in case anybody needed ...

5302 sym R (16418 sym/22 pcs) 12 img

Conditional ggplot2 geoms in functions (QTL plots)

11.02.2017

When running an analysis, I am usually combining functions from multiple packages. Most of these packages come with their own plotting functions. And while they are certainly convenient in that they allow me to get a quick glance at the data or the output, they all have their own style. If I want to prepare a report, proposal or a paper though, I...

5955 sym R (13367 sym/28 pcs) 20 img

Predicting food preferences with sparklyr (machine learning)

18.02.2017

This week I want to show how to run machine learning applications on a Spark cluster. I am using the sparklyr package, which provides a handy interface to access Apache Spark functionalities via R. The question I want to address with machine learning is whether the preference for a country’s cuisine can be predicted based on preferences of othe...

11674 sym R (12060 sym/21 pcs) 18 img

Building deep neural nets with h2o and rsparkling that predict arrhythmia of the heart

27.02.2017

Last week, I showed how to run machine learning applications on Spark from within R, using the sparklyr package. This week, I am showing how to build feed-forward deep neural networks or multilayer perceptrons. The models in this example are built to classify ECG data as coming either from healthy hearts or from someone suffering from arrhyt...

14213 sym R (17426 sym/67 pcs) 26 img

Hyper-parameter Tuning with Grid Search for Deep Learning

06.03.2017

Last week I showed how to build a deep neural network with h2o and rsparkling. As we could see there, it is not trivial to optimize the hyper-parameters for modeling. Hyper-parameter tuning with grid search allows us to test different combinations of hyper-parameters and find one with improved accuracy. Keep in mind, though, that hyper-parameter t...

7280 sym R (9597 sym/16 pcs) 8 img

Plotting trees from Random Forest models with ggraph

15.03.2017

Today, I want to show how I use Thomas Lin Pedersen’s awesome ggraph package to plot decision trees from Random Forest models. I am very much a visual person, so I try to plot as much of my results as possible because it helps me get a better feel for what is going on with my data. A nice aspect of using tree-based machine learning, like Random...

2523 sym R (6158 sym/6 pcs) 4 img

Building meaningful machine learning models for disease prediction

30.03.2017

Webinar for the ISDS R Group. This document presents the code I used to produce the example analysis and figures shown in my webinar on building meaningful machine learning models for disease prediction. My webinar slides are available on GitHub. Description: Dr Shirin Glander will go over her work on building machine-learning models to predict th...

5660 sym R (36009 sym/124 pcs) 68 img

Dealing with unbalanced data in machine learning

01.04.2017

In my last post, where I shared the code that I used to produce an example analysis to go along with my webinar on building meaningful models for disease prediction, I mentioned that it is advised to consider over- or under-sampling when you have unbalanced data sets. Because my focus in this webinar was on evaluating model performance, I did not...

7136 sym R (6673 sym/23 pcs) 4 img

Data on tour: Plotting 3D maps and location tracks

08.04.2017

Recently, I was on Gran Canaria for a vacation. So, what better way to keep up the holiday spirit a while longer than to visualize all the places we went in R!? I am combining location data collected by our car GPS, Google location data from my phone and the hiking tracks we followed. Obviously, the data itself is not public this time but the p...

4494 sym R (8615 sym/17 pcs) 20 img