Publications by Shirin's playgRound
New features in World Gender Statistics app
In my last post, I built a Shiny app to explore World Gender Statistics. To make it a bit nicer and more convenient, I added a few more features: the drop-down menu for Years is now reactive, i.e. it only shows options with data (all NA years are removed); you can click on any country on the map to get information about which country it is, its p...
918 sym
Scratching the Surface of Gender Biases
Today, I want to share my analysis of the World Gender Statistics dataset. Last week I already introduced my Shiny app, where you can explore 160 measurements for 164 countries over 56 years. This week I’ve included a statistical analysis of these countries and measurements and put some finishing touches on the app. Just in case anybody needed ...
5302 sym R (16418 sym/22 pcs) 12 img
Conditional ggplot2 geoms in functions (QTL plots)
When running an analysis, I usually combine functions from multiple packages. Most of these packages come with their own plotting functions. And while they are certainly convenient in that they allow me to get a quick glance at the data or the output, they all have their own style. If I want to prepare a report, proposal or a paper though, I...
5955 sym R (13367 sym/28 pcs) 20 img
Predicting food preferences with sparklyr (machine learning)
This week I want to show how to run machine learning applications on a Spark cluster. I am using the sparklyr package, which provides a handy interface to access Apache Spark functionality via R. The question I want to address with machine learning is whether the preference for a country’s cuisine can be predicted based on preferences of othe...
11674 sym R (12060 sym/21 pcs) 18 img
Building deep neural nets with h2o and rsparkling that predict arrhythmia of the heart
Last week, I showed how to run machine learning applications on Spark from within R, using the sparklyr package. This week, I am showing how to build feed-forward deep neural networks or multilayer perceptrons. The models in this example are built to classify ECG data as coming either from healthy hearts or from someone suffering from arrhyt...
14213 sym R (17426 sym/67 pcs) 26 img
Hyper-parameter Tuning with Grid Search for Deep Learning
Last week I showed how to build a deep neural network with h2o and rsparkling. As we could see there, it is not trivial to optimize the hyper-parameters for modeling. Hyper-parameter tuning with grid search allows us to test different combinations of hyper-parameters and find one with improved accuracy. Keep in mind, though, that hyper-parameter t...
7280 sym R (9597 sym/16 pcs) 8 img
Plotting trees from Random Forest models with ggraph
Today, I want to show how I use Thomas Lin Pedersen’s awesome ggraph package to plot decision trees from Random Forest models. I am very much a visual person, so I try to plot as much of my results as possible because it helps me get a better feel for what is going on with my data. A nice aspect of using tree-based machine learning, like Random...
2523 sym R (6158 sym/6 pcs) 4 img
Building meaningful machine learning models for disease prediction
Webinar for the ISDS R Group. This document presents the code I used to produce the example analysis and figures shown in my webinar on building meaningful machine learning models for disease prediction. My webinar slides are available on GitHub. Description: Dr Shirin Glander will go over her work on building machine-learning models to predict th...
5660 sym R (36009 sym/124 pcs) 68 img
Dealing with unbalanced data in machine learning
In my last post, where I shared the code that I used to produce an example analysis to go along with my webinar on building meaningful models for disease prediction, I mentioned that it is advised to consider over- or under-sampling when you have unbalanced data sets. Because my focus in this webinar was on evaluating model performance, I did not...
7136 sym R (6673 sym/23 pcs) 4 img
Data on tour: Plotting 3D maps and location tracks
Recently, I was on Gran Canaria for a vacation. So, what better way to keep up the holiday spirit a while longer than to use R to visualize all the places we went!? I am combining location data collected by our car GPS, Google location data from my phone and the hiking tracks we followed. Obviously, the data itself is not public this time but the p...
4494 sym R (8615 sym/17 pcs) 20 img