Publications by Shirin's playgRound

Happy EasteR: Plotting hare populations in Germany

15.04.2017

For Easter, I wanted to have a look at the number of hares in Germany. Wild hare populations have been rapidly declining over the last 10 years, but during the last three years they have at least been stable. This plot shows the 16 federal states of Germany. Their fill color shows the proportion of hares per square kilometer in 2015/2016. The blac...

5165 sym R (18645 sym/35 pcs) 2 img

Does money buy happiness after all? Machine Learning with One Rule

22.04.2017

This week, I am exploring Holger K. von Jouanne-Diedrich’s OneR package for machine learning. I am running an example analysis on world happiness data and comparing the results with other machine learning models (decision trees, random forest, gradient boosting trees and neural nets). Conclusions: All in all, based on this example, I would confir...

9442 sym R (19058 sym/40 pcs) 24 img

Explaining complex machine learning models with LIME

22.04.2017

The classification decisions made by machine learning models are usually difficult – if not impossible – for our human brains to understand. The complexity of some of the most accurate classifiers, like neural networks, is what makes them perform so well – often with better results than achieved by humans. But it also makes them inherently h...

6857 sym R (7535 sym/19 pcs) 6 img

Autoencoders and anomaly detection with machine learning in fraud analytics

30.04.2017

All my previous posts on machine learning have dealt with supervised learning. But we can also use machine learning for unsupervised learning. The latter is used, e.g., for clustering and (non-linear) dimensionality reduction. For this task, I am using Kaggle’s credit card fraud dataset from the following study: Andrea Dal Pozzolo, Olivier Cael...

10154 sym R (18501 sym/60 pcs) 18 img

Update to autoencoders and anomaly detection with machine learning in fraud analytics

01.05.2017

This is a reply to Wojciech Indyk’s comment on yesterday’s post on autoencoders and anomaly detection with machine learning in fraud analytics: “I think you can improve the detection of anomalies if you change the training set to the deep-autoencoder. As I understand the train_unsupervised contains both class 0 and class 1. If you put only ...

2387 sym R (5897 sym/16 pcs) 2 img

Network analysis of Game of Thrones family ties

14.05.2017

In this post, I am exploring network analysis techniques in a family network of major characters from Game of Thrones. Not surprisingly, we learn that House Stark (specifically Ned and Sansa) and House Lannister (especially Tyrion) are the most important family connections in Game of Thrones; they also connect many of the storylines and are centr...

13084 sym R (19000 sym/85 pcs) 18 img

New R Users group in Münster!

19.05.2017

This is to announce that Münster now has its very own R users group! If you are from the area, come join us (or if you happen to know someone who is and who might be interested, please forward the info). You can find us on meetup.com: https://www.meetup.com/Munster-R-Users-Group/ and we are also on the R User groups list. Code for the logo, wh...

788 sym R (346 sym/1 pcs) 2 img

Data Science for Business – Time Series Forecasting Part 1: EDA & Data Preparation

27.05.2017

Data Science is a fairly broad term and encompasses a wide range of techniques from data visualization to statistics and machine learning models. But the techniques are only tools in a – sometimes very messy – toolbox. And while it is important to know and understand these tools, here, I want to go at it from a different angle: What is the ta...

11938 sym R (20922 sym/48 pcs) 40 img

Data Science for Business – Time Series Forecasting Part 2: Forecasting with timekit

08.06.2017

In my last post, I prepared and visually explored time series data. Now, I will use this data to test the timekit package for time series forecasting with machine learning. Forecasting: In time series forecasting, we use models to predict future time points based on past observations. As mentioned in timekit’s vignette, “as with most machine ...

9319 sym R (9048 sym/31 pcs) 18 img

Data Science for Business – Time Series Forecasting Part 3: Forecasting with Facebook’s Prophet

12.06.2017

In my last two posts (Part 1 and Part 2), I explored time series forecasting with the timekit package. In this post, I want to compare how Facebook’s prophet performs on the same dataset. Predicting future events/sales/etc. isn’t trivial for a number of reasons and different algorithms use different approaches to handle these problems. Time ...

4225 sym R (4579 sym/13 pcs) 6 img