Publications by Data * Science + R

Benchmarking distance calculation in R

18.10.2012

A typical step in many data mining methods is the calculation of a distance between entities. For example, when using the nearest-neighbor method, it is crucial to do this calculation very efficiently because it is the most time-consuming step of the procedure. Just imagine you want to compute the Euclidean distance between a constantly changing da...

3285 sym R (5146 sym/10 pcs) 2 img

Matching clustering solutions using the ‘Hungarian method’

19.11.2012

Some time ago I stumbled upon a problem connected with the labels of a clustering. The partition an instance belongs to is usually labeled with an integer ranging from 1 to K, where K is the number of clusters. The task at that time was to plot a map of the results from the clustering of spatial polygons where every cluster is represented by ...

4444 sym R (4175 sym/4 pcs) 4 img

Predictive Modeling using R and the OpenScoring-Engine – a PMML approach

13.12.2012

On November 27th, a particular post caught my interest. Scott Mutchler presented a small framework for predictive analytics based on PMML (Predictive Model Markup Language) and a Java-based REST interface. PMML is an XML-based standard for the description and exchange of analytical models. The idea is that every piece of software which support...

6236 sym R (7888 sym/8 pcs) 10 img

The Wisdom of Crowds – Clustering Using Evidence Accumulation Clustering (EAC)

24.02.2013

Today’s blog post is about a problem known to most people who use clustering algorithms on datasets without given true labels (unsupervised learning). The challenge here is the “freedom of choice” over a broad range of different clustering algorithms and how to determine the right parameter values. The difficulty is the following: Every clu...

9427 sym R (7854 sym/10 pcs) 14 img

Venue Recommendation – A Simple Use Case Connecting R and Neo4j

07.04.2013

Last month I attended the CeBIT trade fair in Hannover. Besides the so-called “shareconomy”, there was another main topic across all exhibition halls – Big Data. This subject is not completely new, and I think that many of you already have experience with some of the tools associated with Big Data. But due to the great number of databases...

9248 sym R (11678 sym/7 pcs) 2 img

Time Is on My Side – A Small Example for Text Analytics on a Stream

23.06.2013

Introduction and Background: While my last post was about recommendation in the context of Location Based Social Networks, there are also other interesting topics regarding the analysis of unstructured data. The most established one is probably Text Analytics/Mining, focusing on all sorts of text data. For me, coming from spatial analysis, these t...

14202 sym R (12660 sym/9 pcs) 16 img 2 tbl

Dream Team – combining Tableau and R

03.11.2013

Last quarter was a bit too busy to write a new blog post because of a new job. And changing jobs often comes along with changing the tools you work with. That is how I came to Tableau. Tableau is one of the new stars in the BI/Analytics world and definitely worth a look. The people at Tableau describe their tool as an instrument that combines i...

9392 sym R (972 sym/5 pcs) 14 img

“Show me the way to the next whiskey bar” (The Doors – Alabama Song) – Interactive Location Recommendation using Tableau

02.02.2014

Since I started using Tableau, I have been quite fascinated by the capabilities of this piece of software. Before Christmas I was looking into how I could build an interactive visualization that helps me to explore the relationships between different objects in a form that shows which objects are very close to each other according to some similarity measu...

9708 sym R (3306 sym/1 pcs) 26 img

“The Winner Takes It All” – Tuning and Validating R Recommendation Models Inside Tableau

04.05.2014

Introduction: My last blog article shows how to build an interactive recommendation engine in Tableau using a simple model based on the cosine similarity measure. While this can be a good way to explore unknown data, it is wise to validate any model before using it for recommendation in practice in order to get an estimate of how the model perfor...

16836 sym R (5767 sym/4 pcs) 10 img

Automatically Exporting Multiple Cross Tables from Tableau Server into Excel

10.05.2014

Introduction: The following blog post is based on a classic reporting task that most people working in BI face frequently. Besides all the nice dashboards you create in Tableau, from time to time people will approach you with a request for a “data extract” – reports that typically look like cross tables. Ideally, the extract comes as ...

11693 sym 18 img