Publications by Data * Science + R
Benchmarking distance calculation in R
A typical step in many data mining methods is the calculation of a distance between entities. For example, with the nearest-neighbor method it is crucial to do this calculation very efficiently because it is the most time-consuming step of the procedure. Just imagine you want to compute the Euclidean distance between a constantly changing da...
3285 sym R (5146 sym/10 pcs) 2 img
Matching clustering solutions using the ‘Hungarian method’
Some time ago I stumbled upon a problem connected with the labels of a clustering. The partition an instance belongs to is usually labeled with an integer ranging from 1 to K, where K is the number of clusters. The task at that time was to plot a map of the results from the clustering of spatial polygons where every cluster is represented by ...
4444 sym R (4175 sym/4 pcs) 4 img
Predictive Modeling using R and the OpenScoring-Engine – a PMML approach
On November 27th, a special post caught my interest. Scott Mutchler presented a small framework for predictive analytics based on PMML (Predictive Model Markup Language) and a Java-based REST interface. PMML is an XML-based standard for the description and exchange of analytical models. The idea is that every piece of software which support...
6236 sym R (7888 sym/8 pcs) 10 img
The Wisdom of Crowds – Clustering Using Evidence Accumulation Clustering (EAC)
Today’s blog post is about a problem known to most people who use clustering algorithms on datasets without given true labels (unsupervised learning). The challenge here is the “freedom of choice” over a broad range of different clustering algorithms and how to determine the right parameter values. The difficulty is the following: Every clu...
9427 sym R (7854 sym/10 pcs) 14 img
Venue Recommendation – A Simple Use Case Connecting R and Neo4j
Last month I attended the CeBIT trade fair in Hannover. Besides the so-called “shareconomy” there was another main topic across all exhibition halls – Big Data. This subject is not completely new and I think that a lot of you also have experience with some of the tools associated with Big Data. But due to the great number of databases...
9248 sym R (11678 sym/7 pcs) 2 img
Time Is on My Side – A Small Example for Text Analytics on a Stream
Introduction and Background While my last posting was about recommendation in the context of Location-Based Social Networks, there are also other interesting topics regarding the analysis of unstructured data. The most established one is probably Text Analytics/Mining, focusing on all sorts of text data. For me, coming from spatial analysis, these t...
14202 sym R (12660 sym/9 pcs) 16 img 2 tbl
Dream Team – combining Tableau and R
Last quarter I was too busy to write any new blog posts because of a new job. And changing jobs often comes along with changing the tools you work with. That is how I came to Tableau. Tableau is one of the new stars in the BI/Analytics world and definitely worth a look. The people at Tableau describe their tool as an instrument that combines i...
9392 sym R (972 sym/5 pcs) 14 img
“Show me the way to the next whiskey bar” (The Doors – Alabama Song) – Interactive Location Recommendation using Tableau
Since I started using Tableau I’ve been quite fascinated by the capabilities of this piece of software. Before Christmas I was looking at how I could build an interactive visualization that helps me to explore the relationships between different objects in a form that shows which objects are very close to each other according to some similarity measu...
9708 sym R (3306 sym/1 pcs) 26 img
“The Winner Takes It All” – Tuning and Validating R Recommendation Models Inside Tableau
Introduction My last blog article showed how to build an interactive recommendation engine in Tableau using a simple model based on the cosine similarity measure. While this can be a good way to explore unknown data, it is wise to validate any model before using it for recommendation in practice in order to get an estimate of how the model perfor...
16836 sym R (5767 sym/4 pcs) 10 img
Automatically Exporting Multiple Cross Tables from Tableau Server into Excel
Introduction The following blog post is based on a classic reporting task that most people working in BI face frequently: besides all the nice dashboards you create in Tableau, from time to time people will approach you with a request for a “data extract” – reports that typically look like cross tables. Ideally, the extract comes as ...
11693 sym 18 img