Publications by schochastics
Predicting Player Positions of FIFA 18 Players
In this post, I will use the results of the exploratory analysis from the previous post and try to predict the position of players in FIFA 18 using different machine learning algorithms. As a quick reminder, these were the figures we obtained using PCA, t-SNE and a self-organizing map. #used packages library(tidyverse) # for data wrangling libr...
6427 sym R (14564 sym/14 pcs) 10 img 3 tbl
A wild R package appears! Pokemon/Gameboy inspired plots in R
I have a fairly long commute every day, and I always try to keep myself occupied with little projects. One of my first projects was to increase my knowledge of how to create R packages. The result is Rokemon, a Pokemon/Game Boy inspired package. In this post, I will briefly introduce some functionalities of the package and illustrate how incredible ...
2273 sym R (5429 sym/10 pcs) 14 img
Traveling Beerdrinker Problem
Whenever I participate in a Science Slam, I try to work in an analysis of something typical for the respective city. My next gig will be in Munich, so there are two natural options: beer or football. In the end I chose both, but here I will focus on the former. #used packages library(tidyverse) # for data wrangling library(TSP) #solving Traveli...
3946 sym R (3515 sym/9 pcs) 4 img
SOMs and ggplot
#used packages library(tidyverse) # for data wrangling library(stringr) # for string manipulations library(kohonen) # implements self organizing maps library(ggforce) # for additional ggplot features I introduced self-organizing maps (SOM) in a previous post, and since then I have been using the kohonen package on a daily basis. However, I pref...
3237 sym R (7431 sym/15 pcs) 14 img
Sample Entropy with Rcpp
Entropy. I still shiver when I hear that word, since I never fully understood the concept. Today marks the first time I was kind of forced to look into it in more detail. And by “in detail”, I mean I found a StackOverflow question that had something to do with a problem I am having (sound familiar?). The problem is about complexity of ti...
2098 sym R (1418 sym/6 pcs)
Using UMAP in R with rPython
I have written about dimensionality reduction methods before, and now there seems to be a new rising star in that field, namely Uniform Manifold Approximation and Projection, or UMAP for short. The paper can be found here, but be warned: It is really math-heavy. From the abstract: UMAP is constructed from a theoretical framework based in Riemannian geomet...
2421 sym R (1004 sym/5 pcs) 4 img
Analyzing NBA Player Data I: Getting Data
As a football (soccer) data enthusiast, I have always been jealous of the amount of available data for American sports. While much of the interesting football data is proprietary, you can get virtually anything of interest for the NBA, MLB, NFL or NHL. I have decided to move away from football for a moment and write a little series on Analyzi...
5098 sym R (10565 sym/10 pcs)
Analyzing NBA Player Data II: Clustering Players
This is the second post of my little series Analyzing NBA Player Data. The first part was concerned with scraping and cleaning player statistics from any NBA season. This post deals with gaining some insight into the player stats. In particular, clustering players according to their stats to produce a new set of player positions. #used librarie...
5949 sym R (1380 sym/9 pcs) 8 img
Analyzing NBA Player Data III: Similarity Networks
This is the last part of the mini-series Analyzing NBA Player Data. The first part was concerned with scraping and cleaning player statistics from any NBA season. The second part showed how to use principal component analysis and k-means clustering to “revolutionize” player positions. Which kind of failed. Anyway, this third part is now deali...
6094 sym R (2975 sym/12 pcs) 14 img
Fast Fiedler Vector Computation
This is a short post on how to quickly calculate the Fiedler vector for large graphs with the igraph package. #used libraries library(igraph) # for network data structures and tools library(microbenchmark) # for benchmark results Fiedler Vector with eigen My go-to approach at the start was using the eigen() function to compute the whole spe...
1805 sym R (1519 sym/6 pcs)