Publications by Learning Machines
Learning Data Science: Predicting Income Brackets
As promised in the post Learning Data Science: Modelling Basics, we will now go a step further and try to predict income brackets with real-world data and different modelling approaches. We will learn a thing or two along the way, e.g. about the so-called Accuracy-Interpretability Trade-Off, so read on… The data we will use is from here: Marketi...
Inverse Statistics – and how to create Gain-Loss Asymmetry plots in R
Asset returns have certain statistical properties, also called stylized facts. Important ones are: Absence of autocorrelation: basically the direction of the return of one day doesn’t tell you anything useful about the direction of the next day. Fat tails: returns are not normal, i.e. there are many more extreme events than there would be if r...
Symbolic Regression, Genetic Programming… or if Kepler had R
A few weeks ago we published a post about using the power of the evolutionary method for optimization (see Evolution works!). In this post we will go a step further, so read on… A problem researchers often face is that they have an amount of data and need to find some functional form, e.g. some kind of physical law, for it. The standard approac...
Separating the Signal from the Noise: Robust Statistics for Pedestrians
One of the problems of navigating an autonomous car through a city is to extract robust signals in the face of all the noise that is present in the different sensors. Just taking something like an arithmetic mean of all the data points could possibly end in a catastrophe: if a part of a wall looks similar to the street and the algorithm calculate...
Base Rate Fallacy – or why No One is justified to believe that Jesus rose
In this post we are talking about one of the most unintuitive results of statistics: the so-called false positive paradox, which is an example of the so-called base rate fallacy. It describes a situation where a positive test result of a very sensitive medical test shows that you have the respective disease… yet you are most probably healthy! Th...
Google’s Eigenvector… or how a Random Surfer finds the most relevant Webpages
Like most people, you will have used a search engine lately, like Google. But have you ever thought about how it manages to give you the most fitting results? How does it order the results so that the best are on top? Read on to find out! The earliest search engines either had human-curated indices, like Yahoo!, or used some simple heuristic like t...
The Rich didn’t earn their Wealth, they just got Lucky
Tomorrow, on the First of May, many countries celebrate the so-called International Workers’ Day (or Labour Day): time to talk about the unequal distribution of wealth again! A few months ago I posted a piece with the title “If wealth had anything to do with intelligence…” where I argued that ability, e.g. intelligence, as an input has no...
Backtest Trading Strategies like a real Quant
R is one of the best choices when it comes to quantitative finance. Here we will show you how to load financial data, plot charts and give you a step-by-step template to backtest trading strategies. So, read on… We begin by just plotting a chart of the Standard & Poor’s 500 (S&P 500), an index of the 500 biggest companies in the US. To get th...
Was the Bavarian Abitur too hard this time?
Bavaria is known for its famous Oktoberfest… and within Germany also for its presumably difficult Abitur, a qualification granted by university-preparatory schools in Germany. A mandatory part for all students is maths. This year many students protested that the maths part was way too hard; they even started an online petition with more than se...
Learning R: The Ultimate Introduction (incl. Machine Learning!)
There are a million reasons to learn R (see e.g. Why R for data science – and not Python?), but where to start? I present to you the ultimate introduction to bring you up to speed! So read on… I call it ultimate because it is the essence of many years of teaching R… or put differently: it is the kind of introduction I would have liked to ha...