Publications by George Pipis
How to Build a Predictive Soccer Model
We provide an example of how you can start building a predictive sports model, specifically for soccer, though the logic extends to other sports as well. The steps we need to follow are: Get the Historical Data Regularly: The first thing we need to do is get the historical data of past games, including t...
4076 sym R (2079 sym/5 pcs) 8 img
How to Scrape Data from Euroleague
We provide an example of how to get the results of Euroleague games in a structured form. The example uses the 2016-2017 season, but you can adapt it to any season. You need the corresponding URL for each team in the Euroleague and a defined period. Let’s start coding: library(tidyverse) library(rvest) ...
834 sym R (3683 sym/1 pcs) 2 img
How to Test for Randomness
I have been contacted by many people asking me to predict the outcome of events that are, in theory, random. For example, they want me to predict lottery games such as Keno and Lotto, casino roulette numbers, and so on. My answer is that you cannot predict something that is supposed to be random. No model can give you a better estimate than...
3819 sym R (335 sym/8 pcs) 16 img
Simpson’s Paradox and Misleading Statistical Inference
Back in 2001, when I entered university to study Statistics, our professor told us that “Statistics is a perfect way to tell lies”. This “quote” got my attention, and I totally agree with it. I can confirm that I have seen many statistical analyses that reach totally opposite statistical inferences; sometimes the misleading statistical infere...
6119 sym R (1318 sym/5 pcs) 16 img
St. Petersburg Paradox
The fair premium in lottery games can be defined as the expected payoff. For example, consider a game where you roll a die once and get paid the face value in dollars: if you roll 1 you get $1, if you roll 2 you get $2, and so on. Then the fair price to enter the game is the expected payoff, which is: \(E[X] = \sum_{i=1}^{6} x \times...
4391 sym R (274 sym/2 pcs)
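As a worked illustration of the die game described in the St. Petersburg Paradox entry above, the expected payoff can be computed directly. This is a sketch that assumes a fair six-sided die, so each face has probability 1/6; the symbols \(x_i\) and \(p_i\) are our own notation and not necessarily the author's.

```latex
\[
E[X] = \sum_{i=1}^{6} x_i \, p_i
     = \frac{1}{6}\,(1 + 2 + 3 + 4 + 5 + 6)
     = \frac{21}{6}
     = 3.5
\]
```

Under that assumption, the fair price to enter the game would be $3.50.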
Linear Regression and Type I Error
Linear Regression: Linear regression is a basic approach to modelling the linear relationship between a dependent variable y and one or more independent variables X. The equation of the linear regression is: \(y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \varepsilon_i\) for each observation i=1,2,…,n. When we run a linear regression model, we conduct hypothesis testing on the regression coefficients. The...
4193 sym R (640 sym/2 pcs) 6 img
Contingency Tables in R
A common way to represent and analyze categorical data is through contingency tables. In this tutorial, we will provide some examples of how you can analyze two-way (r x c) and three-way (r x c x k) contingency tables in R. Dataset: For this tutorial, we will work with the Wage dataset from the ISLR package. We will create another column of the Wa...
5579 sym R (4074 sym/24 pcs) 6 img
Pricing of European Options with Monte Carlo
We show how to price European options with Monte Carlo simulation in R. Recall that a European option is a version of an options contract that limits exercise to its expiration date. We focus on call and put options only. We assume that the reader is familiar with European options and the Black-Scholes formula. Black...
5524 sym R (3931 sym/8 pcs) 4 img
Example of Pairs Trading
Introduction: Statistical arbitrage trading is a quantitative and computational approach to equity trading which is widely applied by hedge funds to produce market-neutral returns. The simplest and most popular version of the strategy is known as pairs trading and involves the identification of pairs of assets that are believed to have some long-r...
5517 sym R (2046 sym/4 pcs) 8 img
How to Report the Distribution of Attributes per Cluster
Let’s say that you have applied a clustering algorithm and you would like to report the distribution of the categorical variables per cluster in a “tidy” report. Below is a suggestion of how you can do it in R. Generate the Data: Let’s assume that we came up with 3 clusters, “C1”, “C2” and “C3”, and that we have 3 attribu...
865 sym R (2394 sym/2 pcs)