Publications by George Pipis
How to Generate Correlated Data in R
Sometimes we need to generate correlated data for demonstrations, technical assessments, testing, etc. We have provided a walk-through example of how to generate correlated data in Python using the scikit-learn library. In R, as far as I know, there is no library that allows us to generate correlated data. For that reason, we will work w...
2542 sym R (1363 sym/5 pcs) 6 img
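The excerpt above is cut off before it names the approach taken in R, so the following is only a minimal sketch of one common technique, not the article's method: mix a standard normal variable with independent noise so that the pair has a chosen correlation. The function names `correlated_pairs` and `pearson`, the seed, and the target value 0.7 are illustrative assumptions.

```python
import math
import random


def correlated_pairs(n, rho, seed=0):
    """Return n (x, y) pairs of standard normals with target correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)  # noise that is independent of x
        # Mixing x with independent noise gives corr(x, y) = rho in expectation.
        y = rho * x + math.sqrt(1.0 - rho ** 2) * z
        pairs.append((x, y))
    return pairs


def pearson(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical usage: 20,000 pairs with a target correlation of 0.7.
sample = correlated_pairs(n=20000, rho=0.7)
print(round(pearson(sample), 2))
```

The sample correlation will be close to, but not exactly, the target, because it is estimated from random draws.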
10 Tips And Tricks For Data Scientists Vol.7
We have started a series of articles on tips and tricks for data scientists (mainly in Python and R). In case you missed them: Vol.1, Vol.2, Vol.3, Vol.4, Vol.5, Vol.6. Python 1. Differences Between NumPy Arrays and Python Lists: There are some differences between NumPy arrays and Python lists. We will provide some examples of algebraic operators. ‘+’ ...
5884 sym R (5826 sym/37 pcs) 4 img
How to Compare Nested Models in R
Using R and the anova function, we can easily compare nested models. For linear regression models we apply the F-test, and for logistic regression models we apply the Chi-square test. By nested, we mean that the independent variables of the simpler model are a subset of those of the more complex model. In ...
2549 sym R (335 sym/5 pcs) 14 img
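For reference, the F-test mentioned in the excerpt above is usually written as follows for two nested linear models. The symbols are assumptions introduced here, not taken from the article: RSS_0 and RSS_1 are the residual sums of squares of the simpler and the more complex model, p_0 < p_1 are their numbers of estimated coefficients, and n is the number of observations.

```latex
F = \frac{(\mathrm{RSS}_{0} - \mathrm{RSS}_{1}) / (p_{1} - p_{0})}
         {\mathrm{RSS}_{1} / (n - p_{1})}
\quad\sim\quad F_{\,p_{1} - p_{0},\; n - p_{1}}
\quad \text{under the null hypothesis that the extra coefficients are zero.}
```

A large F value (small p-value) suggests that the additional variables in the more complex model improve the fit beyond what chance would explain.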
Tips And Tricks For Data Scientists Vol.8
We have started a series of articles on tips and tricks for data scientists (mainly in Python and R). In case you missed them: Vol.1, Vol.2, Vol.3, Vol.4, Vol.5, Vol.6, Vol.7. R 1. How To Remove The Correlated Variables From A Data Frame: When we build predictive models, we usually remove the highly correlated variables (multicollinearity). The point is to kee...
3329 sym R (2875 sym/16 pcs) 12 img
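The excerpt on removing correlated variables is truncated, so the rule below is an assumption rather than the article's procedure: walk through the columns in order and keep a column only if its absolute correlation with every column kept so far stays at or below a threshold. The function name `drop_correlated`, the 0.9 threshold, and the toy data are all hypothetical, and the sketch is in Python rather than R.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # assumes neither column is constant


def drop_correlated(columns, threshold=0.9):
    """Keep a column only if |r| with every already-kept column is <= threshold."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) <= threshold for other in kept.values()):
            kept[name] = values
    return kept


# Toy data: "b" is almost exactly 2 * "a", while "c" is only weakly related to "a".
data = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.1, 3.9, 6.2, 8.0, 9.9],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(list(drop_correlated(data)))  # ['a', 'c']
```

Which member of a correlated pair survives depends on column order here; other selection rules are equally plausible.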
10 Tips and Tricks for Data Scientists Vol.9
We have started a series of articles on tips and tricks for data scientists (mainly in Python and R). In case you missed them: Vol.1, Vol.2, Vol.3, Vol.4, Vol.5, Vol.6, Vol.7, Vol.8. R 1. How To Write File Paths: If we want to write file paths that work on every operating system, such as Linux, macOS, and Windows, we can use the file.path() function. Let’s sa...
4660 sym R (2740 sym/33 pcs) 10 img
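The same idea as R's file.path(), building a path from components instead of hard-coding a separator, can be sketched in Python with the standard pathlib module. The folder and file names below are hypothetical.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# Path joins components with the separator of the current operating system.
report = Path("data") / "raw" / "report.csv"  # hypothetical folder and file names
print(report.parts)

# The Pure* classes show how the same components render on each system.
print(PurePosixPath("data", "raw", "report.csv"))    # data/raw/report.csv
print(PureWindowsPath("data", "raw", "report.csv"))  # data\raw\report.csv
```

Because the separator is never typed by hand, the same script runs unchanged on any of the three systems.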
Who is going to Win the Euro 2020
We have reached the knock-out phase of Euro 2020 (or 2021), where the last 16 teams and their games are shown below: The question is who is going to be the Euro 2020 winner. Although we cannot predict the winner, we can estimate each team's probability of winning the Euro. The Methodology: This is a very simple model that is based on UEFA R...
1830 sym R (4447 sym/1 pcs) 8 img 1 tbl
Euro 2020 Predictive Model based on FIFA Ranking System
In a previous post, we built a predictive model based on the FIFA ranking, assuming that the points follow a normal distribution. If we look closer at FIFA’s ranking model, we will see that it is based on the Elo system, where the expected result of a game is given by the following formula: Simulate the Final-16 Phase Bas...
973 sym R (4295 sym/2 pcs) 4 img
Get the Odds of Euro 2020 Games based on FIFA World Ranking
We will provide an example of how you can estimate the outcome of a Euro 2020 game based on the FIFA World Ranking. The current calculation method was adopted on 10 June 2018 and is based on the Elo rating system: after each game, points are added to or subtracted from a team’s rating according to the formula: The Expected Result of a Game The ...
2402 sym R (557 sym/8 pcs) 10 img
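The formulas themselves did not survive in the excerpt above. As a hedged sketch, the published FIFA ranking procedure gives the expected result as W_e = 1 / (10^(-dr/600) + 1) and updates a rating as P = P_before + I * (W - W_e). The Python function names and the example numbers below are illustrative assumptions, and the sketch ignores any special-case rules of the official procedure.

```python
def expected_result(dr):
    """Expected result W_e for a team whose rating exceeds the opponent's by dr."""
    return 1.0 / (10 ** (-dr / 600.0) + 1.0)


def updated_points(points_before, importance, result, dr):
    """P = P_before + I * (W - W_e); result W is 1 (win), 0.5 (draw) or 0 (loss)."""
    return points_before + importance * (result - expected_result(dr))


# Two equally rated teams: each has an expected result of 0.5.
print(expected_result(0))  # 0.5

# Hypothetical example: a 1600-point team beats an equally rated opponent,
# with an illustrative importance coefficient of 35.
print(updated_points(1600.0, 35, 1, 0))  # 1617.5
```

The expected results of the two teams always sum to 1, so a large positive dr for one side means a correspondingly small expected result for the other.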
10 Tips and Tricks for Data Scientists Vol.10
We have started a series of articles on tips and tricks for data scientists (mainly in Python and R). In case you missed them: Vol.1, Vol.2, Vol.3, Vol.4, Vol.5, Vol.6, Vol.7, Vol.8, Vol.9. Python 1. How to Get The Key of the Maximum Value in a Dictionary: d={"a":3,"b":5,"c":2}; max(d, key=d.get) returns 'b'. 2. How to Sort a Dictionary by Values: Assume that we have the ...
4643 sym R (1500 sym/16 pcs) 16 img
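The two dictionary tips in the excerpt above can be run as follows. The first part reuses the dictionary from the excerpt; the sorting part is cut off in the excerpt, so reusing the same dictionary and the variable names `ascending` and `descending` are assumptions.

```python
d = {"a": 3, "b": 5, "c": 2}

# Key of the maximum value: max() iterates over the keys and ranks them by d.get(key).
best_key = max(d, key=d.get)

# Sort by value: sort the (key, value) pairs, then rebuild a dictionary.
ascending = dict(sorted(d.items(), key=lambda item: item[1]))
descending = dict(sorted(d.items(), key=lambda item: item[1], reverse=True))

print(best_key)    # b
print(ascending)   # {'c': 2, 'a': 3, 'b': 5}
print(descending)  # {'b': 5, 'a': 3, 'c': 2}
```

Rebuilding a dict preserves the sorted order because dictionaries keep insertion order in Python 3.7 and later.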
Euro Semi-Finals: England is the Favorite!
Using the FIFA World Ranking and the Elo rating system, we will try to estimate the probability of England winning its first Euro in history! The expected result of a game is given by the formula W_e = 1 / (10^(-dr/600) + 1), where dr is the difference between the two teams’ ratings before the game. Let’s see the Winning Probability as a function of the Ranking Dif...
2189 sym R (383 sym/4 pcs) 4 img