Publications by George Pipis
R Exercises – Interview Questions (Stats and Simulation)
For Data Science positions that require some knowledge of Statistics and Programming skills, it is common to be asked questions like those below. Question 1: Suppose an urn contains 40 red, 25 green and 35 blue balls. Balls are drawn from the urn one-by-one, at random and without replacement. Let \(N\) denote the draw at which the first blue ball appears...
1470 sym R (550 sym/2 pcs)
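The urn question above lends itself to simulation. The following is a minimal base R sketch, assuming the quantity of interest is the expected value of \(N\) (the question text is truncated, so that target is a guess). The names `draw_first_blue` and `n_sims` are illustrative only.

```r
set.seed(2020)

# Urn composition taken from the question: 40 red, 25 green, 35 blue
urn <- rep(c("red", "green", "blue"), times = c(40, 25, 35))

# One experiment: shuffle the urn (drawing without replacement)
# and return the position of the first blue ball
draw_first_blue <- function(urn) {
  shuffled <- sample(urn)
  which(shuffled == "blue")[1]
}

n_sims <- 20000
N <- replicate(n_sims, draw_first_blue(urn))

# Simulated mean of N
mean_N_sim <- mean(N)

# Closed form for the position of the first "success" when drawing
# without replacement: (total + 1) / (successes + 1)
mean_N_exact <- (length(urn) + 1) / (sum(urn == "blue") + 1)

c(simulated = mean_N_sim, exact = mean_N_exact)
```

The simulated mean should land close to the closed-form value of 101/36. Other summaries of \(N\), such as `mean(N == 1)` or `table(N)`, can be read off the same vector.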
Avoid apply() function in large datasets
When we are dealing with large datasets and need to calculate values such as the row/column min, max, rank or mean, we should avoid the apply function because it is slow. Instead, we can use the matrixStats package and its corresponding functions. Let’s provide some comparisons. Example of Minimum value per Row: Assume tha...
1088 sym R (533 sym/2 pcs) 2 img
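As a rough illustration of the comparison described above, the sketch below computes the row minimum with `apply()` and with `matrixStats::rowMins()`. The matrix size is an arbitrary assumption, and the matrixStats part only runs if the package is installed.

```r
set.seed(1)

# Hypothetical test matrix: 10,000 rows and 100 columns of random values
m <- matrix(rnorm(10000 * 100), nrow = 10000)

# Row minimum with base R apply()
t_apply <- system.time(row_min_apply <- apply(m, 1, min))[["elapsed"]]

if (requireNamespace("matrixStats", quietly = TRUE)) {
  # Same calculation with matrixStats
  t_fast <- system.time(row_min_fast <- matrixStats::rowMins(m))[["elapsed"]]

  # Both approaches should return the same values
  stopifnot(isTRUE(all.equal(row_min_apply, row_min_fast)))

  print(c(apply = t_apply, matrixStats = t_fast))
}
```

Timings depend on the machine and the matrix size, so no figures are quoted here.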
Interview questions about Stats and Probabilities
For Data Science positions, it is common during the interview process to be asked questions about Statistics and Probabilities. We will provide some potential interview questions and their indicative solutions. Question 1: Assuming that X follows the Normal Distribution with Mean=0.545 and Standard Deviation=0.155, find the probability that X exceeds ...
2450 sym R (161 sym/4 pcs)
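A tail probability of this kind can be computed with `pnorm()`. In the sketch below the mean and standard deviation come from the question, while the threshold of 0.70 is purely hypothetical, because the value in the question is truncated.

```r
mu <- 0.545       # mean, from the question
sigma <- 0.155    # standard deviation, from the question
threshold <- 0.70 # hypothetical value; the original threshold is not shown

# P(X > threshold) using the upper tail directly
p_exceeds <- pnorm(threshold, mean = mu, sd = sigma, lower.tail = FALSE)

# Equivalent calculation via the standardized z-score
z <- (threshold - mu) / sigma
p_check <- 1 - pnorm(z)

c(p_exceeds = p_exceeds, p_check = p_check)
```

With this hypothetical threshold the z-score is 1, so the probability is roughly 0.159.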
How to Backtest your Crypto Trading Strategies in R
A few words about Trading Strategies: One of the biggest challenges is to predict the Market. Many people have developed their own trading strategies; some are advanced and based on Machine Learning and Artificial Intelligence algorithms like LSTM, XGBoost, Random Forest etc., some are based on Statistical models like ARIMA, and some othe...
3572 sym R (2388 sym/2 pcs) 1 tbl
Covid19: Correlation Between Confirmed Cases and Deaths
What is the daily correlation of Confirmed versus Death Cases in Covid-19? In other words, for the people who have passed away, how many days earlier, on average, were they reported (i.e. “Confirmed”) as new Covid-19 cases? To answer this question, we can take the correlation between the Daily Confirmed and Daily Deaths, trying different lag v...
2297 sym R (2331 sym/4 pcs) 8 img 4 tbl
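The lagged-correlation idea can be sketched in base R as follows. The series here are synthetic and generated only for illustration (they are not Covid-19 data); the built-in lag of 7 days, the 2% rate and the range of lags tried are all assumptions.

```r
set.seed(42)

n <- 200
true_lag <- 7  # assumed delay built into the synthetic data

# Synthetic daily confirmed cases with a slowly varying level
confirmed <- rpois(n, lambda = 1000 + 500 * sin(seq_len(n) / 15))

# Synthetic daily deaths: a fixed share of the cases reported true_lag days earlier
deaths <- c(rep(NA, true_lag), 0.02 * confirmed[seq_len(n - true_lag)]) +
  rnorm(n, sd = 0.1)

# Correlation between confirmed at day t and deaths at day t + k, for each lag k
lags <- 0:14
lag_cor <- vapply(lags, function(k) {
  cor(confirmed[seq_len(n - k)], deaths[(k + 1):n], use = "complete.obs")
}, numeric(1))

# Lag with the highest correlation
best_lag <- lags[which.max(lag_cor)]
best_lag
```

On real data the peak is usually less sharp, so it helps to inspect the whole `lag_cor` vector rather than only the maximum.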
Permutations in R
During the interview process for Data Science positions, you are likely to be asked to calculate Combinations or Permutations. Today we will provide an example of how we can solve permutation problems numerically in R. Find all Permutations of the word baboon: Mathematically, we can approach this question as follows: \(P=\frac{n!}{n_1! n_2! n_3!…...
1817 sym R (518 sym/2 pcs)
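The counting formula for permutations with repeated letters can be evaluated directly in base R. This is a small sketch of the formula only, not necessarily the numerical approach used in the post; the variable names are illustrative.

```r
word <- "baboon"

# Split the word into single letters and count how often each one occurs
letters_vec <- strsplit(word, "")[[1]]
letter_counts <- as.vector(table(letters_vec))

# n! divided by the product of the factorials of the letter counts
n_perms <- factorial(length(letters_vec)) / prod(factorial(letter_counts))
n_perms  # 6! / (2! * 2!) = 180
```

Here "b" and "o" each appear twice, while "a" and "n" appear once, which gives 720 / 4 = 180 distinct arrangements.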
Multi-Armed Bandit with Thompson Sampling
A few words about Thompson Sampling: Thompson Sampling is an algorithm for decision problems where actions are taken in sequence, balancing exploitation, which maximizes immediate performance, against exploration, which accumulates new information that may improve future performance. There is always a trade-off between exploration and exploitation i...
6162 sym R (1496 sym/4 pcs) 6 img
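To make the exploration and exploitation trade-off concrete, here is a minimal sketch of Thompson Sampling for a Bernoulli bandit with Beta(1, 1) priors, written in base R. The three success probabilities and the number of rounds are hypothetical, and this is not claimed to be the setup used in the post.

```r
set.seed(123)

true_p <- c(0.2, 0.5, 0.8)  # hypothetical success probability of each arm
n_rounds <- 2000

successes <- numeric(length(true_p))
failures <- numeric(length(true_p))

for (t in seq_len(n_rounds)) {
  # Sample one plausible success rate per arm from its Beta posterior
  theta <- rbeta(length(true_p), successes + 1, failures + 1)

  # Play the arm whose sampled rate is highest
  arm <- which.max(theta)

  # Observe a 0/1 reward and update that arm's posterior counts
  reward <- rbinom(1, 1, true_p[arm])
  successes[arm] <- successes[arm] + reward
  failures[arm] <- failures[arm] + 1 - reward
}

# Number of times each arm was played
pulls <- successes + failures
pulls
```

Arms with uncertain posteriors still get sampled occasionally, which is the exploration part, while most rounds go to the arm that currently looks best.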
How to Connect R with SQL
Need to Connect R with SQL: It is common for Data Analysts/Scientists to connect R with SQL. For that reason, there are many different packages designed for different Databases like PostgreSQL, MySQL etc. My suggestion is to work with the DBI package, which is compatible with almost all Databases. Example of How to Connect R with SQL: As al...
3146 sym R (2400 sym/10 pcs) 2 img
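A small sketch of the DBI workflow is shown below. It assumes an in-memory SQLite database through the RSQLite driver, chosen only because it needs no server; the database used in the post is not visible in the truncated text. The code runs only if both packages are installed.

```r
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {

  # Open a connection (assumption: in-memory SQLite instead of a real server)
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

  # Copy a built-in data frame into the database as a table
  DBI::dbWriteTable(con, "mtcars", mtcars)

  # Run a SQL query and get the result back as a data frame
  res <- DBI::dbGetQuery(
    con,
    "SELECT cyl, COUNT(*) AS n FROM mtcars GROUP BY cyl ORDER BY cyl"
  )
  print(res)

  # Always close the connection when finished
  DBI::dbDisconnect(con)
}
```

For another database, only the driver and connection arguments passed to `dbConnect()` should need to change; the `dbWriteTable()` and `dbGetQuery()` calls stay the same.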
The fastest way to Read and Writes file in R
Compare Read and Write files time: When we are dealing with large datasets and need to write many csv files, or when the csv file that we have to read is huge, the speed of the read and write commands is important. We will compare the time required to write and read files in the following cases: base package, data.table, readr. Compare the Write ...
1497 sym R (555 sym/2 pcs) 6 img
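One way to set up such a timing comparison is sketched below with `system.time()`. The data frame and its size are made up for illustration, and the data.table and readr parts run only if those packages are installed.

```r
set.seed(1)

# Hypothetical data set of 100,000 rows
df <- data.frame(
  id = seq_len(1e5),
  x = rnorm(1e5),
  g = sample(letters, 1e5, replace = TRUE)
)
path <- tempfile(fileext = ".csv")

# Base R write and read
timings <- c(
  base_write = system.time(write.csv(df, path, row.names = FALSE))[["elapsed"]],
  base_read = system.time(df_base <- read.csv(path))[["elapsed"]]
)

if (requireNamespace("data.table", quietly = TRUE)) {
  timings["dt_write"] <- system.time(data.table::fwrite(df, path))[["elapsed"]]
  timings["dt_read"] <- system.time(df_dt <- data.table::fread(path))[["elapsed"]]
}

if (requireNamespace("readr", quietly = TRUE)) {
  timings["readr_write"] <- system.time(readr::write_csv(df, path))[["elapsed"]]
  timings["readr_read"] <- system.time(
    df_readr <- readr::read_csv(path, col_types = readr::cols())
  )[["elapsed"]]
}

# Elapsed seconds per method
print(timings)
unlink(path)
```

Results vary with disk speed and file size, so it is worth repeating each measurement several times before drawing conclusions.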
How to Convert Continuous variables into Categorical by Creating Bins
A very common task in data processing is the transformation of numeric variables (continuous, discrete etc.) to categorical ones by creating bins. For example, it is quite common to convert age to an age group. Let’s see how we can easily do that in R. We will consider a random variable from the Poisson distribution with parameter λ=20 library(...
1587 sym R (646 sym/5 pcs) 10 img
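As a minimal base R sketch of binning, the example below uses `cut()` on a Poisson sample with λ=20. The break points and labels are arbitrary assumptions, and the post itself may rely on a different library, since its code is truncated.

```r
set.seed(5)

# Random sample from a Poisson distribution with lambda = 20
x <- rpois(1000, lambda = 20)

# Hypothetical bin edges; intervals are right-closed: (-Inf,15], (15,20], ...
breaks <- c(-Inf, 15, 20, 25, Inf)
labels <- c("<=15", "16-20", "21-25", ">25")

# Convert the numeric values into an ordered set of categories
x_binned <- cut(x, breaks = breaks, labels = labels)

# Frequency of each bin
table(x_binned)
```

Using `-Inf` and `Inf` as the outer edges guarantees that every value falls into some bin, so no `NA` categories are produced.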