Publications by George Pipis

How to Share your Notebooks as static websites with AWS S3

16.01.2021

Data Scientists often work with notebooks such as Jupyter and R Markdown. Through notebooks, they can easily share their analysis in HTML format. But what if the notebooks need to be shared publicly? In this case, the most convenient way is to configure an Amazon S3 bucket to function as a static website. In this tutorial, we will pro...

2233 sym R (908 sym/2 pcs) 14 img

How to Compare Objects in R

22.01.2021

In 2020, R guru Hadley Wickham built a new package called waldo for comparing complex R objects and making it easy to detect the key differences. You can find detailed examples on tidyverse.org as well as on GitHub. Let’s provide a simple example by comparing two data frames in R with waldo. Compare 2 Data Frames in R Let’s create the...

1353 sym R (279 sym/4 pcs) 2 img

How to Share your Machine Learning Models with Shiny

27.01.2021

We have provided an example of how to Build and Deploy a Machine Learning Web App using Python and Flask. In this tutorial, we will show how you can share your models with your clients and/or colleagues using Shiny. The Scenario Assume that you have built a Machine Learning model and you want other people to be able to interact with it. So, let’s sa...

3596 sym R (1927 sym/2 pcs) 6 img

Rolling Regression and Pairs Trading in R

30.01.2021

In a previous post, we provided an example of Rolling Regression in Python to get the market beta coefficient. We also provided an example of pairs trading in R. In this post, we will provide an example of rolling regression in R using the rollRegres package. We will provide an example of getting the beta coefficient between two ...

1788 sym R (1166 sym/3 pcs) 10 img

How to Share Flask APIs with Shiny as Applications

07.02.2021

As a Data Scientist, you may work in both R and Python, and it is common to prefer one language over the other for specific tasks. For example, you may prefer R for Statistics, Data Cleansing and Data Visualizations, and Python for NLP tasks and Deep Learning. Also, when it comes to RESTful APIs, Python Flask APIs have an advan...

3293 sym R (4542 sym/5 pcs) 10 img

How to Split Randomly a Userbase using Modulo

17.02.2021

In many cases, there is a need to split a userbase into 2 or more buckets. For example: UCG: Many companies that run promotional campaigns, in order to quantify and evaluate the performance of the campaigns, create a Universal Control Group (UCG) which is a random sample of the userbase and does not receive any offer or message. Bucketize: For tes...

2843 sym R (1296 sym/9 pcs) 4 img

How to run Logistic Regression on Aggregate Data in R

19.02.2021

We will provide an example of how you can run a logistic regression in R when the data are grouped. Let’s provide some random sample data of 200 observations. library(tidyverse) set.seed(5) df<-tibble(Gender = as.factor(sample(c("m","f"), 200, replace = TRUE, prob=c(0.6,0.4))), Age_Group = as.factor(sample(c("[<30]","[30-65]", "[65...

1542 sym R (6097 sym/14 pcs)

How to Build a Predictive Model for NBA Games

07.03.2021

Introduction In this tutorial, we will provide an example of how you can build a starting predictive model for NBA Games. The steps are the following: Scrape the game results from ESPN for each team. Transform the data, generate some features and get the running totals of each team per game. Build the Predictive Model. Make Predictions. Scrape th...

3728 sym R (5102 sym/7 pcs) 20 img

How to Visualize Multivariate Data Analysis

18.03.2021

In this tutorial, we will work with the factoextra R package and we will consider the Country dataset. Let’s start: library(factoextra) df<-read.csv("DataCountries.txt", sep="\t") head(df) PCA Analysis Now we will run a PCA analysis on our dataset. Note that we need to include only the numeric variables. We will also set as row names the col...

1339 sym R (1155 sym/9 pcs) 16 img

Candlestick Charts in R

18.03.2021

We have written many posts related to stocks. A good way to represent the stock prices as time series is with Candlestick Charts. Let’s see how we can easily produce candlestick charts with R. We will work with the quantmod library and with the AMZN stock. library("quantmod") # get the Amazon Stock data Amazon<-getSymbols("AMZN", ...

667 sym R (391 sym/3 pcs) 8 img