Publications by George Pipis

The Benjamini-Hochberg procedure (FDR) and P-Value Adjusted Explained

13.07.2023

In this tutorial, we will show you how to apply the Benjamini-Hochberg procedure in order to control the False Discovery Rate (FDR) and calculate the adjusted p-values. The Benjamini-Hochberg procedure, also known as the False Discovery Rate (FDR) procedure, is a statistical method used in multiple hypothesis testing to control the expected proportion of fa...
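
The adjustment logic can be sketched briefly. This is our own illustration, not the tutorial's code, and the function names are hypothetical. For the i-th smallest of m p-values, the Benjamini-Hochberg adjusted value is p_(i) · m / i. That value is then made monotone by taking a running minimum from the largest p-value downwards. As far as we know, this is the same rule used by R's p.adjust(method = "BH") and statsmodels' multipletests(method = "fdr_bh").

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # start at 1 so no adjusted value can exceed 1
    # Walk from the largest p-value (rank m) down to the smallest (rank 1).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def bh_reject(pvalues, alpha=0.05):
    """Reject H0 wherever the BH-adjusted p-value is <= alpha."""
    return [q <= alpha for q in bh_adjust(pvalues)]


if __name__ == "__main__":
    p = [0.01, 0.04, 0.03, 0.005]
    print(bh_adjust(p))  # approximately [0.02, 0.04, 0.04, 0.02]
    print(bh_reject(p, alpha=0.03))
```

Rejecting wherever the adjusted value is at most alpha is equivalent to the step-up rule. That rule finds the largest i with p_(i) <= i · alpha / m and rejects the i smallest p-values.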

How to apply the Mann-Whitney U Test in R

08.03.2020

In statistics, the Mann–Whitney U test (also called the Wilcoxon rank-sum test) is a nonparametric test of the null hypothesis that it is equally likely that a randomly selected value from one population will be less than or greater than a randomly selected value from a second population. This test can be used to investigate whether two independent...
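
The post works in R. As a language-neutral illustration, here is a Python sketch using only the standard library, with hypothetical function names. It computes the U statistic from the ranks of the pooled sample, and tied values share the average of their ranks. U1 counts the pairs (x, y) with x > y, plus one half for each tie. So U1 / (n1 · n2) estimates P(X > Y) + 0.5 · P(X = Y), which is the quantity the null hypothesis above is about. To our understanding, R's wilcox.test reports this U1 as its W statistic. No p-value is computed here.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of 1-based positions start..end
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U1, U2) for samples x and y; U1 + U2 == len(x) * len(y)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:n1])  # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2
    return u1, n1 * n2 - u1


if __name__ == "__main__":
    print(mann_whitney_u([1, 2, 3], [4, 5, 6]))  # (0.0, 9.0)
```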

Report Coronavirus (COVID-19) in R

08.03.2020

This post is about COVID-19, and we will show an example of how you can get the data on daily “confirmed”, “recovered” and “death” cases by country. In essence, we will show you how to access the data used by the Johns Hopkins report, so that you can easily run your own reports and analyses. The coronavirus package provides detaile...

Web Scraping worldometers for Coronavirus

12.04.2020

One of the most popular web pages about COVID-19 is Worldometers, which provides a detailed report of coronavirus cases by country. Today, we will show how we can use R to web scrape the summary table of the site.
library(tidyverse)
library(rvest)
url <- "https://www.worldometers.info/coronavirus/"
my_table <- url %>% read_html() %>% html_table...

How to Impute Missing Values in R

18.04.2020

In real-world data, it is quite common to deal with missing values (known as NAs). Sometimes we need to impute them, and the most common approaches are: for numerical data, impute missing values with the mean or median; for categorical data, impute missing values with the mode. Let’s give an example of how we can impute dynamically dep...
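
The rule above can also be sketched outside R: mean or median for numerical columns, mode for categorical ones. Below is a small Python version using only the standard library. It assumes the table is a dict of column lists with None marking a missing value. The function names are hypothetical, and this is not the post's R code.

```python
from statistics import mean, median, mode


def is_numeric(value):
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def impute_column(values, numeric_strategy="mean"):
    """Fill None entries: mean/median for numeric columns, mode otherwise."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to learn a fill value from
    if all(is_numeric(v) for v in observed):
        fill = median(observed) if numeric_strategy == "median" else mean(observed)
    else:
        fill = mode(observed)  # most frequent category
    return [fill if v is None else v for v in values]


def impute_table(table, numeric_strategy="mean"):
    """Apply impute_column to every column of a dict-of-lists table."""
    return {name: impute_column(col, numeric_strategy) for name, col in table.items()}


if __name__ == "__main__":
    data = {"age": [20, None, 40], "city": ["A", None, "A", "B"]}
    print(impute_table(data))  # {'age': [20, 30, 40], 'city': ['A', 'A', 'A', 'B']}
```

The column type is inferred from its observed values. That is one simple way to choose the fill rule "dynamically" per column.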

R: How To Assign Values Based On Multiple Conditions Of Different Columns

12.05.2020

In the previous post, we showed how we can assign values in Pandas DataFrames based on multiple conditions of different columns. Again, we will work with the famous Titanic dataset, and our scenario is the following: if Age is NA and Pclass = 1, then Age = 40; if Age is NA and Pclass = 2, then Age = 30; if Age is NA and Pclass = 3, the...
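
The post does this on an R data frame. Here, the same rule-based fill is sketched in plain Python with rows as dicts and None standing in for NA. The value for Pclass = 3 is cut off in the excerpt above, so the sketch takes the class-to-age mapping as a parameter. Only the two rules stated here are used: Pclass 1 gets 40, and Pclass 2 gets 30. Rows not covered by the mapping keep their missing Age. The function and variable names are hypothetical.

```python
def fill_age_by_class(rows, age_by_class):
    """Return copies of rows where a missing Age is set from age_by_class[Pclass]."""
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        if new_row.get("Age") is None and new_row.get("Pclass") in age_by_class:
            new_row["Age"] = age_by_class[new_row["Pclass"]]
        filled.append(new_row)
    return filled


# Only the two rules visible in the excerpt; the Pclass = 3 value is not shown there.
rules = {1: 40, 2: 30}
passengers = [
    {"Pclass": 1, "Age": None},
    {"Pclass": 2, "Age": None},
    {"Pclass": 3, "Age": None},
    {"Pclass": 1, "Age": 22},
]

if __name__ == "__main__":
    print(fill_age_by_class(passengers, rules))
```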

How to determine the number of Clusters for K-Means in R

17.05.2020

We will work with the Breast Cancer Wisconsin dataset, where we will apply the K-Means algorithm to the individuals’ features, ignoring the dependent variable diagnosis. Notice that all the features are numeric.
library(tidyverse)
# the column names of the dataset
names <- c('id_number', 'diagnosis', 'radius_mean', 'texture_mean'...
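
One common way to choose the number of clusters is the elbow method, though we do not claim it is the only method the post covers. You run K-Means for k = 1, 2, ... and look for the k after which the total within-cluster sum of squares (WSS) stops dropping sharply. The Python sketch below implements plain Lloyd's K-Means with random initial centers, keeping the best of a few restarts. It uses only the standard library. The toy points are made up for illustration and are not from the Breast Cancer Wisconsin data, and the function names are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm on a list of tuples; returns (centers, WSS)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct input points as initial centers
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centers[c]))
            clusters[nearest].append(p)
        new_centers = []
        for c, members in enumerate(clusters):
            if members:  # move the center to the mean of its members
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:  # keep an empty cluster's center where it was
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    wss = sum(min(squared_distance(p, c) for c in centers) for p in points)
    return centers, wss


def elbow_curve(points, max_k, restarts=3):
    """Best WSS over a few random restarts for each k = 1..max_k."""
    return {k: min(kmeans(points, k, seed=s)[1] for s in range(restarts))
            for k in range(1, max_k + 1)}


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

if __name__ == "__main__":
    for k, wss in elbow_curve(points, max_k=4).items():
        print(k, round(wss, 2))
```

On these two well-separated groups, WSS should fall steeply from k = 1 to k = 2 and only slightly afterwards, which puts the elbow at k = 2. With real features such as those in the post, scaling the columns first is usually advisable, because the distance is Euclidean.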

How to build Stacked Ensemble Models in R

21.05.2020

In this post, we will show you how to easily apply Stacked Ensemble Models in R using the H2O package. The models can handle both classification and regression problems. For this example, we will tackle a classification problem using the Breast Cancer Wisconsin dataset, which can be found here. Description of the Stacked Ensemble Models: The step...

How to Apply Text Distances and Fuzzy Joins

25.05.2020

Edit Distance for Measuring Text Distance. Today we will talk about text similarities and how we can “calculate” a distance measure between two texts. For example, intuitively we know that the text “cats” is close to “rats” since they differ in one letter (note: ignore the meaning of the words). A popular approach of distance...
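
One widely used edit distance is the Levenshtein distance. We assume it is the one the post goes on to describe, but the excerpt is cut off. It is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another, so “cats” and “rats” are at distance 1. The post works in R. The Python sketch below uses only the standard library and hypothetical function names. It computes the distance with the usual dynamic-programming table, keeping one row at a time, and then uses it for a naive fuzzy join. The join pairs every left string with every right string within a chosen maximum distance.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def fuzzy_join(left, right, max_distance):
    """All (l, r, distance) pairs whose edit distance is at most max_distance."""
    matches = []
    for l in left:
        for r in right:
            d = levenshtein(l, r)
            if d <= max_distance:
                matches.append((l, r, d))
    return matches


if __name__ == "__main__":
    print(levenshtein("cats", "rats"))                      # 1
    print(fuzzy_join(["cats"], ["rats", "dogs"], max_distance=1))  # [('cats', 'rats', 1)]
```

The naive join compares every pair, so its cost grows with len(left) × len(right). That is fine for small lookup tables but slow for large ones.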

How to get Cryptocurrency prices in R

07.08.2020

Today, we will show how you can easily get cryptocurrency prices using R. We will work with the crypto package, which you can install from GitHub:
# Installing via GitHub
devtools::install_github("jessevent/crypto")
How to get the list of the cryptocurrencies:
library(crypto)
library(tidyverse)
list_coins <- crypto_list()
# Prin...