Publications by George Pipis
The Benjamini-Hochberg procedure (FDR) and P-Value Adjusted Explained
In this tutorial, we will show you how to apply the Benjamini-Hochberg procedure to control the False Discovery Rate (FDR) and calculate adjusted p-values. The Benjamini-Hochberg procedure, also known as the False Discovery Rate (FDR) procedure, is a statistical method used in multiple hypothesis testing to control the expected proportion of fa...
4915 sym Python (1032 sym/8 pcs) 2 img
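As a rough illustration of the Benjamini-Hochberg adjustment named in the entry above, here is a minimal pure-Python sketch. It is not taken from the linked post: the function name `benjamini_hochberg`, the sample p-values and the 0.05 threshold are all assumptions made for this example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from five tests (illustrative only)
p_values = [0.001, 0.2, 0.02, 0.008, 0.6]
adjusted = benjamini_hochberg(p_values)

# A hypothesis is rejected when its adjusted p-value is at most the chosen FDR level
rejected = [q <= 0.05 for q in adjusted]
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps a smaller p-value from ending up with a larger adjusted value than a bigger one.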
How to apply the Mann-Whitney U Test in R
In statistics, the Mann–Whitney U test (also called Wilcoxon rank-sum test) is a nonparametric test of the null hypothesis that it is equally likely that a randomly selected value from one population will be less than or greater than a randomly selected value from a second population. This test can be used to investigate whether two independent...
3059 sym R (2442 sym/1 pcs) 1 tbl
Report Coronavirus (COVID-19) in R
This post is about COVID-19, and we will show an example of how you can get the daily “confirmed”, “recovered” and “death” case counts by country. In essence, we will show how you can access the data used in the Johns Hopkins report, so that you can easily run your own reports and analyses. The coronavirus package provides detaile...
1943 sym R (7010 sym/10 pcs) 8 img 1 tbl
Web Scraping worldometers for Coronavirus
One of the most popular web pages about COVID-19 is Worldometers, which provides a detailed report on coronavirus cases by country. Today, we will show how to use R to scrape the summary table of the site. library(tidyverse) library(rvest) url <- "https://www.worldometers.info/coronavirus/" my_table<-url%>%read_html()%>%html_table...
788 sym R (881 sym/2 pcs)
How to Impute Missing Values in R
In real-world data, it is quite common to deal with missing values (known as NAs). Sometimes the missing values need to be imputed, and the most common approaches are: Numerical Data: impute missing values with the mean or median. Categorical Data: impute missing values with the mode. Let’s give an example of how we can impute dynamically dep...
1382 sym R (1821 sym/3 pcs)
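The two imputation approaches named in the entry above (median for numerical data, mode for categorical data) can be sketched as follows. This is a Python illustration rather than the R code from the post; the helper `impute`, the use of `None` for a missing value, and the sample data are assumptions made for this example.

```python
from statistics import median, mode


def impute(values, strategy):
    """Replace None entries with strategy(observed values)."""
    observed = [v for v in values if v is not None]
    fill = strategy(observed)
    return [fill if v is None else v for v in values]


# Hypothetical columns with missing entries marked as None
ages = [22, None, 30, 26, None]
cities = ["Athens", None, "Athens", "Paris"]

ages_imputed = impute(ages, median)    # numerical column: fill with the median
cities_imputed = impute(cities, mode)  # categorical column: fill with the mode
```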
R: How To Assign Values Based On Multiple Conditions Of Different Columns
In the previous post, we showed how we can assign values in Pandas Data Frames based on multiple conditions of different columns. Again we will work with the famous titanic dataset and our scenario is the following: If the Age is NA and Pclass=1 then the Age=40; If the Age is NA and Pclass=2 then the Age=30; If the Age is NA and Pclass=3 the...
1022 sym R (568 sym/3 pcs) 2 img
How to determine the number of Clusters for K-Means in R
We will work with the Breast Cancer Wisconsin dataset, where we will apply the K-Means algorithm to the individual’s features ignoring the dependent variable diagnosis. Notice that all the features are numeric. library(tidyverse) # the column names of the dataset names <- c('id_number', 'diagnosis', 'radius_mean', 'texture_mean'...
3300 sym R (3294 sym/8 pcs) 16 img
How to build Stacked Ensemble Models in R
In this post, we will show you how you can easily apply Stacked Ensemble Models in R using the H2O package. The models can handle both Classification and Regression problems. For this example, we will work on a classification problem, using the Breast Cancer Wisconsin dataset which can be found here. Description of the Stacked Ensemble Models The step...
2705 sym R (2631 sym/1 pcs) 2 img 1 tbl
How to Apply Text Distances and Fuzzy Joins
Edit Distance for measuring the Text Distance. Today we will talk about text similarities and how we can “calculate” a distance measure between two texts. For example, intuitively we know that the text “cats” is close to “rats” since they differ in one letter (Note: Forget about the meaning of the words). A popular approach of distance...
4344 sym R (2510 sym/8 pcs)
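To make the “cats” versus “rats” example above concrete, here is a minimal Python sketch of one common edit distance, the Levenshtein distance. The excerpt is truncated before it names the measure it uses, so the choice of Levenshtein and the function name `levenshtein` are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    # previous[j] is the distance between the prefix of a processed so far and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


distance = levenshtein("cats", "rats")
```

The two words differ by a single substitution, so this distance matches the intuition in the excerpt that they are close.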
How to get Cryptocurrency prices in R
Today, we will show how you can easily get the Cryptocurrency Prices using R. We will work with the crypto package. You can install the crypto package from GitHub: # Installing via Github devtools::install_github("jessevent/crypto") How to get the list of the Cryptocurrencies library(crypto) library(tidyverse) list_coins<-crypto_list() # Prin...
776 sym R (737 sym/3 pcs) 1 tbl