Publications by Greski, Len

Reading Excel files in R

12.06.2020

Recently a person posed a question on Stack Overflow about four of the packages used to read Microsoft Excel files: readxl, openxlsx, xlsx, and XLConnect. The questioner wanted to know about functionality that was unique to a single package, such as...

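As a minimal sketch of basic workbook reading with one of the four packages, the code below lists and reads every worksheet with readxl. It assumes readxl is installed and uses "datasets.xlsx", a sample workbook bundled with readxl and located via readxl_example(). It does not illustrate the package-specific features the question asked about.

```r
# Minimal sketch: read every worksheet of a workbook with readxl.
# Assumes readxl is installed; "datasets.xlsx" is a sample workbook that
# ships with readxl, not a file from the original post.
if (requireNamespace("readxl", quietly = TRUE)) {
  path <- readxl::readxl_example("datasets.xlsx")

  # List the worksheet names, then read each sheet into a tibble
  sheets <- readxl::excel_sheets(path)
  workbook <- lapply(sheets, function(s) readxl::read_excel(path, sheet = s))
  names(workbook) <- sheets

  # Report rows and columns for each sheet that was read
  print(sapply(workbook, dim))
} else {
  message("Install readxl with install.packages(\"readxl\") to run this sketch.")
}
```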

Calculating rates per million U.S. State population

20.06.2020

A user on Stack Overflow recently asked how to convert a data set containing crime statistics (e.g., auto accidents and shootings) into rates per million population by U.S. state. To calculate these rates, one needs to merge a source of U.S. state-level population data with the data frame containing the event counts by state. Fortunately, Unit...

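A minimal base R sketch of the merge-then-divide approach follows. The event counts are made up for illustration, and the built-in state.x77 matrix stands in for a real population source; its Population column holds 1975 estimates in thousands, so it is not suitable for computing current rates.

```r
# Hypothetical event counts by state (illustrative numbers only)
events <- data.frame(
  state  = c("Alabama", "Alaska", "Arizona"),
  events = c(120, 15, 240)
)

# Stand-in population source: base R's state.x77 (1975 estimates, in thousands)
population <- data.frame(
  state      = rownames(state.x77),
  population = unname(state.x77[, "Population"]) * 1000
)

# Join the two sources on the state name (inner join by default)
rates <- merge(events, population, by = "state")

# Rate per million residents = events / population * 1,000,000
rates$per_million <- rates$events / rates$population * 1e6
print(rates)
```

By default merge() drops states that are missing from either source; passing all.x = TRUE keeps every state from the event data and leaves NA where no population match exists.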

R Objects, S Objects, and Lexical Scoping

27.06.2020

Two key R design principles related to objects and lexical scoping are summarized in the following quote from John Chambers:
To understand computations in R, two slogans are helpful:
- Everything that exists is an object, and
- Everything that happens is a function call.
John Chambers, ...

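A short base R sketch of both slogans and of lexical scoping follows. The names f, g, and make_counter are illustrative and do not come from the original post.

```r
# "Everything that happens is a function call": operators and control flow
# are functions, so they can be called with ordinary function-call syntax.
sum_via_call <- `+`(2, 3)              # same as 2 + 3
choice <- `if`(TRUE, "yes", "no")      # same as if (TRUE) "yes" else "no"

# "Everything that exists is an object": even `+` has a class
print(class(`+`))

# Lexical scoping: f() looks up x in the environment where f was defined
# (the global environment), not in the environment of its caller g().
x <- "global"
f <- function() x
g <- function() {
  x <- "local to g"
  f()
}
scoped <- g()                          # returns "global"

# Closures: the returned function keeps access to `count` in its defining
# environment, so the value persists between calls.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1
    count
  }
}
counter <- make_counter()
counter()
second <- counter()                    # returns 2
```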

Estimating Runtime for an R script

30.06.2020

Recently a person on Stack Overflow asked how to estimate the runtime of an R script. She was attempting to produce correlation tests for 60 questions in a survey, using the corr.test() function from the psych package. The answers to each question were coded as 5-point scales from ...

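One common way to estimate total runtime is to time a small sample of the work and extrapolate. The sketch below assumes simulated data (500 hypothetical respondents answering 60 items on a 1 to 5 scale) and substitutes base R's cor.test() for psych::corr.test(), so absolute timings will differ from the original script; only the time-a-sample-and-scale technique carries over.

```r
set.seed(2020)

# Simulated survey: 500 hypothetical respondents x 60 items scored 1-5
n_resp  <- 500
n_items <- 60
survey <- as.data.frame(
  matrix(sample(1:5, n_resp * n_items, replace = TRUE), ncol = n_items)
)

# Every pair of items: choose(60, 2) correlation tests in total
item_pairs <- combn(n_items, 2)
n_total <- ncol(item_pairs)

# Time a small sample of the tests (Spearman; exact = FALSE because of ties)
n_trial <- 100
elapsed <- system.time(
  for (k in seq_len(n_trial)) {
    cor.test(survey[[item_pairs[1, k]]], survey[[item_pairs[2, k]]],
             method = "spearman", exact = FALSE)
  }
)[["elapsed"]]

# Extrapolate linearly from the average time per test
est_total <- elapsed / n_trial * n_total
cat(sprintf("Estimated runtime for %d tests: %.2f seconds\n", n_total, est_total))
```

Linear extrapolation assumes each test costs about the same; timing a larger sample gives a more stable estimate.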

Plotting multiple time series in a single plot

04.09.2020

Recently a person posed a question on Stack Overflow about how to combine multiple time series into a single plot with the ggplot2 package. The question referenced another Stack Overflow answer to a similar question, but the person who posted the new question wasn’t able to apply the other...

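A common ggplot2 approach is to reshape the series into long format, with one row per date, series, and value, and then map the series name to colour. The sketch below builds the long data frame in base R from two simulated series (the names series_a and series_b are hypothetical) and draws the plot only if ggplot2 is installed.

```r
set.seed(1)

# Wide format: one column per (simulated) time series
wide <- data.frame(
  date     = seq(as.Date("2020-01-01"), by = "month", length.out = 12),
  series_a = cumsum(rnorm(12)),
  series_b = cumsum(rnorm(12))
)

# Long format: stack the value columns, labelling each row with its series
value_cols <- setdiff(names(wide), "date")
long <- data.frame(
  date   = rep(wide$date, times = length(value_cols)),
  series = rep(value_cols, each = nrow(wide)),
  value  = unlist(wide[value_cols], use.names = FALSE)
)

# A single geom_line() layer draws every series; colour distinguishes them
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = date, y = value, colour = series)) +
    geom_line() +
    labs(x = "Date", y = "Value", colour = "Series")
  print(p)
}
```

tidyr::pivot_longer() performs the same reshape in a single call when tidyr is available.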

Monte Carlo Simulation of Bernoulli Trials in R

26.11.2020

A user on Stack Overflow recently asked about a program that generates Monte Carlo simulations of Bernoulli trials to calculate coverage percentages for Wald confidence intervals. One of the problems in the code is that probability value calculations are executed on individual observations rather than sums of successes and fai...

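A minimal sketch of the corrected idea: simulate n Bernoulli trials per replication, sum the successes, and build the Wald interval p_hat ± z * sqrt(p_hat * (1 - p_hat) / n) from that sum. The function name wald_coverage() and its defaults are hypothetical and are not taken from the original question or answer.

```r
# Estimate the coverage of the Wald interval (95% by default) for a
# binomial proportion p with n trials per replication.
wald_coverage <- function(n, p, n_sims = 10000, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)

  # Each row holds one replication of n Bernoulli(p) trials (0/1 values)
  trials <- matrix(rbinom(n_sims * n, size = 1, prob = p), nrow = n_sims)

  # Work with the number of successes per replication, not single observations
  successes <- rowSums(trials)
  p_hat <- successes / n
  se <- sqrt(p_hat * (1 - p_hat) / n)

  # Proportion of replications whose interval contains the true p
  lower <- p_hat - z * se
  upper <- p_hat + z * se
  mean(lower <= p & p <= upper)
}

set.seed(42)
coverage <- sapply(c(10, 50, 100), wald_coverage, p = 0.5)
print(coverage)
```

For small n, or for p near 0 or 1, the estimated coverage typically falls below the nominal level, which is a known weakness of the Wald interval.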

caret::createFolds() vs. createMultiFolds()

27.11.2020

Recently a user posted a question on Stack Overflow asserting that caret::createFolds() behaves differently from createMultiFolds(). The questioner argued that while createFolds() samples without replacement, createMultiFolds() samples with replacement. Our analysis demonstrates that the two functions behave consistently, creating k fold...

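One way to check the claim empirically is to confirm that the held-out folds from createFolds() partition the rows, and that each training set from createMultiFolds() contains no duplicated row indices, since duplicates would be the signature of sampling with replacement. This sketch assumes caret is installed and uses a simulated two-class outcome; it is not the analysis from the original post.

```r
if (requireNamespace("caret", quietly = TRUE)) {
  set.seed(2020)
  # Simulated two-class outcome for 100 hypothetical observations
  y <- factor(sample(c("yes", "no"), 100, replace = TRUE))

  # createFolds() returns held-out indices; together they should cover 1:100 once
  holdout <- caret::createFolds(y, k = 5)
  partition_ok <- identical(sort(unname(unlist(holdout))), 1:100)

  # createMultiFolds() returns training indices for k folds x `times` repeats
  train <- caret::createMultiFolds(y, k = 5, times = 3)
  no_duplicates <- all(vapply(train, anyDuplicated, integer(1)) == 0)

  # Within one repeat, the held-out rows (complement of training) partition 1:100
  rep1 <- train[grep("Rep1$", names(train))]
  rep1_holdout <- lapply(rep1, function(idx) setdiff(1:100, idx))
  rep1_ok <- identical(sort(unlist(rep1_holdout, use.names = FALSE)), 1:100)

  print(c(partition_ok = partition_ok, no_duplicates = no_duplicates, rep1_ok = rep1_ok))
} else {
  message("Install caret with install.packages(\"caret\") to run this check.")
}
```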