Publications by matloff

The “Secret Sauce” Used in Many qeML Functions

22.11.2023

In writing an R package, it is often useful to build up some function call in string form, then “execute” the string. To give a really simple example:

    > s <- '1+1'
    > eval(parse(text=s))
    [1] 2

Quite a lot of trouble to go to just to find that 1+1 = 2? Yes, but this trick can be extremely useful, as we’ll see here.

    data(svcensus)
    z <- qePCA(svc...
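As a rough illustration of the string-building idea in this excerpt, here is a minimal base-R sketch. The helper name `runCall` and its arguments are hypothetical, invented for this example; it is not a qeML function and may differ from how the post itself proceeds.

```r
# Hypothetical helper (not part of qeML): assemble a function call
# as a character string, then parse and evaluate it.
runCall <- function(fname, argString) {
  # e.g. fname = "sum", argString = "1:10" gives the string "sum(1:10)"
  cmd <- paste0(fname, "(", argString, ")")
  eval(parse(text = cmd))
}

runCall("sum", "1:10")        # evaluates sum(1:10), i.e. 55
runCall("max", "c(3, 9, 4)")  # evaluates max(c(3, 9, 4)), i.e. 9
```

The point of going through a string is that the function name and the argument list can both be decided at run time, which is awkward to do with an ordinary hard-coded call.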

2383 sym R (467 sym/6 pcs)

qeML Example: Issues of Overfitting, Dimension Reduction Etc.

21.11.2023

What about variable selection? Which predictor variables/features should we use? No matter what anyone tells you, this is an unsolved problem. But there are lots of useful methods. See the qeML vignettes on feature selection and overfitting for detailed background on the issues involved. We note at the outset what our concluding statement will be: ...

3956 sym R (554 sym/6 pcs)

New Package, New Book!

18.11.2023

Sorry I haven’t been very active on this blog lately, but now that I have more time, that will change. I’ve got myriad things to say. To begin with, then, I’ll announce a major new R package, and my new book.

qeML package (“quick and easy machine learning”)

Featured aspects:

Now on CRAN, https://cran.r-project.org/package=qeML. See GitH...

1528 sym

Just How Good Is ChatGPT in Data Science?

04.12.2022

Many of you may have heard of ChatGPT, a dazzling new AI tool. We are hearing lots of gushing praise for the tool. Well, how well does it do in data science contexts? I tried a few queries here, and found interesting results. I first requested, “Write an R function that returns every other element of a vector x, starting with the third.” I wo...
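For reference, one possible base-R answer to the query quoted in this excerpt is sketched below. The function name `everyOtherFromThird` is made up for this example, and this is not necessarily what ChatGPT produced or what the post goes on to discuss.

```r
# Return every other element of x, starting with the third:
# x[3], x[5], x[7], ...
everyOtherFromThird <- function(x) {
  # With fewer than 3 elements there is nothing to return;
  # x[0] gives an empty vector of the same type as x.
  if (length(x) < 3) return(x[0])
  x[seq(3, length(x), by = 2)]
}

everyOtherFromThird(1:10)          # 3 5 7 9
everyOtherFromThird(letters[1:5])  # "c" "e"
```

The length check matters because `seq(3, length(x), by = 2)` raises an error when `length(x)` is less than 3.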

3308 sym 2 img

New Statistics Tutorial

30.12.2022

I’ve recently completed fastStat, https://github.com/matloff/fastStat, a quick introduction to statistics for those who’ve had a calculus-based probability course. Many such people later need to do statistics, and this will give them quick access. It is modeled after my R tutorial, https://github.com/matloff/fasteR, a quick introduction to R....

2074 sym

A New Approach to Fairness in Machine Learning

15.08.2022

During the last year or so, I’ve been quite interested in the issue of fairness in machine learning. This area is more personal for me, as it is the confluence of several interests of mine:

- My lifelong activity in probability theory, math stat and stat methodology (in which I include ML).
- My lifelong activism aimed at achieving social justice.
- My...

1613 sym

Base-R and Tidyverse Code, Side-by-Side

24.08.2022

I have a new short writeup, showing common R design patterns, implemented side-by-side in base-R and Tidy. As readers of this blog know, I strongly believe that Tidy is a poor tool for teaching R learners who have no coding background. Relative to learning in a base-R environment, learners using Tidy take longer to become proficient, and once pro...

1255 sym

Base-R Is Alive and Well

06.08.2022

As many readers of this blog know, I strongly believe that R learners should be taught base-R, not the tidyverse. Eventually the students may settle on using a mix of the two paradigms, but at the learning stage they will benefit from the fact that base-R is simpler and more powerful. I’ve written my thoughts in a detailed essay. One of the most...

2663 sym 1 tbl

Valuable Webinar in Parallel Computing in R

10.08.2022

George Ostrouchov, one of R’s top parallel computing experts, will run a workshop on cluster parallel computing in R next week. BTW, even a multicore laptop is a “cluster,” so anyone can apply this material to their own work even if they don’t have access to a larger multimachine cluster.
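To make the "a multicore laptop is a cluster" remark concrete, here is a minimal sketch using the `parallel` package that ships with R. It is only an illustration under the assumption that two local worker processes can be started; it is not taken from the workshop material, which may use different tools.

```r
library(parallel)

# Start a "cluster" of 2 worker processes on the local machine.
cl <- makeCluster(2)

# Farm the work out to the workers: square each of 1, 2, 3, 4.
squares <- parLapply(cl, 1:4, function(i) i^2)

# Always shut the workers down when finished.
stopCluster(cl)

unlist(squares)  # 1 4 9 16
```

The same `parLapply` code runs unchanged if `cl` is instead built from a list of remote host names, which is why local experimentation carries over to a multimachine cluster.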

691 sym

Use of Differential Privacy in the US Census–All for Nothing?

01.09.2022

The field of data privacy has long been of broad interest. In a medical database, for instance, how can administrators enable statistical analysis by medical researchers, while at the same time protecting the privacy of individual patients? Over the years, many methods have been proposed and used. I’ve done some work in the area myself. But in ...

2733 sym