Publications by matloff

More on the Heteroscedasticity Issue

22.09.2015

In my last post, I discussed R software, including mine, that handles heteroscedastic settings for linear and nonlinear regression models. Several readers had interesting comments and questions, which I will address here. To review: Though most books and software assume homoscedasticity, i.e. constancy of the variance of the response variable at ...

Unbalanced Data Is a Problem? No, BALANCED Data Is Worse

29.09.2015

Say we are doing classification analysis with classes labeled 0 through m-1. Let Ni be the number of observations in class i. There is much handwringing in the machine learning literature over situations in which there is a wide variation among the Ni. I will argue here, though, that the problem is much worse in the case in which there is — art...

A New Method for Statistical Disclosure Limitation, I

15.10.2015

The Statistical Disclosure Limitation (SDL) problem involves modifying a data set in such a manner that statistical analysis on the modified data is reasonably close to that performed on the original data, while preserving the privacy of individuals in the data set. For instance, we might have a medical data set on which we want to allow research...

Partools, Recommender Systems and More

15.11.2015

Recently I attended a talk by Stanford’s Art Owen, presenting work done with his student, Katelyn Gao. This talk touched on a number of my interests, both mathematical and computational. What particularly struck me was that Art and Katelyn are applying a very old — many would say very boring — method to a very modern, trendy application: re...

Back to the BLAS Issue

21.11.2015

A few days ago, I wrote here about how some researchers, such as Art Owen and Katelyn Gao at Stanford and Patrick Perry at NYU, have been using an old, old statistical technique (random effects models) for a new, new application (recommender systems). In addition to describing their approach to that problem, I also used this setting as an ex...

OVA vs. AVA in Classification Problems, via regtools

02.12.2015

OVA and AVA? Huh? These stand for One vs. All and All vs. All, in classification problems with more than 2 classes. To illustrate the idea, I’ll use the UCI Vertebral Column data and Letter Recognition Data, and analyze them using my regtools package. As some of you know, I’m developing the latter in conjunction with a book I’m writing on ...

The Method of Boosting

08.12.2015

One of the techniques that has caused the most excitement in the machine learning community is boosting, which in essence is the iterative refinement, e.g. by reweighting, of estimated regression and classification functions (though it has primarily been applied to the latter), in order to improve predictive ability. Much has been made o...

The Generalized Method of Moments and the gmm package

20.12.2015

An almost-as-famous alternative to the famous Maximum Likelihood Estimation is the Method of Moments. MM has always been a favorite of mine because it often requires fewer distributional assumptions than MLE, and also because MM is much easier to explain than MLE to students and consulting clients. CRAN has a package gmm that does MM, actually th...

Some Comments on Donoho’s “50 Years of Data Science”

23.01.2016

An old friend recently called my attention to a thoughtful essay by Stanford statistics professor David Donoho, titled “50 Years of Data Science.” Given the keen interest these days in data science, the essay is quite timely. The work clearly shows that Donoho is not only a grandmaster theoretician, but also a statistical philosopher. The pap...

50% Draft of Forthcoming Book Available

01.03.2016

As I’ve mentioned here a couple of times, I am in the midst of writing a book, From Linear Models to Machine Learning: Regression and Classification, with Examples in R. As has been my practice with past books, I have now placed a 50% rough draft of the book on the Web. You will see even from this partial version that I take a very different ap...
