Publications by matloff

Update on Snowdoop, a MapReduce Alternative

29.05.2015

In blog posts a few months ago, I proposed an alternative to MapReduce, e.g. to Hadoop, which I called “Snowdoop.” I pointed out that systems like Hadoop and Spark are very difficult to install and configure, are either too primitive (Hadoop) or too abstract (Spark) to program, and above all, are SLOW. Spark is of course a great improvement...
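
The general pattern behind MapReduce-style processing is to split the data into chunks, have each worker compute a partial result on its own chunk, and then combine those partial results. The sketch below shows that pattern using only base R's parallel package. It is an illustration under stated assumptions: the two-worker cluster, the interleaved chunking and the helper name chunk_stats are all hypothetical, and none of it is Snowdoop's actual code.

```r
library(parallel)

# Hypothetical helper: the partial result for one chunk is its sum and count
chunk_stats <- function(v) c(sum = sum(v), n = length(v))

set.seed(42)
x <- rnorm(1e5)

# Split the data into one chunk per worker
nworkers <- 2
chunks <- split(x, rep(seq_len(nworkers), length.out = length(x)))

cl <- makeCluster(nworkers)
# Each worker summarizes only its own chunk
partials <- parLapply(cl, chunks, chunk_stats)
stopCluster(cl)

# Combine the partial results on the manager to get the overall mean
tot <- Reduce(`+`, partials)
overall_mean <- tot[["sum"]] / tot[["n"]]
print(all.equal(overall_mean, mean(x)))  # TRUE
```

The mean is computed from per-chunk sums and counts rather than per-chunk means. That choice keeps the combine step exact even when the chunks differ in size.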

Discovered Two Great Web Sites Today

03.06.2015

Today is my lucky day. I learned of two very interesting Web pages, both of them quite informative and the first of them rather provocative (yay!). I have some comments on both, in some cases consisting of mild disagreement, which I may post later, but in any event, I highly recommend both. Here they are: Drew Schmidt’s take on parallel co...

Macros in R

05.06.2015

In programming, sometimes it’s useful to write a macro rather than a function. (Don’t worry if you’ve never heard the term before.) In this post, I’ll give an example of the use of macros in R, using the gtools package on CRAN. I wanted to write some utility code to help me reuse my earlier R commands during an interactive R session. Most (...
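
The key difference between a macro and a function is that a function works on local copies of its arguments, while a macro splices the caller's expressions into a template and runs the result in the caller's own environment. gtools provides defmacro() for this purpose. The sketch below avoids claims about that package's internals and reproduces the idea with base R's substitute() and eval.parent(). The names inc_fn and inc_macro are hypothetical.

```r
# Ordinary function: 'var' is a local copy, so the caller's variable is unchanged
inc_fn <- function(var, by = 1) {
  var <- var + by
  invisible(var)
}

# Macro-like version: substitute() splices the caller's expressions into the
# template 'var <- var + by', and eval.parent() runs it in the caller's frame
inc_macro <- function(var, by = 1) {
  invisible(eval.parent(substitute(var <- var + by)))
}

w <- 5
inc_fn(w, 2)
print(w)  # [1] 5

z <- 5
inc_macro(z, 2)
print(z)  # [1] 7
```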

Heteroscedasticity in Regression — It Matters!

07.06.2015

R’s main linear and nonlinear regression functions, lm() and nls(), report standard errors for parameter estimates under the assumption of homoscedasticity, a fancy word for a situation that rarely occurs in practice. The assumption is that the (conditional) variance of the response variable is the same at any set of values of the predictor var...
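
One standard remedy, which is not necessarily the approach taken in the post, is to replace the classical standard errors with heteroscedasticity-consistent "sandwich" standard errors. The sketch below computes the HC0 (White) estimator by hand in base R on simulated data, so that the formula is visible. The data-generating model is an assumption made for illustration. In practice, vcovHC() in the sandwich package computes this and related variants.

```r
set.seed(1)
n <- 1000
x <- runif(n, 1, 10)
# Hypothetical model: the error SD equals x, so Var(Y | X = t) = t^2
y <- 2 + 3 * x + rnorm(n, sd = x)
fit <- lm(y ~ x)

X <- model.matrix(fit)
e <- resid(fit)
bread <- solve(crossprod(X))        # (X'X)^{-1}
meat <- crossprod(X * e)            # sum_i e_i^2 x_i x_i'
vc_hc0 <- bread %*% meat %*% bread  # HC0 sandwich covariance matrix

se_classical <- coef(summary(fit))[, "Std. Error"]
se_sandwich <- sqrt(diag(vc_hc0))
print(cbind(se_classical, se_sandwich))
```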

CACM Highlights R

23.07.2015

The Association for Computing Machinery is the main professional organization for computer science, largely for academia but still with a broad membership. ACM publishes a number of journals, most of them research-oriented, but its flagship publication is a magazine, the Communications of the ACM. The current issue of the CACM includes an article, “B...

partools: a Sensible R Package for Large Data Sets

05.08.2015

As I mentioned recently, the new, greatly extended version of my partools package is now on CRAN. (The current version on CRAN is 1.1.3, whereas at the time of my previous announcement it was only 1.1.1. Note that Unix is NOT required.) It is my contention that for most R users who work with large data, partools — or methods like it — is a...

partools 1.1.4

21.08.2015

Partools 1.1.4 is now on GitHub. The main change this time is enhancement of the debugging facilities (which work not only for partools but also the cluster-based portion of R’s parallel package in general). As some of you know, I place huge importance on debugging, so much so that I wrote a book on it (The Art of Debugging with GDB, DDD, and E...

Exciting useR! 2016 Conference

12.09.2015

The 2016 meeting of the annual useR! conference will be held in June at Stanford University. This is a fantastic venue, and we believe it may be the largest useR! meeting to date. See the above link for details!

New R Software/Methodology for Handling Missing Data

16.09.2015

I’ve added some missing-data software to my regtools package on GitHub. In this post, I’ll give an overview of missing-data methodology, and explain what the software does. For details, see my JSM paper, jointly authored with my student Xiao (Max) Gu. There is a long history of development of techniques for handling missing data. See the fam...
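
Two classical baselines in this literature are complete-case analysis and available-case (pairwise) analysis. Complete-case analysis discards every row that has any missing value. Available-case analysis estimates each covariance from all rows where that particular pair of variables is observed. The sketch below contrasts the two for a linear regression in base R. It assumes simulated data with values deleted completely at random, and it is not the regtools API.

```r
set.seed(7)
n <- 500
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
y <- 1 + 2 * x1 - x2 + rnorm(n)
d <- data.frame(x1, x2, y)

# Hypothetical missingness: 20% of each predictor deleted completely at random
d$x1[sample(n, 0.2 * n)] <- NA
d$x2[sample(n, 0.2 * n)] <- NA

# Complete-case analysis: under the default na.action, lm() drops every row with any NA
cc_fit <- lm(y ~ x1 + x2, data = d)
n_cc <- nobs(cc_fit)

# Available-case analysis: each covariance uses all rows where that pair is observed
S <- cov(d, use = "pairwise.complete.obs")
m <- colMeans(d, na.rm = TRUE)
xs <- c("x1", "x2")
beta <- solve(S[xs, xs], S[xs, "y"])       # slopes from Sxx^{-1} Sxy
intercept <- m[["y"]] - sum(beta * m[xs])  # intercept from the means

print(n_cc)  # number of rows kept by complete-case analysis
print(coef(cc_fit))
print(c(intercept, beta))
```

With 20% missingness in each of two predictors, complete-case analysis keeps only about two-thirds of the rows. Available-case estimation is designed to avoid that waste.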

Can You Say “Heteroscedasticity” 3 Times Fast?

18.09.2015

Most books on regression analysis assume homoscedasticity, the situation in which Var(Y | X = t), for a response variable Y and vector of predictor variables X, is the same for all t. Yet, needless to say, almost all data in real life is heteroscedastic. For Y = human weight and X = height, say, we know that the assumption of homoscedasticity can...
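
A quick way to see heteroscedasticity in data is to estimate Var(Y | X = t) separately within narrow ranges of t. The sketch below does this in base R on simulated height and weight data. The numbers and the model that generates them are made up for illustration and are not real measurements. Each band's variance also picks up a small contribution from the mean changing across the band.

```r
set.seed(3)
n <- 5000
# Hypothetical data: heights in cm, weights in kg, with spread growing in height
height <- runif(n, 150, 200)
weight <- -60 + 0.75 * height + rnorm(n, sd = 0.05 * height)

# Estimate Var(Y | X = t) within 10-cm height bands
band <- cut(height, breaks = seq(150, 200, by = 10), include.lowest = TRUE)
cond_var <- tapply(weight, band, var)
print(round(cond_var, 1))
```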
