Publications by matloff

R beats Python! R beats Julia! Anyone else wanna challenge R?

21.05.2014

Before I left for China a few weeks ago, I said my next post would be on our Rth parallel R package. It’s not quite ready yet, so today I’ll post one of the topics I spoke on last night at the Berkeley R Language Beginners Study Group. Thanks to the group for inviting me, and thanks to Allan Miller for suggesting I address this topic. A cou...

6490 sym R (440 sym/1 pcs) 4 img

Rth: a Flexible Parallel Computation Package for R

17.06.2014

I’ve been mentioning here that I’ll be discussing a new package, Rth, developed by me and Drew Schmidt, the latter of pbdR fame.  It’s now ready for use!  In this post, I’ll explain what goals Rth has, and how to use it. Platform Flexibility: The key feature of Rth is in the word flexible in the title of this post, which refers to the fa...

4915 sym 4 img

A Handy Trick for Remote Graphics

22.07.2014

I often create plots that require quite a bit of computation.  Ideally I would run this on what I’ll call Machine A, which is a very fast machine, but I am often far away, on Machine B.  So, I’d like to run my computation on A but display it on B. For the platforms I use (Linux, Mac), I can simply use X11 forwarding, by typing ssh -Y at B t...

2271 sym 4 img

Code Snippet: Extracting a Subsample from a Large File

01.08.2014

Last week a reader of the r-help mailing list posted a query titled “Importing random subsets of a data file.”  With a very large file, it is often much easier and faster–and really, just as good–to just work with a much smaller subset of the data. Fellow readers then posted rather sophisticated solutions, such as storing the file in a ...

2660 sym R (476 sym/1 pcs) 4 img

New freqparcoord Example

05.08.2014

In my JSM talk this morning, I spoke about work done by Yingkang Xie and me, on a novel approach to the parallel coordinates method of visualization.  I’ve made several posts to this blog in the past on freqparcoord, our implementation of our method. My talk this morning used some recently-available NYC taxi data.  You may find the discover...

829 sym 4 img

A Matrix Powers Package, and Some General Edifying Material on R

16.08.2014

Here I will introduce matpow, a package to flexibly and conveniently compute matrix powers.  But even if you are not interested in matrices, I think many of you will find that this post contains much general material on R that you’ll find useful.  Indeed, most of this post will be about general R issues, not so much about matrices per se.  S...

8549 sym R (194 sym/3 pcs) 4 img

Statistics: Losing Ground to CS, Losing Image Among Students

26.08.2014

The American Statistical Association (ASA) leadership, and many in Statistics academia, have been undergoing a period of angst the last few years.  They worry that the field of Statistics is headed for a future of reduced national influence and importance, with the feeling that: The field is to a large extent being usurped by other discipline...

14616 sym 4 img

Good for TI, Good for Schools, Bad for Kids, Bad for Stat

06.09.2014

In my last post, I agreed with Prof. Xiao-Li Meng that Advanced Placement (AP) Statistics courses turn off many students to the statistics field, by being structured in a manner that makes for a boring class.  I cited as one of the problems the fact that the course officially requires TI calculators.  This is a sad waste of resources, as the ma...

4138 sym 4 img

Count Your BLAS-ings

19.11.2014

One nice thing about open-source software is that users often have a lot of choices.  Such is the case with R, for instance with the thousands of contributed packages available on CRAN.  My focus here is on BLAS, the core of matrix operations in R, where again there are interesting choices available to users who wish to take advantage of them. The B...

5045 sym R (75 sym/1 pcs) 4 img

How About a “Snowdoop” Package?

26.11.2014

Along with all the hoopla on Big Data in recent years came a lot of hype on Hadoop.  This eventually spread to the R world, with sophisticated packages being developed such as rmr to run on top of Hadoop. Hadoop made it convenient to process data in very large distributed databases, and also convenient to create them, using the Hadoop Distribu...

3584 sym R (1169 sym/3 pcs)