Publications by Ron Pearson (aka TheNoodleDoodler)

Data Science, Data Analysis, R and Python

15.12.2012

The October 2012 issue of Harvard Business Review prominently features the words “Getting Control of Big Data” on the cover, and the magazine includes these three related articles:“Big Data: The Management Revolution,” by Andrew McAfee and Erik Brynjolfsson, pages 61 – 68;“Data Scientist: The Sexiest Job of the 21st Century,” by Tho...

17171 sym

Finding outliers in numerical data

16.02.2013

One of the topics emphasized in Exploring Data in Engineering, the Sciences and Medicine is the damage outliers can do to traditional data characterizations.  Consequently, one of the procedures to be included in the ExploringData package is FindOutliers, described in this post.  Given a vector of numeric values, this procedure supports four di...

11693 sym 4 img

Classification Tree Models

13.04.2013

On March 26, I attended the Connecticut R Meetup in New Haven, which featured a talk by Illya Mowerman on decision trees in R.  I have gone to these Meetups before, and I have always found them to be interesting and informative.  Attendees range from those who are just starting to explore R to those who have multiple CRAN packages to their cred...

10008 sym 2 img

Assessing the precision of classification tree model predictions

06.08.2013

My last post focused on the use of the ctree procedure in the R package party to build classification tree models.  These models map each record in a dataset into one of M mutually exclusive groups, which are characterized by their average response.  For responses coded as 0 or 1, this average may be regarded as an estimate of the probability t...

10438 sym 6 img

A question of model uncertainty

09.03.2014

It has been several months since my last post on classification tree models, because two things have been consuming all of my spare time.  The first is that I taught a night class for the University of Connecticut’s Graduate School of Business, introducing R to students with little or no prior exposure to either R or programming.  My hope is ...

13468 sym 8 img