Publications by Chuck Powell
Writing better R functions part three – April 13, 2018
In my last post I worked on two functions that took pairs of variables from a dataset and produced some nice, useful ggplot plots from them. We started with the simplest case, plotting counts of how two variables cross-tabulate. Then we worked our way up to being able to automate the process of plotting lots of pairings of variables from the same ...
5810 sym R (8659 sym/22 pcs) 28 img
Writing better R functions part four – April 17, 2018
In my last four posts I have been working on automating a process that I am likely to repeat many times by turning it into a proper R function. In my last post I overcame some real performance problems, combined two sub-functions into one, and generally had a workable piece of code. In the final post in this series today I’ll accomplish two mo...
10709 sym R (14164 sym/37 pcs) 60 img
Announcing CGPfunctions 0.3 – April 20, 2018
As I continue to learn and grow in using R I have been trying to develop the habit of being more formal in documenting and maintaining the various functions and pieces of code I write. It’s not that I think they are major inventions but they are useful and I like having them stored in one place that I can keep track of. So I started building th...
2651 sym Python (174 sym/1 pcs) 2 img
CHAID and R – When you need explanation – May 15, 2018
A modern data scientist using R has access to an almost bewildering number of tools, libraries and algorithms to analyze the data. In my next two posts I’m going to focus on an in-depth visit with CHAID (Chi-square automatic interaction detection). The title should give you a hint for why I think CHAID is a good “tool” for your analytical t...
17926 sym R (52115 sym/28 pcs) 28 img 2 tbl
Slopegraphs and R – A pleasant diversion – May 26, 2018
I try to at least scan the R-bloggers feed every day. Not every article is of interest to me, but I often have one of two different reactions to at least one article. Sometimes it is an “ah ha” moment because the article is right on point for a problem I have now or have had in the past and the article provides a (better) solution. Other times...
11778 sym R (20690 sym/14 pcs) 16 img 1 tbl
CHAID and caret – a good combo – June 6, 2018
In an earlier post I focused on an in-depth visit with CHAID (Chi-square automatic interaction detection). There are lots of tools that can help you predict an outcome, or classify, but CHAID is especially good at helping you explain to any audience how the model arrives at its prediction or classification. It’s also incredibly robust from a...
16881 sym R (50007 sym/25 pcs) 20 img 1 tbl
Announcing another slopegraph plotting function – June 14, 2018
A couple of weeks ago I wrote a blog post about slopegraphs. There was some polite interest and it was a good chance to practice my functional programming skills, so I decided to see if I could make a decent R function from what I had learned. It’s in pretty good shape so I just pushed an update to CRAN (it will take a while to process). You c...
2648 sym Python (174 sym/1 pcs) 2 img
Creating Slopegraphs with R
Presenting data results in the most informative and compelling manner is part of the role of the data scientist. It's all well and good to master the arcana of some algorithm, to manipulate and master the numbers and bend them to your will to produce a “solution” that is both accurate and useful. But those activities are typically in pursui...
8905 sym R (3843 sym/9 pcs) 16 img
CHAID v ranger v xgboost – a comparison – July 27, 2018
In an earlier post, I focused on an in-depth visit with CHAID (Chi-square automatic interaction detection). Quoting myself, I said “As the name implies it is fundamentally based on the venerable Chi-square test – and while not the most powerful (in terms of detecting the smallest possible differences) or the fastest, it really is easy to mana...
22972 sym R (33340 sym/35 pcs) 16 img 2 tbl
CHAID vs. ranger vs. xgboost — a comparison
In an earlier post, I focused on an in-depth visit with CHAID (Chi-square automatic interaction detection). Quoting myself, I said “As the name implies it is fundamentally based on the venerable Chi-square test – and while not the most powerful (in terms of detecting the smallest possible differences) or the fastest, it really is easy to mana...
23287 sym R (33966 sym/35 pcs) 14 img 1 tbl