Publications by Chuck Powell

Writing better R functions part three – April 13, 2018

12.04.2018

In my last post I worked on two functions that took pairs of variables from a dataset and produced some nice useful ggplot plots from them. We started with the simplest case, plotting counts of how two variables cross-tabulate. Then we worked our way up to being able to automate the process of plotting lots of pairings of variables from the same ...

5810 sym R (8659 sym/22 pcs) 28 img

Writing better R functions part four – April 17, 2018

16.04.2018

In my last four posts I have been working on automating a process that I am likely to repeat many times by turning it into a proper R function. In my last post I overcame some real performance problems, combined two sub-functions into one, and generally had a workable piece of code. In the final post in this series today I’ll accomplish two mo...

10709 sym R (14164 sym/37 pcs) 60 img

Announcing CGPfunctions 0.3 – April 20, 2018

19.04.2018

As I continue to learn and grow in using R I have been trying to develop the habit of being more formal in documenting and maintaining the various functions and pieces of code I write. It’s not that I think they are major inventions but they are useful and I like having them stored in one place that I can keep track of. So I started building th...

2651 sym Python (174 sym/1 pcs) 2 img

CHAID and R – When you need explanation – May 15, 2018

14.05.2018

A modern data scientist using R has access to an almost bewildering number of tools, libraries and algorithms to analyze the data. In my next two posts I’m going to focus on an in-depth visit with CHAID (Chi-square automatic interaction detection). The title should give you a hint for why I think CHAID is a good “tool” for your analytical t...

17926 sym R (52115 sym/28 pcs) 28 img 2 tbl

Slopegraphs and R – A pleasant diversion – May 26, 2018

24.05.2018

I try to at least scan the R-bloggers feed every day. Not every article is of interest to me, but I often have one of two different reactions to at least one article. Sometimes it is an “ah ha” moment because the article is right on point for a problem I have now or have had in the past and the article provides a (better) solution. Other times...

11778 sym R (20690 sym/14 pcs) 16 img 1 tbl

CHAID and caret – a good combo – June 6, 2018

05.06.2018

In an earlier post I focused on an in-depth visit with CHAID (Chi-square automatic interaction detection). There are lots of tools that can help you predict an outcome, or classify, but CHAID is especially good at helping you explain to any audience how the model arrives at its prediction or classification. It’s also incredibly robust from a...

16881 sym R (50007 sym/25 pcs) 20 img 1 tbl

Announcing another slopegraph plotting function – June 14, 2018

13.06.2018

A couple of weeks ago I wrote a blog post about slopegraphs. There was some polite interest and it was a good chance to practice my functional programming skills, so I decided to see if I could make a decent R function from what I had learned. It’s in pretty good shape so I just pushed an update to CRAN (it will take a while to process). You c...

2648 sym Python (174 sym/1 pcs) 2 img

Creating Slopegraphs with R

22.06.2018

Presenting data results in the most informative and compelling manner is part of the role of the data scientist. It's all well and good to master the arcana of some algorithm, to manipulate and master the numbers and bend them to your will to produce a “solution” that is both accurate and useful. But those activities are typically in pursui...

8905 sym R (3843 sym/9 pcs) 16 img

CHAID v ranger v xgboost – a comparison – July 27, 2018

26.07.2018

In an earlier post, I focused on an in-depth visit with CHAID (Chi-square automatic interaction detection). Quoting myself, I said “As the name implies it is fundamentally based on the venerable Chi-square test – and while not the most powerful (in terms of detecting the smallest possible differences) or the fastest, it really is easy to mana...

22972 sym R (33340 sym/35 pcs) 16 img 2 tbl

CHAID vs. ranger vs. xgboost — a comparison

29.07.2018

In an earlier post, I focused on an in-depth visit with CHAID (Chi-square automatic interaction detection). Quoting myself, I said “As the name implies it is fundamentally based on the venerable Chi-square test – and while not the most powerful (in terms of detecting the smallest possible differences) or the fastest, it really is easy to mana...

23287 sym R (33966 sym/35 pcs) 14 img 1 tbl