Publications by The Clerk
Golf Scramble Simulation in R
This is a simulation of a standard best-ball golf scramble. Conventional wisdom has it that the best golfer (A) should hit last, the idea being that one of the lesser golfers may already have hit a decent shot, so the best golfer can take a risky one. This simulation suggests that the worst golfer shou...
1256 sym R (3620 sym/7 pcs) 4 img
Insight from FIFA 14’s Player Attributes (Using R)
FIFA 14 is a video game by EA Sports that mimics the experience of managing and playing for a soccer team. The game uses the likenesses and attributes of real players and this is part of the appeal. Although I rarely play video games, I am an avid soccer player and got curious about what could be learned by taking a closer look at the game-assign...
5390 sym 20 img 2 tbl
NFL Player Tree (Using R)
Inspired by the soccer player tree in an earlier post, I pulled some recent National Football League player attributes from EA Sports’s Madden 25 (players ranked at least 95). I used R to do the heavy statistical lifting and the ggplot2 package to get the nice tree plot. Without getting into all of the nuances that I did with ...
1145 sym 2 img
The Meta- State of the Union 2014
Max Ghenis has a nice text analysis of Martin Luther King Jr.’s famous “I Have a Dream” speech. You can read about his methodology here: Statistics meets rhetoric: A text analysis of “I Have a Dream” in R. This got me wondering about President Obama’s 2014 State of the Union speech. Using his template, you can see bel...
2957 sym 6 img
Assign n Email Addresses to x Cells, Intrinsically
Sample Use Case: Marketing requests that an email address list be divided randomly into a given number of cells so that each cell receives a different version of copy. Below is a technique that takes n email addresses and pseudo-randomly assi...
1641 sym R (1041 sym/7 pcs)
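As a rough illustration of the idea in the excerpt above, the sketch below derives a cell number from the email address itself, so the same address always lands in the same cell and no assignment log is needed. The helper name `assign_cell`, the toy position-weighted hash, and the sample addresses are all assumptions made for illustration; the post's actual technique may differ.

```r
# Hypothetical helper (not from the post): map an email address to one of
# n_cells cells using only the address itself.
assign_cell <- function(email, n_cells) {
  # Character codes of the lower-cased address
  codes <- utf8ToInt(tolower(email))
  # Toy hash: position-weighted sum of codes, reduced modulo the cell count.
  # Adding 1 makes cells run from 1 to n_cells.
  (sum(codes * seq_along(codes)) %% n_cells) + 1
}

# Made-up sample addresses
emails <- c("ann@example.com", "bob@example.com", "cy@example.com")

# One cell number per address
cells <- vapply(emails, assign_cell, numeric(1), n_cells = 4)
print(cells)
```

Because the cell is a pure function of the address, rerunning the script later reproduces the same assignments. A toy hash like this one is not guaranteed to spread addresses evenly across cells.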
Assign n Email Addresses to x Cells, Intrinsically (Part II)
Part I showed the concept and general technique of a method of assigning n email addresses to x cells pseudo-randomly, without the need for maintaining a log of each assignment. The earlier post considered the basic case of each cell being assigned approximately the same quantity of email addresses. In practice, cell sizes often vary. Below is a ...
2298 sym R (1085 sym/5 pcs)
R is short for SSIS
Data scientists often identify a need to join data from different, unlinked servers. One standard tool for accomplishing this is an SSIS package to consolidate the data onto one of the servers. For the analyst who wants to keep everything in one file for simplicity and repeatability, there is another option: the RODBC...
1318 sym R (800 sym/5 pcs)
A Look at Random Seeds in R… Or: “85, why can’t you be more like 548?”
Have you ever wondered whether the set.seed() function in R has any quirkiness? This analysis was inspired by a Stack Overflow posting by Wolfgang, and I incorporate some of his code. For each seed (1-1000, for this analysis), I took the mean and standard deviation of the first 1,000 random numbers. Then I get the percent of the density...
2283 sym 4 img 1 tbl
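The per-seed summary described in the excerpt above can be sketched roughly as follows. The excerpt does not say which distribution the random numbers come from, so drawing from `rnorm()` is an assumption, and the name `seed_stats` is made up for illustration.

```r
# For each seed 1..1000, draw the first 1,000 random numbers and summarise them.
seeds <- 1:1000

seed_stats <- t(vapply(seeds, function(s) {
  set.seed(s)
  x <- rnorm(1000)  # assumption: standard normal draws
  c(mean = mean(x), sd = sd(x))
}, numeric(2)))

# One row per seed, with columns seed, mean and sd
seed_stats <- data.frame(seed = seeds, seed_stats)

# Seeds whose sample mean is farthest from zero
head(seed_stats[order(abs(seed_stats$mean), decreasing = TRUE), ])
```

Sorting by the absolute mean is just one way to surface unusual seeds; sorting by `sd` works the same way.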
FIFA 15 Analysis with R
Several months ago, I used R to analyze professional soccer players based on their attributes from the video game FIFA 14. Now that FIFA 15 is upon us, let’s take a similar look. FIFA 15 is a video game by EA Sports that mimics the experience of managing and playing for a soccer team. The game uses the likenesses and attributes of real players a...
5705 sym R (6210 sym/1 pcs) 18 img 2 tbl
First Day of the Month, Using R
Future-proofing is an important concept when designing automated reports. One thing that can get out of hand over time is accumulating so many periods of data that your charts start to look overcrowded. You can solve this by limiting the number of periods to, say, 13 (I like 13 for monthly data, because you get a full year of data, plu...
1265 sym 2 img
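A minimal base-R sketch of the idea in the excerpt above: snap a date to the first day of its month, then build the 13 most recent month starts to use as a reporting window. The helper name `first_of_month` and the fixed example date are assumptions for illustration, not taken from the post.

```r
# Hypothetical helper: first day of the month containing date d
first_of_month <- function(d = Sys.Date()) {
  as.Date(format(d, "%Y-%m-01"))
}

# Fixed example date so the result is reproducible
report_date <- as.Date("2014-09-17")

# Step back one month at a time from the current month start,
# then reverse so the periods run oldest to newest
last_13 <- rev(seq(first_of_month(report_date), by = "-1 month", length.out = 13))
print(last_13)
```

In an automated report, `report_date` would typically be `Sys.Date()`, and the data would be filtered to dates on or after `last_13[1]`.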