Publications by : Chris at Savvy Analytics
Monty Hall Simulation
Goal : The “Monty Hall Problem” is a famous probability puzzle named after the host of the American game show “Let’s Make a Deal” (original run 1963-1976). The main premise is this: “Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, s...
3157 sym Python (2965 sym/5 pcs) 3 img 2 tbl
Visualizing Regression Metrics
Purpose : As analysts, we often use regression analytics as a descriptive tool to explain the relationship between two or more variables. However, there can be a disconnect between reviewing the regression metrics and intuitively understanding them. To help bolster that intuitive understanding, I will create a series of visualizations and then...
5555 sym Python (9078 sym/6 pcs) 3 img 2 tbl
Central Limit Theorem Simulations
Background In a recent presentation with some customers, they asked why so many of our assumptions about data depend on it being normally distributed. They made the correct observation that not all data is normally distributed. The Central Limit Theorem helps us overcome this problem. It says that even if the original data is not no...
2097 sym Python (2642 sym/3 pcs) 3 img
Diversified Portfolios Versus The S&P500
Project Goal Traditional investing advice suggests that diversifying your long-term growth portfolio across stocks, bonds and international markets will lead to either favorable returns, favorable risk or both. Let’s take some simple portfolios and see if we can prove that theory with historical data. # Environment and Functions Section Q...
4096 sym Python (15045 sym/12 pcs) 6 img
PGA Tour Driving Distance Versus Success
Project Goal : Fans of the PGA Tour and golf in general know that driving distance has increased significantly over the past few decades. One of my friends, David, and I were discussing this development, and we both commented that everybody seems to be a long hitter these days. We had a few questions that we hope the statistics can answer: W...
4548 sym Python (17365 sym/9 pcs) 7 img 1 tbl
Mining Data From PDF Files with R
Background : There is often useful data that is only available via a text or PDF report. This can be publicly available data on the internet or data from legacy systems that produce printable reports but do not allow access to the underlying data. Goal : Rather than retype this data, let’s build a pipeline to get the data into a usable format...
3965 sym R (6384 sym/5 pcs) 2 img 2 tbl
Finding Hidden Predictors
Overview Machine learning for practical prediction and analysis Machine learning isn't just for serving up ads on the internet or predicting your viewing preferences on Netflix. It can be a powerful tool for common business tasks. In this example, we will see how it can be applied to rank job applicants. Demonstration We created some data for new h...
3605 sym 8 img
A Monte Carlo Simulation of Historical S&P500 Returns
Background : If you’ve looked into investing at all, you’ve probably heard about the historical returns of the S&P 500. Reliable news sources have reported them at anywhere from 8% to 11%. That’s quite a range, so what’s the real answer? Goal : Let’s calculate the historical returns for ourselves. We will look at a simple average return...
5917 sym R (11260 sym/9 pcs) 11 img 3 tbl
Maximize Your US National Parks Pass
Background : Recently my daughter bought a National Parks pass. This is quite a value, allowing her and guests to visit any park in the US for $80/year versus the normal $25/day. Challenge : Imagine that you were able to work remotely and therefore live just about anywhere. Where would you locate yourself so that you could enjoy the most parks dur...
2574 sym R (6810 sym/6 pcs) 4 img 1 tbl
NBA Points Gained Shooting 3 Pointers
At the COVID-induced pause in the 2019-2020 NBA season, my favorite NBA team, the Indiana Pacers, seemed to lose a lot of their games to other teams who were better at the three-point shot. Exactly how bad are the Pacers compared to other teams? While there are any number of published NBA stats we could turn to, I wanted to get creative and anal...
7169 sym R (15795 sym/9 pcs) 5 img