Publications by Daniel Oehm

Probability of Selecting Matched Pairs and the Hypergeometric Distribution

08.08.2018

The problem: consider a bag of marbles made up of black marbles and white marbles, with fewer black marbles than white. Each black marble is numbered and paired with one of the white marbles, which are also numbered. From the bag of marbles we take a sample of size ...

7778 sym R (3856 sym/25 pcs) 146 img

Cribbage: Optimal Hand (part 1)

08.09.2018

Cribbage is one of my favourite card games and I have been playing it ever since I could count. It is a unique card game; in fact, it is often said there are 3 types of card games: trick-style games, rummy-style games and cribbage! What I aim to do in a series of posts is take a data science approach to the game. I will simulate the game and eventua...

10192 sym R (11455 sym/27 pcs) 14 img

Simple Parallel Processing in R

15.09.2018

I recently purchased a new laptop with an Intel i7-8750 6-core CPU with multi-threading, meaning I have 12 logical processors at my disposal. It seemed like a good opportunity to try out some parallel processing packages in R. There are a few packages in R for the job, the most popular being parallel, doParallel and foreach. First we need ...

3731 sym R (1840 sym/22 pcs) 2 img

Bayesian Network Example with the bnlearn Package

30.09.2018

Bayesian Networks are probabilistic graphical models and they have some neat features which make them very useful for many problems. They are structured in a way which allows you to calculate the conditional probability of an event given the evidence. The graphical representation makes it easy to understand the relationships between the variables...

9008 sym R (6841 sym/24 pcs) 10 img

Hidden Markov Model example in R with the depmixS4 package

06.11.2018

Recently I developed a solution using a Hidden Markov Model and was quickly asked to explain myself. What are they and why do they work so well? I can answer the first part; the second we just have to take for granted. HMMs are for modelling sequences of data, whether they are derived from continuous or discrete probability distributions. They ar...

6702 sym R (5362 sym/8 pcs) 10 img

Liar’s Dice in R

23.11.2018

I have been playing Red Dead Redemption 2, immersing myself in the Old West as I did with the first game. It’s an incredibly impressive game and there are many side activities that can keep you entertained in the world, such as playing poker in the saloon, Five Finger Fillet and dominoes. I was disappointed to find out that Liar’s Dice is not i...

10651 sym R (3431 sym/17 pcs) 22 img

Q-learning example with Liar’s Dice in R

20.12.2018

In my last post I coded Liar’s Dice in R and some brainless bots to play against. I build on that post by using Q-learning to train an agent to play Liar’s Dice well. Spoiler alert: the brainless bots aren’t actually that brainless! More on that later. Note – The full code is too long to share here, so you can find it on GitHub. I’ll onl...

10811 sym R (4420 sym/16 pcs) 54 img

Generating Synthetic Data Sets with ‘synthpop’ in R

12.01.2019

Synthpop – A great music genre and an aptly named R package for synthesising population data. I recently came across this package while looking for an easy way to synthesise unit record data sets for public release. The goal is to generate a data set which contains no real units, and is therefore safe for public release, and retains the structure of th...

8427 sym R (16190 sym/15 pcs) 42 img

Synthesising Multiple Linked Data Sets and Sequences in R

03.02.2019

In my last post I looked at generating synthetic data sets with the ‘synthpop’ package, including some of the challenges and neat things the package can do. It is simple to use, which is great when you have a single data set with independent features. This post will build on the last post by tackling other complications when attempting to synthesise dat...

10274 sym R (11865 sym/7 pcs) 18 img

The Most Amount of Rain over a 10 Day Period on Record

14.02.2019

Townsville, Qld, has been inundated with torrential rain and has broken the record for the largest rainfall over a 10-day period. It has been devastating for the farmers and residents of Townsville. I looked at Townsville’s weather data to understand how significant this event was and whether there have been comparable events in the past. Data from ...

4063 sym R (6104 sym/10 pcs) 12 img