Publications by Wesley

Binomial Confidence Intervals

22.01.2013

This stems from a couple of binomial distribution projects I have been working on recently.  It’s widely known that there are many different flavors of confidence intervals for the binomial distribution.  The reason for this is that there is a coverage problem with these intervals (see Coverage Probability).  A 95% confidence interval isn’...

1161 sym R (1495 sym/1 pcs) 6 img
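
To make the coverage problem mentioned in the excerpt concrete, here is a minimal sketch in Python (the post's own code is in R). It assumes the simple Wald interval, which is only one of the many flavors, and computes the exact probability that the nominal 95% interval contains the true p. The function names and the choice of n = 20, p = 0.1 are illustrative assumptions, not taken from the post.

```python
import math

def wald_interval(k, n, z=1.959964):
    """Wald (normal-approximation) interval for k successes in n trials."""
    p_hat = k / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def coverage(n, p, z=1.959964):
    """Exact coverage: sum the binomial pmf over every k whose interval contains p."""
    total = 0.0
    for k in range(n + 1):
        lower, upper = wald_interval(k, n, z)
        if lower <= p <= upper:
            total += math.comb(n, k) * p**k * (1 - p) ** (n - k)
    return total

# Illustrative values (assumed): a small sample and a proportion near zero.
wald_coverage = coverage(20, 0.1)
print(f"Actual coverage of the nominal 95% Wald interval: {wald_coverage:.3f}")
```

For these values the actual coverage falls well short of the nominal 95%, which is the kind of gap that motivates the alternative intervals.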

Bootstrap Confidence Intervals

01.02.2013

Here is an example of nonparametric bootstrapping. It’s a powerful technique that is similar to the jackknife. The bootstrap, however, re-samples the data with replacement. It’s clearly not as good as parametric approaches but it gets the job done. This can be used in a variety of situations ranging from variance estimation to model selectio...

1587 sym R (1604 sym/1 pcs) 6 img
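
The re-sampling idea in the excerpt can be sketched as follows. This is a minimal Python illustration (the post itself uses R) of a percentile bootstrap interval for the mean; the sample values, the function name `bootstrap_ci`, and the number of replicates are all assumptions made for the example.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: re-sample with replacement, recompute stat."""
    rng = random.Random(seed)
    n = len(data)
    # Each replicate is the statistic of a same-size sample drawn with replacement.
    replicates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lower = replicates[int((alpha / 2) * n_boot)]
    upper = replicates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical observations, used only to exercise the function.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]
boot_lower, boot_upper = bootstrap_ci(sample)
print(f"95% percentile bootstrap interval for the mean: ({boot_lower:.2f}, {boot_upper:.2f})")
```

Passing a different `stat` (for example `statistics.median`) reuses the same procedure for other estimators, which is much of the appeal of the nonparametric approach.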

Distribution of T-Scores

02.03.2013

Like most of my posts, these code snippets derive from various other projects. This example shows a simulation of how one can determine if a set of t statistics is distributed properly. This can be useful when sampling known populations (e.g. U.S. census or hospital populations) or populations that will soon be known (e.g. pre-election, ...

1259 sym R (527 sym/1 pcs)

Simulating Random Multivariate Correlated Data (Continuous Variables)

11.03.2013

This is a repost of an example that I posted last year but at the time I only had the PDF document (written in LaTeX). I’m reposting it directly into WordPress and I’m including the graphs. From time to time a researcher needs to develop a script or an application to collect and analyze data. They may also need to test their application under ...

1923 sym R (405 sym/1 pcs) 6 img

Simulating Random Multivariate Correlated Data (Categorical Variables)

11.03.2013

This is a repost of the second part of an example that I posted last year but at the time I only had the PDF document (written in LaTeX). This is the second example of generating multivariate random associated data. This example shows how to generate ordinal categorical data. It is a little more complex than generating continuous data in that the ...

1430 sym R (801 sym/1 pcs) 8 img

Significant P-Values and Overlapping Confidence Intervals

25.03.2013

There are all sorts of problems with p-values and confidence intervals and I have no intention (or the time) to cover all those problems right now.  However, a big problem is that most people have no idea what p-values really mean. Here is one example of a common problem with p-values and how it relates to confidence intervals.  The problem ari...

1751 sym Python (2672 sym/1 pcs) 4 img
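
As a small numeric illustration of the title's point, the sketch below shows two group means whose 95% confidence intervals overlap even though a two-sided z-test on their difference gives p < 0.05. The means and standard errors are made-up values chosen for the illustration, and the normal approximation is an assumption; none of this is taken from the post's own code.

```python
import math

Z_95 = 1.959964  # two-sided 95% normal quantile

def ci(mean, se, z=Z_95):
    """Normal-approximation confidence interval for a single mean."""
    return mean - z * se, mean + z * se

def two_sided_p(mean_a, se_a, mean_b, se_b):
    """Two-sided z-test p-value for the difference of two independent means."""
    z = abs(mean_a - mean_b) / math.sqrt(se_a**2 + se_b**2)
    return math.erfc(z / math.sqrt(2))

# Hypothetical group summaries (assumed values).
ci_a = ci(10.0, 1.0)
ci_b = ci(13.0, 1.0)
intervals_overlap = ci_a[1] >= ci_b[0] and ci_b[1] >= ci_a[0]
p_value = two_sided_p(10.0, 1.0, 13.0, 1.0)

print(f"CI A: ({ci_a[0]:.2f}, {ci_a[1]:.2f})  CI B: ({ci_b[0]:.2f}, {ci_b[1]:.2f})")
print(f"Intervals overlap: {intervals_overlap}, p-value: {p_value:.4f}")
```

The standard error of the difference is sqrt(se_a² + se_b²), which is smaller than se_a + se_b, so checking whether two individual intervals overlap is a stricter criterion than the test itself.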

Dirichlet Process, Infinite Mixture Models, and Clustering

07.04.2013

The Dirichlet process provides a very interesting approach to understanding group assignments and models for clustering effects. Often we encounter the k-means approach. However, k-means requires a fixed number of clusters. Often we encounter situations where we don’t know how many fixed clusters we need. Suppose we’re tryin...

2905 sym R (4422 sym/4 pcs) 32 img

Finding the Distribution Parameters

09.04.2013

This is a brief description of one way to determine the distribution of given data. There are several ways to accomplish this in R, especially if one is trying to determine if the data comes from a normal distribution. Rather than focusing on hypothesis testing and determining if a distribution is actually the said distribution, this example shows ...

962 sym R (1062 sym/1 pcs) 4 img

Simulating the Gambler’s Ruin

14.04.2013

The gambler’s ruin problem is one where a player has a probability p of winning and probability q of losing. For example, let’s take a skill game where player x can beat player y with probability 0.6 by getting closer to a target. Game play begins with player x allotted 5 points and player y allotted 10 points. After each rou...

1579 sym R (1383 sym/1 pcs) 8 img
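
The setup in the excerpt (p = 0.6, player x starting with 5 points, player y with 10) can be simulated with a short Python sketch; the post's own code is in R. Because the excerpt is cut off before the round rule is stated, the sketch assumes the standard rule: the round winner takes one point from the loser and play stops when either player reaches zero. The function names are illustrative.

```python
import random

def x_wins_game(x_points=5, y_points=10, p=0.6, rng=random):
    """Play one game; return True if player x ends up holding every point."""
    total = x_points + y_points
    while 0 < x_points < total:
        # Assumed rule: the round winner takes one point from the loser.
        x_points += 1 if rng.random() < p else -1
    return x_points == total

def x_win_probability(start, total, p):
    """Closed-form probability that x reaches `total` before 0 (standard result)."""
    q = 1 - p
    if p == q:
        return start / total
    ratio = q / p
    return (1 - ratio**start) / (1 - ratio**total)

rng = random.Random(1)
trials = 10_000
simulated = sum(x_wins_game(rng=rng) for _ in range(trials)) / trials
exact = x_win_probability(5, 15, 0.6)
print(f"simulated: {simulated:.3f}, closed form: {exact:.3f}")
```

Under the assumed rule the closed form gives about 0.87 for player x, so the stronger player usually wins despite starting with fewer points.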

Amazon AWS Summit 2013

18.04.2013

I was fortunate enough to have been able to attend the Amazon AWS Summit in NYC and to listen to Werner Vogels give the keynote.  I will share a few of my thoughts on the AWS 2013 Summit and some of my take-aways.  I attended sessions that focused on two products in particular: Redshift and DynamoDB.  Amazon AWS seems to be a good option for p...

3179 sym 2 img