Publications by Wesley
Binomial Confidence Intervals
This stems from a couple of binomial distribution projects I have been working on recently. It’s widely known that there are many different flavors of confidence intervals for the binomial distribution. The reason for this is that there is a coverage problem with these intervals (see Coverage Probability). A 95% confidence interval isn’...
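The coverage problem can be made concrete with a short sketch. It is written in Python with only the standard library rather than the R used in the post. It computes the exact coverage probability of the textbook Wald interval and of the Wilson score interval, one common alternative. It sums the Binomial(n, p) probabilities of every outcome whose interval contains the true p. The choice of these two interval types, and the example values n = 20 and p = 0.05, are assumptions for illustration.

```python
import math

Z95 = 1.959963984540054  # 0.975 quantile of the standard normal


def wald_interval(x, n, z=Z95):
    """Textbook (Wald) interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def wilson_interval(x, n, z=Z95):
    """Wilson score interval, one common alternative to the Wald interval."""
    p_hat = x / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def coverage(interval, n, p):
    """Exact coverage: total Binomial(n, p) probability of the outcomes x
    whose interval contains the true p."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval(x, n)
        if lo <= p <= hi:
            total += math.comb(n, x) * p**x * (1 - p) ** (n - x)
    return total


for name, fn in [("Wald", wald_interval), ("Wilson", wilson_interval)]:
    print(name, round(coverage(fn, 20, 0.05), 4))
```

With p this close to 0, an outcome of x = 0 gives a zero-width Wald interval that cannot contain p. That outcome alone pulls the Wald coverage far below the nominal 95%.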
Bootstrap Confidence Intervals
Here is an example of nonparametric bootstrapping. It’s a powerful technique that is similar to the jackknife. Where the jackknife leaves out one observation at a time, the bootstrap resamples the data with replacement. It is generally less efficient than a correctly specified parametric approach, but it gets the job done. This can be used in a variety of situations ranging from variance estimation to model selectio...
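A minimal sketch of the resampling idea follows, in Python rather than the post's R. It builds a percentile bootstrap confidence interval: draw `n_boot` samples with replacement, recompute the statistic on each, and read off the empirical α/2 and 1 − α/2 quantiles. The function name, the default of 2000 resamples, and the sample values are hypothetical choices for illustration. The post itself may use a different bootstrap interval.

```python
import random
import statistics


def bootstrap_percentile_ci(data, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=None):
    """Percentile bootstrap CI: resample with replacement, recompute the statistic,
    and take the alpha/2 and 1 - alpha/2 empirical quantiles."""
    rng = random.Random(seed)
    n = len(data)
    # Each bootstrap replicate is the statistic of a same-size resample drawn with replacement.
    boot_stats = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lo_idx = int(round(alpha / 2 * n_boot))
    hi_idx = min(n_boot - 1, int(round((1 - alpha / 2) * n_boot)) - 1)
    return boot_stats[lo_idx], boot_stats[hi_idx]


sample = [12.1, 9.8, 11.4, 10.2, 13.5, 8.9, 10.7, 12.8, 9.5, 11.9]  # made-up data
print(bootstrap_percentile_ci(sample, seed=1))
print(bootstrap_percentile_ci(sample, stat=statistics.median, seed=1))
```

Passing a different `stat` (here the median) reuses the same machinery. That flexibility is what makes the bootstrap useful well beyond the mean.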
Distribution of T-Scores
Like most of my posts, these code snippets derive from various other projects. This example shows a simulation of how one can determine whether a set of t statistics is distributed as expected. This can be useful when sampling known populations (e.g. U.S. census or hospital populations) or populations that will soon be known (e.g. pre-election, ...
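One way to run such a check is sketched below in Python rather than R. It repeatedly samples a known population, computes each sample's one-sample t statistic against the true mean, and compares the results with Student's t on n − 1 degrees of freedom. If the statistics behave as expected, about 5% of them exceed the two-sided 5% critical value, and their variance is near ν/(ν − 2). The normal population (μ = 50, σ = 10), the sample size of 10, and the function name are assumptions. The critical value 2.262 for 9 degrees of freedom comes from standard t tables.

```python
import math
import random
import statistics

T_CRIT_DF9 = 2.262157  # two-sided 5% critical value of Student's t with 9 df (from tables)


def simulate_t_scores(mu=50.0, sigma=10.0, n=10, reps=5000, seed=1):
    """Draw `reps` samples of size n from a known normal population and
    return each sample's t statistic computed against the true mean."""
    rng = random.Random(seed)
    scores = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.mean(sample)
        s = statistics.stdev(sample)
        scores.append((xbar - mu) / (s / math.sqrt(n)))
    return scores


t_scores = simulate_t_scores()
reject_rate = sum(abs(t) > T_CRIT_DF9 for t in t_scores) / len(t_scores)
print("share beyond critical value:", reject_rate)  # should be near 0.05
print("variance of t:", statistics.variance(t_scores))  # should be near 9 / 7
```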
Simulating Random Multivariate Correlated Data (Continuous Variables)
This is a repost of an example that I posted last year, but at the time I only had the PDF document (written in ). I’m reposting it directly into WordPress and I’m including the graphs. From time to time a researcher needs to develop a script or an application to collect and analyze data. They may also need to test their application under ...
Simulating Random Multivariate Correlated Data (Categorical Variables)
This is a repost of the second part of an example that I posted last year, but at the time I only had the PDF document (written in ). This is the second example of generating multivariate random associated data; it shows how to generate ordinal (categorical) data. It is a little more complex than generating continuous data in that the ...
Significant P-Values and Overlapping Confidence Intervals
There are all sorts of problems with p-values and confidence intervals, and I have neither the intention nor the time to cover all of them right now. However, a big problem is that most people have no idea what p-values really mean. Here is one example of a common problem with p-values and how it relates to confidence intervals. The problem ari...
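The arithmetic behind a common version of this problem is sketched below. Two 95% intervals can overlap while the difference between the two means is still significant at the 5% level. With equal standard errors `se`, the intervals overlap whenever the gap is under 2 × 1.96 × `se`. The difference, however, is significant once the gap exceeds 1.96 × √2 × `se`, which is about 2.77 × `se`. The means 10 and 13 with a standard error of 1 are hypothetical, and the sketch assumes a normal approximation with known standard errors.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # about 1.96


def ci(mean, se):
    """Normal-approximation 95% confidence interval for a single mean."""
    return mean - Z * se, mean + Z * se


def two_sided_p_diff(m1, se1, m2, se2):
    """Two-sided z-test p-value for the difference of two independent means."""
    z = (m1 - m2) / math.sqrt(se1**2 + se2**2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


m1, m2, se = 10.0, 13.0, 1.0  # hypothetical summary statistics
ci1, ci2 = ci(m1, se), ci(m2, se)
overlap = ci1[1] >= ci2[0]
p = two_sided_p_diff(m1, se, m2, se)
print(ci1, ci2, "overlap:", overlap, "p:", round(p, 4))
```

The two intervals overlap, yet p is below 0.05, so "the intervals overlap" is not a valid test of "no significant difference."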
Dirichlet Process, Infinite Mixture Models, and Clustering
The Dirichlet process provides a very interesting approach to understanding group assignments and models for clustering effects. The k-means approach is common, but it requires fixing the number of clusters in advance. Often we encounter situations where we don’t know how many clusters we need. Suppose we’re tryin...
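The sequential "Chinese restaurant process" view of the Dirichlet process shows why no fixed cluster count is needed. Customer i joins an existing table with probability proportional to that table's size, or opens a new one with probability proportional to the concentration parameter α. The number of occupied tables is therefore random and grows roughly logarithmically with the number of customers. The Python sketch below simulates only this prior over partitions, not a full mixture-model fit, and the function name is hypothetical.

```python
import random


def chinese_restaurant_process(n_customers, alpha, seed=None):
    """Sample a partition of n_customers from a CRP with concentration alpha.
    Returns each customer's table index and the size of every table."""
    rng = random.Random(seed)
    table_sizes = []
    assignments = []
    for _ in range(n_customers):
        # Existing tables weighted by size; the last slot is "open a new table", weighted by alpha.
        weights = table_sizes + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[table] += 1
        assignments.append(table)
    return assignments, table_sizes


assignments, sizes = chinese_restaurant_process(100, alpha=1.0, seed=5)
print(len(sizes), "clusters with sizes", sizes)
```

Raising α tends to produce more, smaller clusters, and lowering it concentrates customers at a few tables.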
Finding the Distribution Parameters
This is a brief description of one way to determine the distribution of given data. There are several ways to accomplish this in R, especially if one is trying to determine whether the data come from a normal distribution. Rather than focusing on hypothesis testing and determining whether the data actually follow a given distribution, this example shows ...
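As one illustration, not necessarily the method in the post, candidate distributions can be fitted by maximum likelihood and then compared by AIC (2k − 2 log L, where k is the number of fitted parameters). Note that the normal MLE of σ divides by n rather than n − 1. The sketch below is in Python, compares only a normal and an exponential fit, and uses simulated data. The function names are hypothetical.

```python
import math
import random
import statistics


def fit_normal(data):
    """Normal MLE (sigma uses n, not n - 1) with its log-likelihood and AIC."""
    mu = statistics.fmean(data)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))
    loglik = sum(-0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2) for x in data)
    return {"mu": mu, "sigma": sigma, "aic": 2 * 2 - 2 * loglik}


def fit_exponential(data):
    """Exponential MLE (rate = 1 / mean) with its log-likelihood and AIC; assumes data > 0."""
    rate = 1 / statistics.fmean(data)
    loglik = sum(math.log(rate) - rate * x for x in data)
    return {"rate": rate, "aic": 2 * 1 - 2 * loglik}


rng = random.Random(3)
exp_data = [rng.expovariate(0.5) for _ in range(2000)]   # simulated, true rate 0.5
norm_data = [rng.gauss(50, 5) for _ in range(2000)]      # simulated, mean 50, sd 5
for label, data in [("exponential sample", exp_data), ("normal sample", norm_data)]:
    print(label, fit_normal(data), fit_exponential(data))
```

The fit with the lower AIC is preferred. Unlike a goodness-of-fit test, this ranks candidate families rather than testing whether a particular distribution is correct.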
Simulating the Gambler’s Ruin
The gambler’s ruin problem is one where a player has probability p of winning and probability q = 1 - p of losing each round. For example, take a skill game where player x beats player y with probability 0.6 by getting closer to the target. Play begins with player x holding 5 points and player y holding 10 points. After each rou...
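The sketch below compares a simulation against the classical closed-form answer. It is in Python and assumes that each round moves one point from the loser to the winner, and that play stops when either player reaches zero. The excerpt is cut off before stating the exact rule. Starting with i of N total points, the player wins everything with probability (1 − r^i) / (1 − r^N), where r = q/p, or with probability i/N when p = q.

```python
import random


def win_probability_exact(i, total, p):
    """Probability that the player holding i of `total` points ends with all of them."""
    q = 1 - p
    if p == q:
        return i / total
    r = q / p
    return (1 - r**i) / (1 - r**total)


def simulate_gamblers_ruin(i, total, p, trials=20000, seed=None):
    """Monte Carlo estimate of the same probability, one point changing hands per round."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        points = i
        while 0 < points < total:
            points += 1 if rng.random() < p else -1
        wins += points == total
    return wins / trials


# Player x: 5 points, player y: 10 points, x wins each round with probability 0.6.
print(win_probability_exact(5, 15, 0.6), simulate_gamblers_ruin(5, 15, 0.6, seed=11))
```

Under these assumptions, the player with half as many points still wins most of the time when each round favors them.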
Amazon AWS Summit 2013
I was fortunate to attend the Amazon AWS Summit in NYC and to hear Werner Vogels give the keynote. I will share a few of my thoughts on the AWS 2013 Summit and some of my takeaways. I attended sessions that focused on two products in particular: Redshift and DynamoDB. Amazon AWS seems to be a good option for p...